# Scraping TripAdvisor Reviews using Selenium 

In this project, we scrape hotel guest reviews from TripAdvisor for the Row NYC Hotel in New York City. There are 9040 reviews in total, and for each one we extract the date of stay, the review title, the review text and the rating.
The scraping is done with Selenium.

The project proceeds in three steps:
1. Import relevant libraries
2. Extract the individual elements through Selenium
3. Build web scraper to automate the scraping of 9040 reviews

## 1. Import relevant libraries 

In [1]:
import time
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager


Open a Chrome driver in incognito mode so that we can access the URL of the webpage we are going to scrape reviews from.

In [2]:
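from selenium.webdriver.chrome.service import Service

options = webdriver.ChromeOptions()
options.add_argument('--ignore-certificate-errors')
options.add_argument('--incognito')
# Selenium 4 style: pass the driver path through a Service object and use
# `options=` (the `chrome_options=` keyword is deprecated)
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)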



Current google-chrome version is 101.0.4951
Get LATEST chromedriver version for 101.0.4951 google-chrome
Trying to download new driver from https://chromedriver.storage.googleapis.com/101.0.4951.41/chromedriver_win32.zip
Driver has been saved in cache [C:\Users\yeosi\.wdm\drivers\chromedriver\win32\101.0.4951.41]


In [3]:
# Open the page used to test the selectors (Hilton Waikiki Beach);
# the full scrape in section 3 targets the Row NYC Hotel page
driver.get('https://www.tripadvisor.com/Hotel_Review-g60982-d209422-Reviews-Hilton_Waikiki_Beach-Honolulu_Oahu_Hawaii.html#REVIEWS')

## 2. Extract the individual elements through Selenium

Each review has a 'read more' button. To expand the view and extract the full review text, this button has to be clicked first.

In [4]:
# To click on read more button
driver.find_elements(By.XPATH, "//span[@class='eljVo _S Z']")[0].click()

In [5]:
# To get the reviews 
review = driver.find_elements(By.XPATH, "//q[@class='XllAv H4 _a']")

In [6]:
# Check that the review text was extracted successfully
review[0].text

'My husband was traveling for work last month and this was the perfect corporate hotel. We stayed at the Hilton Waikiki Beach for 2 amazing weeks. Its in a great location. Close enough to all the action but also wasn’t in the middle of Waikiki and all the noise. The hotel staff, valet team, housekeeping went above and beyond for us. The hotel is clean and the yoga instructor- Justin was amazing and probably one of the best I have ever practiced yoga with and I practice almost daily back home in TX. There’s an ABC Store right across the street which made so convenient if you needed anything. We had an incident after our first week of being there, there was guests on our floor, arguing(drunk, maybe) in the hallway at 2am and woke us up, we understand this isn’t the hotels control to have loud tourists but the next day we mentioned it to the front desk and she went above and beyond and upgraded us from the 6th floor to an ocean view on the 36th floor! We aren’t expecting this but it sure 

In [7]:
# To get the dates
dates = driver.find_elements(By.XPATH, "//span[@class = 'euPKI _R Me S4 H3']")
# Drop the first 14 characters (the 'Date of stay: ' label) to keep only the month and year
dates[0].text[14:]

'March 2022'
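
The slice `[14:]` works only while the label is exactly `Date of stay: `. A minimal sketch of a more defensive version is shown below. It assumes the raw element text looks like `Date of stay: March 2022`; the helper name `parse_stay_date` is hypothetical and not part of the notebook, and `%B` relies on an English locale for month names.

```python
from datetime import datetime

PREFIX = "Date of stay: "  # label assumed to precede the month and year


def parse_stay_date(text):
    """Return (cleaned string, datetime) for text such as 'Date of stay: March 2022'."""
    # Strip the label only if it is actually present
    cleaned = text[len(PREFIX):] if text.startswith(PREFIX) else text
    cleaned = cleaned.strip()
    # '%B %Y' parses a full month name followed by a four-digit year
    return cleaned, datetime.strptime(cleaned, "%B %Y")


label, parsed = parse_stay_date("Date of stay: March 2022")
print(label, parsed.year, parsed.month)
```

Keeping a parsed `datetime` alongside the string could make it easier to sort or filter reviews by date later on.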

In [8]:
# To get the review title 
title = driver.find_elements(By.XPATH,"//div[contains(@data-test-target, 'review-title')]")
title[0].text

'Great Location & So much ALOHA'

In [9]:
# To get ratings
ratings = []

for div in driver.find_elements(By.XPATH, "//div[@class = 'emWez F1']"):
    # Each rating is a <span> whose class name encodes the score, e.g. bubble_50
    for span in div.find_elements(By.TAG_NAME, 'span'):
        ratings.append(span.get_attribute('class'))

ratings


['ui_bubble_rating bubble_50',
 'ui_bubble_rating bubble_50',
 'ui_bubble_rating bubble_50',
 'ui_bubble_rating bubble_50',
 'ui_bubble_rating bubble_10',
 'ui_bubble_rating bubble_30',
 'ui_bubble_rating bubble_20',
 'ui_bubble_rating bubble_40',
 'ui_bubble_rating bubble_20',
 'ui_bubble_rating bubble_50']
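
The class names above encode the score as ten times the number of bubbles (`bubble_50` is a 5-bubble rating). As a rough sketch, the conversion to a number could look like the following; the function name `bubble_class_to_rating` is hypothetical and introduced here only for illustration.

```python
import re


def bubble_class_to_rating(class_name):
    """Convert e.g. 'ui_bubble_rating bubble_50' to 5.0; return None if no score is found."""
    # Look for 'bubble_' followed directly by digits
    match = re.search(r"bubble_(\d+)", class_name)
    if match is None:
        return None
    return int(match.group(1)) / 10


sample = [
    'ui_bubble_rating bubble_50',
    'ui_bubble_rating bubble_10',
    'ui_bubble_rating bubble_30',
]
print([bubble_class_to_rating(name) for name in sample])
```

Returning `None` for unexpected class names makes it easier to spot rows where the page layout did not match the selector.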

## 3. Build web scraper to automate the scraping of 9040 reviews
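
The scraping cell in this section sets `pages_to_scrape = 905` and loops over `range(1, pages_to_scrape)`, which is 904 iterations. A small sketch of where that number comes from, assuming each page shows 10 reviews (the count seen in the ratings list in section 2); the helper `pages_needed` is hypothetical:

```python
import math


def pages_needed(total_reviews, reviews_per_page=10):
    """Number of result pages required to cover all reviews."""
    return math.ceil(total_reviews / reviews_per_page)


total_pages = pages_needed(9040)
print(total_pages)                        # 904
print(len(range(1, total_pages + 1)))     # 904 loop iterations
```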

In [4]:
from selenium.webdriver.chrome.service import Service

Title = []
Date = []
Review = []
ratings = []
pages_to_scrape = 905  # range(1, 905) below visits 904 pages

options = webdriver.ChromeOptions()
options.add_argument('--ignore-certificate-errors')
options.add_argument('--incognito')
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
driver.get('https://www.tripadvisor.com/Hotel_Review-g60763-d1938661-Reviews-Row_NYC_Hotel-New_York_City_New_York.html')

# Loop through the pages to scrape
for page in range(1, pages_to_scrape):
    print(f"page {page}")
    time.sleep(5)

    # Click the 'Read more' button so that the full review text is shown
    element_list = driver.find_elements(By.XPATH, '//span[@class="fmBIl _S Nc"]')
    if len(element_list) > 0:
        print(f'there is an element: {element_list}')
        driver.execute_script("arguments[0].click();", element_list[0])
    else:
        print('theres no element')
    time.sleep(2)

    # Get the review text
    for element in driver.find_elements(By.XPATH, "//q[@class = 'XllAv H4 _a']"):
        Review.append(element.text)

    # Get the review titles
    for element in driver.find_elements(By.XPATH, "//div[contains(@data-test-target, 'review-title')]"):
        Title.append(element.text)

    # Get the dates (the 'Date of stay:' label is removed later)
    for element in driver.find_elements(By.XPATH, "//span[@class = 'euPKI _R Me S4 H3']"):
        Date.append(element.text)

    # Get the ratings (class names such as 'ui_bubble_rating bubble_50')
    for div in driver.find_elements(By.XPATH, "//div[@class = 'emWez F1']"):
        for span in div.find_elements(By.TAG_NAME, 'span'):
            ratings.append(span.get_attribute('class'))

    # Go to the next page: click link a[1]..a[3] on the first three pages, a[4] afterwards
    driver.implicitly_wait(10)
    link_index = page if page < 4 else 4
    driver.find_elements(By.XPATH, f'//*[@id="component_16"]/div/div[3]/div[13]/div/div/a[{link_index}]')[0].click()
    time.sleep(3)




page 1
there is an element: [<selenium.webdriver.remote.webelement.WebElement (session="26e9cfa05e8989f6a2fb8289e3eb02d2", element="17a34e62-0db9-4919-8fa2-c7202c24fe0b")>]




page 2
there is an element: [<selenium.webdriver.remote.webelement.WebElement (session="26e9cfa05e8989f6a2fb8289e3eb02d2", element="24f6fec1-85d3-4b2d-9dfe-d5ed5179d87c")>, <selenium.webdriver.remote.webelement.WebElement (session="26e9cfa05e8989f6a2fb8289e3eb02d2", element="5e818139-ed0a-4682-b854-6070770063f1")>]
page 3
there is an element: [<selenium.webdriver.remote.webelement.WebElement (session="26e9cfa05e8989f6a2fb8289e3eb02d2", element="58bb6e9f-9589-4ea6-b3fa-0c5f55308104")>, <selenium.webdriver.remote.webelement.WebElement (session="26e9cfa05e8989f6a2fb8289e3eb02d2", element="5e818139-ed0a-4682-b854-6070770063f1")>]
page 4
there is an element: [<selenium.webdriver.remote.webelement.WebElement (session="26e9cfa05e8989f6a2fb8289e3eb02d2", element="54b89d3a-183f-4f80-907e-57540f5deaa8")>, <selenium.webdriver.remote.webelement.WebElement (session="26e9cfa05e8989f6a2fb8289e3eb02d2", element="5e818139-ed0a-4682-b854-6070770063f1")>]
[output truncated: the same two lines repeat for each subsequent page]

In [31]:
# Compile all the extracted data into a DataFrame
df = pd.DataFrame()
df['Date'] = Date
df['Title'] = Title
df['Review'] = Review
df['Rating'] = ratings

# Basic cleaning of the columns
# Remove the 'Date of stay:' label and the space left behind it
df['Date'] = df['Date'].str.replace('Date of stay:', '', regex=False).str.strip()
# 'ui_bubble_rating bubble_50' -> '50' -> '5'
df['Rating'] = df['Rating'].str.replace('ui_bubble_rating bubble_', '', regex=False)
df['Rating'] = df['Rating'].str.replace('0', '', regex=False)
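
Assigning a list as a new column raises a `ValueError` in pandas when its length differs from the existing rows, which can happen if one selector misses an element on some page. A minimal sketch of a check that could be run before building the DataFrame is shown below; `assert_same_length` is a hypothetical helper, not part of the original notebook.

```python
def assert_same_length(**columns):
    """Raise ValueError if the given lists differ in length; otherwise return the common length."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        # Report every column so the short one is easy to identify
        raise ValueError(f"Column lengths differ: {lengths}")
    return next(iter(lengths.values()), 0)


# Example with small stand-in lists
n_rows = assert_same_length(
    Date=['April 2022', 'December 2021'],
    Title=['Nice stay in the heart of NYC', 'Modern and Clean'],
)
print(n_rows)
```

In the notebook this would be called as `assert_same_length(Date=Date, Title=Title, Review=Review, Rating=ratings)`.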

In [32]:
df

| | Date | Title | Review | Rating |
|---|---|---|---|---|
| 0 | April 2022 | Nice stay in the heart of NYC | Very good location. Reasonable price. The room... | 4 |
| 1 | December 2021 | Modern and Clean | We stayed in this hotel just before Christmas ... | 5 |
| 2 | April 2022 | Mid range hotel, very strategic location | Pleasant staff and security in place. Stayed h... | 4 |
| 3 | April 2022 | Small, cramped | Pricing was okay. Very noisy, small room. The ... | 2 |
| 4 | April 2022 | 6 days of Terror | Rooms are filthy, elevators are dangerous and ... | 1 |
| ... | ... | ... | ... | ... |
| 9035 | January 2017 | Comfortable, very central hotel | Location was perfect right in the theatre dist... | 4 |
| 9036 | January 2017 | Nice, affordable hotel conveniently located in... | For a place to stay, the hotel was affordable,... | 4 |
| 9037 | January 2017 | Nice stay | Great stay at the Row. Perfect location, staff... | 5 |
| 9038 | January 2017 | Good stay! | My partner had her 25th last week and I must s... | 4 |


In [33]:
# Save df as a CSV file for future NLP analysis
df.to_csv('Reviews TA.csv')