# WEB SCRAPING THE BEST SUMMER SPOTS RANKED BY TRIPADVISOR

### <font color="green">Travel planning and booking site TripAdvisor recently released its 2017 Summer Vacation Value Report, naming the top 50 U.S. destinations for this summer. Destinations with major increases in hotel booking interest made it to the top of the list, and the average weeklong expense also factored into a destination’s position in the ranking. Nine of the top 10 destinations are by the water, and a grand total of 38 of the 50 destinations are waterfront, either on a lake or on the sea. If you needed a sign to book your beach vacation now, this might be it!</font>

### <font color="red">Overview of Selenium and why it is useful!</font>

### Here, Selenium is used from Python 3 to scrape the website. Selenium drives a real web browser (such as Chrome, Safari, Firefox or Microsoft Edge) through WebDriver and executes instructions in it. It supports XPath and CSS selectors. Because it works on the rendered (computed) DOM, what it sees mirrors the DOM inspector in a browser's developer tools. It also supports page interaction and scraping of AJAX-loaded content, without having to reverse engineer the AJAX calls. Finally, its API is roughly equivalent across languages, so the lessons learned here carry over to Selenium in other languages.

### <font color="red">Project starts here...</font>

#### <font color="blue">First things first: import statements!</font>

In [1]:
from selenium import webdriver
import pandas as pd

#### <font color="blue">Now, initialize the web driver for Chrome/Firefox. I have used Chrome!</font>

In [3]:
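from selenium.webdriver.chrome.service import Service

# Change the path to match your system. Selenium 4 passes it through a Service
# object (the old executable_path argument was removed).
pathToChromeDriver = '/Users/SRIRAM VETURI/Desktop/chromedriver_win32/chromedriver.exe'
browser = webdriver.Chrome(service=Service(pathToChromeDriver))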

#### <font color="blue">Selecting the elements by their XPath!</font>

In [4]:
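from selenium.webdriver.common.by import By

# Site to be scraped for data
browser.get('https://www.coastalliving.com/travel/top-10/tripadvisor-best-summer-destinations')
# The XPath expressions contain single quotes ('padded'), so the Python strings
# are wrapped in double quotes; single quotes would end the string early.
# find_elements(By.XPATH, ...) replaces find_elements_by_xpath (removed in Selenium 4.3).
locationsList = browser.find_elements(By.XPATH, "//div[@class='padded']/h2/a")
locationsInfo = browser.find_elements(By.XPATH, "//div[@class='padded']/p")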
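
To see which elements those two XPath expressions pick out, the sketch below runs equivalent queries with Python's standard `xml.etree.ElementTree`, which understands a subset of XPath (`.//` plays the role of `//`). The markup is a hypothetical snippet modelled only on the expressions above, not a copy of the real page, and ElementTree needs well-formed XML, so it is an illustration rather than a substitute for Selenium on live HTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup modelled on the XPath expressions above;
# it is NOT the real page's HTML.
SAMPLE = """
<html><body>
  <div class="padded">
    <h2><a href="https://example.com/place-a">Place A</a></h2>
    <p>First paragraph</p>
  </div>
  <div class="sidebar">
    <h2><a href="https://example.com/ignored">Ignored</a></h2>
    <p>Ignored paragraph</p>
  </div>
  <div class="padded">
    <h2><a href="https://example.com/place-b">Place B</a></h2>
    <p>Second paragraph</p>
  </div>
</body></html>
"""

root = ET.fromstring(SAMPLE.strip())

# ".//" searches all descendants, like "//" in the Selenium XPath.
links = root.findall(".//div[@class='padded']/h2/a")
paragraphs = root.findall(".//div[@class='padded']/p")

places = [(a.text, a.get("href")) for a in links]
texts = [p.text for p in paragraphs]

print(places)
print(texts)
```

Note that `[@class='padded']` is an exact string comparison in both tools, so an element with `class="padded wide"` would not match.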

#### <font color="blue">Initialize some data lists!</font>

In [6]:
# Three lists, one per type of information:
datasetOne = []    # (place name, link) pairs
datasetTwo = []    # average weekly expense text
datasetThree = []  # best-deal suggestion text

#### <font color="blue">Now, let's fetch the required information!</font>

In [7]:
# Collect (place name, link) pairs from the <a> elements, encoded as ASCII bytes.
for loc in locationsList:
    entries = (loc.text.encode('ascii', 'replace'), loc.get_attribute('href').encode('ascii', 'replace'))
    datasetOne.append(entries)

# Skip the first four <p> elements; indices 4 to 23 alternate between the
# average weekly expense (even index) and the best-deal suggestion (odd index).
for info in range(4, min(24, len(locationsInfo))):
    text = locationsInfo[info].text
    if info % 2 == 0:
        datasetTwo.append(text.encode('ascii', 'replace'))
    else:
        datasetThree.append(text.encode('ascii', 'ignore'))
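
The loop above keeps `<p>` indices 4 through 23 and routes even indices to the expense list and odd ones to the deals list. Python's extended slicing expresses the same selection directly; the sketch below uses made-up placeholder strings instead of real paragraph text to check that the two forms agree.

```python
# Made-up placeholders standing in for the text of 30 <p> elements.
paragraph_texts = [f"paragraph {i}" for i in range(30)]

# Loop form: indices 4..23, even -> expenses, odd -> deals.
expenses_loop, deals_loop = [], []
for i in range(4, min(24, len(paragraph_texts))):
    if i % 2 == 0:
        expenses_loop.append(paragraph_texts[i])
    else:
        deals_loop.append(paragraph_texts[i])

# Slicing form: start, stop (exclusive), step.
expenses_slice = paragraph_texts[4:24:2]
deals_slice = paragraph_texts[5:24:2]

print(expenses_loop == expenses_slice, deals_loop == deals_slice)
```

Both forms assume the page keeps a strict even/odd alternation, so a single extra paragraph would shift every later entry into the wrong list.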

#### <font color="blue">Now that we have the lists, we can convert them into pandas dataframes as follows:</font>

In [8]:
dfOne = pd.DataFrame(datasetOne, columns=['PLACES', 'LINKS FOR MORE INFORMATION'])
dfTwo = pd.DataFrame(datasetTwo, columns=['AVERAGE WEEKLY EXPENSE PER PERSON'])
dfThree = pd.DataFrame(datasetThree, columns=['BEST DEALS SUGGESTION'])

#### <font color="blue">Let's process the "PLACES" column!</font>

In [9]:
# Each value is an ASCII bytes object; decode it back to a str.
dfOne['PLACES'] = dfOne['PLACES'].str.decode('ascii')
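
Because the scraped values were stored as ASCII `bytes`, they have to be turned back into `str`. Slicing `str(x)[2:-1]` strips the `b'` and `'` from the printed form of the bytes and works for plain names, whereas `bytes.decode` returns the underlying text. The stdlib comparison below, on a made-up name, shows where the two diverge: when a value contains both kinds of quote, the printed form escapes the apostrophe and the slice keeps the backslash.

```python
# Made-up value with a non-ASCII letter and both quote characters.
raw = 'Caf\u00e9 "Bob\'s" Grill'.encode('ascii', 'replace')  # the accented e becomes ?

via_repr = str(raw)[2:-1]         # slice the printed form b'...'
via_decode = raw.decode('ascii')  # decode the underlying bytes

print(via_repr)
print(via_decode)
```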

#### <font color="blue">Let's process the "LINKS FOR MORE INFORMATION" column!</font>

In [10]:
# Decode the ASCII bytes back to str, as for the "PLACES" column.
dfOne['LINKS FOR MORE INFORMATION'] = dfOne['LINKS FOR MORE INFORMATION'].str.decode('ascii')

#### <font color="blue">Let's process the "AVERAGE WEEKLY EXPENSE PER PERSON" column!</font>

In [11]:
# Decode the bytes, then drop the fixed 37-character label at the start of each entry.
col = 'AVERAGE WEEKLY EXPENSE PER PERSON'
dfTwo[col] = dfTwo[col].str.decode('ascii').str[37:]
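
Stripping the label by a fixed character count breaks silently if the site rewords it. A more tolerant sketch, assuming (unverified against the live page) that the label ends with a colon and a space, splits on the first `": "` instead. The helper name `strip_label` and the sample strings are made up for illustration.

```python
def strip_label(text, sep=": "):
    """Return the part after the first `sep`; return `text` unchanged if `sep` is absent."""
    _, found, tail = text.partition(sep)
    return tail.strip() if found else text.strip()

# Made-up examples; the real label wording on the page may differ.
print(strip_label("Average weekly cost per person: $1,234"))
print(strip_label("$987"))
```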

#### <font color="blue">Now, let's split the "BEST DEALS SUGGESTION" column into two sub-dataframes as follows:</font>

In [13]:
weekList = []  # the "least expensive summer week" part of each suggestion
costList = []  # the matching cost, starting at the "$"
bestTime = pd.DataFrame(columns=['BEST TIME TO VISIT'])
bestDeal = pd.DataFrame(columns=['BEST SAVINGS DEAL'])

#### <font color="blue">Processing the "BEST DEALS SUGGESTION" dataframe and making the split dataframes!</font>

In [14]:
# Each value is a bytes object holding "Least expensive summer week to go, and the cost: "
# followed by the week, two separator characters and then the "$" cost.
prefix = "Least expensive summer week to go, and the cost: "
for x in dfThree['BEST DEALS SUGGESTION']:
    text = x.decode('ascii').replace(prefix, "")
    ind = text.index('$')
    weekList.append(text[:ind - 2])  # everything before the two separator characters
    costList.append(text[ind:])      # the "$" and the amount

bestTime['BEST TIME TO VISIT'] = weekList
bestDeal['BEST SAVINGS DEAL'] = costList
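
The split above relies on the week being followed by exactly two characters before the `$`. A regular expression can state the expected shape explicitly and fail loudly when a value does not match. This is a sketch under the assumption that the remainder reads `<week>, $<amount>` (comma optional); only the prefix string comes from the code above, while `split_deal`, `DEAL_RE` and the sample value are made up.

```python
import re

PREFIX = "Least expensive summer week to go, and the cost: "
# Assumed shape of what follows the prefix: "<week>[,] $<amount>".
DEAL_RE = re.compile(r"^(?P<week>.+?),?\s*(?P<cost>\$[\d,.]+)\s*$")

def split_deal(text):
    """Split one suggestion into (week, cost); raise ValueError on an unexpected shape."""
    body = text.replace(PREFIX, "", 1).strip()
    match = DEAL_RE.match(body)
    if match is None:
        raise ValueError(f"unexpected format: {text!r}")
    return match.group("week"), match.group("cost")

print(split_deal(PREFIX + "June 3 - 10, $1,234"))
```

The lazy `.+?` makes the week stop at the first optional comma and whitespace that precede a `$` amount running to the end of the string.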

#### <font color="blue">Finally, join all the individual dataframes and export the result to a CSV file as follows:</font>

In [15]:
# The dataframes share the same default 0..n-1 index (one row per destination),
# so DataFrame.join can line them up side by side. Their column names do not
# overlap, so no lsuffix/rsuffix strings are needed.
df = dfOne.join([dfTwo, bestTime, bestDeal])
df.to_csv('best_locations.csv')
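
Joining on the shared default index simply lines up row *i* of every dataframe. The stdlib sketch below does the same row-by-row alignment with `zip` and writes the result with the `csv` module, using made-up values in place of the scraped ones; `aligned_rows` is a hypothetical helper. One design difference: a pandas left join fills unmatched cells with NaN (and drops extra rows from the joined frames), whereas this sketch raises an error so that a misaligned scrape is noticed.

```python
import csv
import io

HEADER = ["PLACES", "LINKS FOR MORE INFORMATION", "AVERAGE WEEKLY EXPENSE PER PERSON",
          "BEST TIME TO VISIT", "BEST SAVINGS DEAL"]

# Made-up values standing in for the scraped columns.
places = [("Place A", "https://example.com/a"), ("Place B", "https://example.com/b")]
expenses = ["$1,000", "$2,000"]
weeks = ["Week 1", "Week 2"]
costs = ["$900", "$1,800"]

def aligned_rows(places, expenses, weeks, costs):
    """Yield one CSV row per index position, like joining on a shared 0..n-1 index."""
    if not len(places) == len(expenses) == len(weeks) == len(costs):
        raise ValueError("all columns must have the same length")
    for (place, link), expense, week, cost in zip(places, expenses, weeks, costs):
        yield [place, link, expense, week, cost]

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(HEADER)
writer.writerows(aligned_rows(places, expenses, weeks, costs))
print(buffer.getvalue())
```

Because `aligned_rows` is a generator, the length check runs when the rows are first consumed, here inside `writerows`. Fields containing a comma, such as `$1,000`, are quoted automatically by the `csv` writer.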

### <font color="green">Now we have a CSV file in the current working directory. Go ahead and look through the list of top summer spots; there is sure to be at least one destination that's perfect for you (spoiler: most of them are on the coast!). Hope you find yours!</font>