# Data Collection - Trip Reports

This notebook contains the code used to scrape user-generated trail reviews from the Washington Trails Association website.

A list of recent trip reports with their URLs is located at https://www.wta.org/go-outside/trip-reports. I opened this page with Selenium rather than something like Requests because Selenium can interact with the JavaScript objects on the page. The following code scrapes all reports listed on the first page, then repeatedly clicks through to the next-most-recent page of reports to collect more URLs. An implicit wait, together with fixed sleeps, gives the JavaScript time to load.

In [1]:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

In [38]:
import csv
import time

from selenium.webdriver.common.by import By

url = 'https://www.wta.org/go-outside/trip-reports'
driver = webdriver.Chrome()
driver.implicitly_wait(30)  # applies to every element lookup in this session
driver.get(url)

# Scrape the first page: one (url, hike) row per report title
for el in driver.find_elements(By.CLASS_NAME, 'listitem-title'):
    url = el.get_attribute('href')
    hike = el.text
    with open('trepurls3.csv', 'a', newline='') as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow([str(url), str(hike)])
time.sleep(7)

# Click through the numbered pagination links and scrape each page
for m in range(1, 100):
    link = driver.find_element(By.LINK_TEXT, str(m))
    link.click()
    time.sleep(10)  # give the JavaScript time to load the new page
    for el in driver.find_elements(By.CLASS_NAME, 'listitem-title'):
        url = el.get_attribute('href')
        hike = el.text
        with open('trepurls3.csv', 'a', newline='') as csv_file:
            writer = csv.writer(csv_file)
            writer.writerow([str(url), str(hike)])
    time.sleep(7)
print('finished')
driver.quit()

finished
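
The scraping cell above reopens `trepurls3.csv` for every row. As a minimal sketch of an alternative, the rows for one page could be collected first and appended in a single write. The helper name `append_rows`, the example URLs, and the temporary file are all hypothetical and only serve the illustration.

```python
import csv
import os
import tempfile


def append_rows(path, rows):
    """Append (url, hike) pairs to a CSV file, opening it once per batch."""
    with open(path, 'a', newline='') as csv_file:
        writer = csv.writer(csv_file)
        writer.writerows(rows)


# Hypothetical rows standing in for one scraped page of results.
page_rows = [
    ('https://example.org/report-1', 'Example Trail'),
    ('https://example.org/report-2', 'Another Trail, North Fork'),
]

demo_path = os.path.join(tempfile.mkdtemp(), 'demo_urls.csv')
append_rows(demo_path, page_rows)      # first "page"
append_rows(demo_path, page_rows[:1])  # second "page" is appended, not overwritten

# Read the file back to confirm what was stored.
with open(demo_path, newline='') as csv_file:
    saved = list(csv.reader(csv_file))
print(len(saved), 'rows saved')
```

In the notebook, the list of rows would be built from the `listitem-title` elements of each page before calling the helper.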


## Check the scraped data

In [2]:
import pandas as pd
tripreports = pd.read_csv("trepurls3.csv", header=None)

tripreportnames = ['url', 'hike']

tripreports.columns = tripreportnames
tripreports.head()

| | url | hike |
|---|---|---|
| 0 | https://www.wta.org/go-hiking/trip-reports/tri... | McClellan Butte |
| 1 | https://www.wta.org/go-hiking/trip-reports/tri... | Nooksack Cirque, Hannegan Pass and Peak, Goat ... |
| 2 | https://www.wta.org/go-hiking/trip-reports/tri... | Annette Lake |
| 3 | https://www.wta.org/go-hiking/trip-reports/tri... | Anti-Aircraft Peak |
| 4 | https://www.wta.org/go-hiking/trip-reports/tri... | Tahoma Creek Suspension Bridge - Emerald Ridge... |


In [3]:
tripreports.describe()

| | url | hike |
|---|---|---|
| count | 10026 | 10026 |
| unique | 9448 | 2398 |
| top | https://www.wta.org/go-hiking/trip-reports/tri... | Snow Lake |
| freq | 4 | 109 |


In [4]:
tripreports = tripreports.drop_duplicates()

In [5]:
tripreports.describe()

| | url | hike |
|---|---|---|
| count | 9448 | 9448 |
| unique | 9448 | 2398 |
| top | https://www.wta.org/go-hiking/trip-reports/tri... | Annette Lake |
| freq | 1 | 99 |


In [37]:
#close the page where we were just scraping
driver.quit()

## Get the report text and attributes for reports found above

In [84]:
tripreports['url'][:3]

0    https://www.wta.org/go-hiking/trip-reports/tri...
1    https://www.wta.org/go-hiking/trip-reports/tri...
2    https://www.wta.org/go-hiking/trip-reports/tri...
Name: url, dtype: object
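
The run log of the next cell shows the bare listing URL `https://www.wta.org/go-hiking/trip-reports/` being requested among the individual reports, which suggests that some scraped rows are not trip reports. One possible way to filter them out before fetching is sketched below. The helper `is_report_url` and its rule (the path contains `trip_report.`) are assumptions based only on the URLs visible in that log, not a documented property of the site.

```python
def is_report_url(url):
    """Heuristic (an assumption): individual reports contain 'trip_report.' in the URL."""
    return 'trip_report.' in str(url)


# Two URLs copied from the run log: one report and the bare listing page.
urls = [
    'https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-03.1310035353',
    'https://www.wta.org/go-hiking/trip-reports/',
]

report_urls = [u for u in urls if is_report_url(u)]
print(len(report_urls), 'of', len(urls), 'URLs look like trip reports')
```

Applied to the notebook, the same predicate could be used to filter `tripreports['url']` before looping over it.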

In [39]:
from bs4 import BeautifulSoup
import json
import csv
import time

chunk = tripreports['url'][0:10000]

for hike in chunk:
    print('getting data for ' + str(hike))
    # A fresh browser is started for each report and closed once the HTML is captured
    driver = webdriver.Chrome()
    driver.get(hike)
    # Note: an implicit wait only affects element lookups, not page_source
    driver.implicitly_wait(30)

    page = driver.page_source
    driver.quit()

    try:
        soup = BeautifulSoup(page, 'html.parser')

        title = soup.find('title').get_text()

        # Report text, with commas stripped
        body = soup.find(attrs={'id': 'tripreport-body-text'}).get_text()
        bodyclean = body.replace(',', '')

        # Link to the trail page, taken from the report heading
        traillinkline = soup.find(attrs={'class': 'documentFirstHeading'})
        traillinkhtml = traillinkline.find('a')
        traillink = traillinkhtml.get('href')

        # Report date, read from the datetime attribute
        dateclass = soup.find(attrs={'class': 'elapsed-time'})
        date = dateclass.attrs['datetime']
        dateclean = date.replace(',', '')

        cond = soup.find(attrs={'id': 'trip-conditions'}).get_text()
        cond = cond.replace('\n', '')

        # Kept as the raw element, so str(feat) below stores its HTML
        feat = soup.find(attrs={'id': 'trip-features'})

        thumbs = soup.find(attrs={'class': 'tally-total'}).get_text()

        with open('peoplereviews3.csv', 'a', newline='') as csv_file:
            writer = csv.writer(csv_file)
            writer.writerow([str(hike), str(title), str(traillink), str(bodyclean), str(dateclean), str(cond), str(feat), str(thumbs)])

    except Exception:
        # Skip any report that fails to parse, e.g. when an expected element is missing
        continue

print('Finished')
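
The cell above removes commas from the report body and date before writing them. As a side note, `csv.writer` quotes any field that contains a comma, a quote character, or a newline by default, so such text survives a write and read round trip unchanged. The sketch below illustrates this with made-up report text and an in-memory buffer; the URL and the text are hypothetical.

```python
import csv
import io

# Hypothetical report text containing commas, quotes, and a newline.
body = 'Snow at the pass, mud below.\nBring "real" boots, not sneakers.'
row = ['https://example.org/report-1', body]

# Write the row to an in-memory buffer instead of a file on disk.
buffer = io.StringIO(newline='')
csv.writer(buffer).writerow(row)

# Read it back and compare with the original row.
buffer.seek(0)
restored = next(csv.reader(buffer))
print(restored == row)
```

Whether to keep or strip the commas is a design choice; stripping them changes the review text, while relying on quoting keeps it intact for later analysis.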

getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-03.1310035353
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-02.0594537828
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-02.3001078761
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-02.4718943123
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-02.6848981258
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-02.4353379591
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-02.5699633483
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-02.6921077571
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-02.7124474372
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-02.1554669329
getting data for https://www.wta.org/go-hiking/trip-reports/

getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-01.2139763215
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-01.8193123813
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-01.8130945621
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-22.1360729571
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-20.2564300255
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-11.3785831788
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-11.2937131566
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-03.7764554112
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-02.5497333436
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-02.0082951093
getting data for https://www.wta.org/go-hiking/trip-reports/

getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-05.6320062624
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-10.3311373888
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-06.9249823061
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-28.9213634610
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-08.6659441809
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-07.8827889404
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-10.1818657405
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-09.0227265723
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-09.6765553518
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-08.9924911772
getting data for https://www.wta.org/go-hiking/trip-reports/

getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-01.8849299567
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-01.9090287975
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-01.0338743056
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-30.6035595703
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-30.9434758821
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-30.4353347584
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-30.7399058784
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-30.8036134044
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-06.4739738965
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-05.5172605510
getting data for https://www.wta.org/go-hiking/trip-reports/

getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-06.7998070927
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-03.4303743862
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-03.8493562769
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-02.4764321885
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-01.2593592687
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-05.6605422168
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-04.4454466862
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-03.0515693741
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-01.0212362271
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-01.5731219908
getting data for https://www.wta.org/go-hiking/trip-reports/

getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-29.6502666893
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-28.6958828674
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-28.0080248121
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-17.5686349821
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-15.8277122830
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-09.2972282959
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-03.8374164247
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-01.5792739351
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-30.6446037847
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-29.5831322623
getting data for https://www.wta.org/go-hiking/trip-reports/

getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-26.3693323920
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-26.3051743418
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-11.8872346418
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-02.1453647215
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-28.1997426499
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-28.0655937346
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-28.7922477347
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-27.0372233801
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-27.1034115086
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-27.1597239581
getting data for https://www.wta.org/go-hiking/trip-reports/

getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-02.4485675562
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-29.5300301064
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-27.8898847338
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-27.5082730336
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-27.8025398538
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-27.1327948241
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-26.0794801613
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-25.7174297931
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-25.4816084586
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-25.1132923978
getting data for https://www.wta.org/go-hiking/trip-reports/

getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-26.9589008150
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-25.8016638356
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-25.6008437856
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-25.7414019556
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-25.6260264914
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-25.6497975175
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-25.1163577782
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-25.6320739042
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-25.8449189841
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-25.8658152944
getting data for https://www.wta.org/go-hiking/trip-reports/

getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-29.2065706432
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-27.1642428928
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-27.7338182938
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-27.1595298602
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-26.4064893683
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-13.1399961922
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-28.1720921603
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-07.8666380602
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-30.4528420826
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-17.6412577552
getting data for https://www.wta.org/go-hiking/trip-reports/

getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-23.6049423661
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-06.3049767706
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-06.2009634387
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-02.5022249105
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-29.7393795874
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-29.3246495986
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-29.5626397265
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-28.9898333402
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-27.3050459208
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-25.9100993968
getting data for https://www.wta.org/go-hiking/trip-reports/

getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-23.7105244055
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-23.0858814454
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-23.3261871456
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-23.7011577880
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-23.3494210051
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-23.5242567679
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-27.8416296998
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-26.6561227284
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-26.1391063555
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-25.0826678793
getting data for https://www.wta.org/go-hiking/trip-reports/

getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-24.3576699557
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-23.5947616583
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-23.5783502498
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-23.5282132815
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-23.2508635436
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-23.5818621835
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-22.9064493864
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-22.9419479029
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-14.4555880193
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-28.6123437462
getting data for https://www.wta.org/go-hiking/trip-reports/

getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-21.0157263813
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-20.4472799880
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-20.0318673169
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-20.1935888746
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-29.0486736640
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-20.2125985338
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-20.2616943540
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-30.5953919826
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-26.9940715624
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-20.5791411235
getting data for https://www.wta.org/go-hiking/trip-reports/

getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-20.9651384872
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-20.0926587831
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-20.8281849020
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-19.9091609408
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-19.5933076239
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-19.1963187580
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-20.9976473254
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-19.4260927943
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-19.6129961655
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-08-28.4656227267
getting data for https://www.wta.org/go-hiking/trip-reports/

getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-20.8608039925
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-20.8261256244
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-19.1878857361
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-19.6886451300
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-19.0450410391
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-19.8261957003
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-19.7404922867
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-18.7420097234
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-18.9655063574
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-18.7136712785
getting data for https://www.wta.org/go-hiking/trip-reports/

getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-22.5050674861
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-21.3588852948
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-21.1704665320
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-21.7498389567
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-21.5960865174
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-23.3850929134
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-22.0416836803
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-21.6359383739
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-26.5877772417
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-22.9160483750
getting data for https://www.wta.org/go-hiking/trip-reports/

getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-18.8816425062
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-18.6452830968
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-17.2294459887
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-17.5058954703
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-17.1324564945
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-17.7204575146
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-17.6748181987
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-17.0778967695
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-17.4620843272
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-16.6418769081
getting data for https://www.wta.org/go-hiking/trip-reports/

getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-18.5038253522
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-18.7766040015
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-17.9570539276
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-17.2226918511
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-17.7161353900
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-17.6193865282
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-17.9073047596
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-17.8381831451
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-17.5447667765
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-17.5317088463
getting data for https://www.wta.org/go-hiking/trip-reports/

getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-15.3868872490
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-15.9924699702
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-15.0959982913
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-09.5548934995
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-06.2204774911
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-30.3427452059
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-01.8728837097
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-17.3230594147
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-17.5319674562
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-16.6041090973
getting data for https://www.wta.org/go-hiking/trip-reports/

getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-14.1536977327
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-03.3756009361
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-24.9945747422
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-17.2680699598
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-18.1694396677
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-18.3746922341
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-17.4777811346
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-16.0273088182
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-15.1172460062
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-15.3217762894
getting data for https://www.wta.org/go-hiking/trip-reports/

getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-20.8975669560
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-16.8306878279
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-15.2667523088
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-13.5636360890
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-13.4997303658
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-13.8101374124
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-13.9635888705
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-13.5216406439
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-12.3364564085
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-12.3918241500
getting data for https://www.wta.org/go-hiking/trip-reports/

getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-11.7301782934
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-10.7555358251
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-10.7152569802
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-10.1837330896
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-10.4725660542
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-10.1486073618
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-10.6985822415
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-10.5749767416
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-10.0018713910
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-10.6113797590
getting data for https://www.wta.org/go-hiking/trip-reports/

getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-13.0245224600
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-12.1617585532
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-12.4185858007
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-11.7728641038
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-11.3436411243
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-11.9445775040
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-11.0233384652
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-11.6739234304
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-10.8841487611
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-10.4703293585
getting data for https://www.wta.org/go-hiking/trip-reports/

getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-09.1392921101
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-09.0406883792
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-09.3696188154
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-09.3944793871
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-09.9739422551
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-20.0629920339
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-15.4004188710
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-15.2098666038
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-13.6960817975
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-13.4759879324
getting data for https://www.wta.org/go-hiking/trip-reports/

getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-13.6958970697
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-12.7200282480
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-10.5667031198
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-10.3383052605
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-09.4648520117
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-08.8693601744
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-08.1526367969
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-08.2659671526
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-08.8074666551
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-08.1783659871
getting data for https://www.wta.org/go-hiking/trip-reports/

getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-07.4686911651
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-06.6191884404
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-06.3668974151
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-30.5126008045
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-07-23.1535959493
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-11.4225892735
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-10.8998791423
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-09.7675282284
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-08.1254160510
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-07.3804191796
getting data for https://www.wta.org/go-hiking/trip-reports/

getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-10.0261155567
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-08.7322195755
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-04.1698681463
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-04.6618081802
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-04.0379405415
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-17.2399171341
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-10.9698948177
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-06.3214792407
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-04.5518227843
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-04.3164708543
getting data for https://www.wta.org/go-hiking/trip-reports/

getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-04.2638545377
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-04.5441610284
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-03.6851952659
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-03.6966744779
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-03.9841775237
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-03.0097025010
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-03.9480063207
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-03.0137568061
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-03.7179317222
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-03.6804645197
getting data for https://www.wta.org/go-hiking/trip-reports/

getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-03.0328040783
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-03.9587640972
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-03.4870771034
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-03.3963650739
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-03.9182537233
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-03.0527453880
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-03.4265633878
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-03.6818171660
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-03.6132939323
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-03.7069448487
getting data for https://www.wta.org/go-hiking/trip-reports/

getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-06.4703903521
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-06.5398393371
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-05.3872490942
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-05.3882615214
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-05.8208496416
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-05.2257518169
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-05.6876700520
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-05.0357134540
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-04.8169516882
getting data for https://www.wta.org/go-hiking/trip-reports/trip_report.2018-06-04.2602414486
getting data for https://www.wta.org/go-hiking/trip-reports/

In [3]:
peoplereports = pd.read_csv("peoplereviews3.csv", header=None)

# The file has no header row, so pandas labels the columns 0 through 7.
peoplereports.head(10)

Unnamed: 0,0,1,2,3,4,5,6,7
0,https://www.wta.org/go-hiking/trip-reports/tri...,McClellan Butte — Washington Trails Association,https://www.wta.org/go-hiking/hikes/mcclellan-...,\nQuick run up the Butte with beautiful fall w...,Sep 19 2018,Type of HikeDay hikeTrail ConditionsTrail in g...,"<div id=""trip-features"">\n<div class=""feature ...",5
1,https://www.wta.org/go-hiking/trip-reports/tri...,"Nooksack Cirque, Hannegan Pass and Peak, Goat ...",https://www.wta.org/go-hiking/hikes/nooksack-c...,\nHannegan Pass Road is closed from 9/18-9/20 ...,Sep 19 2018,Type of HikeDay hikeTrail ConditionsTrail in g...,"<div id=""trip-features"">\n</div>",3
2,https://www.wta.org/go-hiking/trip-reports/tri...,Annette Lake — Washington Trails Association,https://www.wta.org/go-hiking/hikes/annette-lake,\nMy dog Dusty and I reached the Annette Lake ...,Sep 19 2018,Type of HikeDay hikeTrail ConditionsTrail in g...,"<div id=""trip-features"">\n<div class=""feature ...",3
3,https://www.wta.org/go-hiking/trip-reports/tri...,Anti-Aircraft Peak — Washington Trails Associa...,https://www.wta.org/go-hiking/hikes/anti-aircr...,\nDidn't want to drive too far so did this hik...,Sep 19 2018,Type of HikeDay hikeTrail ConditionsTrail in g...,"<div id=""trip-features"">\n</div>",1
4,https://www.wta.org/go-hiking/trip-reports/tri...,Tahoma Creek Suspension Bridge - Emerald Ridge...,https://www.wta.org/go-hiking/hikes/emerald-ridge,\nWe chose to do the unmaintained Tahoma Creek...,Sep 19 2018,Type of HikeDay hikeTrail ConditionsTrail diff...,"<div id=""trip-features"">\n</div>",10
5,https://www.wta.org/go-hiking/trip-reports/tri...,Scott Paul Trail — Washington Trails Association,https://www.wta.org/go-hiking/hikes/scott-paul...,\nI hiked the trail counterclockwise taking th...,Sep 18 2018,Type of HikeDay hikeTrail ConditionsTrail in g...,"<div id=""trip-features"">\n<div class=""feature ...",4
6,https://www.wta.org/go-hiking/trip-reports/tri...,Heliotrope Ridge — Washington Trails Association,https://www.wta.org/go-hiking/hikes/heliotrope...,\nThe road is closed until 9/21 :( No hikers a...,Sep 18 2018,Type of HikeDay hikeTrail ConditionsTrail in g...,"<div id=""trip-features"">\n</div>",9
7,https://www.wta.org/go-hiking/trip-reports/tri...,Green Mountain — Washington Trails Association,https://www.wta.org/go-hiking/hikes/green-moun...,\nThe Green Mountain trail is magic my sweet s...,Sep 18 2018,Type of HikeDay hikeTrail ConditionsTrail in g...,"<div id=""trip-features"">\n<div class=""feature ...",22
8,https://www.wta.org/go-hiking/trip-reports/tri...,"Burroughs Mountain, Shadow Lake - Sunrise Camp...",https://www.wta.org/go-hiking/hikes/burroughs-...,\nDid the loop around Shadow Lake to 1st and 2...,Sep 18 2018,Type of HikeDay hikeTrail ConditionsTrail in g...,"<div id=""trip-features"">\n<div class=""feature ...",7
9,https://www.wta.org/go-hiking/trip-reports/tri...,Winchester Mountain — Washington Trails Associ...,https://www.wta.org/go-hiking/hikes/winchester...,\nThe road is what it is. I parked and walked ...,Sep 18 2018,Type of HikeDay hikeTrail ConditionsTrail in g...,"<div id=""trip-features"">\n<div class=""feature ...",13


In [4]:
peoplereports.describe()

Unnamed: 0,7
count,9448.0
mean,4.109018
std,5.368615
min,0.0
25%,1.0
50%,2.0
75%,6.0
max,78.0


In [5]:
peoplereports = peoplereports.drop_duplicates()

In [6]:
peoplereports.describe()

Unnamed: 0,7
count,9425.0
mean,4.109602
std,5.369554
min,0.0
25%,1.0
50%,2.0
75%,6.0
max,78.0
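
The count drops from 9448 to 9425 after `drop_duplicates()`, so 23 rows were exact copies of an earlier row. A minimal sketch of how that number can be checked before dropping anything, using a made-up toy frame (the column names `url` and `value` are hypothetical and are not the real columns of `peoplereviews3.csv`):

```python
import pandas as pd

# Toy data standing in for the scraped reports; the names are illustrative only.
toy = pd.DataFrame({
    'url': ['a', 'b', 'b', 'c'],
    'value': [5, 3, 3, 1],
})

# duplicated() flags every row that exactly repeats an earlier row.
n_exact_dupes = int(toy.duplicated().sum())

# drop_duplicates() keeps the first occurrence of each row.
deduped = toy.drop_duplicates()
n_removed = len(toy) - len(deduped)

# Rows that share a URL but differ elsewhere would need subset= to be caught.
n_same_url = int(toy.duplicated(subset='url').sum())

print(n_exact_dupes, n_removed, n_same_url)
```

Comparing `n_exact_dupes` with `n_removed` is a quick sanity check that only exact copies were dropped.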


## Optional: Store in PostgreSQL database

In [1]:
from sqlalchemy import create_engine
from sqlalchemy_utils import database_exists, create_database
import psycopg2  # PostgreSQL driver that SQLAlchemy uses for postgresql:// URLs

In [2]:
# Define a database name for the trail data
# Set your postgres username/password and connection specifics
username = 'postgres'
password = 'mypassowrd'     # change this
host     = 'localhost'
port     = '5432'            # default port that postgres listens on
db_name  = 'trail_project2'

In [3]:
## 'engine' is a connection to a database
## Here, we're using postgres, but sqlalchemy can connect to other things too.
engine = create_engine( 'postgresql://{}:{}@{}:{}/{}'.format(username, password, host, port, db_name) )
print(engine.url)

postgresql://postgres:mypassowrd@localhost:5432/trail_project2
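
Note that the printed URL includes the password in plain text. One possible alternative, sketched here under assumptions, is to read the password from an environment variable and percent-encode the credentials so that characters such as `@` or `/` do not break the URL. The helper name `build_postgres_url` and the fallback value `'change-me'` are hypothetical and not part of the original notebook:

```python
import os
from urllib.parse import quote_plus

def build_postgres_url(username, password, host, port, db_name):
    """Return a postgresql:// URL with the credentials percent-encoded."""
    return 'postgresql://{}:{}@{}:{}/{}'.format(
        quote_plus(username), quote_plus(password), host, port, db_name)

# Assumption: the password is exported as PGPASSWORD before starting Jupyter.
password = os.environ.get('PGPASSWORD', 'change-me')
url = build_postgres_url('postgres', password, 'localhost', '5432', 'trail_project2')
```

The resulting string can be passed to `create_engine` in place of the formatted URL above.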


In [26]:
## create a database (if it doesn't exist)
if not database_exists(engine.url):
    create_database(engine.url)
print(database_exists(engine.url))

True


In [None]:
## insert data into database from Python (proof of concept - this won't be useful for big data, of course)
peoplereports.to_sql('peoplereports', engine, if_exists='replace')
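
To check that a write like this succeeded, the table can be read back and its row count compared. The sketch below is an illustration only: it swaps in an in-memory SQLite database and a toy frame so it runs without a Postgres server, and it also passes `index=False` and `chunksize`, which the original call does not use.

```python
import sqlite3
import pandas as pd

# Toy frame standing in for peoplereports.
toy = pd.DataFrame({'url': ['a', 'b', 'c'], 'hike': ['x', 'y', 'z']})

# In-memory SQLite connection used here instead of the Postgres engine.
conn = sqlite3.connect(':memory:')

# chunksize writes the rows in batches; index=False skips the index column.
toy.to_sql('peoplereports', conn, if_exists='replace', index=False, chunksize=2)

# Read the table back to confirm that every row arrived.
round_trip = pd.read_sql('SELECT * FROM peoplereports', conn)
conn.close()

print(len(round_trip) == len(toy))
```

With the Postgres `engine` defined earlier, the same `to_sql` and `read_sql` calls should work with `engine` in place of `conn`.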