# US National Parks
This notebook is structured based on a [project scoping guide](http://www.datasciencepublicpolicy.org/our-work/tools-guides/data-science-project-scoping-guide/) provided by Carnegie Mellon University.<br><br>
**GOALS:** ***(1)*** Characterize reactional use of public lands in the United States; and ***(2)*** learn how to scrape and visualize data from a webpage.<br>
**DATA:** Publically avaialable data on recreational use of the US national parks will be scrapped from the [National Parks Service (NPS)](https://irma.nps.gov/STATS/) website.<br>
**ANALYSIS:** Exploratory data analysis to gain insights into the dataset.<br>
**ETHICAL CONSIDERATIONS:** There are no apparent issues with privacy, transparency,
discrimination/equity, or accountability in terms of avaiable data. Whether access to the national parks is equitable across communities in the US should be considered further. The NPS has begun administering a [survey](https://www.nps.gov/subjects/socialscience/socioeconomic-monitoring-visitor-surveys.htm) to understand who accesses the parks, and whether access differs as a function of demographic and economic factors. I'd like to incorporate data from that survey into this notebook when it becomes publically available.<br>
**ADDITIONAL CONSIDERATIONS:** None.

## Load libraries

In [1]:
import pandas as pd
import requests as rq
from bs4 import BeautifulSoup

## Gather list US national parks 
First step is to compile a list of the national parks in the United States. For practice, I will scrape this information from this [wiki page](https://en.wikipedia.org/wiki/List_of_national_parks_of_the_United_States). 

In [2]:
# url to scrape information from
wiki = 'https://en.wikipedia.org/wiki/List_of_national_parks_of_the_United_States'

In [3]:
# extract HTML text
page = rq.get(wiki).text

# convert to BeautifulSoup object
soup = BeautifulSoup(page)

In [4]:
# pull <table> tag that match our class name
table = soup.find('table', class_ = 'wikitable sortable plainrowheaders')

# find <tr> tags in our specified table, ignoring the labels row
parks_table = table.find_all('tr')[1:]

In [5]:
# create empty list to store park names
park_names = []

In [6]:
# Loop through each park in the website's table
for park in parks_table:
    
    # extract park name from <a> tag
    name = park.find('a').get('title')
        
    # extract park latitude from <span class='latitude'> tag
    latitude = park.find(attrs={'class':'span', 'class':'latitude'})
    latitude = latitude.text
    
    # extract park longitude from <span class='longitude'> tag
    longitude = park.find(attrs={'class':'span', 'class':'longitude'})
    longitude = longitude.text
    
    # append information to full parks list
    park_names.append([name, latitude, longitude])

In [7]:
print(park_names)

[['Acadia National Park', '44°21′N', '68°13′W'], ['National Park of American Samoa', '14°15′S', '170°41′W'], ['Arches National Park', '38°41′N', '109°34′W'], ['Badlands National Park', '43°45′N', '102°30′W'], ['Big Bend National Park', '29°15′N', '103°15′W'], ['Biscayne National Park', '25°39′N', '80°05′W'], ['Black Canyon of the Gunnison National Park', '38°34′N', '107°43′W'], ['Bryce Canyon National Park', '37°34′N', '112°11′W'], ['Canyonlands National Park', '38°12′N', '109°56′W'], ['Capitol Reef National Park', '38°12′N', '111°10′W'], ['Carlsbad Caverns National Park', '32°10′N', '104°26′W'], ['Channel Islands National Park', '34°01′N', '119°25′W'], ['Congaree National Park', '33°47′N', '80°47′W'], ['Crater Lake National Park', '42°56′N', '122°06′W'], ['Cuyahoga Valley National Park', '41°14′N', '81°33′W'], ['Death Valley National Park', '36°14′N', '116°49′W'], ['Denali National Park and Preserve', '63°20′N', '150°30′W'], ['Dry Tortugas National Park', '24°38′N', '82°52′W'], ['Ever

Great! We now have a list that contains names and coordinates of the sixty-three national parks in the United States.

## Access data from NPS

In [8]:
# url to scrape information from
nps = 'https://irma.nps.gov/STATS/SSRSReports/Park%20Specific%20Reports/Recreation%20Visitors%20By%20Month%20(1979%20-%20Last%20Calendar%20Year)?Park={}'

In [9]:
# create list of NPS-defined park abbreviations
parks = ['ACAD', 'ARCH', 'BADL', 'BIBE', 'BISC', 'BLCA', 'BRCA', 'CANY', 'CARE', 'CAVE', 
         'CHIS', 'CONG', 'CRLA', 'CUVA', 'DEVA', 'DENA', 'DRTO', 'EVER', 'GAAR', 'JEFF', 
         'GLBA', 'GLAC', 'GRCA', 'GRTE', 'GRBA', 'GRSA', 'GRSM', 'GUMO', 'HALE', 'HAVO', 
         'HOSP', 'INDU', 'ISRO', 'JOTR', 'KATM', 'KEFJ', 'KICA', 'KOVA', 'LACL', 'LAVO', 
         'MACA', 'MEVE', 'MORA', 'NERI', 'NPAS', 'NOCA', 'OLYM', 'PEFO', 'PINN', 'REDW', 
         'ROMO', 'SAGU', 'SEQU', 'SHEN', 'THRO', 'VIIS', 'VOYA', 'WHSA', 'WICA', 'WRST', 
         'YELL', 'YOSE', 'ZION']

In [10]:
# create empty list to store park data
#parks_data = []

# temporary parks list for testing
test_park = parks[0]

In [11]:
for park in test_park:
    
    # url to monthly data for each park 
    url = nps.format(park)
    
    # extract HTML text
    page = rq.get(url).text
    print(page)
    
    # convert to BeautifulSoup object
    #soup = BeautifulSoup(page)
    
    # pull <table> tag that match our class name
    #table = soup.find('table', class_ = 'Ae30f20f368af4927806ac09a734045d0170')
    
    # find <tr> tags in our specified table, ignoring the labels row
    #temp_table = table.find_all('tr')
    
    # extract park name from <a> tag
    #xyz = park.find('a').get('xyz')
    
    # append information to full parks list
    #park_data.append(xyz)





<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    <title>Stats Report Viewer</title>
    <link rel="shortcut icon" type="image/x-icon" href="https://irmafiles.nps.gov/WebContent/Irma/Common/v1_0_0/Images/favicon.ico" />
</head>
<body>

<iframe Height="900px" Width="100%" src="/STATS/MvcReportViewer.aspx?_id=7c6bd428-cb13-4b0a-8565-337cefaa418b&amp;_m=Remote&amp;_r=%2fNPS.Stats.Reports%2fPark+Specific+Reports%2fRecreation+Visitors+By+Month+(1979+-+Last+Calendar+Year)&amp;_15=True&amp;_16=True&amp;_18=True&amp;_19=True&amp;_34=False&amp;_35=False&amp;_39=880px&amp;Park=A" style="border: none"></iframe></body>
</html>




<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    <title>Stats Report Viewer</title>
    <link rel="shortcut icon" type="image/x-icon" href="https://irmafiles.nps.gov/WebContent/Irma/Common/v1_0_0/Images/favicon.ico" />
</head>
<body>

<iframe Height="900px" Width="100%" src="/STATS/MvcReportViewer.aspx?_id=a5b232a3-304f-4d94

There seems to be an issue with extracting the HTML text from the webpage. Maybe that's due to javascript? Unclear. There is another popular web-scraping tool called [Selenium](https://oxylabs.io/blog/selenium-web-scraping) that is suppose to work with javascript. I'll try that out next.<br><br>***To be continued***