# Exploring the National Parks of the United States
> "Wilderness is not a luxury but a necessity of the human spirit, and as vital to our lives as water and good bread. A civilization which destroys what little remains of the wild, the spare, the original, is cutting itself off from its origins and betraying the principle of civilization itself." <br>***Edward Abbey***, [Desert Solitaire](https://www.goodreads.com/book/show/214614.Desert_Solitaire?ac=1&from_search=true&qid=3inpz3msB9&rank=1)<br><br>
"Like winds and sunsets, wild things were taken for granted until progress began to do away with them. Now, we face the question whether a still higher 'standard of living' is worth its cost in things natural, wild and free." <br>***Aldo Leopold***, [A Sand County Almanac](https://www.aldoleopold.org/about/aldo-leopold/sand-county-almanac/)<br>

From [wiki](https://en.wikipedia.org/wiki/List_of_national_parks_of_the_United_States): "The United States has 63 national parks, which are congressionally designated protected areas operated by the National Park Service, an agency of the Department of the Interior. National parks are designated for their natural beauty, unique geological features, diverse ecosystems, and recreational opportunities. The [Organic Act of 1916](https://www.nps.gov/grba/learn/management/organic-act-of-1916.htm) created the National Park Service 'to conserve the scenery and the natural and historic objects and wildlife therein, and to provide for the enjoyment of the same in such manner and by such means as will leave them unimpaired for the enjoyment of future generations.' "<br><br>
**GOAL:** *(1)* Characterize recreational use of public lands in the United States; and *(2)* learn how to scrape and visualize data from a webpage.<br>
**DATA:** Publicly available data on recreational use of the US national parks will be scraped from the [National Park Service (NPS)](https://irma.nps.gov/STATS/) website.<br>
**ANALYSIS:** Exploratory data analysis to gain insights into the dataset.<br>
**ETHICAL CONSIDERATIONS:** There are no apparent issues with privacy, transparency, or accountability in terms of available data. Whether access to the national parks is equitable across communities in the US should be considered further. The NPS has begun administering a [survey](https://www.nps.gov/subjects/socialscience/socioeconomic-monitoring-visitor-surveys.htm) to understand who accesses the parks, and whether access differs as a function of demographic and economic factors. I'd like to incorporate data from that survey into this notebook when it becomes publicly available.<br>
**ADDITIONAL CONSIDERATIONS:** None.

## Load libraries

In [1]:
import requests as rq
from bs4 import BeautifulSoup
import pandas as pd
from keplergl import KeplerGl 

## Create parks data frame 
The first step is to compile a data frame of the national parks in the United States. For practice, I will scrape this information from this [wiki page](https://en.wikipedia.org/wiki/List_of_national_parks_of_the_United_States).

In [2]:
# url to scrape information from
wiki = 'https://en.wikipedia.org/wiki/List_of_national_parks_of_the_United_States'

In [None]:
# extract HTML text
page = rq.get(wiki).text

# convert to BeautifulSoup object (name the parser explicitly to avoid a bs4 warning)
soup = BeautifulSoup(page, 'html.parser')

In [None]:
# pull the <table> tag that matches our class name
table = soup.find('table', class_='wikitable sortable plainrowheaders')

# find <tr> tags in our specified table, ignoring the labels row
parks_table = table.find_all('tr')[1:]
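
The `find` / `find_all` pattern used above can be tried offline on a small hand-written table before pointing it at the live page. This is only a sketch: the HTML snippet is made up for illustration and is much simpler than the real Wikipedia markup, and matching on the single class `wikitable` (rather than the full class string) is my own choice to make the lookup less sensitive to class order.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the Wikipedia table (not the real markup)
html = """
<table class="wikitable sortable plainrowheaders">
  <tr><th>Name</th><th>Location</th></tr>
  <tr>
    <th scope="row"><a href="/wiki/Acadia_National_Park" title="Acadia National Park">Acadia</a></th>
    <td><a href="/wiki/Maine" title="Maine">Maine</a></td>
  </tr>
  <tr>
    <th scope="row"><a href="/wiki/Zion_National_Park" title="Zion National Park">Zion</a></th>
    <td><a href="/wiki/Utah" title="Utah">Utah</a></td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, 'html.parser')

# class_='wikitable' matches any table whose class list contains 'wikitable'
table = soup.find('table', class_='wikitable')

records = []
# skip the header row, then read the title attribute of the first two links
for row in table.find_all('tr')[1:]:
    links = row.find_all('a')
    if len(links) < 2:
        continue  # skip rows that do not have the expected links
    records.append({'name': links[0].get('title'),
                    'state_terr': links[1].get('title')})

print(records)
```

Guarding on the number of links means a row with unexpected markup is skipped instead of raising an `IndexError`.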

In [None]:
# create empty list to store park information
parks_list = []

In [None]:
# Loop through each park in the website's table
for park in parks_table:
    
    # extract name from <a> tag
    name = park.find('a').get('title')

    # extract state/territory from specific <a> tag
    state_terr = park.find_all('a')[2].get('title')
    
    # extract date established from specific <span> tag
    established = park.find_all('span')[13].get('data-sort-value')
    established = pd.to_datetime(established[8:-5]).date()
                                
    # extract park latitude from <span class='latitude'> tag
    # note: for text like '44°21′N' this gives the string '44.21' (degrees.minutes, hemisphere dropped)
    latitude = park.find('span', class_='latitude')
    latitude = latitude.text[:-2].replace('°', '.')
    
    # extract park longitude from <span class='longitude'> tag
    longitude = park.find('span', class_='longitude')
    longitude = longitude.text[:-2].replace('°', '.')
    
    # append information to the parks list
    parks_list.append([name, state_terr, latitude, longitude, established])
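
One caveat with the coordinate handling above: swapping `°` for `.` keeps the minutes as if they were a decimal fraction and discards the N/S/E/W letter, so western longitudes come out positive. Below is a minimal sketch of a conversion to signed decimal degrees. It assumes the span text looks like `44°21′N` (degrees, optional minutes and seconds, then a hemisphere letter); `to_decimal_degrees` is a hypothetical helper, not part of the notebook, and the sample strings are illustrative only.

```python
import re

# Assumed input shape: degrees, optional minutes, optional seconds, hemisphere letter
_COORD = re.compile(
    r"(\d+(?:\.\d+)?)°"            # degrees
    r"(?:(\d+(?:\.\d+)?)[′'])?"    # optional minutes (prime or apostrophe)
    r'(?:(\d+(?:\.\d+)?)[″"])?'    # optional seconds (double prime or quote)
    r"\s*([NSEW])"                 # hemisphere
)

def to_decimal_degrees(text):
    """Convert a string such as '44°21′N' to signed decimal degrees."""
    match = _COORD.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognized coordinate: {text!r}")
    degrees, minutes, seconds, hemisphere = match.groups()
    value = float(degrees) + float(minutes or 0) / 60 + float(seconds or 0) / 3600
    # south and west are negative by convention
    return -value if hemisphere in "SW" else value

for sample in ["44°21′N", "68°13′W", "14°15′S"]:
    print(sample, round(to_decimal_degrees(sample), 4))
```

Mapping tools such as `kepler.gl` generally expect numeric, signed latitude and longitude columns, which is why the sign and the minutes-to-fraction step matter here.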

In [None]:
# create list of NPS park abbreviations
parks_abrv = ['ACAD', 'NPSA', 'ARCH', 'BADL', 'BIBE', 'BISC', 'BLCA', 'BRCA', 'CANY', 'CARE',
              'CAVE', 'CHIS', 'CONG', 'CRLA', 'CUVA', 'DEVA', 'DENA', 'DRTO', 'EVER', 'GAAR', 
              'JEFF', 'GLBA', 'GLAC', 'GRCA', 'GRTE', 'GRBA', 'GRSA', 'GRSM', 'GUMO', 'HALE', 
              'HAVO', 'HOSP', 'INDU', 'ISRO', 'JOTR', 'KATM', 'KEFJ', 'KICA', 'KOVA', 'LACL', 
              'LAVO', 'MACA', 'MEVE', 'MORA', 'NERI', 'NOCA', 'OLYM', 'PEFO', 'PINN', 'REDW', 
              'ROMO', 'SAGU', 'SEQU', 'SHEN', 'THRO', 'VIIS', 'VOYA', 'WHSA', 'WICA', 'WRST', 
              'YELL', 'YOSE', 'ZION']

In [None]:
# append NPS abbreviations to parks list
for x, y in zip(parks_list, parks_abrv):
    x.append(y)
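
Because `zip` stops at the shorter of its inputs, a mismatch between the number of scraped rows and the number of hand-typed abbreviations would pass silently and leave some rows without a code. Here is a small sketch of a stricter version; `attach_codes` is a hypothetical helper, the two-row input is made up, and the pairing still relies on both lists being in the same order.

```python
def attach_codes(rows, codes):
    """Return copies of rows with the matching code appended, or fail loudly."""
    if len(rows) != len(codes):
        raise ValueError(f"{len(rows)} rows but {len(codes)} codes")
    return [row + [code] for row, code in zip(rows, codes)]

# made-up two-row example
rows = [['Acadia National Park', 'Maine'], ['Zion National Park', 'Utah']]
print(attach_codes(rows, ['ACAD', 'ZION']))
```

Returning new lists instead of appending in place also means re-running the cell does not add a second abbreviation to each row.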

In [None]:
# create parks dataframe
parks = pd.DataFrame(parks_list,
                     columns=['name', 'state_terr', 'latitude', 'longitude', 'est_date', 'nps_abrv'])

In [None]:
print(parks.head())

Great! We now have a data frame with the name, state/territory, coordinates, establishment date, and abbreviation for each national park in the United States.

## Location of parks
The next step will be to use `kepler.gl` to visualize the location of each national park. I'd like to do something similar to [this approach](https://www.kaggle.com/code/parulpandey/visualizing-india-s-seismic-activity/notebook) posted on Kaggle. Let's come back to this another time.

In [None]:
#map_1 = KeplerGl(height=600)

In [None]:
#map_1.add_data(data=parks, name='name')
#map_1

## Scrape NPS data

In [None]:
# url to scrape information from
nps = 'https://irma.nps.gov/STATS/SSRSReports/Park%20Specific%20Reports/Recreation%20Visitors%20By%20Month%20(1979%20-%20Last%20Calendar%20Year)?Park={}'
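
The `%20` sequences in the template above are percent-encoded spaces. As a sketch, the same URL can be assembled from readable path segments with the standard library, which makes the report name easier to swap out. `report_url`, `BASE`, and `REPORT_PATH` are hypothetical names introduced here, and keeping the parentheses unescaped is a choice made only to reproduce the original string exactly.

```python
from urllib.parse import quote

BASE = 'https://irma.nps.gov/STATS/SSRSReports/'
REPORT_PATH = ['Park Specific Reports',
               'Recreation Visitors By Month (1979 - Last Calendar Year)']

def report_url(park_code):
    """Build the monthly-visitors report URL for one park abbreviation."""
    # encode spaces as %20 but leave parentheses as they appear in the original URL
    path = '/'.join(quote(part, safe='()') for part in REPORT_PATH)
    return f'{BASE}{path}?Park={quote(park_code)}'

print(report_url('ACAD'))
```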

In [None]:
# create empty list to store park data
#parks_data = []

# temporary parks list for testing
test_park = ['ACAD']

In [None]:
for park in test_park:
    
    # url to monthly data for each park 
    url = nps.format(park)
    
    # extract HTML text
    page = rq.get(url).text
    print(page)
    
    # convert to BeautifulSoup object
    #soup = BeautifulSoup(page)
    
    # pull the <table> tag that matches our class name
    #table = soup.find('table', class_ = 'Ae30f20f368af4927806ac09a734045d0170')
    
    # find <tr> tags in our specified table, ignoring the labels row
    #temp_table = table.find_all('tr')
    
    # extract park name from <a> tag
    #xyz = park.find('a').get('xyz')
    
    # append information to full parks list
    #parks_data.append(xyz)
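
The printed page can also be checked mechanically: if the raw HTML contains `<script>` tags but no `<table>` tags, the report is probably filled in by the browser after loading, and `requests` alone will never see the numbers. The sketch below counts start tags with the standard-library parser. The two HTML strings are made-up stand-ins, not the actual NPS response, and `count_tags` is a hypothetical helper.

```python
from html.parser import HTMLParser

class TagCounter(HTMLParser):
    """Count how many times each start tag appears in a document."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        self.counts[tag] = self.counts.get(tag, 0) + 1

def count_tags(html):
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts

# made-up examples: one page with the data in the HTML, one that relies on a script
static_page = "<html><body><table><tr><td>1979</td></tr></table></body></html>"
script_page = "<html><body><div id='report'></div><script src='viewer.js'></script></body></html>"

for label, html in [('static', static_page), ('script', script_page)]:
    counts = count_tags(html)
    print(label, 'tables:', counts.get('table', 0), 'scripts:', counts.get('script', 0))
```

Running `count_tags(page)` on the text fetched in the loop above would give a first hint about whether a browser-driven tool is needed.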

There seems to be an issue with extracting the HTML text from the webpage. Maybe that's due to JavaScript? Unclear. There is another popular web-scraping tool called [Selenium](https://oxylabs.io/blog/selenium-web-scraping) that is supposed to work with JavaScript. I'll try that out next.<br><br>***To be continued***