# US National Parks
This notebook is structured based on a [project scoping guide](http://www.datasciencepublicpolicy.org/our-work/tools-guides/data-science-project-scoping-guide/) provided by Carnegie Mellon University.
<br>
<br>
**GOAL:** Characterize recreational use of public lands in the United States.
<br>
**DATA:** Publicly available data on recreational use of the US national parks will be scraped from the [National Park Service (NPS)](https://irma.nps.gov/STATS/) website.<br>
**ANALYSIS:** Exploratory data analysis to gain insights into the dataset.
<br>
**ETHICAL CONSIDERATIONS:** There are no apparent issues with privacy, transparency,
discrimination/equity, or accountability based on the information provided by NPS.
<br>
**ADDITIONAL CONSIDERATIONS:** None.

## Load libraries

In [1]:
import pandas as pd

## Access data from NPS
Data for this project is publicly available on the NPS website. This is my first attempt at web scraping, which can be accomplished using different libraries in Python (e.g., **BeautifulSoup**). I'm going to keep it simple for now and use the `read_html` function from **pandas**.<br>
Before doing that, I'd like to modularize the URL so that I can, ultimately, loop through all the national parks rather than accessing each park's data page individually. The rationale for this approach was presented in [this tutorial](https://www.youtube.com/watch?v=ooj84UP3r6M).
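
For reference, here is a minimal sketch of how `read_html` behaves, run on a small inline HTML string rather than a live page. The table contents and the `visits` variable are made up for illustration, and I haven't verified that the NPS report pages expose their tables as static HTML in this form. Note that `read_html` also needs an HTML parser (e.g., **lxml**) installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical HTML snippet standing in for a downloaded page;
# the real NPS report layout may differ.
html = """
<table>
  <tr><th>Year</th><th>JAN</th><th>FEB</th></tr>
  <tr><td>1979</td><td>100</td><td>200</td></tr>
  <tr><td>1980</td><td>150</td><td>250</td></tr>
</table>
"""

# read_html returns a list with one DataFrame per <table> element found
tables = pd.read_html(StringIO(html))
visits = tables[0]
print(visits)
```

Because the result is a list, a page with several tables would need an index (or the `match` argument) to pick out the one of interest.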

In [2]:
# URL template; the {} placeholder is filled with a park abbreviation
string = 'https://irma.nps.gov/STATS/SSRSReports/Park%20Specific%20Reports/Recreation%20Visitors%20By%20Month%20(1979%20-%20Last%20Calendar%20Year)?Park={}'
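
The `%20` sequences in the template are just percent-encoded spaces. As an aside, the same URL can be assembled from the readable report name with `urllib.parse.quote` from the standard library. This is only a sketch: `BASE`, `REPORT`, and `build_url` are names I made up here, and the split between base and report path is my own reading of the URL.

```python
from urllib.parse import quote

# Assumed split of the NPS URL into a base and a human-readable report path
BASE = 'https://irma.nps.gov/STATS/SSRSReports/'
REPORT = 'Park Specific Reports/Recreation Visitors By Month (1979 - Last Calendar Year)'


def build_url(park_code):
    """Return the report URL for one park abbreviation (hypothetical helper)."""
    # Keep '/', '(' and ')' unescaped so the result matches the template string
    return BASE + quote(REPORT, safe='/()') + '?Park=' + quote(park_code)


print(build_url('ACAD'))
```

For this notebook the hard-coded template with `str.format` is simpler, but this form makes it easier to see which report is being requested.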

Now we can loop through a list of NPS-defined park abbreviations to access data for each specific national park.

In [3]:
# temporary list, need to get all NPS-defined abbreviations
parks = ['ACAD', 'ARCH', 
         'BADL', 'BIBE', 'BISC', 'BLCA', 'BRCA',
         'CANY', 'CARE', 'CAVE', 'CHIS', 'CONG', 'CRLA', 'CUVA'
        ]

In [4]:
for park in parks:
    url = string.format(park)
    print(url)

https://irma.nps.gov/STATS/SSRSReports/Park%20Specific%20Reports/Recreation%20Visitors%20By%20Month%20(1979%20-%20Last%20Calendar%20Year)?Park=ACAD
https://irma.nps.gov/STATS/SSRSReports/Park%20Specific%20Reports/Recreation%20Visitors%20By%20Month%20(1979%20-%20Last%20Calendar%20Year)?Park=ARCH
https://irma.nps.gov/STATS/SSRSReports/Park%20Specific%20Reports/Recreation%20Visitors%20By%20Month%20(1979%20-%20Last%20Calendar%20Year)?Park=BADL
https://irma.nps.gov/STATS/SSRSReports/Park%20Specific%20Reports/Recreation%20Visitors%20By%20Month%20(1979%20-%20Last%20Calendar%20Year)?Park=BIBE
https://irma.nps.gov/STATS/SSRSReports/Park%20Specific%20Reports/Recreation%20Visitors%20By%20Month%20(1979%20-%20Last%20Calendar%20Year)?Park=BISC
https://irma.nps.gov/STATS/SSRSReports/Park%20Specific%20Reports/Recreation%20Visitors%20By%20Month%20(1979%20-%20Last%20Calendar%20Year)?Park=BLCA
https://irma.nps.gov/STATS/SSRSReports/Park%20Specific%20Reports/Recreation%20Visitors%20By%20Month%20(1979%20-%

Looks good! The next step will be to read the data tables into the notebook, followed by general data tidying.<br><br>***To be continued***