This repository contains Python notebooks (and their output) combining and analyzing the FOIA records provided by the US Geological Survey to the Data Liberation Project.
For more information about these records, see the introductory documentation.
- notebooks/00-combine.ipynb combines the individual spreadsheets provided by the USGS into a single CSV file. (Note: That file is too large to store in GitHub; the Data Liberation Project is sharing it via Google Drive.)
- notebooks/01-descriptive-stats.ipynb generates descriptive statistics and identifies some outliers in the data.
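
As a rough illustration of what these two steps involve, here is a minimal sketch using only the Python standard library. It is not the notebooks' actual code. It assumes the spreadsheets have been exported as CSV files, and the function names (`combine_csv_files`, `describe`, `iqr_outliers`), the `source_file` column, and the 1.5 × IQR outlier rule are all illustrative choices rather than anything taken from this repository.

```python
import csv
import statistics
from pathlib import Path


def combine_csv_files(input_dir, output_path):
    """Concatenate every *.csv file in input_dir into one CSV at output_path.

    Adds a hypothetical 'source_file' column so each row can be traced back
    to the file it came from. Returns the number of data rows written.
    """
    out_path = Path(output_path).resolve()
    # Skip the output file itself in case it lives in the input directory.
    paths = [p for p in sorted(Path(input_dir).glob("*.csv"))
             if p.resolve() != out_path]

    # First pass: collect the union of column names in first-seen order.
    fieldnames = ["source_file"]
    for path in paths:
        with open(path, newline="", encoding="utf-8") as f:
            for name in csv.DictReader(f).fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)

    # Second pass: stream rows into the combined file; columns missing
    # from a given input are written as empty strings.
    rows_written = 0
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
        writer.writeheader()
        for path in paths:
            with open(path, newline="", encoding="utf-8") as f:
                for row in csv.DictReader(f):
                    writer.writerow({"source_file": path.name, **row})
                    rows_written += 1
    return rows_written


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "count": len(values),
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "mean": statistics.fmean(values),
    }


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]
```

The two-pass design in `combine_csv_files` keeps memory use low for large inputs, since rows are streamed rather than loaded all at once. The IQR rule is only one common way to flag outliers; the notebook may use a different criterion.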
To run the code yourself, take these steps:
- Download the raw data files and copy them to data/raw/sites/
- Ensure that you have Python 3 installed
- Run make venv to create a Python virtual environment for this repository
- Run source venv/bin/activate to activate the virtual environment
- Run jupyter lab to launch Jupyter, and then navigate within it to the notebooks/ directory
- Run the notebooks
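
Before launching Jupyter, it can help to confirm that the pieces named in the steps above are in place. The following is a small, optional sketch and not part of this repository; the function name `preflight` is hypothetical, and it assumes that `make venv` creates a `venv/` directory at the repository root (as the `source venv/bin/activate` step suggests).

```python
import sys
from pathlib import Path


def preflight(repo_root="."):
    """Return a list of problems found; an empty list means ready to run."""
    problems = []
    root = Path(repo_root)

    # The notebooks require Python 3.
    if sys.version_info < (3,):
        problems.append("Python 3 is required")

    # The raw data files should have been copied to data/raw/sites/.
    raw_dir = root / "data" / "raw" / "sites"
    if not raw_dir.is_dir():
        problems.append("data/raw/sites/ does not exist")
    elif not any(raw_dir.iterdir()):
        problems.append("data/raw/sites/ is empty")

    # Assumption: `make venv` creates a venv/ directory here.
    if not (root / "venv").is_dir():
        problems.append("venv/ not found; run `make venv` first")

    if not (root / "notebooks").is_dir():
        problems.append("notebooks/ not found; run this from the repository root")

    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("Problem:", problem)
```

Run from the repository root, the script prints nothing when every check passes.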
This repository's code is available under the MIT License terms. The raw data files (those in data/raw) are public domain. All other data files are available under the Creative Commons CC BY-SA 4.0 license terms.
File an issue in this repository or email Jeremy Singer-Vine at jsvine@gmail.com.