This is a Code for Boston project that aims to predict health-based drinking water violations using the Environmental Protection Agency's Safe Drinking Water Information System (SDWIS).
We will also explore other EPA datasets, including the Toxics Release Inventory, the Superfund Enterprise Management System, the Environmental Radiation Monitoring database, and Enforcement and Compliance History Online (ECHO).
We are using Python for the analysis and are in the early stages.
Find us on our Slack channel #water
The easiest way to install the Python dependencies is with Pipenv. Ensure that you have Pipenv installed, then, with the repo as your working directory, run:
pipenv install
To add a new Python dependency, run:
pipenv install antigravity # Replace `antigravity` with desired package name
Be sure to commit `Pipfile.lock` to the repo.
Running the notebooks
To run the notebooks, run:
pipenv run jupyter notebook jupyter_notebooks
Running the scraper
To run the scraper, run:
pipenv run python -i swid-db-scraper.py
This will load the file and put you into an interactive Python prompt. From that prompt, run:
pull_envirofacts_data()
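For context, `pull_envirofacts_data()` pulls tables from the EPA Envirofacts data service, whose REST endpoints follow a simple path pattern. The sketch below shows roughly what a request URL looks like; the helper name, table name, and exact base URL are illustrative assumptions, not taken from this repo's code.

```python
# Minimal sketch of an Envirofacts-style request URL builder.
# The base URL and path layout are assumptions for illustration;
# check the scraper source for the exact endpoints it uses.

EF_BASE = "https://enviro.epa.gov/enviro/efservice"

def envirofacts_url(table, first_row, last_row, fmt="CSV"):
    """Build a URL requesting rows first_row..last_row of `table`."""
    return f"{EF_BASE}/{table}/rows/{first_row}:{last_row}/{fmt}"

# Example: the first 100 rows of a (hypothetical) water-system table, as CSV
url = envirofacts_url("WATER_SYSTEM", 0, 99)
```

Requesting bounded row ranges like this is what makes the scraper's chunked, resumable pulls possible.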
The web scraper functions as a command-line program with a flag-based interface. Pass `--help` or `-h` for further information; the supported flags are:
- `-p` takes the number of processes the script may use to process the different data sets
- `-l` takes the pathname of a logfile to which the script will write
- `-rs` takes the number of records to include in any one request
- `-mq` takes the maximum number of times a URL should be attempted before giving up
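The `-rs` and `-mq` flags suggest a fetch loop shaped roughly like the sketch below. This is illustrative only, not the repo's actual code: `fetch` and both helper names are hypothetical.

```python
# Illustrative sketch of record-size chunking (-rs) and bounded
# retries (-mq). `fetch` is any callable that takes a (start, end)
# row range and returns a list of rows, raising OSError on failure.

def pull_in_chunks(fetch, total_rows, record_size=100, max_queries=3):
    """Pull `total_rows` rows in `record_size` chunks, retrying each
    chunk up to `max_queries` times before giving up on it."""
    results = []
    for start in range(0, total_rows, record_size):
        end = min(start + record_size, total_rows) - 1
        for attempt in range(max_queries):
            try:
                results.extend(fetch(start, end))
                break
            except OSError:
                if attempt == max_queries - 1:
                    raise  # chunk failed max_queries times; give up
    return results
```

Keeping the retry bound per chunk (rather than per run) means one flaky range cannot exhaust the retry budget for the whole pull.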
To do
- test the parallel implementation of the web scraper
- adjust the web scraper to reference the file name
- update the list of databases we want beyond the SDWIS databases
- test that the data is read in correctly
- work out how we can join and aggregate data so that we don't always have to navigate
- add more print output to make it easier to see where we are in the scripts
- add more error handling
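The parallel pull mentioned above might be structured like the sketch below. The real script reportedly uses multiple processes (the `-p` flag); a thread pool is used here so the example is trivially runnable, and `ProcessPoolExecutor` is a drop-in replacement. All names are illustrative.

```python
# Illustrative sketch of pulling several data sets in parallel.
# Swap ThreadPoolExecutor for ProcessPoolExecutor to match the
# scraper's multi-process -p behaviour.
from concurrent.futures import ThreadPoolExecutor

def pull_table(table):
    """Placeholder per-table pull; real code would fetch and save rows."""
    return table

def pull_all(tables, processes=4):
    """Pull every table using up to `processes` workers, preserving order."""
    with ThreadPoolExecutor(max_workers=processes) as pool:
        return list(pool.map(pull_table, tables))
```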
Running scripts on your machine
(These instructions have only been tested on Solus Linux, but should hold on other OSes.) On the command line, cd to the safe-water directory and run:
python3.6 -i swid-db-scraper.py
This will load the file and put you into an interactive Python prompt. From that prompt, run:
pull_envirofacts_data()