comment-analysis

For analyzing comments submitted to regulations.gov.

Yes, this is a bit messy and could get a whole lot cleaner if we used this all the time. It's basically a one-off, so it's not the cleanest bit of scripting you've ever seen. I'd probably build the whole thing in Python/Pandas if I had to do it again, but csvkit and bash get the job done. Searching is significantly slower than if you used Pandas, however.

Prerequisites

  • Jupyter Notebook (we highly recommend using a virtual environment; pip install jupyter if you're already using Python)
  • Pandas (pip install pandas)
  • markegge's get-comments-with-api notebook
  • csvkit (pip install csvkit)
  • jot (included in macOS; on other platforms you must compile it from source. Alternatively, use another random number generator in line 19 of generate-random.sh.)
  • GNU core utilities (included in Linux; install on macOS with brew install coreutils)
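If jot isn't available, the random-number step can also be done with shuf (or gshuf, when installed via Homebrew coreutils on macOS). This is a sketch of that substitution, not the actual contents of generate-random.sh:

```shell
# Sketch of a jot-free way to pick random numbers, falling back from
# shuf (Linux) to gshuf (Homebrew coreutils on macOS).
# The range and count here are illustrative, not taken from the script.
if command -v shuf >/dev/null 2>&1; then
  SHUF=shuf
else
  SHUF=gshuf
fi
seq 1 1000 | "$SHUF" -n 5    # five random numbers between 1 and 1000
```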

Step-by-step

  1. Run get-comments-with-api from Jupyter Notebook to download the full comment set. (Alternatively, export the notebook to a .py file and run it from the command line.) Note that you need an API key from data.gov to download all the comments.
  2. Copy comments.csv into your working directory.
  3. Run sh match-random.sh to clean comments.csv and pick 1000 random comments from it.
  4. Run sh search-comments.sh utah-residents.txt to find possible comments from Utah residents (output is in utah-residents.csv).
  5. Run sh random-from-search.sh 1000 utah-residents.csv to pick 1000 random comments.
  6. Import export-1000-random.csv and utah-residents-random.csv into a spreadsheet (we used Google Docs for simultaneous editing) and code each comment by hand.
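The repo scripts aren't reproduced here, but the core of steps 3 and 5 (sample N random rows from a CSV while keeping the header) can be sketched with coreutils alone. The filenames and the exact behavior of match-random.sh are assumptions:

```shell
# Sketch of the "pick N random comments" idea behind steps 3 and 5:
# keep the CSV header, then sample N data rows at random with shuf
# (gshuf on macOS). comments-sample.csv is a made-up stand-in.
N=2
printf '%s\n' \
  'id,comment' \
  '1,"First comment"' \
  '2,"Second comment"' \
  '3,"Third comment"' \
  '4,"Fourth comment"' > comments-sample.csv

{ head -n 1 comments-sample.csv
  tail -n +2 comments-sample.csv | shuf -n "$N"; } > random-sample.csv
```

Real comments can contain embedded newlines, which is one reason the scripts run everything through csvclean first; this sketch assumes one row per line.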

Notes

  • If you just want to search the comment set for a bunch of terms, first generate a clean.csv file: csvclean -l comments.csv && mv comments_out.csv clean.csv

  • Then put your search terms into a .txt file, one term per line. (csvgrep uses regex, so terms like liv(e|ed|ing) in utah will find people who live, lived, or are living in Utah. , utah (\d*) finds digits (like a zip code) after comma-space-utah.)

  • Run sh search-comments.sh [myfile.txt] to search clean.csv for all the terms in your text file. Output will be in [myfile].csv.

  • This all works for me on macOS Sierra. It should work fine on Linux, but in line 19 of generate-random.sh you'll need to change gshuf to shuf.

  • Our analysis of 650,000 comments posted as of 7:00 am MDT Monday, July 10 is available here and here.
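The regex behavior described in the notes above can be sanity-checked on a toy file. csvgrep uses Python's regex syntax; plain grep's extended regex is close enough for a quick check, except that \d has to be written [0-9]. The filenames here are made up:

```shell
# Toy check of the search-term regexes from the Notes section.
printf '%s\n' \
  "I have lived in Utah all my life" \
  "Greetings from Salt Lake City, Utah 84101" \
  "I live in Nevada" > toy-comments.txt

# Matches live/lived/living in Utah (case-insensitive): first line only
grep -icE 'liv(e|ed|ing) in utah' toy-comments.txt

# Matches digits (e.g. a zip code) after ", utah ": second line only
grep -icE ', utah [0-9]+' toy-comments.txt
```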