

For analyzing comments submitted to

Yes, this is a bit messy, and it would be a whole lot cleaner if we used it all the time. It's basically a one-off, so it's not the cleanest bit of scripting you've ever seen. I'd probably build the whole thing in Python/Pandas if I had to do it again, but csvkit and bash get the job done. Searching is significantly slower than it would be in Pandas, however.


  • Jupyter Notebook (pip install jupyter if you're already using Python; using a virtual environment is highly recommended)
  • Pandas (pip install pandas)
  • markegge's get-comments-with-api notebook
  • csvkit (pip install csvkit)
  • jot (included in macOS; must be compiled from source on other platforms. Alternatively, use another random number generator in line 19 of
  • GNU core utilities (included in Linux; on macOS, install with brew install coreutils)


  1. Run get-comments-with-api from Jupyter Notebook to download the full comment set. (Alternatively, export the notebook to a .py file and run that from the command line.) Note that you need an API key to download all the comments.
  2. Copy comments.csv into your working directory.
  3. Run sh to clean comments.csv and pick 1000 random comments from it.
  4. Run sh utah-residents.txt to find possible comments from Utah residents (output is in utah-residents.csv)
  5. Run sh 1000 utah-residents.csv to pick 1000 random comments.
  6. Import export-1000-random.csv and utah-residents-random.csv into a spreadsheet (we used Google Docs for simultaneous editing) and code each comment by hand.
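
For readers who would rather stay in Python, the clean-and-sample steps above can be roughly approximated with the standard library alone. This is a minimal sketch, not the scripts in this repo: the function name sample_rows is hypothetical, and "cleaning" here only means dropping rows whose column count does not match the header, which is an assumption about what the cleaning step needs to do.

```python
import csv
import random


def sample_rows(in_path, out_path, k=1000, seed=None):
    """Copy the header plus up to k randomly chosen rows to out_path.

    Hypothetical helper: a rough stand-in for the clean + random-pick steps.
    Returns the number of data rows written.
    """
    with open(in_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        # Assumption: a "clean" row has the same number of columns as the header.
        rows = [row for row in reader if len(row) == len(header)]

    rng = random.Random(seed)  # pass a seed to make the sample repeatable
    picked = rng.sample(rows, min(k, len(rows)))

    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(picked)
    return len(picked)
```

Calling sample_rows("comments.csv", "export-1000-random.csv") would write a 1000-row sample. Note that this reads the whole file into memory, which should be acceptable for a few hundred thousand comments but may not be for much larger sets.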


  • If you just want to search the comment set for a bunch of terms, first generate a clean.csv file: csvclean -l comments.csv && mv comments_out.csv clean.csv

  • Then put your search terms into a .txt file, one term per line. (csvgrep uses regex, so a term like liv(e|ed|ing) in utah will find people who live, lived, or are living in Utah, and a term like , utah (\d*) finds digits (like a zip code) after comma-space-utah.)

  • Run sh [myfile.txt] to search clean.csv for all the terms in your text file. Output will be in [myfile].csv.
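
To show how a term file like the one described above behaves, here is a small Python sketch of the same idea: read one regex per line, then keep every row where any term matches. It is an illustration only, not the search script in this repo. The function names, the column name "comment", and the case-insensitive matching are all assumptions.

```python
import csv
import re


def load_terms(path):
    """Read one regex per line, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        lines = (line.rstrip("\n") for line in f)
        return [line for line in lines if line.strip()]


def search_rows(rows, terms, column="comment"):
    """Return the rows (dicts) whose column matches at least one term.

    Assumption: matching ignores case, so "utah" also finds "Utah".
    """
    patterns = [re.compile(term, re.IGNORECASE) for term in terms]
    return [row for row in rows if any(p.search(row[column]) for p in patterns)]


def search_csv(csv_path, terms_path, column="comment"):
    """Search every row of a CSV file for the terms listed in terms_path."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return search_rows(list(csv.DictReader(f)), load_terms(terms_path), column)
```

One thing this makes easy to check: because \d* also matches zero digits, a term like , utah (\d*) matches any text containing comma-space-utah-space, whether or not a zip code follows. Use \d+ if digits are required.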

  • This all works for me on macOS Sierra. It should work fine on Linux, but in line 19 of, you'll need to change gshuf to shuf.
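
If you wrap any of this in Python, one way to avoid editing the command per platform is to look up whichever shuffle tool is installed. This is only a sketch of that idea; find_shuffle_command is a hypothetical helper, not part of this repo.

```python
import shutil


def find_shuffle_command():
    """Return the path to gshuf (Homebrew coreutils on macOS) or shuf (Linux).

    Returns None if neither command is on the PATH.
    """
    for name in ("gshuf", "shuf"):
        path = shutil.which(name)
        if path:
            return path
    return None
```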

  • Our analysis of 650,000 comments posted as of 7:00 am MDT Monday, July 10 is available here and here.