Code to index and analyze FCC comments
.vscode
fcc_analysis
.gitignore
README.md
setup.py

FCC Comment Analysis

This repository contains Python code that downloads FCC comment data and stores it in an Elasticsearch instance. An additional command tags and analyzes the data further.

After a first pass in a Jupyter Notebook, I used Kibana on AWS to do most of my digging.

To install the package and run tests:

$ pip install -e .
$ python setup.py test

To crawl the comments, make sure you have an Elasticsearch server set up, and then run:

$ fcc index --endpoint=http://localhost:9200/

This will take anywhere from 2 to 4 hours (or won't work at all if the FCC API is down).
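Since a full crawl runs for hours, it can be worth confirming that the Elasticsearch endpoint answers before starting. The following is a minimal sketch, not part of this package: `endpoint_is_up` is a hypothetical helper name, and it only assumes that the server answers a plain GET on its root URL with HTTP 200, as an unsecured local Elasticsearch instance does.

```python
import urllib.request


def endpoint_is_up(endpoint, opener=urllib.request.urlopen, timeout=5):
    """Return True if a GET request to `endpoint` answers with HTTP 200.

    `opener` defaults to urllib's urlopen; it is a parameter only so the
    check can be exercised without a live server.
    """
    try:
        with opener(endpoint, timeout=timeout) as response:
            return response.status == 200
    except (OSError, ValueError):
        # URLError and HTTPError are OSError subclasses, so this covers
        # refused connections, timeouts, and HTTP error responses.
        # ValueError covers a malformed URL.
        return False
```

Calling `endpoint_is_up("http://localhost:9200/")` before `fcc index` gives a quick yes/no answer. It says nothing about whether the FCC API itself is available.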

To get a smaller subset of comments for testing, add -g YYYY-MM-DD to get comments submitted after the specified date:

$ fcc index --endpoint=http://localhost:9200/ -g 2017-06-01
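If you script the crawl, a small wrapper can catch a mistyped date before the command starts. This is a sketch under assumptions: `build_index_command` is a hypothetical helper, and it relies only on the `--endpoint` and `-g` options shown above.

```python
from datetime import datetime


def build_index_command(endpoint, since=None):
    """Build the argument list for `fcc index`, optionally limited by date."""
    command = ["fcc", "index", f"--endpoint={endpoint}"]
    if since is not None:
        # strptime rejects impossible dates such as 2017-13-01.
        parsed = datetime.strptime(since, "%Y-%m-%d")
        # The round trip rejects unpadded input such as 2017-6-1.
        if parsed.date().isoformat() != since:
            raise ValueError(f"expected YYYY-MM-DD, got {since!r}")
        command += ["-g", since]
    return command
```

The returned list can be passed to `subprocess.run(command, check=True)` to start the crawl.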

I then take another pass on the data, appending "analysis" variables to all of the documents. This makes it a lot easier to spot trends in Kibana.

To analyze the comments:

$ fcc analyze --endpoint=http://localhost:9200/
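To give a feel for what appending "analysis" variables to a document can look like, here is an illustrative sketch. It is not the code behind `fcc analyze`: the `text` input field and the `word_count` and `fingerprint` keys are hypothetical names chosen for this example.

```python
import hashlib


def add_analysis(document):
    """Return a copy of a comment document with an added "analysis" field."""
    # "text" is an assumed field name for the comment body.
    words = document.get("text", "").lower().split()
    normalized = " ".join(words)
    tagged = dict(document)  # shallow copy, the input is left unchanged
    tagged["analysis"] = {
        "word_count": len(words),
        # Comments that differ only in case or whitespace share a
        # fingerprint, which makes repeated form letters easy to group.
        "fingerprint": hashlib.sha1(normalized.encode("utf-8")).hexdigest(),
    }
    return tagged
```

Precomputed fields like these can be filtered and aggregated directly in Kibana, which is the reason for doing the second pass up front.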