Summary: Fetching, wrangling, analyzing, and visualizing Twitter and Congress data with Python 3.5, tweepy, pandas, and matplotlib.
Twitter and Congress Mashup
This repo contains data and several walkthroughs for fetching, wrangling, and visualizing the data, as a means to practice general Python programming as well as learn a bit of pandas and matplotlib.
About the data
The data comes from two sources:
- The unitedstates/congress-legislators GitHub repo, which contains crowdsourced lists of biographical data for every U.S. congressmember, including their known social media accounts.
- The Twitter Public REST API, specifically the users/lookup and statuses/user_timeline endpoints.
The repo also includes a spreadsheet of basic info about legislators, data/legislators-basic-info.csv, which is derived from a spreadsheet at the Sunlight Foundation, which itself derived the data from unitedstates/congress-legislators.
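The users/lookup endpoint accepts at most 100 users per request, so a list of congressional screen names has to be split into batches before it can be looked up. The following is a minimal sketch of that batching step only: the function name `batch_screen_names` and the sample handles are hypothetical, and no network call is made.

```python
def batch_screen_names(screen_names, batch_size=100):
    """Yield comma-separated strings of at most `batch_size` screen names."""
    for start in range(0, len(screen_names), batch_size):
        yield ",".join(screen_names[start:start + batch_size])


# Hypothetical handles, for illustration only
names = ["example_member_{}".format(i) for i in range(250)]
batches = list(batch_screen_names(names))

# Number of screen names in each batch
print([len(b.split(",")) for b in batches])
```

Each string in `batches` could then be passed as the `screen_name` parameter of one users/lookup request.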
Programming environment requirements
This code was written and tested using the Python 3.5.0 installation provided by Anaconda. I try to use as few non-standard libraries as possible, but in general, Anaconda creates an environment that has just about everything you'd need, including python-dateutil.
If you plan on trying to fetch the data for yourself and following my fetch-code to the letter, you'll need to install tweepy on your own.
- Fetching the data - How did the data in data/twitter show up in the repo? Not by magic, but by using the Twitter API and mashing it with crowdsourced Congress data. Note: you don't actually have to do these steps to get the data; this repo comes packaged with all the fetched data so that you can focus on the wrangling and visualization.
- Wrangling the Twitter profiles - The data structure of a Twitter user profile, as Twitter's API provides it, is pretty complicated. Complicated enough that it needs to be serialized as nested JSON, which makes it hard to throw all the data in data/twitter/profiles into a spreadsheet for easy comparison. So let's make our own data file by picking the interesting data points from each Twitter profile and saving them as a flat, easy-to-use CSV: data/wrangled/congress-twitter-profiles.csv.
- Wrangling the Twitter tweets - Same deal as above, but with a shorter walkthrough.
- Analyzing the wrangled data with pandas - With the data in convenient-to-read CSV files, let's use pandas to do some data analysis.
- Visualizing the wrangled data - When you've taken the time to think through the structure of the data and how to organize it, visualizations become very easy to produce.
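To give a feel for the profile-wrangling step described above, here is a minimal sketch of flattening one nested profile into a CSV row using only the standard library. The sample profile is made up and heavily trimmed, and the list of columns is an assumption chosen for illustration; the actual walkthrough may pick different fields.

```python
import csv
import io
import json

# Columns chosen for illustration; the actual walkthrough may pick others
FIELDS = ["id", "screen_name", "name", "followers_count",
          "friends_count", "statuses_count", "created_at", "verified"]


def flatten_profile(profile):
    """Keep only the top-level fields of interest from one profile dict."""
    return {field: profile.get(field) for field in FIELDS}


# A made-up, heavily trimmed profile standing in for one JSON file
raw = """{"id": 12345, "screen_name": "example_rep", "name": "Example Rep",
 "followers_count": 1000, "friends_count": 200, "statuses_count": 3000,
 "created_at": "Mon Jan 05 12:00:00 +0000 2009", "verified": true,
 "status": {"text": "nested tweet object", "entities": {"hashtags": []}},
 "entities": {"url": {"urls": []}}}"""

profile = json.loads(raw)
row = flatten_profile(profile)

# Write to an in-memory buffer; a real script would open a file instead
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
writer.writerow(row)
csv_text = buffer.getvalue()
print(csv_text)
```

The nested `status` and `entities` objects are simply dropped here, which is what makes the result flat enough to compare profiles side by side in a spreadsheet.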