Twitter API and data wrangling practice: a Udacity Nanodegree project for the lesson 'Wrangle and Analyze Data'.
Wrangle WeRateDogs Twitter data to create interesting and trustworthy analyses and visualizations. The Twitter archive is great, but it only contains very basic tweet information. Additional gathering, followed by assessing and cleaning, is required.
- CSV file with the WeRateDogs archive of tweets
- Download the TSV file with dog image breed predictions programmatically using the Requests library
- Query the Twitter API for each tweet's JSON data using Python's Tweepy library and store each tweet's entire set of JSON data in a file
- Detect and document at least eight (8) quality issues and two (2) tidiness issues
- Clean each of the issues you documented while assessing. Perform this cleaning in wrangle_act.ipynb as well. The result should be a high-quality and tidy master pandas DataFrame (or DataFrames, if appropriate)
- Store the clean DataFrame(s) in a CSV file with the main one named twitter_archive_master.csv
- Create a 300-600 word written report called wrangle_report.pdf that briefly describes your wrangling efforts. This is to be framed as an internal document
- Create a 250-word-minimum written report called act_report.pdf that communicates the insights and displays the visualization(s) produced from your wrangled data. This is to be framed as an external document, like a blog post or magazine article.
- You can check an HTML view of our data wrangling approach here.
- Check the Udacity grade report on our data wrangling efforts in PDF
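
The gathering steps in the list above (downloading the predictions file with Requests and storing each tweet's full JSON in a file) can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the function names, the one-JSON-object-per-line file layout, and the choice of `retweet_count` and `favorite_count` as the fields to keep are all assumptions. The tweets are passed in as plain dictionaries; with Tweepy, these would typically come from the raw payload of each status returned by the API.

```python
import json
from pathlib import Path


def download_file(url, dest):
    """Download `url` to `dest` using Requests (hypothetical helper; needs network)."""
    import requests  # third-party, imported lazily so the rest runs without it

    response = requests.get(url, timeout=30)
    response.raise_for_status()  # stop on HTTP errors instead of saving an error page
    Path(dest).write_bytes(response.content)


def save_tweets(tweets, path):
    """Write one JSON object per line so each tweet's entire payload is kept."""
    with open(path, "w", encoding="utf-8") as fh:
        for tweet in tweets:
            fh.write(json.dumps(tweet) + "\n")


def load_tweet_counts(path):
    """Read the stored JSON back, keeping only a few fields for analysis."""
    rows = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue  # skip blank lines
            tweet = json.loads(line)
            rows.append({
                "tweet_id": tweet["id"],
                # .get() returns None when a count is missing (assumed acceptable here)
                "retweet_count": tweet.get("retweet_count"),
                "favorite_count": tweet.get("favorite_count"),
            })
    return rows
```

The list of rows returned by `load_tweet_counts` can be passed to `pandas.DataFrame(rows)` and then joined with the archive and the predictions on `tweet_id` during cleaning.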