
data-wrangling

Twitter API and data-wrangling practice: a Udacity Nanodegree project for the lesson 'Wrangle and Analyze Data'.

The Goal

Wrangle WeRateDogs Twitter data to create interesting and trustworthy analyses and visualizations. The Twitter archive is great, but it contains only very basic tweet information, so additional gathering, assessing, and cleaning are required.

Provided files

  • CSV file with the WeRateDogs archive of tweets

To Do

  • Download the TSV file with dog image breed predictions programmatically using the Requests library
  • Query the Twitter API for each tweet's JSON data using Python's Tweepy library and store each tweet's entire set of JSON data in a file
  • Detect and document at least eight (8) quality issues and two (2) tidiness issues
  • Clean each of the issues you documented while assessing. Perform this cleaning in wrangle_act.ipynb as well. The result should be a high-quality, tidy master pandas DataFrame (or DataFrames, if appropriate)
  • Store the clean DataFrame(s) in a CSV file with the main one named twitter_archive_master.csv
  • Create a 300-600 word written report called wrangle_report.pdf that briefly describes your wrangling efforts. This is to be framed as an internal document
  • Create a 250-word-minimum written report called act_report.pdf that communicates the insights and displays the visualization(s) produced from your wrangled data. This is to be framed as an external document, like a blog post or magazine article, for example.
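The gathering and storage steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual notebook code: the Tweepy call shown in the comment is a hypothetical usage that depends on the installed Tweepy version, so the example below stands in placeholder dicts for real tweet data and focuses on the one-JSON-object-per-line file format the brief asks for:

```python
import json

def save_tweets_as_json_lines(tweets, path="tweet_json.txt"):
    """Write each tweet's entire JSON data on its own line (JSON Lines format)."""
    with open(path, "w") as f:
        for tweet in tweets:
            f.write(json.dumps(tweet) + "\n")

def load_tweets_from_json_lines(path="tweet_json.txt"):
    """Read the file back into a list of dicts, ready to assess in pandas."""
    with open(path) as f:
        return [json.loads(line) for line in f]

# In the real notebook, each `tweet` dict would come from the Twitter API via
# Tweepy, e.g. something like api.get_status(tweet_id, tweet_mode="extended")._json
# (hypothetical call; check your Tweepy version), and the image-predictions TSV
# would be fetched separately with requests.get(url) from the Requests library.
sample = [
    {"id": 1, "retweet_count": 10, "favorite_count": 42},
    {"id": 2, "retweet_count": 3, "favorite_count": 7},
]
save_tweets_as_json_lines(sample)
round_trip = load_tweets_from_json_lines()
```

Storing each tweet as one JSON object per line keeps the file appendable during a long API-querying session and makes it easy to load selectively into a DataFrame afterwards.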

Check out our analysis
