WeRateDogs is a Twitter account with the username dog_rates where people post tweets about dogs and rate them. Click here for more information on the Twitter account.
A data-wrangling process, consisting of gathering, assessing and cleaning, was performed on the data obtained from the Twitter account. Three different methods were used in the gathering stage:
- data_1 = twitter_archive_enhanced.csv (which is already available for download in the classroom)
- data_2 = image_predictions.tsv (downloaded from the Udacity server using the requests library)
- data_3 = twitter_data.json (collected from Twitter with the Twitter API v1)
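As a rough illustration of reading the three file formats listed above, here is a minimal sketch that uses only the Python standard library. The in-memory sample strings, the column names (tweet_id, text, p1, retweet_count) and the helper functions are assumptions made for illustration, as is the one-JSON-object-per-line layout; the project itself most likely loaded the real files with other tools.

```python
import csv
import io
import json

# Tiny in-memory stand-ins for the three files; every value here is made up.
ARCHIVE_CSV = "tweet_id,text\n1,Good dog 12/10\n2,Nice pupper 11/10\n"
PREDICTIONS_TSV = "tweet_id\tp1\n1\tgolden_retriever\n2\tpug\n"
TWEETS_JSON = '{"id": 1, "retweet_count": 5}\n{"id": 2, "retweet_count": 9}\n'


def read_delimited(text, delimiter=","):
    """Parse delimited text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def read_json_lines(text):
    """Parse one JSON object per non-empty line (assumed file layout)."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


data_1 = read_delimited(ARCHIVE_CSV)                      # comma-separated
data_2 = read_delimited(PREDICTIONS_TSV, delimiter="\t")  # tab-separated
data_3 = read_json_lines(TWEETS_JSON)                     # JSON lines
```

With real files, the same helpers could be fed the text returned by `open(path, encoding="utf-8").read()`.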
The tweet data is unclean, messy and unorganised; no accurate or reliable insights can be obtained from it in this state.
Gather, clean, organise and tidy the messy data to extract meaningful insights
- Assess the data for tidiness and quality issues
- Document the issues
- Fix and clean the documented issues
- Analyse the clean data for interesting facts

For the wrangle report, click here.
Perform data wrangling on the WeRateDogs tweets from 2015 to 2017, which involves gathering, assessing and cleaning the data.
The gathered data was assessed visually using a Microsoft Excel spreadsheet and the .head and .tail functions in Python to find quality issues (content issues) and tidiness issues (structural issues). The dataset was also assessed programmatically using Python functions to investigate it further for hidden issues.
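A programmatic assessment of this kind might look like the sketch below, which counts missing values per column and finds repeated tweet IDs. The sample rows, the column names and both helper functions are hypothetical; the specific checks the project ran are not listed here.

```python
from collections import Counter

# Made-up rows; None marks a missing value.
rows = [
    {"tweet_id": "1", "name": "Bella"},
    {"tweet_id": "2", "name": None},
    {"tweet_id": "2", "name": None},
    {"tweet_id": "3", "name": "Max"},
]


def missing_counts(rows):
    """Count None or empty-string values per column."""
    counts = Counter()
    for row in rows:
        for column, value in row.items():
            if value is None or value == "":
                counts[column] += 1
    return dict(counts)


def duplicated_ids(rows, key="tweet_id"):
    """Return the key values that occur more than once, sorted."""
    seen = Counter(row[key] for row in rows)
    return sorted(value for value, n in seen.items() if n > 1)


missing = missing_counts(rows)     # quality issue: missing names
duplicates = duplicated_ids(rows)  # quality issue: repeated tweet IDs
```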
After the detected issues were assessed and documented, the cleaning stage was initiated: Python code was written to fix the highlighted issues in an automated way and get the dataset ready for analysis.
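One example of an automated fix is sketched below. It assumes that the tweet source is stored as an HTML anchor tag and uses a regular expression to keep only the readable label. The stored format, the function name and the pattern are assumptions for illustration, not the project's actual cleaning code.

```python
import re

# Matches an <a ...>label</a> tag and captures the label (assumed format).
ANCHOR_TEXT = re.compile(r"<a[^>]*>(.*?)</a>")


def clean_source(raw):
    """Return the anchor label if raw is an <a> tag, else raw unchanged."""
    match = ANCHOR_TEXT.search(raw)
    return match.group(1) if match else raw


raw_source = '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>'
cleaned = clean_source(raw_source)
```

Values that are already plain text pass through unchanged, so the function is safe to apply to a whole column.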
- The analysis showed that the dog breed with the highest count in prediction 1 is the Golden_Retriever, with a count of 150.
- The tweet with the highest retweet count among all tweets between 2015 and 2017 is "Here's a doggo realizing you can stand in a po...", with a retweet count of 70770.
- The most frequent tweet source for the dog tweets between 2015 and 2017 is Twitter for iPhone.

For the act report, click here.
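The three findings above come down to two operations: a frequency count and a maximum. A minimal sketch on made-up rows follows; the column names (p1, source, retweet_count) and the helper function are assumptions, and the numbers are not the project's results.

```python
from collections import Counter

# Made-up cleaned rows for illustration only.
tweets = [
    {"p1": "golden_retriever", "source": "Twitter for iPhone", "retweet_count": 12},
    {"p1": "pug", "source": "Twitter for iPhone", "retweet_count": 40},
    {"p1": "golden_retriever", "source": "Twitter Web Client", "retweet_count": 7},
]


def most_common_value(rows, column):
    """Return (value, count) for the most frequent value in a column."""
    return Counter(row[column] for row in rows).most_common(1)[0]


top_breed = most_common_value(tweets, "p1")
top_source = most_common_value(tweets, "source")
most_retweeted = max(tweets, key=lambda row: row["retweet_count"])
```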
Analysis approaches and some syntax ideas were drawn from the ALX DA Udacity classroom, data science communities such as Stack Overflow, GeeksforGeeks and DataCamp, and code documentation websites such as the Tweepy docs and the regular expression docs.