Lamozain/weRateDogs_project

WERATEDOGS DATA

by Amos Moses Omofaiye (UDACITY DAND SCHOLAR)

Dataset

This project is part of the Bertelsmann Scholarship for Data Analysts from Udacity. The main thrust is to analyze tweet data from WeRateDogs, a dog rating organization that provides a humorous dog rating service. One notable thing is that their rating numerator is usually greater than the denominator, which is akin to awarding a student 12/10. They do this because, to them, the dogs are simply that good. Other information about them can be found in the README.txt file.

To begin with, the data must be gathered from three sources. The first file has been provided by Udacity (twitter-archive-enhanced.csv). The second dataset will be gathered programmatically from Udacity through the address (https://d17h27t6h515a5.cloudfront.net/topher/2017/August/599fd2ad_image-predictions/image-predictions.tsv) and stored under its file name, image-predictions.tsv. The third and final dataset will be gathered programmatically from Twitter using their API. It will be stored as tweet-json.txt, and the needed data will be extracted from it into a dataframe.

After gathering, the data will be cleaned, combined, stored, analyzed, visualized, and reported on. I will try to make the process as interactive as possible. Now, we look at the specific objectives of the project.
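As an illustration of the extraction step for the third dataset, here is a minimal sketch. It assumes tweet-json.txt holds one JSON object per line and that each object carries the fields id, retweet_count, and favorite_count; the function name extract_tweet_counts and the sample lines are hypothetical and not taken from the project notebook.

```python
import json
from typing import Dict, Iterable, List


def extract_tweet_counts(lines: Iterable[str]) -> List[Dict]:
    """Pull a few fields out of line-delimited tweet JSON.

    Assumption: each non-blank line is one JSON object with an "id" key;
    "retweet_count" and "favorite_count" may be missing and become None.
    """
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        rows.append({
            "tweet_id": tweet["id"],
            "retweet_count": tweet.get("retweet_count"),
            "favorite_count": tweet.get("favorite_count"),
        })
    return rows


# Made-up sample standing in for the contents of tweet-json.txt.
sample_lines = [
    '{"id": 1, "retweet_count": 10, "favorite_count": 25}',
    '',
    '{"id": 2, "retweet_count": 3, "favorite_count": 7}',
]
rows = extract_tweet_counts(sample_lines)
```

With the real file, the same function could be fed an open file handle, for example extract_tweet_counts(open("tweet-json.txt", encoding="utf-8")), and the resulting list of dictionaries can be passed to a dataframe constructor.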


Objectives of the project

The objectives of the project are to:

  • gather data from three different sources.
  • assess the gathered data with the aim of identifying at least 8 quality issues and 2 tidiness issues.
  • clean the data with respect to the identified issues.
  • store the cleaned data in a file titled twitter-archive-master.csv.
  • analyze and visualize the stored data producing at least 3 insights and 1 visualization.
  • report the work by producing two documents: an internal one (wrangle_report.pdf or html, 300-600 words) detailing the wrangling efforts, and an external one (act_report.pdf or html, 250 words minimum) detailing the insights and visualizations.
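The combine and store objectives can be sketched as follows. This is only an outline under stated assumptions: the three tables are assumed to share a tweet_id key, the column names (rating_numerator, rating_denominator, p1) and sample values are illustrative, and the helpers merge_on_tweet_id and to_csv_text are hypothetical names rather than the project's actual code.

```python
import csv
import io
from typing import Dict, Iterable, List


def merge_on_tweet_id(*tables: Iterable[Dict]) -> List[Dict]:
    """Inner-join row dictionaries from several tables on 'tweet_id'."""
    # Index each table by tweet_id so rows can be looked up directly.
    indexed = [{row["tweet_id"]: row for row in table} for table in tables]
    # Keep only the ids that appear in every table.
    shared_ids = set(indexed[0]).intersection(*indexed[1:])
    merged = []
    for tweet_id in sorted(shared_ids):
        combined = {}
        for index in indexed:
            combined.update(index[tweet_id])
        merged.append(combined)
    return merged


def to_csv_text(rows: List[Dict]) -> str:
    """Serialize the merged rows as CSV text (header taken from the first row)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up rows standing in for the three cleaned datasets.
archive = [
    {"tweet_id": 1, "rating_numerator": 12, "rating_denominator": 10},
    {"tweet_id": 2, "rating_numerator": 13, "rating_denominator": 10},
]
predictions = [{"tweet_id": 1, "p1": "golden_retriever"}]
counts = [{"tweet_id": 1, "retweet_count": 10, "favorite_count": 25}]

master = merge_on_tweet_id(archive, predictions, counts)
csv_text = to_csv_text(master)
```

Writing csv_text to a file named twitter-archive-master.csv would complete the storing step. An inner join is used here so that only tweets present in all three sources are kept; a different join may suit the real data better.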

Conclusion

In conclusion, I gathered data from three sources, assessed and cleaned them, stored the cleaned dataset, and then analyzed and visualized it. I discovered that the most popular dog stage is Pupper, while the most popular dog name is 'A', followed by 'Charlie'. I also found that the number of tweets decreases over time while favorites keep increasing. Moreover, I found that the source of the tweet, the year the tweet was made, and the developmental stage of the dog affect the dog's rating. The attached supporting documents should be consulted for more of the insights.
