WERATEDOGS DATA

by Amos Moses Omofaiye (UDACITY DAND SCHOLAR)

Dataset

This project is part of the Bertelsmann Scholarship for Data Analyst from Udacity. The main thrust is to analyze tweet data from WeRateDogs, a dog-rating organization that provides a humorous dog rating service. One notable thing is that their rating numerator is usually greater than the denominator, which is akin to awarding a student 12/10. They do this because, to them, the dogs are just that good. Other information about them can be found in the README.txt file.

To begin with, the data must be gathered from three sources. The first file, twitter-archive-enhanced.csv, has been provided by Udacity. The second dataset will be programmatically gathered from Udacity through the address (https://d17h27t6h515a5.cloudfront.net/topher/2017/August/599fd2ad_image-predictions/image-predictions.tsv) and stored under its own file name, image-predictions.tsv. The third and final dataset will be programmatically gathered from Twitter using their API. It will be stored as tweet-json.txt, and the needed data will be extracted from it into a dataframe.

After gathering, the data will be cleaned, combined, stored, analyzed, visualized, and reported on. I will try to make the process as interactive as possible. The specific objectives of the project are listed in the next section.
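The two programmatic gathering steps described above could look roughly like the sketch below. It is only an illustration and not the code in wrangle_act.ipynb: the function names are made up for this example, it uses the Python standard library rather than a dataframe, and it assumes that tweet-json.txt holds one JSON tweet object per line with the keys `id`, `retweet_count` and `favorite_count`.

```python
import json
import urllib.request

# URL quoted in the Dataset section above.
IMAGE_PREDICTIONS_URL = (
    "https://d17h27t6h515a5.cloudfront.net/topher/2017/August/"
    "599fd2ad_image-predictions/image-predictions.tsv"
)


def download_image_predictions(url=IMAGE_PREDICTIONS_URL,
                               dest="image-predictions.tsv"):
    """Download the image predictions file and save it under its own name."""
    with urllib.request.urlopen(url) as response:
        content = response.read()
    with open(dest, "wb") as out_file:
        out_file.write(content)
    return dest


def extract_tweet_fields(lines):
    """Pull a few fields out of tweet JSON, one JSON object per line.

    Assumption: every non-empty line is one tweet object that has the
    keys "id", "retweet_count" and "favorite_count".
    """
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        rows.append({
            "tweet_id": tweet["id"],
            "retweet_count": tweet["retweet_count"],
            "favorite_count": tweet["favorite_count"],
        })
    return rows
```

Since `extract_tweet_fields` accepts any iterable of lines, an open file can be passed directly, for example `extract_tweet_fields(open("tweet-json.txt", encoding="utf-8"))`. The resulting list of dictionaries can then be turned into a dataframe.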

Objectives of the project

The objectives of the project are to:

gather data from three different sources.
assess the gathered data with the aim of identifying at least 8 quality issues and 2 tidiness issues.
clean the data with respect to the identified issues.
store the cleaned data in a file titled twitter-archive-master.csv.
analyze and visualize the stored data producing at least 3 insights and 1 visualization.
report the work by producing two documents: an internal report (wrangle_report.pdf or html, 300-600 words) detailing the wrangling efforts, and an external report (act_report.pdf or html, 250 words minimum) detailing the insights and visualizations.
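As a rough illustration of the combine and store objectives, the sketch below joins three tables on a shared tweet identifier and serializes the result as CSV. It is an assumption-laden example, not the notebook's implementation: the function names are hypothetical, each table is assumed to be a list of dictionaries with a `tweet_id` key and otherwise non-overlapping column names, and only tweets present in all three tables are kept.

```python
import csv
import io


def merge_on_tweet_id(archive_rows, prediction_rows, tweet_rows):
    """Inner-join three lists of dicts on their shared "tweet_id" key."""
    predictions = {row["tweet_id"]: row for row in prediction_rows}
    tweets = {row["tweet_id"]: row for row in tweet_rows}
    merged = []
    for row in archive_rows:
        key = row["tweet_id"]
        # Keep only tweets that appear in all three tables.
        if key in predictions and key in tweets:
            merged.append({**row, **predictions[key], **tweets[key]})
    return merged


def to_csv_text(rows):
    """Serialize the merged rows as CSV text with a header line.

    Assumption: every row has the same keys as the first row.
    """
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

The returned text can be written to twitter-archive-master.csv with an ordinary `open(..., "w", newline="")` call. An inner join is only one possible choice; keeping unmatched tweets would need a different merge rule.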

Conclusion

In conclusion, I gathered data from three sources, assessed and cleaned them, stored the cleaned dataset, and then analyzed and visualized it. I found that the most popular dog stage is Pupper, while the most popular dog name is 'A', followed by 'Charlie'. I also found that the number of tweets decreases over time while favorites keep increasing. Moreover, the source of the tweet, the year it was posted, and the developmental stage of the dog all affect a dog's rating. The attached supporting documents should be consulted for more of the insights.

