Udacity Project 4: Data Wrangling with Python and Jupyter Notebooks. I analysed Twitter data from the WeRateDogs account (@dog_rates). SQL and Excel were also used in this project.

burakgunbatan/UdacityProject---Data-Wrangling

UdacityProject---Data-Wrangling (Twitter Data)

Project 4 - Udacity Project: Data Wrangling with Python and Jupyter Notebooks

Description

In this project, I use Python, Jupyter notebooks and their libraries to gather data from a variety of sources and formats, assess its quality and tidiness, and then clean it. The dataset that is wrangled and analyzed is the tweet archive of Twitter user @dog_rates (WeRateDogs).

Steps:

Data is gathered from at least three different sources:

a) One dataset is provided to me and downloaded manually.
b) Another is downloaded programmatically from a URL: https://d17h27t6h515a5.cloudfront.net/topher/2017/August/599fd2ad_image-predictions/image-predictions.tsv
c) Additional data is pulled as JSON from Twitter's API. If the API or its server is unavailable, the same data can be obtained from the Udacity course materials.
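The programmatic parts of the gathering step (sources b and c) can be sketched roughly as follows. This is a minimal illustration that uses only the standard library, not the notebook's actual code. The helper names are hypothetical, and the tweet fields `id`, `retweet_count` and `favorite_count` are assumptions about which parts of each tweet's JSON are of interest. The JSON is assumed to be stored with one tweet object per line.

```python
import csv
import io
import json
import urllib.request

# URL quoted in step b) above.
IMAGE_PREDICTIONS_URL = (
    "https://d17h27t6h515a5.cloudfront.net/topher/2017/August/"
    "599fd2ad_image-predictions/image-predictions.tsv"
)


def download_text(url, timeout=30):
    """Fetch a URL and return its body decoded as UTF-8 (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_tsv(text):
    """Parse tab-separated text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def parse_tweet_json_lines(text, fields=("id", "retweet_count", "favorite_count")):
    """Parse one JSON tweet object per line, keeping only the chosen fields.

    The default field names are an assumption; missing fields become None.
    """
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines between records
        tweet = json.loads(line)
        rows.append({field: tweet.get(field) for field in fields})
    return rows
```

Calling `parse_tsv(download_text(IMAGE_PREDICTIONS_URL))` would download and parse the image predictions file. Each returned list of dictionaries can be passed to `pandas.DataFrame(...)` to get one DataFrame per source.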

  • Data is assessed, cleaned and tested.
  • Both visual and programmatic assessments are performed.
  • Quality issues and tidiness issues are detected and documented.
  • Steps to fix the issues are defined and carried out.
  • Tests are run to make sure the issues have been resolved.
  • Data is exported to a SQLite database and CSV files.
  • The data is then analyzed to produce at least three findings.
  • Findings are documented in a PDF file.
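The assess, clean, test and export steps above might look roughly like this in pandas. It is a sketch on a small made-up sample, not the notebook's actual code. The column names (`tweet_id`, `rating_numerator`, `retweeted_status_id`), the specific issues fixed, the table name and the `export` helper are all assumptions chosen for illustration.

```python
import sqlite3

import pandas as pd

# Made-up sample standing in for the real archive (assumed column names).
raw = pd.DataFrame(
    {
        "tweet_id": [1, 2, 2, 3],
        "rating_numerator": [13, 12, 12, 14],
        "retweeted_status_id": [None, None, None, 8.0e17],
    }
)

# Assess programmatically: count duplicated IDs and retweets.
n_duplicates = int(raw.duplicated(subset="tweet_id").sum())
n_retweets = int(raw["retweeted_status_id"].notna().sum())


def clean(df):
    """Return a cleaned copy: original tweets only, unique string IDs."""
    out = df[df["retweeted_status_id"].isna()].copy()  # drop retweets
    out = out.drop(columns=["retweeted_status_id"])    # column no longer needed
    out = out.drop_duplicates(subset="tweet_id")       # one row per tweet
    out["tweet_id"] = out["tweet_id"].astype(str)      # IDs are labels, not numbers
    return out.reset_index(drop=True)


cleaned = clean(raw)

# Test: confirm each documented issue is resolved.
assert cleaned["tweet_id"].is_unique
assert "retweeted_status_id" not in cleaned.columns


def export(df, csv_path, db_path, table="tweets"):
    """Store the cleaned data as a CSV file and as a SQLite table."""
    df.to_csv(csv_path, index=False)
    conn = sqlite3.connect(db_path)
    try:
        df.to_sql(table, conn, if_exists="replace", index=False)
        conn.commit()
    finally:
        conn.close()
```

Working on a copy inside `clean` leaves the raw DataFrame untouched, so the assessment can be rerun against the original data after each fix.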

Submission:

  • Project code is written and run within a Jupyter notebook.
  • The wrangling steps (gather, assess, clean) are clearly identified.
  • Data must be gathered from at least three different sources.
  • Each source is first imported into its own pandas DataFrame.
  • Cleaned datasets must be stored as CSV files or in a SQLite database.
  • The dataset must be analyzed to produce at least three separate insights.
  • The analysis contains at least one labeled visualization.
  • The wrangling phase is documented in a concise PDF file.
  • The findings are presented in a separate PDF report.

Important Files & Installation Notes

wrangling-act.ipynb is the Jupyter notebook that walks through each step of the wrangling process, with comments and markdown analysis of the assessment stages.

References and Citations

Tidy Data
Twitter API
Udacity Project Guide

Burak Gunbatan
