Project
Police Killings
Path | Module | Course | Date
Data Analyst | Intermediate Pandas and Python | Data Analysis With Pandas: Intermediate | July 20th, 2016
Description
In this guided project, a dataset containing information about citizens killed by police in 2015 is explored, with a focus on race and socioeconomic factors. US state census data is merged with this dataset to create a rate statistic describing how frequently citizens were killed by police in each state. Finally, the 10 states with the highest police killing rates and the 10 with the lowest are compared by mean income and racial proportions.
Datasets
Datasets supplied by FiveThirtyEight and the US Census Bureau
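The merge-and-rate step described for this project can be sketched with pandas as follows. This is a minimal illustration, not the project's actual code: the two tables are hypothetical toy data (placeholder state codes `AA`, `BB`, `CC` and invented populations), and the column names `state`, `population`, `killings` and `rate`, as well as the per-100,000 scaling, are assumptions.

```python
import pandas as pd

# Hypothetical toy data: one row per person killed, tagged with a placeholder state code.
killings = pd.DataFrame({"state": ["AA", "AA", "BB", "CC", "CC", "CC"]})

# Hypothetical census extract: one population estimate per state.
census = pd.DataFrame({
    "state": ["AA", "BB", "CC"],
    "population": [200_000, 50_000, 1_500_000],
})

# Count killings per state as a two-column DataFrame (state, killings).
counts = killings.groupby("state").size().reset_index(name="killings")

# Left-merge so states with no recorded killings are kept, then fill their count with 0.
merged = census.merge(counts, on="state", how="left")
merged["killings"] = merged["killings"].fillna(0)

# Rate per 100,000 residents (the scaling factor is an assumption for readability).
merged["rate"] = merged["killings"] * 100_000 / merged["population"]

# Rank states from highest to lowest rate, then take the top and bottom n.
ranked = merged.sort_values("rate", ascending=False).reset_index(drop=True)
n = 1  # the project compared 10 states at each end; n = 1 suits this 3-row toy table
top, bottom = ranked.head(n), ranked.tail(n)
```

A left merge keeps every census state even when it has no matching killings, which is why the count is filled with 0 before the rate is computed.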
Project
Visualizing Pixar's Rollercoaster
Path | Module | Course | Date
Data Analyst | Intermediate Python And Pandas | Exploratory Data Visualization | July 22nd, 2016
Description
In this guided project, the financial and critical success of Pixar movies released between 1995 and 2015 is explored. Graphs created with pandas plotting methods compare reviews from several movie review websites, and another graph displays the domestic and international shares of each film's revenue.
Libraries used:
Datasets
Dataset supplied by Paulo Vasconcellos
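The two kinds of pandas graphs described here can be sketched roughly as follows. The film names, review sites and dollar figures are hypothetical placeholders, not values from the Pixar dataset; the sketch only shows a grouped bar chart of review scores and a stacked bar chart of revenue shares built with `DataFrame.plot`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

films = ["Film 1", "Film 2", "Film 3"]  # hypothetical placeholder titles

# Hypothetical review scores (0-100) from two placeholder review sites.
reviews = pd.DataFrame({"Site A": [90, 80, 70], "Site B": [85, 75, 95]}, index=films)
ax = reviews.plot(kind="bar", ylim=(0, 100), title="Review scores by site")

# Hypothetical revenue figures; divide each row by its total to get percentage shares.
revenue = pd.DataFrame(
    {"Domestic": [300, 200, 150], "International": [100, 300, 150]}, index=films
)
shares = revenue.div(revenue.sum(axis=1), axis=0) * 100
ax2 = shares.plot(kind="bar", stacked=True, title="Share of revenue (%)")
ax2.set_ylabel("Percent of total revenue")
```

Stacking the percentage shares makes every bar reach 100, so the split between domestic and international revenue can be compared across films regardless of their total gross.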
Project
Custom Data Visualization
Path | Module | Course | Date
Data Analyst | Intermediate Pandas and Python | Exploratory Data Visualization | July 23rd, 2016
Description
The purpose of this guided project was to apply matplotlib customization options. Using a dataset describing the employment outcomes and gender breakdown of recent graduates from 173 different majors, a pair of graphs is created. The code adds and rotates labels, constrains the axis ranges, and creates a figure with four subplots.
Libraries used:
Datasets
Dataset supplied by Dataquest.io
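The customization steps listed for this project (adding and rotating labels, constraining axis ranges, and building a four-subplot figure) can be sketched with matplotlib as below. The majors and values are hypothetical placeholders rather than figures from the recent-graduates dataset, and the choice of what each subplot shows is an assumption made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical placeholder values, not figures from the recent-graduates dataset.
majors = ["Major A", "Major B", "Major C", "Major D"]
share_women = [0.2, 0.5, 0.7, 0.9]
unemployment_rate = [0.04, 0.06, 0.05, 0.08]

# A 2x2 grid yields one figure containing four subplots.
fig, axes = plt.subplots(2, 2, figsize=(10, 8))
ax_women, ax_unemp, ax_scatter, ax_hist = axes.flat

ax_women.bar(majors, share_women)
ax_women.set_ylim(0, 1)  # a proportion can only fall between 0 and 1
ax_women.set_ylabel("Share of women")

ax_unemp.bar(majors, unemployment_rate)
ax_unemp.set_ylabel("Unemployment rate")

# Rotate the category labels so long major names do not overlap.
for ax in (ax_women, ax_unemp):
    ax.tick_params(axis="x", labelrotation=90)

ax_scatter.scatter(share_women, unemployment_rate)
ax_scatter.set_xlim(0, 1)
ax_scatter.set_xlabel("Share of women")
ax_scatter.set_ylabel("Unemployment rate")

ax_hist.hist(unemployment_rate, bins=4)
ax_hist.set_xlabel("Unemployment rate")

fig.tight_layout()  # shrink spacing so rotated labels do not collide with other panels
```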
Project
Preparing Data For SQLite
Path | Module | Course | Date
Data Analyst | Working With Data Sources | SQL And Databases: Intermediate | August 13th, 2016
Description
This project is the first part of a two-part guided SQL project. In part 1, a dataset of Academy Award winners is prepared and imported into a newly created SQLite database.
Libraries used:
Datasets
Dataset supplied by AggData
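A prepare-and-import step of this kind can be sketched with Python's standard `csv` and `sqlite3` modules (the project itself may have used pandas instead). The CSV excerpt, its column names, the table name `nominations` and the YES/NO-to-integer conversion are all hypothetical assumptions chosen to illustrate typical cleaning before an import.

```python
import csv
import io
import sqlite3

# Hypothetical CSV excerpt; the header and values are assumptions, not the real file.
RAW_CSV = """Year,Category,Nominee,Won
2010,Actor -- Leading Role,Nominee A,YES
2010,Actor -- Leading Role,Nominee B,NO
2009,Actress -- Leading Role,Nominee C,NO
"""

def clean_row(row):
    """Convert one CSV row (all strings) into typed values for SQLite."""
    return (
        int(row["Year"]),
        row["Category"].strip(),
        row["Nominee"].strip(),
        1 if row["Won"].strip().upper() == "YES" else 0,  # SQLite has no boolean type
    )

rows = [clean_row(r) for r in csv.DictReader(io.StringIO(RAW_CSV))]

# ":memory:" keeps the sketch self-contained; pass a file path to persist the database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nominations (year INTEGER, category TEXT, nominee TEXT, won INTEGER)"
)
conn.executemany("INSERT INTO nominations VALUES (?, ?, ?, ?)", rows)
conn.commit()

winners = conn.execute("SELECT nominee FROM nominations WHERE won = 1").fetchall()
```

Using `?` placeholders with `executemany` lets `sqlite3` handle quoting, so nominee names containing apostrophes do not break the inserts.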
Project
Creating Relations In SQLite
Path | Module | Course | Date
Data Analyst | Working With Data Sources | SQL And Databases: Intermediate | August 28th, 2016
Description
In part 2 of the SQL guided project, a new table is created to store information about Academy Award ceremonies from 2000 to 2010 (namely, who hosted each event). A one-to-many relationship between the ceremonies and nominations tables is established by adding a foreign key column to the nominations table. Next, a many-to-many relationship is created with actors and movies tables, which are connected through a join table.
Libraries used:
Datasets
Dataset supplied by AggData
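Both relationship types described for part 2 can be sketched with the standard `sqlite3` module. The table names (`ceremonies`, `nominations`, `movies`, `actors`, `movies_actors`), column names, hosts, actors and movies are hypothetical placeholders, and the sketch declares the foreign key column when the table is created rather than reproducing the project's exact schema changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces REFERENCES when this is on

conn.executescript("""
CREATE TABLE ceremonies (id INTEGER PRIMARY KEY, year INTEGER, host TEXT);

-- One-to-many: each nomination points at exactly one ceremony.
CREATE TABLE nominations (
    id INTEGER PRIMARY KEY,
    nominee TEXT,
    ceremony_id INTEGER REFERENCES ceremonies(id)
);

-- Many-to-many: movies and actors are linked through a join table.
CREATE TABLE movies (id INTEGER PRIMARY KEY, movie TEXT);
CREATE TABLE actors (id INTEGER PRIMARY KEY, actor TEXT);
CREATE TABLE movies_actors (
    id INTEGER PRIMARY KEY,
    movie_id INTEGER REFERENCES movies(id),
    actor_id INTEGER REFERENCES actors(id)
);
""")

conn.executemany("INSERT INTO ceremonies (id, year, host) VALUES (?, ?, ?)",
                 [(1, 2009, "Host X"), (2, 2010, "Host Y")])
conn.executemany("INSERT INTO nominations (nominee, ceremony_id) VALUES (?, ?)",
                 [("Nominee A", 2), ("Nominee B", 2), ("Nominee C", 1)])
conn.executemany("INSERT INTO movies (id, movie) VALUES (?, ?)",
                 [(1, "Movie 1"), (2, "Movie 2")])
conn.executemany("INSERT INTO actors (id, actor) VALUES (?, ?)",
                 [(1, "Actor 1"), (2, "Actor 2")])
# Actor 1 appears in both movies; Actor 2 appears only in Movie 1.
conn.executemany("INSERT INTO movies_actors (movie_id, actor_id) VALUES (?, ?)",
                 [(1, 1), (2, 1), (1, 2)])
conn.commit()

# Follow the one-to-many link from nominations to the ceremony that hosted them.
nominees_2010 = conn.execute("""
    SELECT nominations.nominee, ceremonies.host
    FROM nominations JOIN ceremonies ON nominations.ceremony_id = ceremonies.id
    WHERE ceremonies.year = 2010
    ORDER BY nominations.nominee
""").fetchall()

# Resolve the many-to-many link through the join table.
cast = conn.execute("""
    SELECT actors.actor
    FROM movies_actors
    JOIN actors ON actors.id = movies_actors.actor_id
    JOIN movies ON movies.id = movies_actors.movie_id
    WHERE movies.movie = ?
    ORDER BY actors.actor
""", ("Movie 1",)).fetchall()
```

The join table stores only pairs of IDs, so an actor can be linked to any number of movies (and vice versa) without duplicating rows in either the `actors` or `movies` table.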
Project
Investigating Airplane Accidents
Path | Module | Course | Date
Data Analyst | Advanced Python And Computer Science | Data Structures And Algorithms | September 20th, 2016
Description
In this guided project, a non-CSV dataset is imported and cleaned. The data is stored as a list of dictionaries rather than a pandas DataFrame. Once the data is prepared, two functions are written to perform a cursory exploration of it.
Libraries used:
Datasets
Datasets supplied by the National Transportation Safety Board
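Parsing a delimited text file into a list of dictionaries and exploring it with a pair of small functions might look like the sketch below. The pipe delimiter, the field names (`Event Id`, `Location`, `Total Fatal Injuries`, and so on) and the two exploration functions are assumptions for illustration; the actual file format and the project's functions are not documented here.

```python
# Hypothetical excerpt of a pipe-delimited text file (format assumed for illustration).
RAW_TEXT = """Event Id | Event Date | Location | Total Fatal Injuries
A1 | 01/15/2015 | Town A, AA | 2
A2 | 03/02/2015 | Town B, BB |
A3 | 07/04/2014 | Town C, AA | 0
"""

def parse_records(text):
    """Turn delimited text into a list of dictionaries, one per non-empty row."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = [field.strip() for field in lines[0].split("|")]
    records = []
    for line in lines[1:]:
        values = [field.strip() for field in line.split("|")]
        records.append(dict(zip(header, values)))
    return records

def accidents_by_state(records):
    """Count records per state, taking the state as the text after the last comma."""
    counts = {}
    for record in records:
        state = record["Location"].rsplit(",", 1)[-1].strip()
        counts[state] = counts.get(state, 0) + 1
    return counts

def total_fatalities(records):
    """Sum fatalities, treating blank fields as zero."""
    return sum(int(record["Total Fatal Injuries"] or 0) for record in records)

records = parse_records(RAW_TEXT)
```

A list of dictionaries keeps each row addressable by field name without pandas, at the cost of doing type conversion (such as the blank-to-zero handling) by hand.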
Project
Analyzing Movie Reviews
Path | Module | Course | Date
Data Analyst | Probability And Statistics | Probability And Statistics In Python: Beginner | September 28th, 2016
Description
A dataset containing review scores from Metacritic, IMDB, Rotten Tomatoes, and Fandango for 146 films is analyzed. The scores are normalized and rounded onto a common scale for comparison. Correlation coefficients and linear regression parameters are then calculated to explore the relationship between Metacritic and Fandango scores.
Libraries used:
Dataset
Data provided by FiveThirtyEight from their article on Fandango movie review scores
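The normalize, round, correlate and regress sequence described above can be sketched in plain Python. The assumptions here are that scores on a 0-100 scale are mapped to a 0-5 star scale and rounded to the nearest half star; the score lists are hypothetical toy values, and the hand-written Pearson and least-squares functions stand in for whatever library calls the project used.

```python
import math

def normalize_and_round(score, scale_max=100, target_max=5):
    """Rescale a score to 0..target_max and round it to the nearest 0.5."""
    normalized = score * target_max / scale_max
    # floor(x + 0.5) rounds halves up; the built-in round() rounds halves to even.
    return math.floor(normalized * 2 + 0.5) / 2

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def linear_fit(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    return slope, mean_y - slope * mean_x

# Hypothetical toy scores, not values from the FiveThirtyEight dataset.
metacritic_raw = [60, 73, 78, 90]
fandango_stars = [3.5, 4.0, 4.5, 5.0]

metacritic_stars = [normalize_and_round(s) for s in metacritic_raw]
r = pearson_r(metacritic_stars, fandango_stars)
slope, intercept = linear_fit(metacritic_stars, fandango_stars)
```

The explicit half-up rounding matters because Python's `round(2.5)` returns 2, which would silently shift some half-star values down.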
Project
Analyzing NYC High Schools
Path | Module | Course | Date
Data Analyst | Intermediate Pandas and Python | Data Cleaning | September 28th, 2016
Description
Datasets containing information about New York City schools, including class sizes, SAT scores, racial demographics, and survey results, are imported and cleaned. Correlations between SAT scores and all other numeric columns are calculated and visualized with a heatmap. Schools are grouped by district, their reported safety scores are averaged, and the results are plotted on a map as color-coded dots. The skew visible on the map is then examined with a plot of a probability density function.
Libraries used:
Datasets
Datasets supplied by NYC Department of Education
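The correlation and district-averaging steps can be sketched with pandas as below. The table is hypothetical toy data, and the column names `sat_score`, `safety_score`, `class_size` and `district` are assumptions rather than the NYC Department of Education's field names; the heatmap and map plotting are omitted.

```python
import pandas as pd

# Hypothetical toy table; the column names are assumptions, not the real dataset's fields.
schools = pd.DataFrame({
    "school": ["S1", "S2", "S3", "S4"],
    "district": ["01", "01", "02", "02"],  # stored as text so it is not treated as numeric
    "sat_score": [1000, 1200, 1400, 1100],
    "safety_score": [6.0, 7.0, 8.0, 6.5],
    "class_size": [30, 28, 22, 25],
})

# Correlate SAT scores with every other numeric column (text columns are excluded first).
numeric = schools.select_dtypes(include="number")
sat_correlations = numeric.corr()["sat_score"].drop("sat_score")

# Average the safety score within each district before plotting districts on a map.
district_safety = schools.groupby("district")["safety_score"].mean()
```

Selecting numeric columns explicitly keeps `corr()` from tripping over text columns, and storing district codes as strings prevents them from appearing as a meaningless "correlation with district".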
Project
Predicting Bike Rentals
Path | Module | Course | Date
Data Scientist | Machine Learning | Decision Trees | September 29th, 2016
Description
Data from a bike-sharing program is imported and briefly explored using correlations and a histogram. Some feature engineering is done to make the data better suited to machine learning models. The dataset is then used to train three different machine learning models, whose accuracy is compared using root mean squared error (RMSE).
Libraries used:
Dataset
Capital Bikeshare data cleaned and combined by Dataquest.
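A feature-engineering helper and the RMSE comparison described above might be sketched in plain Python as follows. The hour buckets in `time_label`, the model names and every number are hypothetical; the prediction lists stand in for the output of the three trained models, whose exact types this page does not specify.

```python
import math

def time_label(hour):
    """Bucket an hour (0-23) into a coarse time-of-day code (hypothetical cut points)."""
    if 6 <= hour < 12:
        return 1  # morning
    if 12 <= hour < 18:
        return 2  # afternoon
    if 18 <= hour < 24:
        return 3  # evening
    return 4  # night (hours 0-5)

def rmse(actual, predicted):
    """Root mean squared error: square root of the mean squared residual."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical rental counts and predictions from three placeholder models.
actual = [10, 20, 30, 40]
predictions = {
    "model_a": [12, 18, 33, 37],
    "model_b": [10, 25, 25, 40],
    "model_c": [20, 20, 20, 20],
}

scores = {name: rmse(actual, preds) for name, preds in predictions.items()}
best = min(scores, key=scores.get)  # lower RMSE means smaller typical error
```

Because RMSE squares each residual before averaging, a model with a few large misses (like `model_b` here) scores worse than one with several small misses, even if both have the same total absolute error.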