BStaff1986/Dataquest-Assignments

Certificate of Completion - Bryan Stafford

Project
Police Killings
Path: Data Analyst
Module: Intermediate Pandas and Python
Course: Data Analysis With Pandas: Intermediate
Date: July 20th, 2016
Description
In this guided project, a dataset describing citizens killed by police in 2015 is explored, with a focus on race and socioeconomic factors. US state census data is merged with this dataset to create a rate statistic describing how frequently citizens were killed by police in each state. Finally, the top and bottom 10 states, ranked by this rate, are compared by mean income and racial proportions.

    Libraries used:

  • pandas
  • matplotlib
  • numpy

Datasets
  • police_killings.csv
  • state_population.csv

Datasets supplied by FiveThirtyEight and the US Census Bureau
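
The merge and rate calculation described above could look roughly like the sketch below. The two small DataFrames are made-up stand-ins for police_killings.csv and state_population.csv, the column names are assumptions, and "per million residents" is an assumed scaling, since the description does not say how the rate was normalized.

```python
import pandas as pd

# Made-up stand-ins for police_killings.csv and state_population.csv.
# Column names ("state", "population") are assumptions, not the real headers.
killings = pd.DataFrame({
    "state": ["AA", "AA", "BB", "CC", "CC", "CC"],
    "name": ["p1", "p2", "p3", "p4", "p5", "p6"],
})
population = pd.DataFrame({
    "state": ["AA", "BB", "CC"],
    "population": [2_000_000, 4_000_000, 1_000_000],
})

# Count killings per state, then attach each state's population.
counts = killings.groupby("state").size().rename("killings").reset_index()
merged = counts.merge(population, on="state", how="left")

# Rate statistic (assumed scaling): killings per million residents.
merged["rate_per_million"] = merged["killings"] / merged["population"] * 1_000_000

# Rank states from highest to lowest rate.
ranked = merged.sort_values("rate_per_million", ascending=False)
print(ranked)
```

Taking `ranked.head(10)` and `ranked.tail(10)` would then give the two groups of states to compare.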

Project
Visualizing Pixar's Rollercoaster
Path: Data Analyst
Module: Intermediate Python And Pandas
Course: Exploratory Data Visualization
Date: July 22nd, 2016
Description
In this guided project, the financial and critical success of Pixar movies created between 1995 and 2015 is explored. Graphs created with pandas's plotting methods are used to compare reviews from various movie review websites. Another graph displays the domestic and international shares of each film's revenue.

Libraries used:

  • pandas
  • matplotlib
  • Seaborn

  • Datasets
    • PixarMovies.csv

    Dataset supplied by Paulo Vasconcellos
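
As an illustration of the revenue-share graph, the sketch below converts revenue figures to percentage shares and draws one stacked bar per film with pandas's plotting methods. The film names, figures, and column names are invented for the example and are not taken from PixarMovies.csv.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd

# Invented figures; the real PixarMovies.csv columns may be named differently.
films = pd.DataFrame(
    {"domestic": [200.0, 150.0, 300.0], "international": [200.0, 350.0, 100.0]},
    index=["Film A", "Film B", "Film C"],
)

# Convert each film's revenue into percentage shares that sum to 100.
shares = films.div(films.sum(axis=1), axis=0) * 100

# pandas plotting: one stacked bar per film.
ax = shares.plot(kind="bar", stacked=True)
ax.set_ylabel("Share of revenue (%)")
```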

    Project
    Custom Data Visualization
    Path: Data Analyst
    Module: Intermediate Pandas and Python
    Course: Exploratory Data Visualization
    Date: July 23rd, 2016
    Description
    The purpose of this guided project is to apply matplotlib's customization options. Using a dataset describing the employment outcomes and gender breakdown of recent graduates from 173 different majors, a pair of graphs is created. Code is used to add and rotate labels, constrain the range of the axes, and create a figure with 4 subplots.

    Libraries used:

  • pandas
  • matplotlib
  • Datasets
    • recent-grads.csv

    Dataset supplied by Dataquest.io
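
The customizations mentioned above (rotated labels, constrained axis ranges, a figure with 4 subplots) can be sketched as follows. The majors and values are placeholders rather than rows from recent-grads.csv, and the choice of chart for each subplot is an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Placeholder values, not taken from recent-grads.csv.
majors = ["Major A", "Major B", "Major C"]
share_women = [0.25, 0.60, 0.80]
unemployment_rate = [0.05, 0.08, 0.03]

# A single figure holding a 2x2 grid of subplots.
fig, axes = plt.subplots(2, 2, figsize=(10, 8))

# Bar chart with a constrained y-range and rotated x tick labels.
bars = axes[0, 0]
bars.bar(majors, share_women)
bars.set_ylim(0, 1)
bars.set_ylabel("Share of women")
bars.tick_params(axis="x", labelrotation=90)

# Scatter plot with a constrained x-range and axis labels.
scatter = axes[0, 1]
scatter.scatter(share_women, unemployment_rate)
scatter.set_xlim(0, 1)
scatter.set_xlabel("Share of women")
scatter.set_ylabel("Unemployment rate")

# Two more panels to fill the grid.
axes[1, 0].hist(share_women, bins=5, range=(0, 1))
axes[1, 0].set_title("Distribution of share of women")
axes[1, 1].barh(majors, unemployment_rate)
axes[1, 1].set_title("Unemployment rate by major")

fig.tight_layout()
```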

    Project
    Preparing Data For SQLite
    Path: Data Analyst
    Module: Working With Data Sources
    Course: SQL And Databases: Intermediate
    Date: August 13th, 2016
    Description

    This project is the first of a two-part SQL guided project. In part 1, a dataset of Academy Award winners is prepared and imported into a newly created SQL database.

    Libraries used:

  • pandas
  • sqlite3
  • Datasets
    • academy_awards.csv

    Dataset supplied by AggData
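
A minimal sketch of preparing a table with pandas and loading it into SQLite is shown below. The rows, the column names, and the specific cleaning steps are assumptions chosen for illustration; the actual academy_awards.csv layout may differ.

```python
import sqlite3

import pandas as pd

# Made-up rows standing in for academy_awards.csv; column names are assumptions.
awards = pd.DataFrame({
    "Year": ["2010 (83rd)", "2009 (82nd)"],
    "Category": ["Actor -- Leading Role", "Actress -- Leading Role"],
    "Nominee": ["Person A", "Person B"],
    "Won": ["YES", "NO"],
})

# Example cleaning: keep the 4-digit year and encode the Won flag as 0/1.
awards["Year"] = awards["Year"].str[:4].astype(int)
awards["Won"] = awards["Won"].map({"YES": 1, "NO": 0})

# Write the cleaned DataFrame into a new table in an in-memory database.
conn = sqlite3.connect(":memory:")
awards.to_sql("nominations", conn, index=False)

rows = conn.execute(
    "SELECT Year, Nominee, Won FROM nominations ORDER BY Year"
).fetchall()
print(rows)
conn.close()
```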

    Project
    Creating Relations In SQLite
    Path: Data Analyst
    Module: Working With Data Sources
    Course: SQL And Databases: Intermediate
    Date: August 28th, 2016
    Description
    In part 2 of the SQL guided project, a new SQL table is created to store information about Academy Award ceremonies (namely, who hosted the event) from 2000 to 2010. A one-to-many connection is made between the ceremonies and nominations tables by adding a foreign key column to the nominations table.
    Next, a many-to-many connection is made by creating actors and movies tables, which are then connected through a join table.

    Libraries used:

  • pandas
  • sqlite3
  • Datasets
    • academy_awards.csv

    Dataset supplied by AggData
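
The two kinds of relations described above can be sketched with sqlite3 alone. The table and column names below are assumptions, and the inserted rows are placeholders rather than real ceremony data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default

conn.executescript("""
-- One-to-many: each nomination points at exactly one ceremony.
CREATE TABLE ceremonies (id INTEGER PRIMARY KEY, year INTEGER, host TEXT);
CREATE TABLE nominations (
    id INTEGER PRIMARY KEY,
    nominee TEXT,
    ceremony_id INTEGER REFERENCES ceremonies(id)
);
-- Many-to-many: movies and actors are linked through a join table.
CREATE TABLE movies (id INTEGER PRIMARY KEY, movie TEXT);
CREATE TABLE actors (id INTEGER PRIMARY KEY, actor TEXT);
CREATE TABLE movies_actors (
    id INTEGER PRIMARY KEY,
    movie_id INTEGER REFERENCES movies(id),
    actor_id INTEGER REFERENCES actors(id)
);
""")

# Placeholder rows.
conn.execute("INSERT INTO ceremonies VALUES (1, 2010, 'Host A')")
conn.executemany(
    "INSERT INTO nominations (nominee, ceremony_id) VALUES (?, ?)",
    [("Nominee A", 1), ("Nominee B", 1)],
)
conn.execute("INSERT INTO movies VALUES (1, 'Movie A')")
conn.executemany("INSERT INTO actors VALUES (?, ?)", [(1, "Actor A"), (2, "Actor B")])
conn.executemany(
    "INSERT INTO movies_actors (movie_id, actor_id) VALUES (?, ?)",
    [(1, 1), (1, 2)],
)

# Follow the foreign key from nominations to ceremonies.
hosts = conn.execute("""
    SELECT n.nominee, c.host
    FROM nominations n JOIN ceremonies c ON n.ceremony_id = c.id
    ORDER BY n.nominee
""").fetchall()

# Resolve the many-to-many relation through the join table.
cast = conn.execute("""
    SELECT m.movie, a.actor
    FROM movies_actors ma
    JOIN movies m ON ma.movie_id = m.id
    JOIN actors a ON ma.actor_id = a.id
    ORDER BY a.actor
""").fetchall()
print(hosts, cast)
```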

    Project
    Investigating Airplane Accidents
    Path: Data Analyst
    Module: Advanced Python And Computer Science
    Course: Data Structures And Algorithms
    Date: September 20th, 2016
    Description
    In this guided project, a non-CSV dataset is imported and cleaned. A list of dictionaries is used to store the data rather than a pandas DataFrame. After the data is prepared, a pair of functions is written to perform a cursory exploration of the data.

    Libraries used:

  • collections.Counter
  • Datasets
    • AirplaneAccidents.txt

    Dataset supplied by the National Transportation Safety Board
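
The list-of-dictionaries approach might look like the sketch below. The pipe-delimited layout, the field names, and the two helper functions (`count_by_state`, `search`) are hypothetical; the project's own functions are not shown in this README.

```python
from collections import Counter

# Made-up pipe-delimited lines; the real AirplaneAccidents.txt layout may differ.
raw_lines = [
    "Event Id | Event Date | Location | Air Carrier",
    "E1 | 01/02/2015 | City A, TX | ",
    "E2 | 03/04/2015 | City B, CA | Carrier X",
    "E3 | 05/06/2015 | City C, TX | ",
]

# Clean the header, then build one dictionary per data line.
header = [name.strip() for name in raw_lines[0].split("|")]
records = [
    dict(zip(header, (field.strip() for field in line.split("|"))))
    for line in raw_lines[1:]
]


def count_by_state(rows):
    """Count records by the state abbreviation at the end of Location."""
    return Counter(row["Location"].split(",")[-1].strip() for row in rows)


def search(rows, term):
    """Return every record in which any field contains the search term."""
    return [row for row in rows if any(term in value for value in row.values())]


print(count_by_state(records).most_common())
print(search(records, "Carrier X"))
```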

    Project
    Analyzing Movie Reviews
    Path: Data Analyst
    Module: Probability And Statistics
    Course: Probability And Statistics In Python: Beginner
    Date: September 28th, 2016
    Description
    A dataset containing the review scores from Metacritic, IMDB, Rotten Tomatoes, and Fandango for 146 films is analyzed. The data is normalized and rounded to create a common scale for comparison. Correlation and linear regression values are calculated while exploring the relationship between Metacritic and Fandango scores.

    Libraries used:

  • pandas
  • matplotlib
  • numpy
  • scipy.stats
  • Dataset
    • fandango_score_comparison.csv

    Data provided by FiveThirtyEight from their article on Fandango movie review scores
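
A small sketch of the normalize, round, and regress steps follows. The scores are invented, the column names are assumptions, and rounding to the nearest half star is an assumed choice of common scale.

```python
import pandas as pd
from scipy.stats import linregress

# Invented scores; column names are assumptions.
reviews = pd.DataFrame({
    "Metacritic": [22, 37, 61, 79, 98],              # 0-100 scale
    "Fandango_Stars": [3.0, 3.5, 3.5, 4.5, 5.0],     # 0-5 stars
})

# Normalize Metacritic to a 0-5 scale, then round to the nearest half star.
reviews["Metacritic_norm"] = (reviews["Metacritic"] / 20 * 2).round() / 2

# Linear regression of Fandango stars on normalized Metacritic scores.
# fit.rvalue is the Pearson correlation coefficient between the two columns.
fit = linregress(reviews["Metacritic_norm"], reviews["Fandango_Stars"])
print(fit.slope, fit.intercept, fit.rvalue)

# Predict a Fandango rating for a film with a normalized Metacritic score of 3.0.
predicted = fit.slope * 3.0 + fit.intercept
```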

    Project
    Analyzing NYC High Schools
    Path: Data Analyst
    Module: Intermediate Pandas and Python
    Course: Data Cleaning
    Date: September 28th, 2016
    Description
    Datasets containing information about New York City schools, including class sizes, SAT scores, racial demographics, and survey results, are imported and cleaned. Correlations between SAT scores and all other numerical columns are calculated and visualized with a heatmap. Schools are grouped by district, their reported safety scores are averaged, and the averages are plotted on a map as color-coded dots. The skewness seen on the map is then visualized with a plot of a probability density function.

    Libraries used:

  • pandas
  • re
  • numpy
  • Seaborn
  • matplotlib
  • Basemap
  • Datasets
    • ap_2010.csv - Advanced Placement test data
    • class_size.csv
    • demographics.csv
    • graduation.csv
    • hs_directory.csv
    • sat_results.csv
    • survey_all.txt
    • survey_d75.txt

    Datasets supplied by NYC Department of Education
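
The correlation and per-district averaging steps can be sketched with pandas alone. The four rows and the column names below are assumptions for illustration, and the heatmap and map plotting are left out.

```python
import pandas as pd

# Made-up rows; column names are assumptions about the combined, cleaned dataset.
schools = pd.DataFrame({
    "school_dist": ["01", "01", "02", "02"],
    "sat_score": [1200.0, 1300.0, 1500.0, 1600.0],
    "avg_class_size": [30.0, 28.0, 24.0, 22.0],
    "safety_score": [6.0, 7.0, 8.0, 9.0],
})

# Correlation of every other numeric column with the SAT score.
numeric = schools.select_dtypes("number")
correlations = numeric.corr()["sat_score"].drop("sat_score")

# Average reported safety score per district.
district_safety = schools.groupby("school_dist")["safety_score"].mean()

print(correlations)
print(district_safety)
```

The `numeric.corr()` matrix is the kind of input a heatmap function would take, and `district_safety` holds one value per district to color-code on a map.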

    Project
    Predicting Bike Rentals
    Path: Data Scientist
    Module: Machine Learning
    Course: Decision Trees
    Date: September 29th, 2016
    Description
    Data from a bike sharing program is imported and briefly explored using correlations and a histogram. Some feature engineering is done to improve the data's suitability for machine learning models. Next, the dataset is used to train three different machine learning models. Their prediction errors are compared using root mean squared error (RMSE).

    Libraries used:

  • pandas
  • matplotlib
  • SKLearn
    • linear_model.LinearRegression
    • tree.DecisionTreeRegressor
    • ensemble.RandomForestRegressor
  • Dataset
    • bike_rental_hour.csv

    Capital Bikeshare data cleaned and combined by Dataquest.
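
The three-model comparison could be sketched as below. The data is synthetic rather than read from bike_rental_hour.csv, and the column names, the `time_label` feature, the 80/20 split, and the hyperparameters are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor

# Synthetic hourly data; column names ("hr", "temp", "cnt") are assumptions.
rng = np.random.default_rng(0)
hours = rng.integers(0, 24, size=400)
temp = rng.random(400)
rentals = pd.DataFrame({"hr": hours, "temp": temp})
rentals["cnt"] = (
    50 + 100 * np.sin(hours / 24 * np.pi) + 80 * temp + rng.normal(0, 10, 400)
)


def time_label(hour):
    """Hypothetical engineered feature: bucket the hour into a time of day."""
    if 6 <= hour < 12:
        return 1  # morning
    if 12 <= hour < 18:
        return 2  # afternoon
    if 18 <= hour < 24:
        return 3  # evening
    return 4      # night


rentals["time_label"] = rentals["hr"].apply(time_label)

# Hold out 20% of the rows for evaluation.
train = rentals.sample(frac=0.8, random_state=1)
test = rentals.drop(train.index)
features = ["hr", "temp", "time_label"]

models = {
    "linear_regression": LinearRegression(),
    "decision_tree": DecisionTreeRegressor(min_samples_leaf=5, random_state=1),
    "random_forest": RandomForestRegressor(
        n_estimators=50, min_samples_leaf=5, random_state=1
    ),
}

# Fit each model and record its root mean squared error on the held-out rows.
rmse = {}
for name, model in models.items():
    model.fit(train[features], train["cnt"])
    predictions = model.predict(test[features])
    rmse[name] = float(np.sqrt(mean_squared_error(test["cnt"], predictions)))

print(rmse)
```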