# COGS 108 - Data Checkpoint

# Names

- Dominic Chua
- Hulk
- Iron Man
- Thor
- Wasp

<a id='research_question'></a>
# Research Question

*Over the past 10 years, is recent precipitation (within the 3 days prior) correlated with the risk of car accidents in California, and does that relationship differ between Northern and Southern California?*
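To make the "3 days prior" window concrete, here is a minimal sketch of how such a feature could be computed. It is an illustration rather than our final method: the column names `DATE` and `PRCP`, the helper `add_prior_precip`, and the sample values are all assumptions, and the row-based window assumes one row per consecutive day with no gaps.

```python
import pandas as pd

# Hypothetical daily precipitation for one city (made-up values).
daily = pd.DataFrame(
    {
        "DATE": pd.date_range("2021-01-01", periods=6, freq="D"),
        "PRCP": [0.0, 0.5, 0.0, 0.0, 0.0, 0.2],
    }
)


def add_prior_precip(df, days=3, date_col="DATE", prcp_col="PRCP"):
    """Add total precipitation over the `days` days before each date.

    Assumes one row per consecutive calendar day (no missing dates).
    """
    out = df.sort_values(date_col).reset_index(drop=True)
    # shift(1) drops the current day, so the window only covers earlier days.
    out[f"PRCP_PRIOR_{days}D"] = (
        out[prcp_col].shift(1).rolling(window=days, min_periods=1).sum()
    )
    return out


result = add_prior_precip(daily)
print(result)
```

The first date has no earlier days, so its value is left missing rather than set to zero.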

# Dataset(s)

- - -

1) [Climate Data Online: Dataset Discovery](https://www.ncdc.noaa.gov/cdo-web/datasets#GHCND)

- The National Oceanic and Atmospheric Administration (NOAA) is a United States agency that maintains publicly available historical and real-time weather data. We intend to collect 10 years' worth of precipitation data, from 2011 through 2021, for six cities: three in the northern half of California and three in the southern half.


- - -

2) [Statewide Integrated Traffic Records System](https://iswitrs.chp.ca.gov/Reports/jsp/CollisionReports.jsp)

- The Statewide Integrated Traffic Records System (SWITRS) is a database that collects and processes data gathered from collision scenes. The Internet SWITRS application lets California Highway Patrol (CHP) staff, members of its Allied Agencies, researchers, and members of the public request statistical reports in electronic format. Users can build custom reports based on categories including, but not limited to, locations, dates, and collision types.


- - -

3) [Precipitation Car Crash Datasets](https://github.com/DeusSeos/Precipitation-Car-Crash-Datasets)

- Aggregated car crash and precipitation data for Bakersfield, Palm Springs, Sacramento, San Diego, San Jose, and Stockton, covering 2011 through 2021. The datasets are hosted in a public repository.


- - -

We will use pandas to concatenate the precipitation data and car crash data into a 10-year dataframe per city. We will then clean the data so that we can create concise visualizations.
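One way the per-city combination could look is sketched below. It uses a merge on date rather than a plain concatenation, since the two sources have different shapes. The helper `build_daily_frame`, the weather columns `DATE` and `PRCP`, the `YYYYMMDD` format for `COLLISION_DATE`, and the sample values are assumptions for illustration, not confirmed properties of our files.

```python
import pandas as pd

# Hypothetical stand-ins for one city's data (made-up values).
weather = pd.DataFrame(
    {"DATE": ["2021-01-01", "2021-01-02", "2021-01-03"], "PRCP": [0.0, 0.4, 0.0]}
)
collisions = pd.DataFrame(
    {"CASE_ID": [1, 2, 3], "COLLISION_DATE": [20210102, 20210102, 20210103]}
)


def build_daily_frame(weather, collisions):
    """Return one row per weather date with precipitation and collision count."""
    w = weather.copy()
    w["DATE"] = pd.to_datetime(w["DATE"])

    c = collisions.copy()
    # Assumption: COLLISION_DATE is stored as YYYYMMDD.
    c["DATE"] = pd.to_datetime(c["COLLISION_DATE"].astype(str), format="%Y%m%d")

    # Count collisions per day, then attach the counts to the weather rows.
    counts = c.groupby("DATE").size().rename("COLLISIONS").reset_index()
    merged = w.merge(counts, on="DATE", how="left")

    # Days with no recorded collision get a count of 0 instead of NaN.
    merged["COLLISIONS"] = merged["COLLISIONS"].fillna(0).astype(int)
    return merged


daily = build_daily_frame(weather, collisions)
print(daily)
```

A left merge keeps every weather date, so dry days without collisions still appear in the result.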

- - -



# Setup

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Data Cleaning

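We load each city's precipitation data from two CSV files (2011-2015 and 2016-2021) and concatenate them into a single weather dataframe. Each city's collision records are read from a text file, with column names supplied through `collision_headers`.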

In [4]:
collision_headers = ["CASE_ID","ACCIDENT_YEAR","PROC_DATE","JURIS","COLLISION_DATE","COLLISION_TIME","OFFICER_ID","REPORTING_DISTRICT","DAY_OF_WEEK","CHP_SHIFT","POPULATION","CNTY_CITY_LOC","SPECIAL_COND","BEAT_TYPE","CHP_BEAT_TYPE","CITY_DIVISION_LAPD","CHP_BEAT_CLASS","BEAT_NUMBER","PRIMARY_RD","SECONDARY_RD","DISTANCE","DIRECTION","INTERSECTION","WEATHER_1","WEATHER_2","STATE_HWY_IND","CALTRANS_COUNTY","CALTRANS_DISTRICT","STATE_ROUTE","ROUTE_SUFFIX","POSTMILE_PREFIX","POSTMILE","LOCATION_TYPE","RAMP_INTERSECTION","SIDE_OF_HWY","TOW_AWAY","COLLISION_SEVERITY","NUMBER_KILLED","NUMBER_INJURED","PARTY_COUNT","PRIMARY_COLL_FACTOR","PCF_CODE_OF_VIOL","PCF_VIOL_CATEGORY","PCF_VIOLATION","PCF_VIOL_SUBSECTION","HIT_AND_RUN","TYPE_OF_COLLISION","MVIW","PED_ACTION","ROAD_SURFACE","ROAD_COND_1","ROAD_COND_2","LIGHTING","CONTROL_DEVICE","CHP_ROAD_TYPE","PEDESTRIAN_ACCIDENT","BICYCLE_ACCIDENT","MOTORCYCLE_ACCIDENT","TRUCK_ACCIDENT","NOT_PRIVATE_PROPERTY","ALCOHOL_INVOLVED","STWD_VEHTYPE_AT_FAULT","CHP_VEHTYPE_AT_FAULT","COUNT_SEVERE_INJ","COUNT_VISIBLE_INJ","COUNT_COMPLAINT_PAIN","COUNT_PED_KILLED","COUNT_PED_INJURED","COUNT_BICYCLIST_KILLED","COUNT_BICYCLIST_INJURED","COUNT_MC_KILLED","COUNT_MC_INJURED","PRIMARY_RAMP","SECONDARY_RAMP"]
# Weather data comes in two files per city (2011-2015 and 2016-2021).
# ignore_index=True gives each combined frame a fresh 0..n-1 index
# instead of repeating the row labels of the two source files.

# Bakersfield
BF_2011_weather = pd.read_csv("https://raw.githubusercontent.com/DeusSeos/Precipitation-Car-Crash-Datasets/main/Bakersfield/2011-2015-BF.csv")
BF_2016_weather = pd.read_csv("https://raw.githubusercontent.com/DeusSeos/Precipitation-Car-Crash-Datasets/main/Bakersfield/2016-2021-BF.csv")
BF_weather = pd.concat([BF_2011_weather, BF_2016_weather], ignore_index=True)
BF_collision = pd.read_csv("https://raw.githubusercontent.com/DeusSeos/Precipitation-Car-Crash-Datasets/main/Bakersfield/2011-2021CollisionRecords.txt", names=collision_headers)

# Palm Springs
PS_2011_weather = pd.read_csv("https://raw.githubusercontent.com/DeusSeos/Precipitation-Car-Crash-Datasets/main/Palm-Springs/2011-2015-BF.csv")
PS_2016_weather = pd.read_csv("https://raw.githubusercontent.com/DeusSeos/Precipitation-Car-Crash-Datasets/main/Palm-Springs/2016-2021-BF.csv")
PS_weather = pd.concat([PS_2011_weather, PS_2016_weather], ignore_index=True)
PS_collision = pd.read_csv("https://raw.githubusercontent.com/DeusSeos/Precipitation-Car-Crash-Datasets/main/Palm-Springs/2011-2021CollisionRecords.txt", names=collision_headers)

# Sacramento
SAC_2011_weather = pd.read_csv("https://raw.githubusercontent.com/DeusSeos/Precipitation-Car-Crash-Datasets/main/Sacramento/2011-2015-BF.csv")
SAC_2016_weather = pd.read_csv("https://raw.githubusercontent.com/DeusSeos/Precipitation-Car-Crash-Datasets/main/Sacramento/2016-2021-BF.csv")
SAC_weather = pd.concat([SAC_2011_weather, SAC_2016_weather], ignore_index=True)
SAC_collision = pd.read_csv("https://raw.githubusercontent.com/DeusSeos/Precipitation-Car-Crash-Datasets/main/Sacramento/2011-2021CollisionRecords.txt", names=collision_headers)

# San Diego
SD_2011_weather = pd.read_csv("https://raw.githubusercontent.com/DeusSeos/Precipitation-Car-Crash-Datasets/main/San-Diego/2011-2015-BF.csv")
SD_2016_weather = pd.read_csv("https://raw.githubusercontent.com/DeusSeos/Precipitation-Car-Crash-Datasets/main/San-Diego/2016-2021-BF.csv")
SD_weather = pd.concat([SD_2011_weather, SD_2016_weather], ignore_index=True)
SD_collision = pd.read_csv("https://raw.githubusercontent.com/DeusSeos/Precipitation-Car-Crash-Datasets/main/San-Diego/2011-2021CollisionRecords.txt", names=collision_headers)

# San Jose
SJ_2011_weather = pd.read_csv("https://raw.githubusercontent.com/DeusSeos/Precipitation-Car-Crash-Datasets/main/San-Jose/2011-2015-BF.csv")
SJ_2016_weather = pd.read_csv("https://raw.githubusercontent.com/DeusSeos/Precipitation-Car-Crash-Datasets/main/San-Jose/2016-2021-BF.csv")
SJ_weather = pd.concat([SJ_2011_weather, SJ_2016_weather], ignore_index=True)
SJ_collision = pd.read_csv("https://raw.githubusercontent.com/DeusSeos/Precipitation-Car-Crash-Datasets/main/San-Jose/2011-2021CollisionRecords.txt", names=collision_headers)

# Stockton
ST_2011_weather = pd.read_csv("https://raw.githubusercontent.com/DeusSeos/Precipitation-Car-Crash-Datasets/main/Stockton/2011-2015-BF.csv")
ST_2016_weather = pd.read_csv("https://raw.githubusercontent.com/DeusSeos/Precipitation-Car-Crash-Datasets/main/Stockton/2016-2021-BF.csv")
ST_weather = pd.concat([ST_2011_weather, ST_2016_weather], ignore_index=True)
ST_collision = pd.read_csv("https://raw.githubusercontent.com/DeusSeos/Precipitation-Car-Crash-Datasets/main/Stockton/2011-2021CollisionRecords.txt", names=collision_headers)




0        A
1        A
2        A
3        A
4        A
        ..
31966    A
31967    C
31968    A
31969    A
31970    A
Name: WEATHER_1, Length: 31971, dtype: object
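The output above shows that `WEATHER_1` holds single-letter codes, whose meanings are defined in the SWITRS documentation. As a possible next cleaning step, a quick tally of how often each code appears could look like the sketch below. The sample values are made up for illustration and do not reflect the real distribution.

```python
import pandas as pd

# Hypothetical sample of WEATHER_1 codes (made-up values, not the real data).
sample = pd.DataFrame({"WEATHER_1": ["A", "A", "C", "A", "B", "C", "-"]})

# Absolute counts per code; dropna=False keeps missing values visible.
counts = sample["WEATHER_1"].value_counts(dropna=False)

# Share of all records per code, rounded for readability.
share = sample["WEATHER_1"].value_counts(normalize=True, dropna=False).round(3)

summary = pd.DataFrame({"count": counts, "share": share})
print(summary)
```

Running the same tally on a real frame such as `BF_collision` would show whether unexpected or missing codes need handling before analysis.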