# Grand Circus Final Project
### Car Crash and Safety Data Comparisons/Evaluations

This project compares crash-test safety ratings against real-world fatal crash data. Fatal crash data is well suited to this comparison because it captures severe crashes in which occupants' lives were in danger, making each entry more relevant than fender benders or other minor 'traffic incidents'. The analysis could be useful to car buyers, car manufacturers, government testers, and insurance companies.

## Extraction
To start the ETL process, data must be extracted and placed into usable structures. To do this, we import the data from the API(s) and any other flat-file sources.
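
Every request in this notebook follows the same pattern: call an endpoint, parse the JSON, and read the `Results` key. A minimal sketch of a shared helper is shown here, assuming the hypothetical name `fetch_results` and an arbitrary 30-second timeout (neither is part of the original notebook). It raises on HTTP errors and on an empty payload rather than failing later with a confusing `KeyError`.

```python
import requests


def fetch_results(url, session=None, timeout=30):
    """Return the 'Results' payload of a JSON response, or raise if it is unusable.

    `session` is optional: pass a requests.Session to reuse connections,
    or any object with a compatible .get() method.
    """
    http = session or requests
    response = http.get(url, timeout=timeout)
    # Surface HTTP errors (4xx/5xx) instead of trying to parse an error page
    response.raise_for_status()
    payload = response.json()
    results = payload.get("Results")
    if not results:
        raise ValueError(f"No 'Results' found in response from {url}")
    return results
```

With this in place, a cell could call `fetch_results(make_url)[0]` in place of the separate `requests.get`, `.json()`, and drill-down steps.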

In [38]:
import pandas as pd
import requests
import json

# Begin pulling make names and IDs for internal use
# Definitions endpoint query
make_url = "https://crashviewer.nhtsa.dot.gov/CrashAPI/definitions/GetVariableAttributes?variable=make&caseYear=2022&format=json"

# Get response
response = requests.get(make_url)
# Turn response into json
data = response.json()

In [3]:
# Drill down into the JSON to get the list of dictionaries
results = data['Results'][0]

In [15]:
# split data into lists
id_list = []
name_list = []
for entry in results:
    id_list.append(entry['ID'])
    name_list.append(entry['TEXT'])

# Make columns dictionary based on lists
data = {'MakeID': id_list, 'Name': name_list}

# Create df using dictionary
manufacturer_df = pd.DataFrame(data)

# Sort alphabetically by make name
manufacturer_df = manufacturer_df.sort_values(by=['Name'])
manufacturer_df.head(20)

Unnamed: 0,MakeID,Name
2,3,AM General
0,54,Acura
1,31,Alfa Romeo
3,1,American Motors
4,32,Audi
5,33,Austin/Austin Healey
7,34,BMW
9,70,BSA
6,90,Bluebird
8,80,Brockway
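
The two-list loop above can also be collapsed into a single DataFrame construction followed by a rename. The sketch below assumes the records carry `ID` and `TEXT` keys, as the loop reads them; the three sample records are illustrative (their IDs are written as strings, which is an assumption), and `makes_to_frame` is a hypothetical helper name.

```python
import pandas as pd

# Illustrative records in the shape the loop above reads (keys 'ID' and 'TEXT')
sample_results = [
    {"ID": "54", "TEXT": "Acura"},
    {"ID": "31", "TEXT": "Alfa Romeo"},
    {"ID": "3", "TEXT": "AM General"},
]


def makes_to_frame(records):
    """Build the MakeID/Name frame directly from a list of dictionaries."""
    # Keep only the two keys of interest, then give them the notebook's column names
    frame = pd.DataFrame(records, columns=["ID", "TEXT"])
    return frame.rename(columns={"ID": "MakeID", "TEXT": "Name"})


manufacturer_sketch = makes_to_frame(sample_results).sort_values(by="Name")
print(manufacturer_sketch)
```

The default sort is case-sensitive, which is why "AM General" is listed ahead of "Acura" in the output above.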


### Make ID: Create dataframe that displays the make and model of cars from API

In [17]:
all_models = []
for make_ID in manufacturer_df['MakeID']:
    # Query the model definitions for this make
    model_url = f'https://crashviewer.nhtsa.dot.gov/CrashAPI/definitions/GetVariableAttributesForModel?variable=model&caseYear=2022&make={make_ID}&format=json'
    response = requests.get(model_url)
    model_data = response.json()

    # Fall back to an empty list so a missing 'Results' key does not break the loop
    results_model = model_data.get('Results') or []

    # Each entry in 'Results' is itself a list of model dictionaries
    for model in results_model:
        all_models.append({
            'MakeID': make_ID,
            'Models': model
        })
# Drill down into JSON
drill_down = all_models[0]['Models']
drill_down

[{'ID': 983, 'MODELNAME': 'Bus: Rear engine, Flat front', 'Make': None},
 {'ID': 401, 'MODELNAME': 'Dispatcher', 'Make': None},
 {'ID': 466, 'MODELNAME': 'Dispatcher', 'Make': None},
 {'ID': 402, 'MODELNAME': 'Hummer', 'Make': None},
 {'ID': 482, 'MODELNAME': 'Hummer', 'Make': None},
 {'ID': 431,
  'MODELNAME': 'Hummer (2004 on; see model 421 for 1993-2003)',
  'Make': None},
 {'ID': 481,
  'MODELNAME': 'Hummer (Pickup) (for SUV see model 421 for 1993-2003; see 431 for 2004 on)',
  'Make': None},
 {'ID': 421,
  'MODELNAME': 'Hummer (SUV from 1993-2003; see 431 for 2004 on) (for Pickup, see model 481)',
  'Make': None},
 {'ID': 884, 'MODELNAME': 'Medium/Heavy Truck', 'Make': None},
 {'ID': 441, 'MODELNAME': 'MV-1', 'Make': None},
 {'ID': 498, 'MODELNAME': 'Other (light truck)', 'Make': None},
 {'ID': 898, 'MODELNAME': 'Other (medium/heavy truck)', 'Make': None},
 {'ID': 998, 'MODELNAME': 'Other (vehicle)', 'Make': None},
 {'ID': 988, 'MODELNAME': 'Other(bus)', 'Make': None},
 {'ID': 999

In [19]:
models_df = pd.DataFrame(all_models).sort_values(by='MakeID')
models_df

Unnamed: 0,MakeID,Models
3,1,"[{'ID': 3, 'MODELNAME': 'Ambassador', 'Make': ..."
20,10,"[{'ID': 44, 'MODELNAME': 'Medallion', 'Make': ..."
24,12,"[{'ID': 441, 'MODELNAME': 'Aerostar', 'Make': ..."
45,13,"[{'ID': 401, 'MODELNAME': 'Aviator', 'Make': N..."
51,14,"[{'ID': 9, 'MODELNAME': 'Bobcat', 'Make': None..."
...,...,...
46,93,"[{'ID': 981, 'MODELNAME': 'Bus**: Conventional..."
75,94,"[{'ID': 981, 'MODELNAME': 'Bus**: Conventional..."
57,97,"[{'ID': 997, 'MODELNAME': 'Not Reported', 'Mak..."
61,98,"[{'ID': 701, 'MODELNAME': '0-50cc', 'Make': No..."
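
The row order above (1, 10, 12, 13, 14, ...) suggests that `MakeID` is being sorted as text rather than as a number. If numeric order is wanted, one option is to convert the column before sorting. A minimal sketch with made-up rows, assuming the IDs arrive as strings:

```python
import pandas as pd

# Made-up MakeID values stored as strings, as the sort order above suggests
models_sketch = pd.DataFrame({"MakeID": ["10", "1", "98", "3"]})

# Sorting strings compares character by character, so "10" lands before "3"
text_order = models_sketch.sort_values(by="MakeID")["MakeID"].tolist()

# Converting to a numeric dtype first gives the expected numeric order
models_sketch["MakeID"] = pd.to_numeric(models_sketch["MakeID"])
numeric_order = models_sketch.sort_values(by="MakeID")["MakeID"].tolist()

print(text_order)
print(numeric_order)
```

If the conversion is applied, it should be applied to both `manufacturer_df` and `models_df` so the merge keys keep matching dtypes.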


In [24]:
# Merge manufacturer_df & models_df
merged_df = pd.merge(manufacturer_df, models_df, on="MakeID", how="left")
merged_df = merged_df.sort_values(by='MakeID')

In [26]:
# Explode the Models column to separate rows
exploded_df = merged_df.explode('Models')
exploded_df

Unnamed: 0,MakeID,Name,Models
3,1,American Motors,"{'ID': 3, 'MODELNAME': 'Ambassador', 'Make': N..."
3,1,American Motors,"{'ID': 5, 'MODELNAME': 'AMX', 'Make': None}"
3,1,American Motors,"{'ID': 9, 'MODELNAME': 'Eagle', 'Make': None}"
3,1,American Motors,"{'ID': 10, 'MODELNAME': 'Eagle SX-4', 'Make': ..."
3,1,American Motors,"{'ID': 7, 'MODELNAME': 'Hornet/Concord', 'Make..."
...,...,...,...
78,99,Unknown Make,"{'ID': 499, 'MODELNAME': 'Unknown (light truck..."
78,99,Unknown Make,"{'ID': 598, 'MODELNAME': 'Unknown (LSG/NGV)', ..."
78,99,Unknown Make,"{'ID': 599, 'MODELNAME': 'Unknown (LSV/NGV)', ..."
78,99,Unknown Make,"{'ID': 709, 'MODELNAME': 'Unknown cc', 'Make':..."


In [28]:
# Extract ID and MODELNAME from the dictionaries in the Models column
exploded_df['ModelID'] = exploded_df['Models'].apply(lambda x: x['ID'] if isinstance(x, dict) else None)
exploded_df['MODELNAME'] = exploded_df['Models'].apply(lambda x: x['MODELNAME'] if isinstance(x, dict) else None)

In [30]:
# Drop the original Models column
df = exploded_df.drop(columns=['Models'])

In [36]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1738 entries, 3 to 78
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   MakeID     1738 non-null   object
 1   Name       1738 non-null   object
 2   ModelID    1738 non-null   int64 
 3   MODELNAME  1738 non-null   object
dtypes: int64(1), object(3)
memory usage: 67.9+ KB
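
As an alternative to the two `apply` calls, the dictionaries in `Models` can be expanded into columns in one step with `pd.json_normalize`. This is a sketch on a small stand-in frame (the names `merged_sketch` and `flat` are hypothetical, and the rows reuse a few values shown in the outputs above), not a replacement for the cells already run.

```python
import pandas as pd

# Small stand-in for merged_df: one row per make, with a list of model dictionaries
merged_sketch = pd.DataFrame({
    "MakeID": ["3", "1"],
    "Name": ["AM General", "American Motors"],
    "Models": [
        [{"ID": 401, "MODELNAME": "Dispatcher", "Make": None},
         {"ID": 402, "MODELNAME": "Hummer", "Make": None}],
        [{"ID": 3, "MODELNAME": "Ambassador", "Make": None},
         {"ID": 5, "MODELNAME": "AMX", "Make": None}],
    ],
})

# One row per model; drop makes that had no models so every cell is a dict
exploded = (
    merged_sketch.explode("Models", ignore_index=True)
    .dropna(subset=["Models"])
    .reset_index(drop=True)
)

# Expand each dictionary into its own columns, then keep the two we need
details = pd.json_normalize(exploded["Models"].tolist())
flat = pd.concat(
    [
        exploded[["MakeID", "Name"]],
        details[["ID", "MODELNAME"]].rename(columns={"ID": "ModelID"}),
    ],
    axis=1,
)
print(flat)
```

Resetting the index before the `concat` matters here: both frames must share the same 0..n-1 index for the columns to line up row by row.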


## Getting Crashes Per Year

In [None]:
# Need to add crash totals per model to above dataframe
# This will be done by simply tallying responses for each car
# Since the API has a max return limit, querying by each year (2010-onwards) will ensure all data is gathered, and allow for year grouping

# Base URL for NHTSA API
base_url = "https://crashviewer.nhtsa.dot.gov/CrashAPI/crashes/GetCrashesByVehicle"

# Placeholder: the body type code(s) to query are still unresolved (see TO SOLVE below)
BODYTYPE = None

# Since we need every model per case year, state, and model year, four nested loops are needed
# range(2011, 2022) covers case years 2011-2021; change the start to 2010 to match the note above
for year in range(2011, 2022):
    for car in range(len(df)):
        fatalities = 0
        # .iloc[car] returns a scalar; .iloc[[car]] would return a one-row Series
        make_id = df['MakeID'].iloc[car]
        model_id = df['ModelID'].iloc[car]
        for state in range(51):  # TODO: confirm which state codes the API expects
            for model_year in range(2011, 2022):
                # get crashes_by_vehicle for car
                # something like
                params = (
                    f"?make={make_id}&model={model_id}&modelyear={model_year}"
                    f"&bodyType={BODYTYPE}&fromCaseYear={year}&toCaseYear={year}&state={state}&format=json"
                )
                # get response(s)
                # check for success/fail
                # if success increment fatalities
                # sleep for a few seconds
                # end of model-year loop
            # end of state loop
        # Add year column for fatalities
        # end of car loop
    # end of year loop


# TO SOLVE
# Bodytype, need to know how many there are, or add bodytype column to previous dataframe from another query
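
The loop outlined above could be split into small pieces that are easier to test without hitting the API. The sketch below is one possible shape, with several assumptions: the function names are hypothetical, the response is assumed to use the same `{'Results': [[...]]}` nesting as the definitions endpoints (this has not been checked for `GetCrashesByVehicle`), and the fetch step is passed in as a function so that it can be swapped for a real request later.

```python
import time
from urllib.parse import urlencode

BASE_URL = "https://crashviewer.nhtsa.dot.gov/CrashAPI/crashes/GetCrashesByVehicle"


def build_crash_url(make_id, model_id, model_year, body_type, case_year, state):
    """Build one query URL for a single case year, using the parameters from the outline."""
    query = urlencode({
        "make": make_id,
        "model": model_id,
        "modelyear": model_year,
        "bodyType": body_type,
        "fromCaseYear": case_year,
        "toCaseYear": case_year,
        "state": state,
        "format": "json",
    })
    return f"{BASE_URL}?{query}"


def count_crashes(payload):
    """Count records in a payload. Assumes the {'Results': [[...]]} nesting (unverified)."""
    results = (payload or {}).get("Results") or [[]]
    return len(results[0])


def tally_crashes(fetch, make_id, model_id, body_type, case_year,
                  states, model_years, pause=0.0):
    """Sum record counts over states and model years, pausing between requests."""
    total = 0
    for state in states:
        for model_year in model_years:
            url = build_crash_url(make_id, model_id, model_year,
                                  body_type, case_year, state)
            total += count_crashes(fetch(url))
            time.sleep(pause)  # be polite to the API; 0 disables the pause
    return total
```

A real `fetch` could be as simple as `lambda url: requests.get(url).json()`, while a stub that returns a fixed dictionary is enough to check the tallying logic offline.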

## Transformation
Now that we have usable, workable data, we can begin cleaning and organizing.

In [None]:
# Transformation code

# Drop any unneeded columns/rows
    # duplicates
    # nulls
    # outliers

# Merge/Join Data into one dataframe
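
The cleaning steps listed above (duplicates, nulls, outliers) might look roughly like the following. This is only a sketch: the `Crashes` column, the helper name `clean_crash_counts`, and the 1.5 x IQR outlier rule are all assumptions, and whether extreme counts should be dropped at all is a judgment call for the analysis.

```python
import pandas as pd


def clean_crash_counts(frame, value_col="Crashes", iqr_factor=1.5):
    """Drop duplicate rows, rows missing the value, and values outside the IQR fences."""
    cleaned = frame.drop_duplicates().dropna(subset=[value_col])

    # Interquartile range fences: a common, simple rule for flagging outliers
    q1 = cleaned[value_col].quantile(0.25)
    q3 = cleaned[value_col].quantile(0.75)
    spread = q3 - q1
    in_range = cleaned[value_col].between(q1 - iqr_factor * spread,
                                          q3 + iqr_factor * spread)
    return cleaned[in_range].reset_index(drop=True)


# Made-up rows: one exact duplicate, one missing value, one extreme value
raw = pd.DataFrame({
    "ModelID": [1, 2, 3, 4, 4, 5, 6],
    "Crashes": [10, 12, 11, 13, 13, None, 500],
})
cleaned = clean_crash_counts(raw)
print(cleaned)
```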



## Analysis
With curated data, analysis can begin to check for trends, patterns, and correlations.

In [None]:
# Analysis stuff

# list stats
    # most common car with fatalities
    # that kinda stuff, idk

# can probably save visualizing for PowerBI

# more stuff I'm probably forgetting
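
For the "most common car with fatalities" statistic, a group-and-sum over the final dataframe would be one way to start. The sketch below uses made-up rows and a hypothetical `Crashes` column, since the crash totals have not been collected yet.

```python
import pandas as pd

# Made-up yearly crash counts per make/model (placeholder names and numbers)
crashes = pd.DataFrame({
    "Name": ["Make A", "Make A", "Make B", "Make B"],
    "MODELNAME": ["Model X", "Model X", "Model Y", "Model Y"],
    "Year": [2020, 2021, 2020, 2021],
    "Crashes": [5, 7, 3, 2],
})

# Total crashes per make/model across all years, largest first
totals = (
    crashes.groupby(["Name", "MODELNAME"], as_index=False)["Crashes"]
    .sum()
    .sort_values("Crashes", ascending=False)
)

# The first row is the make/model with the most recorded crashes
top = totals.iloc[0]
print(totals)
```

Raw totals favor models with more vehicles on the road, so a per-registration or per-sales rate would likely be needed before comparing against safety ratings.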