# Exploratory Data Analysis - Car Crashes Dataset
## Project Summary:
- **Author:** Derek Graves
- **Date:** 18 Feb 2024
- **Purpose:** This project demonstrates data inspection, analysis, and transformation techniques on the 'Car Crashes' dataset loaded through seaborn. The dataset provides a glimpse into factors surrounding road accidents, such as the number of crashes and the presence of alcohol-related crashes. We use it to explore patterns, identify trends, and gain insight into the factors that contribute to road accidents. Throughout the notebook, we use visualization techniques from several Python libraries to uncover relationships in the data and tell the story behind the statistics.

## Environment Setup
This section covers project setup: creating a virtual environment and installing the required packages. See the project README for detailed step-by-step instructions.

### Import Dependencies

In [7]:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

## Exploratory Data Analysis

### Step 1: Data Acquisition 
Load the Car Crashes dataset into a pandas DataFrame and inspect the first few rows.

In [8]:
# Load the Car Crashes dataset into a DataFrame
df = sns.load_dataset('car_crashes')

# Inspect first rows of the DataFrame
print(df.head())

   total  speeding  alcohol  not_distracted  no_previous  ins_premium  \
0   18.8     7.332    5.640          18.048       15.040       784.55   
1   18.1     7.421    4.525          16.290       17.014      1053.48   
2   18.6     6.510    5.208          15.624       17.856       899.47   
3   22.4     4.032    5.824          21.056       21.280       827.34   
4   12.0     4.200    3.360          10.920       10.680       878.41   

   ins_losses abbrev  
0      145.08     AL  
1      133.93     AK  
2      110.35     AZ  
3      142.39     AR  
4      165.63     CA  
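
Before going further, it can help to confirm that the loaded frame has the columns the rest of the notebook relies on. The sketch below is only illustrative: it rebuilds the five rows printed above as a stand-in for `df`, so it runs without downloading anything, and the helper name `find_missing_columns` is hypothetical rather than part of pandas or seaborn.

```python
import pandas as pd

# Stand-in for df: the five rows shown in the head() output above.
sample = pd.DataFrame({
    "total": [18.8, 18.1, 18.6, 22.4, 12.0],
    "speeding": [7.332, 7.421, 6.510, 4.032, 4.200],
    "alcohol": [5.640, 4.525, 5.208, 5.824, 3.360],
    "not_distracted": [18.048, 16.290, 15.624, 21.056, 10.920],
    "no_previous": [15.040, 17.014, 17.856, 21.280, 10.680],
    "ins_premium": [784.55, 1053.48, 899.47, 827.34, 878.41],
    "ins_losses": [145.08, 133.93, 110.35, 142.39, 165.63],
    "abbrev": ["AL", "AK", "AZ", "AR", "CA"],
})

# Column names taken from the printed output above.
EXPECTED_COLUMNS = [
    "total", "speeding", "alcohol", "not_distracted",
    "no_previous", "ins_premium", "ins_losses", "abbrev",
]


def find_missing_columns(frame, expected):
    """Return the expected column names that are absent from the frame."""
    return [name for name in expected if name not in frame.columns]


missing = find_missing_columns(sample, EXPECTED_COLUMNS)
print("Missing columns:", missing)
print("Shape of the stand-in frame:", sample.shape)
```

In the notebook itself, the same check could be run on `df` directly by calling `find_missing_columns(df, EXPECTED_COLUMNS)`.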


### Step 2: Initial Data Inspection
Display the first 10 rows of the DataFrame, check the shape, and display the data types of each column. 

In [9]:
# Display the first 10 rows of the DataFrame
print(df.head(10))

# Check the shape as (rows, columns)
print(df.shape)

# Display the data type of each column
print(df.dtypes)

   total  speeding  alcohol  not_distracted  no_previous  ins_premium  \
0   18.8     7.332    5.640          18.048       15.040       784.55   
1   18.1     7.421    4.525          16.290       17.014      1053.48   
2   18.6     6.510    5.208          15.624       17.856       899.47   
3   22.4     4.032    5.824          21.056       21.280       827.34   
4   12.0     4.200    3.360          10.920       10.680       878.41   
5   13.6     5.032    3.808          10.744       12.920       835.50   
6   10.8     4.968    3.888           9.396        8.856      1068.73   
7   16.2     6.156    4.860          14.094       16.038      1137.87   
8    5.9     2.006    1.593           5.900        5.900      1273.89   
9   17.9     3.759    5.191          16.468       16.826      1160.13   

   ins_losses abbrev  
0      145.08     AL  
1      133.93     AK  
2      110.35     AZ  
3      142.39     AR  
4      165.63     CA  
5      139.91     CO  
6      167.02     CT  
7      151.4
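
Beyond `head()`, `shape`, and `dtypes`, an initial inspection often also counts missing values and duplicate rows, and separates numeric columns from text columns. The following is a minimal sketch on a small stand-in frame built from three columns of the first five rows shown above; the same calls should apply unchanged to `df`.

```python
import pandas as pd

# Stand-in for df: three columns from the first five rows printed above.
sample = pd.DataFrame({
    "total": [18.8, 18.1, 18.6, 22.4, 12.0],
    "alcohol": [5.640, 4.525, 5.208, 5.824, 3.360],
    "abbrev": ["AL", "AK", "AZ", "AR", "CA"],
})

# Count missing values in each column.
missing_per_column = sample.isna().sum()

# Split columns by kind: numeric measurements versus everything else.
numeric_columns = sample.select_dtypes(include="number").columns.tolist()
text_columns = sample.select_dtypes(exclude="number").columns.tolist()

# Count fully duplicated rows.
duplicate_rows = int(sample.duplicated().sum())

print(missing_per_column)
print("Numeric columns:", numeric_columns)
print("Text columns:", text_columns)
print("Duplicate rows:", duplicate_rows)
```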

### Step 3: Initial Descriptive Statistics
Use the DataFrame describe() method to display summary statistics for each numeric column.

In [10]:
print(df.describe())

           total   speeding    alcohol  not_distracted  no_previous  \
count  51.000000  51.000000  51.000000       51.000000    51.000000   
mean   15.790196   4.998196   4.886784       13.573176    14.004882   
std     4.122002   2.017747   1.729133        4.508977     3.764672   
min     5.900000   1.792000   1.593000        1.760000     5.900000   
25%    12.750000   3.766500   3.894000       10.478000    11.348000   
50%    15.600000   4.608000   4.554000       13.857000    13.775000   
75%    18.500000   6.439000   5.604000       16.140000    16.755000   
max    23.900000   9.450000  10.038000       23.661000    21.280000   

       ins_premium  ins_losses  
count    51.000000   51.000000  
mean    886.957647  134.493137  
std     178.296285   24.835922  
min     641.960000   82.750000  
25%     768.430000  114.645000  
50%     858.970000  136.050000  
75%    1007.945000  151.870000  
max    1301.520000  194.780000  
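
To make the rows of the `describe()` table concrete, the sketch below recomputes each statistic by hand for a single column. It uses only the five `total` values from the `head()` output as a stand-in, so its numbers differ from the 51-row results above. The interquartile range at the end is an extra illustration, not something the notebook computes.

```python
import pandas as pd

# Stand-in data: the five 'total' values from the head() output.
totals = pd.Series([18.8, 18.1, 18.6, 22.4, 12.0], name="total")

# Summary produced by describe().
summary = totals.describe()

# The same statistics computed one at a time.
manual = pd.Series({
    "count": totals.count(),
    "mean": totals.mean(),
    "std": totals.std(),  # sample standard deviation (ddof=1), as in describe()
    "min": totals.min(),
    "25%": totals.quantile(0.25),
    "50%": totals.quantile(0.50),  # the median
    "75%": totals.quantile(0.75),
    "max": totals.max(),
})

# Interquartile range: the spread of the middle half of the values.
iqr = manual["75%"] - manual["25%"]

print(summary)
print(manual)
print("IQR:", iqr)
```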
