# Aviation Risk and Investing Analysis
![Plane Lot](https://hips.hearstapps.com/hmg-prod/images/rear-view-silhouette-of-an-airplane-taking-off-at-royalty-free-image-1695239529.jpg)

## Business Understanding
This project analyzes 88,889 aviation accidents recorded by the National Transportation Safety Board from 1962 to 2023, covering private and commercial airplanes. The accidents range in severity from fatal to no injuries. We analyze risk by aircraft type, injury and fatality rate, and location to recommend the lowest-risk aircraft and the safest investment for the business.

# Data Understanding

The National Transportation Safety Board dataset is the most comprehensive record of aviation accidents, with 88,889 instances recorded from 1962 to 2023. For each incident it provides the flight type (domestic or international, commercial or private), location, weather conditions, and injury statistics (the number of fatal, serious, minor, and uninjured passengers).

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

In [None]:
aviation_data = pd.read_csv('data/AviationData.csv', encoding='latin-1', low_memory=False)
state_codes = pd.read_csv('data/USState_Codes.csv')

## Aviation Data
The aviation_data dataset contains 88,889 recorded aviation accidents from 1962 to 2023, ranging from uninjured incidents to fatal accidents. 

In [None]:
aviation_data.head()

In [None]:
aviation_data.info()

In [None]:
aviation_data.describe()

In [None]:
state_codes.head()

In [None]:
state_codes.describe()

In [None]:
aviation_data['Investigation.Type'].value_counts()
# Accident: 85015
# Incident: 3874

In [None]:
aviation_data['Make'].value_counts()

In [None]:
aviation_data['Broad.phase.of.flight'].value_counts()

## Data Preparation and Merging

In [None]:
#Cleaning the date format of Event.Date to YYYY-MM-DD
aviation_data['Event.Date'] = pd.to_datetime(aviation_data['Event.Date'])
aviation_data['Event.Date'] = aviation_data['Event.Date'].dt.strftime('%Y-%m-%d')
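
If some dates fail to parse, `pd.to_datetime` raises by default. The sketch below, on made-up values, shows one possible way to coerce unparseable entries to missing values and to keep a year column before converting back to strings. The `Event.Year` column name is hypothetical and not part of the notebook.

```python
import pandas as pd

# Toy stand-in for the Event.Date column; the values are made up.
toy = pd.DataFrame({"Event.Date": ["1962-07-19", "2022-12-26", "not a date"]})

# errors="coerce" turns unparseable entries into NaT instead of raising.
parsed = pd.to_datetime(toy["Event.Date"], errors="coerce")

# Hypothetical helper column: the year is easy to group by later.
toy["Event.Year"] = parsed.dt.year

# Same YYYY-MM-DD string format as the cell above.
toy["Event.Date"] = parsed.dt.strftime("%Y-%m-%d")

print(toy)
```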

In [None]:
# Create a State column in aviation_data to join on state_codes
aviation_us = aviation_data[aviation_data['Country']=='United States']
aviation_data['State'] = aviation_us['Location'].str[-2:]
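
Taking the last two characters works when every US location ends in a state abbreviation. As a hedged alternative, the sketch below assumes locations are formatted as `CITY, ST` and uses a regular expression so that entries without a trailing two-letter code become missing rather than wrong. The toy rows are made up.

```python
import pandas as pd

# Toy stand-in for aviation_data; the rows are made up for illustration.
toy = pd.DataFrame({
    "Country": ["United States", "United States", "Canada"],
    "Location": ["MOOSE CREEK, ID", "Saltville, VA", "Toronto, Canada"],
})

us_rows = toy["Country"] == "United States"

# Capture two capital letters after the final comma; no match yields NaN.
# Non-US rows are not in the selection, so they are left as NaN on assignment.
toy["State"] = toy.loc[us_rows, "Location"].str.extract(
    r",\s*([A-Z]{2})\s*$", expand=False
)

print(toy)
```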

In [None]:
aviation_data[aviation_data['Country']=='United States']

In [None]:
# Make column names easier to use (caused errors when rerunning cells)
# aviation_data.columns = aviation_data.columns.str.lower().str.replace(' ', '_')
# state_codes.columns = state_codes.columns.str.lower().str.replace(' ', '_')
print(aviation_data.columns)

#### Drop columns in aviation_data that are mostly null or not applicable to the risk analysis

- Latitude                34382 non-null
- Longitude               34373 non-null
- Aircraft.Category       32287 non-null
- FAR.Description         32023 non-null
- Schedule                12582 non-null

In [None]:
null_columns = ['Latitude', 'Longitude','Aircraft.Category','FAR.Description','Schedule']
aviation_data = aviation_data.drop(columns=null_columns)
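
The list of sparse columns can also be derived rather than typed by hand. A minimal sketch, using a made-up frame and a hypothetical 50% cutoff (`min_non_null_share`); neither the cutoff nor the toy values come from this notebook.

```python
import pandas as pd

# Toy stand-in for aviation_data; the values are made up for illustration.
toy = pd.DataFrame({
    "Make": ["Cessna", "Piper", "Boeing", "Beech"],
    "Latitude": [None, None, 36.9, None],
    "Schedule": [None, None, None, "SCHD"],
})

# Share of non-null values in each column (0.0 to 1.0).
non_null_share = toy.notna().mean()

# Hypothetical cutoff: drop columns that are less than half populated.
min_non_null_share = 0.5
sparse_columns = non_null_share[non_null_share < min_non_null_share].index.tolist()

trimmed = toy.drop(columns=sparse_columns)
print(sparse_columns)
print(trimmed.columns.tolist())
```

A cutoff only covers the "mostly null" criterion; columns that are not applicable to the analysis still need to be chosen by hand.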

The injury columns are the primary metrics of the analysis and will help assess risk. We need to fill their null values with data points that will not skew the injury data.

In [None]:
aviation_data['Total.Fatal.Injuries'].describe()
# A large outlier skews the mean upward. In most instances no fatality occurs, so the
# median is used to fill null values in Total.Fatal.Injuries.
aviation_data['Total.Fatal.Injuries'] = aviation_data['Total.Fatal.Injuries'].fillna(aviation_data['Total.Fatal.Injuries'].median())

In [None]:
aviation_data['Total.Serious.Injuries'].describe()
# A large outlier skews the mean upward. In most instances no serious injury occurs, so the
# median is used to fill null values in Total.Serious.Injuries.
# fillna returns a new Series, so the result is assigned back to the column.
aviation_data['Total.Serious.Injuries'] = aviation_data['Total.Serious.Injuries'].fillna(aviation_data['Total.Serious.Injuries'].median())

In [None]:
aviation_data['Total.Minor.Injuries'].describe()
# A large outlier skews the mean upward. In most instances no minor injury occurs, so the
# median is used to fill null values in Total.Minor.Injuries.
# fillna returns a new Series, so the result is assigned back to the column.
aviation_data['Total.Minor.Injuries'] = aviation_data['Total.Minor.Injuries'].fillna(aviation_data['Total.Minor.Injuries'].median())


In [None]:
aviation_data['Total.Uninjured'].describe()
# A large outlier skews the mean upward, and the large standard deviation shows that the
# data is widely spread. To remain consistent with the other injury columns, the median
# is used to fill null values in Total.Uninjured.
aviation_data['Total.Uninjured'] = aviation_data['Total.Uninjured'].fillna(aviation_data['Total.Uninjured'].median())
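
The four injury columns follow the same pattern, so the fill can be written as a loop. A minimal sketch on made-up numbers, which also prints the mean next to the median to show how a single large outlier pulls the mean but not the median:

```python
import pandas as pd

# Toy stand-in for two of the injury columns; the numbers are made up.
toy = pd.DataFrame({
    "Total.Fatal.Injuries": [0, 0, 0, None, 200],
    "Total.Uninjured": [1, 2, None, 2, 300],
})

injury_columns = ["Total.Fatal.Injuries", "Total.Uninjured"]

for column in injury_columns:
    # mean() and median() skip nulls by default.
    print(column, "mean:", toy[column].mean(), "median:", toy[column].median())
    # fillna returns a new Series, so assign it back to the column.
    toy[column] = toy[column].fillna(toy[column].median())

print(toy)
```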

### Merging Data
Merge aviation_data with state_codes to pull in state names for accidents that occurred in the United States.

In [None]:
aviation_data.set_index('State', inplace=True)
state_codes.set_index('Abbreviation', inplace=True)

In [None]:
# Both frames are indexed by state abbreviation (set in the previous cell), so merge on the indexes.
# Equivalent to: aviation_data.join(state_codes, how='left')
aviation_accidents = pd.merge(aviation_data, state_codes, how='left', left_index=True, right_index=True)
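
A left merge keeps every accident row, so rows whose state code has no match simply end up with a missing state name. One possible check, sketched on made-up data, is the `indicator=True` option of `merge`, which adds a `_merge` column marking each row as matched or not. The `US_State` column name and the toy rows are assumptions for illustration.

```python
import pandas as pd

# Toy stand-ins for aviation_data and state_codes; the rows are made up.
accidents = pd.DataFrame({
    "Event.Id": ["A1", "A2", "A3"],
    "State": ["ID", "VA", None],  # None mimics a non-US accident
})
codes = pd.DataFrame({
    "US_State": ["Idaho", "Virginia"],
    "Abbreviation": ["ID", "VA"],
})

# indicator=True adds a _merge column: "both" or "left_only" for a left merge.
merged = accidents.merge(
    codes, how="left", left_on="State", right_on="Abbreviation", indicator=True
)

match_counts = merged["_merge"].value_counts()
print(match_counts)
```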

# Exploratory Data Analysis