# Aviation Accidents Analysis

You are part of a consulting firm tasked with analyzing commercial and passenger jet airline safety. The client (an airline/airplane insurer) wants to know which types of aircraft (makes/models) exhibit low rates of total destruction and a low likelihood of fatal or serious passenger injuries in the event of an accident. They are also interested in any general variables/conditions that might be at play. Your analysis will be based on aviation accident data accumulated from 1948 to 2023.

Our client is only interested in airplane makes/models that are professional builds and could potentially still be active. Assume a maximum service life of 40 years before a make/model is retired, and filter your data accordingly (i.e. from 1983 onwards). They would also like separate recommendations for small aircraft vs. larger passenger models. **In addition, make sure that the claims you make are statistically robust and that you have enough samples when making comparisons between groups.**
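One way to judge whether a comparison between groups rests on enough samples is to attach a confidence interval to each observed rate. The sketch below is an illustration rather than part of the brief: it computes a Wilson score interval in plain Python. The function name `wilson_interval` and the example counts are assumptions chosen for demonstration.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    # Clip to [0, 1] to absorb floating-point noise at the boundaries
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Same observed rate (20%), very different certainty (made-up counts)
small = wilson_interval(2, 10)
large = wilson_interval(200, 1000)
print("n=10:  ", small)
print("n=1000:", large)
```

The interval for the small group is much wider, which is the practical reason for requiring a minimum number of events per make before ranking makes against each other.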


In this summative assessment you will demonstrate your ability to:
- **Use Pandas to load, inspect, and clean the dataset appropriately.**
- **Transform relevant columns to create measures that address the problem at hand.**
- Conduct EDA: use visualization and statistical measures to systematically understand the structure of the data
- Recommend a set of airplane makes and models conforming to the client's request and identify at least *two* factors contributing to airplane safety. You must provide supporting evidence (visuals, summary statistics, tables) for each claim you make.

### Make relevant library imports

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

## Data Loading and Inspection

### Load in data from the relevant directory and inspect the dataframe.
- inspect NaNs, datatypes, and summary statistics

In [6]:
# Load the data as aviation_df
aviation_df = pd.read_csv(
    "data/AviationData.csv",
    encoding="latin1",
    low_memory=False
)

aviation_shape = aviation_df.shape
aviation_shape

(88889, 31)

In [None]:
aviation_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 88889 entries, 0 to 88888
Data columns (total 31 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Event.Id                88889 non-null  object 
 1   Investigation.Type      88889 non-null  object 
 2   Accident.Number         88889 non-null  object 
 3   Event.Date              88889 non-null  object 
 4   Location                88837 non-null  object 
 5   Country                 88663 non-null  object 
 6   Latitude                34382 non-null  object 
 7   Longitude               34373 non-null  object 
 8   Airport.Code            50132 non-null  object 
 9   Airport.Name            52704 non-null  object 
 10  Injury.Severity         87889 non-null  object 
 11  Aircraft.damage         85695 non-null  object 
 12  Aircraft.Category       32287 non-null  object 
 13  Registration.Number     87507 non-null  object 
 14  Make                    88826 non-null

In [15]:
for col in aviation_df.columns:
    print(col)

Event.Id
Investigation.Type
Accident.Number
Event.Date
Location
Country
Latitude
Longitude
Airport.Code
Airport.Name
Injury.Severity
Aircraft.damage
Aircraft.Category
Registration.Number
Make
Model
Amateur.Built
Number.of.Engines
Engine.Type
FAR.Description
Schedule
Purpose.of.flight
Air.carrier
Total.Fatal.Injuries
Total.Serious.Injuries
Total.Minor.Injuries
Total.Uninjured
Weather.Condition
Broad.phase.of.flight
Report.Status
Publication.Date


In [17]:
pd.set_option('display.max_columns', None)
aviation_df.tail()

Unnamed: 0,Event.Id,Investigation.Type,Accident.Number,Event.Date,Location,Country,Latitude,Longitude,Airport.Code,Airport.Name,Injury.Severity,Aircraft.damage,Aircraft.Category,Registration.Number,Make,Model,Amateur.Built,Number.of.Engines,Engine.Type,FAR.Description,Schedule,Purpose.of.flight,Air.carrier,Total.Fatal.Injuries,Total.Serious.Injuries,Total.Minor.Injuries,Total.Uninjured,Weather.Condition,Broad.phase.of.flight,Report.Status,Publication.Date
88884,20221227106491,Accident,ERA23LA093,2022-12-26,"Annapolis, MD",United States,,,,,Minor,,,N1867H,PIPER,PA-28-151,No,,,91.0,,Personal,,0.0,1.0,0.0,0.0,,,,29-12-2022
88885,20221227106494,Accident,ERA23LA095,2022-12-26,"Hampton, NH",United States,,,,,,,,N2895Z,BELLANCA,7ECA,No,,,,,,,0.0,0.0,0.0,0.0,,,,
88886,20221227106497,Accident,WPR23LA075,2022-12-26,"Payson, AZ",United States,341525N,1112021W,PAN,PAYSON,Non-Fatal,Substantial,Airplane,N749PJ,AMERICAN CHAMPION AIRCRAFT,8GCBC,No,1.0,,91.0,,Personal,,0.0,0.0,0.0,1.0,VMC,,,27-12-2022
88887,20221227106498,Accident,WPR23LA076,2022-12-26,"Morgan, UT",United States,,,,,,,,N210CU,CESSNA,210N,No,,,91.0,,Personal,MC CESSNA 210N LLC,0.0,0.0,0.0,0.0,,,,
88888,20221230106513,Accident,ERA23LA097,2022-12-29,"Athens, GA",United States,,,,,Minor,,,N9026P,PIPER,PA-24-260,No,,,91.0,,Personal,,0.0,1.0,0.0,1.0,,,,30-12-2022


In [9]:
aviation_df.isna().mean().sort_values(ascending=False)

Schedule                  0.858453
Air.carrier               0.812710
FAR.Description           0.639742
Aircraft.Category         0.636772
Longitude                 0.613304
Latitude                  0.613203
Airport.Code              0.436016
Airport.Name              0.407081
Broad.phase.of.flight     0.305606
Publication.Date          0.154924
Total.Serious.Injuries    0.140737
Total.Minor.Injuries      0.134246
Total.Fatal.Injuries      0.128261
Engine.Type               0.079830
Report.Status             0.071820
Purpose.of.flight         0.069660
Number.of.Engines         0.068445
Total.Uninjured           0.066510
Weather.Condition         0.050535
Aircraft.damage           0.035932
Registration.Number       0.015547
Injury.Severity           0.011250
Country                   0.002542
Amateur.Built             0.001147
Model                     0.001035
Make                      0.000709
Location                  0.000585
Investigation.Type        0.000000
Event.Date          

## Data Cleaning

### Filtering aircraft and events

We want to filter the dataset down to the aircraft the client is interested in:
- inspect relevant columns
- figure out any reasonable imputations
- filter the dataset

In [None]:

#Convert Event.Date to datetime
aviation_df['Event.Date'] = pd.to_datetime(
    aviation_df['Event.Date'], errors="coerce"
)

In [None]:
#Add an Event.Year column derived from Event.Date
aviation_df['Event.Year'] = aviation_df['Event.Date'].dt.year
aviation_df.head() 

Unnamed: 0,Event.Id,Investigation.Type,Accident.Number,Event.Date,Location,Country,Latitude,Longitude,Airport.Code,Airport.Name,Injury.Severity,Aircraft.damage,Aircraft.Category,Registration.Number,Make,Model,Amateur.Built,Number.of.Engines,Engine.Type,FAR.Description,Schedule,Purpose.of.flight,Air.carrier,Total.Fatal.Injuries,Total.Serious.Injuries,Total.Minor.Injuries,Total.Uninjured,Weather.Condition,Broad.phase.of.flight,Report.Status,Publication.Date,Event.Year
0,20001218X45444,Accident,SEA87LA080,1948-10-24,"MOOSE CREEK, ID",United States,,,,,Fatal(2),Destroyed,,NC6404,Stinson,108-3,No,1.0,Reciprocating,,,Personal,,2.0,0.0,0.0,0.0,UNK,Cruise,Probable Cause,,1948
1,20001218X45447,Accident,LAX94LA336,1962-07-19,"BRIDGEPORT, CA",United States,,,,,Fatal(4),Destroyed,,N5069P,Piper,PA24-180,No,1.0,Reciprocating,,,Personal,,4.0,0.0,0.0,0.0,UNK,Unknown,Probable Cause,19-09-1996,1962
2,20061025X01555,Accident,NYC07LA005,1974-08-30,"Saltville, VA",United States,36.922223,-81.878056,,,Fatal(3),Destroyed,,N5142R,Cessna,172M,No,1.0,Reciprocating,,,Personal,,3.0,,,,IMC,Cruise,Probable Cause,26-02-2007,1974
3,20001218X45448,Accident,LAX96LA321,1977-06-19,"EUREKA, CA",United States,,,,,Fatal(2),Destroyed,,N1168J,Rockwell,112,No,1.0,Reciprocating,,,Personal,,2.0,0.0,0.0,0.0,IMC,Cruise,Probable Cause,12-09-2000,1977
4,20041105X01764,Accident,CHI79FA064,1979-08-02,"Canton, OH",United States,,,,,Fatal(1),Destroyed,,N15NY,Cessna,501,No,,,,,Personal,,1.0,2.0,,0.0,VMC,Approach,Probable Cause,16-04-1980,1979


In [None]:
#Filter out events before 1983
aviation_filter_year_df = aviation_df[aviation_df['Event.Year'] >= 1983]
aviation_filter_year_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 85289 entries, 3600 to 88888
Data columns (total 32 columns):
 #   Column                  Non-Null Count  Dtype         
---  ------                  --------------  -----         
 0   Event.Id                85289 non-null  object        
 1   Investigation.Type      85289 non-null  object        
 2   Accident.Number         85289 non-null  object        
 3   Event.Date              85289 non-null  datetime64[ns]
 4   Location                85237 non-null  object        
 5   Country                 85073 non-null  object        
 6   Latitude                34379 non-null  object        
 7   Longitude               34370 non-null  object        
 8   Airport.Code            48435 non-null  object        
 9   Airport.Name            50514 non-null  object        
 10  Injury.Severity         84289 non-null  object        
 11  Aircraft.damage         82151 non-null  object        
 12  Aircraft.Category       28723 non-null  object  

In [34]:
#Filter out amateur-built aircraft (rows with a missing Amateur.Built value are dropped as well)
aviation_filtered_df = aviation_filter_year_df[aviation_filter_year_df['Amateur.Built'] == "No"].copy()
aviation_filtered_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 76960 entries, 3600 to 88888
Data columns (total 32 columns):
 #   Column                  Non-Null Count  Dtype         
---  ------                  --------------  -----         
 0   Event.Id                76960 non-null  object        
 1   Investigation.Type      76960 non-null  object        
 2   Accident.Number         76960 non-null  object        
 3   Event.Date              76960 non-null  datetime64[ns]
 4   Location                76913 non-null  object        
 5   Country                 76750 non-null  object        
 6   Latitude                30167 non-null  object        
 7   Longitude               30161 non-null  object        
 8   Airport.Code            43375 non-null  object        
 9   Airport.Name            45285 non-null  object        
 10  Injury.Severity         75961 non-null  object        
 11  Aircraft.damage         73868 non-null  object        
 12  Aircraft.Category       25405 non-null  object  

### Cleaning and constructing Key Measurables

Injuries and robustness to destruction are key points of interest for the client. Clean and impute relevant columns and then create derived fields that best quantify what the client wishes to track. **Use commenting or markdown to explain any cleaning assumptions as well as any derived columns you create.**

**Construct metric for fatal/serious injuries**

*Hint:* Estimate the total number of passengers on each flight. The likelihood of serious / fatal injury can be estimated as a fraction from this.

In [36]:
#Fill missing injury counts
#Assumption: a missing count in any of the four injury columns means none were reported, so fill with 0
aviation_filtered_df["Total.Fatal.Injuries"] = aviation_filtered_df["Total.Fatal.Injuries"].fillna(0)
aviation_filtered_df["Total.Serious.Injuries"] = aviation_filtered_df["Total.Serious.Injuries"].fillna(0)
aviation_filtered_df["Total.Minor.Injuries"] = aviation_filtered_df["Total.Minor.Injuries"].fillna(0)
aviation_filtered_df["Total.Uninjured"] = aviation_filtered_df["Total.Uninjured"].fillna(0)

aviation_filtered_df[['Event.Id','Total.Fatal.Injuries','Total.Serious.Injuries','Total.Minor.Injuries','Total.Uninjured']].info()

<class 'pandas.core.frame.DataFrame'>
Index: 76960 entries, 3600 to 88888
Data columns (total 5 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Event.Id                76960 non-null  object 
 1   Total.Fatal.Injuries    76960 non-null  float64
 2   Total.Serious.Injuries  76960 non-null  float64
 3   Total.Minor.Injuries    76960 non-null  float64
 4   Total.Uninjured         76960 non-null  float64
dtypes: float64(4), object(1)
memory usage: 3.5+ MB


In [38]:
#Estimate total passengers on each flight
#Total Passengers = Fatal + Serious + Minor + Uninjured
aviation_filtered_df['Total.Passengers'] = (
    aviation_filtered_df['Total.Fatal.Injuries'] +
    aviation_filtered_df['Total.Serious.Injuries'] +
    aviation_filtered_df['Total.Minor.Injuries'] +
    aviation_filtered_df['Total.Uninjured']
)

aviation_filtered_df.tail(10)

Unnamed: 0,Event.Id,Investigation.Type,Accident.Number,Event.Date,Location,Country,Latitude,Longitude,Airport.Code,Airport.Name,Injury.Severity,Aircraft.damage,Aircraft.Category,Registration.Number,Make,Model,Amateur.Built,Number.of.Engines,Engine.Type,FAR.Description,Schedule,Purpose.of.flight,Air.carrier,Total.Fatal.Injuries,Total.Serious.Injuries,Total.Minor.Injuries,Total.Uninjured,Weather.Condition,Broad.phase.of.flight,Report.Status,Publication.Date,Event.Year,Total.Passengers
88879,20221219106472,Accident,DCA23LA096,2022-12-18,"Kahului, HI",United States,,,,,,,,N393HA,AIRBUS,A330-243,No,,,121.0,SCHD,,HAWAIIAN AIRLINES INC,0.0,0.0,0.0,0.0,,,,,2022,0.0
88880,20221219106477,Accident,WPR23LA071,2022-12-18,"San Manual, AZ",United States,,,,,Non-Fatal,,,N4144P,PIPER,PA28,No,,,91.0,,Personal,Chandler Air Service,0.0,0.0,0.0,3.0,,,,20-12-2022,2022,3.0
88881,20221221106483,Accident,CEN23LA067,2022-12-21,"Auburn Hills, MI",United States,,,,,Minor,,,N8786U,CESSNA,172F,No,,,91.0,NSCH,Personal,Pilot,0.0,1.0,0.0,0.0,,,,22-12-2022,2022,1.0
88882,20221222106486,Accident,CEN23LA068,2022-12-21,"Reserve, LA",United States,,,,,Minor,,,N321GD,GRUMMAN AMERICAN AVN. CORP.,AA-5B,No,,,91.0,,Instructional,,0.0,1.0,0.0,1.0,,,,27-12-2022,2022,2.0
88883,20221228106502,Accident,GAA23WA046,2022-12-22,"Brasnorte,",Brazil,,,,,Fatal,,,PP-IRC,AIR TRACTOR,AT502,No,,,,,,,1.0,0.0,0.0,0.0,,,,28-12-2022,2022,1.0
88884,20221227106491,Accident,ERA23LA093,2022-12-26,"Annapolis, MD",United States,,,,,Minor,,,N1867H,PIPER,PA-28-151,No,,,91.0,,Personal,,0.0,1.0,0.0,0.0,,,,29-12-2022,2022,1.0
88885,20221227106494,Accident,ERA23LA095,2022-12-26,"Hampton, NH",United States,,,,,,,,N2895Z,BELLANCA,7ECA,No,,,,,,,0.0,0.0,0.0,0.0,,,,,2022,0.0
88886,20221227106497,Accident,WPR23LA075,2022-12-26,"Payson, AZ",United States,341525N,1112021W,PAN,PAYSON,Non-Fatal,Substantial,Airplane,N749PJ,AMERICAN CHAMPION AIRCRAFT,8GCBC,No,1.0,,91.0,,Personal,,0.0,0.0,0.0,1.0,VMC,,,27-12-2022,2022,1.0
88887,20221227106498,Accident,WPR23LA076,2022-12-26,"Morgan, UT",United States,,,,,,,,N210CU,CESSNA,210N,No,,,91.0,,Personal,MC CESSNA 210N LLC,0.0,0.0,0.0,0.0,,,,,2022,0.0
88888,20221230106513,Accident,ERA23LA097,2022-12-29,"Athens, GA",United States,,,,,Minor,,,N9026P,PIPER,PA-24-260,No,,,91.0,,Personal,,0.0,1.0,0.0,1.0,,,,30-12-2022,2022,2.0


In [None]:
#Compute the fraction of passengers fatally or seriously injured
#If Total.Passengers is 0 the division gives NaN (0/0); any inf result is also mapped to NaN as a guard
aviation_filtered_df['Frac.Fatal.Serious'] = (
    (aviation_filtered_df['Total.Fatal.Injuries'] + aviation_filtered_df['Total.Serious.Injuries']) /
    aviation_filtered_df['Total.Passengers']
).replace([np.inf, -np.inf], np.nan)
aviation_filtered_df[['Event.Id','Total.Fatal.Injuries','Total.Serious.Injuries','Total.Minor.Injuries','Total.Uninjured','Total.Passengers','Frac.Fatal.Serious']].info()

<class 'pandas.core.frame.DataFrame'>
Index: 76960 entries, 3600 to 88888
Data columns (total 7 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Event.Id                76960 non-null  object 
 1   Total.Fatal.Injuries    76960 non-null  float64
 2   Total.Serious.Injuries  76960 non-null  float64
 3   Total.Minor.Injuries    76960 non-null  float64
 4   Total.Uninjured         76960 non-null  float64
 5   Total.Passengers        76960 non-null  float64
 6   Frac.Fatal.Serious      75701 non-null  float64
dtypes: float64(6), object(1)
memory usage: 4.7+ MB


In [41]:
## Create binary flags for fatal, serious, and fatal-or-serious accidents
aviation_filtered_df['fatal_accident'] = aviation_filtered_df['Total.Fatal.Injuries'] > 0
aviation_filtered_df['serious_accident'] = aviation_filtered_df['Total.Serious.Injuries'] > 0
aviation_filtered_df['fatal_or_serious'] = aviation_filtered_df['fatal_accident'] | aviation_filtered_df['serious_accident']

aviation_filtered_df[[
    "Total.Passengers", "Total.Fatal.Injuries",
    "Total.Serious.Injuries", "Frac.Fatal.Serious", "fatal_or_serious"
]].head(10)

Unnamed: 0,Total.Passengers,Total.Fatal.Injuries,Total.Serious.Injuries,Frac.Fatal.Serious,fatal_or_serious
3600,2.0,0.0,1.0,0.5,True
3601,4.0,0.0,0.0,0.0,False
3602,2.0,0.0,0.0,0.0,False
3603,1.0,0.0,0.0,0.0,False
3604,2.0,0.0,0.0,0.0,False
3605,2.0,0.0,0.0,0.0,False
3606,2.0,0.0,1.0,0.5,True
3607,4.0,0.0,0.0,0.0,False
3608,2.0,2.0,0.0,1.0,True
3609,3.0,3.0,0.0,1.0,True
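To connect these event-level measures to the client's question, they can be aggregated per make together with the number of events behind each rate. The following is a minimal sketch on a toy frame rather than the real data: `toy_df`, the make name `Rareco`, the threshold `MIN_EVENTS`, and all values are made up for illustration.

```python
import pandas as pd

# Toy stand-in for aviation_filtered_df (values are made up for illustration)
toy_df = pd.DataFrame({
    "Make": ["Cessna"] * 4 + ["Piper"] * 3 + ["Rareco"],
    "fatal_or_serious": [True, False, False, False, True, True, False, True],
    "Frac.Fatal.Serious": [0.5, 0.0, 0.0, 0.0, 1.0, 0.5, 0.0, 1.0],
})

MIN_EVENTS = 3  # hypothetical threshold for the toy data

make_summary = (
    toy_df.groupby("Make")
    .agg(
        n_events=("fatal_or_serious", "size"),              # sample size per make
        rate_fatal_or_serious=("fatal_or_serious", "mean"),  # share of events with any fatal/serious injury
        mean_frac_injured=("Frac.Fatal.Serious", "mean"),    # average per-event injured fraction
    )
    .loc[lambda d: d["n_events"] >= MIN_EVENTS]              # drop makes with too few events
    .sort_values("rate_fatal_or_serious")
)
print(make_summary)
```

Keeping `n_events` next to each rate makes it easy to see which comparisons rest on only a handful of accidents.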


**Aircraft.damage**
- identify and execute any cleaning tasks
- construct a derived column tracking whether an aircraft was destroyed or not.

In [None]:
# See unique values in column
aviation_filtered_df['Aircraft.damage'].value_counts(dropna=False)

Aircraft.damage
Substantial    55689
Destroyed      15469
NaN             3092
Minor           2594
Unknown          116
Name: count, dtype: int64

In [43]:
# Fill NaN with 'Unknown'
aviation_filtered_df['Aircraft.damage'] = aviation_filtered_df['Aircraft.damage'].fillna('Unknown')
aviation_filtered_df['Aircraft.damage'].value_counts(dropna=False)

Aircraft.damage
Substantial    55689
Destroyed      15469
Unknown         3208
Minor           2594
Name: count, dtype: int64

In [44]:
## Create a binary flag: True if Destroyed, False otherwise
aviation_filtered_df['destroyed'] = aviation_filtered_df['Aircraft.damage'] == 'Destroyed'
aviation_filtered_df[['Aircraft.damage','destroyed']].value_counts()

Aircraft.damage  destroyed
Substantial      False        55689
Destroyed        True         15469
Unknown          False         3208
Minor            False         2594
Name: count, dtype: int64
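Note that the flag above counts `Unknown` damage as "not destroyed", which slightly lowers any destruction rate computed from it. A possible alternative, sketched here on made-up values, keeps unknown rows as missing so they drop out of the denominator. The names `damage` and `destroyed_known` are illustrative and do not appear in the notebook.

```python
import pandas as pd

# Toy stand-in for the cleaned 'Aircraft.damage' column (made-up values)
damage = pd.Series(
    ["Destroyed", "Substantial", "Unknown", "Minor", "Unknown", "Destroyed"],
    name="Aircraft.damage",
)

# Nullable boolean flag: True/False where damage is known, <NA> where it is Unknown
destroyed_known = (damage == "Destroyed").astype("boolean").mask(damage == "Unknown")

rate_all_rows = (damage == "Destroyed").mean()  # Unknown counted as not destroyed
rate_known_only = destroyed_known.mean()        # Unknown excluded (NA is skipped)

print(rate_all_rows, rate_known_only)
```

Which definition is appropriate depends on whether unknown damage is believed to be unrelated to severity; that is a modeling assumption worth stating alongside any reported rate.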

### Investigate the *Make* column
- Identify cleaning tasks here
- List cleaning tasks clearly in markdown
- Execute the cleaning tasks
- For your analysis, keep Makes with a reasonable number of records (a threshold of 50 works, though a lower one could as well)

### Inspect the Make column
- Look for:
    - Typos / inconsistent capitalization
    - Missing values (NaN)
    - Rare makes with very few accidents

In [46]:
#Check unique makes and top occurrences
aviation_filtered_df['Make'].value_counts(dropna=False).head(20)

Make
Cessna               20818
Piper                11260
CESSNA                4922
Beech                 4052
PIPER                 2839
Bell                  2006
Boeing                1498
BOEING                1151
BEECH                 1042
Mooney                1029
Grumman                987
Robinson               914
Bellanca               819
Hughes                 740
Schweizer              609
BELL                   588
Air Tractor            577
Mcdonnell Douglas      511
Aeronca                458
Maule                  427
Name: count, dtype: int64

### Identify Cleaning Tasks
Proposed cleaning tasks:
1. Standardize capitalization
    - Convert all entries to title case (e.g. BOEING to Boeing)
2. Remove missing values
    - Remove rows where Make is NaN since we cannot assign safety metrics to unknown aircraft
3. Remove rare Makes
    - Keep only Makes with >= 50 occurrences to ensure statistical robustness

In [52]:
# Standardize capitalization
aviation_filtered_df['Make'] = aviation_filtered_df['Make'].str.title()
aviation_filtered_df['Make'].value_counts().head(30)

Make
Cessna               25740
Piper                14099
Beech                 5094
Boeing                2649
Bell                  2594
Mooney                1271
Robinson              1197
Grumman               1064
Bellanca               978
Hughes                 877
Schweizer              753
Air Tractor            673
Aeronca                607
Mcdonnell Douglas      593
Maule                  571
Champion               504
Stinson                421
Aero Commander         411
De Havilland           405
Luscombe               390
North American         370
Taylorcraft            366
Aerospatiale           360
Rockwell               337
Hiller                 329
Airbus                 289
Enstrom                281
Douglas                262
Embraer                236
Ayres                  231
Name: count, dtype: int64

In [54]:
# Drop rows with missing Make
aviation_filtered_df = aviation_filtered_df.dropna(subset=['Make'])
aviation_filtered_df[['Event.Id','Make']].info()

<class 'pandas.core.frame.DataFrame'>
Index: 76914 entries, 3600 to 88888
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   Event.Id  76914 non-null  object
 1   Make      76914 non-null  object
dtypes: object(2)
memory usage: 1.8+ MB


In [55]:
# Filter to makes with at least 50 occurrences
make_counts = aviation_filtered_df['Make'].value_counts()
valid_makes = make_counts[make_counts >= 50].index
print(valid_makes)

Index(['Cessna', 'Piper', 'Beech', 'Boeing', 'Bell', 'Mooney', 'Robinson',
       'Grumman', 'Bellanca', 'Hughes', 'Schweizer', 'Air Tractor', 'Aeronca',
       'Mcdonnell Douglas', 'Maule', 'Champion', 'Stinson', 'Aero Commander',
       'De Havilland', 'Luscombe', 'North American', 'Taylorcraft',
       'Aerospatiale', 'Rockwell', 'Hiller', 'Airbus', 'Enstrom', 'Douglas',
       'Embraer', 'Ayres', 'Robinson Helicopter', 'Sikorsky',
       'Cirrus Design Corp', 'Air Tractor Inc', 'Eurocopter',
       'Grumman American', 'Robinson Helicopter Company', 'Swearingen',
       'Airbus Industrie', 'Ercoupe (Eng & Research Corp.)', 'Fairchild',
       'Cirrus', 'Aviat', 'Waco', 'Balloon Works', 'Lake', 'Schleicher', 'Let',
       'Mitsubishi', 'Learjet', 'Grumman-Schweizer', 'Socata', 'Burkhart Grob',
       'Lockheed', 'Ryan', 'Helio', 'Dehavilland', 'Globe', 'Schempp-Hirth',
       'Aerostar', 'Wsk Pzl Mielec', 'British Aerospace', 'Weatherly', 'Pitts',
       'Gulfstream', 'Raven', 'Aviat
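The list above still contains several spellings that appear to refer to the same manufacturer (for example `Robinson`, `Robinson Helicopter`, and `Robinson Helicopter Company`, or `De Havilland` and `Dehavilland`). Since the output is truncated, it is also unclear whether `valid_makes` is applied to the frame afterwards. The sketch below shows one way to do both on toy data. The alias map `MAKE_ALIASES` is a hypothetical starting point, `Rareco` is a made-up make, and each merge should be checked against the raw records before use.

```python
import pandas as pd

# Toy stand-in for the title-cased Make column (rows are made up)
toy_df = pd.DataFrame({
    "Make": ["Robinson", "Robinson Helicopter", "Robinson Helicopter Company",
             "De Havilland", "Dehavilland", "Airbus", "Airbus Industrie",
             "Cessna", "Cessna", "Rareco"],
})

# Hypothetical alias map: variant spelling -> canonical make
MAKE_ALIASES = {
    "Robinson Helicopter": "Robinson",
    "Robinson Helicopter Company": "Robinson",
    "Dehavilland": "De Havilland",
    "Airbus Industrie": "Airbus",
}

# Strip stray whitespace, then replace whole-cell matches with the canonical name
toy_df["Make"] = toy_df["Make"].str.strip().replace(MAKE_ALIASES)

MIN_EVENTS = 2  # toy threshold; the task text suggests 50 for the real data
make_counts = toy_df["Make"].value_counts()
valid_makes = make_counts[make_counts >= MIN_EVENTS].index

# Apply the threshold: keep only rows whose Make is in valid_makes
toy_filtered = toy_df[toy_df["Make"].isin(valid_makes)].copy()
print(toy_filtered["Make"].value_counts())
```

Merging aliases before counting matters because a make split across several spellings can fall below the threshold even when its combined count would pass.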

### Cleaning other columns
Other columns contain data that might be related to the outcome of an accident. We list a few here:
- Engine.Type
- Weather.Condition
- Number.of.Engines
- Purpose.of.flight
- Broad.phase.of.flight

Inspect and identify potential cleaning tasks in each of the above columns. Execute those cleaning tasks. 

**Note**: You do not necessarily need to impute or drop NaNs here.
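As a starting point for these columns, the sketch below normalizes "unknown"-style labels and stores the engine count as a nullable integer. It runs on a toy frame: the specific labels in `UNKNOWN_LABELS` and the helper `unify_unknowns` are assumptions, so inspect `value_counts(dropna=False)` on the real columns before adopting them.

```python
import numpy as np
import pandas as pd

# Toy stand-in with the kinds of inconsistencies one might find (made-up values)
toy_df = pd.DataFrame({
    "Weather.Condition": ["VMC", "IMC", "UNK", "Unk", np.nan],
    "Engine.Type": ["Reciprocating", "Turbo Fan", "Unknown", "UNK", np.nan],
    "Number.of.Engines": [1.0, 2.0, np.nan, 0.0, 4.0],
    "Purpose.of.flight": ["Personal", "Instructional", "Unknown", np.nan, "Business"],
})

# Labels treated as "unknown" (assumption; compared case-insensitively)
UNKNOWN_LABELS = {"UNK", "UNKNOWN"}


def unify_unknowns(col: pd.Series) -> pd.Series:
    """Strip whitespace and map unknown-like labels to NaN; real NaNs stay NaN."""
    cleaned = col.str.strip()
    is_unknown = cleaned.str.upper().isin(UNKNOWN_LABELS)
    return cleaned.mask(is_unknown)


for name in ["Weather.Condition", "Engine.Type", "Purpose.of.flight"]:
    toy_df[name] = unify_unknowns(toy_df[name])

# Engine counts are whole numbers; a nullable integer dtype keeps the missing values
toy_df["Number.of.Engines"] = toy_df["Number.of.Engines"].astype("Int64")

print(toy_df)
print(toy_df.isna().sum())
```

Mapping unknown labels to NaN, rather than imputing them, is consistent with the note above: the rows remain available, and group-level statistics can simply skip them.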

### Column Removal
- inspect the dataframe and drop any columns that have too many NaNs
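A minimal sketch of this step, assuming a cut-off on the fraction of missing values per column. The name `MAX_NAN_FRACTION` and its value are hypothetical, and the toy frame stands in for the real data.

```python
import numpy as np
import pandas as pd

# Toy stand-in (made-up values)
toy_df = pd.DataFrame({
    "Make": ["Cessna", "Piper", "Boeing", "Bell"],
    "Schedule": [np.nan, np.nan, "SCHD", np.nan],         # 75% missing
    "Weather.Condition": ["VMC", np.nan, "IMC", "VMC"],   # 25% missing
})

MAX_NAN_FRACTION = 0.5  # hypothetical cut-off; choose based on the NaN summary

nan_fraction = toy_df.isna().mean()
cols_to_drop = nan_fraction[nan_fraction > MAX_NAN_FRACTION].index.tolist()

toy_trimmed = toy_df.drop(columns=cols_to_drop)
print("Dropped:", cols_to_drop)
print(toy_trimmed.columns.tolist())
```

A column with many NaNs may still be needed for the analysis (for instance, one used to separate airplanes from other aircraft), so it can be worth removing such columns from `cols_to_drop` explicitly before dropping.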

### Save DataFrame to csv
- it's generally useful to save data to a file/server once it's in a sufficiently cleaned or intermediate state
- the data can then be loaded directly in another notebook for further analysis
- this helps keep your notebooks and workflow readable, clean and modularized
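The round trip can be sketched as follows, using a toy frame and a temporary directory. The file name `AviationData_cleaned.csv` is a hypothetical choice; in the notebook the cleaned `aviation_filtered_df` would be written somewhere under `data/` instead.

```python
import os
import tempfile

import pandas as pd

# Toy stand-in for the cleaned frame (made-up values)
toy_df = pd.DataFrame({
    "Event.Date": pd.to_datetime(["2022-12-26", "2022-12-29"]),
    "Make": ["Piper", "Cessna"],
    "destroyed": [False, True],
})

# Hypothetical output path inside a temporary directory
out_path = os.path.join(tempfile.mkdtemp(), "AviationData_cleaned.csv")

# index=False avoids writing the row index as an extra unnamed column
toy_df.to_csv(out_path, index=False)

# CSV does not store dtypes, so dates must be re-parsed when loading
reloaded = pd.read_csv(out_path, parse_dates=["Event.Date"])
print(reloaded.dtypes)
```

Because CSV drops type information, it helps to note in the analysis notebook which columns need `parse_dates` or an explicit `dtype` when the file is loaded again.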