# Aviation Accidents Analysis

## 1. Overview

A company I am working with plans to diversify its portfolio by entering new sectors, particularly the aviation industry. The leadership is interested in the risks associated with aircraft operations, mainly safety and reliability. Against that background, management tasked me, as a senior data scientist at the company, with analyzing accident data to determine which aircraft present the lowest operational risk, so that investment decisions can be made on an informed basis.


## 2. Business Understanding
The goal is to identify the safest and most reliable aircraft for the company to purchase, based on an analysis of accident and incident data collected over the years. The company therefore wants to:
* Minimize accident frequency
* Minimize the fatality count


## 3. Data Science Understanding
Analyze accident data collected over the years to establish which aircraft carry the lowest risk, using risk indicators such as accident frequency, fatality rate, and severity of injuries.
This will involve the following steps:
* Data exploration
* Data cleaning
* Grouping aircraft by make and model
* Calculating safety metrics per aircraft type
* Visualizing patterns of high-risk and low-risk aircraft
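
The safety-metric step could look roughly like the sketch below. It uses a small made-up table rather than the real data, and it assumes one particular definition of fatality rate (fatalities divided by all people recorded in the four injury columns). The function name `safety_metrics` and the output column names are illustrative, not part of the notebook.

```python
import pandas as pd

# Toy records standing in for the real accident data (all values are made up).
toy = pd.DataFrame(
    {
        "Make": ["Cessna", "Cessna", "Piper", "Piper", "Piper"],
        "Model": ["172", "172", "PA-28", "PA-28", "PA-28"],
        "Total.Fatal.Injuries": [0, 2, 1, 0, 0],
        "Total.Serious.Injuries": [1, 0, 0, 0, 1],
        "Total.Minor.Injuries": [0, 0, 1, 0, 0],
        "Total.Uninjured": [1, 0, 0, 2, 1],
    }
)


def safety_metrics(df):
    """Summarise accident count and fatality rate per make and model."""
    injury_cols = [
        "Total.Fatal.Injuries",
        "Total.Serious.Injuries",
        "Total.Minor.Injuries",
        "Total.Uninjured",
    ]
    grouped = df.groupby(["Make", "Model"])
    summary = grouped[injury_cols].sum()
    summary["accident_count"] = grouped.size()
    # Assumed definition: fatalities / everyone recorded in the injury columns.
    people = summary[injury_cols].sum(axis=1)
    # where() turns a zero denominator into NaN instead of dividing by zero.
    summary["fatality_rate"] = summary["Total.Fatal.Injuries"] / people.where(people > 0)
    return summary.sort_values("fatality_rate")


metrics = safety_metrics(toy)
print(metrics[["accident_count", "fatality_rate"]])
```

Sorting by `fatality_rate` puts the lowest-risk types first under this definition. Accident counts alone say nothing about how many aircraft of each type are flying, so they are best read alongside the rate rather than on their own.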


In [1]:
# importing libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns



In [2]:
# Loading the accident data into the notebook, using the first column (Event.Id) as the index
aviation_data = pd.read_csv("../Data/AviationData.csv", index_col=0, encoding="latin1", low_memory=False)

### 3.1 Exploring the data

In [3]:
aviation_data.head() # checking the first five rows of the dataframe

Event.Id,Investigation.Type,Accident.Number,Event.Date,Location,Country,Latitude,Longitude,Airport.Code,Airport.Name,Injury.Severity,...,Purpose.of.flight,Air.carrier,Total.Fatal.Injuries,Total.Serious.Injuries,Total.Minor.Injuries,Total.Uninjured,Weather.Condition,Broad.phase.of.flight,Report.Status,Publication.Date
20001218X45444,Accident,SEA87LA080,1948-10-24,"MOOSE CREEK, ID",United States,,,,,Fatal(2),...,Personal,,2.0,0.0,0.0,0.0,UNK,Cruise,Probable Cause,
20001218X45447,Accident,LAX94LA336,1962-07-19,"BRIDGEPORT, CA",United States,,,,,Fatal(4),...,Personal,,4.0,0.0,0.0,0.0,UNK,Unknown,Probable Cause,19-09-1996
20061025X01555,Accident,NYC07LA005,1974-08-30,"Saltville, VA",United States,36.922223,-81.878056,,,Fatal(3),...,Personal,,3.0,,,,IMC,Cruise,Probable Cause,26-02-2007
20001218X45448,Accident,LAX96LA321,1977-06-19,"EUREKA, CA",United States,,,,,Fatal(2),...,Personal,,2.0,0.0,0.0,0.0,IMC,Cruise,Probable Cause,12-09-2000
20041105X01764,Accident,CHI79FA064,1979-08-02,"Canton, OH",United States,,,,,Fatal(1),...,Personal,,1.0,2.0,,0.0,VMC,Approach,Probable Cause,16-04-1980


In [4]:
aviation_data.tail() # checking the last five rows of the dataframe

Event.Id,Investigation.Type,Accident.Number,Event.Date,Location,Country,Latitude,Longitude,Airport.Code,Airport.Name,Injury.Severity,...,Purpose.of.flight,Air.carrier,Total.Fatal.Injuries,Total.Serious.Injuries,Total.Minor.Injuries,Total.Uninjured,Weather.Condition,Broad.phase.of.flight,Report.Status,Publication.Date
20221227106491,Accident,ERA23LA093,2022-12-26,"Annapolis, MD",United States,,,,,Minor,...,Personal,,0.0,1.0,0.0,0.0,,,,29-12-2022
20221227106494,Accident,ERA23LA095,2022-12-26,"Hampton, NH",United States,,,,,,...,,,0.0,0.0,0.0,0.0,,,,
20221227106497,Accident,WPR23LA075,2022-12-26,"Payson, AZ",United States,341525N,1112021W,PAN,PAYSON,Non-Fatal,...,Personal,,0.0,0.0,0.0,1.0,VMC,,,27-12-2022
20221227106498,Accident,WPR23LA076,2022-12-26,"Morgan, UT",United States,,,,,,...,Personal,MC CESSNA 210N LLC,0.0,0.0,0.0,0.0,,,,
20221230106513,Accident,ERA23LA097,2022-12-29,"Athens, GA",United States,,,,,Minor,...,Personal,,0.0,1.0,0.0,1.0,,,,30-12-2022


In [5]:
aviation_data.info() # getting concise information about the dataframe

<class 'pandas.core.frame.DataFrame'>
Index: 88889 entries, 20001218X45444 to 20221230106513
Data columns (total 30 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Investigation.Type      88889 non-null  object 
 1   Accident.Number         88889 non-null  object 
 2   Event.Date              88889 non-null  object 
 3   Location                88837 non-null  object 
 4   Country                 88663 non-null  object 
 5   Latitude                34382 non-null  object 
 6   Longitude               34373 non-null  object 
 7   Airport.Code            50132 non-null  object 
 8   Airport.Name            52704 non-null  object 
 9   Injury.Severity         87889 non-null  object 
 10  Aircraft.damage         85695 non-null  object 
 11  Aircraft.Category       32287 non-null  object 
 12  Registration.Number     87507 non-null  object 
 13  Make                    88826 non-null  object 
 14  Model                

Only the first three columns (`Investigation.Type`, `Accident.Number`, and `Event.Date`) are fully populated; every other column has missing values.

In [6]:
aviation_data.shape # checking the dataframe dimensions, i.e. the number of rows and columns

(88889, 30)

The dataset contains 88,889 rows and 30 columns.

## 4. Data Preparation

### 4.1 Checking for missing values

In [7]:
aviation_data.info()

<class 'pandas.core.frame.DataFrame'>
Index: 88889 entries, 20001218X45444 to 20221230106513
Data columns (total 30 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Investigation.Type      88889 non-null  object 
 1   Accident.Number         88889 non-null  object 
 2   Event.Date              88889 non-null  object 
 3   Location                88837 non-null  object 
 4   Country                 88663 non-null  object 
 5   Latitude                34382 non-null  object 
 6   Longitude               34373 non-null  object 
 7   Airport.Code            50132 non-null  object 
 8   Airport.Name            52704 non-null  object 
 9   Injury.Severity         87889 non-null  object 
 10  Aircraft.damage         85695 non-null  object 
 11  Aircraft.Category       32287 non-null  object 
 12  Registration.Number     87507 non-null  object 
 13  Make                    88826 non-null  object 
 14  Model                

In [8]:
# Percentage of missing values per column, sorted from highest to lowest
missing_values_percentage = aviation_data.isnull().mean().sort_values(ascending=False) * 100
missing_values_percentage

Schedule                  85.845268
Air.carrier               81.271023
FAR.Description           63.974170
Aircraft.Category         63.677170
Longitude                 61.330423
Latitude                  61.320298
Airport.Code              43.601570
Airport.Name              40.708074
Broad.phase.of.flight     30.560587
Publication.Date          15.492356
Total.Serious.Injuries    14.073732
Total.Minor.Injuries      13.424608
Total.Fatal.Injuries      12.826109
Engine.Type                7.982990
Report.Status              7.181991
Purpose.of.flight          6.965991
Number.of.Engines          6.844491
Total.Uninjured            6.650992
Weather.Condition          5.053494
Aircraft.damage            3.593246
Registration.Number        1.554748
Injury.Severity            1.124999
Country                    0.254250
Amateur.Built              0.114750
Model                      0.103500
Make                       0.070875
Location                   0.058500
Accident.Number            0

In [9]:
# Columns with more than 60 percent missing values, selected by threshold rather than by position
columns_to_drop = missing_values_percentage[missing_values_percentage > 60]
columns_to_drop

Schedule             85.845268
Air.carrier          81.271023
FAR.Description      63.974170
Aircraft.Category    63.677170
Longitude            61.330423
Latitude             61.320298
dtype: float64

In [10]:
# Dropping columns that have more than 60 percent missing values
aviation_data = aviation_data.drop(columns=columns_to_drop.index)
aviation_data

Event.Id,Investigation.Type,Accident.Number,Event.Date,Location,Country,Airport.Code,Airport.Name,Injury.Severity,Aircraft.damage,Registration.Number,...,Engine.Type,Purpose.of.flight,Total.Fatal.Injuries,Total.Serious.Injuries,Total.Minor.Injuries,Total.Uninjured,Weather.Condition,Broad.phase.of.flight,Report.Status,Publication.Date
20001218X45444,Accident,SEA87LA080,1948-10-24,"MOOSE CREEK, ID",United States,,,Fatal(2),Destroyed,NC6404,...,Reciprocating,Personal,2.0,0.0,0.0,0.0,UNK,Cruise,Probable Cause,
20001218X45447,Accident,LAX94LA336,1962-07-19,"BRIDGEPORT, CA",United States,,,Fatal(4),Destroyed,N5069P,...,Reciprocating,Personal,4.0,0.0,0.0,0.0,UNK,Unknown,Probable Cause,19-09-1996
20061025X01555,Accident,NYC07LA005,1974-08-30,"Saltville, VA",United States,,,Fatal(3),Destroyed,N5142R,...,Reciprocating,Personal,3.0,,,,IMC,Cruise,Probable Cause,26-02-2007
20001218X45448,Accident,LAX96LA321,1977-06-19,"EUREKA, CA",United States,,,Fatal(2),Destroyed,N1168J,...,Reciprocating,Personal,2.0,0.0,0.0,0.0,IMC,Cruise,Probable Cause,12-09-2000
20041105X01764,Accident,CHI79FA064,1979-08-02,"Canton, OH",United States,,,Fatal(1),Destroyed,N15NY,...,,Personal,1.0,2.0,,0.0,VMC,Approach,Probable Cause,16-04-1980
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
20221227106491,Accident,ERA23LA093,2022-12-26,"Annapolis, MD",United States,,,Minor,,N1867H,...,,Personal,0.0,1.0,0.0,0.0,,,,29-12-2022
20221227106494,Accident,ERA23LA095,2022-12-26,"Hampton, NH",United States,,,,,N2895Z,...,,,0.0,0.0,0.0,0.0,,,,
20221227106497,Accident,WPR23LA075,2022-12-26,"Payson, AZ",United States,PAN,PAYSON,Non-Fatal,Substantial,N749PJ,...,,Personal,0.0,0.0,0.0,1.0,VMC,,,27-12-2022
20221227106498,Accident,WPR23LA076,2022-12-26,"Morgan, UT",United States,,,,,N210CU,...,,Personal,0.0,0.0,0.0,0.0,,,,


In [11]:
aviation_data.info()

<class 'pandas.core.frame.DataFrame'>
Index: 88889 entries, 20001218X45444 to 20221230106513
Data columns (total 24 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Investigation.Type      88889 non-null  object 
 1   Accident.Number         88889 non-null  object 
 2   Event.Date              88889 non-null  object 
 3   Location                88837 non-null  object 
 4   Country                 88663 non-null  object 
 5   Airport.Code            50132 non-null  object 
 6   Airport.Name            52704 non-null  object 
 7   Injury.Severity         87889 non-null  object 
 8   Aircraft.damage         85695 non-null  object 
 9   Registration.Number     87507 non-null  object 
 10  Make                    88826 non-null  object 
 11  Model                   88797 non-null  object 
 12  Amateur.Built           88787 non-null  object 
 13  Number.of.Engines       82805 non-null  float64
 14  Engine.Type          

### 4.2 Filling the missing values

In [12]:
# Text-valued columns whose missing entries will be filled with a placeholder
categorical_columns = [
    "Investigation.Type", "Accident.Number", "Event.Date", "Location", "Country",
    "Airport.Code", "Airport.Name", "Injury.Severity", "Aircraft.damage",
    "Registration.Number", "Make", "Model", "Amateur.Built", "Purpose.of.flight",
    "Weather.Condition", "Broad.phase.of.flight", "Report.Status", "Publication.Date",
]
categorical_columns

['Investigation.Type',
 'Accident.Number',
 'Event.Date',
 'Location',
 'Country',
 'Airport.Code',
 'Airport.Name',
 'Injury.Severity',
 'Aircraft.damage',
 'Registration.Number',
 'Make',
 'Model',
 'Amateur.Built',
 'Purpose.of.flight',
 'Weather.Condition',
 'Broad.phase.of.flight',
 'Report.Status',
 'Publication.Date']
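
Typing the column names by hand is easy to get wrong; `Engine.Type`, for example, is not in the list above. As an alternative, the two groups can be derived from the column dtypes. The following is a minimal sketch on a toy frame with made-up values, and the variable names `categorical` and `numerical` are illustrative:

```python
import pandas as pd

# Toy frame with two text columns and two numeric columns (values are made up).
toy = pd.DataFrame(
    {
        "Make": ["Cessna", None],
        "Weather.Condition": ["VMC", "IMC"],
        "Total.Fatal.Injuries": [0.0, 2.0],
        "Number.of.Engines": [1.0, None],
    }
)

# Numeric columns are selected directly; everything else is treated as categorical.
numerical = toy.select_dtypes(include="number").columns.tolist()
categorical = toy.select_dtypes(exclude="number").columns.tolist()

print(categorical)
print(numerical)
```

Selecting by dtype keeps the lists in sync with the dataframe after columns are dropped, at the cost of treating every non-numeric column (dates included) as categorical.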

In [17]:
# Filling missing values in the categorical columns with the placeholder "unknown"
aviation_data[categorical_columns] = aviation_data[categorical_columns].fillna("unknown")
aviation_data

Event.Id,Investigation.Type,Accident.Number,Event.Date,Location,Country,Airport.Code,Airport.Name,Injury.Severity,Aircraft.damage,Registration.Number,...,Engine.Type,Purpose.of.flight,Total.Fatal.Injuries,Total.Serious.Injuries,Total.Minor.Injuries,Total.Uninjured,Weather.Condition,Broad.phase.of.flight,Report.Status,Publication.Date
20001218X45444,Accident,SEA87LA080,1948-10-24,"MOOSE CREEK, ID",United States,unknown,unknown,Fatal(2),Destroyed,NC6404,...,Reciprocating,Personal,2.0,0.0,0.0,0.0,UNK,Cruise,Probable Cause,unknown
20001218X45447,Accident,LAX94LA336,1962-07-19,"BRIDGEPORT, CA",United States,unknown,unknown,Fatal(4),Destroyed,N5069P,...,Reciprocating,Personal,4.0,0.0,0.0,0.0,UNK,Unknown,Probable Cause,19-09-1996
20061025X01555,Accident,NYC07LA005,1974-08-30,"Saltville, VA",United States,unknown,unknown,Fatal(3),Destroyed,N5142R,...,Reciprocating,Personal,3.0,0.0,0.0,0.0,IMC,Cruise,Probable Cause,26-02-2007
20001218X45448,Accident,LAX96LA321,1977-06-19,"EUREKA, CA",United States,unknown,unknown,Fatal(2),Destroyed,N1168J,...,Reciprocating,Personal,2.0,0.0,0.0,0.0,IMC,Cruise,Probable Cause,12-09-2000
20041105X01764,Accident,CHI79FA064,1979-08-02,"Canton, OH",United States,unknown,unknown,Fatal(1),Destroyed,N15NY,...,,Personal,1.0,2.0,0.0,0.0,VMC,Approach,Probable Cause,16-04-1980
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
20221227106491,Accident,ERA23LA093,2022-12-26,"Annapolis, MD",United States,unknown,unknown,Minor,unknown,N1867H,...,,Personal,0.0,1.0,0.0,0.0,unknown,unknown,unknown,29-12-2022
20221227106494,Accident,ERA23LA095,2022-12-26,"Hampton, NH",United States,unknown,unknown,unknown,unknown,N2895Z,...,,unknown,0.0,0.0,0.0,0.0,unknown,unknown,unknown,unknown
20221227106497,Accident,WPR23LA075,2022-12-26,"Payson, AZ",United States,PAN,PAYSON,Non-Fatal,Substantial,N749PJ,...,,Personal,0.0,0.0,0.0,1.0,VMC,unknown,unknown,27-12-2022
20221227106498,Accident,WPR23LA076,2022-12-26,"Morgan, UT",United States,unknown,unknown,unknown,unknown,N210CU,...,,Personal,0.0,0.0,0.0,0.0,unknown,unknown,unknown,unknown


In [18]:
aviation_data.sample(10)

Event.Id,Investigation.Type,Accident.Number,Event.Date,Location,Country,Airport.Code,Airport.Name,Injury.Severity,Aircraft.damage,Registration.Number,...,Engine.Type,Purpose.of.flight,Total.Fatal.Injuries,Total.Serious.Injuries,Total.Minor.Injuries,Total.Uninjured,Weather.Condition,Broad.phase.of.flight,Report.Status,Publication.Date
20080304X00254,Incident,ENG08RA015,2008-01-24,"Kingston, Jamaica",Jamaica,unknown,unknown,unknown,unknown,6Y-JRG,...,,unknown,0.0,0.0,0.0,0.0,unknown,unknown,unknown,25-09-2020
20020613X00875,Accident,LAX02LA186,2002-06-07,"HENDERSON, NV",United States,HND,Henderson,Non-Fatal,Substantial,N49672,...,Reciprocating,Instructional,0.0,0.0,1.0,1.0,VMC,Landing,Probable Cause,25-11-2003
20080811X01206,Accident,DFW08WA201,2008-08-02,"Guadalajara, Mexico",Mexico,MMGL,Guadalajara Intl Airport,Non-Fatal,Destroyed,XB-KPB,...,,Unknown,0.0,0.0,0.0,6.0,VMC,unknown,unknown,25-09-2020
20020917X03715,Accident,LAX82FVD20,1982-08-04,"LIVERMORE, CA",United States,unknown,unknown,Non-Fatal,Substantial,N6976N,...,Reciprocating,Business,0.0,0.0,0.0,1.0,VMC,Climb,Probable Cause,04-08-1983
20001208X07157,Accident,FTW97LA058,1996-12-09,EAST CAMERON 71,unknown,unknown,unknown,Non-Fatal,Substantial,N390MA,...,Turbo Shaft,Unknown,0.0,0.0,0.0,2.0,VMC,Maneuvering,Probable Cause,29-08-1997
20001213X29196,Accident,MIA89LA232,1989-08-25,"EDWARDS, MS",United States,unknown,unknown,Non-Fatal,Substantial,N53327,...,Reciprocating,Aerial Application,0.0,0.0,0.0,1.0,VMC,Maneuvering,Probable Cause,30-09-1991
20001213X29292,Accident,ATL89FA214,1989-09-17,"GORDON, AL",United States,unknown,unknown,Fatal(3),Substantial,N7503J,...,Reciprocating,Personal,3.0,0.0,1.0,0.0,VMC,Cruise,Probable Cause,02-08-1990
20100513X45441,Accident,CEN09WA617,2009-05-31,"Strovikion, Greece",Greece,unknown,Airfield Kopaidas,Fatal,Substantial,OK-JUG19,...,Reciprocating,unknown,2.0,0.0,0.0,0.0,unknown,unknown,unknown,03-11-2020
20001211X14470,Accident,LAX92T#A04,1992-04-20,"FORT IRWIN, CA",United States,unknown,unknown,Non-Fatal,Substantial,N5736N,...,Unknown,Public Aircraft,0.0,0.0,1.0,0.0,VMC,unknown,Factual,05-08-1996
20120503X01852,Accident,CEN12LA267,2012-05-02,"Valley Falls, KS",United States,unknown,unknown,Non-Fatal,Substantial,N85RB,...,Turbo Shaft,Aerial Observation,0.0,3.0,0.0,0.0,VMC,unknown,The pilots inadequate compensation for the wi...,25-09-2020


In [14]:
numericals_columns = ["Total.Fatal.Injuries", "Total.Serious.Injuries", "Total.Minor.Injuries", "Total.Uninjured", "Number.of.Engines"]
numericals_columns

['Total.Fatal.Injuries',
 'Total.Serious.Injuries',
 'Total.Minor.Injuries',
 'Total.Uninjured',
 'Number.of.Engines']

In [15]:
# Filling missing values in the numerical columns with 0
aviation_data[numericals_columns] = aviation_data[numericals_columns].fillna(0)
aviation_data

Event.Id,Investigation.Type,Accident.Number,Event.Date,Location,Country,Airport.Code,Airport.Name,Injury.Severity,Aircraft.damage,Registration.Number,...,Engine.Type,Purpose.of.flight,Total.Fatal.Injuries,Total.Serious.Injuries,Total.Minor.Injuries,Total.Uninjured,Weather.Condition,Broad.phase.of.flight,Report.Status,Publication.Date
20001218X45444,Accident,SEA87LA080,1948-10-24,"MOOSE CREEK, ID",United States,unknown,unknown,Fatal(2),Destroyed,NC6404,...,Reciprocating,Personal,2.0,0.0,0.0,0.0,UNK,Cruise,Probable Cause,unknown
20001218X45447,Accident,LAX94LA336,1962-07-19,"BRIDGEPORT, CA",United States,unknown,unknown,Fatal(4),Destroyed,N5069P,...,Reciprocating,Personal,4.0,0.0,0.0,0.0,UNK,Unknown,Probable Cause,19-09-1996
20061025X01555,Accident,NYC07LA005,1974-08-30,"Saltville, VA",United States,unknown,unknown,Fatal(3),Destroyed,N5142R,...,Reciprocating,Personal,3.0,0.0,0.0,0.0,IMC,Cruise,Probable Cause,26-02-2007
20001218X45448,Accident,LAX96LA321,1977-06-19,"EUREKA, CA",United States,unknown,unknown,Fatal(2),Destroyed,N1168J,...,Reciprocating,Personal,2.0,0.0,0.0,0.0,IMC,Cruise,Probable Cause,12-09-2000
20041105X01764,Accident,CHI79FA064,1979-08-02,"Canton, OH",United States,unknown,unknown,Fatal(1),Destroyed,N15NY,...,,Personal,1.0,2.0,0.0,0.0,VMC,Approach,Probable Cause,16-04-1980
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
20221227106491,Accident,ERA23LA093,2022-12-26,"Annapolis, MD",United States,unknown,unknown,Minor,unknown,N1867H,...,,Personal,0.0,1.0,0.0,0.0,unknown,unknown,unknown,29-12-2022
20221227106494,Accident,ERA23LA095,2022-12-26,"Hampton, NH",United States,unknown,unknown,unknown,unknown,N2895Z,...,,unknown,0.0,0.0,0.0,0.0,unknown,unknown,unknown,unknown
20221227106497,Accident,WPR23LA075,2022-12-26,"Payson, AZ",United States,PAN,PAYSON,Non-Fatal,Substantial,N749PJ,...,,Personal,0.0,0.0,0.0,1.0,VMC,unknown,unknown,27-12-2022
20221227106498,Accident,WPR23LA076,2022-12-26,"Morgan, UT",United States,unknown,unknown,unknown,unknown,N210CU,...,,Personal,0.0,0.0,0.0,0.0,unknown,unknown,unknown,unknown


In [16]:
aviation_data.sample(10)

Event.Id,Investigation.Type,Accident.Number,Event.Date,Location,Country,Airport.Code,Airport.Name,Injury.Severity,Aircraft.damage,Registration.Number,...,Engine.Type,Purpose.of.flight,Total.Fatal.Injuries,Total.Serious.Injuries,Total.Minor.Injuries,Total.Uninjured,Weather.Condition,Broad.phase.of.flight,Report.Status,Publication.Date
20001213X25058,Accident,DEN88LA077,1988-02-21,"GILA, NM",United States,unknown,unknown,Non-Fatal,Substantial,N2359E,...,Reciprocating,Personal,0.0,1.0,1.0,0.0,VMC,Maneuvering,Probable Cause,30-03-1989
20130911X90214,Accident,ERA13WA411,2013-09-09,"Valparaiso, Chile",Chile,unknown,Vina Del Mar,Fatal,Destroyed,CC-CNW,...,,unknown,2.0,0.0,0.0,0.0,unknown,unknown,"On September 9, 2013, about 0940 universal coo...",03-11-2020
20100913X14749,Accident,ANC10LA084,2010-09-11,"South Naknek, AK",United States,unknown,unknown,Non-Fatal,Substantial,N7414K,...,Reciprocating,Personal,0.0,0.0,0.0,1.0,VMC,unknown,The loss of engine power during cruise flight ...,25-09-2020
20080118X00073,Accident,SEA08LA061,2008-01-14,"San Francisco, CA",United States,KSFO,San Francisco International,Non-Fatal,Substantial,N705SK,...,Turbo Fan,unknown,0.0,0.0,0.0,61.0,VMC,unknown,The company tug operator of the other airplane...,25-09-2020
20001212X17733,Accident,ANC91FA128,1991-08-16,"RUBY, AK",United States,unknown,unknown,Fatal(3),Destroyed,N9026E,...,Reciprocating,Personal,3.0,0.0,0.0,0.0,VMC,Maneuvering,Probable Cause,23-07-1993
20090518X43152,Accident,ERA09LA293,2009-05-17,"Camden, NC",United States,unknown,unknown,Non-Fatal,Substantial,N92RG,...,Reciprocating,Personal,0.0,0.0,1.0,0.0,VMC,unknown,A loss of engine power due to an in-flight sep...,25-09-2020
20210518103095,Accident,CEN21LA222,2021-05-18,"Racine, WI",United States,RAC,BATTEN INTL,Non-Fatal,Destroyed,N521CT,...,,Instructional,0.0,1.0,0.0,0.0,VMC,unknown,unknown,16-06-2021
20030702X01010,Accident,LAX03WA222,2003-06-20,"Camden, Australia",Australia,unknown,unknown,Fatal(2),Destroyed,unknown,...,,Instructional,2.0,0.0,0.0,0.0,VMC,unknown,Foreign,02-07-2003
20060707X00889,Accident,CHI06CA156,2006-06-19,"COOPERSTOWN, ND",United States,unknown,unknown,Non-Fatal,Substantial,N9073G,...,Reciprocating,Personal,0.0,0.0,0.0,1.0,VMC,Landing,Probable Cause,03-10-2006
20211118104270,Accident,ERA22LA066,2021-11-12,"St. Petersburg, FL",United States,unknown,unknown,Non-Fatal,Substantial,N5862W,...,,Personal,0.0,1.0,0.0,0.0,VMC,unknown,unknown,unknown
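
Once both fills have run, it is worth confirming that nothing was missed. Neither column list above includes `Engine.Type`, and the previews still show empty values in that column. A small check along these lines, shown here on a made-up frame rather than the real data, lists any column that still contains missing values:

```python
import numpy as np
import pandas as pd

# Toy frame in which one column was left out of the fill step (values are made up).
toy = pd.DataFrame(
    {
        "Make": ["Cessna", "unknown"],
        "Engine.Type": ["Reciprocating", np.nan],
        "Total.Fatal.Injuries": [0.0, 2.0],
    }
)

# Count missing values per column and keep only the columns that still have some.
remaining = toy.isna().sum()
remaining = remaining[remaining > 0]

print(remaining)
```

An empty result would mean every column is fully populated; any column that shows up still needs a fill or a deliberate decision to leave it as is.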


### 4.3 Checking and handling duplicated values
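
One possible approach is sketched below on a toy frame. It assumes that a duplicate means a row that is identical in every column, including the `Event.Id` index; whether a repeated `Event.Id` alone should count as a duplicate is a separate decision that the data would need to inform. The variable names are illustrative.

```python
import pandas as pd

# Toy frame indexed by Event.Id, with one fully repeated row (values are made up).
toy = pd.DataFrame(
    {
        "Accident.Number": ["A1", "A1", "B2"],
        "Make": ["Cessna", "Cessna", "Piper"],
    },
    index=pd.Index(["E1", "E1", "E2"], name="Event.Id"),
)

# Move the index into a regular column so it takes part in the comparison.
flat = toy.reset_index()

# duplicated() marks every repeat of an earlier, identical row.
n_duplicates = int(flat.duplicated().sum())

# Keep the first occurrence of each row and restore the Event.Id index.
deduplicated = flat.drop_duplicates().set_index("Event.Id")

print(n_duplicates)
print(deduplicated)
```

Resetting the index first matters because `duplicated()` and `drop_duplicates()` compare column values only, so two different events with otherwise identical fields would be merged if `Event.Id` stayed in the index.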

## 5. Exploratory Data Analysis

## 6. Recommendations