# AVIATION RISK ASSESSMENT ANALYSIS (PHASE 1 PROJECT)

##  Project Overview

The aim of this project is to analyze aviation accident data to identify the lowest-risk aircraft for potential purchase by our company. The work involves data cleaning, imputation, analysis and visualization, culminating in actionable insights and recommendations for business stakeholders.




##  Business Understanding

Our company is expanding into the aviation industry and needs to understand the potential risks associated with different kinds of aircraft. Some of the key business questions are:
* What types of aircraft have the lowest risk of accidents?
* What types of aircraft have the highest number of fatalities?
* Which factors contribute most significantly to aviation accidents?

The stakeholders include:
* Head of Aviation division 
* Company Executives

##  Data Understanding
### Data Source and Description
The dataset is sourced from the National Transportation Safety Board and covers aviation incidents from 1962 to 2023.
It contains over 80,000 aviation accident records detailing _the date of the event, aircraft type, severity of the accident, number of casualties and fatalities, and more_.
The data is stored in a `.csv` file named `AviationData.csv`.


##  Data Preparation


We shall first import relevant libraries needed to work on the dataset.

In [960]:
# Importing libraries we will work with

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

The next step is to load the data.

In [961]:
# Load data
df = pd.read_csv('AviationData.csv', encoding='ISO-8859-1')



In [962]:
# Checking the dataset content
df

Unnamed: 0,Event.Id,Investigation.Type,Accident.Number,Event.Date,Location,Country,Latitude,Longitude,Airport.Code,Airport.Name,...,Purpose.of.flight,Air.carrier,Total.Fatal.Injuries,Total.Serious.Injuries,Total.Minor.Injuries,Total.Uninjured,Weather.Condition,Broad.phase.of.flight,Report.Status,Publication.Date
0,20001218X45444,Accident,SEA87LA080,1948-10-24,"MOOSE CREEK, ID",United States,,,,,...,Personal,,2.0,0.0,0.0,0.0,UNK,Cruise,Probable Cause,
1,20001218X45447,Accident,LAX94LA336,1962-07-19,"BRIDGEPORT, CA",United States,,,,,...,Personal,,4.0,0.0,0.0,0.0,UNK,Unknown,Probable Cause,19-09-1996
2,20061025X01555,Accident,NYC07LA005,1974-08-30,"Saltville, VA",United States,36.922223,-81.878056,,,...,Personal,,3.0,,,,IMC,Cruise,Probable Cause,26-02-2007
3,20001218X45448,Accident,LAX96LA321,1977-06-19,"EUREKA, CA",United States,,,,,...,Personal,,2.0,0.0,0.0,0.0,IMC,Cruise,Probable Cause,12-09-2000
4,20041105X01764,Accident,CHI79FA064,1979-08-02,"Canton, OH",United States,,,,,...,Personal,,1.0,2.0,,0.0,VMC,Approach,Probable Cause,16-04-1980
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
88884,20221227106491,Accident,ERA23LA093,2022-12-26,"Annapolis, MD",United States,,,,,...,Personal,,0.0,1.0,0.0,0.0,,,,29-12-2022
88885,20221227106494,Accident,ERA23LA095,2022-12-26,"Hampton, NH",United States,,,,,...,,,0.0,0.0,0.0,0.0,,,,
88886,20221227106497,Accident,WPR23LA075,2022-12-26,"Payson, AZ",United States,341525N,1112021W,PAN,PAYSON,...,Personal,,0.0,0.0,0.0,1.0,VMC,,,27-12-2022
88887,20221227106498,Accident,WPR23LA076,2022-12-26,"Morgan, UT",United States,,,,,...,Personal,MC CESSNA 210N LLC,0.0,0.0,0.0,0.0,,,,


The dataset contains 88889 rows and 31 columns.

In [963]:
df.head()

Unnamed: 0,Event.Id,Investigation.Type,Accident.Number,Event.Date,Location,Country,Latitude,Longitude,Airport.Code,Airport.Name,...,Purpose.of.flight,Air.carrier,Total.Fatal.Injuries,Total.Serious.Injuries,Total.Minor.Injuries,Total.Uninjured,Weather.Condition,Broad.phase.of.flight,Report.Status,Publication.Date
0,20001218X45444,Accident,SEA87LA080,1948-10-24,"MOOSE CREEK, ID",United States,,,,,...,Personal,,2.0,0.0,0.0,0.0,UNK,Cruise,Probable Cause,
1,20001218X45447,Accident,LAX94LA336,1962-07-19,"BRIDGEPORT, CA",United States,,,,,...,Personal,,4.0,0.0,0.0,0.0,UNK,Unknown,Probable Cause,19-09-1996
2,20061025X01555,Accident,NYC07LA005,1974-08-30,"Saltville, VA",United States,36.922223,-81.878056,,,...,Personal,,3.0,,,,IMC,Cruise,Probable Cause,26-02-2007
3,20001218X45448,Accident,LAX96LA321,1977-06-19,"EUREKA, CA",United States,,,,,...,Personal,,2.0,0.0,0.0,0.0,IMC,Cruise,Probable Cause,12-09-2000
4,20041105X01764,Accident,CHI79FA064,1979-08-02,"Canton, OH",United States,,,,,...,Personal,,1.0,2.0,,0.0,VMC,Approach,Probable Cause,16-04-1980


Below we will see what columns are contained in the dataset.

In [964]:
columns = df.columns
columns

Index(['Event.Id', 'Investigation.Type', 'Accident.Number', 'Event.Date',
       'Location', 'Country', 'Latitude', 'Longitude', 'Airport.Code',
       'Airport.Name', 'Injury.Severity', 'Aircraft.damage',
       'Aircraft.Category', 'Registration.Number', 'Make', 'Model',
       'Amateur.Built', 'Number.of.Engines', 'Engine.Type', 'FAR.Description',
       'Schedule', 'Purpose.of.flight', 'Air.carrier', 'Total.Fatal.Injuries',
       'Total.Serious.Injuries', 'Total.Minor.Injuries', 'Total.Uninjured',
       'Weather.Condition', 'Broad.phase.of.flight', 'Report.Status',
       'Publication.Date'],
      dtype='object')

In [965]:
# Shape of the dataset
df.shape

(88889, 31)

In the cell below we use the `describe` method to get a statistical summary of our data, specifically the columns containing numerical values.

In [966]:
df.describe()

Unnamed: 0,Number.of.Engines,Total.Fatal.Injuries,Total.Serious.Injuries,Total.Minor.Injuries,Total.Uninjured
count,82805.0,77488.0,76379.0,76956.0,82977.0
mean,1.146585,0.647855,0.279881,0.357061,5.32544
std,0.44651,5.48596,1.544084,2.235625,27.913634
min,0.0,0.0,0.0,0.0,0.0
25%,1.0,0.0,0.0,0.0,0.0
50%,1.0,0.0,0.0,0.0,1.0
75%,1.0,0.0,0.0,0.0,2.0
max,8.0,349.0,161.0,380.0,699.0


In [967]:
# Checking information in the dataset
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 88889 entries, 0 to 88888
Data columns (total 31 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Event.Id                88889 non-null  object 
 1   Investigation.Type      88889 non-null  object 
 2   Accident.Number         88889 non-null  object 
 3   Event.Date              88889 non-null  object 
 4   Location                88837 non-null  object 
 5   Country                 88663 non-null  object 
 6   Latitude                34382 non-null  object 
 7   Longitude               34373 non-null  object 
 8   Airport.Code            50132 non-null  object 
 9   Airport.Name            52704 non-null  object 
 10  Injury.Severity         87889 non-null  object 
 11  Aircraft.damage         85695 non-null  object 
 12  Aircraft.Category       32287 non-null  object 
 13  Registration.Number     87507 non-null  object 
 14  Make                    88826 non-null

## Data Cleaning
Our next step is cleaning the data. Identifying issues such as mixed data types, missing values and duplicates is crucial in this phase.

#### Handling mixed data types
In the cell below we define a function that flags columns holding more than one Python type, then list those types for each flagged column.

In [968]:
# Checking for columns with mixed data types

# Function to identify mixed data types in a column
def check_mixed_types(column):
    types = column.apply(type).unique()
    return len(types) > 1

# Applying the function to each column
mixed_type_columns = {col: df[col].apply(type).unique() for col in df.columns if check_mixed_types(df[col])}

# Displaying columns with mixed data types and their unique types
for col, types in mixed_type_columns.items():
    print(f"Column '{col}' has mixed types: {types}")

Column 'Location' has mixed types: [<class 'str'> <class 'float'>]
Column 'Country' has mixed types: [<class 'str'> <class 'float'>]
Column 'Latitude' has mixed types: [<class 'float'> <class 'str'>]
Column 'Longitude' has mixed types: [<class 'float'> <class 'str'>]
Column 'Airport.Code' has mixed types: [<class 'float'> <class 'str'>]
Column 'Airport.Name' has mixed types: [<class 'float'> <class 'str'>]
Column 'Injury.Severity' has mixed types: [<class 'str'> <class 'float'>]
Column 'Aircraft.damage' has mixed types: [<class 'str'> <class 'float'>]
Column 'Aircraft.Category' has mixed types: [<class 'float'> <class 'str'>]
Column 'Registration.Number' has mixed types: [<class 'str'> <class 'float'>]
Column 'Make' has mixed types: [<class 'str'> <class 'float'>]
Column 'Model' has mixed types: [<class 'str'> <class 'float'>]
Column 'Amateur.Built' has mixed types: [<class 'str'> <class 'float'>]
Column 'Engine.Type' has mixed types: [<class 'str'> <class 'float'>]
Column 'FAR.Descrip
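
One likely reason every flagged column pairs `str` with `float` is that pandas stores missing values as `NaN`, which is a Python `float`. A minimal sketch on a made-up Series (the values are illustrative, not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical toy column: two strings and one missing value
sample = pd.Series(["Cessna", np.nan, "Piper"], dtype=object)

# NaN is stored as a float, so the column reports two Python types
type_names = sorted(t.__name__ for t in sample.apply(type).unique())
print(type_names)

# Ignoring the missing value leaves a single type
non_null_types = sample.dropna().apply(type).unique()
print(len(non_null_types))
```

If that is the case here, the "mixed types" are largely a symptom of missing data rather than of genuinely inconsistent entries.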

Our next step is to convert columns with mixed data types to consistent data types. In the cells below we convert numeric columns to numerical values and categorical columns to the `category` dtype (via string).

In [969]:
numeric_cols = ['Number.of.Engines', 'Total.Fatal.Injuries', 
                'Total.Serious.Injuries', 'Total.Minor.Injuries', 
                'Total.Uninjured']
for col in numeric_cols:
    df[col] = pd.to_numeric(df[col], errors = 'coerce')

In [970]:
# Cast categorical columns to the pandas 'category' dtype.
# Note: astype(str) turns missing values (NaN) into the literal string 'nan'.
categorical_cols = ['Injury.Severity', 'Aircraft.damage', 'Make', 'Model',
                    'Amateur.Built', 'Engine.Type', 'Purpose.of.flight',
                    'Weather.Condition', 'Broad.phase.of.flight', 'Report.Status']
for col in categorical_cols:
    df[col] = df[col].astype(str)
    df[col] = df[col].astype('category')
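
The effect of the intermediate `str` cast on missing values can be seen on a toy column. A minimal sketch, with hypothetical values, comparing it against casting straight to `category`:

```python
import numpy as np
import pandas as pd

# Hypothetical toy column with one missing value
toy = pd.Series(["VMC", np.nan, "IMC", "VMC"], dtype=object)

via_str = toy.astype(str).astype("category")  # same two-step cast as the cell above
direct = toy.astype("category")               # alternative: cast directly

print(via_str.isna().sum())                   # missing value has become the string 'nan'
print(direct.isna().sum())                    # missing value is preserved as NaN
print("nan" in via_str.cat.categories)
```

Casting directly would keep these columns visible in later missing-value counts, at the cost of having to impute or drop them explicitly.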

We will then check for duplicates in the `Event.Id` column, as it is our unique identifier (primary key) in the dataset.

In [971]:
# Creating a variable to find out how many duplicates are in Event.Id 
duplicated_event_id = df['Event.Id'].duplicated().sum()

print(f"There are {duplicated_event_id} duplicated event id's.")

There are 938 duplicated event id's.


In [972]:
# Dropping duplicates in Event.Id
df = df.drop_duplicates(subset = 'Event.Id')
df

Unnamed: 0,Event.Id,Investigation.Type,Accident.Number,Event.Date,Location,Country,Latitude,Longitude,Airport.Code,Airport.Name,...,Purpose.of.flight,Air.carrier,Total.Fatal.Injuries,Total.Serious.Injuries,Total.Minor.Injuries,Total.Uninjured,Weather.Condition,Broad.phase.of.flight,Report.Status,Publication.Date
0,20001218X45444,Accident,SEA87LA080,1948-10-24,"MOOSE CREEK, ID",United States,,,,,...,Personal,,2.0,0.0,0.0,0.0,UNK,Cruise,Probable Cause,
1,20001218X45447,Accident,LAX94LA336,1962-07-19,"BRIDGEPORT, CA",United States,,,,,...,Personal,,4.0,0.0,0.0,0.0,UNK,Unknown,Probable Cause,19-09-1996
2,20061025X01555,Accident,NYC07LA005,1974-08-30,"Saltville, VA",United States,36.922223,-81.878056,,,...,Personal,,3.0,,,,IMC,Cruise,Probable Cause,26-02-2007
3,20001218X45448,Accident,LAX96LA321,1977-06-19,"EUREKA, CA",United States,,,,,...,Personal,,2.0,0.0,0.0,0.0,IMC,Cruise,Probable Cause,12-09-2000
4,20041105X01764,Accident,CHI79FA064,1979-08-02,"Canton, OH",United States,,,,,...,Personal,,1.0,2.0,,0.0,VMC,Approach,Probable Cause,16-04-1980
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
88884,20221227106491,Accident,ERA23LA093,2022-12-26,"Annapolis, MD",United States,,,,,...,Personal,,0.0,1.0,0.0,0.0,,,,29-12-2022
88885,20221227106494,Accident,ERA23LA095,2022-12-26,"Hampton, NH",United States,,,,,...,,,0.0,0.0,0.0,0.0,,,,
88886,20221227106497,Accident,WPR23LA075,2022-12-26,"Payson, AZ",United States,341525N,1112021W,PAN,PAYSON,...,Personal,,0.0,0.0,0.0,1.0,VMC,,,27-12-2022
88887,20221227106498,Accident,WPR23LA076,2022-12-26,"Morgan, UT",United States,,,,,...,Personal,MC CESSNA 210N LLC,0.0,0.0,0.0,0.0,,,,


In [973]:
# Confirming if there are any more duplicates in Event.Id
df['Event.Id'].duplicated().any()

False
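
Before dropping, it can help to look at the repeated rows themselves, since `drop_duplicates` keeps only the first occurrence of each `Event.Id` by default. A minimal sketch on a hypothetical three-row frame (the IDs and makes are made up):

```python
import pandas as pd

# Hypothetical toy data: event "A1" appears twice
toy = pd.DataFrame({
    "Event.Id": ["A1", "A1", "B2"],
    "Make": ["Cessna", "Piper", "Boeing"],
})

# keep=False flags every row that shares an Event.Id with another row
repeated = toy[toy["Event.Id"].duplicated(keep=False)]
print(repeated)

# The default keep='first' retains the first row per Event.Id
deduped = toy.drop_duplicates(subset="Event.Id")
print(deduped["Make"].tolist())
```

Inspecting `repeated` on the real data would show whether the duplicated IDs are true repeats or distinct records that happen to share an event.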

#### Handling missing values
We will drop columns with a high percentage of missing values and, for the remaining columns, replace missing values with measures of central tendency.

In [974]:
# Checking count for missing values in percentages
missing_vals = df.isnull().sum()

missing_vals_percentage = (missing_vals/len(df))*100
missing_vals_percentage

Event.Id                   0.000000
Investigation.Type         0.000000
Accident.Number            0.000000
Event.Date                 0.000000
Location                   0.059124
Country                    0.252413
Latitude                  61.101068
Longitude                 61.111301
Airport.Code              43.736853
Airport.Name              40.840923
Injury.Severity            0.000000
Aircraft.damage            0.000000
Aircraft.Category         63.410308
Registration.Number        1.534946
Make                       0.000000
Model                      0.000000
Amateur.Built              0.000000
Number.of.Engines          6.852679
Engine.Type                0.000000
FAR.Description           63.712749
Schedule                  85.946720
Purpose.of.flight          0.000000
Air.carrier               81.202033
Total.Fatal.Injuries      12.810542
Total.Serious.Injuries    14.010074
Total.Minor.Injuries      13.371082
Total.Uninjured            6.666212
Weather.Condition          0

Columns such as `Latitude`, `Longitude`, `Airport.Code`, `Airport.Name`, `Aircraft.Category`, `FAR.Description`, `Schedule` and `Air.carrier` have a high percentage of missing values (over 40%).

The best way to deal with this is to drop these columns, as so much missing data makes them of little use in our analysis.

In [975]:
# Dropping columns with high percentages of missing values (preferably over 40%)
df = df.drop(columns=['Latitude', 'Longitude', 'Airport.Code',
                      'Airport.Name', 'Aircraft.Category', 'FAR.Description',
                      'Schedule', 'Air.carrier'])
df

Unnamed: 0,Event.Id,Investigation.Type,Accident.Number,Event.Date,Location,Country,Injury.Severity,Aircraft.damage,Registration.Number,Make,...,Engine.Type,Purpose.of.flight,Total.Fatal.Injuries,Total.Serious.Injuries,Total.Minor.Injuries,Total.Uninjured,Weather.Condition,Broad.phase.of.flight,Report.Status,Publication.Date
0,20001218X45444,Accident,SEA87LA080,1948-10-24,"MOOSE CREEK, ID",United States,Fatal(2),Destroyed,NC6404,Stinson,...,Reciprocating,Personal,2.0,0.0,0.0,0.0,UNK,Cruise,Probable Cause,
1,20001218X45447,Accident,LAX94LA336,1962-07-19,"BRIDGEPORT, CA",United States,Fatal(4),Destroyed,N5069P,Piper,...,Reciprocating,Personal,4.0,0.0,0.0,0.0,UNK,Unknown,Probable Cause,19-09-1996
2,20061025X01555,Accident,NYC07LA005,1974-08-30,"Saltville, VA",United States,Fatal(3),Destroyed,N5142R,Cessna,...,Reciprocating,Personal,3.0,,,,IMC,Cruise,Probable Cause,26-02-2007
3,20001218X45448,Accident,LAX96LA321,1977-06-19,"EUREKA, CA",United States,Fatal(2),Destroyed,N1168J,Rockwell,...,Reciprocating,Personal,2.0,0.0,0.0,0.0,IMC,Cruise,Probable Cause,12-09-2000
4,20041105X01764,Accident,CHI79FA064,1979-08-02,"Canton, OH",United States,Fatal(1),Destroyed,N15NY,Cessna,...,,Personal,1.0,2.0,,0.0,VMC,Approach,Probable Cause,16-04-1980
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
88884,20221227106491,Accident,ERA23LA093,2022-12-26,"Annapolis, MD",United States,Minor,,N1867H,PIPER,...,,Personal,0.0,1.0,0.0,0.0,,,,29-12-2022
88885,20221227106494,Accident,ERA23LA095,2022-12-26,"Hampton, NH",United States,,,N2895Z,BELLANCA,...,,,0.0,0.0,0.0,0.0,,,,
88886,20221227106497,Accident,WPR23LA075,2022-12-26,"Payson, AZ",United States,Non-Fatal,Substantial,N749PJ,AMERICAN CHAMPION AIRCRAFT,...,,Personal,0.0,0.0,0.0,1.0,VMC,,,27-12-2022
88887,20221227106498,Accident,WPR23LA076,2022-12-26,"Morgan, UT",United States,,,N210CU,CESSNA,...,,Personal,0.0,0.0,0.0,0.0,,,,


We now have 23 columns, down from 31, after dropping eight.
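
The list of columns to drop can also be derived from the missing-value percentages rather than typed by hand. A minimal sketch, assuming a 40% cutoff; the toy frame and the `threshold` name are illustrative:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: 'Schedule' is 80% missing, 'Country' is 20% missing
toy = pd.DataFrame({
    "Make": ["Cessna", "Piper", "Boeing", "Bell", "Stinson"],
    "Schedule": [np.nan, np.nan, np.nan, np.nan, "NSCH"],
    "Country": ["United States", np.nan, "Brazil", "United States", "United States"],
})

threshold = 40  # percent of missing values above which a column is dropped

# isna().mean() gives the fraction of missing values per column
missing_pct = toy.isna().mean() * 100
cols_to_drop = missing_pct[missing_pct > threshold].index.tolist()

trimmed = toy.drop(columns=cols_to_drop)
print(cols_to_drop)
print(trimmed.columns.tolist())
```

Deriving the list this way keeps the drop step in sync with the data if the file is refreshed.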

In the cells below we fill the remaining missing values with a measure of central tendency: the median for numerical columns and the mode for categorical columns.

In [976]:
# Fill missing values in numerical columns with the column median
for col in numeric_cols:
    df[col] = df[col].fillna(df[col].median())


In [977]:
# Fill missing values in categorical columns with the mode (most frequent value)
for col in categorical_cols:
    df[col] = df[col].fillna(df[col].mode()[0])
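
The median is a reasonable choice for the injury counts because they are heavily skewed (for example, `Total.Fatal.Injuries` has a mean of about 0.65 but a maximum of 349). A minimal sketch on hypothetical values, showing how one extreme value moves the mean but not the median, and how mode imputation fills a categorical column:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one extreme value and one gap per column
toy = pd.DataFrame({
    "Total.Fatal.Injuries": [0.0, 0.0, 1.0, 349.0, np.nan],
    "Weather.Condition": ["VMC", "VMC", "IMC", np.nan, "VMC"],
})

col_mean = toy["Total.Fatal.Injuries"].mean()      # pulled up by the outlier
col_median = toy["Total.Fatal.Injuries"].median()  # stays near the typical value
print(col_mean, col_median)

# Median for the numeric column, mode for the categorical one
toy["Total.Fatal.Injuries"] = toy["Total.Fatal.Injuries"].fillna(col_median)
toy["Weather.Condition"] = toy["Weather.Condition"].fillna(
    toy["Weather.Condition"].mode()[0]
)
print(toy)
```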
                

In [978]:
# Checking data
df

Unnamed: 0,Event.Id,Investigation.Type,Accident.Number,Event.Date,Location,Country,Injury.Severity,Aircraft.damage,Registration.Number,Make,...,Engine.Type,Purpose.of.flight,Total.Fatal.Injuries,Total.Serious.Injuries,Total.Minor.Injuries,Total.Uninjured,Weather.Condition,Broad.phase.of.flight,Report.Status,Publication.Date
0,20001218X45444,Accident,SEA87LA080,1948-10-24,"MOOSE CREEK, ID",United States,Fatal(2),Destroyed,NC6404,Stinson,...,Reciprocating,Personal,2.0,0.0,0.0,0.0,UNK,Cruise,Probable Cause,
1,20001218X45447,Accident,LAX94LA336,1962-07-19,"BRIDGEPORT, CA",United States,Fatal(4),Destroyed,N5069P,Piper,...,Reciprocating,Personal,4.0,0.0,0.0,0.0,UNK,Unknown,Probable Cause,19-09-1996
2,20061025X01555,Accident,NYC07LA005,1974-08-30,"Saltville, VA",United States,Fatal(3),Destroyed,N5142R,Cessna,...,Reciprocating,Personal,3.0,0.0,0.0,1.0,IMC,Cruise,Probable Cause,26-02-2007
3,20001218X45448,Accident,LAX96LA321,1977-06-19,"EUREKA, CA",United States,Fatal(2),Destroyed,N1168J,Rockwell,...,Reciprocating,Personal,2.0,0.0,0.0,0.0,IMC,Cruise,Probable Cause,12-09-2000
4,20041105X01764,Accident,CHI79FA064,1979-08-02,"Canton, OH",United States,Fatal(1),Destroyed,N15NY,Cessna,...,,Personal,1.0,2.0,0.0,0.0,VMC,Approach,Probable Cause,16-04-1980
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
88884,20221227106491,Accident,ERA23LA093,2022-12-26,"Annapolis, MD",United States,Minor,,N1867H,PIPER,...,,Personal,0.0,1.0,0.0,0.0,,,,29-12-2022
88885,20221227106494,Accident,ERA23LA095,2022-12-26,"Hampton, NH",United States,,,N2895Z,BELLANCA,...,,,0.0,0.0,0.0,0.0,,,,
88886,20221227106497,Accident,WPR23LA075,2022-12-26,"Payson, AZ",United States,Non-Fatal,Substantial,N749PJ,AMERICAN CHAMPION AIRCRAFT,...,,Personal,0.0,0.0,0.0,1.0,VMC,,,27-12-2022
88887,20221227106498,Accident,WPR23LA076,2022-12-26,"Morgan, UT",United States,,,N210CU,CESSNA,...,,Personal,0.0,0.0,0.0,0.0,,,,


We will now review how many missing values are present.

In [979]:
# Checking count of columns with missing values
df.isna().sum()

Event.Id                      0
Investigation.Type            0
Accident.Number               0
Event.Date                    0
Location                     52
Country                     222
Injury.Severity               0
Aircraft.damage               0
Registration.Number        1350
Make                          0
Model                         0
Amateur.Built                 0
Number.of.Engines             0
Engine.Type                   0
Purpose.of.flight             0
Total.Fatal.Injuries          0
Total.Serious.Injuries        0
Total.Minor.Injuries          0
Total.Uninjured               0
Weather.Condition             0
Broad.phase.of.flight         0
Report.Status                 0
Publication.Date          13599
dtype: int64

We see that the remaining columns with missing values are `Location`, `Country`, `Registration.Number` and `Publication.Date`. We will fill the first three with `'Unknown'` and drop the rows that lack a `Publication.Date`.

In [980]:
# Filling missing values for Location, Country and Registration Number with 'Unknown'
df['Location'] = df['Location'].fillna('Unknown')
df['Country'] = df['Country'].fillna('Unknown')
df['Registration.Number'] = df['Registration.Number'].fillna('Unknown')
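
The three fills can also be written as a single call by passing a dictionary to `DataFrame.fillna`. A minimal sketch on a hypothetical two-row frame:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one gap in each column
toy = pd.DataFrame({
    "Location": ["BOSTON, MA", np.nan],
    "Country": [np.nan, "Brazil"],
    "Registration.Number": ["N15NY", np.nan],
})

# Map each column name to its replacement value
fill_values = {
    "Location": "Unknown",
    "Country": "Unknown",
    "Registration.Number": "Unknown",
}
toy = toy.fillna(fill_values)
print(toy)
```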

In [981]:
# Dropping rows with no Publication.Date
df = df.dropna(subset=['Publication.Date'])

In the cells below we check whether any null values remain in our dataset and look at its new shape.

In [982]:
df.isna().sum()

Event.Id                  0
Investigation.Type        0
Accident.Number           0
Event.Date                0
Location                  0
Country                   0
Injury.Severity           0
Aircraft.damage           0
Registration.Number       0
Make                      0
Model                     0
Amateur.Built             0
Number.of.Engines         0
Engine.Type               0
Purpose.of.flight         0
Total.Fatal.Injuries      0
Total.Serious.Injuries    0
Total.Minor.Injuries      0
Total.Uninjured           0
Weather.Condition         0
Broad.phase.of.flight     0
Report.Status             0
Publication.Date          0
dtype: int64

In [983]:
df.shape

(74352, 23)
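
The new row count follows from the two row-removing steps above: 938 duplicated `Event.Id` rows and 13,599 rows without a `Publication.Date`. A quick check of the arithmetic, using only the counts reported in this notebook:

```python
rows_loaded = 88889               # from df.shape after loading
duplicate_event_ids = 938         # rows removed by drop_duplicates
missing_publication_date = 13599  # rows removed by dropna on Publication.Date

rows_after_dedup = rows_loaded - duplicate_event_ids
rows_final = rows_after_dedup - missing_publication_date

print(rows_after_dedup, rows_final)  # 87951 74352
```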

In [984]:
df

Unnamed: 0,Event.Id,Investigation.Type,Accident.Number,Event.Date,Location,Country,Injury.Severity,Aircraft.damage,Registration.Number,Make,...,Engine.Type,Purpose.of.flight,Total.Fatal.Injuries,Total.Serious.Injuries,Total.Minor.Injuries,Total.Uninjured,Weather.Condition,Broad.phase.of.flight,Report.Status,Publication.Date
1,20001218X45447,Accident,LAX94LA336,1962-07-19,"BRIDGEPORT, CA",United States,Fatal(4),Destroyed,N5069P,Piper,...,Reciprocating,Personal,4.0,0.0,0.0,0.0,UNK,Unknown,Probable Cause,19-09-1996
2,20061025X01555,Accident,NYC07LA005,1974-08-30,"Saltville, VA",United States,Fatal(3),Destroyed,N5142R,Cessna,...,Reciprocating,Personal,3.0,0.0,0.0,1.0,IMC,Cruise,Probable Cause,26-02-2007
3,20001218X45448,Accident,LAX96LA321,1977-06-19,"EUREKA, CA",United States,Fatal(2),Destroyed,N1168J,Rockwell,...,Reciprocating,Personal,2.0,0.0,0.0,0.0,IMC,Cruise,Probable Cause,12-09-2000
4,20041105X01764,Accident,CHI79FA064,1979-08-02,"Canton, OH",United States,Fatal(1),Destroyed,N15NY,Cessna,...,,Personal,1.0,2.0,0.0,0.0,VMC,Approach,Probable Cause,16-04-1980
5,20170710X52551,Accident,NYC79AA106,1979-09-17,"BOSTON, MA",United States,Non-Fatal,Substantial,CF-TLU,Mcdonnell Douglas,...,Turbo Fan,,0.0,0.0,1.0,44.0,VMC,Climb,Probable Cause,19-09-2017
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
88882,20221222106486,Accident,CEN23LA068,2022-12-21,"Reserve, LA",United States,Minor,,N321GD,GRUMMAN AMERICAN AVN. CORP.,...,,Instructional,0.0,1.0,0.0,1.0,,,,27-12-2022
88883,20221228106502,Accident,GAA23WA046,2022-12-22,"Brasnorte,",Brazil,Fatal,,PP-IRC,AIR TRACTOR,...,,,1.0,0.0,0.0,0.0,,,,28-12-2022
88884,20221227106491,Accident,ERA23LA093,2022-12-26,"Annapolis, MD",United States,Minor,,N1867H,PIPER,...,,Personal,0.0,1.0,0.0,0.0,,,,29-12-2022
88886,20221227106497,Accident,WPR23LA075,2022-12-26,"Payson, AZ",United States,Non-Fatal,Substantial,N749PJ,AMERICAN CHAMPION AIRCRAFT,...,,Personal,0.0,0.0,0.0,1.0,VMC,,,27-12-2022


## Data Analysis