# AVIATION ACCIDENTS

## Business Understanding 

Business Problem: My company is expanding into the aviation industry but lacks insights into aircraft safety risks. This analysis identifies the lowest-risk aircraft models for commercial and private use.

## Problem Statement

The aviation industry may be able to improve safety measures by analyzing accident data to identify patterns in aircraft damage and fatality rates. Doing so will allow airlines, manufacturers, and regulatory agencies to better understand risk factors and implement strategies to enhance aviation safety. Using aircraft accident data, I will examine key trends to determine which aircraft models demonstrate strong safety records and provide actionable insights for improving aviation safety standards.

### Business Objectives 
1. Which aircraft models have the highest fatality rates?

2. What are the leading causes of aviation accidents?

3. How have aviation fatalities trended over the years?

4. What actions can be recommended to help mitigate or prevent aviation accidents and incidents?

### Data Understanding

In [62]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


In [63]:
df = pd.read_csv("aviation-accident-data-2023-05-16.csv")
df.head()

Unnamed: 0,date,type,registration,operator,fatalities,location,country,cat,year
0,date unk.,Antonov An-12B,T-1206,Indonesian AF,,,Unknown country,U1,unknown
1,date unk.,Antonov An-12B,T-1204,Indonesian AF,,,Unknown country,U1,unknown
2,date unk.,Antonov An-12B,T-1201,Indonesian AF,,,Unknown country,U1,unknown
3,date unk.,Antonov An-12BK,,Soviet AF,,Tiksi Airport (IKS),Russia,A1,unknown
4,date unk.,Antonov An-12BP,CCCP-11815,Soviet AF,0.0,Massawa Airport ...,Eritrea,A1,unknown


In [64]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 23967 entries, 0 to 23966
Data columns (total 9 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   date          23967 non-null  object
 1   type          23967 non-null  object
 2   registration  22419 non-null  object
 3   operator      23963 non-null  object
 4   fatalities    20029 non-null  object
 5   location      23019 non-null  object
 6   country       23967 non-null  object
 7   cat           23967 non-null  object
 8   year          23967 non-null  object
dtypes: object(9)
memory usage: 1.6+ MB
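
The `df.info()` output shows every column stored as `object`, including `fatalities` and `year`, so both would need numeric types before any rates or yearly trends can be computed. The following is a minimal sketch of one way to coerce them, assuming that entries that cannot be parsed (such as `unknown`) should become missing values. The `sample` frame and the `*_num` column names are made up for illustration and are not rows from the dataset.

```python
import pandas as pd

# Hypothetical sample mimicking the object-typed columns (not real rows).
sample = pd.DataFrame({
    "fatalities": ["0", "12", None, "3"],
    "year": ["1944", "unknown", "2001", "1999"],
})

# errors="coerce" turns any value that cannot be parsed as a number into NaN.
sample["fatalities_num"] = pd.to_numeric(sample["fatalities"], errors="coerce")
sample["year_num"] = pd.to_numeric(sample["year"], errors="coerce")

print(sample.dtypes)
```

Writing the result to new columns keeps the raw strings available, which makes it possible to check which original values were coerced to NaN.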


In [65]:
df.describe()

Unnamed: 0,date,type,registration,operator,fatalities,location,country,cat,year
count,23967,23967,22419,23963,20029,23019,23967,23967,23967
unique,15079,3201,21962,6017,369,14608,232,11,106
top,10-MAY-1940,Douglas C-47A (DC-3),LZ-...,USAAF,0,unknown,USA,A1,1944
freq,171,1916,13,2604,10713,272,4377,17424,1505


In [66]:
df.shape

(23967, 9)

### Data Cleaning

In [67]:
# Count fully duplicated rows (True = repeat of an earlier row)
df.duplicated().value_counts()


False    23852
True       115
dtype: int64

In [68]:
# View the first 15 duplicated rows
df[df.duplicated()].head(15)

Unnamed: 0,date,type,registration,operator,fatalities,location,country,cat,year
542,13-APR-1940,Junkers Ju-52/3m,,German AF,,"Gangsoya, Sogn o...",Norway,A1,1940
560,29-APR-1940,Junkers Ju-52/3m,,German AF,0.0,Oslo-Fornebu Air...,Norway,A1,1940
568,10-MAY-1940,Junkers Ju-52/3m,,German AF,,Waalhaven,Netherlands,A1,1940
577,10-MAY-1940,Junkers Ju-52/3m,,German AF,,near Den Haag,Netherlands,A1,1940
579,10-MAY-1940,Junkers Ju-52/3m,,German AF,,Waalhaven,Netherlands,A1,1940
580,10-MAY-1940,Junkers Ju-52/3m,,German AF,,near Den Haag,Netherlands,A1,1940
581,10-MAY-1940,Junkers Ju-52/3m,,German AF,,near Den Haag,Netherlands,A1,1940
582,10-MAY-1940,Junkers Ju-52/3m,,German AF,,near Den Haag,Netherlands,A1,1940
584,10-MAY-1940,Junkers Ju-52/3m,,German AF,,Waalhaven,Netherlands,A1,1940
585,10-MAY-1940,Junkers Ju-52/3m,,German AF,,near Den Haag,Netherlands,A1,1940
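
By default, `df.duplicated()` flags only the second and later occurrences of a row, so the first copy of each group is not part of the view above. The sketch below shows how `keep=False` marks every member of a duplicate group instead. The `toy` frame is illustrative data, not taken from the dataset.

```python
import pandas as pd

# Illustrative frame: rows 0 and 1 are identical, row 2 is unique.
toy = pd.DataFrame({
    "date": ["10-MAY-1940", "10-MAY-1940", "13-APR-1940"],
    "type": ["Junkers Ju-52/3m", "Junkers Ju-52/3m", "Junkers Ju-52/3m"],
    "location": ["Waalhaven", "Waalhaven", "Gangsoya"],
})

# Default keep="first": only the repeat (row 1) is flagged.
later_copies = toy.duplicated().sum()

# keep=False: both members of the duplicate group are flagged.
all_copies = toy.duplicated(keep=False).sum()

# drop_duplicates() keeps the first occurrence of each group.
deduped = toy.drop_duplicates()

print(later_copies, all_copies, len(deduped))
```

For records with no registration, identical rows may describe separate aircraft lost in the same event, so inspecting whole groups before dropping them can be worthwhile.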


In [69]:
# Remove duplicated rows, keeping the first occurrence of each
df = df.drop_duplicates()

In [70]:
# verify that they have been removed

df.duplicated().value_counts()

False    23852
dtype: int64

In [71]:
# List the column names in the dataset
df.columns

Index(['date', 'type', 'registration', 'operator', 'fatalities', 'location',
       'country', 'cat', 'year'],
      dtype='object')

### Dealing with missing values

#### Detecting Null Values

In [72]:
# Count null values per column (isna() is an alias of isnull())

df.isna().sum()

date               0
type               0
registration    1434
operator           4
fatalities      3833
location         932
country            0
cat                0
year               0
dtype: int64
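
Raw counts are easier to judge as a share of all rows. With 23,852 rows after deduplication, the 3,833 missing `fatalities` values are roughly 16% of the data. A minimal sketch of a combined count and percentage summary follows, using a made-up `toy` frame rather than the real dataset.

```python
import pandas as pd

# Illustrative frame with a different amount of missing data per column.
toy = pd.DataFrame({
    "registration": ["T-1206", None, "T-1201", None],
    "fatalities": ["0", None, None, None],
    "country": ["Russia", "Eritrea", "USA", "USA"],
})

# isna().mean() is the fraction of missing values in each column.
missing = toy.isna().sum()
missing_pct = (toy.isna().mean() * 100).round(1)

summary = pd.DataFrame({"missing": missing, "percent": missing_pct})
print(summary)
```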

#### Handling Missing Values

In [73]:
df["operator"] = df["operator"].fillna("Unknown")
df.isna().sum()


date               0
type               0
registration    1434
operator           0
fatalities      3833
location         932
country            0
cat                0
year               0
dtype: int64

In [74]:
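# Fill gaps in the fatalities column itself.
# A missing count is unknown, not necessarily zero, so treat this placeholder with care.
df["fatalities"] = df["fatalities"].fillna("no fatalities")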
df.isna().sum()

date               0
type               0
registration    1434
operator           0
fatalities         0
location         932
country            0
cat                0
year               0
dtype: int64

In [75]:
# Fill gaps in the location column itself with a placeholder
df["location"] = df["location"].fillna("unknown location")
df.isna().sum()

date               0
type               0
registration    1434
operator           0
fatalities         0
location           0
country            0
cat                0
year               0
dtype: int64

In [76]:

# Fill gaps in the registration column itself with a placeholder
df["registration"] = df["registration"].fillna("unknown")
df.isna().sum()

date            0
type            0
registration    0
operator        0
fatalities      0
location        0
country         0
cat             0
year            0
dtype: int64
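
Filling each column with its own placeholder can also be written as a single `fillna` call with a dict keyed by column name, which keeps every column paired with its own fill value. This is a sketch on a made-up `toy` frame, not the notebook's data. The placeholder strings mirror the ones used above.

```python
import pandas as pd

# Illustrative frame with gaps in several columns.
toy = pd.DataFrame({
    "operator": ["Soviet AF", None],
    "location": [None, "Tiksi Airport (IKS)"],
    "registration": ["CCCP-11815", None],
})

# One placeholder per column; columns not listed are left untouched.
fill_values = {
    "operator": "Unknown",
    "location": "unknown location",
    "registration": "unknown",
}

filled = toy.fillna(value=fill_values)
print(filled.isna().sum())
```

Existing values are preserved and only the missing cells receive the placeholder for their column.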