# AVIATION RISK ANALYSIS

### OVERVIEW
The aviation industry continually seeks reliable operations with very low risk. This project analyzes historical aviation accidents to understand their causes, offer practical insights to the company's head of aviation, and guide the company toward the best investment.

### BUSINESS UNDERSTANDING
Our company seeks to venture into the aviation industry and is interested in purchasing and operating airplanes for commercial and private enterprises. Many companies in this industry base their purchasing decisions on cost, performance, or availability. However, without looking closely at safety records, there is a risk of selecting aircraft with poor accident histories, which can lead to higher risks, unexpected costs, and safety challenges in the long run. The problem we aim to solve is how to use accident data to analyze and compare safety levels across aircraft models, so that the company can make well-informed decisions that balance performance and risk.

### Stakeholder
Director of the aviation division

Responsibility: Makes strategic decisions regarding the procurement of new aircraft.

Requirements: Detailed information about aircraft safety and operational risks, in order to make informed investment choices.

 

### Objectives
1. To explore and understand the aviation accident dataset.

2. To clean and prepare the data for analysis by addressing missing and inconsistent values.

3. To determine which aircraft has the lowest operational risk.

### DATA UNDERSTANDING
The dataset we are using is from the National Transportation Safety Board (NTSB) and contains records of civil aviation accidents and selected incidents from 1948 to 2023. Each record provides details such as the date of the event, aircraft information, operator, location, injury severity, and probable cause.

The dataset is valuable because it allows us to analyze historical aviation accidents, identify trends, and assess risk factors across different types of aircraft. However, it also contains missing values and inconsistencies that must be handled during data cleaning before meaningful insights can be drawn.

### Importing our libraries that will be used in this analysis

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt  # plt should refer to pyplot only, not the top-level matplotlib package

Pandas - used to load the CSV dataset and to support data cleaning, handling of missing data, grouping, and filtering.

NumPy - supports mathematical operations.

Matplotlib - supports visualization and is the plotting library that seaborn builds on.

Pyplot - the Matplotlib interface whose functions make it easy to create graphs.

### Loading our dataset

In [3]:
df=pd.read_csv("AviationData.csv", encoding="latin1")


After loading the data, we can preview it with the .head() or .tail() method to confirm that it has been loaded successfully.
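One plausible reason for passing encoding="latin1" in the load cell is that the file contains bytes that are not valid UTF-8, although the source does not say so explicitly. The sketch below illustrates that situation on a tiny in-memory CSV. The sample row and the names `sample_bytes` and `sample_df` are made up for illustration and are not part of the dataset.

```python
import io

import pandas as pd

# Hypothetical two-line CSV containing a non-ASCII character,
# encoded as Latin-1 rather than UTF-8.
sample_bytes = "Event.Id,Location\nX1,Québec\n".encode("latin1")

# Reading with the matching encoding recovers the text correctly.
sample_df = pd.read_csv(io.BytesIO(sample_bytes), encoding="latin1")
print(sample_df.loc[0, "Location"])

# The default UTF-8 decoder rejects the same bytes.
try:
    pd.read_csv(io.BytesIO(sample_bytes))
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
print("UTF-8 read succeeded:", utf8_ok)
```

Latin-1 maps every byte to a character, so it never fails to decode. That also means it can silently produce wrong characters if the file actually uses a different encoding, so a quick look at a few text columns after loading is worthwhile.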

### Previewing our dataset using .head() function

In [4]:
df.head()

Unnamed: 0,Event.Id,Investigation.Type,Accident.Number,Event.Date,Location,Country,Latitude,Longitude,Airport.Code,Airport.Name,...,Purpose.of.flight,Air.carrier,Total.Fatal.Injuries,Total.Serious.Injuries,Total.Minor.Injuries,Total.Uninjured,Weather.Condition,Broad.phase.of.flight,Report.Status,Publication.Date
0,20001218X45444,Accident,SEA87LA080,1948-10-24,"MOOSE CREEK, ID",United States,,,,,...,Personal,,2.0,0.0,0.0,0.0,UNK,Cruise,Probable Cause,
1,20001218X45447,Accident,LAX94LA336,1962-07-19,"BRIDGEPORT, CA",United States,,,,,...,Personal,,4.0,0.0,0.0,0.0,UNK,Unknown,Probable Cause,19-09-1996
2,20061025X01555,Accident,NYC07LA005,1974-08-30,"Saltville, VA",United States,36.9222,-81.8781,,,...,Personal,,3.0,,,,IMC,Cruise,Probable Cause,26-02-2007
3,20001218X45448,Accident,LAX96LA321,1977-06-19,"EUREKA, CA",United States,,,,,...,Personal,,2.0,0.0,0.0,0.0,IMC,Cruise,Probable Cause,12-09-2000
4,20041105X01764,Accident,CHI79FA064,1979-08-02,"Canton, OH",United States,,,,,...,Personal,,1.0,2.0,,0.0,VMC,Approach,Probable Cause,16-04-1980


The .head() method previews the first five rows of the dataset. Because the DataFrame is wide, pandas elides the middle columns, shown as "..." in the output above.
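The "..." in the preview comes from the pandas display settings rather than from the data. As a hedged sketch, the snippet below shows how the `display.max_columns` option controls this, using a hypothetical frame (`wide_df`) that matches only the 31-column width of the dataset, not its contents.

```python
import pandas as pd

# Hypothetical wide frame: 5 rows x 31 columns of placeholder values.
wide_df = pd.DataFrame({f"col_{i}": range(5) for i in range(31)})

# With a limit of 20 columns, the middle columns are elided as "...".
with pd.option_context("display.max_columns", 20):
    truncated = repr(wide_df.head(3))

# With the limit removed, every column is printed (wrapped if needed).
with pd.option_context("display.max_columns", None):
    full = repr(wide_df.head(3))

print("col_15 shown when truncated:", "col_15" in truncated)
print("col_15 shown in full view:", "col_15" in full)
```

`option_context` restores the previous setting when the block exits. For a whole notebook session, `pd.set_option("display.max_columns", None)` has the same effect without the `with` block. Passing a number, as in `head(3)`, changes how many rows are previewed.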

### Previewing our dataset using .tail() function

In [5]:
df.tail()

Unnamed: 0,Event.Id,Investigation.Type,Accident.Number,Event.Date,Location,Country,Latitude,Longitude,Airport.Code,Airport.Name,...,Purpose.of.flight,Air.carrier,Total.Fatal.Injuries,Total.Serious.Injuries,Total.Minor.Injuries,Total.Uninjured,Weather.Condition,Broad.phase.of.flight,Report.Status,Publication.Date
88884,20221227106491,Accident,ERA23LA093,2022-12-26,"Annapolis, MD",United States,,,,,...,Personal,,0.0,1.0,0.0,0.0,,,,29-12-2022
88885,20221227106494,Accident,ERA23LA095,2022-12-26,"Hampton, NH",United States,,,,,...,,,0.0,0.0,0.0,0.0,,,,
88886,20221227106497,Accident,WPR23LA075,2022-12-26,"Payson, AZ",United States,341525N,1112021W,PAN,PAYSON,...,Personal,,0.0,0.0,0.0,1.0,VMC,,,27-12-2022
88887,20221227106498,Accident,WPR23LA076,2022-12-26,"Morgan, UT",United States,,,,,...,Personal,MC CESSNA 210N LLC,0.0,0.0,0.0,0.0,,,,
88888,20221230106513,Accident,ERA23LA097,2022-12-29,"Athens, GA",United States,,,,,...,Personal,,0.0,1.0,0.0,1.0,,,,30-12-2022


The .tail() method previews the last five rows of the dataset. As with .head(), pandas elides the middle columns of this wide DataFrame, shown as "..." in the output above.
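The previews also hint at the kind of inconsistency that data cleaning has to address: Latitude appears as a decimal number in one row (36.9222) and as a compact string in another (341525N). The sketch below converts the compact form to decimal degrees. It assumes the layout is degrees, two-digit minutes, two-digit seconds, then a hemisphere letter. That layout is an inference from the sample values, not something the source confirms, and `dms_to_decimal` is a hypothetical helper.

```python
def dms_to_decimal(value: str) -> float:
    """Convert a compact DMS string such as '341525N' to decimal degrees.

    Assumed layout: [D]DDMMSS followed by one hemisphere letter (N, S, E, W).
    """
    hemisphere = value[-1].upper()
    digits = value[:-1]
    seconds = int(digits[-2:])
    minutes = int(digits[-4:-2])
    degrees = int(digits[:-4])
    decimal = degrees + minutes / 60 + seconds / 3600
    # Southern and western hemispheres are negative by convention.
    return -decimal if hemisphere in ("S", "W") else decimal


# Values taken from the Payson, AZ row in the preview above.
lat = dms_to_decimal("341525N")
lon = dms_to_decimal("1112021W")
print(lat, lon)
```

Applied to a whole column, a helper like this would need to skip missing values and values that are already decimal.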

### Overview of the DataFrame

In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 88889 entries, 0 to 88888
Data columns (total 31 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Event.Id                88889 non-null  object 
 1   Investigation.Type      88889 non-null  object 
 2   Accident.Number         88889 non-null  object 
 3   Event.Date              88889 non-null  object 
 4   Location                88837 non-null  object 
 5   Country                 88663 non-null  object 
 6   Latitude                34382 non-null  object 
 7   Longitude               34373 non-null  object 
 8   Airport.Code            50249 non-null  object 
 9   Airport.Name            52790 non-null  object 
 10  Injury.Severity         87889 non-null  object 
 11  Aircraft.damage         85695 non-null  object 
 12  Aircraft.Category       32287 non-null  object 
 13  Registration.Number     87572 non-null  object 
 14  Make                    88826 non-null

The .info() method prints a summary of the DataFrame: the number of rows, the name of each column, how many non-null values each column holds, and the data type of each column.
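Because .info() only prints its summary, it can help to collect the same facts as objects that later cells can reuse. The following is a minimal sketch on a hypothetical three-row frame (`toy_df`) that borrows a few column names from the dataset. The values are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature frame with some missing entries.
toy_df = pd.DataFrame({
    "Event.Id": ["A1", "A2", "A3"],
    "Latitude": [36.9222, np.nan, np.nan],
    "Total.Fatal.Injuries": [2.0, 4.0, np.nan],
})

n_rows, n_cols = toy_df.shape   # row and column counts
non_null = toy_df.count()       # non-null values per column
dtypes = toy_df.dtypes          # data type per column

# Combine both per-column facts into one small table.
summary = pd.DataFrame({"non_null": non_null, "dtype": dtypes.astype(str)})
print(n_rows, n_cols)
print(summary)
```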

### Let's check for missing values

In [7]:
df.isna().sum()

Event.Id                      0
Investigation.Type            0
Accident.Number               0
Event.Date                    0
Location                     52
Country                     226
Latitude                  54507
Longitude                 54516
Airport.Code              38640
Airport.Name              36099
Injury.Severity            1000
Aircraft.damage            3194
Aircraft.Category         56602
Registration.Number        1317
Make                         63
Model                        92
Amateur.Built               102
Number.of.Engines          6084
Engine.Type                7077
FAR.Description           56866
Schedule                  76307
Purpose.of.flight          6192
Air.carrier               72241
Total.Fatal.Injuries      11401
Total.Serious.Injuries    12510
Total.Minor.Injuries      11933
Total.Uninjured            5912
Weather.Condition          4492
Broad.phase.of.flight     27165
Report.Status              6381
Publication.Date          13771
dtype: i

This returns the number of missing values in each column, which shows which columns have gaps and how large those gaps are.
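Raw counts are easier to compare as a share of all rows. The sketch below takes a handful of the counts reported above, together with the 88889 rows reported by .info(), and converts them to percentages. The variable names are illustrative, and only a subset of columns is included.

```python
import pandas as pd

total_rows = 88889  # from the RangeIndex line of the .info() output

# A few of the missing-value counts reported by df.isna().sum().
missing_counts = pd.Series({
    "Schedule": 76307,
    "Air.carrier": 72241,
    "FAR.Description": 56866,
    "Aircraft.Category": 56602,
    "Location": 52,
})

# Percentage of rows missing, largest first.
missing_pct = (
    (missing_counts / total_rows * 100)
    .round(2)
    .sort_values(ascending=False)
)
print(missing_pct)

# On the full DataFrame, an equivalent expression would be:
#   df.isna().mean().mul(100).round(2).sort_values(ascending=False)
```

Columns such as Schedule and Air.carrier are missing in the large majority of rows, while Location is almost complete. That difference is useful input when deciding whether to drop a column or fill its gaps.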