# PHASE 1 PROJECT - Aviation Risk Analysis & Recommendations

## Introduction
As the company seeks to diversify its portfolio by expanding into commercial and private aviation, it is important to have a deep understanding of operational risks. Using accident data from the National Transportation Safety Board (NTSB), this project evaluates the safety records of various aircraft manufacturers and models from 1962 to 2023. I will explore the data to determine the lowest-risk options available. The findings will inform actionable recommendations to help stakeholders acquire a safe and reliable fleet.

## Business Understanding
1. Problem Statement
The primary challenge is that the company lacks internal expertise regarding the safety profiles and potential risks associated with different types of aircraft. Investing in high-risk aircraft could lead to financial loss, legal liability, reputational damage, and loss of life. The company needs to know which aircraft present the lowest risk to ensure the safety of operations and the viability of the new business unit.

2. Main Objective
The goal of this analysis is to determine which aircraft manufacturers and models have the lowest risk profile to support the company's new business endeavor.

3. Specific Objectives
- Analyzing historical aviation accident data to identify trends in safety.
- Evaluating risk based on key factors such as aircraft make, engine type, and phase of flight.
- Providing three concrete, actionable recommendations to the head of the new aviation division to guide their purchasing decisions.

4. Stakeholders
Primary Stakeholder: Head of the new Aviation Division.
Secondary Stakeholders: Executive Board, Investors, and potential future passengers/clients who rely on the company's commitment to safety.

5. Data Source
The analysis utilizes the National Transportation Safety Board (NTSB) Aviation Accident Dataset, which contains information on civil aviation accidents and selected incidents within the United States and international waters from 1962 to 2023. This comprehensive dataset allows for a robust assessment of long-term safety trends and specific incident causes.

---

## Data Understanding
#### Data source
The dataset for this analysis is sourced from the **National Transportation Safety Board (NTSB)** and covers aviation accidents and incidents involving aircraft in the United States and international waters from 1962 to 2023.
This data serves as a reliable source for determining aircraft safety records.

#### Data Schema
The dataset includes tens of thousands of records, where each row represents a single aviation accident or incident.
Each record contains information about:
- The aircraft (type, category, manufacturer, model)
- The event (date, location, purpose of flight)
- The environment (weather, light conditions)
- The outcome (injuries, fatalities, damage level)

This makes the dataset comprehensive enough to evaluate both accident frequency and accident severity across aircraft types.
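
As a rough illustration of what "frequency" and "severity" could mean in practice, the sketch below counts accidents per manufacturer and computes the share of those accidents with at least one fatality. The rows are made up for illustration, and treating a missing fatality count as zero is an assumption, not a rule taken from the dataset.

```python
import pandas as pd

# Made-up rows for illustration; column names follow the NTSB dataset.
sample = pd.DataFrame({
    "Make": ["CESSNA", "CESSNA", "CESSNA", "PIPER", "PIPER"],
    "Total.Fatal.Injuries": [0.0, 2.0, 0.0, 1.0, 0.0],
})

# Assumption: a missing fatality count is treated as zero fatalities.
is_fatal = sample["Total.Fatal.Injuries"].fillna(0) > 0

summary = (
    sample.assign(is_fatal=is_fatal)
    .groupby("Make")
    .agg(
        accidents=("is_fatal", "size"),    # frequency: number of records
        fatal_share=("is_fatal", "mean"),  # severity: share with any fatality
    )
)
print(summary)
```

Raw accident counts are not accident rates: without exposure data such as fleet size or hours flown, a widely used manufacturer will naturally show more records.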

#### Data Quality
##### Missing Values
- Some older records (1960s–1980s) may lack detail
- Certain columns may have missing values (None) that may need to be filled or dropped.
- Weather and light condition fields frequently contain “Unknown”
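
One possible way to handle placeholder strings such as "Unknown" is to convert them to real missing values so they are counted together with empty cells. The sketch below is only an illustration on made-up rows; the `PLACEHOLDERS` set and the `placeholders_to_nan` helper are hypothetical names, and the exact placeholder spellings in the full dataset would need to be checked.

```python
import pandas as pd

# Made-up rows for illustration; the column name follows the NTSB dataset.
sample = pd.DataFrame({
    "Weather.Condition": ["VMC", "UNK", "Unknown", None, "IMC", "Unk"],
})

# Hypothetical set of placeholder spellings to treat as missing.
PLACEHOLDERS = {"UNK", "UNKNOWN"}

def placeholders_to_nan(series: pd.Series) -> pd.Series:
    """Return a copy in which placeholder strings are replaced by NaN."""
    normalized = series.str.strip().str.upper()
    # mask() blanks out every position where the condition is True.
    return series.mask(normalized.isin(PLACEHOLDERS))

cleaned = placeholders_to_nan(sample["Weather.Condition"])
missing_share = cleaned.isna().mean()
print(cleaned.tolist())
print(f"Share missing after cleanup: {missing_share:.0%}")
```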
##### Inconsistent Values
Because the dataset spans several decades, some values may show format changes or spelling errors.
- Manufacturers may appear in multiple forms
- Aircraft models may use different formatting or spacing
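
A minimal sketch of how such variants might be collapsed, assuming that differences in case, surrounding whitespace, spaces, and hyphens are not meaningful. That assumption may be too aggressive for some models, so the result should be reviewed before it is used for grouping. The sample values and the helper names `normalize_make` and `normalize_model` are hypothetical.

```python
import pandas as pd

# Made-up values for illustration.
sample = pd.DataFrame({
    "Make": ["Cessna", "CESSNA", " cessna ", "Piper", "PIPER"],
    "Model": ["172 N", "172N", "172-N", "PA-28", "PA 28"],
})

def normalize_make(series: pd.Series) -> pd.Series:
    """Trim whitespace and upper-case manufacturer names."""
    return series.str.strip().str.upper()

def normalize_model(series: pd.Series) -> pd.Series:
    """Upper-case model names and drop spaces and hyphens (an assumption)."""
    return series.str.upper().str.replace(r"[\s\-]+", "", regex=True)

sample["Make_clean"] = normalize_make(sample["Make"])
sample["Model_clean"] = normalize_model(sample["Model"])

print(sample["Make"].nunique(), "->", sample["Make_clean"].nunique())
print(sample["Model"].nunique(), "->", sample["Model_clean"].nunique())
```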
##### Outliers
- Extremely old or rare aircraft types
- Occasional data-entry errors
- Records with zero injuries but aircraft recorded as “Destroyed”
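
The last kind of record can be flagged for manual review rather than dropped outright, since an aircraft can be destroyed without anyone being hurt. The sketch below uses made-up rows and assumes that a missing injury count is ignored when summing, which means a row with some zeros and some missing counts would also be flagged.

```python
import pandas as pd

# Made-up rows for illustration; column names follow the NTSB dataset.
sample = pd.DataFrame({
    "Aircraft.damage": ["Destroyed", "Substantial", "Destroyed", "Minor"],
    "Total.Fatal.Injuries": [0.0, 1.0, 2.0, 0.0],
    "Total.Serious.Injuries": [0.0, 0.0, None, 0.0],
    "Total.Minor.Injuries": [0.0, 0.0, 0.0, 1.0],
})

injury_cols = [
    "Total.Fatal.Injuries",
    "Total.Serious.Injuries",
    "Total.Minor.Injuries",
]

# min_count=1 keeps the total as NaN when every injury count is missing,
# so fully unknown rows are not mistaken for zero-injury rows.
total_injuries = sample[injury_cols].sum(axis=1, min_count=1)

# Flag rows where the aircraft was destroyed yet no injuries were recorded.
flagged = (sample["Aircraft.damage"] == "Destroyed") & (total_injuries == 0)
print(sample[flagged])
```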
---

## Data Preparation
Loading data and importing libraries

In [8]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime

df = pd.read_csv('AviationData.csv', encoding='utf-8', encoding_errors='replace', low_memory=False)
df.head()  # preview the first five rows

| | Event.Id | Investigation.Type | Accident.Number | Event.Date | Location | Country | Latitude | Longitude | Airport.Code | Airport.Name | ... | Purpose.of.flight | Air.carrier | Total.Fatal.Injuries | Total.Serious.Injuries | Total.Minor.Injuries | Total.Uninjured | Weather.Condition | Broad.phase.of.flight | Report.Status | Publication.Date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 20001218X45444 | Accident | SEA87LA080 | 1948-10-24 | MOOSE CREEK, ID | United States | | | | | ... | Personal | | 2.0 | 0.0 | 0.0 | 0.0 | UNK | Cruise | Probable Cause | |
| 1 | 20001218X45447 | Accident | LAX94LA336 | 1962-07-19 | BRIDGEPORT, CA | United States | | | | | ... | Personal | | 4.0 | 0.0 | 0.0 | 0.0 | UNK | Unknown | Probable Cause | 19-09-1996 |
| 2 | 20061025X01555 | Accident | NYC07LA005 | 1974-08-30 | Saltville, VA | United States | 36.922223 | -81.878056 | | | ... | Personal | | 3.0 | | | | IMC | Cruise | Probable Cause | 26-02-2007 |
| 3 | 20001218X45448 | Accident | LAX96LA321 | 1977-06-19 | EUREKA, CA | United States | | | | | ... | Personal | | 2.0 | 0.0 | 0.0 | 0.0 | IMC | Cruise | Probable Cause | 12-09-2000 |
| 4 | 20041105X01764 | Accident | CHI79FA064 | 1979-08-02 | Canton, OH | United States | | | | | ... | Personal | | 1.0 | 2.0 | | 0.0 | VMC | Approach | Probable Cause | 16-04-1980 |


### Dataset Overview
Basic data information

In [9]:
df.shape

(88889, 31)

The dataset consists of 88,889 records (rows) and 31 features (columns).

In [10]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 88889 entries, 0 to 88888
Data columns (total 31 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Event.Id                88889 non-null  object 
 1   Investigation.Type      88889 non-null  object 
 2   Accident.Number         88889 non-null  object 
 3   Event.Date              88889 non-null  object 
 4   Location                88837 non-null  object 
 5   Country                 88663 non-null  object 
 6   Latitude                34382 non-null  object 
 7   Longitude               34373 non-null  object 
 8   Airport.Code            50132 non-null  object 
 9   Airport.Name            52704 non-null  object 
 10  Injury.Severity         87889 non-null  object 
 11  Aircraft.damage         85695 non-null  object 
 12  Aircraft.Category       32287 non-null  object 
 13  Registration.Number     87507 non-null  object 
 14  Make                    88826 non-null

The dataset is composed primarily of categorical data (26 object columns), plus 5 numerical float64 columns holding injury counts and the number of engines.
Several key columns have a significant share of missing values that will need handling. For example, Aircraft.Category has only 32,287 non-null values out of 88,889 rows.
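
To put numbers on the missing data, a per-column report of dtype, missing count, and missing percentage can help. The sketch below runs on a small made-up frame; `missing_report` is a hypothetical helper name, and the same function could be applied to `df` once the CSV is loaded.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration; column names follow the NTSB dataset.
sample = pd.DataFrame({
    "Make": ["Cessna", "Piper", None, "Boeing"],
    "Aircraft.Category": [None, "Airplane", None, None],
    "Total.Fatal.Injuries": [0.0, np.nan, 2.0, 1.0],
})

def missing_report(frame: pd.DataFrame) -> pd.DataFrame:
    """Summarize dtype, missing count, and missing percentage per column."""
    report = pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "missing": frame.isna().sum(),
        "missing_pct": (frame.isna().mean() * 100).round(1),
    })
    # Columns with the most missing data come first.
    return report.sort_values("missing_pct", ascending=False)

report = missing_report(sample)
print(report)
```

Sorting by `missing_pct` makes it easier to decide which columns are candidates for dropping and which are worth imputing.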

## Data Strategy

The relevant columns include:
- Make
- Model
- Aircraft.Category
- Broad.phase.of.flight
- Weather.Condition
- Number.of.Engines
- Total.Fatal.Injuries
- Total.Serious.Injuries
- Total.Minor.Injuries
- Total.Uninjured
- Aircraft.damage
- Purpose.of.flight
- Event.Date
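
A minimal sketch of how this column selection might look in pandas. The column spellings follow the `df.head()` and `df.info()` output shown earlier, `select_relevant` is a hypothetical helper name, and the toy frame stands in for the full dataset. Parsing `Event.Date` with `errors="coerce"` is an assumption that unparseable dates should become missing values rather than raise an error.

```python
import pandas as pd

RELEVANT_COLUMNS = [
    "Make", "Model", "Aircraft.Category", "Broad.phase.of.flight",
    "Weather.Condition", "Number.of.Engines", "Total.Fatal.Injuries",
    "Total.Serious.Injuries", "Total.Minor.Injuries", "Total.Uninjured",
    "Aircraft.damage", "Purpose.of.flight", "Event.Date",
]

def select_relevant(frame: pd.DataFrame) -> pd.DataFrame:
    """Keep only the relevant columns that exist and parse the event date."""
    present = [col for col in RELEVANT_COLUMNS if col in frame.columns]
    subset = frame[present].copy()
    if "Event.Date" in subset.columns:
        # Unparseable dates become NaT instead of raising an error.
        subset["Event.Date"] = pd.to_datetime(subset["Event.Date"], errors="coerce")
    return subset

# Toy frame standing in for the loaded dataset (values are made up).
toy = pd.DataFrame({
    "Event.Id": ["A1", "A2"],
    "Make": ["Cessna", "Piper"],
    "Event.Date": ["1979-08-02", "not a date"],
})

subset = select_relevant(toy)
print(subset.dtypes)
```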