## Phase 1 Project 

# Aircraft Risk Analysis Project

## Overview
The company is expanding into aviation and wants to identify which aircraft models present the **lowest risk**.  
Our **main objective** is to generate **three data-backed business recommendations** on aircraft safety.

We will support this by breaking it down into the following **specific objectives**:

1. Understand the dataset structure and assess missing data.
2. Clean and prepare the dataset (column renaming, handling nulls, deriving new features).
3. Explore accident frequency across aircraft makes and models.
4. Assess accident severity (fatalities per accident).
5. Analyze temporal and geographic patterns.
6. Produce three clear recommendations backed by visualizations.

**Business Understanding**

- **Main objective:** Recommend the lowest-risk aircraft for acquisition.
- **Key business questions:**
  1. Which aircraft makes/models have relatively fewer accidents?
  2. Which aircraft have lower severity (fewer fatalities when accidents occur)?
  3. Are there patterns over time or geography that affect operational risk?
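The first two business questions can be sketched as a single group-by: count accidents per aircraft type and average the fatalities per accident. The block below is a minimal sketch on a handful of rows copied from the output shown in this notebook, not the full dataset. The names `sample`, `risk_by_type`, `accidents`, and `mean_fatalities` are illustrative assumptions, not names used elsewhere in the project.

```python
import pandas as pd

# Illustrative rows only (taken from the tail of the dataset); not the full data.
sample = pd.DataFrame({
    "type": [
        "Learjet 36A",
        "Learjet 35A",
        "Cessna 208B Grand Caravan",
        "Cessna 208B Grand Caravan",
        "Boeing 747-4R7F",
    ],
    "fatalities": ["3", "2", "0", "0", "0"],
})

# fatalities is stored as text, so convert it before aggregating.
sample["fatalities"] = pd.to_numeric(sample["fatalities"], errors="coerce")

# Hypothetical summary: accident count and mean fatalities per aircraft type.
risk_by_type = (
    sample.groupby("type")
    .agg(accidents=("fatalities", "size"), mean_fatalities=("fatalities", "mean"))
    .sort_values(["accidents", "mean_fatalities"])
)
print(risk_by_type)
```

Raw accident counts say nothing about how many aircraft of each type are in service, so a low count here should be read as a starting point for comparison rather than a risk rate.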

Loading the dataset using Pandas

In [2]:
import pandas as pd
aviation_accidents = pd.read_csv("aviation-accident-data-2023-05-16.csv")

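Dataset info: info() shows how the data is structured, including the number of rows and columns, each column's data type, and its non-null count.

The dataset contains 23,967 rows and 9 columns, all stored with the object dtype.

Four columns have null entries: registration (1,548 missing), operator (4), fatalities (3,938), and location (948).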

In [6]:
aviation_accidents.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 23967 entries, 0 to 23966
Data columns (total 9 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   date          23967 non-null  object
 1   type          23967 non-null  object
 2   registration  22419 non-null  object
 3   operator      23963 non-null  object
 4   fatalities    20029 non-null  object
 5   location      23019 non-null  object
 6   country       23967 non-null  object
 7   cat           23967 non-null  object
 8   year          23967 non-null  object
dtypes: object(9)
memory usage: 1.6+ MB


In [None]:
aviation_accidents.shape # number of rows and columns in the dataset

(23967, 9)

In [None]:
aviation_accidents.columns # column names in the dataset

Index(['date', 'type', 'registration', 'operator', 'fatalities', 'location',
       'country', 'cat', 'year'],
      dtype='object')

A brief look at the first five rows of the dataset

In [4]:
aviation_accidents.head()

|   | date | type | registration | operator | fatalities | location | country | cat | year |
|---|---|---|---|---|---|---|---|---|---|
| 0 | date unk. | Antonov An-12B | T-1206 | Indonesian AF |  |  | Unknown country | U1 | unknown |
| 1 | date unk. | Antonov An-12B | T-1204 | Indonesian AF |  |  | Unknown country | U1 | unknown |
| 2 | date unk. | Antonov An-12B | T-1201 | Indonesian AF |  |  | Unknown country | U1 | unknown |
| 3 | date unk. | Antonov An-12BK |  | Soviet AF |  | Tiksi Airport (IKS) | Russia | A1 | unknown |
| 4 | date unk. | Antonov An-12BP | CCCP-11815 | Soviet AF | 0.0 | Massawa Airport ... | Eritrea | A1 | unknown |


A look at the last 8 entries in the dataset

In [12]:
aviation_accidents.tail(8)

|   | date | type | registration | operator | fatalities | location | country | cat | year |
|---|---|---|---|---|---|---|---|---|---|
| 23959 | 26-APR-2023 | Boeing 777-233LR | C-FIUF | Air Canada | 0 | Sydney-Kingsford... | Australia | A2 | 2023 |
| 23960 | 09-MAY-2023 | Cessna 208 Caravan 675 | PK-HVG | Dimonim Air | 0 | Yabi Airstrip | Indonesia | A2 | 2023 |
| 23961 | 10-MAY-2023 | Learjet 36A | N56PA | Phoenix Air | 3 | near San Clemente Isl... | USA | A1 | 2023 |
| 23962 | 11-MAY-2023 | Hawker 900XP | PK-LRU | Angkasa Super Services | 0 | Maleo Airport (MOH) | Indonesia | A2 | 2023 |
| 23963 | 11-MAY-2023 | Cessna 208B Grand Caravan | PK-NGA | Nasional Global Aviasi | 0 | Fentheik Airstrip | Indonesia | A2 | 2023 |
| 23964 | 12-MAY-2023 | Cessna 208B Grand Caravan | 5X-RBR | Bar Aviation | 0 | Kampala-Kajjansi... | Uganda | A1 | 2023 |
| 23965 | 14-MAY-2023 | Boeing 747-4R7F | LX-OCV | Cargolux | 0 | Luxembourg-Finde... | Luxembourg | A2 | 2023 |
| 23966 | 15-MAY-2023 | Learjet 35A | D-CGFQ | GFD | 2 | Hohn Air Base | Germany | A1 | 2023 |
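The first and last rows show that date uses a DD-MON-YYYY text format with a "date unk." placeholder, and that year holds the string "unknown" for the same rows. Analyzing temporal patterns will need both converted to proper types. The following is a minimal sketch on a few values copied from the output, assuming that any string which does not match the format should be treated as missing. The column names `date_parsed` and `year_num` are hypothetical.

```python
import pandas as pd

# Illustrative values copied from the head() and tail() output; not the full dataset.
sample = pd.DataFrame({
    "date": ["date unk.", "26-APR-2023", "15-MAY-2023"],
    "year": ["unknown", "2023", "2023"],
})

# Strings that do not match DD-MON-YYYY (such as "date unk.") become NaT.
sample["date_parsed"] = pd.to_datetime(
    sample["date"], format="%d-%b-%Y", errors="coerce"
)

# Non-numeric years (such as "unknown") become missing;
# the nullable Int64 dtype keeps the remaining years as integers.
sample["year_num"] = pd.to_numeric(sample["year"], errors="coerce").astype("Int64")

print(sample)
```

Using `errors="coerce"` hides every unparseable value behind a missing marker, so it is worth counting the resulting nulls and inspecting the original strings before dropping those rows.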


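A statistical summary of the dataset. Because every column is stored with the object dtype, describe() reports the count, the number of unique values, the most frequent value (top), and how often it occurs (freq) instead of numeric statistics.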

In [13]:
aviation_accidents.describe()

|   | date | type | registration | operator | fatalities | location | country | cat | year |
|---|---|---|---|---|---|---|---|---|---|
| count | 23967 | 23967 | 22419 | 23963 | 20029 | 23019 | 23967 | 23967 | 23967 |
| unique | 15079 | 3201 | 21962 | 6017 | 369 | 14608 | 232 | 11 | 106 |
| top | 10-MAY-1940 | Douglas C-47A (DC-3) | LZ-... | USAAF | 0 | unknown | USA | A1 | 1944 |
| freq | 171 | 1916 | 13 | 2604 | 10713 | 272 | 4377 | 17424 | 1505 |
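The summary treats fatalities as text (369 unique strings), so a mean or maximum is not available until the column is converted. The block below is a minimal sketch of that conversion on a few made-up values in the same style as the dataset. The names `fatalities_raw` and `fatalities_num` are assumptions for illustration, and it assumes that entries which are not plain numbers can be treated as missing.

```python
import pandas as pd

# Illustrative values in the style of the fatalities column; not the real data.
fatalities_raw = pd.Series(["0", "3", None, "2", "0.0"], dtype="object")

# Convert text to numbers; anything that is not a plain number becomes NaN.
fatalities_num = pd.to_numeric(fatalities_raw, errors="coerce")

# With a numeric dtype, describe() now reports mean, std, min, quartiles, and max.
print(fatalities_num.describe())
```

Comparing the null count before and after the conversion shows how many entries were coerced, which indicates whether some values need custom parsing rather than being discarded.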


## Missing values

Checking and dealing with missing values in the dataset
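One common way to check for missing values is to count nulls per column and express them as a share of all rows. The block below is a minimal sketch on a three-row sample modelled on the rows shown earlier, so the numbers it prints are not those of the full dataset. The names `sample`, `missing_count`, `missing_pct`, and `missing_summary` are illustrative.

```python
import pandas as pd

# Small illustrative sample modelled on rows shown earlier; not the full dataset.
sample = pd.DataFrame({
    "type": ["Antonov An-12B", "Antonov An-12BK", "Learjet 35A"],
    "registration": ["T-1206", None, "D-CGFQ"],
    "fatalities": [None, None, "2"],
    "location": [None, "Tiksi Airport (IKS)", "Hohn Air Base"],
})

# Number of missing entries per column.
missing_count = sample.isna().sum()

# Share of rows that are missing, as a percentage rounded to one decimal.
missing_pct = (sample.isna().mean() * 100).round(1)

missing_summary = pd.DataFrame({"missing": missing_count, "percent": missing_pct})
print(missing_summary.sort_values("missing", ascending=False))
```

Note that `isna()` only catches true nulls. Placeholder strings such as "unknown" or "date unk." count as present and have to be handled separately.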