# Aircraft Purchase Risk Assessment
## Project overview

The goal of this project is to analyze historical aircraft data to assess the potential risks involved in purchasing different types of airplanes. This analysis is particularly relevant for aviation companies, leasing firms, and private buyers aiming to make data-driven investment decisions.

Using structured data on aircraft specifications, operational history, maintenance records, and financial metrics, we will develop a risk evaluation framework. This framework will help identify aircraft that pose high operational or financial risks before acquisition.

## Objectives

- To understand the structure, quality, and completeness of the aircraft dataset.
- To explore patterns and relationships between aircraft attributes and risk indicators.
- To engineer new features that reflect usage intensity, depreciation, and maintenance burden.
- To build a predictive model (classification or clustering) to categorize aircraft by risk level.
- To provide actionable insights for procurement strategy based on historical data trends.


In [1]:
# import libraries

import pandas as pd
import numpy as np

In [9]:
# load the dataset

df = pd.read_csv("AviationData.csv", encoding="latin1", low_memory=False)
df.head()



| Unnamed: 0 | Event.Id | Investigation.Type | Accident.Number | Event.Date | Location | Country | Latitude | Longitude | Airport.Code | Airport.Name | ... | Purpose.of.flight | Air.carrier | Total.Fatal.Injuries | Total.Serious.Injuries | Total.Minor.Injuries | Total.Uninjured | Weather.Condition | Broad.phase.of.flight | Report.Status | Publication.Date |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 20001218X45444 | Accident | SEA87LA080 | 1948-10-24 | MOOSE CREEK, ID | United States | | | | | ... | Personal | | 2.0 | 0.0 | 0.0 | 0.0 | UNK | Cruise | Probable Cause | |
| 1 | 20001218X45447 | Accident | LAX94LA336 | 1962-07-19 | BRIDGEPORT, CA | United States | | | | | ... | Personal | | 4.0 | 0.0 | 0.0 | 0.0 | UNK | Unknown | Probable Cause | 19-09-1996 |
| 2 | 20061025X01555 | Accident | NYC07LA005 | 1974-08-30 | Saltville, VA | United States | 36.922223 | -81.878056 | | | ... | Personal | | 3.0 | | | | IMC | Cruise | Probable Cause | 26-02-2007 |
| 3 | 20001218X45448 | Accident | LAX96LA321 | 1977-06-19 | EUREKA, CA | United States | | | | | ... | Personal | | 2.0 | 0.0 | 0.0 | 0.0 | IMC | Cruise | Probable Cause | 12-09-2000 |
| 4 | 20041105X01764 | Accident | CHI79FA064 | 1979-08-02 | Canton, OH | United States | | | | | ... | Personal | | 1.0 | 2.0 | | 0.0 | VMC | Approach | Probable Cause | 16-04-1980 |
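
The preview shows two date layouts: `Event.Date` looks like ISO `YYYY-MM-DD`, while `Publication.Date` looks like `DD-MM-YYYY` and is sometimes empty. Below is a minimal sketch of parsing both columns, assuming those layouts hold for the whole file (only five rows are visible here, so this is an assumption). `parse_dates` is a hypothetical helper name, and the small `sample` frame simply copies values from the preview.

```python
import pandas as pd

# Values copied from the preview above; in the notebook, pass `df` instead.
sample = pd.DataFrame({
    "Event.Date": ["1948-10-24", "1962-07-19", "1974-08-30"],
    "Publication.Date": [None, "19-09-1996", "26-02-2007"],
})

def parse_dates(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with both date columns converted to datetime64."""
    out = frame.copy()
    # Assumed layout: ISO year-month-day; unparseable values become NaT.
    out["Event.Date"] = pd.to_datetime(
        out["Event.Date"], format="%Y-%m-%d", errors="coerce"
    )
    # Assumed layout: day-month-year; missing values stay missing (NaT).
    out["Publication.Date"] = pd.to_datetime(
        out["Publication.Date"], format="%d-%m-%Y", errors="coerce"
    )
    return out

parsed = parse_dates(sample)
print(parsed.dtypes)
```

Because `errors="coerce"` silently turns unexpected values into `NaT`, it may be worth comparing the missing count before and after parsing to see whether the assumed formats actually fit the data.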


In [11]:
# check the shape of the dataset
df.shape

(88889, 31)

The dataset has 88,889 rows and 31 columns.
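
With the shape known, one possible next step toward the first objective is a completeness check. The sketch below counts missing values per column. `missing_summary` is a hypothetical helper, not part of the original notebook, and the `toy` frame is a stand-in built from values in the preview rather than the real `df`.

```python
import numpy as np
import pandas as pd

def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, worst first."""
    counts = frame.isna().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(1),
    })
    return summary.sort_values("missing", ascending=False)

# Toy stand-in using values from the preview; in the notebook, pass `df` instead.
toy = pd.DataFrame({
    "Latitude": [np.nan, np.nan, 36.922223, np.nan, np.nan],
    "Total.Fatal.Injuries": [2.0, 4.0, 3.0, 2.0, 1.0],
    "Weather.Condition": ["UNK", "UNK", "IMC", "IMC", "VMC"],
})

summary = missing_summary(toy)
print(summary)
```

Placeholder strings such as `UNK` are not counted by `isna()`, so whether to treat them as missing is a separate decision to make during cleaning.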