# Aviation Risk Analysis for Aircraft Acquisition

## Introduction
As our company explores the opportunity to diversify into the aviation industry, this project aims to identify aircraft types with the lowest risk based on historical accident data. Using aviation incident records from 1962 to 2023, our goal is to deliver data-driven insights to guide aircraft purchasing decisions.

### Stakeholder
The Head of the Aviation Division will use this analysis to decide which aircraft to consider acquiring for safe and reliable operations. Our recommendations aim to reduce operational risk and support a strong entry into the aviation market.

## Data Understanding

The dataset comes from the National Transportation Safety Board (NTSB), which provides data on civil aviation accidents and selected incidents from 1962 to 2023. It includes details such as:

- Event Date & Location
- Aircraft Make/Model
- Flight Purpose (Personal, Commercial, etc.)
- Injury Severity
- Phase of Flight (Takeoff, Landing, etc.)
- Weather Conditions
- Aircraft Damage
- Narrative Descriptions

This dataset is relevant because it allows us to analyze patterns of risk associated with different aircraft types, operational environments, and usage conditions. 

### Dataset Summary (To be filled after loading the data):
- Number of rows: 
- Number of columns: 
- Features of interest: Aircraft Model, Injury Severity, Phase of Flight, Weather, Aircraft Damage

### Coded Fields
Some fields are encoded using abbreviations (e.g., `INJURY_SEVERITY` might contain values like 'FAT' for Fatal, 'MIN' for Minor). These need to be decoded for clarity before meaningful analysis.

Decoding will be performed during data preparation, and a dictionary mapping each code to its readable label will be maintained so that results stay interpretable.
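As a minimal sketch of such a decoding step, the snippet below maps codes to labels with a plain dictionary. Only the `FAT` and `MIN` codes come from the example above; the helper name `decode_column`, the sample rows, and the output column name are illustrative assumptions, and the real code list should be taken from the dataset's documentation.

```python
import pandas as pd

# Code-to-label dictionary. Only 'FAT' and 'MIN' are taken from the text above;
# extend it once the actual codes in the dataset have been inspected.
INJURY_SEVERITY_LABELS = {
    "FAT": "Fatal",
    "MIN": "Minor",
}


def decode_column(series: pd.Series, mapping: dict) -> pd.Series:
    """Replace known codes with labels; keep unknown codes and missing values as-is."""
    # Normalize whitespace and case so ' min' and 'MIN' decode the same way.
    cleaned = series.str.strip().str.upper()
    # map() yields NaN for codes not in the mapping; fall back to the cleaned code.
    return cleaned.map(mapping).fillna(cleaned)


# Made-up sample rows, used only to demonstrate the behavior.
sample = pd.DataFrame({"INJURY_SEVERITY": ["FAT", "min ", "XYZ", None]})
sample["INJURY_SEVERITY_DECODED"] = decode_column(
    sample["INJURY_SEVERITY"], INJURY_SEVERITY_LABELS
)
print(sample)
```

Keeping unknown codes unchanged, rather than turning them into missing values, makes it easier to spot codes that still need an entry in the dictionary.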

### Limitations:
- Many rows have missing values
- Some fields are free text (e.g., narratives)
- International accidents might be inconsistently reported
- Some variables require decoding to be useful for analysis
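One way to quantify the first limitation is to compute the share of missing values per column. The sketch below assumes the data is already in a pandas DataFrame; the helper name `missing_share`, the column names, and the sample values are made up for illustration.

```python
import pandas as pd


def missing_share(df: pd.DataFrame) -> pd.Series:
    """Fraction of missing values per column, sorted from most to least missing."""
    return df.isna().mean().sort_values(ascending=False)


# Made-up sample; the real column names and values may differ.
sample = pd.DataFrame({
    "Make": ["Cessna", "Piper", None, "Boeing"],
    "Weather": [None, None, None, "clear"],
    "Narrative": ["text", "text", "text", "text"],
})

shares = missing_share(sample)
print(shares)
```

Columns at the top of this ranking are the first candidates for imputation or exclusion.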

## Data Preparation

In this section, we will load the raw aviation accident dataset and perform initial data cleaning and preparation to make it suitable for analysis.

### Steps:

1. Load the dataset: Import the aviation dataset using pandas.
2. Preview the data: Display the first few rows to get a sense of its structure and values.
3. Understand column types and names: Identify useful features for analysis.
4. Check for missing values: Determine where imputation or exclusion might be needed.
5. Identify coded or unclear values: Pinpoint fields that use abbreviations or codes and create a decoding strategy.
6. Rename columns (if needed): Make column names more readable and descriptive.
7. Drop or fill incomplete data: Drop irrelevant columns, and drop or impute incomplete ones depending on their percentage of missing values.
8. Check for duplicates: Identify and remove repeated records.

This preparation phase is crucial for cleaning the data and making sure it's usable for generating actionable insights.
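The steps above can be sketched with pandas as follows. This is an illustration under stated assumptions, not the project's final pipeline: the in-memory CSV stands in for the real file, and the column names, the `prepare` helper, and the 50% missing-value threshold are all hypothetical choices.

```python
import io

import pandas as pd

# Tiny made-up CSV standing in for the real dataset file.
# Column names and values are assumptions for illustration only.
RAW_CSV = io.StringIO(
    "Event.Date,Make,Model,Injury.Severity,Weather.Condition,Unused.Column\n"
    "1982-01-01,Cessna,172,FAT,VMC,\n"
    "1982-01-01,Cessna,172,FAT,VMC,\n"
    "1990-05-12,Piper,PA-28,MIN,,\n"
    "2001-07-30,Boeing,737,,IMC,\n"
)


def prepare(df: pd.DataFrame, max_missing: float = 0.5) -> pd.DataFrame:
    """Rename columns, drop mostly-empty columns, and remove duplicate rows."""
    # Step 6: make column names lowercase with underscores.
    df = df.rename(columns=lambda c: c.strip().lower().replace(".", "_"))
    # Step 7: keep only columns whose missing share is within the threshold.
    df = df.loc[:, df.isna().mean() <= max_missing]
    # Step 8: remove exact duplicate rows.
    return df.drop_duplicates().reset_index(drop=True)


# Step 1: load the dataset (replace RAW_CSV with the path to the real file).
raw = pd.read_csv(RAW_CSV, parse_dates=["Event.Date"])

print(raw.head())                                           # Step 2: preview
raw.info()                                                  # Step 3: column types and names
print(raw.isna().sum())                                     # Step 4: missing values per column
print(raw["Injury.Severity"].value_counts(dropna=False))   # Step 5: inspect coded values

clean = prepare(raw)
print(clean)
```

Whether to drop or impute a column is a judgment call that depends on its relevance to the analysis; the single threshold used here is only a starting point.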