# Phase 1: Aviation Risk Analysis

## 1. Introduction and Business Understanding

The objective of this project is to analyze aviation accident data from the National Transportation Safety Board (NTSB) to identify the lowest-risk aircraft for our company's new aviation division. Low risk is defined by a combination of low accident frequency and low fatality rates. The findings will be translated into three concrete business recommendations.
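One way to turn "low accident frequency" and "low fatality rate" into numbers is to group records by manufacturer and compute, per group, the number of recorded events and the share of those events with at least one fatality. The sketch below assumes a `Make` column identifies the manufacturer. The helper name `risk_by_make`, the metric names, and the toy records are hypothetical, and the project may settle on a different definition (for example, fatalities per person aboard).

```python
import pandas as pd

def risk_by_make(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical per-make risk summary: event count and share of fatal events."""
    # Flag each event that recorded at least one fatality (missing counts treated as 0)
    is_fatal = df["Total.Fatal.Injuries"].fillna(0) > 0
    summary = (
        df.assign(is_fatal=is_fatal)
        .groupby("Make")
        .agg(events=("is_fatal", "count"), fatal_share=("is_fatal", "mean"))
    )
    # Lower values in both columns indicate lower risk under this definition
    return summary.sort_values(["fatal_share", "events"])

# Invented records standing in for the NTSB data
toy = pd.DataFrame({
    "Make": ["Cessna", "Cessna", "Piper", "Piper", "Piper", "Boeing"],
    "Total.Fatal.Injuries": [0, 1, 0, 0, None, 0],
})
print(risk_by_make(toy))
```

Raw event counts favor makes with small fleets, so in practice this summary would likely need a minimum-events threshold or exposure data before it supports a recommendation.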

## 2. Data Understanding

In this section, we load the `AviationData.csv` dataset and perform initial inspections to understand its structure, data types, and the extent of missing values, which will guide our data cleaning strategy.

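```python
import pandas as pd
import numpy as np

# Load the dataset. The file contains bytes that are not valid UTF-8 (0xe9 is "é"
# in Latin-1), so the default codec raises a UnicodeDecodeError; Latin-1 decodes every byte.
# low_memory=False parses the file in one pass, avoiding mixed-type dtype warnings.
df = pd.read_csv('AviationData.csv', encoding='latin-1', low_memory=False)

# Display the first 5 rows to check structure
print(df.head())
```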

```python
# Inspect data types and non-null counts for every column
df.info()
```
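`df.info()` reports non-null counts, but the extent of missing values is easier to compare across columns as a percentage. A minimal sketch, assuming only pandas; the helper name `missing_summary` is hypothetical and the toy frame stands in for the loaded data:

```python
import numpy as np
import pandas as pd

def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, most-missing first."""
    counts = df.isnull().sum()
    pct = counts / len(df) * 100
    summary = pd.DataFrame({"missing": counts, "pct_missing": pct.round(1)})
    return summary.sort_values("pct_missing", ascending=False)

# Invented records standing in for the loaded dataset
toy = pd.DataFrame({
    "Event.Date": ["2001-05-04", "1995-07-12", None, "2010-01-30"],
    "Make": ["Cessna", None, None, "Piper"],
    "Total.Fatal.Injuries": [0, np.nan, 2, 1],
})
print(missing_summary(toy))
```

Calling `missing_summary(df)` on the real data after the load cell would show which columns are sparse enough to drop and which are worth imputing.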

## 3. Data Preparation

Our analysis requires accurate injury counts and a proper date format for time-series analysis. This section cleans these key columns.

### 3.1 Handling Missing Injury Counts

For accident data, a missing count in an injury column (e.g., `Total.Fatal.Injuries`) often implies a count of zero. We fill `NaN` values in these columns with 0, because dropping these records would discard valuable information about non-fatal accidents.

```python
# Columns holding per-event injury counts
injury_columns = [
    'Total.Fatal.Injuries',
    'Total.Serious.Injuries',
    'Total.Minor.Injuries',
    'Total.Uninjured',
    'Total.Aboard'
]
# Keep only names present in this CSV export; a missing name would raise a KeyError
injury_columns = [col for col in injury_columns if col in df.columns]

# Replace NaN values with 0
df[injury_columns] = df[injury_columns].fillna(0)

# Verify that no missing values remain in these columns
print("--- Missing values in injury columns after imputation: ---")
print(df[injury_columns].isnull().sum())
```
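To make the trade-off concrete, the sketch below compares how many rows survive dropping versus filling on a small invented frame. On the real dataset, the same two `len(...)` comparisons would quantify how many records dropping would discard.

```python
import numpy as np
import pandas as pd

# Invented stand-in for the accident records
toy = pd.DataFrame({
    "Total.Fatal.Injuries": [0, np.nan, 2, np.nan],
    "Total.Serious.Injuries": [1, 0, np.nan, np.nan],
    "Total.Minor.Injuries": [0, 3, 0, np.nan],
})
cols = list(toy.columns)

# Option A: drop any record with a missing injury count
dropped = toy.dropna(subset=cols)

# Option B: treat a missing injury count as zero (the approach used above)
filled = toy.fillna({c: 0 for c in cols})

print(f"rows kept after dropna: {len(dropped)} of {len(toy)}")
print(f"rows kept after fillna: {len(filled)} of {len(toy)}")
print(filled)
```

Here dropping keeps one of four records, while filling keeps all four; the cost of filling is the assumption that a blank count means zero.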

### 3.2 Converting and Filtering by Date

The `Event.Date` column must be converted to a proper datetime format to allow for filtering and trend analysis. We will also restrict the analysis to recent decades (e.g., events since 1990) so that the results stay relevant to modern aircraft purchasing decisions.
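A minimal sketch of this step, assuming `Event.Date` holds ISO-style strings such as `1995-07-12` and using 1990 as the illustrative cutoff. Unparseable values are coerced to `NaT` rather than raising an error. The helper name `filter_since` and the toy frame are hypothetical; the per-year count at the end shows the kind of trend table the converted column enables.

```python
import pandas as pd

def filter_since(df: pd.DataFrame, start_year: int = 1990) -> pd.DataFrame:
    """Parse Event.Date to datetime and keep events from start_year onward."""
    out = df.copy()
    # errors="coerce" turns malformed dates into NaT instead of raising
    out["Event.Date"] = pd.to_datetime(out["Event.Date"], errors="coerce")
    # NaT has no year, so the comparison is False and those rows are dropped too
    return out[out["Event.Date"].dt.year >= start_year]

# Invented records standing in for the NTSB data
toy = pd.DataFrame({
    "Event.Date": ["1982-10-24", "1995-07-12", "2001-05-04", "2001-11-19", "not a date"],
    "Make": ["Cessna", "Piper", "Cessna", "Boeing", "Piper"],
})
recent = filter_since(toy)

# Events per year: the basis for a simple trend analysis
per_year = recent.groupby(recent["Event.Date"].dt.year).size()
print(per_year)
```

Working on a copy keeps the raw strings available in the original frame in case the parsed dates need to be audited.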