# Aviation Accident Analysis (1962–2023)

## 1. Business Understanding

### Project Overview

A new company is preparing to enter the aviation industry and is currently in the **decision-making phase** regarding **fleet acquisition** and **operational planning**. While the industry presents immense opportunities for profit and expansion, **aviation safety** remains a primary concern. Aircraft accidents not only lead to the tragic loss of life but also cause severe reputational damage, legal implications, and financial losses. Therefore, minimizing the risk of such incidents is a **core strategic goal** for the company.

This project aims to analyze over **60 years of aviation accident data** sourced from the **National Transportation Safety Board (NTSB)** to derive **data-driven insights** into:

- The **root causes** of aviation accidents
- The **types of aircraft** most involved in accidents
- The **safest aircraft models and manufacturers**
- Accident trends by **location**, **weather**, **phase of flight**, and **purpose of flight**
- **Fatality rates** and the **severity** of different types of incidents

### 🛠 Why This Analysis Matters

Entering the aviation industry without understanding historical risks would be like flying blind. This analysis will help the business:

1. **Reduce Risk Exposure**  
   By identifying aircraft types or operational conditions that frequently result in accidents, the company can avoid investing in high-risk assets or routes.

2. **Improve Procurement Strategy**  
   With insights into the safest aircraft models, the business can make more **informed purchasing decisions** that prioritize **safety, performance, and reliability**.

3. **Enhance Operational Safety Protocols**  
   By understanding accident patterns across weather conditions, flight phases, and human factors, the company can design **training programs**, **safety checklists**, and **emergency response procedures** that mitigate risk.

4. **Build Customer and Investor Trust**  
   Demonstrating a strong, data-backed commitment to safety can **enhance the brand image**, foster **customer confidence**, and attract **investors or partners** seeking responsible operators.

### Business Questions We Aim to Answer

- What are the **most common causes** of aviation accidents?
- Which aircraft **types** and **manufacturers** are most often involved in fatal incidents?
- Which **U.S. states** and **regions** have recorded the highest number of accidents?
- How do accident frequency and severity vary by **flight type** (private, commercial, military)?
- How have accident counts and fatality rates changed between 1962 and 2023?
- Which aircraft or manufacturers have the **lowest fatality-to-accident ratio** (fewest fatalities per accident), indicating better survivability?
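
The survivability question in the last bullet can be made concrete as "fatalities per accident" for each manufacturer. The sketch below is only an illustration of that arithmetic: the records are made up, and the column names `make` and `fatalities` are hypothetical placeholders, not confirmed fields of the NTSB dataset.

```python
import pandas as pd

# Made-up records for illustration only; column names are hypothetical.
accidents = pd.DataFrame({
    "make": ["Alpha", "Alpha", "Alpha", "Beta", "Beta"],
    "fatalities": [0, 0, 3, 1, 2],
})

# One row per manufacturer: number of accidents and total fatalities.
per_make = accidents.groupby("make").agg(
    accidents=("fatalities", "size"),
    fatalities=("fatalities", "sum"),
)

# Average fatalities per accident (lower suggests better survivability).
per_make["fatalities_per_accident"] = per_make["fatalities"] / per_make["accidents"]

# Share of accidents with at least one fatality.
per_make["fatal_accident_share"] = (
    accidents.assign(fatal=accidents["fatalities"] > 0)
    .groupby("make")["fatal"]
    .mean()
)

print(per_make.sort_values("fatalities_per_accident"))
```

Raw counts like these do not account for exposure (fleet size or hours flown), so a manufacturer with many aircraft in service can look worse than it is. Any ranking built this way is best read alongside the number of accidents behind each ratio.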

### Final Deliverables

The findings of this project will be communicated via:

- An **interactive dashboard** for stakeholder exploration
- A **summary presentation** tailored to non-technical executives
- A **recommendation report** on aircraft procurement
- A clean, well-documented **GitHub repository** for transparency

---

> This analysis goes beyond the numbers: it is a decision-making tool for the **future of aviation safety and investment**.


## 2. Data Understanding

In this section, we perform an initial exploration of the dataset to understand its structure, contents, and quality. This includes:
- Viewing column names and data types
- Previewing the first few records
- Identifying missing values
- Understanding the nature of each variable (categorical, numerical, textual)
- Flagging early data quality issues such as duplicates, inconsistent formats, or suspicious values
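
As a rough sketch of the checks listed above, the helper below builds a per-column summary (data type, missing values, distinct values) and counts fully duplicated rows. The function name `summarize` and the small sample frame with its column names are hypothetical stand-ins, used only so the example runs without the real file.

```python
import pandas as pd

def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: dtype, missing count, missing share, unique count."""
    summary = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "pct_missing": (df.isna().mean() * 100).round(1),
        "n_unique": df.nunique(dropna=True),
    })
    # Columns with the most missing values come first.
    return summary.sort_values("n_missing", ascending=False)

# Tiny stand-in frame; the column names are hypothetical.
sample = pd.DataFrame({
    "event_id": ["A1", "A2", "A2", "A4"],
    "make": ["Cessna", "cessna", "cessna", None],
    "total_fatal": [0.0, 2.0, 2.0, None],
})

report = summarize(sample)
n_duplicates = int(sample.duplicated().sum())

print(sample.head())   # preview the first records
print(report)          # column-level overview
print(f"Fully duplicated rows: {n_duplicates}")
```

In the sample, `make` has two distinct spellings of the same manufacturer, which is the kind of inconsistent formatting the `n_unique` column can help surface early.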


In [10]:
# Import essential libraries for data exploration
import pandas as pd
import numpy as np

# Show every column when printing DataFrames (no column truncation)
pd.set_option('display.max_columns', None)

# Load the dataset with flexible parsing
df = pd.read_csv(
    'data/AviationData.csv',
    sep=None,
    engine='python',
    encoding='utf-8',
    on_bad_lines='warn'  # replaces error_bad_lines/warn_bad_lines (deprecated in pandas 1.3, removed in 2.0)
)

print(f"✅ After flexible parsing: {df.shape[0]} rows, {df.shape[1]} columns.")


✅ After flexible parsing: 490 rows, 0 columns.


In [11]:
# Display first 15 lines from the raw file
with open('data/AviationData.csv', 'r', encoding='utf-8') as file:
    for _ in range(15):
        print(file.readline())






<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<html dir="ltr" class="ms-isBot" lang="en-US">

    <head>

        <!-- ntsbpublic.master -->



        <meta name="GENERATOR" content="Microsoft SharePoint" /><meta http-equiv="X-UA-Compatible" content="IE=Edge" /><meta http-equiv="Content-Type" content="text/html; charset=utf-8" /><meta http-equiv="Expires" content="0" /><meta name="progid" content="SharePoint.WebPartPage.Document" /><meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="msapplication-TileImage" content="/_layouts/15/images/SharePointMetroAppTile.png" /><meta name="msapplication-TileColor" content="#0072C6" />




        




        <link rel="shortcut icon" href="/style%20library/ntsb/images/favicon.ico" type="image/vnd.microsoft.icon" id="favicon" />
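
The raw preview above begins with a `<!DOCTYPE html>` declaration and SharePoint markup, which suggests that the file saved as `data/AviationData.csv` is an HTML page rather than CSV data. That would also explain the odd shape reported by the flexible parser. A minimal sketch of a guard that could run before `pd.read_csv` is shown below; the helper name `looks_like_html` and the 2048-byte probe size are assumptions made for illustration, and the check is a heuristic rather than a full format validator.

```python
import tempfile
from pathlib import Path

def looks_like_html(path, probe_bytes=2048):
    """Heuristic: True if the first non-blank bytes look like an HTML document."""
    with open(path, "rb") as handle:
        head = handle.read(probe_bytes)
    # Ignore a byte-order mark and leading whitespace before checking the prefix.
    text = head.decode("utf-8", errors="replace").lstrip("\ufeff \t\r\n").lower()
    return text.startswith("<!doctype html") or text.startswith("<html")

# Demo on two temporary files so the example does not depend on the real dataset.
with tempfile.TemporaryDirectory() as tmp:
    html_path = Path(tmp) / "page.csv"
    html_path.write_text(
        '\n\n<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN">\n<html>',
        encoding="utf-8",
    )
    csv_path = Path(tmp) / "data.csv"
    csv_path.write_text("id,make\n1,Alpha\n", encoding="utf-8")

    html_result = looks_like_html(html_path)
    csv_result = looks_like_html(csv_path)

print(html_result, csv_result)  # True False
```

If the check returns `True` for the project file, the likely next step is to obtain the dataset again as an actual CSV before continuing with the exploration.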



