# **Phase 1 Project**

Student Name: Lydia Mangoa

Student Pace: Part-time DSF-PT09

Project submission date: 24th November 2024

Instructors: Noah Kandie & Bonface Manyara


# Project title: Improving Aviation Safety Through Data-Driven Insights


# 1. **Overview**

The goal of this analysis is to provide insights into aviation accidents to help identify low-risk aircraft models for commercial and private operations. This analysis will focus on:

Accident frequencies by aircraft type and model.

The number of engines on an aircraft and its influence on safety performance and fatality rates.

Recommendations for selecting safer aircraft.

# 2. **Business Understanding**


## **Stakeholders**

-Head of Aviation Division: Decision-maker for aircraft purchases.

-Leadership Team: Interested in financial viability and risk minimization.

## **Key Business Questions**

Which aircraft models have the lowest accident rates?

How does the number of engines on an aircraft influence its safety performance and fatality rates in commercial and high-stakes operations?

How can accident trends inform the selection of low-risk aircraft?

## **Source of Data**
The dataset is sourced from the Kaggle Aviation Accident Database Synopses, which includes historical aviation accident data.

# 3. **Data Understanding**

In [14]:
#Load the data set
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv("./AviationData.csv", encoding="ISO-8859-1", low_memory=False)


FileNotFoundError: [Errno 2] No such file or directory: './AviationData.csv'
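
The `FileNotFoundError` above means the notebook's working directory does not contain `AviationData.csv`. A minimal sketch of a loader that checks the path first and reports where it looked is shown here. The helper name `load_aviation_data` is hypothetical and not part of the original notebook; the `read_csv` options are the ones used in the cell above.

```python
from pathlib import Path

import pandas as pd


def load_aviation_data(csv_path="./AviationData.csv"):
    """Load the accident CSV, failing early with a clearer message if it is missing.

    Hypothetical helper for illustration; the default path mirrors the load cell.
    """
    path = Path(csv_path)
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found. Current working directory is {Path.cwd()}."
        )
    # Same read_csv options as the original load cell
    return pd.read_csv(path, encoding="ISO-8859-1", low_memory=False)
```

With the CSV placed next to the notebook, `df = load_aviation_data()` would stand in for the direct `pd.read_csv` call.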

In [None]:
#Summary of statistics
df.describe()

In [None]:
#Summary of data types
df.info()

In [None]:
# Quick overview: show the first five rows
df.head()

In [None]:
# Quick overview: show the last five rows
df.tail()

# 4. **Data Preparation**

### Clean Mixed-Type Columns

Loading the dataset raised a warning that columns 6, 7 and 28 contain mixed data types, so the first step is to clean these columns.

In [None]:
print("Column names:", df.columns)
print("Columns with mixed types:")
print(df.iloc[:, [6, 7, 28]].head())

In [None]:
for col in [df.columns[6], df.columns[7], df.columns[28]]:
    print(f"Unique values in {col}:")
    print(df[col].unique())

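    # Convert the column to numeric, coercing values that cannot be parsed to NaN.
    # Coercion discards every non-numeric value, so check the unique values printed
    # above and apply this only to columns that are meant to hold numbers.
    df[col] = pd.to_numeric(df[col], errors="coerce")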
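
Before coercing, it can help to see what a mixed-type column actually holds and how many values the conversion would discard. The sketch below uses a small made-up Series rather than the real columns. The helper name `type_breakdown` and the sample values are assumptions for illustration only.

```python
import pandas as pd


def type_breakdown(series):
    """Count how many values of each Python type a column holds."""
    return series.map(lambda v: type(v).__name__).value_counts()


# Made-up sample standing in for a mixed-type column (not real dataset values)
sample = pd.Series([34.5, "341520N", None, "12.25"], dtype="object")
print(type_breakdown(sample))

# Values that were present before coercion but become NaN afterwards
converted = pd.to_numeric(sample, errors="coerce")
lost = converted.isna() & sample.notna()
print(f"Values turned into NaN by coercion: {lost.sum()} of {sample.notna().sum()}")
```

If a large share of values would be lost, the column is probably textual and is better left as strings than converted.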

In [None]:
#Validate the data
df.info()
df.describe()

### Handle Missing Values

In [None]:
#Identify missing values
missing_data = df.isnull().sum()
print("Missing Values:\n", missing_data)
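
Raw counts are easier to prioritise when shown next to the share of rows they represent. A minimal sketch follows, using a made-up frame in place of `df`. The helper name `missing_report` and the sample rows are hypothetical.

```python
import pandas as pd


def missing_report(frame):
    """Return the count and percentage of missing values per column, worst first."""
    counts = frame.isnull().sum()
    report = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(1),
    })
    return report.sort_values("missing", ascending=False)


# Made-up rows for illustration; column names follow the notebook's dataset
sample = pd.DataFrame({
    "Make": ["Cessna", None, "Boeing", "Piper"],
    "Model": ["172", "PA-28", None, None],
    "Weather.Condition": ["VMC", "IMC", "VMC", "VMC"],
})
print(missing_report(sample))
```

Calling `missing_report(df)` on the loaded data would give the same table for the full dataset.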

In [None]:
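# Handle missing values
# Assign the result back instead of calling fillna(inplace=True) on a column
# selection, which newer pandas versions warn about and may not apply to df
df['Aircraft.damage'] = df['Aircraft.damage'].fillna('Unknown')
df['Weather.Condition'] = df['Weather.Condition'].fillna('Unknown')
df.dropna(subset=['Make', 'Model'], inplace=True)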

# 5. **Data Analysis**

In [None]:
# Top 10 Aircraft Makes by Accident Count
top_makes = df['Make'].value_counts().head(10)

plt.figure(figsize=(10, 6))
top_makes.plot(kind='bar', color='skyblue')
plt.title("Top 10 Aircraft Makes by Accident Count")
plt.xlabel("Aircraft Make")
plt.ylabel("Number of Accidents")
plt.xticks(rotation=45)
plt.show()

In [None]:
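# Yearly Trend in Accidents
# Event.Date is read from the CSV as text, so parse it before using the .dt accessor
df['Event.Date'] = pd.to_datetime(df['Event.Date'], errors='coerce')
df['Year'] = df['Event.Date'].dt.year
yearly_accidents = df.groupby('Year').size()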

plt.figure(figsize=(12, 6))
yearly_accidents.plot(kind='line', marker='o', color='red')
plt.title("Yearly Trend in Aviation Accidents")
plt.xlabel("Year")
plt.ylabel("Number of Accidents")
plt.grid()
plt.show()

In [None]:
# Compare single-engine vs multi-engine accidents
# Severity mapping
severity_map = {
    "Fatal": 3,
    "Serious": 2,
    "Minor": 1,
    "None": 0,
    "Unknown": -1
}

# Apply severity mapping to create a severity score
df['Severity.Score'] = df['Injury.Severity'].map(severity_map).fillna(-1)

# Create a derived column to classify accidents as fatal or non-fatal
df['Is.Fatal'] = df['Total.Fatal.Injuries'] > 0

# Group by 'Number.of.Engines' to calculate fatality rates
engine_accidents = df.groupby('Number.of.Engines')['Is.Fatal'].mean()

# Plot the bar chart
plt.figure(figsize=(8, 5))
engine_accidents.plot(kind='bar', color=['green', 'orange'])
plt.title("Fatality Rate by Number of Engines")
plt.xlabel("Number of Engines")
plt.ylabel("Average Fatality Rate")
plt.xticks(rotation=0)
plt.show()
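
The bar chart shows only the share of accidents with at least one fatality, so a group with very few records can look much safer or riskier than it really is. The sketch below reports the number of accidents next to each share. It runs on a made-up frame; the helper name `fatality_by_engines` and the sample values are assumptions for illustration.

```python
import pandas as pd


def fatality_by_engines(frame):
    """Accident count and share of fatal accidents for each engine count."""
    # Treat a missing fatality count as zero, matching the `> 0` test in the cell above
    is_fatal = frame["Total.Fatal.Injuries"].fillna(0) > 0
    grouped = is_fatal.groupby(frame["Number.of.Engines"])
    return pd.DataFrame({
        "accidents": grouped.size(),
        "fatal_share": grouped.mean().round(3),
    })


# Made-up rows for illustration; rows with no engine count are left out by groupby
sample = pd.DataFrame({
    "Number.of.Engines": [1, 1, 1, 2, 2, None],
    "Total.Fatal.Injuries": [0, 2, None, 0, 0, 1],
})
result = fatality_by_engines(sample)
print(result)
```

Reading `accidents` alongside `fatal_share` makes it clear which engine counts have enough records to support a comparison.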

# 6. **Conclusion**
Key findings:

-Aircraft Safety: Certain models (e.g., Cessna 172 and Boeing 737) account for a higher number of accidents, but this may reflect heavier usage rather than higher risk.

-Trends Over Time: Accident frequencies have declined over the years, which is consistent with improving safety practices.

-Fatality Rates by Number of Engines: Aircraft with multiple engines generally show lower average fatality rates than single-engine aircraft. This suggests that engine redundancy enhances safety, potentially offering more options in emergencies such as an engine failure.
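
Because raw accident counts mix risk with usage, one way to compare models on the available columns is the share of each model's accidents that were fatal, restricted to models with enough records. This is a sketch only: the helper name `model_summary`, the `min_accidents` threshold and the sample rows are hypothetical. It measures severity given an accident, not an accident rate, which would need exposure data such as flight hours or fleet size.

```python
import pandas as pd


def model_summary(frame, min_accidents=2):
    """Per make/model accident count and share of accidents with fatalities."""
    is_fatal = frame["Total.Fatal.Injuries"].fillna(0) > 0
    summary = (
        frame.assign(is_fatal=is_fatal)
        .groupby(["Make", "Model"])
        .agg(accidents=("is_fatal", "size"), fatal_share=("is_fatal", "mean"))
    )
    # Drop rarely seen models: a share computed from one or two records is noise
    summary = summary[summary["accidents"] >= min_accidents]
    return summary.sort_values("fatal_share")


# Made-up rows for illustration only; not figures from the dataset
sample = pd.DataFrame({
    "Make": ["Cessna", "Cessna", "Cessna", "Boeing", "Boeing", "Piper"],
    "Model": ["172", "172", "172", "737", "737", "PA-28"],
    "Total.Fatal.Injuries": [0, 1, 0, 0, 0, 3],
})
ranking = model_summary(sample)
print(ranking)
```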

# 7. **Recommendations**
-Focus on Safe Aircraft Models: Prioritize modern, widely operated models with good safety records and low Severity Scores.

-The declining trend in accidents likely reflects technological advancements. Prioritize acquiring modern aircraft models with the latest safety technologies (e.g., enhanced navigation systems, automated controls, and multi-engine redundancy). These investments align with reducing operational risks. Therefore, it is safest to begin operations with aircraft models that have historically demonstrated lower accident rates.

-Leverage Advanced Weather Tech: Invest in aircraft equipped with advanced weather detection systems to handle adverse conditions effectively.

-Choose Multi-Engine Aircraft for Commercial Operations: For commercial and high-stakes operations, prioritize acquiring multi-engine aircraft due to their lower fatality rates. This aligns with safety-first principles and enhances the company's reputation in aviation safety.