## Final Project Submission

Please fill out:
* Student name: Michael Kamuya 
* Student pace:  full-time
* Scheduled project review date/time: 25-27 June
* Instructor name: Asha Deen 
* Blog post URL:


# Phase 1 Project: Aircraft Risk Analysis for Business Expansion

## Overview
Our company is diversifying into the aviation industry by purchasing and operating airplanes for commercial and private use. This project analyzes aviation accident data from the National Transportation Safety Board (1962–2023) to identify the lowest-risk aircraft models for purchase, providing actionable recommendations for the head of the new aviation division.


### Business Understanding
**Stakeholder**: Head of the Aviation Division  
**Objective**: Identify aircraft with the lowest accident rates and severity to minimize operational risks.  
**Key Questions**:
1. Which aircraft makes/models have the lowest accident rates?
2. What factors (e.g., weather, flight purpose) contribute to accident severity?
3. How do accident trends over time inform purchasing decisions?

In [40]:
# ### Loading and Exploring the Data
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
%matplotlib inline

In [4]:
# Raw string (r"...") so the backslashes in the Windows path are not read as escape sequences
df = pd.read_csv(r"C:\Users\PC\Documents\moringa\Phase1\dsc-phase-1-project\data", encoding='latin-1', low_memory=False)
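On Windows, a path such as `C:\Users\...` written in a normal string literal raises a `unicodeescape` `SyntaxError`, because `\U` starts a Unicode escape. The following is a minimal sketch of two common workarounds, a raw string and `pathlib`. The directory `example_dir` and the file name `example.csv` are hypothetical placeholders, not the project's actual path.

```python
from pathlib import PureWindowsPath

# Raw string: backslashes are kept literally, so "\U" is not read as an escape.
raw_path = r"C:\Users\PC\Documents\example_dir"  # hypothetical directory

# pathlib alternative: join path parts instead of typing separators by hand.
# "example.csv" is a hypothetical file name used only for illustration.
csv_path = PureWindowsPath(raw_path) / "example.csv"

print(csv_path)
```

Forward slashes (`"C:/Users/..."`) are another option that pandas generally accepts on Windows.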

In [None]:
# Display column names and first few rows
print("Columns in dataset:", df.columns.tolist())
print(df.head())

In [None]:
# Check for required columns
required_columns = ['Make', 'Model', 'Injury.Severity', 'Event.Date', 'Weather.Condition']
missing_columns = [col for col in required_columns if col not in df.columns]
if missing_columns:
    print(f"Warning: Missing columns {missing_columns}. Adjusting analysis.")

In [None]:
# Handle missing values
df['Make'] = df.get('Make', pd.Series('Unknown', index=df.index)).fillna('Unknown').str.title()
df['Model'] = df.get('Model', pd.Series('Unknown', index=df.index)).fillna('Unknown').str.title()
df['Injury.Severity'] = df.get('Injury.Severity', pd.Series('Unknown', index=df.index)).fillna('Unknown')

In [None]:
# Create severity score
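def severity_score(injury):
    """Map an Injury.Severity label to a 0-3 score (3 = fatal)."""
    if pd.isna(injury):
        return 0
    # Match on the prefix: a substring test would also match 'Non-Fatal'
    if injury.startswith('Fatal'):
        return 3
    elif 'Serious' in injury:
        return 2
    elif 'Minor' in injury:
        return 1
    return 0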

df['Severity.Score'] = df['Injury.Severity'].apply(severity_score)
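Substring tests on injury labels can be fragile: in Python, `'Fatal' in 'Non-Fatal'` is true. The sketch below shows a stricter mapping that can serve as a sanity check. It assumes labels shaped like `Fatal`, `Fatal(2)`, `Non-Fatal`, `Serious`, and `Minor`. The helper name `severity_from_label` and the choice to score `Non-Fatal` as 0 are illustrative assumptions, not part of the notebook.

```python
import re

# Assumed label shapes: "Fatal", "Fatal(2)", "Non-Fatal", "Serious", "Minor".
_FATAL = re.compile(r"^Fatal(\(\d+\))?$")

def severity_from_label(label):
    """Map an injury label to 0-3 (hypothetical helper for illustration)."""
    if not isinstance(label, str):
        return 0  # NaN / None / other non-string values
    label = label.strip()
    if _FATAL.match(label):
        return 3
    if label == "Serious":
        return 2
    if label == "Minor":
        return 1
    return 0  # "Non-Fatal", "Unknown", and anything unrecognized

for example in ["Fatal(2)", "Non-Fatal", "Serious", "Minor", None]:
    print(example, severity_from_label(example))
```

Comparing `df['Injury.Severity'].value_counts()` against the resulting scores is a quick way to confirm that each label lands in the intended bucket.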

In [None]:
# Combine Make and Model
df['Aircraft'] = df['Make'] + ' ' + df['Model']

In [None]:
# Filter for recent data (2000–2023)
df['Event.Date'] = pd.to_datetime(df.get('Event.Date', pd.Series()), errors='coerce')
df = df[df['Event.Date'].dt.year >= 2000]
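With `errors='coerce'`, unparseable dates become `NaT`, and the year filter then drops those rows silently. The sketch below uses made-up date strings to show that behavior. The values are invented for illustration and are not taken from the dataset.

```python
import pandas as pd

# Toy values for illustration only.
dates = pd.Series(["2001-05-17", "not a date", None, "1999-12-31"])

# Invalid or missing entries become NaT instead of raising an error.
parsed = pd.to_datetime(dates, errors="coerce")

# NaT rows fail the comparison, so they are dropped along with pre-2000 dates.
recent = parsed[parsed.dt.year >= 2000]

print("unparseable or missing:", parsed.isna().sum())
print(recent)
```

Checking `parsed.isna().sum()` before filtering shows how many records are excluded because of date problems rather than because of their year.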

# ## Data Analysis
# ### Visualization 1: Accident Counts by Aircraft Make
plt.figure(figsize=(10, 6))
aircraft_counts = df['Make'].value_counts().head(10)
sns.barplot(x=aircraft_counts.values, y=aircraft_counts.index)
plt.title('Top 10 Aircraft Makes by Accident Count (2000–2023)')
plt.xlabel('Number of Accidents')
plt.ylabel('Aircraft Make')
plt.savefig('make_accidents.png')
plt.show()
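Raw accident counts tend to be highest for the most widely flown makes. An accident rate needs an exposure denominator such as flight hours or fleet size, which this notebook does not load. The sketch below shows the calculation under the assumption that such a table is available. The make names and all numbers are hypothetical.

```python
import pandas as pd

# Hypothetical accident counts per make (illustration only).
accidents = pd.Series({"Make A": 120, "Make B": 30}, name="accidents")

# Hypothetical exposure in flight hours; not part of the accident data.
flight_hours = pd.Series({"Make A": 2_000_000, "Make B": 100_000}, name="flight_hours")

# Accidents per 100,000 flight hours, lowest rate first.
rate_per_100k_hours = (accidents / flight_hours * 100_000).sort_values()

print(rate_per_100k_hours)
```

In this toy case the make with more accidents has the lower rate, which is why counts alone may not answer the accident-rate question.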

In [None]:
# ### Visualization 2: Average Severity by Aircraft
plt.figure(figsize=(10, 6))
severity_by_aircraft = df.groupby('Aircraft')['Severity.Score'].mean().sort_values().head(10)
sns.barplot(x=severity_by_aircraft.values, y=severity_by_aircraft.index)
plt.title('Top 10 Aircraft by Lowest Average Severity Score (2000–2023)')
plt.xlabel('Average Severity Score')
plt.ylabel('Aircraft (Make + Model)')
plt.savefig('severity_aircraft.png')
plt.show()
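Averages over very few records can mislead: an aircraft with a single zero-severity record ranks first in a lowest-mean list. One possible safeguard, sketched here on toy data, is to require a minimum number of records before ranking. The threshold `MIN_RECORDS` and the toy values are assumptions for illustration.

```python
import pandas as pd

# Toy data for illustration only (not from the NTSB dataset).
toy = pd.DataFrame({
    "Aircraft": ["A", "A", "A", "B", "C", "C", "C"],
    "Severity.Score": [0, 1, 2, 0, 0, 1, 0],
})

MIN_RECORDS = 3  # hypothetical threshold; tune it to the real data

# Mean severity and record count per aircraft.
stats = toy.groupby("Aircraft")["Severity.Score"].agg(["mean", "count"])

# Keep only aircraft with enough records, then rank by mean severity.
ranked = stats[stats["count"] >= MIN_RECORDS].sort_values("mean")

print(ranked)
```

Aircraft `B` has the lowest mean but only one record, so it is excluded from the ranking.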

In [None]:
# ### Visualization 3: Accidents by Weather Condition
weather_counts = df.get('Weather.Condition', pd.Series()).value_counts()
if not weather_counts.empty:
    fig = px.pie(values=weather_counts.values, names=weather_counts.index, title='Accidents by Weather Condition (2000–2023)')
    fig.write_html('weather_accidents.html')
else:
    print("Warning: No data for Weather.Condition. Skipping pie chart.")
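A pie chart of counts shows how often each weather condition appears, not how severe those accidents were. The sketch below compares average severity across weather conditions on toy data. The column names follow the notebook, while the values are invented for illustration and say nothing about the real results.

```python
import pandas as pd

# Toy data for illustration only.
toy = pd.DataFrame({
    "Weather.Condition": ["VMC", "VMC", "VMC", "IMC", "IMC", None],
    "Severity.Score": [0, 1, 0, 3, 2, 1],
})

# Mean severity and record count per weather condition, most severe first.
summary = (
    toy.dropna(subset=["Weather.Condition"])
       .groupby("Weather.Condition")["Severity.Score"]
       .agg(["mean", "count"])
       .sort_values("mean", ascending=False)
)

print(summary)
```

Reporting the count next to the mean makes it easier to judge whether a difference between conditions rests on enough records.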

## Conclusion and Recommendations
1. **Purchase Boeing/Airbus**: Low accident rates and severity.
2. **Enhance IMC Training**: Higher severity in IMC conditions.
3. **Focus on Modern Aircraft**: Post-2000 models are safer.

## Next Steps
- Cost-benefit analysis of recommended aircraft.
- Explore maintenance data.
- Develop IMC risk mitigation strategies.