# Aircraft Risk Analysis Notebook

This notebook analyzes aviation accident data to rank aircraft makes and models by a composite risk index and identify the lowest-risk candidates for purchase.

## Step 1: Import Libraries and Load Data
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Show plots inside the notebook
%matplotlib inline

sns.set_style("whitegrid")
sns.set_palette("Set2")

file_path = "Myphase1project/Aviation_Data.csv"
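# low_memory=False reads the file in one pass so each column gets a single inferred dtype
aviation_data = pd.read_csv(file_path, low_memory=False)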

print("Dataset Shape:", aviation_data.shape)
print(aviation_data.head())
```

## Step 2: Data Cleaning
```python
injury_columns = ['Total.Fatal.Injuries', 'Total.Serious.Injuries',
                  'Total.Minor.Injuries', 'Total.Uninjured']
aviation_data[injury_columns] = aviation_data[injury_columns].fillna(0)

aviation_data['Aircraft.damage'] = aviation_data['Aircraft.damage'].fillna('Unknown')

damage_mapping = {
    'Destroyed': 1.0,
    'Substantial': 0.7,
    'Minor': 0.3,
    'None': 0.0,
    'Unknown': 0.5
}
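# Labels missing from the mapping would become NaN, so score them like 'Unknown'
aviation_data['Damage.Index'] = (
    aviation_data['Aircraft.damage'].map(damage_mapping).fillna(damage_mapping['Unknown'])
)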
```
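
`Series.map` returns `NaN` for any label that is absent from the dictionary, so it can be worth listing such labels before deciding how to score them. The sketch below shows one way to do that on a small made-up frame; the `toy` data and the `'Severe'` label are illustrative assumptions, not values taken from the dataset.

```python
import pandas as pd

# Same mapping as in Step 2
damage_mapping = {
    'Destroyed': 1.0,
    'Substantial': 0.7,
    'Minor': 0.3,
    'None': 0.0,
    'Unknown': 0.5,
}

# Hypothetical toy data; 'Severe' is deliberately absent from the mapping
toy = pd.DataFrame({'Aircraft.damage': ['Destroyed', 'Minor', None, 'Severe']})
toy['Aircraft.damage'] = toy['Aircraft.damage'].fillna('Unknown')
toy['Damage.Index'] = toy['Aircraft.damage'].map(damage_mapping)

# Labels that produced NaN because the mapping has no entry for them
unmapped = toy.loc[toy['Damage.Index'].isna(), 'Aircraft.damage'].unique().tolist()
print("Unmapped damage labels:", unmapped)
```

Running the same check on `aviation_data` would show whether the mapping covers every category present in the file.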

## Step 3: Create Risk Metrics
```python
# People on board = sum of all injury categories, including uninjured
aviation_data['Total_Persons'] = aviation_data[injury_columns].sum(axis=1)

# Share of occupants who were uninjured. Events with no recorded occupants stay
# NaN so they are skipped by later means instead of counting as 0% survival.
aviation_data['Survival.Index'] = np.where(
    aviation_data['Total_Persons'] > 0,
    aviation_data['Total.Uninjured'] / aviation_data['Total_Persons'],
    np.nan
)

# Composite risk: 60% weight on the injured share, 40% on aircraft damage
aviation_data['Risk.Index'] = (
    (1 - aviation_data['Survival.Index']) * 0.6
    + aviation_data['Damage.Index'] * 0.4
)
```
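
To make the weighting concrete, here is a small worked example. The helper `risk_index` is a hypothetical name introduced only for illustration; it restates the formula above for a single event and assumes at least one recorded occupant.

```python
def risk_index(uninjured: float, total_persons: float, damage_index: float) -> float:
    """Restate the notebook's Risk.Index formula for one event (illustrative helper)."""
    if total_persons <= 0:
        # Assumption: this sketch only covers events with recorded occupants
        raise ValueError("total_persons must be positive")
    survival = uninjured / total_persons
    return (1 - survival) * 0.6 + damage_index * 0.4


# 4 aboard, 3 uninjured, 'Substantial' damage (0.7)
example_a = risk_index(3, 4, 0.7)
# 2 aboard, none uninjured, 'Destroyed' (1.0)
example_b = risk_index(0, 2, 1.0)
# 2 aboard, both uninjured, 'Minor' damage (0.3)
example_c = risk_index(2, 2, 0.3)

print(f"{example_a:.2f}")  # 0.43
print(f"{example_b:.2f}")  # 1.00
print(f"{example_c:.2f}")  # 0.12
```

Even when everyone aboard is uninjured, the damage term alone can contribute up to 0.4, so the index never depends on injuries only.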

## Step 4: Aggregate by Make and Model
```python
aircraft_summary = aviation_data.groupby(['Make', 'Model']).agg({
    'Survival.Index': 'mean',
    'Damage.Index': 'mean',
    'Risk.Index': 'mean',
    'Total_Persons': 'sum'
}).reset_index()

aircraft_summary = aircraft_summary[aircraft_summary['Total_Persons'] > 5]

best_10 = aircraft_summary.sort_values(by='Risk.Index').head(10)
worst_10 = aircraft_summary.sort_values(by='Risk.Index', ascending=False).head(10)

print("Top 10 Best Performing Aircraft:\n", best_10)
print("\nTop 10 Worst Performing Aircraft:\n", worst_10)
```
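
Filtering on `Total_Persons > 5` can still leave models represented by only one or two events, whose mean risk is noisy. One possible refinement, sketched below on made-up data, is to count events per group and filter on that as well. The `Event_Count` column, the `min_events` threshold, and the upper-casing of `Make` are assumptions for illustration, not part of the original analysis.

```python
import pandas as pd

# Hypothetical toy events; values are made up for illustration
toy = pd.DataFrame({
    'Make': ['Cessna', 'CESSNA', 'cessna ', 'Piper', 'Piper'],
    'Model': ['172', '172', '172', 'PA-28', 'PA-28'],
    'Risk.Index': [0.2, 0.4, 0.6, 0.9, 0.1],
    'Total_Persons': [2, 4, 2, 3, 3],
})

# Assumption: differently cased or padded spellings refer to the same manufacturer
toy['Make'] = toy['Make'].str.strip().str.upper()

# Named aggregation; the dict is unpacked because the column names contain dots
summary = (
    toy.groupby(['Make', 'Model'])
       .agg(**{
           'Risk.Index': ('Risk.Index', 'mean'),
           'Total_Persons': ('Total_Persons', 'sum'),
           'Event_Count': ('Risk.Index', 'size'),
       })
       .reset_index()
)

min_events = 3  # hypothetical threshold; choose one that suits the data
reliable = summary[summary['Event_Count'] >= min_events]
print(reliable)
```

In this toy frame both groups pass a `Total_Persons > 5` filter, but only the group with three events survives the event-count filter.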

## Step 5: Top 10 Best Aircraft
```python
plt.figure(figsize=(12, 6))
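# dodge=False draws one full-width bar per model instead of reserving a slot for every make
sns.barplot(data=best_10, x='Risk.Index', y='Model', hue='Make', dodge=False)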
plt.title('Top 10 Lowest Risk Aircraft')
plt.xlabel('Risk Index (Lower = Safer)')
plt.ylabel('Aircraft Model')
plt.legend(title='Make')
plt.tight_layout()
plt.show()
```

## Step 6: Top 10 Worst Aircraft
```python
plt.figure(figsize=(12, 6))
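# dodge=False draws one full-width bar per model instead of reserving a slot for every make
sns.barplot(data=worst_10, x='Risk.Index', y='Model', hue='Make', dodge=False)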
plt.title('Top 10 Highest Risk Aircraft')
plt.xlabel('Risk Index (Higher = More Risky)')
plt.ylabel('Aircraft Model')
plt.legend(title='Make')
plt.tight_layout()
plt.show()
```

## Step 7: Scatter Plot – Survival vs. Damage Index
```python
plt.figure(figsize=(10, 6))
sns.scatterplot(
    data=aircraft_summary,
    x='Damage.Index',
    y='Survival.Index',
    size='Total_Persons',
    hue='Make',
    alpha=0.7,
    sizes=(50, 300)
)
plt.title('Survival vs Damage Index')
plt.xlabel('Damage Index (Higher = More Damage)')
plt.ylabel('Survival Index (Higher = Safer)')
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()
```

## Step 8: Correlation Plot
```python
corr_columns = ['Survival.Index', 'Damage.Index', 'Risk.Index', 'Total_Persons']
corr_matrix = aircraft_summary[corr_columns].corr()

plt.figure(figsize=(8, 6))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', fmt=".2f")
plt.title('Correlation Matrix of Risk Metrics')
plt.tight_layout()
plt.show()
```
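
When reading this heatmap, it may help to remember that `Risk.Index` is computed directly from `Survival.Index` and `Damage.Index`, so its correlations with them are partly built in rather than discovered. The sketch below uses randomly generated values (an assumption for illustration, not the accident data) to show the effect even when survival and damage are independent of each other.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Synthetic, independent survival and damage scores in [0, 1]
synthetic = pd.DataFrame({
    'Survival.Index': rng.uniform(0, 1, 1000),
    'Damage.Index': rng.uniform(0, 1, 1000),
})

# Same weighting as the notebook's Risk.Index
synthetic['Risk.Index'] = (
    (1 - synthetic['Survival.Index']) * 0.6 + synthetic['Damage.Index'] * 0.4
)

corr = synthetic.corr()
print(corr.round(2))
```

The survival and damage columns are nearly uncorrelated here, yet both correlate strongly with the derived risk column, so the more informative cell in the real heatmap is likely the one relating `Survival.Index` to `Damage.Index`.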

## Step 9: Purpose of Flight vs Risk
```python
plt.figure(figsize=(12, 6))
sns.boxplot(data=aviation_data, x='Purpose.of.flight', y='Risk.Index', palette='husl')
plt.xticks(rotation=45)
plt.title('Risk Index by Purpose of Flight')
plt.xlabel('Purpose of Flight')
plt.ylabel('Risk Index')
plt.tight_layout()
plt.show()
```

## Step 10: Trend of Accidents Over Time
```python
aviation_data['Event.Date'] = pd.to_datetime(aviation_data['Event.Date'], errors='coerce')
aviation_data['Year'] = aviation_data['Event.Date'].dt.year

accidents_by_year = aviation_data.groupby('Year').size()

plt.figure(figsize=(12, 6))
accidents_by_year.plot(kind='line', color='purple')
plt.title('Accidents Trend Over the Years')
plt.xlabel('Year')
plt.ylabel('Number of Accidents')
plt.grid(True)
plt.tight_layout()
plt.show()
```
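
Because `errors='coerce'` turns unparseable dates into `NaT`, those rows drop out of the yearly counts silently. The sketch below, using a few made-up date strings (assumptions for illustration), shows one way to see how many rows are affected before trusting the trend line.

```python
import pandas as pd

# Hypothetical raw values; the second entry is deliberately malformed
raw_dates = pd.Series(['2001-05-01', 'not a date', '2001-07-04', '2003-01-15'])

parsed = pd.to_datetime(raw_dates, errors='coerce')  # malformed values become NaT
n_unparsed = int(parsed.isna().sum())

# Drop missing years before casting so the index holds plain integers
years = parsed.dt.year.dropna().astype(int)
counts = years.value_counts().sort_index()

print("Rows without a usable date:", n_unparsed)
print(counts)
```

Years with no events (2002 in this toy series) are absent from the index, and a line plot simply connects the neighboring points across that gap.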