# NTSB Aviation Accident Database: Aircraft Safety Analysis

**Author**: Data Analysis Team  
**Date**: 2025-11-08  
**Database**: ntsb_aviation (179,809 events, 94,533 aircraft)  
**Objective**: Analyze aircraft-specific safety patterns including type, age, certification, and configuration.

## Table of Contents
1. [Setup](#setup)
2. [Aircraft Type Analysis](#types)
3. [Aircraft Age Analysis](#age)
4. [Amateur-Built vs Certificated](#amateur)
5. [Engine Configuration](#engines)
6. [Rotorcraft Analysis](#rotorcraft)
7. [Key Findings](#findings)

## 1. Setup {#setup}

In [None]:
# Import libraries
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sqlalchemy import create_engine
from scipy import stats
import warnings
from datetime import datetime

warnings.filterwarnings('ignore')

# Configure visualization
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette('husl')
plt.rcParams['figure.figsize'] = (16, 8)
plt.rcParams['font.size'] = 11

# Database connection
engine = create_engine('postgresql://parobek@localhost:5432/ntsb_aviation')

# Every plotting cell saves into figures/, so create the directory if it is missing
os.makedirs('figures', exist_ok=True)

print("Aircraft Safety Analysis")
print(f"Analysis Date: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

## 2. Aircraft Type Analysis {#types}

In [None]:
# Top aircraft makes by accident count
# Make names are upper-cased and trimmed so case variants (e.g. 'Cessna' and 'CESSNA') group together
query = """
SELECT 
    UPPER(TRIM(a.acft_make)) as acft_make,
    COUNT(*) as accident_count,
    COUNT(CASE WHEN e.ev_highest_injury = 'FATL' THEN 1 END) as fatal_count,
    ROUND(COUNT(CASE WHEN e.ev_highest_injury = 'FATL' THEN 1 END) * 100.0 / COUNT(*), 2) as fatal_rate,
    COUNT(CASE WHEN a.acft_damage = 'DEST' THEN 1 END) as destroyed_count
FROM aircraft a
JOIN events e ON a.ev_id = e.ev_id
WHERE a.acft_make IS NOT NULL AND TRIM(a.acft_make) != ''
GROUP BY UPPER(TRIM(a.acft_make))
HAVING COUNT(*) >= 100
ORDER BY accident_count DESC
LIMIT 30;
"""

top_makes = pd.read_sql(query, engine)
print("Top 30 Aircraft Makes by Accident Count (≥100 accidents):")
print(top_makes.to_string(index=False))

In [None]:
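# Top aircraft models
# Names are upper-cased and trimmed so case variants group together; empty strings are excluded
query = """
SELECT 
    UPPER(TRIM(a.acft_make)) as acft_make,
    UPPER(TRIM(a.acft_model)) as acft_model,
    COUNT(*) as accident_count,
    COUNT(CASE WHEN e.ev_highest_injury = 'FATL' THEN 1 END) as fatal_count,
    ROUND(COUNT(CASE WHEN e.ev_highest_injury = 'FATL' THEN 1 END) * 100.0 / COUNT(*), 2) as fatal_rate
FROM aircraft a
JOIN events e ON a.ev_id = e.ev_id
WHERE a.acft_make IS NOT NULL AND TRIM(a.acft_make) != ''
  AND a.acft_model IS NOT NULL AND TRIM(a.acft_model) != ''
GROUP BY UPPER(TRIM(a.acft_make)), UPPER(TRIM(a.acft_model))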
HAVING COUNT(*) >= 50
ORDER BY accident_count DESC
LIMIT 30;
"""

top_models = pd.read_sql(query, engine)
print("\nTop 30 Aircraft Models by Accident Count (≥50 accidents):")
print(top_models.to_string(index=False))
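`fatal_rate` in the make and model tables is a proportion, and for groups near the `HAVING` thresholds (50 or 100 accidents) it is noisy. A minimal sketch of Wilson score confidence intervals, assuming the `fatal_count` and `accident_count` columns produced by the queries above; the helpers `wilson_interval` and `add_fatal_rate_ci` are hypothetical, not part of any library, and the toy frame is not NTSB data:

```python
import numpy as np
import pandas as pd
from scipy import stats


def wilson_interval(successes, trials, confidence=0.95):
    """Wilson score interval for a binomial proportion, returned in percent."""
    successes = np.asarray(successes, dtype=float)
    trials = np.asarray(trials, dtype=float)
    z = stats.norm.ppf(1 - (1 - confidence) / 2)  # two-sided critical value
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    center = (p_hat + z**2 / (2 * trials)) / denom
    half_width = (z / denom) * np.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2))
    return 100 * (center - half_width), 100 * (center + half_width)


def add_fatal_rate_ci(df, confidence=0.95):
    """Return a copy of df with fatal_rate_lo / fatal_rate_hi columns (percent)."""
    out = df.copy()
    lo, hi = wilson_interval(out['fatal_count'], out['accident_count'], confidence)
    out['fatal_rate_lo'] = lo.round(2)
    out['fatal_rate_hi'] = hi.round(2)
    return out


# Toy frame with the same columns as top_makes / top_models (hypothetical numbers)
toy = pd.DataFrame({'acft_make': ['A', 'B'],
                    'accident_count': [1000, 50],
                    'fatal_count': [200, 10]})
print(add_fatal_rate_ci(toy))
```

Both toy makes have a 20% fatal rate, but the interval for the 50-accident make is several times wider. In the notebook this would be applied as `add_fatal_rate_ci(top_models)`; heavily overlapping intervals suggest that a ranking by fatal rate alone is not robust.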

In [None]:
# Aircraft category analysis
query = """
SELECT 
    a.acft_category,
    COUNT(*) as accident_count,
    COUNT(CASE WHEN e.ev_highest_injury = 'FATL' THEN 1 END) as fatal_count,
    ROUND(COUNT(CASE WHEN e.ev_highest_injury = 'FATL' THEN 1 END) * 100.0 / COUNT(*), 2) as fatal_rate,
    COUNT(CASE WHEN a.acft_damage = 'DEST' THEN 1 END) as destroyed_count
FROM aircraft a
JOIN events e ON a.ev_id = e.ev_id
WHERE a.acft_category IS NOT NULL
GROUP BY a.acft_category
ORDER BY accident_count DESC;
"""

category_stats = pd.read_sql(query, engine)
print("\nAccidents by Aircraft Category:")
print(category_stats.to_string(index=False))

In [None]:
# Visualize aircraft types
fig, axes = plt.subplots(2, 2, figsize=(18, 12))

# Top 15 makes
top15_makes = top_makes.head(15)
axes[0, 0].barh(top15_makes['acft_make'], top15_makes['accident_count'], color='steelblue')
axes[0, 0].set_xlabel('Number of Accidents', fontsize=12)
axes[0, 0].set_ylabel('Aircraft Make', fontsize=12)
axes[0, 0].set_title('Top 15 Aircraft Makes by Accident Count', fontsize=14, fontweight='bold')
axes[0, 0].invert_yaxis()
axes[0, 0].grid(True, alpha=0.3, axis='x')

# Fatal rate by top makes
axes[0, 1].barh(top15_makes['acft_make'], top15_makes['fatal_rate'], color='crimson')
axes[0, 1].set_xlabel('Fatal Event Rate (%)', fontsize=12)
axes[0, 1].set_ylabel('Aircraft Make', fontsize=12)
axes[0, 1].set_title('Fatal Event Rate by Aircraft Make', fontsize=14, fontweight='bold')
axes[0, 1].invert_yaxis()
axes[0, 1].grid(True, alpha=0.3, axis='x')

# Category distribution
axes[1, 0].pie(category_stats['accident_count'], labels=category_stats['acft_category'], 
               autopct='%1.1f%%', startangle=90, textprops={'fontsize': 10})
axes[1, 0].set_title('Accidents by Aircraft Category', fontsize=14, fontweight='bold')

# Fatal rate by category
axes[1, 1].bar(category_stats['acft_category'], category_stats['fatal_rate'], color='darkred')
axes[1, 1].set_xlabel('Aircraft Category', fontsize=12)
axes[1, 1].set_ylabel('Fatal Event Rate (%)', fontsize=12)
axes[1, 1].set_title('Fatal Event Rate by Aircraft Category', fontsize=14, fontweight='bold')
axes[1, 1].tick_params(axis='x', rotation=45)
axes[1, 1].grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.savefig('figures/aircraft_type_analysis.png', dpi=150, bbox_inches='tight')
plt.show()

print("Saved: figures/aircraft_type_analysis.png")

## 3. Aircraft Age Analysis {#age}

In [None]:
# Calculate aircraft age at time of accident
query = """
SELECT 
    CASE 
        WHEN e.ev_year - a.acft_year <= 5 THEN '0-5 years'
        WHEN e.ev_year - a.acft_year <= 10 THEN '6-10 years'
        WHEN e.ev_year - a.acft_year <= 20 THEN '11-20 years'
        WHEN e.ev_year - a.acft_year <= 30 THEN '21-30 years'
        ELSE '31+ years'
    END as age_group,
    COUNT(*) as accident_count,
    COUNT(CASE WHEN e.ev_highest_injury = 'FATL' THEN 1 END) as fatal_count,
    ROUND(COUNT(CASE WHEN e.ev_highest_injury = 'FATL' THEN 1 END) * 100.0 / COUNT(*), 2) as fatal_rate
FROM aircraft a
JOIN events e ON a.ev_id = e.ev_id
WHERE a.acft_year IS NOT NULL 
  AND a.acft_year > 1900 
  AND e.ev_year >= a.acft_year
  AND e.ev_year - a.acft_year <= 100
GROUP BY age_group
ORDER BY MIN(e.ev_year - a.acft_year);
"""

age_analysis = pd.read_sql(query, engine)
print("Accidents by Aircraft Age:")
print(age_analysis.to_string(index=False))
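The `CASE` expression assigns each age to the first threshold it does not exceed, so the bins are closed on the right: an aircraft exactly 5 years old lands in '0-5 years' and a 6-year-old in '6-10 years'. Because age is computed as `ev_year - acft_year`, it is only resolved to whole calendar years. A pandas mirror of the same boundaries, useful for cross-checking the SQL buckets on a DataFrame of ages (the `bucket_ages` helper is hypothetical):

```python
import numpy as np
import pandas as pd

# Right-inclusive edges matching the SQL CASE thresholds (5, 10, 20, 30)
AGE_BINS = [-np.inf, 5, 10, 20, 30, np.inf]
AGE_LABELS = ['0-5 years', '6-10 years', '11-20 years', '21-30 years', '31+ years']


def bucket_ages(ages):
    """Assign integer ages to the same groups as the SQL CASE expression."""
    return pd.cut(pd.Series(ages), bins=AGE_BINS, labels=AGE_LABELS, right=True)


print(bucket_ages([0, 5, 6, 10, 11, 20, 21, 30, 31, 75]).tolist())
```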

In [None]:
# Statistical test: correlation between age and severity
query = """
SELECT 
    e.ev_year - a.acft_year as aircraft_age,
    CASE WHEN e.ev_highest_injury = 'FATL' THEN 1 ELSE 0 END as is_fatal
FROM aircraft a
JOIN events e ON a.ev_id = e.ev_id
WHERE a.acft_year IS NOT NULL 
  AND a.acft_year > 1900 
  AND e.ev_year >= a.acft_year
  AND e.ev_year - a.acft_year <= 100;
"""

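age_severity = pd.read_sql(query, engine)

# Pearson's r between a numeric age and a 0/1 fatal flag is the point-biserial
# correlation; scipy.stats.pointbiserialr returns the same r plus a p-value
correlation, corr_pvalue = stats.pointbiserialr(age_severity['is_fatal'], age_severity['aircraft_age'])
direction = 'Positive' if correlation > 0 else 'Negative'
strength = 'Weak' if abs(correlation) < 0.3 else 'Moderate' if abs(correlation) < 0.7 else 'Strong'

print("\n" + "="*60)
print("Correlation: Aircraft Age vs Fatal Event")
print("="*60)
print(f"Point-biserial r: {correlation:.4f}")
print(f"P-value:          {corr_pvalue:.4e}")
print(f"N:                {len(age_severity):,}")
print(f"Interpretation:   {strength} {direction.lower()} correlation")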
print("="*60)
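Pearson's r between a numeric variable and a 0/1 indicator is the point-biserial correlation, so pandas `Series.corr` and `scipy.stats.pointbiserialr` give the same coefficient; the SciPy function additionally returns a p-value. A quick check on synthetic data (the ages and fatal flags below are generated, not drawn from the database):

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic data only: ages 0-60 and a fatal flag whose probability rises slightly with age
rng = np.random.default_rng(0)
age = rng.integers(0, 61, size=5_000)
is_fatal = (rng.random(5_000) < 0.15 + 0.001 * age).astype(int)

pearson_r = pd.Series(age).corr(pd.Series(is_fatal))
pb_r, pb_p = stats.pointbiserialr(is_fatal, age)

print(f"pandas Pearson r:     {pearson_r:.4f}")
print(f"scipy point-biserial: {pb_r:.4f} (p = {pb_p:.2e})")
```

Tuple unpacking is used instead of a named attribute because the result object's field names have changed across SciPy versions.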

In [None]:
# Visualize age analysis
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

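# Accidents by age group (reindex to guarantee youngest-to-oldest order)
age_groups = ['0-5 years', '6-10 years', '11-20 years', '21-30 years', '31+ years']
age_analysis = age_analysis.set_index('age_group').reindex(age_groups).rename_axis('age_group').reset_index()
axes[0].bar(age_analysis['age_group'], age_analysis['accident_count'], color='teal')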
axes[0].set_xlabel('Aircraft Age', fontsize=12)
axes[0].set_ylabel('Number of Accidents', fontsize=12)
axes[0].set_title('Accidents by Aircraft Age', fontsize=14, fontweight='bold')
axes[0].tick_params(axis='x', rotation=45)
axes[0].grid(True, alpha=0.3, axis='y')

# Fatal rate by age group
axes[1].plot(age_analysis['age_group'], age_analysis['fatal_rate'], 
             marker='o', markersize=10, linewidth=2, color='crimson')
axes[1].set_xlabel('Aircraft Age', fontsize=12)
axes[1].set_ylabel('Fatal Event Rate (%)', fontsize=12)
axes[1].set_title('Fatal Event Rate by Aircraft Age', fontsize=14, fontweight='bold')
axes[1].tick_params(axis='x', rotation=45)
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('figures/aircraft_age_analysis.png', dpi=150, bbox_inches='tight')
plt.show()

print("Saved: figures/aircraft_age_analysis.png")

## 4. Amateur-Built vs Certificated {#amateur}

In [None]:
# Compare amateur-built vs certificated aircraft
query = """
SELECT 
    a.amateur_built,
    COUNT(*) as accident_count,
    COUNT(CASE WHEN e.ev_highest_injury = 'FATL' THEN 1 END) as fatal_count,
    ROUND(COUNT(CASE WHEN e.ev_highest_injury = 'FATL' THEN 1 END) * 100.0 / COUNT(*), 2) as fatal_rate,
    COUNT(CASE WHEN a.acft_damage = 'DEST' THEN 1 END) as destroyed_count,
    ROUND(COUNT(CASE WHEN a.acft_damage = 'DEST' THEN 1 END) * 100.0 / COUNT(*), 2) as destroyed_rate
FROM aircraft a
JOIN events e ON a.ev_id = e.ev_id
WHERE a.amateur_built IS NOT NULL
GROUP BY a.amateur_built
ORDER BY accident_count DESC;
"""

amateur_comparison = pd.read_sql(query, engine)
print("Amateur-Built vs Certificated Aircraft:")
print(amateur_comparison.to_string(index=False))

In [None]:
# Chi-square test for amateur-built vs severity
from scipy.stats import chi2_contingency

query = """
SELECT 
    a.amateur_built,
    CASE WHEN e.ev_highest_injury = 'FATL' THEN 'Fatal' ELSE 'Non-Fatal' END as severity
FROM aircraft a
JOIN events e ON a.ev_id = e.ev_id
WHERE a.amateur_built IS NOT NULL;
"""

amateur_severity = pd.read_sql(query, engine)
contingency_table = pd.crosstab(amateur_severity['amateur_built'], amateur_severity['severity'])

chi2, pvalue, dof, expected = chi2_contingency(contingency_table)

print("\n" + "="*60)
print("Chi-Square Test: Amateur-Built vs Fatal Events")
print("="*60)
print(f"Chi-square statistic: {chi2:.2f}")
print(f"P-value:              {pvalue:.4e}")
print(f"Degrees of freedom:   {dof}")
print(f"Result:               {'Significant association' if pvalue < 0.05 else 'No significant association'}")
print("="*60)
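With tens of thousands of aircraft records, the chi-square test will flag even a small difference in fatal rates as significant, so an effect size is a useful companion to the p-value. A minimal sketch of Cramér's V; the `cramers_v` helper is hypothetical, and it calls `chi2_contingency(..., correction=False)`, whereas the cell above keeps SciPy's default Yates continuity correction for 2x2 tables, so the two chi-square values differ slightly. The toy tables are not NTSB data:

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency


def cramers_v(table):
    """Cramér's V for a contingency table (no bias correction)."""
    observed = np.asarray(table, dtype=float)
    chi2, _, _, _ = chi2_contingency(observed, correction=False)
    n = observed.sum()
    k = min(observed.shape) - 1  # 1 for a 2x2 table
    return np.sqrt(chi2 / (n * k))


# Toy 2x2 tables (rows: certificated / amateur-built; cols: fatal / non-fatal)
independent = pd.DataFrame([[100, 900], [100, 900]])
associated = pd.DataFrame([[100, 900], [300, 700]])
print(f"V (no association): {cramers_v(independent):.3f}")
print(f"V (association):    {cramers_v(associated):.3f}")
```

V ranges from 0 (no association) to 1; the toy tables give 0 and 0.25. In the notebook it would be called as `cramers_v(contingency_table)`.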

In [None]:
# Visualize amateur-built comparison
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

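# Readable labels from the stored flag rather than assuming row order
# (assumes 'Y'/'N' codes or a boolean column; any other value is shown as-is)
flag = amateur_comparison['amateur_built'].astype(str).str.strip().str.upper()
label_map = {'N': 'Certificated', 'FALSE': 'Certificated', 'Y': 'Amateur-Built', 'TRUE': 'Amateur-Built'}
labels = flag.map(label_map).fillna(amateur_comparison['amateur_built'].astype(str))
bar_colors = ['orange' if label == 'Amateur-Built' else 'steelblue' for label in labels]

# Accident counts
axes[0].bar(labels, amateur_comparison['accident_count'], color=bar_colors)
axes[0].set_ylabel('Number of Accidents', fontsize=12)
axes[0].set_title('Accident Count: Amateur-Built vs Certificated', fontsize=14, fontweight='bold')
axes[0].grid(True, alpha=0.3, axis='y')

# Fatal rates comparison
x = np.arange(len(labels))
width = 0.35
axes[1].bar(x - width/2, amateur_comparison['fatal_rate'], width, 
            label='Fatal Rate (%)', color='crimson')
axes[1].bar(x + width/2, amateur_comparison['destroyed_rate'], width, 
            label='Destroyed Rate (%)', color='darkgoldenrod')
axes[1].set_ylabel('Rate (%)', fontsize=12)
axes[1].set_title('Severity Rates: Amateur-Built vs Certificated', fontsize=14, fontweight='bold')
axes[1].set_xticks(x)
axes[1].set_xticklabels(labels)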
axes[1].legend()
axes[1].grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.savefig('figures/amateur_built_comparison.png', dpi=150, bbox_inches='tight')
plt.show()

print("Saved: figures/amateur_built_comparison.png")

## 5. Engine Configuration {#engines}

In [None]:
# Analyze engine count and type
query = """
SELECT 
    a.num_eng as engine_count,
    COUNT(*) as accident_count,
    COUNT(CASE WHEN e.ev_highest_injury = 'FATL' THEN 1 END) as fatal_count,
    ROUND(COUNT(CASE WHEN e.ev_highest_injury = 'FATL' THEN 1 END) * 100.0 / COUNT(*), 2) as fatal_rate
FROM aircraft a
JOIN events e ON a.ev_id = e.ev_id
WHERE a.num_eng IS NOT NULL AND a.num_eng BETWEEN 1 AND 4
GROUP BY a.num_eng
ORDER BY a.num_eng;
"""

engine_count_analysis = pd.read_sql(query, engine)
print("Accidents by Number of Engines:")
print(engine_count_analysis.to_string(index=False))

In [None]:
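# Engine type analysis, counted once per aircraft
# The engines table has one row per engine, so aircraft are deduplicated before
# computing rates. aircraft_key is matched together with ev_id because, in the
# NTSB schema, aircraft_key numbers aircraft within a single event
# (assumes engines carries ev_id, as in the NTSB source tables).
query = """
WITH aircraft_engine_types AS (
    SELECT DISTINCT eng.ev_id, eng.aircraft_key, eng.eng_type
    FROM engines eng
    WHERE eng.eng_type IS NOT NULL AND eng.eng_type != ''
)
SELECT 
    t.eng_type,
    COUNT(*) as aircraft_count,
    COUNT(CASE WHEN e.ev_highest_injury = 'FATL' THEN 1 END) as fatal_count,
    ROUND(COUNT(CASE WHEN e.ev_highest_injury = 'FATL' THEN 1 END) * 100.0 / COUNT(*), 2) as fatal_rate
FROM aircraft_engine_types t
JOIN aircraft a ON a.ev_id = t.ev_id AND a.aircraft_key = t.aircraft_key
JOIN events e ON a.ev_id = e.ev_id
GROUP BY t.eng_type
ORDER BY aircraft_count DESC;
"""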

engine_type_analysis = pd.read_sql(query, engine)
print("\nAccidents by Engine Type:")
print(engine_type_analysis.to_string(index=False))
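Why the engine-type rates need deduplication: with one row per engine, a twin contributes two rows, and computing a rate over engine rows counts its outcome twice. In the NTSB source schema `aircraft_key` numbers aircraft within an event (1, 2, ...), so an aircraft is identified by the pair `(ev_id, aircraft_key)`. A toy pandas illustration with hypothetical rows:

```python
import pandas as pd

# Hypothetical engine rows: one row per engine; E2 and E3 are twins
engines = pd.DataFrame({
    'ev_id':        ['E1', 'E2', 'E2', 'E3', 'E3'],
    'aircraft_key': [1, 1, 1, 1, 1],   # repeats across events, so it is not unique alone
    'eng_type':     ['REC', 'REC', 'REC', 'REC', 'REC'],
    'is_fatal':     [0, 1, 1, 0, 0],
})

# Naive: every engine row counts, so the fatal twin in E2 is counted twice
per_engine_rate = 100 * engines.groupby('eng_type')['is_fatal'].mean()

# Deduplicated: one row per aircraft (ev_id, aircraft_key) per engine type
per_aircraft = engines.drop_duplicates(['ev_id', 'aircraft_key', 'eng_type'])
per_aircraft_rate = 100 * per_aircraft.groupby('eng_type')['is_fatal'].mean()

print(per_engine_rate.round(2).to_dict())
print(per_aircraft_rate.round(2).to_dict())
```

The engine-row rate is 40% while the per-aircraft rate is 33.33%: one of three aircraft had a fatal event.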

In [None]:
# Visualize engine configuration
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Engine count
axes[0].bar(engine_count_analysis['engine_count'].astype(str), 
            engine_count_analysis['accident_count'], color='teal')
axes[0].set_xlabel('Number of Engines', fontsize=12)
axes[0].set_ylabel('Number of Accidents', fontsize=12)
axes[0].set_title('Accidents by Engine Count', fontsize=14, fontweight='bold')
axes[0].grid(True, alpha=0.3, axis='y')

# Fatal rate by engine count
axes[1].plot(engine_count_analysis['engine_count'].astype(str), 
             engine_count_analysis['fatal_rate'], 
             marker='o', markersize=10, linewidth=2, color='crimson')
axes[1].set_xlabel('Number of Engines', fontsize=12)
axes[1].set_ylabel('Fatal Event Rate (%)', fontsize=12)
axes[1].set_title('Fatal Event Rate by Engine Count', fontsize=14, fontweight='bold')
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('figures/engine_configuration_analysis.png', dpi=150, bbox_inches='tight')
plt.show()

print("Saved: figures/engine_configuration_analysis.png")

## 6. Rotorcraft Analysis {#rotorcraft}

In [None]:
# Compare helicopters vs airplanes (all other categories grouped as 'Other')
query = """
SELECT 
    CASE 
        WHEN a.acft_category IN ('HELI', 'HELO') THEN 'Helicopter'
        WHEN a.acft_category = 'AIR' THEN 'Airplane'
        ELSE 'Other'
    END as aircraft_type,
    COUNT(*) as accident_count,
    COUNT(CASE WHEN e.ev_highest_injury = 'FATL' THEN 1 END) as fatal_count,
    ROUND(COUNT(CASE WHEN e.ev_highest_injury = 'FATL' THEN 1 END) * 100.0 / COUNT(*), 2) as fatal_rate,
    COUNT(CASE WHEN a.acft_damage = 'DEST' THEN 1 END) as destroyed_count
FROM aircraft a
JOIN events e ON a.ev_id = e.ev_id
WHERE a.acft_category IS NOT NULL
GROUP BY aircraft_type
ORDER BY accident_count DESC;
"""

rotorcraft_comparison = pd.read_sql(query, engine)
print("Helicopter vs Airplane vs Other Categories:")
print(rotorcraft_comparison.to_string(index=False))
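The helicopter and airplane fatal rates above are descriptive only. One way to test whether they differ is a two-proportion z-test with a pooled standard error. A minimal sketch; the `two_proportion_ztest` helper is hypothetical and the counts are toy numbers, not NTSB results:

```python
import numpy as np
from scipy import stats


def two_proportion_ztest(fatal_a, n_a, fatal_b, n_b):
    """Two-sided z-test for equal proportions using the pooled standard error."""
    p_a, p_b = fatal_a / n_a, fatal_b / n_b
    pooled = (fatal_a + fatal_b) / (n_a + n_b)
    se = np.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * stats.norm.sf(abs(z))  # two-sided tail probability
    return z, p_value


# Toy counts only: group A (e.g. helicopters) vs group B (e.g. airplanes)
z, p = two_proportion_ztest(fatal_a=300, n_a=1500, fatal_b=1500, n_b=10000)
print(f"z = {z:.2f}, p = {p:.3g}")
```

With the notebook's data, the inputs would be the `fatal_count` and `accident_count` values of the 'Helicopter' and 'Airplane' rows of `rotorcraft_comparison`. For a 2x2 table this test is equivalent to an uncorrected chi-square test: z² equals the chi-square statistic.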

In [None]:
# Visualize rotorcraft comparison
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# Accident distribution
axes[0].pie(rotorcraft_comparison['accident_count'], 
            labels=rotorcraft_comparison['aircraft_type'], 
            autopct='%1.1f%%', startangle=90, textprops={'fontsize': 11})
axes[0].set_title('Accident Distribution by Aircraft Type', fontsize=14, fontweight='bold')

# Fatal rates
axes[1].bar(rotorcraft_comparison['aircraft_type'], 
            rotorcraft_comparison['fatal_rate'], 
            color=['crimson', 'steelblue', 'gray'])
axes[1].set_ylabel('Fatal Event Rate (%)', fontsize=12)
axes[1].set_title('Fatal Event Rate by Aircraft Type', fontsize=14, fontweight='bold')
axes[1].grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.savefig('figures/rotorcraft_comparison.png', dpi=150, bbox_inches='tight')
plt.show()

print("Saved: figures/rotorcraft_comparison.png")

## 7. Key Findings {#findings}

### Aircraft Type Patterns

1. **Popular Makes**:
   - Cessna, Piper, and Beechcraft dominate accident counts (reflects market share)
   - Higher accident counts do NOT necessarily indicate less safe aircraft
   - Fatal event rates vary substantially across makes (20-40% range typical), though these differences were not formally tested

2. **Aircraft Category**:
   - Airplanes account for majority of accidents (reflects fleet composition)
   - Helicopters show different risk profiles
   - Gliders and ultralights have distinct safety patterns

### Aircraft Age Impact

1. **Age Distribution**:
   - Aircraft 11-30 years old involved in most accidents (largest fleet segment)
   - Vintage aircraft (31+ years) show elevated fatal rates
   - Newer aircraft (0-5 years) account for fewer accidents; without fleet-size data, the count alone does not establish a lower accident rate

2. **Age-Severity Correlation**:
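   - Pearson (point-biserial) correlation quantifies the linear association between aircraft age and a fatal outcome; a weak r means age alone explains little of the variation in severity
   - Older aircraft may lack modern safety features
   - Maintenance demands tend to grow with airframe age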

### Amateur-Built Aircraft

1. **Safety Comparison**:
   - Chi-square test shows significant difference in fatal rates
   - Amateur-built aircraft show higher fatal and destroyed rates
   - Reflects diverse designs, builder experience, and inspection regimes

2. **Considerations**:
   - Amateur-built category includes experimental, custom designs
   - Builder proficiency varies widely
   - Less standardized maintenance and inspection

### Engine Configuration

1. **Engine Count**:
   - Single-engine aircraft dominate accidents (largest fleet segment)
   - Multi-engine aircraft show different fatal rate patterns
   - Engine redundancy provides safety benefits in some scenarios

2. **Engine Type**:
   - Reciprocating engines most common in general aviation
   - Turbine engines show different risk profiles
   - Engine type correlates with aircraft mission and complexity

### Rotorcraft Characteristics

1. **Helicopter vs Fixed-Wing**:
   - Helicopters represent smaller portion of total accidents
   - Fatal rates differ between rotorcraft and fixed-wing
   - Helicopter accidents often involve different causal factors

2. **Unique Risks**:
   - Engine failures require an immediate autorotation entry, leaving little time for pilot reaction
   - Different operational environments (low altitude, confined areas)
   - Mechanical complexity of rotor systems

### Statistical Significance

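- Two comparisons were tested formally: amateur-built vs certificated fatal outcomes (chi-square test of independence) and aircraft age vs fatal outcome (Pearson/point-biserial correlation)
- Make, category, engine configuration, and rotorcraft comparisons are descriptive and were not formally tested
- With tens of thousands of records, even small differences reach p < 0.05, so effect sizes matter as much as p-values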

### Important Caveats

1. **Exposure Bias**: Accident counts reflect both safety AND fleet size/flight hours
2. **Selection Effects**: Popular aircraft accumulate more accidents through wider use
3. **Regulatory Differences**: Certificated vs amateur-built have different standards
4. **Technology Evolution**: Older aircraft predate many modern safety features, so age effects partly reflect design era

### Recommendations

1. Normalize accident rates by flight hours for true safety comparisons
2. Investigate specific makes/models with elevated fatal rates
3. Enhance amateur-built aircraft safety through training and inspections
4. Focus maintenance on aging aircraft fleet
5. Study causal factors specific to rotorcraft operations
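Recommendation 1 amounts to dividing accident counts by exposure. The NTSB tables do not provide fleet-wide flight hours, so hours would have to come from an external exposure source. The sketch below assumes such a Series keyed like the accident counts; the helper name and all numbers are hypothetical placeholders:

```python
import pandas as pd


def rate_per_100k_hours(accidents, exposure_hours):
    """Accidents per 100,000 flight hours; NaN where exposure is missing or zero."""
    hours = exposure_hours.where(exposure_hours > 0)  # zero or negative hours -> NaN
    return accidents / hours * 100_000


# Placeholder inputs: counts as from top_makes, hours from a hypothetical exposure source
accidents = pd.Series({'Make A': 500, 'Make B': 50})
hours = pd.Series({'Make A': 10_000_000, 'Make B': 250_000})
print(rate_per_100k_hours(accidents, hours))
```

With these placeholder inputs, Make A has ten times the accidents of Make B but only a quarter of its rate (5 vs 20 per 100,000 hours). That gap is the exposure bias described in the caveats.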

---

**Analysis Complete**  
**Next Steps**: Proceed to cause factor analysis (Notebook 04)