# **Aviation Safety Analysis: Identifying Low-Risk Aircraft for Business Expansion**
___

### **Objective**
Our company is expanding into the aviation industry but lacks insights into aircraft safety risks. This project aims to analyze aviation accident data to determine which aircraft pose the lowest risk, providing data-driven recommendations for strategic decision-making.

### **Project Breakdown**
To achieve this objective, we will:
1. **Clean and preprocess the data** – Handle missing values and inconsistencies.
2. **Explore trends and patterns** – Identify factors contributing to accidents.
3. **Perform risk assessment** – Determine which aircraft have the safest records.
4. **Generate visualizations** – Communicate findings effectively.
5. **Provide business recommendations** – Translate insights into actionable decisions.
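Step 3 is the core of the analysis. Here is a minimal sketch of one simple risk measure: the share of each manufacturer's recorded accidents that involved at least one fatality. It assumes `df1` uses the NTSB column names `Make` and `Total.Fatal.Injuries`. The toy rows, the hypothetical `fatal_accident_share` helper and its `min_accidents` threshold are illustrations only. They are not the method used later in this notebook.

```python
import pandas as pd

# Toy rows standing in for df1 (hypothetical values; column names assumed
# to match the NTSB export).
toy = pd.DataFrame({
    "Make": ["Cessna", "Cessna", "Piper", "Piper", "Piper", "Beech"],
    "Total.Fatal.Injuries": [0, 2, 0, 0, 1, 0],
})


def fatal_accident_share(df: pd.DataFrame, group_col: str = "Make",
                         min_accidents: int = 1) -> pd.DataFrame:
    """Share of recorded accidents with at least one fatality, per group.

    Missing fatality counts are treated as 0, an assumption to revisit
    during cleaning.
    """
    is_fatal = df["Total.Fatal.Injuries"].fillna(0) > 0
    summary = (
        df.assign(is_fatal=is_fatal)
          .groupby(group_col)
          .agg(accidents=("is_fatal", "size"),
               fatal_share=("is_fatal", "mean"))
    )
    # Drop groups with too few records for a meaningful rate.
    summary = summary[summary["accidents"] >= min_accidents]
    # Lowest fatal share first, i.e. the "safest" groups on this measure.
    return summary.sort_values("fatal_share")


print(fatal_accident_share(toy))
```

This dataset records accidents, not flight hours or fleet sizes. The ratio therefore measures how severe a make's accidents tend to be, not how often that make has accidents. Raising `min_accidents` (for example `fatal_accident_share(df1, min_accidents=50)`) stops makes with only one or two records from looking deceptively safe.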

### **Dataset Overview**
We will use two datasets from the National Transportation Safety Board (NTSB):

#### **1. Aviation Accident Data (`df1`)**
- **Number of Rows:** 88,889  
- **Number of Columns:** 31  
- **Key Features:**  
  - **Event details** – Date, location, severity, phase of flight.  
  - **Aircraft information** – Make, model, category, engine type.  
  - **Injury data** – Fatal, serious, minor injuries.  
  - **Weather conditions** – Visibility, weather impact on accidents.  

#### **2. U.S. State Codes (`df2`)**
- **Number of Rows:** 62  
- **Number of Columns:** 2  
- **Key Features:**  
  - **US_State** – Full state names.  
  - **Abbreviation** – Standard two-letter state codes. 
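The overview does not say exactly how `df2` will be used. One plausible use is to attach full state names to accident records, sketched below. The sketch assumes that `df1`'s location field holds strings like `"City, ST"`, and the toy frames and the `add_state_names` helper are hypothetical.

```python
import pandas as pd

# Toy stand-ins: "Location" is assumed to look like "City, ST";
# the state table uses the US_State / Abbreviation columns listed above.
accidents = pd.DataFrame(
    {"Location": ["Moose Creek, ID", "Bridgeport, CA", "Gulf of Mexico"]}
)
states = pd.DataFrame(
    {"US_State": ["Idaho", "California"], "Abbreviation": ["ID", "CA"]}
)


def add_state_names(accidents: pd.DataFrame, states: pd.DataFrame) -> pd.DataFrame:
    """Attach full state names by matching a trailing two-letter code."""
    # Capture a trailing ", XX" code; rows without one get NaN.
    abbrev = accidents["Location"].str.extract(r",\s*([A-Z]{2})$", expand=False)
    out = accidents.assign(Abbreviation=abbrev)
    # A left join keeps accidents whose code is missing or unmatched.
    return out.merge(states, on="Abbreviation", how="left")


print(add_state_names(accidents, states))
```

The left join is deliberate. Records outside any state, such as those in international waters, keep their row with a missing `US_State` instead of being dropped.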

### **Next Steps**
We will now proceed with data exploration and cleaning to uncover meaningful insights. 
___


## Importing Libraries and Loading Our Datasets

To begin our analysis, we import key Python libraries that will help us manipulate data and create visualizations:  
- **pandas (`pd`)** – For data manipulation and analysis.  
- **numpy (`np`)** – For numerical operations.  
- **matplotlib.pyplot (`plt`)** – For basic data visualization.  
- **seaborn (`sns`)** – For enhanced statistical visualizations.  

We then load our datasets:  
- `df1`: The **Aviation Accident Data** file (`AviationData.csv`), which contains detailed information on aviation accidents and selected incidents in the United States and international waters.  
- `df2`: The **U.S. State Codes** file (`USState_Codes.csv`), which maps full U.S. state names to their two-letter abbreviations.  

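When loading `df1`, we pass `encoding='latin-1'` so that special characters which are not valid UTF-8 are decoded instead of raising an error. We also pass `low_memory=False` so that pandas parses the whole file in one pass before inferring column types. On a file this large, this avoids mixed-type columns (and the accompanying `DtypeWarning`) at the cost of higher memory use while loading.  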
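The effect of the `encoding` argument can be seen on a tiny in-memory file. The bytes below are hypothetical: a place name written in Latin-1, where `ç` is the single byte `0xE7`, which is not valid UTF-8 on its own. Reading them with pandas' default UTF-8 decoding fails, while `encoding='latin-1'` succeeds.

```python
import io

import pandas as pd

# Hypothetical CSV content encoded as Latin-1 ("ç" becomes byte 0xE7).
raw = "Location,Make\nBesançon,Robin\n".encode("latin-1")

try:
    pd.read_csv(io.BytesIO(raw))  # default encoding is UTF-8
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

df = pd.read_csv(io.BytesIO(raw), encoding="latin-1")
print(utf8_ok)                # False
print(df.loc[0, "Location"])  # Besançon
```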


```python
# Import the libraries used throughout the analysis
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Load the accident records and the state-code lookup table
df1 = pd.read_csv('AviationData.csv', encoding='latin-1', low_memory=False)
df2 = pd.read_csv('USState_Codes.csv')
```

```python
df2.columns
```

```
Index(['US_State', 'Abbreviation'], dtype='object')
```
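With both files loaded, a natural first step toward cleaning (step 1 above) is to measure how much data is missing in each column. Below is a minimal sketch using a hypothetical `missing_summary` helper. The small `sample` frame uses assumed NTSB-style column names and made-up values, so the block runs without the CSV files.

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, worst first."""
    counts = df.isna().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "pct_missing": (counts / len(df) * 100).round(1),
    })
    return summary.sort_values("missing", ascending=False)


# Illustrative frame (assumed column names, made-up values).
sample = pd.DataFrame({
    "Make": ["Cessna", "Piper", None, "Beech"],
    "Weather.Condition": ["VMC", np.nan, np.nan, "IMC"],
    "Total.Fatal.Injuries": [0, 1, np.nan, 0],
})
print(missing_summary(sample))
```

On the real data the same call would be `missing_summary(df1)`. Columns near the top of the result are candidates for imputation or removal.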