
# **NTSB Air Accident Numerical Problem Analysis**
## **Introduction**
The National Transportation Safety Board (NTSB) is the federal agency that investigates accidents and significant events involving aviation, railroad, transit, highway, marine, pipeline, and commercial space transportation in the United States, as well as events in other countries that involve U.S. companies or citizens [1]. The total volume of data available through the NTSB's public records is too large to work with for this project. As such, the following constraints were used to narrow the focus of the project and the dataset:

**Restricting the event type to ACC (accident).**
The NTSB classifies events as accidents, incidents, and occurrences [1]. An accident is an event in which injury or damage to the vehicle occurred or was likely to have occurred.

**Restricting vehicle category to AIR (airplane).**
For this project, we consider only aviation vehicles. Within that category, we also exclude non-airplane aircraft (e.g., hot air balloons and gliders).

**Restricting the FAR (Federal Aviation Regulations) category to Part 121 (Regularly Scheduled Air Carriers) [2].**
This constraint serves multiple purposes. First, we wish to analyze only commercial flight accidents. Second, this category distinguishes the type of aircraft: all amateur aircraft are excluded, since including them could skew the analysis of aircraft weight, maintenance, and pilot skill. Restricting to this FAR category also better serves the intended audience, the general public who may be interested in the safety of commercial flight.

**Using only closed accident reports.**
This ensures that data comes only from concluded investigations; many investigations are still ongoing and do not yet have usable data. Closed accident reports cover investigations for which a report has been completed and those for which no report was created (mostly investigations in foreign countries).

## **Source of Data**
As a federal organization, the NTSB must make its data on completed investigations available to the public. The NTSB provides CAROL (Case Analysis and Reporting Online) as an online means of accessing this information [3]. Within CAROL, the Aviation Investigation Search includes all public aviation accident records from 1962 to the present. These cover all events in which United States territory or citizens are involved. In addition, the NTSB is often asked to consult on international cases to share investigative skill, technique, or technology, especially for countries with less established air investigation teams.

Once the constraints described above were entered as a custom search, the results were downloaded as a CSV file. For ease of visualization, this CSV file was then imported into a Google Sheets file, which serves as the dataset for this project [4]. No changes were made between the search, the CSV download, and the import into Sheets.

## **Problems to Consider**
The problems to be considered in this project address:


1.   Likelihood of accident by airline, aircraft make, location, date, and weather.
2.   Severity of accident by airline, aircraft make, location, date, and weather.
3.   Predictive analysis of accident by airline, aircraft make, location, and weather based on date.
4.   Length of time between accident and investigation completion analyzed by severity, airline, aircraft make, location, and weather.
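
As a rough illustration of problems 1 and 2, the sketch below uses a pandas `groupby` to compute each group's share of recorded accidents and its mean fatal injury count. The rows are invented and the `Make` column name is a hypothetical stand-in; `FatalInjuryCount` is the column used later in this notebook. A share of recorded accidents is not an accident rate, since a rate would also need exposure data such as the number of flights.

```python
import pandas as pd

# Invented rows standing in for the NTSB export; "Make" is a hypothetical column name.
toy_accidents = pd.DataFrame({
    "Make": ["A", "A", "B", "B", "B", "C"],
    "FatalInjuryCount": [0, 2, 0, 0, 1, 0],
})

def accident_summary(frame, group_col, severity_col="FatalInjuryCount"):
    """Return accident count, share of all accidents, and mean severity per group."""
    grouped = frame.groupby(group_col)[severity_col]
    summary = pd.DataFrame({
        "accidents": grouped.size(),
        "share": grouped.size() / len(frame),
        "mean_severity": grouped.mean(),
    })
    # Most frequent groups first.
    return summary.sort_values("accidents", ascending=False)

summary = accident_summary(toy_accidents, "Make")
print(summary)
```

The same helper could be called with a different grouping column (airline, location, weather) once the actual column names in the dataset are confirmed.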

Modeling will be done using Python's tools for data analysis, graphing, charting, and prediction.
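
One simple predictive tool is a least-squares trend line over yearly accident counts. The sketch below fits one with `numpy.polyfit`. The yearly counts are made-up numbers, not NTSB data, and a linear trend is only an assumed starting point rather than the model this project settles on.

```python
import numpy as np

# Made-up yearly accident counts for illustration only (not NTSB data).
years = np.array([2015, 2016, 2017, 2018, 2019])
counts = np.array([30, 28, 26, 24, 22])

# Fit counts ~ slope * (year - first_year) + intercept by least squares.
# Offsetting the years keeps the fit numerically well conditioned.
first_year = years[0]
slope, intercept = np.polyfit(years - first_year, counts, deg=1)

def predict_count(year):
    """Extrapolate the fitted linear trend to a given year."""
    return slope * (year - first_year) + intercept

print(f"Trend: {slope:.2f} accidents per year")
print(f"Projected count for 2020: {predict_count(2020):.1f}")
```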

## **Expectations**
This project is expected to provide data analysis and graphics that help a traveler choose the airline, airplane model, location, and weather in which they wish to travel. This project will also predict factors influencing the future safety of airline travel.







## **Prepare Data**

In [None]:
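# Optional: mount Google Drive when running in Colab.
#from google.colab import drive
#drive.mount('/content/drive')
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import plotly.express as px
import datetime as dt

# CSV export of the Google Sheet that holds the dataset.
data = 'https://docs.google.com/spreadsheets/d/1kyd3T7N7QrAQ6UDe9sLYgMo8dvDdxu8ZUbcS6W2sA3Q/export?format=csv'
# keep_default_na=False keeps blank cells as empty strings instead of NaN.
df = pd.read_csv(data, keep_default_na=False)
# Location label used as hover text on the map.
name = df.City + "," + df.State + "," + df.Country
# Preview the first rows; this must be the last expression in the cell to display.
df.head()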
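
Because `keep_default_na=False` keeps empty cells as empty strings, a column that contains blanks is read as text rather than numbers. A minimal sketch of converting such a column, using a small in-memory CSV in place of the real sheet (the sample values are invented, and treating a blank as zero is an assumption):

```python
import io
import pandas as pd

# Invented sample: the second row has a blank FatalInjuryCount.
sample_csv = io.StringIO("City,FatalInjuryCount\nPhoenix,2\nTempe,\nMesa,0\n")
sample = pd.read_csv(sample_csv, keep_default_na=False)

# Show the raw values as read from the CSV.
print(sample["FatalInjuryCount"].tolist())

# Coerce to numbers; blanks become NaN, then 0 (assuming blank means no injuries).
fatal = pd.to_numeric(sample["FatalInjuryCount"], errors="coerce").fillna(0).astype(int)
print(fatal.sum())
```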

## **Initial Statistics**

In [None]:
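# Convert the count columns to numbers; blank cells are treated as 0.
fatal, serious, minor = (pd.to_numeric(df[col], errors="coerce").fillna(0).sum()
                         for col in ["FatalInjuryCount", "SeriousInjuryCount", "MinorInjuryCount"])
# Note: the first three slices count injured people, while the last counts accidents with no injuries.
no_injury = (df.HighestInjuryLevel == "None").sum()
labels = ['Fatal', 'Serious', 'Minor', 'No Injuries']
severity = px.pie(values = [fatal, serious, minor, no_injury],
             names = labels,
             color = labels,
             color_discrete_map = {'Fatal': 'black',
                                   'Serious': 'red',
                                   'Minor': 'yellow',
                                   'No Injuries': 'green'},
             title = "Percentage of Injury Severity for Accidents")
severity.show()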
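
The chart above combines counts of injured people with a count of accidents. An alternative that keeps every slice in the same unit is to count accidents by `HighestInjuryLevel`. The sketch below does this on invented rows. Only the label `"None"` appears in this notebook, so the other labels (`"Fatal"`, `"Serious"`, `"Minor"`) are assumptions about the dataset.

```python
import pandas as pd

# Invented rows; labels other than "None" are assumed, not confirmed by the dataset.
toy_levels = pd.DataFrame({
    "HighestInjuryLevel": ["None", "Minor", "None", "Fatal", "Serious", "None"],
})

levels = ["Fatal", "Serious", "Minor", "None"]

# Count accidents per level, keeping a fixed order and filling absent levels with 0.
level_counts = (
    toy_levels["HighestInjuryLevel"].value_counts().reindex(levels, fill_value=0)
)
# Fraction of accidents at each level.
level_share = level_counts / level_counts.sum()
print(level_share)
```

The resulting index and values could then be passed as `names` and `values` to `px.pie`.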

In [None]:
# Named accident_map to avoid shadowing Python's built-in map().
accident_map = px.scatter_geo(df,
                    lat="Latitude",
                    lon="Longitude",
                    hover_name=name)
accident_map.show()
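
Rows with blank or out-of-range coordinates cannot be placed on a map. A small sketch of filtering them out first, on invented coordinates; whether the real dataset contains such rows is not verified here.

```python
import pandas as pd

# Invented coordinates: one blank latitude and one impossible longitude.
toy_points = pd.DataFrame({
    "Latitude": ["33.45", "", "40.64"],
    "Longitude": ["-112.07", "-111.9", "-573.78"],
})

def valid_coordinates(frame):
    """Return rows whose latitude and longitude parse as numbers within valid ranges."""
    lat = pd.to_numeric(frame["Latitude"], errors="coerce")
    lon = pd.to_numeric(frame["Longitude"], errors="coerce")
    # Unparseable values become NaN, and between() is False for NaN.
    keep = lat.between(-90, 90) & lon.between(-180, 180)
    return frame.assign(Latitude=lat, Longitude=lon)[keep]

plottable = valid_coordinates(toy_points)
print(plottable)
```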

## **Works Cited**
[1] "Who We Are and What We Do," NTSB.gov, 2023. [Online]. Available: https://www.ntsb.gov/Pages/home.aspx. [Accessed Feb. 22, 2024]

[2] "Regularly Scheduled Air Carriers (Part 121)," faa.gov, 2023. [Online]. Available: https://www.faa.gov/hazmat/air_carriers/operations/part_121. [Accessed Feb. 22, 2024]

[3] "Case Analysis and Reporting Online," carol.ntsb.gov, 2023. [Online]. Available: https://carol.ntsb.gov/. [Accessed Feb. 22, 2024]

[4] "NTSB Dataset," docs.google.com, 2023. [Online]. Available: https://docs.google.com/spreadsheets/d/1kyd3T7N7QrAQ6UDe9sLYgMo8dvDdxu8ZUbcS6W2sA3Q/. [Accessed Feb. 22, 2024]