# Investigating Airplane Accidents

## Objective

Use search algorithms and data structures to explore airplane accident data.

Work with a data set of airplane accident statistics to analyze patterns and look for any common threads.

## Data Set

The data set contains 77,282 aviation accidents that occurred in the U.S., along with the metadata associated with them.

The data, stored in a file named AviationData.txt, comes from the National Transportation Safety Board (NTSB) and can be downloaded from [data.gov](http://catalog.data.gov/dataset/aviation-data-and-documentation-from-the-ntsb-accident-database-system-05748/resource/4b1e95fe-91a7-4112-85fa-424d2672a906).

Here are descriptions for some of the columns:

| Column                 | Description                                            |
|------------------------|--------------------------------------------------------|
| Event Id               | The unique id for the incident                         |
| Investigation Type     | The type of investigation the NTSB conducted           |
| Event Date             | The date of the accident                               |
| Location               | Where the accident occurred                            |
| Country                | The country where the accident occurred                |
| Latitude               | The latitude where the accident occurred               |
| Longitude              | The longitude where the accident occurred              |
| Injury Severity        | The severity of any injuries                           |
| Aircraft Damage        | The extent of the damage to the aircraft               |
| Aircraft Category      | The type of aircraft                                   |
| Make                   | The make of the aircraft                               |
| Model                  | The model of the aircraft                              |
| Number of Engines      | The number of engines on the plane                     |
| Air Carrier            | The carrier operating the aircraft                     |
| Total Fatal Injuries   | The number of fatal injuries                           |
| Total Serious Injuries | The number of serious injuries                         |
| Total Minor Injuries   | The number of minor injuries                           |
| Total Uninjured        | The number of people who did not sustain injuries      |
| Broad Phase of Flight  | The phase of flight during which the accident occurred |


## Reading in the Data

In [1]:
aviation_data = [x for x in open("C:/Users/i7/csv/AviationData.txt", "r")]
aviation_list = [x.split(" | ") for x in aviation_data]
lax_code = [row for row in aviation_list if "LAX94LA336" in row]
print(lax_code)

[['20001218X45447', 'Accident', 'LAX94LA336', '07/19/1962', 'BRIDGEPORT, CA', 'United States', '', '', '', '', 'Fatal(4)', 'Destroyed', '', 'N5069P', 'PIPER', 'PA24-180', 'No', '1', 'Reciprocating', '', '', 'Personal', '', '4', '0', '0', '0', 'UNK', 'UNKNOWN', 'Probable Cause', '09/19/1996', '\n']]


In [2]:
with open("C:/Users/i7/csv/AviationData.txt") as f:
    aviation_data = f.readlines()  # one string per line of the file
aviation_list = []
for line in aviation_data:
    aviation_list.append(line.split(" | "))  # split each line into its columns
lax_code = []
for row in aviation_list:
    if 'LAX94LA336' in row:  # compares the code against every column of the row
        lax_code.append(row)
print(lax_code)

[['20001218X45447', 'Accident', 'LAX94LA336', '07/19/1962', 'BRIDGEPORT, CA', 'United States', '', '', '', '', 'Fatal(4)', 'Destroyed', '', 'N5069P', 'PIPER', 'PA24-180', 'No', '1', 'Reciprocating', '', '', 'Personal', '', '4', '0', '0', '0', 'UNK', 'UNKNOWN', 'Probable Cause', '09/19/1996', '\n']]


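Both approaches above take time proportional to the number of rows times the number of columns: each one loops through every row, and then through every column inside that row. They also split every line into a list before searching it, which adds work and memory.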
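
To make the cost of the row-by-row, column-by-column search concrete, here is a minimal sketch that counts the comparisons it performs. The function `count_field_comparisons` and the three sample rows are made up for illustration; they are not part of the NTSB data.

```python
def count_field_comparisons(rows, target):
    """Count the string comparisons a row-by-row, field-by-field search performs."""
    comparisons = 0
    matches = []
    for row in rows:
        for field in row:
            comparisons += 1
            if field == target:
                matches.append(row)
                break  # `target in row` also stops at the first hit in a row
    return matches, comparisons


# Made-up sample: 3 rows with 4 fields each
sample_rows = [
    ["id1", "Accident", "AAA00AA001", "CA"],
    ["id2", "Accident", "BBB00BB002", "TX"],
    ["id3", "Incident", "CCC00CC003", "FL"],
]

matches, comparisons = count_field_comparisons(sample_rows, "ZZZ99ZZ999")
print(matches, comparisons)  # no match, so all 3 * 4 = 12 fields are compared
```

When the code is absent, every field of every row is compared, so the work grows with both the row count and the column count.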

## Reading In the Data with a Linear-Time Algorithm

In [3]:
lax_lines = [x for x in open("C:/Users/i7/csv/AviationData.txt", "r") if "LAX94LA336" in x]
print(lax_lines)

['20001218X45447 | Accident | LAX94LA336 | 07/19/1962 | BRIDGEPORT, CA | United States |  |  |  |  | Fatal(4) | Destroyed |  | N5069P | PIPER | PA24-180 | No | 1 | Reciprocating |  |  | Personal |  | 4 | 0 | 0 | 0 | UNK | UNKNOWN | Probable Cause | 09/19/1996 | \n']
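
The single-pass search can also be written as a generator, so that no list of all lines is ever built. This is a sketch only: `find_lines` is a hypothetical helper, and the `io.StringIO` object stands in for the real file with a shortened, four-column version of two records.

```python
import io


def find_lines(lines, code):
    """Yield each line that contains `code`, scanning the input once."""
    for line in lines:
        if code in line:  # substring test on the raw line, no splitting
            yield line


# Stand-in for the real file: a header plus two shortened records
sample_file = io.StringIO(
    "Event Id | Investigation Type | Accident Number | Event Date | \n"
    "20001218X45447 | Accident | LAX94LA336 | 07/19/1962 | \n"
    "20150908X74637 | Accident | CEN15LA402 | 09/08/2015 | \n"
)

lax_lines = list(find_lines(sample_file, "LAX94LA336"))
print(lax_lines)
```

Note that the substring test matches the code anywhere in the line, not only in the accident number column, so it is slightly less strict than comparing against individual columns.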


#### Hash Tables

So far, the data has been stored as a list of strings and as a list of lists. Now, store the data as a list of dictionaries, with the column headers as keys.

In [4]:
headers = aviation_data[0].split(" | ")
aviation_dict_list = [dict(zip(headers, row.split(" | "))) for row in aviation_data[1:]]
lax_dict = [row for row in aviation_dict_list if "LAX94LA336" in row.values()]
lax_dict

[{'\n': '\n',
  'Accident Number': 'LAX94LA336',
  'Air Carrier': '',
  'Aircraft Category': '',
  'Aircraft Damage': 'Destroyed',
  'Airport Code': '',
  'Airport Name': '',
  'Amateur Built': 'No',
  'Broad Phase of Flight': 'UNKNOWN',
  'Country': 'United States',
  'Engine Type': 'Reciprocating',
  'Event Date': '07/19/1962',
  'Event Id': '20001218X45447',
  'FAR Description': '',
  'Injury Severity': 'Fatal(4)',
  'Investigation Type': 'Accident',
  'Latitude': '',
  'Location': 'BRIDGEPORT, CA',
  'Longitude': '',
  'Make': 'PIPER',
  'Model': 'PA24-180',
  'Number of Engines': '1',
  'Publication Date': '09/19/1996',
  'Purpose of Flight': 'Personal',
  'Registration Number': 'N5069P',
  'Report Status': 'Probable Cause',
  'Schedule': '',
  'Total Fatal Injuries': '4',
  'Total Minor Injuries': '0',
  'Total Serious Injuries': '0',
  'Total Uninjured': '0',
  'Weather Condition': 'UNK'}]

#### Summary
Searching a list of dictionaries is easier than searching a list of lists, because each value can be looked up by its column name instead of by its position.
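
The search in cell 4 still scans every dictionary. If many lookups by accident number are needed, one option is to build a dictionary keyed by that column once, after which each lookup is a hash-table access. The sketch below assumes this use case; `build_index` is a hypothetical helper and the two sample records keep only two of the columns.

```python
def build_index(records, key):
    """Map each value of `key` to the list of records that carry it."""
    index = {}
    for record in records:
        index.setdefault(record[key], []).append(record)
    return index


# Shortened sample records (only two columns kept for illustration)
sample_records = [
    {"Accident Number": "LAX94LA336", "Location": "BRIDGEPORT, CA"},
    {"Accident Number": "CEN15LA402", "Location": "Freeport, IL"},
]

accident_index = build_index(sample_records, "Accident Number")
print(accident_index.get("LAX94LA336", []))
```

Building the index takes one pass over the records; the trade-off is the extra memory for the index itself.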

## Accidents By U.S. State

Count how many accidents occurred in each U.S. state, then determine which state had the most accidents overall.

In [5]:
aviation_dict_list

[{'\n': '\n',
  'Accident Number': 'CEN15LA402',
  'Air Carrier': '',
  'Aircraft Category': 'Unknown',
  'Aircraft Damage': 'Substantial',
  'Airport Code': 'KFEP',
  'Airport Name': 'albertus Airport',
  'Amateur Built': '',
  'Broad Phase of Flight': 'TAKEOFF',
  'Country': 'United States',
  'Engine Type': '',
  'Event Date': '09/08/2015',
  'Event Id': '20150908X74637',
  'FAR Description': 'Part 91: General Aviation',
  'Injury Severity': 'Non-Fatal',
  'Investigation Type': 'Accident',
  'Latitude': '42.246111',
  'Location': 'Freeport, IL',
  'Longitude': '-89.581945',
  'Make': 'CLARKE REGINALD W',
  'Model': 'DRAGONFLY MK',
  'Number of Engines': '',
  'Publication Date': '09/09/2015',
  'Purpose of Flight': 'Personal',
  'Registration Number': 'N24TL',
  'Report Status': 'Preliminary',
  'Schedule': '',
  'Total Fatal Injuries': '',
  'Total Minor Injuries': '',
  'Total Serious Injuries': '1',
  'Total Uninjured': '',
  'Weather Condition': 'VMC'},
 ...]

In [6]:
from collections import defaultdict
import operator

state_accidents = defaultdict(int)
for row in aviation_dict_list:
    if row['Country'] == 'United States' and ", " in row['Location']:
        state = row['Location'].split(", ")[1]
        state_accidents[state] += 1
state_accidents = dict(state_accidents)
most_accident_state = max(state_accidents.items(), 
                          key=operator.itemgetter(1))[0]

print(most_accident_state)
print(state_accidents)


CA
{'IL': 1874, 'NH': 326, 'SD': 393, 'CA': 8030, 'NJ': 1068, 'TN': 951, 'NC': 1433, 'ID': 1228, 'TX': 5112, 'CT': 447, 'PA': 1573, 'MO': 1404, 'NV': 1054, 'LA': 1074, 'NY': 1715, 'WY': 663, 'AZ': 2502, 'AL': 1032, 'ME': 458, 'MI': 1863, 'FL': 5117, 'AR': 1389, 'MN': 1317, 'OK': 1110, 'OH': 1616, 'AK': 5049, 'ND': 514, 'OR': 1559, 'MT': 936, 'IA': 731, 'VA': 1108, 'IN': 1169, 'KY': 577, 'NM': 1219, 'WA': 2353, 'NE': 642, 'WI': 1388, 'UT': 1162, 'KS': 981, 'GA': 1747, 'CO': 2460, 'MA': 896, 'MS': 746, 'SC': 850, 'FN': 5, 'WV': 362, 'PR': 88, 'MD': 720, 'GU': 14, 'HI': 416, 'GM': 62, 'VT': 213, 'RI': 147, 'PO': 8, 'ON': 1, 'Maui': 2, 'AS': 1, 'DE': 100, 'MP': 2, 'VI': 12, 'AO': 7, 'DC': 43, 'San Juan Is.': 1, 'UN': 5, 'NYC': 1, 'Oahu': 1, 'Kauai': 2, 'MAUI': 4, 'KAUAI': 3, 'OAHU': 4, "MANU'A": 1, 'HONOL': 1, 'IS': 1, 'FT. MYER': 1}
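
The counts above include keys such as 'Maui', 'MAUI' and 'San Juan Is.', which are not state codes. One possible cleanup, sketched here under the assumption that only two-letter codes should be counted, uses `collections.Counter` and takes the text after the last ", " rather than the second piece. The helper `count_by_state` and the sample records are hypothetical; the last three locations are made up.

```python
from collections import Counter


def count_by_state(records):
    """Count U.S. records per two-letter code found after the last ', '."""
    counts = Counter()
    for record in records:
        if record.get("Country") != "United States":
            continue
        # rpartition splits on the LAST ", ", so "A, B, CA" still yields "CA"
        _, sep, suffix = record.get("Location", "").rpartition(", ")
        state = suffix.strip().upper()
        if sep and len(state) == 2 and state.isalpha():
            counts[state] += 1
    return counts


sample_records = [
    {"Country": "United States", "Location": "BRIDGEPORT, CA"},
    {"Country": "United States", "Location": "Freeport, IL"},
    {"Country": "United States", "Location": "LOS ANGELES, CA"},
    {"Country": "United States", "Location": "LAHAINA, Maui"},  # dropped: not 2 letters
    {"Country": "Canada", "Location": "TORONTO, ON"},           # dropped: not U.S.
]

state_counts = count_by_state(sample_records)
print(state_counts.most_common(1))  # [('CA', 2)]
```

A two-letter filter would still keep codes such as 'FN' or 'GM' from the output above, so checking against an explicit list of state abbreviations would be stricter.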


## Fatalities And Injuries By Month

Count how many fatalities and serious injuries occurred during each month.

In [7]:
monthly_injuries = defaultdict(lambda: [0,0])
for row in aviation_dict_list:
    month = row['Event Date'].split("/")[0]
    for ix, field in enumerate(['Total Fatal Injuries', 'Total Serious Injuries']):
        # An empty string means no value was recorded; count it as zero
        monthly_injuries[month][ix] += int(row[field] or 0)

print(monthly_injuries)

defaultdict(<function <lambda> at 0x0000000004CFBBF8>, {'09': [4027, 1502], '08': [4855, 2069], '07': [5001, 2002], '06': [3557, 1634], '05': [3551, 1404], '04': [2972, 1346], '03': [2663, 1134], '02': [2974, 933], '01': [3201, 1023], '12': [3636, 1111], '11': [3848, 1147], '10': [3738, 1354], '': [2, 0]})
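
The printed `defaultdict` is unordered by month and includes an empty-string key for rows without a date. A possible follow-up step, sketched below, sorts the totals by month, skips rows with no date, and picks the month with the most fatalities. The helper `monthly_totals` and the sample records are hypothetical; the third and fourth records are made up.

```python
def monthly_totals(records):
    """Sum fatal and serious injuries per month, sorted by month number."""
    totals = {}
    for record in records:
        month = record["Event Date"].split("/")[0]
        if not month:
            continue  # skip rows with no event date
        fatal, serious = totals.get(month, (0, 0))
        fatal += int(record["Total Fatal Injuries"] or 0)      # '' counts as 0
        serious += int(record["Total Serious Injuries"] or 0)
        totals[month] = (fatal, serious)
    return dict(sorted(totals.items()))


sample_records = [
    {"Event Date": "07/19/1962", "Total Fatal Injuries": "4", "Total Serious Injuries": "0"},
    {"Event Date": "09/08/2015", "Total Fatal Injuries": "", "Total Serious Injuries": "1"},
    {"Event Date": "07/04/2000", "Total Fatal Injuries": "1", "Total Serious Injuries": "2"},
    {"Event Date": "", "Total Fatal Injuries": "2", "Total Serious Injuries": ""},
]

totals = monthly_totals(sample_records)
worst_month = max(totals, key=lambda m: totals[m][0])
print(totals)       # {'07': (5, 2), '09': (0, 1)}
print(worst_month)  # 07
```

Because the months are zero-padded strings, sorting them as text gives the same order as sorting them as numbers.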
