  # Analyzing, Filtering, and Cleaning Aviation Database for the Safest Plane
                                          Jupyter Notebook coded by Allison Ward, Rick Lataille, and Anthony Mansion

### As usual, we import pandas to manage the data, as well as numpy for later statistical analysis

In [50]:
#Import the modules we need
import pandas as pd
import numpy as np

# And this is the big data that we will be using
df = pd.read_csv('Aviation_Data.csv', low_memory=False)

### Here we drop the columns we see no need for. The information they provide has no use in finding the results our stakeholder is looking for.

In [51]:
# Drop the columns we know that we don't need
dropped_columns = ['Schedule', 'Report.Status', 'Publication.Date']
df.drop(columns = dropped_columns, inplace=True)
print(f"{len(df)} items.")

90348 items.


### Looking at the rest of the columns, we filter for the information we need. We leave the original "df" alone and instead make another variable to hold the filtered data. The filters were: keep only rows from the last 10 years (2013 and later); keep only airplanes among the aircraft categories, since that's the data we will be using; exclude amateur-built planes, since they... kind of skew the results we need; keep only the flight purposes relevant to our stakeholder; and lastly keep only events in the United States. We filter for the U.S. only mainly because the rows from elsewhere are mostly unfilled and can't tell us anything, and it's the same case for the U.S. territories that aren't between the Atlantic and Pacific Oceans. This dropped us from about 90,000 items to about 7,300.

In [52]:
# Convert date column to datetime, then filter event dates to include 2013 and later
df['Event.Date'] = pd.to_datetime(df['Event.Date'])
df_filtered = df.loc[df['Event.Date'] >= '2013-01-01']
print(f"{len(df_filtered)} items.")

# Creating a new column with Day of Week
df_filtered['Day_Of_Week'] = df['Event.Date'].dt.day_name()

# Filter aircraft categories for Airplanes only
df_filtered = df_filtered.loc[df_filtered['Aircraft.Category'] == 'Airplane']
print(f"{len(df_filtered)} items.")

# Exclude Amateur-built planes
df_filtered = df_filtered.loc[df_filtered['Amateur.Built'] != 'Yes']
print(f"{len(df_filtered)} items.")

# Exclude certain identified purposes as irrelevant to our stakeholder
allowed_purposes = ['Personal', np.nan, 'Business', 'Executive/corporate', \
                    'Positioning', 'Other Work Use', 'Ferry', 'Unknown', 'Public Aircraft - Federal', \
                   'Public Aircraft - State', 'Public Aircraft - Local', 'Public Aircraft', 'PUBS']
df_filtered = df_filtered.loc[df_filtered['Purpose.of.flight'].isin(allowed_purposes)]
print(f"{len(df_filtered)} items.")

# Include only events that happened in the United States
allowed_countries = ['United States']
df_filtered = df_filtered.loc[df_filtered['Country'].isin(allowed_countries)]
print(f"{len(df_filtered)} items.")

# Drop even more columns that are no longer useful
obsolete_columns = ['Event.Id', 'Country', 'Aircraft.Category', 'Registration.Number', 'Broad.phase.of.flight']
df_filtered.drop(columns = obsolete_columns, inplace=True)

15829 items.
13262 items.
11726 items.
9497 items.
7320 items.
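
The filtering steps above each narrow `df_filtered` and print how many rows remain. A minimal sketch of the same pattern as a reusable helper, run on a tiny made-up frame instead of the real CSV. The helper `apply_filters` and the sample rows are hypothetical and not part of the original notebook:

```python
import pandas as pd

def apply_filters(frame, steps):
    """Apply (label, mask_function) steps in order and record the row counts."""
    counts = []
    for label, make_mask in steps:
        frame = frame.loc[make_mask(frame)]
        counts.append((label, len(frame)))
    return frame, counts

# Made-up rows for illustration only; the real data comes from Aviation_Data.csv
sample = pd.DataFrame({
    'Event.Date': pd.to_datetime(['2012-05-01', '2014-03-02', '2015-07-09', '2016-01-20']),
    'Aircraft.Category': ['Airplane', 'Airplane', 'Helicopter', 'Airplane'],
    'Amateur.Built': ['No', 'No', 'No', 'Yes'],
})

steps = [
    ('2013 and later', lambda f: f['Event.Date'] >= '2013-01-01'),
    ('airplanes only', lambda f: f['Aircraft.Category'] == 'Airplane'),
    ('not amateur-built', lambda f: f['Amateur.Built'] != 'Yes'),
]

result, counts = apply_filters(sample, steps)
for label, n in counts:
    print(f"{label}: {n} items.")
```

Keeping the steps in a list makes it easy to reorder or remove a filter and see how the counts change.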


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_filtered['Day_Of_Week'] = df['Event.Date'].dt.day_name()
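
The warning above appears because `df_filtered` was created as a slice of `df`, so pandas cannot tell whether the new column is meant to reach back into `df`. One common way to avoid it is to take an explicit copy before adding columns. A minimal sketch on made-up rows (the sample frame is an assumption, not the real data):

```python
import pandas as pd

# Made-up rows for illustration only
df = pd.DataFrame({'Event.Date': pd.to_datetime(['2012-06-15', '2013-01-11', '2022-11-22'])})

# .copy() makes df_filtered an independent dataframe rather than a slice of df
df_filtered = df.loc[df['Event.Date'] >= '2013-01-01'].copy()

# Derive the new column from df_filtered itself, so the indexes match by construction
df_filtered['Day_Of_Week'] = df_filtered['Event.Date'].dt.day_name()

print(df_filtered)
```

The original `df` is left untouched, which matches the intent of keeping it as the unmodified source.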


### With all that filtered data, we saved it to a new csv file to make visualizing the data even easier, since the info we need is no longer surrounded by other, irrelevant information. We also did this instead of overwriting the original "just in case", y'know? If we wanted to undo something, we could simply rewrite the csv and throw it back into Tableau real fast.

In [53]:
# The new filtered dataframe of the original dataframe "df"
df_filtered.to_csv('Filtered_Aviation_Data.csv', index=False)

## The test of death. (Honestly, this whole part is irrelevant, and it would have been 1,000 times easier and waaaay more efficient not to "hard-code" it. 😔 But the lines had already been written, so we just had to see it through...) The test here grabs many random categories and uses them to make comparisons against other random categories. I was going to import matplotlib next to begin making visualizations, but I had already realized how inefficiently I was working and how much time I was wasting when I could have just thrown the csv into Tableau... 

In [54]:
# In the cell block below, I will do even further filtering to whittle down numbers and find the best plane
# The filters will be based on how safe they are, so how rarely they appear, by finding: the number of engines that
# appears the least, the type of engine that appears the least, non-fatal injuries, substantial damage to plane

# Note: each subset below is taken from the original dataframe "df"
# Filtering by injuries to see comparisons between results
Fatal_Inj = df.loc[df['Total.Fatal.Injuries'] > 0]
Serious_Inj = df.loc[df['Total.Serious.Injuries'] > 0]
Minor_Inj = df.loc[df['Total.Minor.Injuries'] > 0]
Uninjured = df.loc[(df['Total.Uninjured'] > 0) & (df['Total.Minor.Injuries'] == 0)] 

# Hmm... let's see the difference between substantial damage, too
Sub_Damage = df.loc[df['Aircraft.damage'] == 'Substantial'] # (Substantial Damage)
Destroyed = df.loc[df['Aircraft.damage'] == 'Destroyed'] # (Destroyed planes)
Minor_Damage = df.loc[df['Aircraft.damage'] == 'Minor']            

# Looking at the difference between data using engine types as well
Reciprocating_Eng = df.loc[df['Engine.Type'] == 'Reciprocating'] 
Turbo_Shaft = df.loc[df['Engine.Type'] == 'Turbo Shaft']
Turbo_Prop = df.loc[df['Engine.Type'] == 'Turbo Prop']
Turbo_Fan = df.loc[df['Engine.Type'] == 'Turbo Fan']
Turbo_Jet = df.loc[df['Engine.Type'] == 'Turbo Jet']
Geared_Turbofan = df.loc[df['Engine.Type'] == 'Geared Turbofan']
Electric = df.loc[df['Engine.Type'] == 'Electric']
LR = df.loc[df['Engine.Type'] == 'LR'] 
NONE = df.loc[df['Engine.Type'] == 'NONE'] 
Hybrid_Rocket = df.loc[df['Engine.Type'] == 'Hybrid Rocket'] 
UNK = df.loc[df['Engine.Type'] == 'UNK']

# Seeing if there are different results using the number of engines, too
NO_ENGINES = df.loc[df['Number.of.Engines'] == 0] #??!!
One_Engine = df.loc[df['Number.of.Engines'] == 1] 
Two_Engines = df.loc[df['Number.of.Engines'] == 2] 
Three_Engines = df.loc[df['Number.of.Engines'] == 3] 
Four_Engines = df.loc[df['Number.of.Engines'] == 4] 
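
Instead of one hard-coded variable per category, the same comparisons can be read off with `value_counts` and `pd.crosstab`. A minimal sketch, assuming the same column names as the dataset and using made-up rows:

```python
import pandas as pd

# Made-up rows for illustration only; column names follow the dataset
sample = pd.DataFrame({
    'Engine.Type': ['Reciprocating', 'Reciprocating', 'Turbo Prop', 'Turbo Fan'],
    'Number.of.Engines': [1, 2, 2, 2],
    'Aircraft.damage': ['Substantial', 'Destroyed', 'Substantial', 'Minor'],
})

# How often each engine type appears (one line instead of one variable per type)
engine_counts = sample['Engine.Type'].value_counts()
print(engine_counts)

# Cross-tabulate two categories: rows are engine counts, columns are damage levels
damage_by_engines = pd.crosstab(sample['Number.of.Engines'], sample['Aircraft.damage'])
print(damage_by_engines)
```

New category values in the data show up in these tables automatically, with no extra variables to add.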

### At this point is when we WOULD HAVE imported matplotlib, but a few lines into the cell block above we had already realized that we were just wasting time and that it'd be better to just use Tableau. Below, I made a third corny variable (which is why the name is very bad lol) to hold the dataframe used for comparisons. To compare each combination, though, I have to reset the new variable to the old dataframe each time, or it will just get smaller and smaller. (Just realized we didn't even use those variables above, haha...😭)

In [55]:
# Start a new dataframe variable from the filtered dataset so that
# we can compare the data we separated. Still with us after that?
df_filterered = df_filtered

# Narrow the new variable step by step; each mask is built from df_filterered itself
df_filterered = df_filterered.loc[df_filterered['Aircraft.damage'] == 'Substantial']
df_filterered = df_filterered.loc[df_filterered['Total.Uninjured'] > 0]
df_filterered = df_filterered.loc[df_filterered['Number.of.Engines'] == 2] 

# Woow, it's the new dataframe! Again, would have used matplotlib here, but naaaah. Let's not please. 
df_filterered

Unnamed: 0,Investigation.Type,Accident.Number,Event.Date,Location,Latitude,Longitude,Airport.Code,Airport.Name,Injury.Severity,Aircraft.damage,...,Engine.Type,FAR.Description,Purpose.of.flight,Air.carrier,Total.Fatal.Injuries,Total.Serious.Injuries,Total.Minor.Injuries,Total.Uninjured,Weather.Condition,Day_Of_Week
73204,Accident,CEN13CA129,2013-01-11,"Alexandria, MN",455145N,0095240W,AXN,Alexindria Municipal Airport,Non-Fatal,Substantial,...,Reciprocating,135,,BEMIDJI AVIATION SERVICES INC,0.0,0.0,0.0,1.0,IMC,Friday
73286,Accident,ERA13LA129,2013-02-07,"Winston-Salem, NC",003681N,0801319W,INT,Smith Reynolds Airport,Non-Fatal,Substantial,...,Reciprocating,091,Personal,,0.0,0.0,0.0,3.0,VMC,Thursday
73313,Accident,CEN13CA162,2013-02-14,"Abilene, TX",323031N,0993626W,,,Non-Fatal,Substantial,...,Reciprocating,091,Business,FRANK LEROY BELL,0.0,0.0,0.0,1.0,VMC,Thursday
73321,Accident,ANC13CA023,2013-02-16,"Dutch Harbor, AK",535334N,1663225W,PADU,Unalaska,Non-Fatal,Substantial,...,Reciprocating,135,,GRANT AVIATION INC,0.0,0.0,0.0,3.0,VMC,Saturday
73329,Accident,WPR13LA132,2013-02-17,"Casper, WY",425417N,1062731W,CPR,Casper/Natrona County Int'l,Non-Fatal,Substantial,...,Reciprocating,091,Personal,BRAVENEC DANIEL W,0.0,0.0,0.0,2.0,VMC,Sunday
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
90275,Accident,CEN23LA052,2022-11-22,"Denton, TX",292548N,1005924W,,,Non-Fatal,Substantial,...,,091,Personal,Pilot,0.0,0.0,0.0,7.0,,Tuesday
90293,Accident,CEN23LA056,2022-11-29,"Batesville, AR",354334N,0913851W,BVX,Batesville Regional Airport,Minor,Substantial,...,,091,Business,Creamer Pilot Services LLC,0.0,2.0,0.0,6.0,IMC,Tuesday
90295,Accident,ERA23LA075,2022-11-30,"Newport News, VA",037816N,0762838W,PHF,NEWPORT NEWS/WILLIAMSBURG INTL,Non-Fatal,Substantial,...,,091,Other Work Use,AERY AVIATION,0.0,0.0,0.0,3.0,VMC,Wednesday
90328,Accident,WPR23LA065,2022-12-13,"Lewistown, MT",047257N,0109280W,KLWT,Lewiston Municipal Airport,Non-Fatal,Substantial,...,,NUSC,,,0.0,0.0,0.0,1.0,,Tuesday
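
Resetting the comparison variable by hand before every comparison is easy to forget. A small helper that never modifies its input sidesteps the problem, because every call starts from the full data. A minimal sketch on made-up rows; the helper `subset` is hypothetical and not part of the original notebook:

```python
import pandas as pd

def subset(frame, conditions):
    """Return the rows of frame that satisfy every {column: test} condition.

    frame itself is never modified, so each comparison starts from the full data.
    """
    mask = pd.Series(True, index=frame.index)
    for column, test in conditions.items():
        mask &= test(frame[column])
    return frame.loc[mask]

# Made-up rows for illustration only; column names follow the dataset
sample = pd.DataFrame({
    'Aircraft.damage': ['Substantial', 'Substantial', 'Destroyed', 'Substantial'],
    'Total.Uninjured': [1.0, 0.0, 2.0, 3.0],
    'Number.of.Engines': [2.0, 2.0, 2.0, 1.0],
})

twin_engine = subset(sample, {
    'Aircraft.damage': lambda s: s == 'Substantial',
    'Total.Uninjured': lambda s: s > 0,
    'Number.of.Engines': lambda s: s == 2,
})
single_engine = subset(sample, {'Number.of.Engines': lambda s: s == 1})

print(len(twin_engine), len(single_engine), len(sample))
```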


## Alriight, now that we've done as much filtering as we believe we need, we move on to cleaning the remaining columns.

In [56]:
# Filter out foreign locations not noted as foreign, using the 'OF' state code in Location
df_filtered['State_Code'] = df_filtered['Location'].str.slice(-2)
df_filtered = df_filtered.loc[df_filtered['State_Code'] != 'OF']
print(f"{len(df_filtered)} items.")

# Drop rows that are missing latitude coordinates (also captures missing Longitude)
df_filtered.dropna(subset=['Latitude'], inplace=True)
print(f"{len(df_filtered)} items.")

#Converting latitude and longitude from Degrees, Minutes, and Seconds to Decimal Degrees

df_filtered.dropna(subset=['Latitude', 'Longitude'], inplace=True)

def convert_latitude(x):
    degrees = float(x[:2])
    minutes = float(x[2:4])
    seconds = float(x[4:6])
    return degrees + minutes/60 + seconds/3600

df_filtered["new_lats"] = df_filtered['Latitude'].map(convert_latitude)

def convert_longitude(x):
    degrees = float(x[:3])
    minutes = float(x[3:5])
    seconds = float(x[5:7])
    return -(degrees + minutes/60 + seconds/3600)

df_filtered["new_longs"] = df_filtered['Longitude'].map(convert_longitude)

# Record original makes for later comparison
Original_makes = len(df_filtered['Make'].unique())
# These map functions will clean the 'Make' column, A-C
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Aerofab" if x.lower().strip()[:7]=="aerofab" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Aeroprakt" if x.lower().strip()[:9]=="aeroprakt" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Aeropro" if x.lower().strip()[:7]=="aeropro" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Aerostar" if x.lower().strip()[:8]=="aerostar" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Aerostar" if x.lower().strip()[:3]=="s c" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Aerotek" if x.lower().strip()[:7]=="aerotek" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Air Tractor" if x.lower().strip()[:11]=="air tractor" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Airbus" if x.lower().strip()[:6]=="airbus" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Airbus" if x.lower().strip()[:5]=="fouga" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Aircraft Mfg" if x.lower().strip()[:12]=="aircraft mfg" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "American Champion" if x.lower().strip()[:17]=="american champion" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "American Legend" if x.lower().strip()[:15]=="american legend" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Arion" if x.lower().strip()[:5]=="arion" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Aviat" if x.lower().strip()[:5]=="aviat" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Avions" if x.lower().strip()[:6]=="avions" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "BAE" if x.lower().strip()[:3]=="bae" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Boeing" if x.lower().strip()[:6]=="boeing" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Boeing" if x.lower().strip()[:9]=="mcdonnell" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Boeing" if x.lower().strip()[:7]=="douglas" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Boeing" if x.lower().strip()[:8]=="rockwell" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Bombardier" if x.lower().strip()[:10]=="bombardier" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Bombardier" if x.lower().strip()[:5]=="gates" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Bombardier" if x.lower().strip()[:7]=="learjet" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Bombardier" if x.lower().strip()[:8]=="canadair" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "BAE" if x.lower().strip()[:12]=="british aero" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Britten-Norman" if x.lower().strip()[:7]=="britten" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Bucker" if x.lower().strip()[:6]=="bucker" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Cirrus" if x.lower().strip()[:6]=="cirrus" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Convair" if x.lower().strip()[:12]=="consolidated" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "CubCrafters" if x.lower().strip()[:3]=="cub" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Czech" if x.lower().strip()[:5]=="czech" else x)
# These map functions will clean the 'Make' column, D-N
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Daher" if x.lower().strip()[:3]=="s.o" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Daher" if x.lower().strip()[:3]=="soc" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Dassault" if x.lower().strip()[:8]=="dassault" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "De Havilland" if x.lower().strip()[:6]=="de hav" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "De Havilland" if x.lower().strip()[:5]=="dehav" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Diamond" if x.lower().strip()[:7]=="diamond" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Eclipse" if x.lower().strip()[:7]=="eclipse" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Embraer" if x.lower().strip()[:7]=="embraer" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Evektor" if x.lower().strip()[:7]=="evektor" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Evolution" if x.lower().strip()[:9]=="evolution" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Extra" if x.lower().strip()[:5]=="extra" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Fairchild" if x.lower().strip()[:9]=="fairchild" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Fantasy Air" if x.lower().strip()[:7]=="fantasy" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Flight Design" if x.lower().strip()[:8]=="flight d" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Flightstar" if x.lower().strip()[:7]=="flights" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "FPNA" if x.lower().strip()[:4]=="fpna" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Glasair" if x.lower().strip()[:7]=="glasair" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Golden Circle" if x.lower().strip()[:8]=="golden c" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Gulfstream" if x.lower().strip()[:10]=="gulfstream" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Gulfstream" if x.lower().strip()[:3]=="iai" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Honda" if x.lower().strip()[:5]=="honda" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Jabiru" if x.lower().strip()[:6]=="jabiru" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Lancair" if x.lower().strip()[:7]=="lancair" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Maxair" if x.lower().strip()[:6]=="maxair" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Meyers" if x.lower().strip()[:6]=="meyers" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Mooney" if x.lower().strip()[:6]=="mooney" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "M-Squared" if x.lower().strip()[:9]=="m-squared" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Nanchang" if x.lower().strip()[:8]=="nanchang" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Northrop Grumman" if x.lower().strip()[:7]=="grumman" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Northrop Grumman" if x.lower().strip()[:8]=="northrop" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "North American" if x.lower().strip()[:8]=="north am" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "North Wing" if x.lower().strip()[:7]=="north w" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "North Wing" if x.lower().strip()[:6]=="northw" else x)
# These map functions will clean the 'Make' column, O-Z
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Orlican" if x.lower().strip()[:7]=="orlican" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Phantom" if x.lower().strip()[:7]=="phantom" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Pilatus" if x.lower().strip()[:7]=="pilatus" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Piper" if x.lower().strip()[:9]=="new piper" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Piper" if x.lower().strip()[:5]=="piper" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Pipistrel" if x.lower().strip()[:4]=="pipi" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Pitts" if x.lower().strip()[:5]=="pitts" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Pzl Okecie" if x.lower().strip()[:3]=="pzl" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Quad City" if x.lower().strip()[:4]=="quad" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Quest" if x.lower().strip()[:5]=="quest" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Quicksilver" if x.lower().strip()[:5]=="quick" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Rans" if x.lower().strip()[:4]=="rans" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Remos" if x.lower().strip()[:5]=="remos" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Rockwell" if x.lower().strip()[:8]=="rockwell" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Ryan" if x.lower().strip()[:4]=="ryan" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Scoda" if x.lower().strip()[:5]=="scoda" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Short" if x.lower().strip()[:5]=="short" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Stearman" if x.lower().strip()[:8]=="stearman" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Taylorcraft" if x.lower().strip()[:7]=="taylorc" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Textron" if x.lower().strip()[:6]=="cessna" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Textron" if x.lower().strip()[:4]=="rath" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Textron" if x.lower().strip()[:4]=="rayt" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Textron" if x.lower().strip()[:7]=="textron" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Textron" if x.lower().strip()[:5]=="beech" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Textron" if x.lower().strip()[:6]=="hawker" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "TL Ultralight" if x.lower().strip()[:2]=="tl" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Vans" if x.lower().strip()[:4]=="vans" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Waco" if x.lower().strip()[:4]=="waco" else x)
df_filtered['Make'] = df_filtered['Make'].map(lambda x: "Zlin" if x.lower().strip()[:4]=="zlin" else x)

# Show the amount of consolidation in makes
print(f"The original {Original_makes} makes have been reduced to {len(df_filtered['Make'].unique())} makes.")

# Change "NaN" to 'None' or 'Unknown', as appropriate
df_filtered['Injury.Severity'] = df_filtered['Injury.Severity'].fillna('None')
df_filtered['Aircraft.damage'] = df_filtered['Aircraft.damage'].fillna('Unknown')
df_filtered['Purpose.of.flight'] = df_filtered['Purpose.of.flight'].fillna('Unknown')
df_filtered['Engine.Type'] = df_filtered['Engine.Type'].fillna('Unknown')
df_filtered['FAR.Description'] = df_filtered['FAR.Description'].fillna('Unknown')
df_filtered['Number.of.Engines'] = df_filtered['Number.of.Engines'].fillna('Unknown')

# This will convert all 'unknown' type entries to 'Unknown' in the Air.carrier field
df_filtered['Air.carrier'] = df_filtered['Air.carrier'].fillna('Unknown')
df_filtered['Air.carrier'] = df_filtered['Air.carrier'].astype(str).map(
    lambda x: "Unknown" if x.lower().strip()[:3]=="unk" else x)

# This will convert all 'unknown' type entries to 'Unknown' in the Weather.Condition field
df_filtered['Weather.Condition'] = df_filtered['Weather.Condition'].fillna('Unknown')
df_filtered['Weather.Condition'] = df_filtered['Weather.Condition'].astype(str).map(
    lambda x: "Unknown" if x.lower().strip()[:3]=="unk" else x)

# Put all Makes into Title case, for readability
df_filtered['Make'] = df_filtered['Make'].map(lambda x: x.title())

# Use dt functions to extract year and month and create new columns
df_filtered['Year'] = df_filtered['Event.Date'].dt.year
df_filtered['Month'] = df_filtered['Event.Date'].dt.month

# Create a new column to simplify the large jet analysis
separate_large_jets = ["Airbus", "Boeing", "Embraer"]
df_filtered['Large_Jets'] = df_filtered['Make'].map(lambda x: "Other" if x not in separate_large_jets else x)

# Create a new column to simplify the small jet analysis
separate_small_jets = ["Bombardier", "Dassault", "Gulfstream", "Honda", "Textron"]
df_filtered['Small_Jets'] = df_filtered['Make'].map(lambda x: "Other" if x not in separate_small_jets else x)

# Create a new column summing fatal and serious injuries
df_filtered['Major_Injuries'] = df_filtered['Total.Fatal.Injuries'] + df_filtered['Total.Serious.Injuries']
# Write to a new CSV file
df_filtered.to_csv('Filtered_Aviation_Data.csv', index=False)

7307 items.
7302 items.
The original 670 makes have been reduced to 433 makes.
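
The coordinate conversion in the cell above assumes every value is stored as degrees, minutes, and seconds followed by a hemisphere letter. Some latitudes in the earlier table output do not seem to fit that layout: read with those slices, `003681N` gives 81 seconds and `037816N` gives 78 minutes. A hedged sketch of a stricter parser that returns NaN for such values instead of a misleading number. The function `parse_dms` is hypothetical, and unlike the original functions it takes the sign from the hemisphere letter rather than assuming West:

```python
import math

def parse_dms(text, degree_digits):
    """Convert a DMS string such as '455145N' to decimal degrees.

    degree_digits is 2 for latitude and 3 for longitude. Returns NaN when the
    text does not fit the expected layout.
    """
    text = str(text).strip()
    body, hemisphere = text[:-1], text[-1:].upper()
    if (len(body) != degree_digits + 4 or not body.isdigit()
            or hemisphere not in ('N', 'S', 'E', 'W')):
        return math.nan
    degrees = int(body[:degree_digits])
    minutes = int(body[degree_digits:degree_digits + 2])
    seconds = int(body[degree_digits + 2:])
    if minutes >= 60 or seconds >= 60:
        return math.nan
    value = degrees + minutes / 60 + seconds / 3600
    # South and West are negative by convention
    return -value if hemisphere in ('S', 'W') else value

print(parse_dms('455145N', 2))
print(parse_dms('0801319W', 3))
print(parse_dms('003681N', 2))
```

It could be applied with `df_filtered['Latitude'].map(lambda x: parse_dms(x, 2))`, after which the NaN rows can be counted or dropped deliberately.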

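
The long run of `map` calls in the cell above applies one prefix rule at a time. A more compact alternative keeps the rules in a list and applies them in a single pass. This is a sketch only: `normalize_make` and `MAKE_RULES` are hypothetical names, the rules shown are a small subset of the ones above, and because the first matching rule wins, results could differ from the sequential version in edge cases.

```python
# (prefix, canonical name) pairs, checked in order; a small subset for illustration
MAKE_RULES = [
    ("new piper", "Piper"),
    ("piper", "Piper"),
    ("cessna", "Textron"),
    ("beech", "Textron"),
    ("boeing", "Boeing"),
    ("mcdonnell", "Boeing"),
    ("de hav", "De Havilland"),
    ("dehav", "De Havilland"),
]

def normalize_make(make):
    """Return the canonical name for the first matching prefix, else the input unchanged."""
    if not isinstance(make, str):
        return make  # leave missing values (such as NaN) untouched
    cleaned = make.lower().strip()
    for prefix, canonical in MAKE_RULES:
        if cleaned.startswith(prefix):
            return canonical
    return make

print(normalize_make("CESSNA AIRCRAFT CO"))
print(normalize_make("  Piper Aircraft Inc"))
print(normalize_make("Mooney"))
```

With the full rule list filled in, the whole column could be cleaned with `df_filtered['Make'].map(normalize_make)`, and adding a make becomes a one-line change to the list.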