## Setup

With this Google Colaboratory (Colab) notebook open, click the "Copy to Drive" button that appears in the menu bar. The notebook will then be attached to your own Google user account, so you can edit it in any way you like -- you can even take notes directly in the notebook.

# Python Open Labs: Data wrangling with Pandas

## Welcome!

### Instructors
- Scott Bailey
- Ashley Evans Bandy
- Claire Cahoon
- Walt Gurley
- Natalia Lopez

### Open Labs agenda

1.   **Guided activity**: One of the instructors will share their screen to work through the guided activity and teach concepts along the way.

2.   **Open lab time**: After the guided portion of the Open Lab, the rest of the time is for you to ask questions, work collaboratively, or have self-guided practice time. You will have access to instructors and peers for questions and support.

Breakout rooms will be available if you would like to work in small groups. If you have trouble joining a room, ask in the chat to be moved into a room.

### Learning objectives

By the end of our workshop today, we hope you'll understand what the Pandas library is and be able to use Pandas to manipulate data within DataFrames.

### Today's Topics
- Editing DataFrame labels and headers
- Concatenating DataFrames
- Merging DataFrames
- Adding and removing columns


### Using Zoom

Please make sure that your mic is muted during the workshop.

We will have live captioning enabled. You can switch it on and off from the toolbar at the bottom of your screen.

### Asking questions

Please feel free to ask questions in the Zoom chat throughout the demonstration.

Other instructors will be monitoring the Zoom chat. They will answer questions as they are able, and will collect questions whose answers might help everyone so they can be addressed at the end of the demonstration.

The open lab time is when you will be able to ask more questions and work together on the exercises.

### Using Jupyter Notebooks and Google Colaboratory

Jupyter notebooks are a way to write and run Python code in an interactive way. They're quickly becoming a standard way of putting together data, code, and written explanations or visualizations into a single document and sharing that. There are a lot of ways that you can run Jupyter notebooks, including just locally on your computer, but we've decided to use Google's Colaboratory notebook platform for this workshop.  Colaboratory is “a Google research project created to help disseminate machine learning education and research.”  If you would like to know more about Colaboratory in general, you can visit the [Welcome Notebook](https://colab.research.google.com/notebooks/welcome.ipynb).

Using the Google Colaboratory platform allows us to focus on learning and writing Python in the workshop rather than on setting up Python, which can sometimes take a bit of extra work depending on platforms, operating systems, and other installed applications. If you'd like to install a Python distribution locally, though, we're happy to help. Feel free to [get help from our graduate consultants](https://www.lib.ncsu.edu/dxl) or [schedule an appointment with Libraries staff](https://go.ncsu.edu/dvs-request).

## Guided Instruction
This week we're focusing on data wrangling using Python Pandas. We're going to manipulate our dataset in order to make it more usable for answering questions about the information.

Content Warning: This dataset contains information relating to violence towards animals. We understand that this may be distressing; please feel free to step away from the workshop if you need to.

In this section, we will work through examples using data from the [Federal Aviation Administration (FAA) Wildlife Strikes Database](https://wildlife.faa.gov/search). We have filtered the data to only include North Carolina.

> "The FAA Wildlife Strike Database contains records of reported wildlife strikes since 1990. Strike reporting is voluntary. Therefore, this database only represents the information we have received from airlines, airports, pilots, and other sources." - [FAA website](https://wildlife.faa.gov/home)

In [1]:
# Import the Pandas library as pd (callable in our code as pd)
import pandas as pd

### Importing datasets

We have prepared the data from the FAA website for this workshop. We will import those datasets into our notebook to use them for data analysis.

- [Preview the CSV file (opens on GitHub)](https://github.com/NCSU-Libraries/data-viz-workshops/blob/master/Python_Open_Labs/data/FAA_Wildlife_strikes_1990-1999.csv) - wildlife strike data from the years 1990-1999
- [Preview the Excel file (this link will download the file)](https://github.com/NCSU-Libraries/data-viz-workshops/blob/master/Python_Open_Labs/data/FAA_Wildlife_strikes_2000-2009.xlsx?raw=true) - wildlife strike data from the years 2000-2009
- [Preview the JSON file (opens on GitHub)](https://raw.githubusercontent.com/NCSU-Libraries/data-viz-workshops/master/Python_Open_Labs/data/FAA_Wildlife_strikes_2010-2019.json) - wildlife strike data from the years 2010-2019

In [2]:
# Import the CSV file (wildlife strike data from the years 1990-1999)
csv_file_url = 'https://raw.githubusercontent.com/NCSU-Libraries/data-viz-workshops/master/Python_Open_Labs/data/FAA_Wildlife_strikes_1990-1999.csv'
wl_strikes_csv = pd.read_csv(csv_file_url)

# Print out the first five rows of the dataset
wl_strikes_csv.head()

Unnamed: 0,INDX_NR,INCIDENT_DATE,INCIDENT_MONTH,INCIDENT_YEAR,TIME,TIME_OF_DAY,AIRPORT_ID,AIRPORT,RUNWAY,STATE,FAAREGION,LOCATION,ENROUTE STATE,OPID,REG,FLT,AIRCRAFT,AMA,AMO,EMA,EMO,AC_CLASS,AC_MASS,TYPE_ENG,NUM_ENGS,ENG_1_POS,ENG_2_POS,ENG_3_POS,ENG_4_POS,PHASE_OF_FLIGHT,HEIGHT,SPEED,DISTANCE,SKY,PRECIPITATION,AOS,COST_REPAIRS,OTHER_COST,COST_REPAIRS_INFL_ADJ,COST_OTHER_INFL_ADJ,...,STR_ENG2,DAM_ENG2,STR_ENG3,DAM_ENG3,STR_ENG4,DAM_ENG4,STR_PROP,DAM_PROP,STR_WING_ROT,DAM_WING_ROT,STR_FUSE,DAM_FUSE,STR_LG,DAM_LG,STR_TAIL,DAM_TAIL,STR_LGHTS,DAM_LGHTS,STR_OTHER,DAM_OTHER,OTHER_SPECIFY,EFFECT,EFFECT_OTHER,SPECIES_ID,REMARKS,REMAINS_COLLECTED,REMAINS_SENT,WARNED,BIRDS_SEEN,BIRDS_STRUCK,SIZE,NR_INJURIES,NR_FATALITIES,COMMENT,REPORTER_NAME,REPORTER_TITLE,SOURCE,PERSON,LUPDATE,TRANSFER
0,633309,1999-12-24,12,1999,10:15,Day,KCLT,CHARLOTTE/DOUGLAS INTL ARPT,18R,NC,ASO,,,USA,N523AU,,B-737-300,148,24.0,10.0,1.0,A,4.0,D,2.0,1.0,1.0,,,Landing Roll,0.0,,0.0,No Cloud,,,,,,,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,,,,O2111,4 BIRDS.FLT 1539. STRIKE REPTS DIFFER AS TO PH...,False,False,Unknown,2-10,2-10,Small,,,SOURCE = TWO XXXX-X REPTS /Legacy Record=XXXXXX/,REDACTED,REDACTED,FAA Form 5200-7,Air Transport Operations,2000-03-10,False
1,634726,1999-12-15,12,1999,,Day,KRDU,RALEIGH-DURHAM INTL,23R,NC,ASO,,,CDR,C-FPCR,,FOKKER F28 MK 1000,372,4.0,37.0,43.0,A,4.0,D,2.0,5.0,5.0,,,Approach,20.0,130.0,,No Cloud,,,,,,,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,,,,UNKBS,NO EVIDENCE OF BEING STRUCK OR DAMAGE REPTD BY...,False,False,Yes,1,1,Small,,,/Legacy Record=XXXXXX/,REDACTED,REDACTED,FAA Form 5200-7,,2000-03-09,False
2,636216,1999-12-14,12,1999,07:40,Day,KCLT,CHARLOTTE/DOUGLAS INTL ARPT,18R,NC,ASO,,,USA,N955VJ,,DC-9-30,583,21.0,34.0,10.0,A,4.0,D,2.0,5.0,5.0,,,Approach,100.0,132.0,,Some Cloud,,,,,,,...,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,,,,YL001,,False,False,No,2-10,2-10,Small,,,SOURCE = TWO XXXX-X REPTS /Legacy Record=XXXXXX/,REDACTED,REDACTED,FAA Form 5200-7,Tower,2000-03-09,False
3,633739,1999-12-11,12,1999,17:00,Dusk,KCLT,CHARLOTTE/DOUGLAS INTL ARPT,36R,NC,ASO,,,USA,N440US,,B-737-400,148,32.0,10.0,1.0,A,4.0,D,2.0,1.0,1.0,,,Climb,100.0,150.0,,Some Cloud,,3.0,7500.0,5000.0,11498.0,7665.0,...,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,,"None, Precautionary Landing",,UNKBL,FLT 2225. RADOME REPLACED. 2.5 HRS DOWN TIME.,False,False,No,1,1,Large,,,SOURCE = TWO XXXX-X REPTS /Legacy Record=XXXXXX/,REDACTED,REDACTED,FAA Form 5200-7,Tower,2000-03-09,False
4,636671,1999-12-11,12,1999,,,KRDU,RALEIGH-DURHAM INTL,,NC,ASO,,,BLR,N303UE,,BA-41 JETSTR,168,11.0,19.0,4.0,A,3.0,C,2.0,4.0,4.0,,,Climb,,,0.0,,,,,,,,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,,Precautionary Landing,,UNKBM,"AFTER ROTATION, FOUL ODOR FILLED FLT DECK. BIR...",False,False,Unknown,,1,Medium,,,SOURCE = X AIRLINE REPTS /Legacy Record=XXXXXX/,REDACTED,REDACTED,Air Transport Report,Air Transport Operations,2001-06-25,False


In [3]:
# Import the Excel file (wildlife strike data from the years 2000-2009)
xls_file_url = 'https://github.com/NCSU-Libraries/data-viz-workshops/blob/master/Python_Open_Labs/data/FAA_Wildlife_strikes_2000-2009.xlsx?raw=true'
wl_strikes_xls = pd.read_excel(xls_file_url)

# Print out the first five rows of the dataset
wl_strikes_xls.head()

Unnamed: 0,INDX_NR,INCIDENT_DATE,INCIDENT_MONTH,INCIDENT_YEAR,TIME,TIME_OF_DAY,AIRPORT_ID,AIRPORT,RUNWAY,STATE,FAAREGION,LOCATION,ENROUTE STATE,OPID,REG,FLT,AIRCRAFT,AMA,AMO,EMA,EMO,AC_CLASS,AC_MASS,TYPE_ENG,NUM_ENGS,ENG_1_POS,ENG_2_POS,ENG_3_POS,ENG_4_POS,PHASE_OF_FLIGHT,HEIGHT,SPEED,DISTANCE,SKY,PRECIPITATION,AOS,COST_REPAIRS,OTHER_COST,COST_REPAIRS_INFL_ADJ,COST_OTHER_INFL_ADJ,...,STR_ENG2,DAM_ENG2,STR_ENG3,DAM_ENG3,STR_ENG4,DAM_ENG4,STR_PROP,DAM_PROP,STR_WING_ROT,DAM_WING_ROT,STR_FUSE,DAM_FUSE,STR_LG,DAM_LG,STR_TAIL,DAM_TAIL,STR_LGHTS,DAM_LGHTS,STR_OTHER,DAM_OTHER,OTHER_SPECIFY,EFFECT,EFFECT_OTHER,SPECIES_ID,REMARKS,REMAINS_COLLECTED,REMAINS_SENT,WARNED,BIRDS_SEEN,BIRDS_STRUCK,SIZE,NR_INJURIES,NR_FATALITIES,COMMENT,REPORTER_NAME,REPORTER_TITLE,SOURCE,PERSON,LUPDATE,TRANSFER
0,707074,2009-12-24,12,2009,07:52,Dawn,KCLT,CHARLOTTE/DOUGLAS INTL ARPT,36C,NC,ASO,,,JIA,N218PS,,CRJ100/200,188,10.0,22.0,4.0,A,3.0,D,2.0,5.0,5.0,,,Approach,100.0,138.0,,No Cloud,,,,,,,...,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,,,,UNKBS,,False,False,Unknown,2-10,1,Small,,,/Legacy Record=XXXXXX/,REDACTED,REDACTED,FAA Form 5200-7,Airport Operations,2010-04-29,False
1,707361,2009-12-13,12,2009,,Day,KILM,WILMINGTON INTL,17,NC,ASO,,,ASH,,,EMB-145,332,14.0,1.0,10.0,A,3.0,D,2.0,5.0,5.0,,,Climb,,,,Overcast,Rain,,,,,,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,PART NOT REPTD,,,UNKBM,UNKNOWN TYPE OF BIRD STRUCK. PILOT REPTD HITTI...,False,False,No,,1,Medium,,,XXXX-XX-XX-XXXXXX /Legacy Record=XXXXXX/,REDACTED,REDACTED,FAA Form 5200-7-E,Airport Operations,2010-04-29,False
2,707050,2009-12-11,12,2009,07:26,Day,KILM,WILMINGTON INTL,35,NC,ASO,,,1ASQ,N683AS?,4939.0,CRJ100/200,188,10.0,22.0,4.0,A,3.0,D,2.0,5.0,5.0,,,Take-off Run,0.0,130.0,0.0,Some Cloud,,,,,,,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,,,,YL001,WE SAW 2 SML BIRDS AND 2 HIT WINDSHLD. PILOT R...,True,False,No,2-10,2-10,Small,,,SOURCE = TWO XXXX-X (XXXX-XX-XX-XXXXXX & XXXXX...,REDACTED,REDACTED,FAA Form 5200-7-E,Air Transport Operations,2010-04-08,False
3,707146,2009-12-10,12,2009,16:45,Day,KCLT,CHARLOTTE/DOUGLAS INTL ARPT,36C,NC,ASO,,,JIA,N718PS,215.0,CRJ700,188,16.0,22.0,4.0,A,4.0,D,2.0,5.0,5.0,,,Take-off Run,0.0,,0.0,Some Cloud,,,,,,,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,,"None, Precautionary Landing",,YL001,ID BY SMITHSONIAN. FAA 3952. DNA.,True,True,Yes,,2-10,Small,,,SOURCE = THREE XXXX-X (XXXX-XX-XX-XXXXXX & RX)...,REDACTED,REDACTED,FAA Form 5200-7-E,Airport Operations,2010-08-19,False
4,707624,2009-12-08,12,2009,07:30,Dawn,KCLT,CHARLOTTE/DOUGLAS INTL ARPT,,NC,ASO,,,ASH,N935LR,2604.0,CRJ900,188,17.0,22.0,4.0,A,4.0,D,2.0,5.0,5.0,,,Approach,500.0,133.0,,,,,,,,,...,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,,,,UNKBS,NO DMG TO A/C.,False,False,Yes,2-10,1,Small,,,/Legacy Record=XXXXXX/,REDACTED,REDACTED,Air Transport Report,Air Transport Operations,2010-04-29,False


In [4]:
# Import the JSON file (wildlife strike data from the years 2010-2019)
json_file_url = 'https://raw.githubusercontent.com/NCSU-Libraries/data-viz-workshops/master/Python_Open_Labs/data/FAA_Wildlife_strikes_2010-2019.json'
wl_strikes_json = pd.read_json(json_file_url)

# Print out the first five rows of the dataset
wl_strikes_json.head()

Unnamed: 0,INCIDENT_DATE,INCIDENT_MONTH,INCIDENT_YEAR,TIME,TIME_OF_DAY,AIRPORT_ID,AIRPORT,RUNWAY,STATE,FAAREGION,LOCATION,ENROUTE STATE,OPID,REG,FLT,AIRCRAFT,AMA,AMO,EMA,EMO,AC_CLASS,AC_MASS,TYPE_ENG,NUM_ENGS,ENG_1_POS,ENG_2_POS,ENG_3_POS,ENG_4_POS,PHASE_OF_FLIGHT,HEIGHT,SPEED,DISTANCE,SKY,PRECIPITATION,AOS,COST_REPAIRS,OTHER_COST,COST_REPAIRS_INFL_ADJ,COST_OTHER_INFL_ADJ,INGESTED,...,STR_ENG2,DAM_ENG2,STR_ENG3,DAM_ENG3,STR_ENG4,DAM_ENG4,STR_PROP,DAM_PROP,STR_WING_ROT,DAM_WING_ROT,STR_FUSE,DAM_FUSE,STR_LG,DAM_LG,STR_TAIL,DAM_TAIL,STR_LGHTS,DAM_LGHTS,STR_OTHER,DAM_OTHER,OTHER_SPECIFY,EFFECT,EFFECT_OTHER,SPECIES_ID,REMARKS,REMAINS_COLLECTED,REMAINS_SENT,WARNED,BIRDS_SEEN,BIRDS_STRUCK,SIZE,NR_INJURIES,NR_FATALITIES,COMMENT,REPORTER_NAME,REPORTER_TITLE,SOURCE,PERSON,LUPDATE,TRANSFER
1080125,2020-11-05,11,2020,05:00,,KRDU,RALEIGH-DURHAM INTL,23R,NC,ASO,,,UPS,N253UP,1276,MD-11,583,39.0,22.0,7.0,A,4.0,D,3.0,1.0,6.0,1.0,,Approach,,,,No Cloud,,,,,,,False,...,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,,,,ZX004,"Fuselage, above FO clearview sliding window.",True,True,Unknown,,1,Small,,,,REDACTED,REDACTED,FAA Form 5200-7-E,Airport Operations,2020-12-04,False
1080118,2020-11-05,11,2020,22:35,,KGSO,PIEDMONT TRIAD INTL,5R,NC,ASO,,,FDX,N950FD,1624,B-757-200,148,26.0,37.0,37.0,A,4.0,D,2.0,1.0,1.0,,,Take-off Run,0.0,,0.0,Some Cloud,,,,,,,False,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,part not reported,,,1F132,Aircraft landed RWY 5R and reported a dead fox...,True,False,Unknown,1.0,1,Large,,,,REDACTED,REDACTED,FAA Form 5200-7-E,Airport Operations,2020-12-04,False
1080126,2020-11-05,11,2020,11:00,Day,KCLT,CHARLOTTE/DOUGLAS INTL ARPT,36C,NC,ASO,,,AAL,N809NN,1433,B-737-800,148,43.0,10.0,1.0,A,4.0,D,2.0,1.0,1.0,,,Approach,,,,No Cloud,,,,,,,False,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,,,,UNKB,Aircraft reportedly struck a bird during appro...,True,False,Yes,,1,,,,,REDACTED,REDACTED,FAA Form 5200-7-E,Airport Operations,2020-12-04,False
1080130,2020-11-05,11,2020,06:52,Day,KCLT,CHARLOTTE/DOUGLAS INTL ARPT,36C,NC,ASO,,,RPA,N643RW,3512,EMB-170,332,,22.0,4.0,A,4.0,D,2.0,1.0,1.0,,,Climb,,,,No Cloud,,,,,,,False,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,PART NOT REPORTED,,,UNKB,Flight crew reported to CLT ATC that the aircr...,False,False,Yes,,1,,,,,REDACTED,REDACTED,FAA Form 5200-7-E,Airport Operations,2020-12-04,False
1078243,2020-11-04,11,2020,05:20,Dawn,KRDU,RALEIGH-DURHAM INTL,23R,NC,ASO,,,UPS,N259UP,1276,MD-11,583,39.0,22.0,7.0,A,4.0,D,3.0,1.0,6.0,1.0,,Approach,,,,No Cloud,,,,,,,False,...,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,,,,Z6007,Captain side - clearview window forward frame....,True,True,Unknown,,1,Small,,,,REDACTED,REDACTED,FAA Form 5200-7-E,Airport Operations,2020-11-27,False


### Reset DataFrame index labels
The JSON file we imported does not include the column `INDX_NR`. Instead, these values are used as the index labels. We want this dataset to match the format of our other datasets, so we first need to reset the index using the DataFrame method `reset_index()`.

In [5]:
# Reset the JSON DataFrame index (the old index labels become a new column)
wl_strikes_json_reset = wl_strikes_json.reset_index()

# Print out the first five rows of the dataset
wl_strikes_json_reset.head()

Unnamed: 0,index,INCIDENT_DATE,INCIDENT_MONTH,INCIDENT_YEAR,TIME,TIME_OF_DAY,AIRPORT_ID,AIRPORT,RUNWAY,STATE,FAAREGION,LOCATION,ENROUTE STATE,OPID,REG,FLT,AIRCRAFT,AMA,AMO,EMA,EMO,AC_CLASS,AC_MASS,TYPE_ENG,NUM_ENGS,ENG_1_POS,ENG_2_POS,ENG_3_POS,ENG_4_POS,PHASE_OF_FLIGHT,HEIGHT,SPEED,DISTANCE,SKY,PRECIPITATION,AOS,COST_REPAIRS,OTHER_COST,COST_REPAIRS_INFL_ADJ,COST_OTHER_INFL_ADJ,...,STR_ENG2,DAM_ENG2,STR_ENG3,DAM_ENG3,STR_ENG4,DAM_ENG4,STR_PROP,DAM_PROP,STR_WING_ROT,DAM_WING_ROT,STR_FUSE,DAM_FUSE,STR_LG,DAM_LG,STR_TAIL,DAM_TAIL,STR_LGHTS,DAM_LGHTS,STR_OTHER,DAM_OTHER,OTHER_SPECIFY,EFFECT,EFFECT_OTHER,SPECIES_ID,REMARKS,REMAINS_COLLECTED,REMAINS_SENT,WARNED,BIRDS_SEEN,BIRDS_STRUCK,SIZE,NR_INJURIES,NR_FATALITIES,COMMENT,REPORTER_NAME,REPORTER_TITLE,SOURCE,PERSON,LUPDATE,TRANSFER
0,1080125,2020-11-05,11,2020,05:00,,KRDU,RALEIGH-DURHAM INTL,23R,NC,ASO,,,UPS,N253UP,1276,MD-11,583,39.0,22.0,7.0,A,4.0,D,3.0,1.0,6.0,1.0,,Approach,,,,No Cloud,,,,,,,...,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,,,,ZX004,"Fuselage, above FO clearview sliding window.",True,True,Unknown,,1,Small,,,,REDACTED,REDACTED,FAA Form 5200-7-E,Airport Operations,2020-12-04,False
1,1080118,2020-11-05,11,2020,22:35,,KGSO,PIEDMONT TRIAD INTL,5R,NC,ASO,,,FDX,N950FD,1624,B-757-200,148,26.0,37.0,37.0,A,4.0,D,2.0,1.0,1.0,,,Take-off Run,0.0,,0.0,Some Cloud,,,,,,,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,part not reported,,,1F132,Aircraft landed RWY 5R and reported a dead fox...,True,False,Unknown,1.0,1,Large,,,,REDACTED,REDACTED,FAA Form 5200-7-E,Airport Operations,2020-12-04,False
2,1080126,2020-11-05,11,2020,11:00,Day,KCLT,CHARLOTTE/DOUGLAS INTL ARPT,36C,NC,ASO,,,AAL,N809NN,1433,B-737-800,148,43.0,10.0,1.0,A,4.0,D,2.0,1.0,1.0,,,Approach,,,,No Cloud,,,,,,,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,,,,UNKB,Aircraft reportedly struck a bird during appro...,True,False,Yes,,1,,,,,REDACTED,REDACTED,FAA Form 5200-7-E,Airport Operations,2020-12-04,False
3,1080130,2020-11-05,11,2020,06:52,Day,KCLT,CHARLOTTE/DOUGLAS INTL ARPT,36C,NC,ASO,,,RPA,N643RW,3512,EMB-170,332,,22.0,4.0,A,4.0,D,2.0,1.0,1.0,,,Climb,,,,No Cloud,,,,,,,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,PART NOT REPORTED,,,UNKB,Flight crew reported to CLT ATC that the aircr...,False,False,Yes,,1,,,,,REDACTED,REDACTED,FAA Form 5200-7-E,Airport Operations,2020-12-04,False
4,1078243,2020-11-04,11,2020,05:20,Dawn,KRDU,RALEIGH-DURHAM INTL,23R,NC,ASO,,,UPS,N259UP,1276,MD-11,583,39.0,22.0,7.0,A,4.0,D,3.0,1.0,6.0,1.0,,Approach,,,,No Cloud,,,,,,,...,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,,,,Z6007,Captain side - clearview window forward frame....,True,True,Unknown,,1,Small,,,,REDACTED,REDACTED,FAA Form 5200-7-E,Airport Operations,2020-11-27,False
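
To see what `reset_index()` does on a smaller scale, here is a minimal sketch using a made-up three-row DataFrame. The variable names `toy` and `toy_reset` and the handful of values are illustrative only and are not part of the workshop data.

```python
import pandas as pd

# Hypothetical toy data: record numbers are used as the index labels,
# similar to how the JSON file was loaded
toy = pd.DataFrame(
    {"AIRPORT_ID": ["KRDU", "KGSO", "KCLT"]},
    index=[1080125, 1080118, 1080126],
)

# reset_index() moves the old index labels into a new column named "index"
# and replaces the index with a zero-based integer index
toy_reset = toy.reset_index()

print(toy_reset.columns.tolist())  # ['index', 'AIRPORT_ID']
print(toy_reset.index.tolist())    # [0, 1, 2]
```

Passing `drop=True` to `reset_index()` would discard the old labels instead of keeping them as a column, which is not what we want here because the labels are the record numbers.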


### Renaming column labels

When we reset our index, a new column named `index` was created. Let's rename this column to `INDX_NR` to match our other datasets using the DataFrame `rename()` method.

In [6]:
# Rename the column we created
wl_strikes_json_rename = wl_strikes_json_reset.rename(columns={"index":"INDX_NR"})

# Print out the first five rows of the dataset
wl_strikes_json_rename.head()

Unnamed: 0,INDX_NR,INCIDENT_DATE,INCIDENT_MONTH,INCIDENT_YEAR,TIME,TIME_OF_DAY,AIRPORT_ID,AIRPORT,RUNWAY,STATE,FAAREGION,LOCATION,ENROUTE STATE,OPID,REG,FLT,AIRCRAFT,AMA,AMO,EMA,EMO,AC_CLASS,AC_MASS,TYPE_ENG,NUM_ENGS,ENG_1_POS,ENG_2_POS,ENG_3_POS,ENG_4_POS,PHASE_OF_FLIGHT,HEIGHT,SPEED,DISTANCE,SKY,PRECIPITATION,AOS,COST_REPAIRS,OTHER_COST,COST_REPAIRS_INFL_ADJ,COST_OTHER_INFL_ADJ,...,STR_ENG2,DAM_ENG2,STR_ENG3,DAM_ENG3,STR_ENG4,DAM_ENG4,STR_PROP,DAM_PROP,STR_WING_ROT,DAM_WING_ROT,STR_FUSE,DAM_FUSE,STR_LG,DAM_LG,STR_TAIL,DAM_TAIL,STR_LGHTS,DAM_LGHTS,STR_OTHER,DAM_OTHER,OTHER_SPECIFY,EFFECT,EFFECT_OTHER,SPECIES_ID,REMARKS,REMAINS_COLLECTED,REMAINS_SENT,WARNED,BIRDS_SEEN,BIRDS_STRUCK,SIZE,NR_INJURIES,NR_FATALITIES,COMMENT,REPORTER_NAME,REPORTER_TITLE,SOURCE,PERSON,LUPDATE,TRANSFER
0,1080125,2020-11-05,11,2020,05:00,,KRDU,RALEIGH-DURHAM INTL,23R,NC,ASO,,,UPS,N253UP,1276,MD-11,583,39.0,22.0,7.0,A,4.0,D,3.0,1.0,6.0,1.0,,Approach,,,,No Cloud,,,,,,,...,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,,,,ZX004,"Fuselage, above FO clearview sliding window.",True,True,Unknown,,1,Small,,,,REDACTED,REDACTED,FAA Form 5200-7-E,Airport Operations,2020-12-04,False
1,1080118,2020-11-05,11,2020,22:35,,KGSO,PIEDMONT TRIAD INTL,5R,NC,ASO,,,FDX,N950FD,1624,B-757-200,148,26.0,37.0,37.0,A,4.0,D,2.0,1.0,1.0,,,Take-off Run,0.0,,0.0,Some Cloud,,,,,,,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,part not reported,,,1F132,Aircraft landed RWY 5R and reported a dead fox...,True,False,Unknown,1.0,1,Large,,,,REDACTED,REDACTED,FAA Form 5200-7-E,Airport Operations,2020-12-04,False
2,1080126,2020-11-05,11,2020,11:00,Day,KCLT,CHARLOTTE/DOUGLAS INTL ARPT,36C,NC,ASO,,,AAL,N809NN,1433,B-737-800,148,43.0,10.0,1.0,A,4.0,D,2.0,1.0,1.0,,,Approach,,,,No Cloud,,,,,,,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,,,,UNKB,Aircraft reportedly struck a bird during appro...,True,False,Yes,,1,,,,,REDACTED,REDACTED,FAA Form 5200-7-E,Airport Operations,2020-12-04,False
3,1080130,2020-11-05,11,2020,06:52,Day,KCLT,CHARLOTTE/DOUGLAS INTL ARPT,36C,NC,ASO,,,RPA,N643RW,3512,EMB-170,332,,22.0,4.0,A,4.0,D,2.0,1.0,1.0,,,Climb,,,,No Cloud,,,,,,,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,PART NOT REPORTED,,,UNKB,Flight crew reported to CLT ATC that the aircr...,False,False,Yes,,1,,,,,REDACTED,REDACTED,FAA Form 5200-7-E,Airport Operations,2020-12-04,False
4,1078243,2020-11-04,11,2020,05:20,Dawn,KRDU,RALEIGH-DURHAM INTL,23R,NC,ASO,,,UPS,N259UP,1276,MD-11,583,39.0,22.0,7.0,A,4.0,D,3.0,1.0,6.0,1.0,,Approach,,,,No Cloud,,,,,,,...,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,,,,Z6007,Captain side - clearview window forward frame....,True,True,Unknown,,1,Small,,,,REDACTED,REDACTED,FAA Form 5200-7-E,Airport Operations,2020-11-27,False
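
Two details of `rename()` are easy to miss: it returns a new DataFrame rather than changing the original, and by default it silently ignores old names that do not exist. The sketch below illustrates both on a made-up two-row DataFrame (the names `toy`, `renamed`, and `unchanged` are hypothetical).

```python
import pandas as pd

# Hypothetical toy data with a column named "index"
toy = pd.DataFrame(
    {"index": [1080125, 1080118], "AIRPORT_ID": ["KRDU", "KGSO"]}
)

# rename() returns a new DataFrame; the original keeps its column names
renamed = toy.rename(columns={"index": "INDX_NR"})
print(renamed.columns.tolist())  # ['INDX_NR', 'AIRPORT_ID']
print(toy.columns.tolist())      # ['index', 'AIRPORT_ID']

# A misspelled old name ("indx") is ignored by default, so nothing changes
unchanged = toy.rename(columns={"indx": "INDX_NR"})
print(unchanged.columns.tolist())  # ['index', 'AIRPORT_ID']
```

Checking the column names after a rename, as we do with `head()` above, is a quick way to catch a typo in the old name.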


### Concatenate the three DataFrames

We want to work with all of the data we have imported at once, so we need to combine all three DataFrames into one. They all have the same columns now, so we can concatenate them row-wise (stacking them one on top of another, matched by column name) using the pandas function `concat()`. We also need to consider the current index labels for each dataset. We will create a new zero-based integer index for the concatenated dataset by passing the keyword argument `ignore_index=True` to `concat()`.

In [7]:
# Concatenate all the datasets into one
wl_strikes_full = pd.concat([wl_strikes_csv, wl_strikes_xls,
                             wl_strikes_json_rename], ignore_index=True)

# Print the shape (number of rows and columns) of the full DataFrame
wl_strikes_full.shape

(4964, 91)
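
The effect of `ignore_index=True` is easier to see on a small example. This is a minimal sketch with two made-up DataFrames (`first` and `second` are hypothetical names, and the values are illustrative).

```python
import pandas as pd

# Two hypothetical DataFrames with the same columns
first = pd.DataFrame({"INDX_NR": [1, 2], "AIRPORT_ID": ["KCLT", "KRDU"]})
second = pd.DataFrame({"INDX_NR": [3, 4], "AIRPORT_ID": ["KILM", "KGSO"]})

# Without ignore_index, each DataFrame keeps its own labels,
# so the combined index contains duplicates
kept = pd.concat([first, second])
print(kept.index.tolist())  # [0, 1, 0, 1]

# With ignore_index=True, the rows are renumbered from zero
renumbered = pd.concat([first, second], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]
print(renumbered.shape)           # (4, 2)
```

Duplicate index labels are allowed in pandas, but they make label-based selection ambiguous, which is why we renumber the rows.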

### Merge DataFrames

Our dataset includes a column of species IDs (`SPECIES_ID`) that consist of alpha-numeric codes that reference a specific species of animal. This code is not very helpful if we want to know the species name of an animal involved in a strike. Let's join our dataset with another dataset containing unique species IDs and species names using the shared column `SPECIES_ID` to generate a new column of data (`SPECIES`) containing species name using `merge()`. The URL to the dataset of species IDs and names is stored in the variable `species_names_file_url`.

In [8]:
# Load the species ID table (stored in a CSV file)
species_names_file_url = 'https://raw.githubusercontent.com/NCSU-Libraries/data-viz-workshops/master/Python_Open_Labs/data/FAA_Wildlife_species_id_table.csv'
species_names = pd.read_csv(species_names_file_url)

# Print the loaded species ID table
species_names

Unnamed: 0,SPECIES_ID,SPECIES
0,ZX004,Dark-eyed junco
1,1F132,Common gray fox
2,UNKB,Unknown bird
3,Z6007,American robin
4,N5111,Killdeer
...,...,...
204,J2118,Hooded merganser
205,J2209,Brant
206,J2202,Snow goose
207,I13,Egrets


![Left join visual example](https://github.com/NCSU-Libraries/data-viz-workshops/raw/master/Python_Open_Labs/Data_wrangling_with_Pandas/left-join.png)

In [9]:
# Create a new DataFrame from a "left" join of the full dataset and the species
# ID table based on the shared column "SPECIES_ID"
merged_dataset = pd.merge(wl_strikes_full, species_names, how='left', on='SPECIES_ID')

# Print out the new column in the merged dataset
merged_dataset[['SPECIES', 'SPECIES_ID']]

Unnamed: 0,SPECIES,SPECIES_ID
0,Rock pigeon,O2111
1,Unknown bird - small,UNKBS
2,European starling,YL001
3,Unknown bird - large,UNKBL
4,Unknown bird - medium,UNKBM
...,...,...
4959,Gulls,NE1
4960,Gulls,NE1
4961,Gulls,NE1
4962,Unknown bird - small,UNKBS
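
A left join keeps every row from the left DataFrame, even when its key has no match in the right one. The sketch below shows one way to check for unmatched keys using the `indicator=True` option of `merge()`. The toy DataFrames and the ID `XXXXX` are made up for illustration; we have not checked whether the workshop data actually contains unmatched IDs.

```python
import pandas as pd

# Hypothetical strike records; "XXXXX" is a made-up ID with no match
strikes = pd.DataFrame(
    {"INDX_NR": [1, 2, 3], "SPECIES_ID": ["YL001", "UNKBS", "XXXXX"]}
)
species = pd.DataFrame(
    {
        "SPECIES_ID": ["YL001", "UNKBS"],
        "SPECIES": ["European starling", "Unknown bird - small"],
    }
)

# indicator=True adds a "_merge" column describing where each row came from
merged = pd.merge(strikes, species, how="left", on="SPECIES_ID", indicator=True)

# All three left-hand rows are kept
print(len(merged))  # 3

# Rows found only in the left DataFrame have no species name (NaN)
unmatched = merged[merged["_merge"] == "left_only"]
print(unmatched["SPECIES_ID"].tolist())  # ['XXXXX']
```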


### Removing unnecessary columns

We can reduce the size of our dataset by removing unnecessary columns of data using the DataFrame `drop()` method. We will remove the following columns: `"STATE", "FAAREGION", "COMMENT"`.

In [10]:
# Remove the "STATE", "FAAREGION", and "COMMENT" columns using "drop()"
wl_strikes_clean = merged_dataset.drop(columns=['STATE', 'FAAREGION', 'COMMENT'])

# Print out the first five records of the DataFrame
wl_strikes_clean.head()

Unnamed: 0,INDX_NR,INCIDENT_DATE,INCIDENT_MONTH,INCIDENT_YEAR,TIME,TIME_OF_DAY,AIRPORT_ID,AIRPORT,RUNWAY,LOCATION,ENROUTE STATE,OPID,REG,FLT,AIRCRAFT,AMA,AMO,EMA,EMO,AC_CLASS,AC_MASS,TYPE_ENG,NUM_ENGS,ENG_1_POS,ENG_2_POS,ENG_3_POS,ENG_4_POS,PHASE_OF_FLIGHT,HEIGHT,SPEED,DISTANCE,SKY,PRECIPITATION,AOS,COST_REPAIRS,OTHER_COST,COST_REPAIRS_INFL_ADJ,COST_OTHER_INFL_ADJ,INGESTED,INDICATED_DAMAGE,...,STR_ENG2,DAM_ENG2,STR_ENG3,DAM_ENG3,STR_ENG4,DAM_ENG4,STR_PROP,DAM_PROP,STR_WING_ROT,DAM_WING_ROT,STR_FUSE,DAM_FUSE,STR_LG,DAM_LG,STR_TAIL,DAM_TAIL,STR_LGHTS,DAM_LGHTS,STR_OTHER,DAM_OTHER,OTHER_SPECIFY,EFFECT,EFFECT_OTHER,SPECIES_ID,REMARKS,REMAINS_COLLECTED,REMAINS_SENT,WARNED,BIRDS_SEEN,BIRDS_STRUCK,SIZE,NR_INJURIES,NR_FATALITIES,REPORTER_NAME,REPORTER_TITLE,SOURCE,PERSON,LUPDATE,TRANSFER,SPECIES
0,633309,1999-12-24,12,1999,10:15,Day,KCLT,CHARLOTTE/DOUGLAS INTL ARPT,18R,,,USA,N523AU,,B-737-300,148,24.0,10.0,1.0,A,4.0,D,2.0,1.0,1.0,,,Landing Roll,0.0,,0.0,No Cloud,,,,,,,False,False,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,,,,O2111,4 BIRDS.FLT 1539. STRIKE REPTS DIFFER AS TO PH...,False,False,Unknown,2-10,2-10,Small,,,REDACTED,REDACTED,FAA Form 5200-7,Air Transport Operations,2000-03-10,False,Rock pigeon
1,634726,1999-12-15,12,1999,,Day,KRDU,RALEIGH-DURHAM INTL,23R,,,CDR,C-FPCR,,FOKKER F28 MK 1000,372,4.0,37.0,43.0,A,4.0,D,2.0,5.0,5.0,,,Approach,20.0,130.0,,No Cloud,,,,,,,False,False,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,,,,UNKBS,NO EVIDENCE OF BEING STRUCK OR DAMAGE REPTD BY...,False,False,Yes,1,1,Small,,,REDACTED,REDACTED,FAA Form 5200-7,,2000-03-09,False,Unknown bird - small
2,636216,1999-12-14,12,1999,07:40,Day,KCLT,CHARLOTTE/DOUGLAS INTL ARPT,18R,,,USA,N955VJ,,DC-9-30,583,21.0,34.0,10.0,A,4.0,D,2.0,5.0,5.0,,,Approach,100.0,132.0,,Some Cloud,,,,,,,False,False,...,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,,,,YL001,,False,False,No,2-10,2-10,Small,,,REDACTED,REDACTED,FAA Form 5200-7,Tower,2000-03-09,False,European starling
3,633739,1999-12-11,12,1999,17:00,Dusk,KCLT,CHARLOTTE/DOUGLAS INTL ARPT,36R,,,USA,N440US,,B-737-400,148,32.0,10.0,1.0,A,4.0,D,2.0,1.0,1.0,,,Climb,100.0,150.0,,Some Cloud,,3.0,7500.0,5000.0,11498.0,7665.0,False,True,...,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,,"None, Precautionary Landing",,UNKBL,FLT 2225. RADOME REPLACED. 2.5 HRS DOWN TIME.,False,False,No,1,1,Large,,,REDACTED,REDACTED,FAA Form 5200-7,Tower,2000-03-09,False,Unknown bird - large
4,636671,1999-12-11,12,1999,,,KRDU,RALEIGH-DURHAM INTL,,,,BLR,N303UE,,BA-41 JETSTR,168,11.0,19.0,4.0,A,3.0,C,2.0,4.0,4.0,,,Climb,,,0.0,,,,,,,,False,False,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,,Precautionary Landing,,UNKBM,"AFTER ROTATION, FOUL ODOR FILLED FLT DECK. BIR...",False,False,Unknown,,1,Medium,,,REDACTED,REDACTED,Air Transport Report,Air Transport Operations,2001-06-25,False,Unknown bird - medium
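
Like `rename()`, `drop()` returns a new DataFrame and leaves the original untouched. Unlike `rename()`, it raises an error by default if a column does not exist. Here is a minimal sketch on made-up data (the names `toy`, `slim`, and `still_slim` are hypothetical).

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame(
    {"STATE": ["NC", "NC"], "AIRPORT_ID": ["KCLT", "KRDU"], "COMMENT": ["", ""]}
)

# drop() returns a new DataFrame without the listed columns
slim = toy.drop(columns=["STATE", "COMMENT"])
print(slim.columns.tolist())  # ['AIRPORT_ID']
print(toy.columns.tolist())   # ['STATE', 'AIRPORT_ID', 'COMMENT']

# Dropping a column that is already gone raises a KeyError by default
try:
    slim.drop(columns=["STATE"])
except KeyError:
    print("STATE was already removed")

# errors="ignore" skips missing columns instead of raising
still_slim = slim.drop(columns=["STATE"], errors="ignore")
print(still_slim.columns.tolist())  # ['AIRPORT_ID']
```

This matters in a notebook: re-running a cell that drops columns from a DataFrame that has already had them removed will fail unless `errors="ignore"` is used or the result is stored under a new name, as we do with `wl_strikes_clean`.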


### Calculating new columns

#### Create a new column using an expression

We may want to add a new column that is calculated from other columns. In this example, we create a new column (`SINGLE_OR_MULTI_ENGINE`) of boolean values that tells us whether the plane was single-engine (`True`) or multi-engine (`False`), using a comparison operator to test whether the value in the column `NUM_ENGS` equals 1.

In [11]:
# Create a new column of boolean values indicating single or multi-engine
wl_strikes_clean['SINGLE_OR_MULTI_ENGINE'] = wl_strikes_clean['NUM_ENGS'] == 1

wl_strikes_clean['SINGLE_OR_MULTI_ENGINE']

0       False
1       False
2       False
3       False
4       False
        ...  
4959    False
4960    False
4961    False
4962    False
4963     True
Name: SINGLE_OR_MULTI_ENGINE, Length: 4964, dtype: bool
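
One thing to keep in mind with this kind of comparison: a missing value (`NaN`) is never equal to 1, so any row with an unknown engine count would be marked `False`. The sketch below shows this on made-up values and one possible way to keep the missing rows missing. The column names `IS_SINGLE` and `IS_SINGLE_OR_NA` are hypothetical, and we have not checked how many `NUM_ENGS` values are missing in the workshop data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing engine count
toy = pd.DataFrame({"NUM_ENGS": [1.0, 2.0, np.nan]})

# Comparing NaN to 1 gives False, so the missing row looks "multi-engine"
toy["IS_SINGLE"] = toy["NUM_ENGS"] == 1
print(toy["IS_SINGLE"].tolist())  # [True, False, False]

# where() keeps the comparison result only where NUM_ENGS is present;
# rows with a missing engine count stay missing
toy["IS_SINGLE_OR_NA"] = (toy["NUM_ENGS"] == 1).where(toy["NUM_ENGS"].notna())
print(toy["IS_SINGLE_OR_NA"].isna().tolist())  # [False, False, True]
```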

#### Create a new column using `apply()`

Sometimes you need to create a new column based on more complex manipulation of existing data. In this example, we use the `apply()` method to call a function on each value in the column `TIME`. The function parses an integer hour from a string containing the time at which a strike occurred. We store the result in a new column `HOUR` that contains a numerical representation of the hour in which a strike occurred. Because some rows have no valid time, pandas stores the column as floating-point numbers, with `NaN` for the missing hours.

In [12]:
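# Define a function that takes a time string in the form "HH:MM" and returns the
# hour as an integer, or None if the value is missing or the hour is blank
def calc_hour(time_str):
    # Missing values (NaN) are not strings, so skip them
    if not isinstance(time_str, str):
        return None
    hour = time_str.split(':')[0].strip()
    if hour != '':
        return int(hour)
    return None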

# Use the Series apply() method to call calc_hour on each value in the "TIME"
# column and create a new column "HOUR" in our DataFrame
wl_strikes_clean['HOUR'] = wl_strikes_clean['TIME'].apply(calc_hour)

# Print out the "TIME" and "HOUR" columns from our DataFrame
wl_strikes_clean[['TIME', 'HOUR']]

Unnamed: 0,TIME,HOUR
0,10:15,10.0
1,,
2,07:40,7.0
3,17:00,17.0
4,,
...,...,...
4959,10:12,10.0
4960,10:40,10.0
4961,14:42,14.0
4962,,
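
For simple string parsing like this, pandas string methods can often do the same job without a custom function. The following is a sketch of one alternative, not the approach used in this workshop; it assumes the time values look like the made-up examples in `times`.

```python
import pandas as pd

# Hypothetical time values, including blank and missing entries
times = pd.Series(["10:15", "", "07:40", None, " "], name="TIME")

# Split each string on ":", take the first piece, and convert it to a number.
# errors="coerce" turns anything that is not a number into NaN.
hours = pd.to_numeric(times.str.split(":").str[0], errors="coerce")

print(hours.iloc[0])           # 10.0
print(hours.iloc[2])           # 7.0
print(hours.isna().tolist())   # [False, True, False, True, True]
```

`apply()` remains the more flexible choice when the logic does not fit into a chain of built-in methods.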


### Replace values in a column

We can replace values in a column based on conditions, similar to "find and replace." In this example, we make our new `SINGLE_OR_MULTI_ENGINE` column more descriptive by changing `True` into "Single engine" and `False` into "Multi engine".



In [13]:
# Replace True or False values with new strings, "Single engine" or "Multi engine",
# and assign the result back to the column
wl_strikes_clean['SINGLE_OR_MULTI_ENGINE'] = wl_strikes_clean['SINGLE_OR_MULTI_ENGINE'].replace(
    {True: 'Single engine', False: 'Multi engine'}
)

wl_strikes_clean[['NUM_ENGS', 'SINGLE_OR_MULTI_ENGINE']]

Unnamed: 0,NUM_ENGS,SINGLE_OR_MULTI_ENGINE
0,2.0,Multi engine
1,2.0,Multi engine
2,2.0,Multi engine
3,2.0,Multi engine
4,2.0,Multi engine
...,...,...
4959,2.0,Multi engine
4960,2.0,Multi engine
4961,2.0,Multi engine
4962,3.0,Multi engine
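
To see how `replace()` treats values that are not listed in the dictionary, here is a minimal sketch on made-up data (the DataFrame `toy_times` and the label "Twilight" are assumptions for illustration only). Values with no matching key are left unchanged.

```python
import pandas as pd

# Hypothetical sample data standing in for the "TIME_OF_DAY" column
toy_times = pd.DataFrame({'TIME_OF_DAY': ['Day', 'Night', 'Dawn', 'Dusk']})

# Map two existing values to one new label; "Day" and "Night" are not keys in
# the dictionary, so replace() leaves them as they are
toy_times['TIME_OF_DAY_GROUPED'] = toy_times['TIME_OF_DAY'].replace(
    {'Dawn': 'Twilight', 'Dusk': 'Twilight'}
)

print(toy_times)
```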


### Filtering

We can filter our data using conditional statements. This can help us remove unnecessary rows of data or observe a specific range of data.

In [14]:
# Filter the data to only see incidents that happened at night
wl_strikes_night = wl_strikes_clean[wl_strikes_clean['TIME_OF_DAY'] == 'Night']

# Print out the new filtered DataFrame
wl_strikes_night

Unnamed: 0,INDX_NR,INCIDENT_DATE,INCIDENT_MONTH,INCIDENT_YEAR,TIME,TIME_OF_DAY,AIRPORT_ID,AIRPORT,RUNWAY,LOCATION,ENROUTE STATE,OPID,REG,FLT,AIRCRAFT,AMA,AMO,EMA,EMO,AC_CLASS,AC_MASS,TYPE_ENG,NUM_ENGS,ENG_1_POS,ENG_2_POS,ENG_3_POS,ENG_4_POS,PHASE_OF_FLIGHT,HEIGHT,SPEED,DISTANCE,SKY,PRECIPITATION,AOS,COST_REPAIRS,OTHER_COST,COST_REPAIRS_INFL_ADJ,COST_OTHER_INFL_ADJ,INGESTED,INDICATED_DAMAGE,...,STR_ENG3,DAM_ENG3,STR_ENG4,DAM_ENG4,STR_PROP,DAM_PROP,STR_WING_ROT,DAM_WING_ROT,STR_FUSE,DAM_FUSE,STR_LG,DAM_LG,STR_TAIL,DAM_TAIL,STR_LGHTS,DAM_LGHTS,STR_OTHER,DAM_OTHER,OTHER_SPECIFY,EFFECT,EFFECT_OTHER,SPECIES_ID,REMARKS,REMAINS_COLLECTED,REMAINS_SENT,WARNED,BIRDS_SEEN,BIRDS_STRUCK,SIZE,NR_INJURIES,NR_FATALITIES,REPORTER_NAME,REPORTER_TITLE,SOURCE,PERSON,LUPDATE,TRANSFER,SPECIES,SINGLE_OR_MULTI_ENGINE,HOUR
6,633113,1999-11-22,11,1999,,Night,KILM,WILMINGTON INTL,17,,,BLR,N331UE,,BA-41 JETSTR,168,11.0,19.0,4.0,A,3.0,C,2.0,4.0,4.0,,,Landing Roll,0.0,,0.0,No Cloud,,,,,,,False,False,...,False,False,False,False,False,False,True,False,False,False,False,False,True,False,False,False,False,False,,,,UNKBS,"NO DMG, BUT PER POLICY, PILOT WILL HAVE CONTRA...",False,False,No,11-100,1,Small,,,REDACTED,REDACTED,FAA Form 5200-7,Pilot,2005-04-20,False,Unknown bird - small,Multi engine,
8,632444,1999-11-07,11,1999,,Night,KRDU,RALEIGH-DURHAM INTL,5R,,,USA,N707UW,,A-319,04A,6.0,10.0,1.0,A,4.0,D,2.0,1.0,1.0,,,Approach,1200.0,150.0,,No Cloud,,,,,,,False,False,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,,,,UNKBS,NO DMG FLT 148,False,False,No,2-10,2-10,Small,,,REDACTED,REDACTED,FAA Form 5200-7,Tower,2013-04-16,False,Unknown bird - small,Multi engine,
12,633963,1999-10-22,10,1999,,Night,KGSO,PIEDMONT TRIAD INTL,23,10 MI E,,MDC,N9525B,,C-208,226,42.0,31.0,4.0,A,2.0,C,1.0,7.0,,,,Approach,2500.0,170.0,10.0,Some Cloud,,,,,,,False,True,...,False,False,False,False,False,False,True,True,False,False,False,False,False,False,False,False,False,False,,,,UNKBL,"FULL MOON. LRG BIRD, POSSIBLY A GOOSE HIT L WI...",False,False,No,,1,Large,,,REDACTED,REDACTED,Multiple,Tower,2015-09-09,False,Unknown bird - large,Single engine,
16,633693,1999-10-18,10,1999,21:53,Night,KCLT,CHARLOTTE/DOUGLAS INTL ARPT,36R,,,USA,N518AU,,B-737-300,148,24.0,10.0,1.0,A,4.0,D,2.0,1.0,1.0,,,Approach,3000.0,,,No Cloud,,,,,,,False,False,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,,,,UNKBS,,False,False,No,,1,Small,,,REDACTED,REDACTED,FAA Form 5200-7,,2000-01-11,False,Unknown bird - small,Multi engine,21.0
17,634576,1999-10-18,10,1999,,Night,KRDU,RALEIGH-DURHAM INTL,5L,,,AAL,N861AA,,B-727-200,148,11.0,34.0,10.0,A,4.0,D,3.0,5.0,6.0,5.0,,Approach,500.0,150.0,,Some Cloud,,,,,,,False,False,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,,,,UNKBS,,False,False,Yes,,1,Small,,,REDACTED,REDACTED,FAA Form 5200-7,,2000-01-11,False,Unknown bird - small,Multi engine,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4944,710479,2010-04-23,4,2010,03:41,Night,KCLT,CHARLOTTE/DOUGLAS INTL ARPT,5,,,ASH,N502MJ,7248,CRJ700,188,16.0,22.0,4.0,A,4.0,D,2.0,5.0,5.0,,,Approach,1800.0,170.0,,No Cloud,,,,,,,False,False,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,,,,UNKBS,"NO DMG. DATA ENTRY NOTE: # STRUCK NOT REPTD, A...",False,False,Yes,1,1,Small,,,REDACTED,REDACTED,FAA Form 5200-7,,2010-08-05,False,Unknown bird - small,Multi engine,3.0
4945,710826,2010-04-23,4,2010,22:20,Night,KCLT,CHARLOTTE/DOUGLAS INTL ARPT,5,,,AWE,N745VJ,1238,A-319,04A,6.0,23.0,1.0,A,4.0,D,2.0,1.0,1.0,,,Climb,5000.0,250.0,,No Cloud,,,,,,,False,False,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,,,,UNKBM,BOTH PILOTS REPTD FEELING A BIRDSTRIKE ON CLIM...,False,False,No,,1,Medium,,,REDACTED,REDACTED,FAA Form 5200-7-E,Air Transport Operations,2010-08-05,False,Unknown bird - medium,Multi engine,22.0
4950,709949,2010-03-21,3,2010,,Night,KRDU,RALEIGH-DURHAM INTL,23R,,,DAL,N316NB,,A-319,04A,6.0,10.0,1.0,A,4.0,D,2.0,1.0,1.0,,,Approach,1200.0,170.0,,Overcast,,,,,,,False,False,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,,,,UNKBS,,False,False,No,,1,Small,,,REDACTED,REDACTED,FAA Form 5200-7,,2010-07-08,False,Unknown bird - small,Multi engine,
4954,709805,2010-02-21,2,2010,21:15,Night,KCLT,CHARLOTTE/DOUGLAS INTL ARPT,18C,,,AWE,N426US,,B-737-400,148,32.0,10.0,1.0,A,4.0,D,2.0,1.0,1.0,,,Approach,4000.0,170.0,,No Cloud,,,,,,,False,False,...,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,,,,UNKBM,NO A/C DMG.,False,False,No,,1,Medium,,,REDACTED,REDACTED,FAA Form 5200-7,Pilot,2010-06-09,False,Unknown bird - medium,Multi engine,21.0


In [16]:
# Filter the data to only see data from 2010 and after
wl_strikes_10s = wl_strikes_clean[wl_strikes_clean['INCIDENT_YEAR'] >= 2010]

# Print out the new filtered DataFrame
wl_strikes_10s

Unnamed: 0,INDX_NR,INCIDENT_DATE,INCIDENT_MONTH,INCIDENT_YEAR,TIME,TIME_OF_DAY,AIRPORT_ID,AIRPORT,RUNWAY,LOCATION,ENROUTE STATE,OPID,REG,FLT,AIRCRAFT,AMA,AMO,EMA,EMO,AC_CLASS,AC_MASS,TYPE_ENG,NUM_ENGS,ENG_1_POS,ENG_2_POS,ENG_3_POS,ENG_4_POS,PHASE_OF_FLIGHT,HEIGHT,SPEED,DISTANCE,SKY,PRECIPITATION,AOS,COST_REPAIRS,OTHER_COST,COST_REPAIRS_INFL_ADJ,COST_OTHER_INFL_ADJ,INGESTED,INDICATED_DAMAGE,...,STR_ENG3,DAM_ENG3,STR_ENG4,DAM_ENG4,STR_PROP,DAM_PROP,STR_WING_ROT,DAM_WING_ROT,STR_FUSE,DAM_FUSE,STR_LG,DAM_LG,STR_TAIL,DAM_TAIL,STR_LGHTS,DAM_LGHTS,STR_OTHER,DAM_OTHER,OTHER_SPECIFY,EFFECT,EFFECT_OTHER,SPECIES_ID,REMARKS,REMAINS_COLLECTED,REMAINS_SENT,WARNED,BIRDS_SEEN,BIRDS_STRUCK,SIZE,NR_INJURIES,NR_FATALITIES,REPORTER_NAME,REPORTER_TITLE,SOURCE,PERSON,LUPDATE,TRANSFER,SPECIES,SINGLE_OR_MULTI_ENGINE,HOUR
1630,1080125,2020-11-05,11,2020,05:00,,KRDU,RALEIGH-DURHAM INTL,23R,,,UPS,N253UP,1276,MD-11,583,39.0,22.0,7.0,A,4.0,D,3.0,1.0,6.0,1.0,,Approach,,,,No Cloud,,,,,,,False,False,...,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,,,,ZX004,"Fuselage, above FO clearview sliding window.",True,True,Unknown,,1,Small,,,REDACTED,REDACTED,FAA Form 5200-7-E,Airport Operations,2020-12-04,False,Dark-eyed junco,Multi engine,5.0
1631,1080118,2020-11-05,11,2020,22:35,,KGSO,PIEDMONT TRIAD INTL,5R,,,FDX,N950FD,1624,B-757-200,148,26.0,37.0,37.0,A,4.0,D,2.0,1.0,1.0,,,Take-off Run,0.0,,0.0,Some Cloud,,,,,,,False,False,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,part not reported,,,1F132,Aircraft landed RWY 5R and reported a dead fox...,True,False,Unknown,1,1,Large,,,REDACTED,REDACTED,FAA Form 5200-7-E,Airport Operations,2020-12-04,False,Common gray fox,Multi engine,22.0
1632,1080126,2020-11-05,11,2020,11:00,Day,KCLT,CHARLOTTE/DOUGLAS INTL ARPT,36C,,,AAL,N809NN,1433,B-737-800,148,43.0,10.0,1.0,A,4.0,D,2.0,1.0,1.0,,,Approach,,,,No Cloud,,,,,,,False,False,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,,,,UNKB,Aircraft reportedly struck a bird during appro...,True,False,Yes,,1,,,,REDACTED,REDACTED,FAA Form 5200-7-E,Airport Operations,2020-12-04,False,Unknown bird,Multi engine,11.0
1633,1080130,2020-11-05,11,2020,06:52,Day,KCLT,CHARLOTTE/DOUGLAS INTL ARPT,36C,,,RPA,N643RW,3512,EMB-170,332,,22.0,4.0,A,4.0,D,2.0,1.0,1.0,,,Climb,,,,No Cloud,,,,,,,False,False,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,PART NOT REPORTED,,,UNKB,Flight crew reported to CLT ATC that the aircr...,False,False,Yes,,1,,,,REDACTED,REDACTED,FAA Form 5200-7-E,Airport Operations,2020-12-04,False,Unknown bird,Multi engine,6.0
1634,1078243,2020-11-04,11,2020,05:20,Dawn,KRDU,RALEIGH-DURHAM INTL,23R,,,UPS,N259UP,1276,MD-11,583,39.0,22.0,7.0,A,4.0,D,3.0,1.0,6.0,1.0,,Approach,,,,No Cloud,,,,,,,False,False,...,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,,,,Z6007,Captain side - clearview window forward frame....,True,True,Unknown,,1,Small,,,REDACTED,REDACTED,FAA Form 5200-7-E,Airport Operations,2020-11-27,False,American robin,Multi engine,5.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4959,709088,2010-01-18,1,2010,10:12,Day,KRDU,RALEIGH-DURHAM INTL,5L,,,EGF,N837AE,,EMB-135,332,13.0,1.0,10.0,A,3.0,D,2.0,5.0,5.0,,,Take-off Run,0.0,,0.0,No Cloud,,,,,,,False,False,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,PART NOT REPTD,,,NE1,EGF4457 ADVISED THEY HAD A POSSIBLE BIRDSTRIKE...,True,False,Unknown,11-100,1,Medium,,,REDACTED,REDACTED,FAA Form 5200-7,,2010-05-28,False,Gulls,Multi engine,10.0
4960,709085,2010-01-17,1,2010,10:40,Day,KRDU,RALEIGH-DURHAM INTL,23R,,,UAL,N421UA,197,A-320,04A,3.0,23.0,1.0,A,4.0,D,2.0,1.0,1.0,,,Climb,20.0,140.0,0.0,Overcast,Rain,,,,,,False,False,...,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,,,,NE1,BIRDS WERE PROBABLY GULLS ACCORDING TO ARPT MGR.,False,False,Yes,11-100,2-10,Medium,,,REDACTED,REDACTED,Multiple,Air Transport Operations,2010-05-28,False,Gulls,Multi engine,10.0
4961,710034,2010-01-17,1,2010,14:42,Day,KRDU,RALEIGH-DURHAM INTL,23R,,,TCF,,1946,EMB-170,332,,22.0,4.0,A,4.0,D,2.0,1.0,1.0,,,Take-off Run,0.0,,0.0,Overcast,Rain,,,,,,False,False,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,,,,NE1,,False,False,Yes,11-100,1,Medium,,,REDACTED,REDACTED,FAA Form 5200-7,Tower,2010-05-28,False,Gulls,Multi engine,14.0
4962,709048,2010-01-11,1,2010,,Day,KCLT,CHARLOTTE/DOUGLAS INTL ARPT,23,,,BUS,N520AF,,DA-50 FALCON,300,8.0,19.0,1.0,A,3.0,D,3.0,5.0,6.0,5.0,,Approach,800.0,150.0,,No Cloud,,,,,,,False,False,...,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,,,,UNKBS,NO DMG. REMVD SMALL PATCH OF REMAINS FROM TOP ...,False,False,No,1,1,Small,,,REDACTED,REDACTED,FAA Form 5200-7,Pilot,2010-05-28,False,Unknown bird - small,Multi engine,


In [17]:
# Filter the data to only include incidents from 2010 and after that happened at night
wl_strikes_night_10s = wl_strikes_clean[
    (wl_strikes_clean['INCIDENT_YEAR'] >= 2010)
    & (wl_strikes_clean['TIME_OF_DAY'] == 'Night')]
wl_strikes_night_10s

Unnamed: 0,INDX_NR,INCIDENT_DATE,INCIDENT_MONTH,INCIDENT_YEAR,TIME,TIME_OF_DAY,AIRPORT_ID,AIRPORT,RUNWAY,LOCATION,ENROUTE STATE,OPID,REG,FLT,AIRCRAFT,AMA,AMO,EMA,EMO,AC_CLASS,AC_MASS,TYPE_ENG,NUM_ENGS,ENG_1_POS,ENG_2_POS,ENG_3_POS,ENG_4_POS,PHASE_OF_FLIGHT,HEIGHT,SPEED,DISTANCE,SKY,PRECIPITATION,AOS,COST_REPAIRS,OTHER_COST,COST_REPAIRS_INFL_ADJ,COST_OTHER_INFL_ADJ,INGESTED,INDICATED_DAMAGE,...,STR_ENG3,DAM_ENG3,STR_ENG4,DAM_ENG4,STR_PROP,DAM_PROP,STR_WING_ROT,DAM_WING_ROT,STR_FUSE,DAM_FUSE,STR_LG,DAM_LG,STR_TAIL,DAM_TAIL,STR_LGHTS,DAM_LGHTS,STR_OTHER,DAM_OTHER,OTHER_SPECIFY,EFFECT,EFFECT_OTHER,SPECIES_ID,REMARKS,REMAINS_COLLECTED,REMAINS_SENT,WARNED,BIRDS_SEEN,BIRDS_STRUCK,SIZE,NR_INJURIES,NR_FATALITIES,REPORTER_NAME,REPORTER_TITLE,SOURCE,PERSON,LUPDATE,TRANSFER,SPECIES,SINGLE_OR_MULTI_ENGINE,HOUR
1635,1080107,2020-11-02,11,2020,21:45,Night,KILM,WILMINGTON INTL,24,,,DAL,,5060,CRJ900,188,17.0,22.0,4.0,A,4.0,D,2.0,5.0,5.0,,,Approach,150.0,,,No Cloud,,,,,,,False,False,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,,,,N5111,N7607LR REPORTED AS REGISTRATION NUMBER BUT NO...,True,True,Unknown,1,1,Small,,,REDACTED,REDACTED,FAA Form 5200-7-E,Airport Operations,2020-12-04,False,Killdeer,Multi engine,21.0
1637,1067129,2020-10-30,10,2020,22:40,Night,KOAJ,ALBERT J ELLIS,5,,,EDV,N918XJ,4726,CRJ900,188,17.0,22.0,4.0,A,4.0,D,2.0,5.0,5.0,,,Approach,,135.0,,No Cloud,,,,,,,False,False,...,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,,,,UNKB,Final approach for runway 5. NOTE: NUMBER STRU...,False,False,No,,1,,,,REDACTED,REDACTED,FAA Form 5200-7-E,Pilot,2020-11-13,False,Unknown bird,Multi engine,22.0
1638,1066925,2020-10-27,10,2020,03:34,Night,KGSO,PIEDMONT TRIAD INTL,23L,,,FDX,N973FD,1061,B-757-200,148,26.0,37.0,37.0,A,4.0,D,2.0,1.0,1.0,,,Climb,6500.0,245.0,,No Cloud,,,,,,,False,False,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,,,,UNKBS,No injuries or visible damage,False,False,Unknown,1,1,Small,,,REDACTED,REDACTED,FAA Form 5200-7-E,Pilot,2020-11-12,False,Unknown bird - small,Multi engine,3.0
1639,1066740,2020-10-26,10,2020,21:37,Night,KOAJ,ALBERT J ELLIS,5/23,,,EDV,N695CA,4726,CRJ900,188,17.0,22.0,4.0,A,4.0,D,2.0,5.0,5.0,,,Descent,15000.0,290.0,80.0,No Cloud,,,,,,,False,True,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,,,,UNKBM,Lake Waccamaw. NOTE: WAITING FOR SI ID AS OF 1...,True,False,Unknown,1,1,Medium,,,REDACTED,REDACTED,FAA Form 5200-7-E,Airport Operations,2020-11-10,False,Unknown bird - medium,Multi engine,21.0
1642,1062378,2020-10-19,10,2020,22:11,Night,KRDU,RALEIGH-DURHAM INTL,,,,AAL,N740UW,1605,A-319,04A,6.0,10.0,1.0,A,4.0,D,2.0,1.0,1.0,,,Approach,1500.0,133.0,4.0,No Cloud,,,,,,,False,False,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,,,,UNKBS,,False,False,Yes,1,1,Small,,,REDACTED,REDACTED,FAA Form 5200-7-E,Pilot,2020-11-03,False,Unknown bird - small,Multi engine,22.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4944,710479,2010-04-23,4,2010,03:41,Night,KCLT,CHARLOTTE/DOUGLAS INTL ARPT,5,,,ASH,N502MJ,7248,CRJ700,188,16.0,22.0,4.0,A,4.0,D,2.0,5.0,5.0,,,Approach,1800.0,170.0,,No Cloud,,,,,,,False,False,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,,,,UNKBS,"NO DMG. DATA ENTRY NOTE: # STRUCK NOT REPTD, A...",False,False,Yes,1,1,Small,,,REDACTED,REDACTED,FAA Form 5200-7,,2010-08-05,False,Unknown bird - small,Multi engine,3.0
4945,710826,2010-04-23,4,2010,22:20,Night,KCLT,CHARLOTTE/DOUGLAS INTL ARPT,5,,,AWE,N745VJ,1238,A-319,04A,6.0,23.0,1.0,A,4.0,D,2.0,1.0,1.0,,,Climb,5000.0,250.0,,No Cloud,,,,,,,False,False,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,,,,UNKBM,BOTH PILOTS REPTD FEELING A BIRDSTRIKE ON CLIM...,False,False,No,,1,Medium,,,REDACTED,REDACTED,FAA Form 5200-7-E,Air Transport Operations,2010-08-05,False,Unknown bird - medium,Multi engine,22.0
4950,709949,2010-03-21,3,2010,,Night,KRDU,RALEIGH-DURHAM INTL,23R,,,DAL,N316NB,,A-319,04A,6.0,10.0,1.0,A,4.0,D,2.0,1.0,1.0,,,Approach,1200.0,170.0,,Overcast,,,,,,,False,False,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,,,,UNKBS,,False,False,No,,1,Small,,,REDACTED,REDACTED,FAA Form 5200-7,,2010-07-08,False,Unknown bird - small,Multi engine,
4954,709805,2010-02-21,2,2010,21:15,Night,KCLT,CHARLOTTE/DOUGLAS INTL ARPT,18C,,,AWE,N426US,,B-737-400,148,32.0,10.0,1.0,A,4.0,D,2.0,1.0,1.0,,,Approach,4000.0,170.0,,No Cloud,,,,,,,False,False,...,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,,,,UNKBM,NO A/C DMG.,False,False,No,,1,Medium,,,REDACTED,REDACTED,FAA Form 5200-7,Pilot,2010-06-09,False,Unknown bird - medium,Multi engine,21.0
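
The filters above combine conditions with `&` (and). The sketch below, on a small made-up DataFrame (`toy_strikes` and its values are assumptions), shows how the same pattern extends to `|` (or) and `~` (not). Storing each condition in its own variable can make longer filters easier to read; if you write the comparisons inline, each one needs its own parentheses.

```python
import pandas as pd

# Hypothetical sample data with the two columns used in the filters above
toy_strikes = pd.DataFrame({
    'INCIDENT_YEAR': [1999, 1995, 2010, 2015, 2020],
    'TIME_OF_DAY': ['Night', 'Day', 'Day', 'Night', 'Dusk'],
})

# Each condition is a Series of True/False values, one per row
is_recent = toy_strikes['INCIDENT_YEAR'] >= 2010
is_night = toy_strikes['TIME_OF_DAY'] == 'Night'

# Rows where both conditions are True
recent_and_night = toy_strikes[is_recent & is_night]

# Rows where at least one condition is True
recent_or_night = toy_strikes[is_recent | is_night]

# Rows from 2010 onward that did NOT happen at night
recent_not_night = toy_strikes[is_recent & ~is_night]

print(recent_and_night)
print(recent_or_night)
print(recent_not_night)
```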




---


## Open work time
You can use this time to ask questions, collaborate, or work on the following activities (on your own or in a group).

### Exercise 1: Rename column headers

Rename the column `REG` to the more descriptive `AIRCRAFT_REGISTRATION`.

In [18]:
# Change the column name "REG" to "AIRCRAFT_REGISTRATION"
wl_strikes_clean_rename = wl_strikes_clean.rename(columns={'REG':'AIRCRAFT_REGISTRATION'})

# Print out the new DataFrame columns
wl_strikes_clean_rename.columns

Index(['INDX_NR', 'INCIDENT_DATE ', 'INCIDENT_MONTH', 'INCIDENT_YEAR', 'TIME',
       'TIME_OF_DAY', 'AIRPORT_ID', 'AIRPORT', 'RUNWAY', 'LOCATION',
       'ENROUTE STATE', 'OPID', 'AIRCRAFT_REGISTRATION', 'FLT', 'AIRCRAFT',
       'AMA', 'AMO', 'EMA', 'EMO', 'AC_CLASS', 'AC_MASS', 'TYPE_ENG',
       'NUM_ENGS', 'ENG_1_POS', 'ENG_2_POS', 'ENG_3_POS', 'ENG_4_POS',
       'PHASE_OF_FLIGHT', 'HEIGHT', 'SPEED', 'DISTANCE', 'SKY',
       'PRECIPITATION', 'AOS', 'COST_REPAIRS', 'OTHER_COST',
       'COST_REPAIRS_INFL_ADJ', 'COST_OTHER_INFL_ADJ', 'INGESTED',
       'INDICATED_DAMAGE', 'DAMAGE_LEVEL', 'STR_RAD', 'DAM_RAD',
       'STR_WINDSHLD', 'DAM_WINDSHLD', 'STR_NOSE', 'DAM_NOSE', 'STR_ENG1',
       'DAM_ENG1', 'STR_ENG2', 'DAM_ENG2', 'STR_ENG3', 'DAM_ENG3', 'STR_ENG4',
       'DAM_ENG4', 'STR_PROP', 'DAM_PROP', 'STR_WING_ROT', 'DAM_WING_ROT',
       'STR_FUSE', 'DAM_FUSE', 'STR_LG', 'DAM_LG', 'STR_TAIL', 'DAM_TAIL',
       'STR_LGHTS', 'DAM_LGHTS', 'STR_OTHER', 'DAM_OTHER', 'OTHER_SPECIFY'

### Exercise 2: Remove unnecessary columns

There are several columns of data that are not relevant for our analyses. Remove all columns related to engine and damage location (e.g., all the columns that begin with `ENG_`, `DAM_`, and `STR_`). A list of these column names is provided in the variable `drop_columns`.

**Bonus:** See if you can derive the column names in the list `drop_columns` from the dataset.

In [19]:
# A list of column names to remove from the DataFrame
drop_columns = ['ENG_1_POS', 'ENG_2_POS', 'ENG_3_POS', 'ENG_4_POS', 'STR_RAD',
                'DAM_RAD', 'STR_WINDSHLD', 'DAM_WINDSHLD', 'STR_NOSE',
                'DAM_NOSE', 'STR_ENG1', 'DAM_ENG1', 'STR_ENG2', 'DAM_ENG2',
                'STR_ENG3', 'DAM_ENG3', 'STR_ENG4', 'DAM_ENG4', 'STR_PROP',
                'DAM_PROP', 'STR_WING_ROT', 'DAM_WING_ROT', 'STR_FUSE',
                'DAM_FUSE', 'STR_LG', 'DAM_LG', 'STR_TAIL', 'DAM_TAIL',
                'STR_LGHTS', 'DAM_LGHTS', 'STR_OTHER', 'DAM_OTHER']

# Remove unnecessary columns
wl_strikes_clean_drop = wl_strikes_clean_rename.drop(columns=drop_columns)

# Print out the new DataFrame columns
wl_strikes_clean_drop.columns

Index(['INDX_NR', 'INCIDENT_DATE ', 'INCIDENT_MONTH', 'INCIDENT_YEAR', 'TIME',
       'TIME_OF_DAY', 'AIRPORT_ID', 'AIRPORT', 'RUNWAY', 'LOCATION',
       'ENROUTE STATE', 'OPID', 'AIRCRAFT_REGISTRATION', 'FLT', 'AIRCRAFT',
       'AMA', 'AMO', 'EMA', 'EMO', 'AC_CLASS', 'AC_MASS', 'TYPE_ENG',
       'NUM_ENGS', 'PHASE_OF_FLIGHT', 'HEIGHT', 'SPEED', 'DISTANCE', 'SKY',
       'PRECIPITATION', 'AOS', 'COST_REPAIRS', 'OTHER_COST',
       'COST_REPAIRS_INFL_ADJ', 'COST_OTHER_INFL_ADJ', 'INGESTED',
       'INDICATED_DAMAGE', 'DAMAGE_LEVEL', 'OTHER_SPECIFY', 'EFFECT',
       'EFFECT_OTHER', 'SPECIES_ID', 'REMARKS', 'REMAINS_COLLECTED',
       'REMAINS_SENT', 'WARNED', 'BIRDS_SEEN', 'BIRDS_STRUCK', 'SIZE',
       'NR_INJURIES', 'NR_FATALITIES', 'REPORTER_NAME', 'REPORTER_TITLE',
       'SOURCE', 'PERSON', 'LUPDATE', 'TRANSFER', 'SPECIES',
       'SINGLE_OR_MULTI_ENGINE', 'HOUR'],
      dtype='object')
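
For the bonus question, one possible approach (a sketch, not the only answer) is to select column names by prefix with a list comprehension. The example uses a small made-up DataFrame (`toy_columns` and `derived_drop_columns` are hypothetical names) with a handful of column names taken from the dataset.

```python
import pandas as pd

# Hypothetical empty DataFrame that only carries a few of the column names
toy_columns = pd.DataFrame(columns=['INDX_NR', 'NUM_ENGS', 'ENG_1_POS',
                                    'STR_RAD', 'DAM_RAD', 'DAMAGE_LEVEL',
                                    'SPECIES'])

# str.startswith() accepts a tuple and returns True if any prefix matches
prefixes = ('ENG_', 'DAM_', 'STR_')
derived_drop_columns = [col for col in toy_columns.columns
                        if col.startswith(prefixes)]

# Drop the derived columns and look at what is left
toy_columns_drop = toy_columns.drop(columns=derived_drop_columns)

print(derived_drop_columns)
print(toy_columns_drop.columns)
```

Including the trailing underscore in each prefix matters here: `DAMAGE_LEVEL` starts with `DAM` but not with `DAM_`, so it is kept.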

### Exercise 3: Filter out unnecessary rows

Our dataset should only contain data from the years 1990-2019. Remove any rows of data that contain strikes that occurred outside of this year range.

In [20]:
# Filter out rows of data that contain strikes that occurred outside of the year
# range 1990-2019
wl_strikes_clean_filter = wl_strikes_clean_drop[
                            (wl_strikes_clean_drop['INCIDENT_YEAR'] >= 1990)
                            & (wl_strikes_clean_drop['INCIDENT_YEAR'] <= 2019)]

# Print out the new filtered DataFrame
wl_strikes_clean_filter.sort_values('INCIDENT_YEAR')

Unnamed: 0,INDX_NR,INCIDENT_DATE,INCIDENT_MONTH,INCIDENT_YEAR,TIME,TIME_OF_DAY,AIRPORT_ID,AIRPORT,RUNWAY,LOCATION,ENROUTE STATE,OPID,AIRCRAFT_REGISTRATION,FLT,AIRCRAFT,AMA,AMO,EMA,EMO,AC_CLASS,AC_MASS,TYPE_ENG,NUM_ENGS,PHASE_OF_FLIGHT,HEIGHT,SPEED,DISTANCE,SKY,PRECIPITATION,AOS,COST_REPAIRS,OTHER_COST,COST_REPAIRS_INFL_ADJ,COST_OTHER_INFL_ADJ,INGESTED,INDICATED_DAMAGE,DAMAGE_LEVEL,OTHER_SPECIFY,EFFECT,EFFECT_OTHER,SPECIES_ID,REMARKS,REMAINS_COLLECTED,REMAINS_SENT,WARNED,BIRDS_SEEN,BIRDS_STRUCK,SIZE,NR_INJURIES,NR_FATALITIES,REPORTER_NAME,REPORTER_TITLE,SOURCE,PERSON,LUPDATE,TRANSFER,SPECIES,SINGLE_OR_MULTI_ENGINE,HOUR
628,609817,1990-10-13,10,1990,,Night,KRDU,RALEIGH-DURHAM INTL,,,,AAL,,,B-727,148,94.0,34.0,10.0,A,4.0,D,3.0,Approach,1000.0,140.0,,No Cloud,,,,,,,False,False,N,,,,UNKBS,NO DAMAGE.,False,False,No,1,1,Small,,,REDACTED,REDACTED,FAA Form 5200-7,,1998-12-04,False,Unknown bird - small,Multi engine,
640,608727,1990-09-15,9,1990,,Day,KRDU,RALEIGH-DURHAM INTL,,,,USA,N205AU,,B-737-200,148,13.0,34.0,10.0,A,4.0,D,2.0,Approach,6000.0,250.0,,No Cloud,,,,,,,False,False,N,,,,UNKBM,FLT 856. NO DAMAGE.,False,False,No,,1,Medium,,,REDACTED,REDACTED,FAA Form 5200-7,Pilot,1999-01-12,False,Unknown bird - medium,Multi engine,
639,610993,1990-09-16,9,1990,,Night,KRDU,RALEIGH-DURHAM INTL,,,,AAL,,,B-727,148,94.0,34.0,10.0,A,4.0,D,3.0,Climb,6000.0,250.0,,No Cloud,,,,,,,False,False,N,,,,UNKBM,"# BIRDS NOT REPTD, ASSUME 1. NO DAMAGE.",False,False,No,,1,Medium,,,REDACTED,REDACTED,FAA Form 5200-7,,1999-01-12,False,Unknown bird - medium,Multi engine,
638,610762,1990-09-18,9,1990,,Day,KRDU,RALEIGH-DURHAM INTL,5L,,,AAL,N1976,,B-727-100,148,10.0,34.0,10.0,A,4.0,D,3.0,Approach,300.0,130.0,,No Cloud,,,,,,,False,False,N,,,,YL001,NO APPARENT DAMAGE TO AIRCRAFT. BIRD STRIKE R...,False,False,No,1,1,Small,,,REDACTED,REDACTED,FAA Form 5200-7,Pilot,1999-01-21,False,European starling,Multi engine,
637,610180,1990-09-23,9,1990,,Night,KRDU,RALEIGH-DURHAM INTL,,,,AAL,,,MD-80,583,37.0,34.0,10.0,A,4.0,D,2.0,Approach,3000.0,230.0,,No Cloud,,,,,,,False,False,N,,,,J21,NO DAMAGE,False,False,No,1,1,Medium,,,REDACTED,REDACTED,FAA Form 5200-7,Pilot,1998-12-22,False,Ducks,Multi engine,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2186,895427,2019-07-05,7,2019,19:7,Dusk,KCLT,CHARLOTTE/DOUGLAS INTL ARPT,18L,,,AAL,N915US,2068,A-321,04A,7.0,23.0,1.0,A,4.0,D,2.0,Landing Roll,0.0,,0.0,Some Cloud,,,,,,,False,False,N,,,,UNKB,The inbound flight crew reported seeing and st...,True,True,Yes,1,1,,,,REDACTED,REDACTED,FAA Form 5200-7-E,Airport Operations,2019-10-29,False,Unknown bird,Multi engine,19.0
2187,863792,2019-07-04,7,2019,7:,,KCLT,CHARLOTTE/DOUGLAS INTL ARPT,36L,,,UNK,,,UNKNOWN,,,,,,,,,,,,0.0,,,,,,,,False,False,,,,,YL001,One small bird carcass was found and retrieved...,True,True,Unknown,,1,Small,,,REDACTED,REDACTED,FAA Form 5200-7-E,Carcass Found,2019-08-27,False,European starling,Multi engine,7.0
2188,863739,2019-07-03,7,2019,7:,,KCLT,CHARLOTTE/DOUGLAS INTL ARPT,36L,,,UNK,,,UNKNOWN,,,,,,,,,,,,0.0,,,,,,,,False,False,,,,,YL001,CLT Airside Operations personnel found and ret...,True,True,Unknown,,2-10,Small,,,REDACTED,REDACTED,FAA Form 5200-7-E,Carcass Found,2019-08-26,False,European starling,Multi engine,7.0
2175,897233,2019-07-10,7,2019,19:42,Day,KCLT,CHARLOTTE/DOUGLAS INTL ARPT,18R,,,JIA,N708PS,5107,CRJ700,188,16.0,22.0,4.0,A,4.0,D,2.0,Approach,,,,No Cloud,,,,,,,False,False,N,,,,YI005,The inbound flight crew reported to CLT Air Tr...,True,True,Yes,,1,Small,,,REDACTED,REDACTED,FAA Form 5200-7-E,Airport Operations,2019-11-15,False,Barn swallow,Multi engine,19.0
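
A range filter like this one can also be written with the Series method `between()`, which includes both endpoints by default. This is a minimal sketch on made-up data (`toy_years` is a hypothetical name), shown as an alternative to combining two comparisons with `&`.

```python
import pandas as pd

# Hypothetical sample data standing in for the "INCIDENT_YEAR" column
toy_years = pd.DataFrame({'INCIDENT_YEAR': [1989, 1990, 2005, 2019, 2020]})

# between() returns True for values from 1990 through 2019, endpoints included
in_range = toy_years['INCIDENT_YEAR'].between(1990, 2019)
toy_years_filter = toy_years[in_range]

print(toy_years_filter)
```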


### Exercise 4: Join airline operator names with the full dataset

Our dataset contains a column of airline operator IDs (`OPID`). These IDs correspond with airline operator names (e.g., Delta Airlines, Military, United Airlines, etc.). We have another dataset that contains airline operator IDs (in a column named `OPID`) and the corresponding airline operator names (in a column named `OPERATOR`). The URL to this dataset is stored in the variable `op_name_file_url`. Load this dataset and use a left join to merge the operator names with the full dataset.

In [21]:
# URL to the CSV file containing unique airline operator IDs and names
op_name_file_url = 'https://github.com/NCSU-Libraries/data-viz-workshops/blob/master/Python_Open_Labs/data/FAA_Wildlife_operator_id_table.csv?raw=true'

# Load the operator ID and name dataset into a DataFrame
operator_name_table = pd.read_csv(op_name_file_url)

# Join airline operator names to the full dataset using matching operator IDs
wl_strikes_clean_join = pd.merge(wl_strikes_clean_filter, operator_name_table,
                                 how='left', on='OPID')

# Print out the columns "OPID" and "OPERATOR" from the new merged DataFrame
wl_strikes_clean_join[['OPID', 'OPERATOR']]

Unnamed: 0,OPID,OPERATOR
0,USA,1US AIRWAYS
1,CDR,CANADIAN REGIONAL AIRLINES
2,USA,1US AIRWAYS
3,USA,1US AIRWAYS
4,BLR,ATLANTIC COAST AIRLINES
...,...,...
4746,EGF,AMERICAN EAGLE AIRLINES
4747,UAL,UNITED AIRLINES
4748,TCF,SHUTTLE AMERICA
4749,BUS,BUSINESS
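
When checking a left join, it can help to see which rows found a match. The sketch below uses two small made-up tables (`toy_strikes`, `toy_operators`, the ID `XXX`, and the operator names are all assumptions for illustration). It passes two optional `pd.merge()` parameters: `indicator=True` adds a `_merge` column describing where each row came from, and `validate='many_to_one'` raises an error if an `OPID` appears more than once in the lookup table.

```python
import pandas as pd

# Hypothetical strike records; "XXX" has no entry in the lookup table
toy_strikes = pd.DataFrame({'OPID': ['UAL', 'DAL', 'XXX', 'UAL']})

# Hypothetical lookup table of operator IDs and names
toy_operators = pd.DataFrame({
    'OPID': ['UAL', 'DAL'],
    'OPERATOR': ['UNITED AIRLINES', 'DELTA AIR LINES'],
})

# A left join keeps every row from toy_strikes, matched or not
toy_join = pd.merge(toy_strikes, toy_operators, how='left', on='OPID',
                    indicator=True, validate='many_to_one')

print(toy_join)

# Rows marked "left_only" had no matching operator name
print(toy_join[toy_join['_merge'] == 'left_only'])
```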


### Exercise 5: Create a new column containing month names

Our dataset currently contains a column of integer values representing the month number in which a strike occurred (1-12). It would be helpful to have a column containing the month name (e.g., January, February, etc.). Calculate a new column labeled `MONTH_NAME` containing the name of the month in which a strike occurred.

**TIP:** There are multiple ways you could consider creating this new column (e.g., using `replace()` or `apply()`), but it might be helpful to have a way to map month numbers (1-12) to month names (January - December) (e.g., a list or dictionary).

In [22]:
# METHOD 1: Create a dictionary of month number keys (1-12) and matching month
# name values ("January" - "December") (e.g., {1: 'January', 2: 'February', ...})
# and use the replace() method on the column "INCIDENT_MONTH" containing month
# number values.

# Create a month name lookup dictionary (key = month number, value = month name)
month_lookup = {1: 'January', 2: 'February', 3: 'March', 4: 'April',
                    5: 'May', 6: 'June', 7: 'July', 8: 'August', 9: 'September',
                    10: 'October', 11: 'November', 12: 'December'}

# Create the new column using the replace() method on the column "INCIDENT_MONTH"
wl_strikes_clean_join['MONTH_NAME'] = wl_strikes_clean_join['INCIDENT_MONTH'].replace(
    month_lookup)

# Print out the columns "INCIDENT_MONTH" and "MONTH_NAME"
wl_strikes_clean_join[['INCIDENT_MONTH', 'MONTH_NAME']]      

Unnamed: 0,INCIDENT_MONTH,MONTH_NAME
0,12,December
1,12,December
2,12,December
3,12,December
4,12,December
...,...,...
4746,1,January
4747,1,January
4748,1,January
4749,1,January


In [24]:
# METHOD 2: Create a function that returns a month name based on a provided 
# month number value. Use apply on the "INCIDENT_MONTH" column to create the new
# column.

# Create a function that takes in a month number argument (1-12) and returns the
# month name based on this number (month number matches the values in the column
# "INCIDENT_MONTH")
def calc_month_name(month_num):
    # Create a month name lookup dictionary (key = month number, value = month name)
    month_lookup = {1: 'January', 2: 'February', 3: 'March', 4: 'April',
                    5: 'May', 6: 'June', 7: 'July', 8: 'August', 9: 'September',
                    10: 'October', 11: 'November', 12: 'December'}
    return month_lookup[month_num]

# Create the new column using apply() and calling calc_month_name
wl_strikes_clean_join['MONTH_NAME'] = wl_strikes_clean_join['INCIDENT_MONTH'].apply(calc_month_name)

# Print out the columns "INCIDENT_MONTH" and "MONTH_NAME"
wl_strikes_clean_join[['INCIDENT_MONTH', 'MONTH_NAME']]

Unnamed: 0,INCIDENT_MONTH,MONTH_NAME
0,12,December
1,12,December
2,12,December
3,12,December
4,12,December
...,...,...
4746,1,January
4747,1,January
4748,1,January
4749,1,January


In [25]:
# METHOD 3: Use a list comprehension to loop over the values in the column
# "INCIDENT_MONTH" and use the value as a key to access the month name from the
# month_lookup dictionary

# Create a month name lookup dictionary (key = month number, value = month name)
month_lookup = {1: 'January', 2: 'February', 3: 'March', 4: 'April',
                    5: 'May', 6: 'June', 7: 'July', 8: 'August', 9: 'September',
                    10: 'October', 11: 'November', 12: 'December'}

wl_strikes_clean_join['MONTH_NAME'] = [month_lookup[val] for val in wl_strikes_clean_join['INCIDENT_MONTH']]

# Print out the columns "INCIDENT_MONTH" and "MONTH_NAME"
wl_strikes_clean_join[['INCIDENT_MONTH', 'MONTH_NAME']] 

Unnamed: 0,INCIDENT_MONTH,MONTH_NAME
0,12,December
1,12,December
2,12,December
3,12,December
4,12,December
...,...,...
4746,1,January
4747,1,January
4748,1,January
4749,1,January


In [26]:
# METHOD 4: Use pandas datetime functionality to convert the "INCIDENT_DATE "
# column of strings (note the trailing space in the column name) to a datetime
# data type with "pd.to_datetime()" and return the month name with "dt.month_name()"

wl_strikes_clean_join['MONTH_NAME'] = pd.to_datetime(
    wl_strikes_clean_join['INCIDENT_DATE ']).dt.month_name()

# Print out the columns "INCIDENT_MONTH" and "MONTH_NAME"
wl_strikes_clean_join[['INCIDENT_MONTH', 'MONTH_NAME']]

Unnamed: 0,INCIDENT_MONTH,MONTH_NAME
0,12,December
1,12,December
2,12,December
3,12,December
4,12,December
...,...,...
4746,1,January
4747,1,January
4748,1,January
4749,1,January
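
The lookup dictionary used in Methods 1-3 does not have to be typed by hand. As a sketch of one more variation (not part of the original solutions), the standard library's `calendar.month_name` can build it, and the Series method `map()` can apply it. The DataFrame `toy_months` is made-up data, and the month names depend on your locale (English by default).

```python
import calendar

import pandas as pd

# Build {1: 'January', ..., 12: 'December'} from the standard library
month_lookup = {num: calendar.month_name[num] for num in range(1, 13)}

# Hypothetical sample data standing in for the "INCIDENT_MONTH" column
toy_months = pd.DataFrame({'INCIDENT_MONTH': [12, 1, 7]})

# map() looks up each month number in the dictionary
toy_months['MONTH_NAME'] = toy_months['INCIDENT_MONTH'].map(month_lookup)

print(toy_months)
```

One difference to keep in mind: `map()` returns a missing value for any number that is not a key in the dictionary, whereas `replace()` leaves such values unchanged.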


## Further resources

### Filled version of this notebook

[Python Open Labs Week 2 filled notebook](https://colab.research.google.com/github/NCSU-Libraries/data-viz-workshops/blob/master/Python_Open_Labs/Data_wrangling_with_Pandas/Python_Open_Labs_Week2_filled.ipynb) - a version of this notebook with all code filled in for the guided activity and exercises.

### Learning resources

- [Python Data Science Handbook](https://jakevdp.github.io/PythonDataScienceHandbook/index.html) - a free, online version of Jake VanderPlas' introduction to data science with Python, including a chapter on data manipulation with pandas.
- [Python Programming for Data Science](https://www.tomasbeuzen.com/python-programming-for-data-science/README.html) - a website providing a great overview of conducting data science with Python including pandas.

### Finding help with pandas

The [Pandas website](https://pandas.pydata.org/) and [online documentation](http://pandas.pydata.org/pandas-docs/stable/) are useful resources, and of course the indispensable [Stack Overflow has a "pandas" tag](https://stackoverflow.com/questions/tagged/pandas). There is also a (much younger, much smaller) [sister site dedicated to Data Science questions that has a "pandas" tag](https://datascience.stackexchange.com/questions/tagged/pandas) too.

## Evaluation Survey
Please take a minute to answer these questions, which help us improve future workshops.

https://go.ncsu.edu/dvs-eval

## Credits

This workshop was created by Walt Gurley and Claire Cahoon, adapted from previous workshop materials by Scott Bailey and Simon Wiles, of Stanford Libraries.