# More transformations in Pandas

In this lecture we will learn three transformations that come up frequently in data work:

- Joins
- Pivot Table
- Group by

Part of the power of pandas is that the aggregation functions used with **Group By** can be custom. We will solve a problem that requires one.

In addition, we will learn a bit about `MultiIndex`. We need to understand it because pandas often creates one automatically, for example when a pivot table or group-by transformation uses several keys.
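
As a quick illustration of that last point, here is a minimal sketch on made-up data (the dataframe and its column names are hypothetical, not part of the lecture datasets). Grouping by two keys yields a result indexed by a two-level `MultiIndex`:

```python
import pandas as pd

# Toy data; the column names are invented for this illustration only.
flights = pd.DataFrame({
    "origin": ["SFO", "SFO", "LAX", "LAX"],
    "carrier": ["UA", "AA", "UA", "UA"],
    "passengers": [100, 50, 70, 30],
})

# Grouping by two keys produces a Series indexed by a two-level MultiIndex.
by_route = flights.groupby(["origin", "carrier"])["passengers"].sum()

print(type(by_route.index).__name__)   # MultiIndex
print(list(by_route.index.names))      # ['origin', 'carrier']
print(by_route.loc[("LAX", "UA")])     # 100
```

Each row of the result is addressed by a tuple, one value per index level, which is the pattern used throughout the rest of this lecture.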



In [1]:
import pandas as pd

In [3]:
codes_df = pd.read_csv('airport-codes.csv')
del codes_df['continent']
del codes_df['iata_code']
codes_df = codes_df[codes_df['type'] != 'closed']
codes_df.head(5)

Unnamed: 0,ident,type,name,elevation_ft,iso_country,iso_region,municipality,gps_code,local_code,coordinates
0,00A,heliport,Total Rf Heliport,11.0,US,US-PA,Bensalem,00A,00A,"-74.93360137939453, 40.07080078125"
1,00AA,small_airport,Aero B Ranch Airport,3435.0,US,US-KS,Leoti,00AA,00AA,"-101.473911, 38.704022"
2,00AK,small_airport,Lowell Field,450.0,US,US-AK,Anchor Point,00AK,00AK,"-151.695999146, 59.94919968"
3,00AL,small_airport,Epps Airpark,820.0,US,US-AL,Harvest,00AL,00AL,"-86.77030181884766, 34.86479949951172"
5,00AS,small_airport,Fulton Airport,1100.0,US,US-OK,Alex,00AS,00AS,"-97.8180194, 34.9428028"


In [5]:
stats_df = pd.read_csv('us_air_transport_stats.csv')
stats_df.head(5)

Unnamed: 0,PASSENGERS,FREIGHT,MAIL,DISTANCE,UNIQUE_CARRIER,AIRLINE_ID,CARRIER,CARRIER_NAME,ORIGIN_AIRPORT_ID,ORIGIN,DEST_AIRPORT_ID,DEST,YEAR,MONTH
0,0,0,0,156,0MQ,21253.0,0MQ,"Multi-Aero, Inc. d/b/a Air Choice One",13930.0,ORD,11288.0,DEC,2018,9
1,0,0,0,110,0MQ,21253.0,0MQ,"Multi-Aero, Inc. d/b/a Air Choice One",15016.0,STL,11288.0,DEC,2018,9
2,0,0,0,949,0WQ,21352.0,0WQ,Avjet Corporation,10279.0,AMA,10800.0,BUR,2018,9
3,0,0,0,725,0WQ,21352.0,0WQ,Avjet Corporation,10372.0,ASE,10800.0,BUR,2018,9
4,0,0,0,18,0WQ,21352.0,0WQ,Avjet Corporation,10800.0,BUR,12892.0,LAX,2018,9


## Join and Pivot

Let's compare the number of passengers flown by each major airline out of the large airports in the US. Here are the steps we need to perform:

1. Identify all large airports in the US from the airport codes dataframe.
2. Inner join the stats dataframe with a dataframe containing only the large airports.
3. Pivot so that each origin airport is a row and each airline's passenger total is in its own column.
4. Keep only the large airlines.
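
The four steps can be sketched end to end on a tiny, made-up dataset before we run them on the real files. The airports, carriers, and the 1,000-passenger threshold below are hypothetical and chosen only to keep the example small:

```python
import pandas as pd

# Hypothetical miniature versions of the two dataframes used in this lecture.
airports = pd.DataFrame({
    "local_code": ["SFO", "LAX", "XYZ"],
    "type": ["large_airport", "large_airport", "small_airport"],
    "iso_country": ["US", "US", "US"],
})
stats = pd.DataFrame({
    "ORIGIN": ["SFO", "SFO", "LAX", "XYZ"],
    "CARRIER_NAME": ["Big Air", "Tiny Air", "Big Air", "Tiny Air"],
    "PASSENGERS": [900, 5, 700, 3],
})

# Step 1: keep only large US airports.
large = airports[(airports["type"] == "large_airport") & (airports["iso_country"] == "US")]

# Step 2: the inner join drops stats rows whose origin is not a large airport.
joined = pd.merge(stats, large, left_on="ORIGIN", right_on="local_code", how="inner")

# Step 3: one row per origin, one column per airline.
pivot = pd.pivot_table(joined, index="ORIGIN", columns="CARRIER_NAME",
                       values="PASSENGERS", aggfunc="sum", fill_value=0)

# Step 4: keep airlines whose total exceeds the toy threshold.
big = pivot.loc[:, pivot.sum(axis=0) > 1000]
print(big)
```

Only `Big Air` survives the last step, because `Tiny Air` carries just 5 passengers out of large airports in this toy data.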

In [77]:
codes_df['type'].unique()

array(['heliport', 'small_airport', 'seaplane_base', 'balloonport',
       'medium_airport', 'large_airport'], dtype=object)

In [78]:
large_airports_df = codes_df[(codes_df['type'] == 'large_airport') & (codes_df['iso_country'] == 'US')][['iso_region', 'name', 'municipality', 'local_code']]
large_airports_df.head(10)

Unnamed: 0,iso_region,name,municipality,local_code
25914,US-NM,Albuquerque International Sunport,Albuquerque,ABQ
25933,US-MD,Andrews Air Force Base,Camp Springs,ADW
25946,US-TX,Fort Worth Alliance Airport,Fort Worth,AFW
25950,US-GA,Augusta Regional At Bush Field,Augusta,AGS
25983,US-TX,Rick Husband Amarillo International Airport,Amarillo,AMA
26035,US-GA,Hartsfield Jackson Atlanta International Airport,Atlanta,ATL
26044,US-TX,Austin Bergstrom International Airport,Austin,AUS
26048,US-NC,Asheville Regional Airport,Asheville,AVL
26072,US-CA,Beale Air Force Base,Marysville,BAB
26073,US-LA,Barksdale Air Force Base,Bossier City,BAD


In [14]:
large_airport_stats_df = pd.merge(stats_df, large_airports_df, left_on='ORIGIN', right_on='local_code', how='inner')
large_airport_stats_df.head(10)

Unnamed: 0,PASSENGERS,FREIGHT,MAIL,DISTANCE,UNIQUE_CARRIER,AIRLINE_ID,CARRIER,CARRIER_NAME,ORIGIN_AIRPORT_ID,ORIGIN,DEST_AIRPORT_ID,DEST,YEAR,MONTH,iso_region,name,municipality,local_code
0,0,0,0,156,0MQ,21253.0,0MQ,"Multi-Aero, Inc. d/b/a Air Choice One",13930.0,ORD,11288.0,DEC,2018,9,US-IL,Chicago O'Hare International Airport,Chicago,ORD
1,0,1309768,0,2846,KAQ,20370.0,KAQ,Kalitta Air LLC,13930.0,ORD,10299.0,ANC,2018,9,US-IL,Chicago O'Hare International Airport,Chicago,ORD
2,0,748398,0,606,KAQ,20370.0,KAQ,Kalitta Air LLC,13930.0,ORD,10397.0,ATL,2018,9,US-IL,Chicago O'Hare International Airport,Chicago,ORD
3,0,1476087,0,264,KAQ,20370.0,KAQ,Kalitta Air LLC,13930.0,ORD,11193.0,CVG,2018,9,US-IL,Chicago O'Hare International Airport,Chicago,ORD
4,0,397114,0,740,KAQ,20370.0,KAQ,Kalitta Air LLC,13930.0,ORD,12478.0,JFK,2018,9,US-IL,Chicago O'Hare International Airport,Chicago,ORD
5,0,196912,0,1197,KAQ,20370.0,KAQ,Kalitta Air LLC,13930.0,ORD,13303.0,MIA,2018,9,US-IL,Chicago O'Hare International Airport,Chicago,ORD
6,0,0,0,1846,KAQ,20370.0,KAQ,Kalitta Air LLC,13930.0,ORD,14771.0,SFO,2018,9,US-IL,Chicago O'Hare International Airport,Chicago,ORD
7,0,1532654,0,2846,KD,21629.0,KD,Western Global,13930.0,ORD,10299.0,ANC,2018,9,US-IL,Chicago O'Hare International Airport,Chicago,ORD
8,0,0,0,0,KD,21629.0,KD,Western Global,13930.0,ORD,13930.0,ORD,2018,9,US-IL,Chicago O'Hare International Airport,Chicago,ORD
9,0,1594787,0,286,KD,21629.0,KD,Western Global,13930.0,ORD,14730.0,SDF,2018,9,US-IL,Chicago O'Hare International Airport,Chicago,ORD
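
An inner join silently drops every stats row whose `ORIGIN` has no match among the large airports. If you want to see what would be dropped, one option is a left join with `indicator=True`, sketched here on made-up rows (the data is hypothetical):

```python
import pandas as pd

# Toy stand-ins for stats_df and large_airports_df.
stats = pd.DataFrame({"ORIGIN": ["ORD", "DEC", "ORD"], "PASSENGERS": [10, 0, 25]})
large = pd.DataFrame({"local_code": ["ORD", "ATL"], "name": ["O'Hare", "Atlanta"]})

# indicator=True adds a "_merge" column saying where each row came from.
checked = pd.merge(stats, large, left_on="ORIGIN", right_on="local_code",
                   how="left", indicator=True)

# Rows marked "left_only" have no matching large airport.
unmatched = checked[checked["_merge"] == "left_only"]
print(unmatched["ORIGIN"].tolist())   # ['DEC']
```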


In [151]:
# Pass the aggregation by name; recent pandas versions warn when given the builtin sum.
pivot_df = pd.pivot_table(large_airport_stats_df, index=['ORIGIN', 'iso_region', 'name', 'municipality'], columns=['CARRIER_NAME'], values='PASSENGERS', aggfunc='sum', fill_value=0)
pivot_df.head(10)

,,,CARRIER_NAME,40-Mile Air,ABX Air Inc,"ADVANCED AIR, LLC",Aerodynamics Inc. d/b/a SkyValue d/b/a SkyValue Airways,Air Transport International,Air Wisconsin Airlines Corp,Alaska Airlines Inc.,Alaska Central Express,Allegiant Air,Aloha Air Cargo,...,USA Jet Airlines Inc.,Ultimate JetCharters LLC dba Ultimate Air Shuttle,United Air Lines Inc.,United Parcel Service,Via Airlines d/b/a Charter Air Transport,Virgin America,Warbelow,Western Global,Wright Air Service,XTRA Airways
ORIGIN,iso_region,name,municipality,,,,,,,,,,,,,,,,,,,,,
ABQ,US-NM,Albuquerque International Sunport,Albuquerque,0,0,0,0,0,0,33682,0,28698,0,...,0,0,108961,0,0,0,0,0,0,15
ADW,US-MD,Andrews Air Force Base,Camp Springs,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
AFW,US-TX,Fort Worth Alliance Airport,Fort Worth,0,0,0,29,0,0,0,0,804,0,...,0,0,108,0,0,0,0,0,0,0
AGS,US-GA,Augusta Regional At Bush Field,Augusta,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,389
AMA,US-TX,Rick Husband Amarillo International Airport,Amarillo,0,0,13,1,0,0,0,0,0,0,...,0,0,0,0,910,0,0,0,0,0
ANC,US-AK,Ted Stevens Anchorage International Airport,Anchorage,4,0,0,0,0,0,1707632,2569,0,0,...,0,0,135994,0,0,0,0,0,0,0
ATL,US-GA,Hartsfield Jackson Atlanta International Airport,Atlanta,0,0,6,70,0,0,101508,0,528,0,...,0,0,546920,0,0,0,0,0,0,391
AUS,US-TX,Austin Bergstrom International Airport,Austin,0,0,0,0,24,0,207375,0,139113,0,...,0,0,919824,0,11487,17456,0,0,0,0
AVL,US-NC,Asheville Regional Airport,Asheville,0,0,0,0,0,0,0,0,212253,0,...,0,0,23273,0,0,0,0,0,0,0
BAB,US-CA,Beale Air Force Base,Marysville,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [131]:
len(pivot_df.columns)

99

In [132]:
# Keep airlines whose total across all large airports exceeds 10 million passengers.
imp_airlines = pivot_df.columns[pivot_df.sum(axis=0) > 10 * 1000000]
imp_airlines

Index(['Alaska Airlines Inc.', 'American Airlines Inc.',
       'Delta Air Lines Inc.', 'Endeavor Air Inc.', 'Envoy Air',
       'Frontier Airlines Inc.', 'JetBlue Airways', 'Mesa Airlines Inc.',
       'PSA Airlines Inc.', 'Republic Airline', 'SkyWest Airlines Inc.',
       'Southwest Airlines Co.', 'Spirit Air Lines', 'United Air Lines Inc.'],
      dtype='object', name='CARRIER_NAME')

In [133]:
airlines_compare_df = pivot_df[imp_airlines]
airlines_compare_df.head(10)

,,,CARRIER_NAME,Alaska Airlines Inc.,American Airlines Inc.,Delta Air Lines Inc.,Endeavor Air Inc.,Envoy Air,Frontier Airlines Inc.,JetBlue Airways,Mesa Airlines Inc.,PSA Airlines Inc.,Republic Airline,SkyWest Airlines Inc.,Southwest Airlines Co.,Spirit Air Lines,United Air Lines Inc.
ORIGIN,iso_region,name,municipality,,,,,,,,,,,,,,
ABQ,US-NM,Albuquerque International Sunport,Albuquerque,33682,307425,191373,0,7301,54361,42836,178666,0,25454,146675,1282912,0,108961
ADW,US-MD,Andrews Air Force Base,Camp Springs,0,0,0,0,0,0,0,0,0,0,0,0,0,0
AFW,US-TX,Fort Worth Alliance Airport,Fort Worth,0,0,0,0,0,0,0,0,0,0,0,0,0,108
AGS,US-GA,Augusta Regional At Bush Field,Augusta,0,0,97318,54472,970,0,0,5,86523,1032,32411,0,0,0
AMA,US-TX,Rick Husband Amarillo International Airport,Amarillo,0,291,0,0,55964,0,0,77298,0,0,541,164082,0,0
ANC,US-AK,Ted Stevens Anchorage International Airport,Anchorage,1707632,37168,317423,0,0,0,24814,0,0,0,0,0,0,135994
ATL,US-GA,Hartsfield Jackson Atlanta International Airport,Atlanta,101508,1270635,33239588,1472242,40113,530481,367303,154716,59166,293761,1058641,4890261,1119869,546920
AUS,US-TX,Austin Bergstrom International Airport,Austin,207375,1321041,974791,21174,4055,524069,282721,87094,2314,65473,174239,2729373,0,919824
AVL,US-NC,Asheville Regional Airport,Asheville,0,180,57854,38273,0,0,0,0,117489,0,87677,0,12316,23273
BAB,US-CA,Beale Air Force Base,Marysville,0,0,0,0,22,0,0,0,0,0,0,0,0,0


In [134]:
airlines_compare_df.index

MultiIndex(levels=[['ABQ', 'ADW', 'AFW', 'AGS', 'AMA', 'ANC', 'ATL', 'AUS', 'AVL', 'BAB', 'BAD', 'BDL', 'BFI', 'BGR', 'BHM', 'BIL', 'BLV', 'BMI', 'BNA', 'BOI', 'BOS', 'BUF', 'BWI', 'CAE', 'CHA', 'CHS', 'CID', 'CLE', 'CLT', 'CMH', 'CPR', 'CRP', 'CRW', 'CVG', 'DAB', 'DAL', 'DAY', 'DBQ', 'DCA', 'DEN', 'DFW', 'DLH', 'DOV', 'DSM', 'DTW', 'DYS', 'EDW', 'ERI', 'EWR', 'FAI', 'FFO', 'FLL', 'FSM', 'FTW', 'FWA', 'GEG', 'GPT', 'GRB', 'GSB', 'GSO', 'GSP', 'GUS', 'HIB', 'HNL', 'HOU', 'HSV', 'HTS', 'IAD', 'IAH', 'ICT', 'IND', 'JAN', 'JAX', 'JFK', 'JLN', 'LAS', 'LAX', 'LBB', 'LCK', 'LEX', 'LFI', 'LFT', 'LGA', 'LIT', 'MBS', 'MCI', 'MCO', 'MDW', 'MEM', 'MGE', 'MGM', 'MIA', 'MKE', 'MLI', 'MLU', 'MOB', 'MSN', 'MSP', 'MSY', 'MUO', 'OAK', 'OKC', 'OMA', 'ONT', 'ORD', 'ORF', 'PAM', 'PBI', 'PDX', 'PHF', 'PHL', 'PHX', 'PIA', 'PIT', 'PWM', 'RDU', 'RFD', 'RIC', 'RNO', 'ROA', 'ROC', 'RST', 'RSW', 'SAN', 'SAT', 'SAV', 'SBN', 'SDF', 'SEA', 'SFB', 'SFO', 'SGF', 'SJC', 'SLC', 'SMF', 'SNA', 'SPI', 'SPS', 'SRQ', 'SSC', 

In [152]:
airlines_compare_df.loc['SFO', 'US-CA']

,CARRIER_NAME,Alaska Airlines Inc.,American Airlines Inc.,Delta Air Lines Inc.,Endeavor Air Inc.,Envoy Air,Frontier Airlines Inc.,JetBlue Airways,Mesa Airlines Inc.,PSA Airlines Inc.,Republic Airline,SkyWest Airlines Inc.,Southwest Airlines Co.,Spirit Air Lines,United Air Lines Inc.
name,municipality,,,,,,,,,,,,,,
San Francisco International Airport,San Francisco,2533361,1881172,2005324,0,0,160889,787705,0,0,0,1825789,1825005,0,8721866
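
The selection above works because `.loc` accepts values for the leading index levels and drops those levels from the result. Here is a minimal sketch on a made-up three-level index (the passenger numbers are invented). It also shows `xs()`, which can select on an inner level by name:

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [("LAX", "US-CA", "Los Angeles"),
     ("SFO", "US-CA", "San Francisco"),
     ("ORD", "US-IL", "Chicago")],
    names=["ORIGIN", "iso_region", "municipality"],
)
# Sorting the index keeps lookups on a MultiIndex fast and warning-free.
df = pd.DataFrame({"passengers": [30, 20, 10]}, index=idx).sort_index()

# Values for the leading levels go in a tuple; the matched levels are dropped.
sfo = df.loc[("SFO", "US-CA"), :]

# xs() selects on an inner level without supplying the outer ones.
california = df.xs("US-CA", level="iso_region")

print(sfo)
print(california)
```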


In [154]:
final_airline_compare = airlines_compare_df.reorder_levels([1,3,0,2], axis=0)
final_airline_compare.head(5)

,,,CARRIER_NAME,Alaska Airlines Inc.,American Airlines Inc.,Delta Air Lines Inc.,Endeavor Air Inc.,Envoy Air,Frontier Airlines Inc.,JetBlue Airways,Mesa Airlines Inc.,PSA Airlines Inc.,Republic Airline,SkyWest Airlines Inc.,Southwest Airlines Co.,Spirit Air Lines,United Air Lines Inc.
iso_region,municipality,ORIGIN,name,,,,,,,,,,,,,,
US-NM,Albuquerque,ABQ,Albuquerque International Sunport,33682,307425,191373,0,7301,54361,42836,178666,0,25454,146675,1282912,0,108961
US-MD,Camp Springs,ADW,Andrews Air Force Base,0,0,0,0,0,0,0,0,0,0,0,0,0,0
US-TX,Fort Worth,AFW,Fort Worth Alliance Airport,0,0,0,0,0,0,0,0,0,0,0,0,0,108
US-GA,Augusta,AGS,Augusta Regional At Bush Field,0,0,97318,54472,970,0,0,5,86523,1032,32411,0,0,0
US-TX,Amarillo,AMA,Rick Husband Amarillo International Airport,0,291,0,0,55964,0,0,77298,0,0,541,164082,0,0


In [156]:
final_airline_compare.sort_index(level=[0, 1, 2], inplace=True)
final_airline_compare.head(5)

,,,CARRIER_NAME,Alaska Airlines Inc.,American Airlines Inc.,Delta Air Lines Inc.,Endeavor Air Inc.,Envoy Air,Frontier Airlines Inc.,JetBlue Airways,Mesa Airlines Inc.,PSA Airlines Inc.,Republic Airline,SkyWest Airlines Inc.,Southwest Airlines Co.,Spirit Air Lines,United Air Lines Inc.
iso_region,municipality,ORIGIN,name,,,,,,,,,,,,,,
US-AK,Anchorage,ANC,Ted Stevens Anchorage International Airport,1707632,37168,317423,0,0,0,24814,0,0,0,0,0,0,135994
US-AK,Fairbanks,FAI,Fairbanks International Airport,389930,0,66385,0,0,0,0,0,0,0,0,0,0,11096
US-AL,Birmingham,BHM,Birmingham-Shuttlesworth International Airport,0,136,392466,27377,58441,30553,0,102574,158490,15083,54236,440786,0,139
US-AL,Huntsville,HSV,Huntsville International Carl T Jones Field,0,0,227371,11674,8391,8915,0,77395,78984,30626,24495,0,0,0
US-AL,Mobile,MOB,Mobile Regional Airport,0,0,38394,17116,56,0,90,49057,49151,0,63620,0,0,0


In [158]:
final_airline_compare.reset_index(level=[2,3], inplace=True)
final_airline_compare.head(5)

,CARRIER_NAME,ORIGIN,name,Alaska Airlines Inc.,American Airlines Inc.,Delta Air Lines Inc.,Endeavor Air Inc.,Envoy Air,Frontier Airlines Inc.,JetBlue Airways,Mesa Airlines Inc.,PSA Airlines Inc.,Republic Airline,SkyWest Airlines Inc.,Southwest Airlines Co.,Spirit Air Lines,United Air Lines Inc.
iso_region,municipality,,,,,,,,,,,,,,,,
US-AK,Anchorage,ANC,Ted Stevens Anchorage International Airport,1707632,37168,317423,0,0,0,24814,0,0,0,0,0,0,135994
US-AK,Fairbanks,FAI,Fairbanks International Airport,389930,0,66385,0,0,0,0,0,0,0,0,0,0,11096
US-AL,Birmingham,BHM,Birmingham-Shuttlesworth International Airport,0,136,392466,27377,58441,30553,0,102574,158490,15083,54236,440786,0,139
US-AL,Huntsville,HSV,Huntsville International Carl T Jones Field,0,0,227371,11674,8391,8915,0,77395,78984,30626,24495,0,0,0
US-AL,Mobile,MOB,Mobile Regional Airport,0,0,38394,17116,56,0,90,49057,49151,0,63620,0,0,0
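
The last three cells reorder the index levels, sort by them, and move some levels back into ordinary columns. The same sequence on a small made-up two-level index (hypothetical numbers) makes each step easier to follow:

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [("SFO", "US-CA"), ("ANC", "US-AK"), ("LAX", "US-CA")],
    names=["ORIGIN", "iso_region"],
)
df = pd.DataFrame({"passengers": [20, 5, 30]}, index=idx)

# Put the region first, then sort so rows of the same region are adjacent.
by_region = df.reorder_levels(["iso_region", "ORIGIN"]).sort_index()

# Move the airport code out of the index and back into an ordinary column.
flat = by_region.reset_index(level="ORIGIN")
print(flat)
```

Levels can be referred to by name, as here, or by position, as in the cells above.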


In [164]:
final_airline_compare.columns.name = None
final_airline_compare

iso_region,municipality,ORIGIN,name,Alaska Airlines Inc.,American Airlines Inc.,Delta Air Lines Inc.,Endeavor Air Inc.,Envoy Air,Frontier Airlines Inc.,JetBlue Airways,Mesa Airlines Inc.,PSA Airlines Inc.,Republic Airline,SkyWest Airlines Inc.,Southwest Airlines Co.,Spirit Air Lines,United Air Lines Inc.
US-AK,Anchorage,ANC,Ted Stevens Anchorage International Airport,1707632,37168,317423,0,0,0,24814,0,0,0,0,0,0,135994
US-AK,Fairbanks,FAI,Fairbanks International Airport,389930,0,66385,0,0,0,0,0,0,0,0,0,0,11096
US-AL,Birmingham,BHM,Birmingham-Shuttlesworth International Airport,0,136,392466,27377,58441,30553,0,102574,158490,15083,54236,440786,0,139
US-AL,Huntsville,HSV,Huntsville International Carl T Jones Field,0,0,227371,11674,8391,8915,0,77395,78984,30626,24495,0,0,0
US-AL,Mobile,MOB,Mobile Regional Airport,0,0,38394,17116,56,0,90,49057,49151,0,63620,0,0,0
US-AL,Montgomery,MGM,Montgomery Regional (Dannelly Field) Airport,0,0,309,35608,31549,0,0,7443,24360,0,58460,161,0,331
US-AR,Fort Smith,FSM,Fort Smith Regional Airport,0,0,0,10524,20391,0,0,29793,0,0,14101,0,0,0
US-AR,Little Rock,LIT,Bill & Hillary Clinton National Airport/Adams ...,0,2609,255105,1990,214986,22605,0,38044,63397,13154,43319,260645,0,0
US-AZ,Phoenix,PHX,Phoenix Sky Harbor International Airport,417780,7626835,1195311,0,50,341282,95466,1237966,0,0,649781,7502139,102855,1046681
US-AZ,Tucson,TUS,Tucson International Airport,58151,463881,138626,0,0,4766,0,154563,0,0,349718,482046,0,47189


In [162]:
final_airline_compare.loc['US-CA']

municipality,ORIGIN,name,Alaska Airlines Inc.,American Airlines Inc.,Delta Air Lines Inc.,Endeavor Air Inc.,Envoy Air,Frontier Airlines Inc.,JetBlue Airways,Mesa Airlines Inc.,PSA Airlines Inc.,Republic Airline,SkyWest Airlines Inc.,Southwest Airlines Co.,Spirit Air Lines,United Air Lines Inc.
Edwards,EDW,Edwards Air Force Base,0,0,0,0,19,0,0,0,0,0,0,0,0,0
Fairfield,SUU,Travis Air Force Base,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Lompoc,VBG,Vandenberg Air Force Base,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Los Angeles,LAX,Los Angeles International Airport,2416490,5825125,5223407,0,0,179168,891197,0,0,0,1644517,4834977,1305133,4602766
Marysville,BAB,Beale Air Force Base,0,0,0,0,22,0,0,0,0,0,0,0,0,0
Oakland,OAK,Metropolitan Oakland International Airport,352758,132511,146234,0,0,0,188995,24821,0,0,30611,4516438,404116,2493
Ontario,ONT,Ontario International Airport,153162,375494,30931,0,0,120847,16060,42672,0,0,149958,1356857,0,127096
Sacramento,SMF,Sacramento International Airport,274500,560065,355610,0,0,21862,163098,5249,0,0,312504,3187500,0,441881
San Diego,SAN,San Diego International Airport,1105456,1338206,1220542,0,98,274772,253409,0,0,0,561375,4524041,338731,1489952
San Francisco,SFO,San Francisco International Airport,2533361,1881172,2005324,0,0,160889,787705,0,0,0,1825789,1825005,0,8721866


In [138]:
# Passing a list to values gives the columns two levels: the measure, then the carrier.
pivot_df2 = pd.pivot_table(large_airport_stats_df, index=['ORIGIN', 'iso_region', 'name', 'municipality'], columns=['CARRIER_NAME'], values=['PASSENGERS', 'FREIGHT'], aggfunc='sum', fill_value=0)
pivot_df2.head(10)

,,,,FREIGHT,FREIGHT,FREIGHT,FREIGHT,FREIGHT,FREIGHT,FREIGHT,FREIGHT,FREIGHT,FREIGHT,...,PASSENGERS,PASSENGERS,PASSENGERS,PASSENGERS,PASSENGERS,PASSENGERS,PASSENGERS,PASSENGERS,PASSENGERS,PASSENGERS
,,,CARRIER_NAME,40-Mile Air,ABX Air Inc,"ADVANCED AIR, LLC",Aerodynamics Inc. d/b/a SkyValue d/b/a SkyValue Airways,Air Transport International,Air Wisconsin Airlines Corp,Alaska Airlines Inc.,Alaska Central Express,Allegiant Air,Aloha Air Cargo,...,USA Jet Airlines Inc.,Ultimate JetCharters LLC dba Ultimate Air Shuttle,United Air Lines Inc.,United Parcel Service,Via Airlines d/b/a Charter Air Transport,Virgin America,Warbelow,Western Global,Wright Air Service,XTRA Airways
ORIGIN,iso_region,name,municipality,,,,,,,,,,,,,,,,,,,,,
ABQ,US-NM,Albuquerque International Sunport,Albuquerque,0,0,0,0,0,0,2,0,0,0,...,0,0,108961,0,0,0,0,0,0,15
ADW,US-MD,Andrews Air Force Base,Camp Springs,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
AFW,US-TX,Fort Worth Alliance Airport,Fort Worth,0,0,0,0,0,0,0,0,0,0,...,0,0,108,0,0,0,0,0,0,0
AGS,US-GA,Augusta Regional At Bush Field,Augusta,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,389
AMA,US-TX,Rick Husband Amarillo International Airport,Amarillo,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,910,0,0,0,0,0
ANC,US-AK,Ted Stevens Anchorage International Airport,Anchorage,100,0,0,0,106979,0,37119375,7162128,0,0,...,0,0,135994,0,0,0,0,0,0,0
ATL,US-GA,Hartsfield Jackson Atlanta International Airport,Atlanta,0,10142627,0,0,0,0,22228,0,0,0,...,0,0,546920,0,0,0,0,0,0,391
AUS,US-TX,Austin Bergstrom International Airport,Austin,0,11625,0,0,3191193,0,23947,0,0,0,...,0,0,919824,0,11487,17456,0,0,0,0
AVL,US-NC,Asheville Regional Airport,Asheville,0,0,0,0,0,0,0,0,0,0,...,0,0,23273,0,0,0,0,0,0,0
BAB,US-CA,Beale Air Force Base,Marysville,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [139]:
pivot_df2.columns

MultiIndex(levels=[['FREIGHT', 'PASSENGERS'], ['40-Mile Air', 'ABX Air Inc', 'ADVANCED AIR, LLC', 'Aerodynamics Inc. d/b/a SkyValue d/b/a SkyValue Airways', 'Air Transport International', 'Air Wisconsin Airlines Corp', 'Alaska Airlines Inc.', 'Alaska Central Express', 'Allegiant Air', 'Aloha Air Cargo', 'American Airlines Inc.', 'Amerijet International', 'Ameristar Air Cargo', 'Asia Pacific', 'Atlas Air Inc.', 'Avjet Corporation', 'Bemidji Airlines', 'Bering Air Inc.', 'Boutique Air', 'CFM Inc d/b/a Contour Airlines d/b/a One Jet Shuttle', 'Cape Air', 'Caribbean Sun Airlines, Inc. d/b/a World Atlantic Airlines', 'Commutair Aka Champlain Enterprises, Inc.', 'Compass Airlines', 'Corvus Airlines, Inc d/b/a Era Aviation d/b/a Ravn Alaska', 'Delta Air Lines Inc.', 'Delux Public Charter LLC', 'Eastern Airlines f/k/a Dynamic Airways, LLC', 'Elite Airways LLC', 'Empire Airlines Inc.', 'Endeavor Air Inc.', 'Envoy Air', 'ExpressJet Airlines Inc.', 'Federal Express Corporation', 'Frontier Airline

In [140]:
pivot_df2['FREIGHT', 'United Parcel Service']

ORIGIN  iso_region  name                                                          municipality             
ABQ     US-NM       Albuquerque International Sunport                             Albuquerque                    39087941
ADW     US-MD       Andrews Air Force Base                                        Camp Springs                          0
AFW     US-TX       Fort Worth Alliance Airport                                   Fort Worth                            0
AGS     US-GA       Augusta Regional At Bush Field                                Augusta                               0
AMA     US-TX       Rick Husband Amarillo International Airport                   Amarillo                              0
ANC     US-AK       Ted Stevens Anchorage International Airport                   Anchorage                     523298938
ATL     US-GA       Hartsfield Jackson Atlanta International Airport              Atlanta                        47303207
AUS     US-TX       Austin Bergstrom I
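
When `values` is a list, the columns of the pivot table become a two-level `MultiIndex`: the measure on the outer level and the carrier on the inner one. A minimal sketch with invented carriers and numbers shows the two most common selections:

```python
import pandas as pd

# Toy rows; carrier names and figures are made up.
rows = pd.DataFrame({
    "ORIGIN": ["ANC", "ANC", "ATL"],
    "CARRIER_NAME": ["Cargo Co", "Pax Co", "Pax Co"],
    "PASSENGERS": [0, 40, 90],
    "FREIGHT": [500, 10, 20],
})
wide = pd.pivot_table(rows, index="ORIGIN", columns="CARRIER_NAME",
                      values=["PASSENGERS", "FREIGHT"], aggfunc="sum", fill_value=0)

# Outer level only: a DataFrame with one column per carrier.
freight_only = wide["FREIGHT"]

# Both levels as a tuple: a single Series indexed by ORIGIN.
cargo_freight = wide[("FREIGHT", "Cargo Co")]

print(freight_only)
print(cargo_freight)
```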

## Group By

Let's solve a couple of problems to help us better understand the `DataFrame`'s `groupby()` method.

### Problem 1

Find the passenger count and freight load for each route among the large airports in the US. Then find the busiest routes in the US for passengers and for freight.

In [176]:
large_route_stats = pd.merge(large_airport_stats_df, large_airports_df, left_on='DEST', right_on='local_code')
large_route_stats.head(10)

Unnamed: 0,PASSENGERS,FREIGHT,MAIL,DISTANCE,UNIQUE_CARRIER,AIRLINE_ID,CARRIER,CARRIER_NAME,ORIGIN_AIRPORT_ID,ORIGIN,...,YEAR,MONTH,iso_region_x,name_x,municipality_x,local_code_x,iso_region_y,name_y,municipality_y,local_code_y
0,0,1309768,0,2846,KAQ,20370.0,KAQ,Kalitta Air LLC,13930.0,ORD,...,2018,9,US-IL,Chicago O'Hare International Airport,Chicago,ORD,US-AK,Ted Stevens Anchorage International Airport,Anchorage,ANC
1,0,1532654,0,2846,KD,21629.0,KD,Western Global,13930.0,ORD,...,2018,9,US-IL,Chicago O'Hare International Airport,Chicago,ORD,US-AK,Ted Stevens Anchorage International Airport,Anchorage,ANC
2,0,1011342,0,2846,PO,20100.0,PO,Polar Air Cargo Airways,13930.0,ORD,...,2018,9,US-IL,Chicago O'Hare International Airport,Chicago,ORD,US-AK,Ted Stevens Anchorage International Airport,Anchorage,ANC
3,0,2687090,0,2846,5Y,20007.0,5Y,Atlas Air Inc.,13930.0,ORD,...,2018,1,US-IL,Chicago O'Hare International Airport,Chicago,ORD,US-AK,Ted Stevens Anchorage International Airport,Anchorage,ANC
4,0,1494827,0,2846,5Y,20007.0,5Y,Atlas Air Inc.,13930.0,ORD,...,2018,2,US-IL,Chicago O'Hare International Airport,Chicago,ORD,US-AK,Ted Stevens Anchorage International Airport,Anchorage,ANC
5,0,854986,0,2846,KAQ,20370.0,KAQ,Kalitta Air LLC,13930.0,ORD,...,2018,2,US-IL,Chicago O'Hare International Airport,Chicago,ORD,US-AK,Ted Stevens Anchorage International Airport,Anchorage,ANC
6,0,1335407,0,2846,KD,21629.0,KD,Western Global,13930.0,ORD,...,2018,5,US-IL,Chicago O'Hare International Airport,Chicago,ORD,US-AK,Ted Stevens Anchorage International Airport,Anchorage,ANC
7,0,1074212,0,2846,5Y,20007.0,5Y,Atlas Air Inc.,13930.0,ORD,...,2018,7,US-IL,Chicago O'Hare International Airport,Chicago,ORD,US-AK,Ted Stevens Anchorage International Airport,Anchorage,ANC
8,0,2113243,0,2846,KAQ,20370.0,KAQ,Kalitta Air LLC,13930.0,ORD,...,2018,8,US-IL,Chicago O'Hare International Airport,Chicago,ORD,US-AK,Ted Stevens Anchorage International Airport,Anchorage,ANC
9,0,1830741,0,2846,KD,21629.0,KD,Western Global,13930.0,ORD,...,2018,8,US-IL,Chicago O'Hare International Airport,Chicago,ORD,US-AK,Ted Stevens Anchorage International Airport,Anchorage,ANC


In [178]:
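# Select several columns with a list (double brackets); recent pandas versions reject a bare tuple of names here.
summary_stats = large_route_stats.groupby(['ORIGIN', 'municipality_x', 'DEST', 'municipality_y'])[['PASSENGERS', 'FREIGHT']].sum()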
summary_stats.head(10)

ORIGIN,municipality_x,DEST,municipality_y,PASSENGERS,FREIGHT
ABQ,Albuquerque,ABQ,Albuquerque,0,1655
ABQ,Albuquerque,AFW,Fort Worth,227,0
ABQ,Albuquerque,AMA,Amarillo,22,171
ABQ,Albuquerque,ANC,Anchorage,0,0
ABQ,Albuquerque,ATL,Atlanta,132649,31513
ABQ,Albuquerque,AUS,Austin,45961,6360
ABQ,Albuquerque,BDL,Hartford,64,0
ABQ,Albuquerque,BHM,Birmingham,34,0
ABQ,Albuquerque,BNA,Nashville,1908,9669
ABQ,Albuquerque,BOI,Boise,306,33


In [181]:
summary_stats.sort_values(by='PASSENGERS', ascending=False)

ORIGIN,municipality_x,DEST,municipality_y,PASSENGERS,FREIGHT
LAX,Los Angeles,SFO,San Francisco,2000078,91402022
SFO,San Francisco,LAX,Los Angeles,1984416,36348631
LAX,Los Angeles,JFK,New York,1722418,27424062
JFK,New York,LAX,Los Angeles,1706587,24032840
LGA,New York,ORD,Chicago,1601599,308486
ORD,Chicago,LGA,New York,1597225,129267
LAS,Las Vegas,LAX,Los Angeles,1489330,1072182
LAX,Los Angeles,LAS,Las Vegas,1479529,2378981
ATL,Atlanta,MCO,Orlando,1467143,1884409
MCO,Orlando,ATL,Atlanta,1456746,2065948
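
The cell above ranks routes by passengers. The freight ranking follows the same pattern. A minimal sketch on made-up routes (hypothetical numbers) shows two ways to pick out the top route once the totals are grouped:

```python
import pandas as pd

# Toy route records; two rows share the LAX-SFO route and get summed.
routes = pd.DataFrame({
    "ORIGIN": ["LAX", "LAX", "SFO", "ANC"],
    "DEST": ["SFO", "SFO", "LAX", "ORD"],
    "PASSENGERS": [100, 150, 200, 0],
    "FREIGHT": [10, 20, 5, 900],
})
totals = routes.groupby(["ORIGIN", "DEST"])[["PASSENGERS", "FREIGHT"]].sum()

# nlargest keeps the n rows with the largest values in the given column.
top_freight = totals.nlargest(1, "FREIGHT")

# idxmax returns the index label (here a tuple) of the largest value.
top_pax = totals["PASSENGERS"].idxmax()

print(top_freight)
print(top_pax)   # ('LAX', 'SFO')
```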


### Problem 2

List all routes that are serviced by at least 5 different airlines. A route counts as serviced by an airline if more than 1,000 passengers fly that airline on that route each month of the year.

In [206]:
def num_carriers_service(group_df):
    serviced_rows = group_df[group_df['PASSENGERS'] > 1000]
    unique_carriers = list(serviced_rows['CARRIER_NAME'].unique())
    return pd.Series([len(unique_carriers), str(unique_carriers)], index=['num_carriers', 'carriers'])

In [207]:
route_carriers = large_route_stats.groupby(['ORIGIN', 'municipality_x', 'DEST', 'municipality_y']).apply(num_carriers_service)
route_carriers.head(10)

ORIGIN,municipality_x,DEST,municipality_y,num_carriers,carriers
ABQ,Albuquerque,ABQ,Albuquerque,0,[]
ABQ,Albuquerque,AFW,Fort Worth,0,[]
ABQ,Albuquerque,AMA,Amarillo,0,[]
ABQ,Albuquerque,ANC,Anchorage,0,[]
ABQ,Albuquerque,ATL,Atlanta,1,['Delta Air Lines Inc.']
ABQ,Albuquerque,AUS,Austin,3,"['Allegiant Air', 'Southwest Airlines Co.', 'F..."
ABQ,Albuquerque,BDL,Hartford,0,[]
ABQ,Albuquerque,BHM,Birmingham,0,[]
ABQ,Albuquerque,BNA,Nashville,0,[]
ABQ,Albuquerque,BOI,Boise,0,[]
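
When only the count is needed, a custom function is not strictly required. One alternative, sketched here on made-up rows (carrier names are hypothetical), is to filter first and then use the built-in `nunique()` aggregation:

```python
import pandas as pd

toy = pd.DataFrame({
    "ORIGIN": ["ABQ", "ABQ", "ABQ", "ABQ"],
    "DEST": ["DEN", "DEN", "DEN", "BOI"],
    "CARRIER_NAME": ["X Air", "Y Air", "X Air", "X Air"],
    "PASSENGERS": [2000, 1500, 3000, 10],
})

# Keep only the records above the threshold, then count distinct carriers per route.
busy = toy[toy["PASSENGERS"] > 1000]
num_carriers = busy.groupby(["ORIGIN", "DEST"])["CARRIER_NAME"].nunique()
print(num_carriers)
```

One difference from the custom function: routes with no record above the threshold (ABQ to BOI here) are absent from the result instead of appearing with a count of 0.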


In [209]:
route_carriers[route_carriers['num_carriers'] >= 5]

ORIGIN,municipality_x,DEST,municipality_y,num_carriers,carriers
ABQ,Albuquerque,DEN,Denver,6,"['Trans States Airlines', 'Frontier Airlines I..."
ABQ,Albuquerque,IAH,Houston,5,"['ExpressJet Airlines Inc.', 'United Air Lines..."
ABQ,Albuquerque,ORD,Chicago,6,"['Republic Airline', 'GoJet Airlines LLC d/b/a..."
ATL,Atlanta,DCA,Washington,5,"['PSA Airlines Inc.', 'Envoy Air', 'Republic A..."
ATL,Atlanta,DEN,Denver,7,"['Republic Airline', 'SkyWest Airlines Inc.', ..."
ATL,Atlanta,IAH,Houston,7,"['Republic Airline', 'Endeavor Air Inc.', 'Sky..."
ATL,Atlanta,LAX,Los Angeles,5,"['Frontier Airlines Inc.', 'Spirit Air Lines',..."
ATL,Atlanta,LGA,New York,5,"['Frontier Airlines Inc.', 'American Airlines ..."
ATL,Atlanta,MCO,Orlando,5,"['Delta Air Lines Inc.', 'JetBlue Airways', 'F..."
ATL,Atlanta,ORD,Chicago,8,"['Republic Airline', 'GoJet Airlines LLC d/b/a..."
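
Note that `num_carriers_service` counts a carrier as soon as any single record exceeds 1,000 passengers. A stricter reading of the problem, where the carrier must exceed the threshold in every month, could be sketched as follows. The helper `carriers_every_month`, its `months_required` parameter, and the toy data are hypothetical, and the sketch assumes the `MONTH` column identifies the reporting month:

```python
import pandas as pd

def carriers_every_month(group_df, months_required):
    """Return carriers exceeding 1,000 passengers in at least months_required months."""
    # Total passengers per carrier per month on this route.
    monthly = group_df.groupby(["CARRIER_NAME", "MONTH"])["PASSENGERS"].sum()
    # Number of months in which each carrier exceeded the threshold.
    busy_months = (monthly > 1000).groupby(level="CARRIER_NAME").sum()
    return sorted(busy_months[busy_months >= months_required].index)

# Toy data covering two months on a single route.
toy = pd.DataFrame({
    "ORIGIN": ["ATL"] * 5,
    "DEST": ["ORD"] * 5,
    "CARRIER_NAME": ["A", "A", "B", "B", "C"],
    "MONTH": [1, 2, 1, 2, 1],
    "PASSENGERS": [1500, 1200, 2000, 900, 5000],
})

# Iterating over a groupby with several keys yields (key_tuple, sub_dataframe) pairs.
result = {
    route: carriers_every_month(group, months_required=2)
    for route, group in toy.groupby(["ORIGIN", "DEST"])
}
print(result)   # {('ATL', 'ORD'): ['A']}
```

On a full year of data, `months_required` would be 12. Carrier B fails because one of its months falls below the threshold, and carrier C fails because it reports only one month.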
