Requirements

1. Input the two CSV files
2. Union the files together
3. Convert the Date field to a Quarter Number instead. Name this field Quarter
4. Aggregate the data in the following ways:
    Median price per Quarter, Flow Card? and Class.
    Minimum price per Quarter, Flow Card? and Class.
    Maximum price per Quarter, Flow Card? and Class.
5. Create three separate flows where you have only one of the aggregated measures in each. 
    One for the minimum price.
    One for the median price.
    One for the maximum price.
6. Now pivot the data to have a column per class for each quarter and whether the passenger had a flow card or not
7. Union these flows back together

What's this you see? Economy seats are the most expensive and First Class seats are the cheapest? When you check with your manager, you realise the original data has been incorrectly classified, so you need to change the names of these columns.

8. Change the name of the following columns:
    Economy to First.
    First Class to Economy.
    Business Class to Premium.
    Premium Economy to Business.
9. Output the data

In [2]:
import pandas as pd

In [3]:
# Input the 2 csv files

flow_card_df = pd.read_csv('Preppin Data Inputs/PD 2024 Wk 1 Output Flow Card.csv')
non_flow_card_df = pd.read_csv('Preppin Data Inputs/PD 2024 Wk 1 Output Non-Flow Card.csv')

In [4]:
flow_card_df

Unnamed: 0,Date,Flight Number,From,To,Class,Price,Flow Card?,Bags Checked,Meal Type
0,22/07/2024,PA010,Tokyo,New York,Economy,2380.0,Yes,0,Egg Free
1,20/04/2024,PA002,New York,London,Economy,3490.0,Yes,1,Vegan
2,23/01/2024,PA010,Tokyo,New York,Premium Economy,825.0,Yes,1,Vegetarian
3,05/06/2024,PA006,Tokyo,London,First Class,618.0,Yes,3,Vegan
4,30/03/2024,PA004,Perth,London,First Class,446.0,Yes,1,Nut Free
...,...,...,...,...,...,...,...,...,...
1878,23/11/2024,PA005,London,Tokyo,Economy,2070.0,Yes,2,Egg Free
1879,04/11/2024,PA003,London,Perth,First Class,210.0,Yes,3,Nut Free
1880,29/04/2024,PA012,Tokyo,Perth,Economy,3490.0,Yes,0,Dairy Free
1881,26/09/2024,PA001,London,New York,First Class,207.0,Yes,2,Vegetarian


In [5]:
non_flow_card_df

Unnamed: 0,Date,Flight Number,From,To,Class,Price,Flow Card?,Bags Checked,Meal Type
0,28/09/2024,PA008,Perth,New York,Economy,1855.0,No,2,Vegetarian
1,01/10/2024,PA008,Perth,New York,Business Class,634.8,No,0,Vegetarian
2,04/03/2024,PA007,New York,Perth,Business Class,458.4,No,3,Nut Free
3,25/02/2024,PA010,Tokyo,New York,Premium Economy,1435.0,No,0,
4,29/03/2024,PA004,Perth,London,Economy,2730.0,No,2,Vegan
...,...,...,...,...,...,...,...,...,...
1890,06/03/2024,PA006,Tokyo,London,Premium Economy,940.0,No,2,Vegetarian
1891,05/05/2024,PA009,New York,Tokyo,Economy,1360.0,No,3,Nut Free
1892,14/06/2024,PA008,Perth,New York,First Class,245.0,No,1,Dairy Free
1893,16/01/2024,PA010,Tokyo,New York,Economy,2410.0,No,2,Egg Free


In [6]:
# Union the files together

union_df = pd.concat([flow_card_df,non_flow_card_df])

union_df

Unnamed: 0,Date,Flight Number,From,To,Class,Price,Flow Card?,Bags Checked,Meal Type
0,22/07/2024,PA010,Tokyo,New York,Economy,2380.0,Yes,0,Egg Free
1,20/04/2024,PA002,New York,London,Economy,3490.0,Yes,1,Vegan
2,23/01/2024,PA010,Tokyo,New York,Premium Economy,825.0,Yes,1,Vegetarian
3,05/06/2024,PA006,Tokyo,London,First Class,618.0,Yes,3,Vegan
4,30/03/2024,PA004,Perth,London,First Class,446.0,Yes,1,Nut Free
...,...,...,...,...,...,...,...,...,...
1890,06/03/2024,PA006,Tokyo,London,Premium Economy,940.0,No,2,Vegetarian
1891,05/05/2024,PA009,New York,Tokyo,Economy,1360.0,No,3,Nut Free
1892,14/06/2024,PA008,Perth,New York,First Class,245.0,No,1,Dairy Free
1893,16/01/2024,PA010,Tokyo,New York,Economy,2410.0,No,2,Egg Free
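
The union output above ends at row label 1893 rather than at the total row count, because `pd.concat` keeps each input's own row labels by default. A minimal sketch of that behaviour on two toy frames (the frames and their values are illustrative stand-ins, not the challenge data):

```python
import pandas as pd

# Toy stand-ins for the two inputs; the values are illustrative only.
flow = pd.DataFrame({"Class": ["Economy", "First Class"], "Flow Card?": ["Yes", "Yes"]})
non_flow = pd.DataFrame({"Class": ["Economy", "Business Class"], "Flow Card?": ["No", "No"]})

# Default concat keeps each frame's own row labels, so labels repeat.
stacked = pd.concat([flow, non_flow])
print(stacked.index.tolist())      # [0, 1, 0, 1]

# ignore_index=True renumbers the rows from 0 to n-1.
renumbered = pd.concat([flow, non_flow], ignore_index=True)
print(renumbered.index.tolist())   # [0, 1, 2, 3]
```

Repeated labels do not affect the groupby and pivot steps that follow, so either form should work for this challenge; renumbering mainly helps if rows are later selected by label.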


In [7]:
# Convert the Date field to a Quarter number (dates are day-first: dd/mm/yyyy)

union_df['Quarter'] = pd.to_datetime(union_df['Date'], format='%d/%m/%Y').dt.quarter

In [8]:
union_df

Unnamed: 0,Date,Flight Number,From,To,Class,Price,Flow Card?,Bags Checked,Meal Type,Quarter
0,22/07/2024,PA010,Tokyo,New York,Economy,2380.0,Yes,0,Egg Free,3
1,20/04/2024,PA002,New York,London,Economy,3490.0,Yes,1,Vegan,2
2,23/01/2024,PA010,Tokyo,New York,Premium Economy,825.0,Yes,1,Vegetarian,1
3,05/06/2024,PA006,Tokyo,London,First Class,618.0,Yes,3,Vegan,2
4,30/03/2024,PA004,Perth,London,First Class,446.0,Yes,1,Nut Free,1
...,...,...,...,...,...,...,...,...,...,...
1890,06/03/2024,PA006,Tokyo,London,Premium Economy,940.0,No,2,Vegetarian,1
1891,05/05/2024,PA009,New York,Tokyo,Economy,1360.0,No,3,Nut Free,2
1892,14/06/2024,PA008,Perth,New York,First Class,245.0,No,1,Dairy Free,2
1893,16/01/2024,PA010,Tokyo,New York,Economy,2410.0,No,2,Egg Free,1
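
The explicit `format='%d/%m/%Y'` matters because the dates are day-first. A small sketch, using three dates from the table above plus one ambiguous date, of how the quarter depends on which field is read as the month:

```python
import pandas as pd

# Three dates taken from the rows shown above.
dates = pd.Series(["22/07/2024", "20/04/2024", "23/01/2024"])
quarters = pd.to_datetime(dates, format="%d/%m/%Y").dt.quarter
print(quarters.tolist())  # [3, 2, 1]

# "04/03/2024" is 4 March when read day-first, but 3 April when read month-first.
day_first = pd.to_datetime("04/03/2024", format="%d/%m/%Y").quarter
month_first = pd.to_datetime("04/03/2024", format="%m/%d/%Y").quarter
print(day_first, month_first)  # 1 2
```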


In [9]:
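# Aggregate into 3 separate flows - Max price

max_price = union_df[['Quarter','Flow Card?','Class','Price']]

# Select the Price column explicitly; the first positional argument of max() is numeric_only, not a column name
max_price.groupby(['Quarter','Flow Card?', 'Class'])['Price'].max().reset_index()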

Unnamed: 0,Quarter,Flow Card?,Class,Price
0,1,No,Business Class,834.0
1,1,No,Economy,3455.0
2,1,No,First Class,699.0
3,1,No,Premium Economy,1702.5
4,1,Yes,Business Class,840.0
5,1,Yes,Economy,3500.0
6,1,Yes,First Class,698.0
7,1,Yes,Premium Economy,1737.5
8,2,No,Business Class,828.0
9,2,No,Economy,3480.0


In [10]:
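# Aggregate into 3 separate flows - Min price

min_price = union_df[['Quarter','Flow Card?','Class','Price']]

# Select the Price column explicitly; the first positional argument of min() is numeric_only, not a column name
min_price.groupby(['Quarter','Flow Card?', 'Class'])['Price'].min().reset_index()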

Unnamed: 0,Quarter,Flow Card?,Class,Price
0,1,No,Business Class,241.2
1,1,No,Economy,1030.0
2,1,No,First Class,204.0
3,1,No,Premium Economy,515.0
4,1,Yes,Business Class,249.6
5,1,Yes,Economy,1020.0
6,1,Yes,First Class,201.0
7,1,Yes,Premium Economy,502.5
8,2,No,Business Class,240.0
9,2,No,Economy,1000.0


In [11]:
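# Aggregate into 3 separate flows - Median price

median_price = union_df[['Quarter','Flow Card?','Class','Price']]

# Select the Price column explicitly; the first positional argument of median() is numeric_only, not a column name
median_price.groupby(['Quarter','Flow Card?', 'Class'])['Price'].median().reset_index()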

Unnamed: 0,Quarter,Flow Card?,Class,Price
0,1,No,Business Class,574.8
1,1,No,Economy,2340.0
2,1,No,First Class,438.0
3,1,No,Premium Economy,1075.0
4,1,Yes,Business Class,523.2
5,1,Yes,Economy,2325.0
6,1,Yes,First Class,447.5
7,1,Yes,Premium Economy,1160.0
8,2,No,Business Class,553.8
9,2,No,Economy,2325.0
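
The three aggregations above share the same grouping keys, so they could also be computed in one pass before being split into separate flows. A minimal sketch on toy data (the frame `df` and its prices are made up for illustration):

```python
import pandas as pd

# Toy data; the prices are illustrative, not taken from the challenge files.
df = pd.DataFrame({
    "Quarter": [1, 1, 1, 1, 2, 2],
    "Flow Card?": ["Yes", "Yes", "Yes", "No", "No", "No"],
    "Class": ["Economy"] * 6,
    "Price": [100.0, 200.0, 600.0, 50.0, 10.0, 30.0],
})

# One groupby, three aggregates: each becomes its own column.
summary = (
    df.groupby(["Quarter", "Flow Card?", "Class"])["Price"]
      .agg(["min", "median", "max"])
      .reset_index()
)
print(summary)
```

Each of the `min`, `median` and `max` columns can then be selected on its own to form one flow per measure, as requirement 5 asks.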


In [12]:
# Pivot the data to have a column per class for each quarter and flow card, then union them together.

median_price = median_price.pivot_table(index=['Quarter','Flow Card?'], columns='Class', values='Price', aggfunc='median').reset_index()

In [13]:
median_price

Class,Quarter,Flow Card?,Business Class,Economy,First Class,Premium Economy
0,1,No,574.8,2340.0,438.0,1075.0
1,1,Yes,523.2,2325.0,447.5,1160.0
2,2,No,553.8,2325.0,445.0,1205.0
3,2,Yes,517.8,2290.0,459.0,1071.25
4,3,No,490.8,2285.0,487.0,1125.0
5,3,Yes,553.8,2347.5,457.0,1090.0
6,4,No,555.6,2202.5,428.0,1062.5
7,4,Yes,522.6,2212.5,424.0,1108.75


In [14]:
max_price = max_price.pivot_table(index=['Quarter','Flow Card?'], columns='Class', values='Price', aggfunc='max').reset_index()

In [15]:
max_price

Class,Quarter,Flow Card?,Business Class,Economy,First Class,Premium Economy
0,1,No,834.0,3455.0,699.0,1702.5
1,1,Yes,840.0,3500.0,698.0,1737.5
2,2,No,828.0,3480.0,694.0,1745.0
3,2,Yes,840.0,3490.0,696.0,1737.5
4,3,No,838.8,3475.0,691.0,1747.5
5,3,Yes,840.0,3495.0,697.0,1750.0
6,4,No,835.2,3465.0,698.0,1730.0
7,4,Yes,834.0,3460.0,697.0,1722.5


In [16]:
min_price = min_price.pivot_table(index=['Quarter','Flow Card?'], columns='Class', values='Price', aggfunc='min').reset_index()

In [17]:
min_price

Class,Quarter,Flow Card?,Business Class,Economy,First Class,Premium Economy
0,1,No,241.2,1030.0,204.0,515.0
1,1,Yes,249.6,1020.0,201.0,502.5
2,2,No,240.0,1000.0,202.0,507.5
3,2,Yes,240.0,1020.0,200.0,500.0
4,3,No,240.0,1000.0,201.0,517.5
5,3,Yes,241.2,1005.0,206.0,502.5
6,4,No,240.0,1015.0,200.0,510.0
7,4,Yes,249.6,1030.0,205.0,505.0
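
The three pivot cells above differ only in `aggfunc`, so they can be written as one loop. A sketch on toy data, where the dictionary name `pivots` and the sample prices are assumptions made for illustration:

```python
import pandas as pd

# Toy data; the prices are illustrative only.
df = pd.DataFrame({
    "Quarter": [1, 1, 1, 1],
    "Flow Card?": ["Yes", "Yes", "Yes", "Yes"],
    "Class": ["Economy", "Economy", "First Class", "First Class"],
    "Price": [100.0, 300.0, 20.0, 40.0],
})

# Build one pivoted frame per aggregate, keyed by the aggregate's name.
pivots = {
    agg: df.pivot_table(
        index=["Quarter", "Flow Card?"],
        columns="Class",
        values="Price",
        aggfunc=agg,
    ).reset_index()
    for agg in ["min", "median", "max"]
}
print(pivots["median"])
```

Because `pivot_table` aggregates as it reshapes, it can be applied directly to the unaggregated rows, which is also what the cells above do.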


In [18]:
# Union the three pivoted flows back together

union_df2 = pd.concat([min_price,max_price,median_price])

In [19]:
union_df2

Class,Quarter,Flow Card?,Business Class,Economy,First Class,Premium Economy
0,1,No,241.2,1030.0,204.0,515.0
1,1,Yes,249.6,1020.0,201.0,502.5
2,2,No,240.0,1000.0,202.0,507.5
3,2,Yes,240.0,1020.0,200.0,500.0
4,3,No,240.0,1000.0,201.0,517.5
5,3,Yes,241.2,1005.0,206.0,502.5
6,4,No,240.0,1015.0,200.0,510.0
7,4,Yes,249.6,1030.0,205.0,505.0
0,1,No,834.0,3455.0,699.0,1702.5
1,1,Yes,840.0,3500.0,698.0,1737.5
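
After the union, a minimum row and a maximum row for the same Quarter and Flow Card? value can only be told apart by their prices. The requirements do not ask for it, but one optional way to keep them distinguishable is to tag each flow before stacking. In this sketch the `Measure` column name is hypothetical, and the three prices are the Quarter 1, non-Flow Card values from the outputs above:

```python
import pandas as pd

# One row per flow, using the Quarter 1 / "No" values for the Economy column shown above.
min_p = pd.DataFrame({"Quarter": [1], "Flow Card?": ["No"], "Economy": [1030.0]})
max_p = pd.DataFrame({"Quarter": [1], "Flow Card?": ["No"], "Economy": [3455.0]})
median_p = pd.DataFrame({"Quarter": [1], "Flow Card?": ["No"], "Economy": [2340.0]})

# "Measure" is a hypothetical label column, not part of the challenge output.
labelled = pd.concat(
    [
        min_p.assign(Measure="min"),
        max_p.assign(Measure="max"),
        median_p.assign(Measure="median"),
    ],
    ignore_index=True,
)
print(labelled)
```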


In [20]:
# Rename the columns to correct the misclassified classes (names as listed in requirement 8)

union_df2 = union_df2.rename(columns={'Economy':'First', 'First Class':'Economy', 'Business Class':'Premium', 'Premium Economy':'Business'})

In [21]:
union_df2

Class,Quarter,Flow Card?,Premium,First,Economy,Business
0,1,No,241.2,1030.0,204.0,515.0
1,1,Yes,249.6,1020.0,201.0,502.5
2,2,No,240.0,1000.0,202.0,507.5
3,2,Yes,240.0,1020.0,200.0,500.0
4,3,No,240.0,1000.0,201.0,517.5
5,3,Yes,241.2,1005.0,206.0,502.5
6,4,No,240.0,1015.0,200.0,510.0
7,4,Yes,249.6,1030.0,205.0,505.0
0,1,No,834.0,3455.0,699.0,1702.5
1,1,Yes,840.0,3500.0,698.0,1737.5
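
The rename swaps names that overlap (Economy becomes First while First Class becomes Economy). A dictionary passed to `rename` is applied to each original label once, so the swap does not chain. A minimal sketch using the target names from requirement 8 and the first row of the minimum-price output:

```python
import pandas as pd

# First row of the minimum-price pivot shown earlier.
wide = pd.DataFrame({
    "Business Class": [241.2],
    "Economy": [1030.0],
    "First Class": [204.0],
    "Premium Economy": [515.0],
})

# Target names as listed in requirement 8.
mapping = {
    "Economy": "First",
    "First Class": "Economy",
    "Business Class": "Premium",
    "Premium Economy": "Business",
}

# Each original label is looked up once, so "First Class" -> "Economy" is not renamed again.
renamed = wide.rename(columns=mapping)
print(renamed.columns.tolist())  # ['Premium', 'First', 'Economy', 'Business']
```

After the rename, the cheapest column (204.0) is labelled Economy and the most expensive (1030.0) is labelled First, which is the correction the manager asked for.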


In [22]:
# Output Data

union_df2.to_csv('Preppin Data Outputs/pd2024wk2_output.csv', index=False)
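
The `index=False` argument keeps the row labels out of the file. A short sketch of the difference, writing to a string instead of a file (the small frame is illustrative):

```python
import pandas as pd

# Small illustrative frame; to_csv() with no path returns the CSV text.
df = pd.DataFrame({"Quarter": [1, 1], "Flow Card?": ["No", "Yes"], "Economy": [204.0, 201.0]})

with_index = df.to_csv().splitlines()[0]
without_index = df.to_csv(index=False).splitlines()[0]

print(with_index)     # ,Quarter,Flow Card?,Economy
print(without_index)  # Quarter,Flow Card?,Economy
```

With the default `index=True`, the header row starts with an empty field for the row labels; with `index=False` the file contains only the data columns.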