2024: Week 4 - Unpopular Seats
January 24, 2024
 Created by: Carl Allchin

Last week you needed to use a Join technique to pair the flight data with the sales targets. This week you'll be using Joins again but this time in a different way. 

When using Joins, there are two things you need to set up:

- Join Condition: the logic that matches rows from one data set to similar rows in the other
- Join Type: determines which rows you bring back based on the Join Condition

This challenge tests using Join Types to return the data you require for the output.
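
To make the difference between the two settings concrete, here is a minimal pandas sketch on made-up data. The `plan` and `bookings` frames and their values are illustrative assumptions, not the challenge inputs. In `merge`, the `on` argument plays the role of the Join Condition and `how` is the Join Type.

```python
import pandas as pd

# Hypothetical toy data: a three-seat plan and bookings for two of those seats.
plan = pd.DataFrame({"Seat": [1, 2, 3], "Row": [1, 1, 1]})
bookings = pd.DataFrame({"Seat": [1, 3], "Row": [1, 1], "Bookings": [5, 2]})

# Same Join Condition (match on Seat and Row), two different Join Types.
inner = plan.merge(bookings, on=["Seat", "Row"], how="inner")
left = plan.merge(bookings, on=["Seat", "Row"], how="left")

print(len(inner))  # only seats that appear in both tables
print(len(left))   # every seat in the plan, booked or not
print(left[left["Bookings"].isna()])  # the seat with no bookings
```

The inner join drops the seat that has no bookings, while the left join keeps it with a missing `Bookings` value, which is the behaviour this week's challenge relies on.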

This week we are trying to understand which seats aren't chosen on our planes as we're thinking of applying fees for customers to choose their seat when booking. 

Input

Seat allocation per customer. We have had three downloads sent to us: one for Flow Card customers and two for customers without a Flow Card. They're available here.

![input](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjgZ6DfzaA8gyIbr80Kg_I2y9drhX5BjEnCLBIECm2Vphrs-_ccY3r5kbBM9Jgq5eCPMGjSCwK_1Lyz8bAVtMLttG4gI3BhkLRj2oRzQsT2WHc5LffmvOt_0iuZxGhdoF1bXJYKnhtmKV1MSpJZdFsww3dR73hRs2ZWEN_YImhaQPTm2MhngxuSNdMJjL8p/s758/Screenshot%202024-01-22%20at%2011.19.19.png)


The seating plan for our planes

![input](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgnM3ZWHxG9OKjYpft24xviflJs5f3xf3u2hFNHfsIPr4tGWixLrt8TJFv350TRb85rfAMgWmFeQnCcJU6NSYVrx3cNY0AZnQpI2aV1M-FPiCl2thDlMp1ljhM51Y4EjXE9l5Zw1tOrtS-t8YA8LCctvkBsLvIS-UDGOc4VuXDaQknUKKZptygBZ69DVxfY/s614/Screenshot%202024-01-22%20at%2011.19.30.png)

Requirements
- Input the Excel workbook containing the four worksheets of data
- Union the Flow Card and Non-Flow Card data sets together
- Create a data field to show whether the seat was booked by someone with the Flow Card or not
  - Call this field 'Flow Card?'
- Aggregate the Seat Bookings to count how many bookings there are for:
  - Each Seat
  - In each Row
  - In each Class
  - For Flow and Non-Flow Card holders
- Join on the Seating Plan data to ensure you have a data set for every seat on the plane, even if it hasn't been booked
- Only return the records for the seats that haven't been booked
- Output the data set showing which seats, rows and classes have NOT been booked


Output


![output](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjFLBOh5liCZcy12GMMhpwVEcArqQ8yIlz6fLPe2Hl_c7TMN3sqzdCcaIbOZEdKIl1aG0GwLcRStJv4kfYNfE7_Gv7BjN3taq1BR2VdEDY6xg_CX96ccc7gCZo2AAxu269UiyLRjTgx9DrybejYn1o0Q0NEXT8hlcT2FHu2uxkwpFY9ctZ5EzWlFcTDvU7U/s292/Screenshot%202024-01-22%20at%2014.41.59.png)


3 data fields:
- Class
- Seat
- Row

In [41]:
import pandas as pd

# Open the Excel workbook (an ExcelFile object, not a DataFrame)
excel_file = 'PD 2024 WK 4 Input.xlsx'
xls = pd.ExcelFile(excel_file)

# List all sheet names
sheet_names = xls.sheet_names
print(sheet_names)

['Flow Card', 'Non_flow Card', 'Non_flow Card2', 'Seat Plan']


In [42]:
# Read the first three sheets into dataframes
df_flow_card = pd.read_excel(excel_file, sheet_name='Flow Card')
df_non_flow_card1 = pd.read_excel(excel_file, sheet_name='Non_flow Card')
df_non_flow_card2 = pd.read_excel(excel_file, sheet_name='Non_flow Card2')

# Add a column to indicate Flow Card status
df_flow_card['Flow Card?'] = 'Yes'
df_non_flow_card1['Flow Card?'] = 'No'
df_non_flow_card2['Flow Card?'] = 'No'

# Concatenate the dataframes
df_union = pd.concat([df_flow_card, df_non_flow_card1, df_non_flow_card2], ignore_index=True)
print(df_union)

       CustomerID  Seat  Row Class Flow Card?
0             654     2    2    FC        Yes
1             466     4    5    FC        Yes
2              27     4    3    FC        Yes
3             519     1    4    FC        Yes
4             933     2    3    FC        Yes
...           ...   ...  ...   ...        ...
29208        3005     7   35     E         No
29209        4685     4   27     E         No
29210        2512     8   38     E         No
29211        3863     4   37     E         No
29212        2872     8   32     E         No

[29213 rows x 5 columns]


In [43]:
# Group by Seat, Row, Class, and Flow Card? and count the bookings (rows) in each group
df_grouped = df_union.groupby(['Seat', 'Row', 'Class', 'Flow Card?']).size().reset_index(name='Total Customers')
print(df_grouped)

     Seat  Row Class Flow Card?  Total Customers
0       1    1    FC         No               38
1       1    1    FC        Yes               20
2       1    2    FC         No               39
3       1    2    FC        Yes               20
4       1    3    FC         No               38
..    ...  ...   ...        ...              ...
573    10   40     E        Yes               37
574    10   41     E         No               68
575    10   41     E        Yes               32
576    10   42     E         No               90
577    10   42     E        Yes               29

[578 rows x 5 columns]


In [44]:
# Read the seating plan sheet (one row per seat on the plane)
df_seat_plan = pd.read_excel(excel_file, sheet_name='Seat Plan')
print(df_seat_plan)

    Class  Seat  Row
0      FC     1    1
1      FC     2    1
2      FC     3    1
3      FC     4    1
4      FC     1    2
..    ...   ...  ...
291     E     6   42
292     E     7   42
293     E     8   42
294     E     9   42
295     E    10   42

[296 rows x 3 columns]


In [45]:
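# Left join the seat plan to the booking counts so every seat is kept;
# indicator=True adds a '_merge' column showing whether each seat found a match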
df_merged = pd.merge(df_seat_plan, df_grouped, on=['Seat', 'Row', 'Class'], how='left', indicator=True)
print(df_merged)



    Class  Seat  Row Flow Card?  Total Customers _merge
0      FC     1    1         No             38.0   both
1      FC     1    1        Yes             20.0   both
2      FC     2    1         No             40.0   both
3      FC     2    1        Yes             19.0   both
4      FC     3    1         No             51.0   both
..    ...   ...  ...        ...              ...    ...
580     E     8   42        Yes             51.0   both
581     E     9   42         No             81.0   both
582     E     9   42        Yes             36.0   both
583     E    10   42         No             90.0   both
584     E    10   42        Yes             29.0   both

[585 rows x 6 columns]


In [46]:
# Keep only the seats with no match in df_grouped, then drop the helper columns
output = (
    df_merged[df_merged['_merge'] == 'left_only']
    .drop(columns=['_merge', 'Total Customers', 'Flow Card?'])
    .reset_index(drop=True)
)

print(output)

  Class  Seat  Row
0     E     5   28
1     E     6   32
2     E     5   36
3     E     6   37
4     E     5   40
5     E     6   40
6     E     5   41
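
As a cross-check, the same kind of list can be reached without aggregating first, since only the distinct booked seats matter for this question. The following is a minimal sketch on made-up data: the `seat_plan` and `bookings` frames are hypothetical stand-ins for `df_seat_plan` and `df_union`, and their values are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for the seat plan and the unioned bookings.
seat_plan = pd.DataFrame({
    "Class": ["E", "E", "E", "E"],
    "Seat": [5, 6, 5, 6],
    "Row": [28, 28, 29, 29],
})
bookings = pd.DataFrame({
    "CustomerID": [101, 102, 103],
    "Seat": [6, 5, 5],
    "Row": [28, 29, 29],
    "Class": ["E", "E", "E"],
})

keys = ["Class", "Seat", "Row"]

# Keep one row per booked seat, then left join the plan against it.
booked = bookings[keys].drop_duplicates()
merged = seat_plan.merge(booked, on=keys, how="left", indicator=True)

# Rows marked 'left_only' exist in the plan but have no booking.
unbooked = merged.loc[merged["_merge"] == "left_only", keys].reset_index(drop=True)
print(unbooked)
```

Running the same pattern against the real `df_seat_plan` and `df_union` should, under these assumptions, give the same seats as the output above, which makes it a cheap way to confirm the join logic.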
