# 2024: Week 9 - Prep Air Capacity

February 28, 2024

Challenge by: Jenny Martin

Prep Air would like to do some analysis on how their flights are filling up over time. They've given a small sample of flights that will be taking off next month, and the actions that customers who have booked those flights have been taking. 

### Inputs
1. A customer actions table in which a new row appears each time a customer takes an action relating to their flight booking 

![1](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEggYmA90jKbzcPL8KncgH67F9RTIffep6QaLRKO5aRqCcmJRnY8OPUjYrUYAc1ENkzgpWYCbvZdcFuigdyfu4Un-VkP9teap3JRHXDKbIdrXucF1Kei66ruUJrUwgxFr4bP-bYk2PZaBo0Srg0zvwiRjTgfOBSsH_f9ewmW-iE6igg7DHZIYfgg-_N_APh5/s1227/24W9%20in1.png)

2. A flight details table detailing how many seats are available for each class on the flight 

![2](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjDxI6LTB3G3Vd4XwtBacE8Uwy2atYEN4-fFy3OEW-6qAqNqCGfv_MBFjEJIvWnAL2WHi33XYuCJfhvviWw_xRp4z1IEXt2S-9iyUGxpUHLmvj6jPBi61X5vyhm8WoXDjiFYqGXvZWHBtxpZOmPBznyndg4pcUA9W5IXR2swR7eKRXSXti4wbYtSGH_GXAF/s706/24w9%20in2.png)

### Requirements
- Input the data
- If the customer has cancelled their flight, make sure all of that customer's rows for that flight are filtered out
- For each customer on a flight, filter the dataset to their most recent action
- Based on the Date field, create a field which shows how many seats in total have been booked as of that date for each flight and class
  - Hint: Running Sum could be useful here!
- Bring in information about the Flight Details
- Calculate the Capacity %: of the available seats on the flight for each class, what percentage have been booked so far
- For classes which are yet to be booked for a flight, ensure the Capacity % shows as 0% for these rows
  - The Date for these rows should be today's date (28/02/2024)
- Output the data

### Output

![3](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhZgT3JrE_2vvgUhiowYlwCvtcGKEqQ-OyKfcF9OvYp-oXTw-D6XEQqErMVJ6jxrlWxIeAaoagCuKT6HCAVhvWmq8OF71hXXfJxIAWzhgMM8hj5j6Eq7O_jCl812RHpT9VeV7UzOYsYf30bp24-5-LW9Wqo9KH3uERZYZyg3WUmuL-1d5z82qsz6EcyAq-Z/s1107/24w9%20out.png)

- 11 fields
  - Flight Number
  - Flight Date
  - Class
  - Total Seats booked over time
  - Capacity
  - Capacity %
  - Customer ID
  - Action
  - Date
  - Row
  - Seat
- 500 rows (501 including headers)

In [203]:
import pandas as pd

# Read the Excel file
excel_file = 'PD 2024 Week 9 Input.xlsx'

# List all sheet names
sheet_names = pd.ExcelFile(excel_file).sheet_names
print(sheet_names)

['Customer Actions', 'Flight Details']


In [204]:
# Read the 'Customer Actions' sheet
df = pd.read_excel(excel_file, sheet_name='Customer Actions')
print(df)
check_df = df

     Flight Number Flight Date  Customer ID        Action       Date  \
0            PA001  2024-03-05           20        Booked 2023-12-01   
1            PA001  2024-03-05           20     Cancelled 2023-12-24   
2            PA001  2024-03-05           72        Booked 2023-12-25   
3            PA001  2024-03-05           82        Booked 2024-01-31   
4            PA001  2024-03-05          190        Booked 2024-01-07   
...            ...         ...          ...           ...        ...   
1694         PA012  2024-03-10         9547     Cancelled 2023-12-23   
1695         PA012  2024-03-10         9956        Booked 2024-02-07   
1696         PA012  2024-03-10         9957        Booked 2024-01-29   
1697         PA012  2024-03-10         9957  Seat Changed 2024-02-07   
1698         PA012  2024-03-10         9957  Seat Changed 2024-03-03   

         Class   Row  Seat  
0        First   7.0   3.0  
1          NaN   NaN   NaN  
2        First   8.0   2.0  
3        First   5.

In [205]:
# Read the 'Flight Details' sheet
df2 = pd.read_excel(excel_file, sheet_name='Flight Details')
print(df2)

   Flight Number Flight Date            Class  Capacity
0          PA001  2024-03-05            First        32
1          PA001  2024-03-05         Business        40
2          PA001  2024-03-05  Premium Economy        64
3          PA001  2024-03-05          Economy       160
4          PA002  2024-03-30            First        32
5          PA002  2024-03-30         Business        40
6          PA002  2024-03-30  Premium Economy        64
7          PA002  2024-03-30          Economy       160
8          PA003  2024-03-06            First        32
9          PA003  2024-03-06         Business        40
10         PA003  2024-03-06  Premium Economy        64
11         PA003  2024-03-06          Economy       160
12         PA004  2024-03-17            First        32
13         PA004  2024-03-17         Business        40
14         PA004  2024-03-17  Premium Economy        64
15         PA004  2024-03-17          Economy       160
16         PA005  2024-03-22            First   

In [206]:
# Collect every (Flight Number, Customer ID) pair that has a 'Cancelled' action
cancelled_flights_df = check_df[check_df['Action'] == 'Cancelled'][['Flight Number', 'Customer ID']]
print(cancelled_flights_df)

     Flight Number  Customer ID
1            PA001           20
11           PA001          253
14           PA001          324
16           PA001          326
20           PA001          903
...            ...          ...
1677         PA012         9093
1684         PA012         9336
1688         PA012         9433
1692         PA012         9493
1694         PA012         9547

[204 rows x 2 columns]


In [207]:
# Left-merge df with cancelled_flights_df; indicator=True adds a '_merge' column
# that marks whether each row's (Flight Number, Customer ID) pair has a cancellation
merged_cancelled_df = pd.merge(df, cancelled_flights_df, on=['Flight Number', 'Customer ID'], how='left', indicator=True)

# Keep only rows with no matching cancellation (an anti-join), then drop the helper column
df = merged_cancelled_df[merged_cancelled_df['_merge'] == 'left_only'].drop(columns=['_merge'])

print(df)

     Flight Number Flight Date  Customer ID        Action       Date  \
2            PA001  2024-03-05           72        Booked 2023-12-25   
3            PA001  2024-03-05           82        Booked 2024-01-31   
4            PA001  2024-03-05          190        Booked 2024-01-07   
5            PA001  2024-03-05          190      Upgraded 2024-01-31   
6            PA001  2024-03-05          190  Seat Changed 2024-02-28   
...            ...         ...          ...           ...        ...   
1681         PA012  2024-03-10         9157  Seat Changed 2024-03-06   
1695         PA012  2024-03-10         9956        Booked 2024-02-07   
1696         PA012  2024-03-10         9957        Booked 2024-01-29   
1697         PA012  2024-03-10         9957  Seat Changed 2024-02-07   
1698         PA012  2024-03-10         9957  Seat Changed 2024-03-03   

         Class   Row  Seat  
2        First   8.0   2.0  
3        First   5.0   2.0  
4     Business  12.0   3.0  
5        First   5.
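
The cancellation filter above is an anti-join: drop every row whose (Flight Number, Customer ID) pair appears among the cancellations. As a minimal sketch of the same idea without a merge, a grouped flag can mark those pairs instead. The toy data and the `is_cancelled` helper column are made up for illustration and are not part of the challenge input.

```python
import pandas as pd

# Toy stand-in for the 'Customer Actions' sheet (values are made up)
actions = pd.DataFrame({
    "Flight Number": ["PA001", "PA001", "PA001", "PA002"],
    "Customer ID": [20, 20, 72, 20],
    "Action": ["Booked", "Cancelled", "Booked", "Booked"],
})

keys = ["Flight Number", "Customer ID"]

# Flag each row, then broadcast "any cancellation in this group" to all rows of the group
flagged = actions.assign(is_cancelled=actions["Action"].eq("Cancelled"))
has_cancelled = flagged.groupby(keys)["is_cancelled"].transform("any")

# Keep only (flight, customer) pairs with no cancellation at all
kept = actions[~has_cancelled].reset_index(drop=True)
print(kept)
```

In this toy data, customer 20 is removed from PA001 only. Their booking on PA002 survives because the flag is computed per flight and customer, not per customer.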

In [208]:
# Filter the dataset to the most recent action for each customer on a flight
df = df.loc[df.groupby(['Flight Number', 'Customer ID'])['Date'].idxmax()]
print(df)

     Flight Number Flight Date  Customer ID        Action       Date  \
2            PA001  2024-03-05           72        Booked 2023-12-25   
3            PA001  2024-03-05           82        Booked 2024-01-31   
6            PA001  2024-03-05          190  Seat Changed 2024-02-28   
8            PA001  2024-03-05          228      Upgraded 2024-01-02   
18           PA001  2024-03-05          330  Seat Changed 2024-02-13   
...            ...         ...          ...           ...        ...   
1674         PA012  2024-03-10         8779        Booked 2024-01-06   
1678         PA012  2024-03-10         9109        Booked 2023-12-14   
1681         PA012  2024-03-10         9157  Seat Changed 2024-03-06   
1695         PA012  2024-03-10         9956        Booked 2024-02-07   
1698         PA012  2024-03-10         9957  Seat Changed 2024-03-03   

                Class   Row  Seat  
2               First   8.0   2.0  
3               First   5.0   2.0  
6               First   3.0
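
To check the "most recent action" step on something small, the sketch below compares the `idxmax` approach used above with a sort-then-deduplicate alternative. The toy rows are illustrative only. The two approaches are expected to agree when each customer has a unique latest date; if a customer has two actions on the same latest date, they may pick different rows.

```python
import pandas as pd

# Toy customer actions (made-up rows in the shape of the real sheet)
actions = pd.DataFrame({
    "Flight Number": ["PA001", "PA001", "PA001", "PA001"],
    "Customer ID": [190, 190, 190, 72],
    "Action": ["Booked", "Upgraded", "Seat Changed", "Booked"],
    "Date": pd.to_datetime(["2024-01-07", "2024-01-31", "2024-02-28", "2023-12-25"]),
})
keys = ["Flight Number", "Customer ID"]

# Approach used in the notebook: index label of the latest Date within each group
latest_idxmax = actions.loc[actions.groupby(keys)["Date"].idxmax()]

# Alternative: sort by Date, then keep the last row for each (flight, customer) pair
latest_sorted = (
    actions.sort_values("Date", kind="stable")
    .drop_duplicates(subset=keys, keep="last")
)

print(latest_sorted)
```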

In [209]:
# Sort the dataframe by 'Flight Number', 'Class', and 'Date'
df = df.sort_values(by=['Flight Number', 'Class', 'Date'])

# Each remaining row is one booked seat, so a running count per flight and class
# is equivalent to a running sum of booked seats
df['Total Seats Booked'] = df.groupby(['Flight Number', 'Class'])['Customer ID'].cumcount() + 1

print(df)

     Flight Number Flight Date  Customer ID        Action       Date  \
22           PA001  2024-03-05         1027  Seat Changed 2023-12-12   
52           PA001  2024-03-05         3632  Seat Changed 2023-12-29   
42           PA001  2024-03-05         3111  Seat Changed 2023-12-31   
77           PA001  2024-03-05         4963  Seat Changed 2024-01-04   
148          PA001  2024-03-05         8518        Booked 2024-01-18   
...            ...         ...          ...           ...        ...   
1527         PA012  2024-03-10         1374        Booked 2024-02-03   
1583         PA012  2024-03-10         4721  Seat Changed 2024-02-15   
1673         PA012  2024-03-10         8720  Seat Changed 2024-02-23   
1633         PA012  2024-03-10         6692  Seat Changed 2024-02-29   
1635         PA012  2024-03-10         6781  Seat Changed 2024-02-29   

                Class   Row  Seat  Total Seats Booked  
22           Business  16.0   4.0                   1  
52           Business  
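
The running total above increases row by row, so two bookings made on the same date for the same flight and class get different totals. The sketch below shows that behaviour on toy data. It also shows one possible "as of that date" variant in which same-date rows share the end-of-day total. Whether the challenge expects that variant is an assumption, and the `Running Count` and `Booked As Of Date` column names are hypothetical.

```python
import pandas as pd

# Toy bookings, one row per booked seat (made-up values)
bookings = pd.DataFrame({
    "Flight Number": ["PA001"] * 4,
    "Class": ["First", "First", "Business", "First"],
    "Customer ID": [1, 2, 3, 4],
    "Date": pd.to_datetime(["2024-01-05", "2024-01-02", "2024-01-03", "2024-01-05"]),
})
group_keys = ["Flight Number", "Class"]

# Order rows so the running count follows booking dates within each flight and class
bookings = bookings.sort_values(group_keys + ["Date"]).reset_index(drop=True)

# Row-level running count, as in the notebook cell above
bookings["Running Count"] = bookings.groupby(group_keys).cumcount() + 1

# Variant: every row on a given date carries that date's final total
bookings["Booked As Of Date"] = (
    bookings.groupby(group_keys + ["Date"])["Running Count"].transform("max")
)

print(bookings)
```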

In [210]:
# Merge df and df2 on 'Flight Number', 'Class' and 'Flight Date' to bring in Capacity
merged_df = pd.merge(df, df2, on=['Flight Number', 'Class', 'Flight Date'], how='left')

print(merged_df)

    Flight Number Flight Date  Customer ID        Action       Date  \
0           PA001  2024-03-05         1027  Seat Changed 2023-12-12   
1           PA001  2024-03-05         3632  Seat Changed 2023-12-29   
2           PA001  2024-03-05         3111  Seat Changed 2023-12-31   
3           PA001  2024-03-05         4963  Seat Changed 2024-01-04   
4           PA001  2024-03-05         8518        Booked 2024-01-18   
..            ...         ...          ...           ...        ...   
486         PA012  2024-03-10         1374        Booked 2024-02-03   
487         PA012  2024-03-10         4721  Seat Changed 2024-02-15   
488         PA012  2024-03-10         8720  Seat Changed 2024-02-23   
489         PA012  2024-03-10         6692  Seat Changed 2024-02-29   
490         PA012  2024-03-10         6781  Seat Changed 2024-02-29   

               Class   Row  Seat  Total Seats Booked  Capacity  
0           Business  16.0   4.0                   1        40  
1           Busin

In [211]:
# Calculate the Capacity % for each row
merged_df['Capacity %'] = merged_df['Total Seats Booked'] / merged_df['Capacity']
print(merged_df)

    Flight Number Flight Date  Customer ID        Action       Date  \
0           PA001  2024-03-05         1027  Seat Changed 2023-12-12   
1           PA001  2024-03-05         3632  Seat Changed 2023-12-29   
2           PA001  2024-03-05         3111  Seat Changed 2023-12-31   
3           PA001  2024-03-05         4963  Seat Changed 2024-01-04   
4           PA001  2024-03-05         8518        Booked 2024-01-18   
..            ...         ...          ...           ...        ...   
486         PA012  2024-03-10         1374        Booked 2024-02-03   
487         PA012  2024-03-10         4721  Seat Changed 2024-02-15   
488         PA012  2024-03-10         8720  Seat Changed 2024-02-23   
489         PA012  2024-03-10         6692  Seat Changed 2024-02-29   
490         PA012  2024-03-10         6781  Seat Changed 2024-02-29   

               Class   Row  Seat  Total Seats Booked  Capacity  Capacity %  
0           Business  16.0   4.0                   1        40    0.02

In [212]:
# Perform a right join with df2 to include all flight numbers and classes
result_df = pd.merge(merged_df, df2, on=['Flight Number', 'Class', 'Flight Date', 'Capacity'], how='right')

# Fill NaN values in 'Total Seats Booked' and 'Capacity %' with 0
result_df['Total Seats Booked'] = result_df['Total Seats Booked'].fillna(0)
result_df['Capacity %'] = result_df['Capacity %'].fillna(0)

print(result_df)

    Flight Number Flight Date  Customer ID        Action       Date  \
0           PA001  2024-03-05       7019.0        Booked 2023-12-03   
1           PA001  2024-03-05       9481.0      Upgraded 2023-12-19   
2           PA001  2024-03-05       7961.0        Booked 2023-12-22   
3           PA001  2024-03-05       9695.0        Booked 2023-12-23   
4           PA001  2024-03-05         72.0        Booked 2023-12-25   
..            ...         ...          ...           ...        ...   
495         PA012  2024-03-10       4721.0  Seat Changed 2024-02-15   
496         PA012  2024-03-10       8720.0  Seat Changed 2024-02-23   
497         PA012  2024-03-10       6692.0  Seat Changed 2024-02-29   
498         PA012  2024-03-10       6781.0  Seat Changed 2024-02-29   
499         PA012  2024-03-10          NaN           NaN        NaT   

               Class   Row  Seat  Total Seats Booked  Capacity  Capacity %  
0              First   3.0   3.0                 1.0        32    0.03

In [213]:
output = result_df
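
The printed result above still shows `NaT` in `Date` for the class with no bookings, and the running-total column is not yet named as in the required output. The sketch below shows one way to finish those steps on toy data. The date 28/02/2024 and the output column name come from the requirements; the toy rows are made up.

```python
import pandas as pd

# Toy booked rows (after the running total) and toy flight details
booked = pd.DataFrame({
    "Flight Number": ["PA001"],
    "Flight Date": pd.to_datetime(["2024-03-05"]),
    "Class": ["First"],
    "Customer ID": [72],
    "Date": pd.to_datetime(["2023-12-25"]),
    "Total Seats Booked": [1],
})
details = pd.DataFrame({
    "Flight Number": ["PA001", "PA001"],
    "Flight Date": pd.to_datetime(["2024-03-05", "2024-03-05"]),
    "Class": ["First", "Business"],
    "Capacity": [32, 40],
})

# Right join keeps every flight/class from the details, even those with no bookings
result = booked.merge(details, on=["Flight Number", "Flight Date", "Class"], how="right")

# Unbooked classes: zero seats, 0% capacity, and Date set to 28/02/2024
result["Total Seats Booked"] = result["Total Seats Booked"].fillna(0).astype(int)
result["Capacity %"] = result["Total Seats Booked"] / result["Capacity"]
result["Date"] = result["Date"].fillna(pd.Timestamp("2024-02-28"))

# Match the field name listed in the Output section
result = result.rename(columns={"Total Seats Booked": "Total Seats booked over time"})

print(result)
```

From here, `DataFrame.to_csv` or `DataFrame.to_excel` could write the result to a file to complete the "Output the data" step; the file name is left to the reader.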