## Testing Search Problem Algorithm
In this notebook, we test different scenarios using the custom classes and solver algorithms defined in `search_problem.py`.
We set up the search problem by varying the following parameters:
* `n_runway`
* `max_delay`
* duration of the disruption (by changing the parameter `disruption_dur`, or `resume_hour` and/or `resume_min`)
* number of flights to consider (by changing the dataframe passed to the `reschedule_problem` class)

By default, the `reschedule_problem` class defines the problem with the following parameters:
* `n_runway` = 1
* `disruption_dur` = 60 (minutes)
* duration of a time slot = 5 (minutes)
* `max_delay`, the maximum delay permitted before a flight has to be diverted = 120 (minutes)
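
The outputs below are consistent with the airport resuming service at the earliest scheduled arrival in the subset plus the disruption duration (22:30 + 60 min = 23:30 in Test 1), after which each 5-minute slot can take up to `n_runway` landings. The hypothetical helper `slot_grid` sketches that grid under this assumption; it is not part of `search_problem.py`.

```python
from datetime import datetime, timedelta


def slot_grid(first_scheduled, disruption_dur=60, slot_min=5, n_runway=1, n_slots=6):
    """Hypothetical helper: (slot start, runway index) pairs available after the disruption.

    Assumes the airport resumes `disruption_dur` minutes after the earliest
    scheduled arrival and then offers one landing per runway every `slot_min` minutes.
    """
    resume = first_scheduled + timedelta(minutes=disruption_dur)
    step = timedelta(minutes=slot_min)
    return [(resume + i * step, runway) for i in range(n_slots) for runway in range(n_runway)]


# Test 1 setting: first flight scheduled at 22:30, two runways, default 60-minute disruption
grid = slot_grid(datetime(2023, 12, 21, 22, 30), n_runway=2, n_slots=3)
for start, runway in grid:
    print(start.strftime("%H:%M"), "runway", runway)
```

With two runways, each slot start appears twice, which matches the two flights placed at 23:30 in the Test 1 output.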


In [1]:
# import the packages
import numpy as np
import pandas as pd
from search_problem import *



In [2]:
df = pd.read_csv("./data/21DEC2023_AMS_processed.csv", parse_dates = ['time_sch','time_act'])
df = df.sort_values("time_sch").reset_index(drop = True)
df.head()

| | time_sch | time_act | code | dest | stat | orig | pass_load | time_diff |
|---|---|---|---|---|---|---|---|---|
| 0 | 2023-12-21 00:10:00 | 2023-12-21 01:19:00 | HV 6888 Transavia | Amsterdam | BAGGAGE HANDLED | Reykjavik (KEF) | 383 | 4140.0 |
| 1 | 2023-12-21 00:25:00 | 2023-12-21 00:25:00 | HV 5336 Transavia | Amsterdam | BAGGAGE HANDLED | Sharm El Sheikh (SSH) | 364 | 0.0 |
| 2 | 2023-12-21 01:00:00 | 2023-12-21 01:00:00 | HV 6676 Transavia | Amsterdam | BAGGAGE HANDLED | Tenerife (TFS) | 319 | 0.0 |
| 3 | 2023-12-21 05:45:00 | 2023-12-21 05:45:00 | KL 590 KLM | Amsterdam | BAGGAGE HANDLED | Accra (ACC) | 238 | 0.0 |
| 4 | 2023-12-21 05:50:00 | 2023-12-21 06:44:00 | KL 810 KLM | Amsterdam | BAGGAGE HANDLED | Jakarta (CGK) | 327 | 3240.0 |


## Test 1: Uniform Cost Search
In this first case, we consider:
* the last 20 flights in the dataframe
* `n_runway` = 2
* a disruption of 60 minutes

We set up the problem using the `reschedule_problem` class, which defines the path cost as the delay of the flight scheduled at the child node, and solve it with `best_first_graph_search`.
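
To make the idea concrete, here is a minimal, self-contained sketch of uniform-cost search over slot assignments. It is not the implementation in `search_problem.py`: it assumes a single runway, 5-minute slots starting at the resume time, that a flight cannot land before its scheduled time, and a path cost equal to the sum of delays in minutes. The name `uniform_cost_schedule` is hypothetical; the flight data are the six flights of Test 3.

```python
import heapq
from datetime import datetime, timedelta


def uniform_cost_schedule(sched, resume, slot=timedelta(minutes=5)):
    """Uniform-cost search over one-runway slot assignments (illustrative sketch).

    sched  -- dict mapping flight code -> scheduled arrival (datetime)
    resume -- datetime at which the runway reopens
    Returns (total delay in minutes, [(flight, new_time), ...]).
    """
    start = (0, frozenset(sched))          # (index of next slot, flights still to land)
    frontier = [(0.0, 0, start, ())]       # (path cost, tie-breaker, state, plan)
    explored = set()
    tie = 1
    while frontier:
        cost, _, state, plan = heapq.heappop(frontier)   # cheapest path first
        if state in explored:
            continue
        explored.add(state)
        idx, remaining = state
        if not remaining:                  # goal test on expansion keeps UCS optimal
            return cost, list(plan)
        t = resume + idx * slot
        eligible = sorted(f for f in remaining if sched[f] <= t)
        if not eligible:                   # nobody can land yet: leave this slot empty
            heapq.heappush(frontier, (cost, tie, (idx + 1, remaining), plan))
            tie += 1
            continue
        for f in eligible:
            delay = (t - sched[f]).total_seconds() / 60
            child = (idx + 1, remaining - {f})
            heapq.heappush(frontier, (cost + delay, tie, child, plan + ((f, t),)))
            tie += 1
    return None


sched = {
    "HV 6114 Transavia": datetime(2023, 12, 21, 23, 25),
    "KL 1608 KLM": datetime(2023, 12, 21, 23, 25),
    "HV 5806 Transavia": datetime(2023, 12, 21, 23, 30),
    "HV 6902 Transavia": datetime(2023, 12, 21, 23, 40),
    "HV 5666 Transavia": datetime(2023, 12, 21, 23, 50),
    "HV 5218 Transavia": datetime(2023, 12, 21, 23, 55),
}
total_delay, plan = uniform_cost_schedule(sched, resume=datetime(2023, 12, 21, 23, 40))
print(f"utility = {-total_delay}")
for code, t in plan:
    print(code, t.strftime("%H:%M"))
```

On these six flights the sketch should report 90 minutes of total delay (utility -90), the same total as the uniform-cost result in Test 3. The explored set is keyed on (next slot, remaining flights) because the cost of finishing the schedule depends only on that pair.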

In [3]:
# subset the tail of the dataframe
df_subset = df.tail(20).reset_index(drop = True)
df_subset.head()

| | time_sch | time_act | code | dest | stat | orig | pass_load | time_diff |
|---|---|---|---|---|---|---|---|---|
| 0 | 2023-12-21 22:30:00 | 2023-12-21 22:30:00 | OR 3802 TUI fly | Amsterdam | BAGGAGE HANDLED | Tenerife (TFS) | 236 | 0.0 |
| 1 | 2023-12-21 22:35:00 | 2023-12-21 22:35:00 | LH 2310 Lufthansa | Amsterdam | CANCELLED | Munich (MUC) | 339 | 0.0 |
| 2 | 2023-12-21 22:35:00 | 2023-12-21 23:37:00 | KL 1032 KLM | Amsterdam | BAGGAGE HANDLED | London Heathrow (LHR) | 344 | 3720.0 |
| 3 | 2023-12-21 22:40:00 | 2023-12-22 01:10:00 | AZ 118 ITA Airways | Amsterdam | BAGGAGE ON BELT | Milan Linate (LIN) | 346 | -77400.0 |
| 4 | 2023-12-21 23:00:00 | 2023-12-21 23:00:00 | KL 1706 KLM | Amsterdam | CANCELLED | Madrid (MAD) | 316 | 0.0 |


In [4]:
# instantiate the problem
AMS21_n20_1 = reschedule_problem(df_subset, n_runway = 2)
# solve the problem
AMS21_n20_1.solve(best_first_graph_search)
AMS21_n20_1.display()

The airport resumed service at 23:30


| | code | time_sch | pass_load | time_new | util | time_dff |
|---|---|---|---|---|---|---|
| 0 | OR 3802 TUI fly | 2023-12-21 22:30:00 | 236 | 2023-12-21 23:30:00 | -60.0 | 60.0 |
| 1 | LH 2310 Lufthansa | 2023-12-21 22:35:00 | 339 | 2023-12-21 23:35:00 | -60.0 | 60.0 |
| 2 | KL 1032 KLM | 2023-12-21 22:35:00 | 344 | 2023-12-21 23:30:00 | -55.0 | 55.0 |
| 3 | AZ 118 ITA Airways | 2023-12-21 22:40:00 | 346 | 2023-12-21 23:35:00 | -55.0 | 55.0 |
| 4 | KL 1706 KLM | 2023-12-21 23:00:00 | 316 | 2023-12-21 23:45:00 | -45.0 | 45.0 |
| 5 | KL 980 KLM | 2023-12-21 23:00:00 | 186 | 2023-12-21 23:45:00 | -45.0 | 45.0 |
| 6 | KL 1118 KLM | 2023-12-21 23:00:00 | 263 | 2023-12-21 23:40:00 | -40.0 | 40.0 |
| 7 | KL 1434 KLM | 2023-12-21 23:00:00 | 343 | 2023-12-21 23:40:00 | -40.0 | 40.0 |
| 8 | KL 1136 KLM | 2023-12-21 23:05:00 | 263 | 2023-12-21 23:50:00 | -45.0 | 45.0 |
| 9 | HV 5136 Transavia | 2023-12-21 23:15:00 | 339 | 2023-12-21 23:55:00 | -40.0 | 40.0 |


## Test 2: Breadth-First Search
Here we use breadth-first tree search, taking the path cost to be the depth of the node. In this case we observe a significant increase in runtime.

Because breadth-first search generally has a high time complexity (the algorithm performs a complete search of all possible terminal states), we set up the problem to reschedule only 6 flights. For the same reason, we also reduce the duration of the airport disruption to 20 min.
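
For intuition, the minimal sketch below enumerates every complete schedule breadth-first, filling one slot per search level, and returns `(utility, plan)` pairs sorted so that the best comes first. It is not the `breadth_first_search` from `search_problem.py`; it assumes one runway, 5-minute slots starting at the resume time, exactly one flight per slot, no landing before the scheduled time, and utility equal to minus the total delay in minutes. The name `breadth_first_all_schedules` is hypothetical; the flight data are the six flights of Test 3.

```python
from collections import deque
from datetime import datetime, timedelta


def breadth_first_all_schedules(sched, resume, slot=timedelta(minutes=5)):
    """Enumerate every complete one-runway schedule, one slot per search level.

    Returns a list of (utility, plan) pairs sorted best-first, where utility is
    minus the total delay in minutes and plan is a tuple of (flight, new time).
    """
    solutions = []
    frontier = deque([(0, frozenset(sched), ())])   # (next slot index, unassigned, plan)
    while frontier:
        idx, remaining, plan = frontier.popleft()   # FIFO queue: shallowest node first
        if not remaining:                           # terminal state: every flight placed
            util = -sum((t - sched[f]).total_seconds() / 60 for f, t in plan)
            solutions.append((util, plan))
            continue
        t = resume + idx * slot
        # a branch where no remaining flight may take slot t produces no children
        for f in sorted(remaining):
            if sched[f] <= t:                       # no landing before the scheduled time
                frontier.append((idx + 1, remaining - {f}, plan + ((f, t),)))
    solutions.sort(key=lambda s: s[0], reverse=True)
    return solutions


sched = {
    "HV 6114 Transavia": datetime(2023, 12, 21, 23, 25),
    "KL 1608 KLM": datetime(2023, 12, 21, 23, 25),
    "HV 5806 Transavia": datetime(2023, 12, 21, 23, 30),
    "HV 6902 Transavia": datetime(2023, 12, 21, 23, 40),
    "HV 5666 Transavia": datetime(2023, 12, 21, 23, 50),
    "HV 5218 Transavia": datetime(2023, 12, 21, 23, 55),
}
solutions = breadth_first_all_schedules(sched, resume=datetime(2023, 12, 21, 23, 40))
print(len(solutions), solutions[0][0])
```

Under these assumptions the enumeration yields 216 complete schedules, all with utility -90, which is the same count and utility the notebook reports for Test 3. Every terminal state is generated, which is exactly why the cost explodes as flights are added.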

In [5]:
# subset
df_subset_2 = df.tail(6).reset_index(drop = True)
# instantiate the problem
AMS21_n6_2 = reschedule_problem(df_subset_2, n_runway = 2,max_delay=120, disruption_dur= 20)
# solve the problem
AMS21_n6_2.solve(breadth_first_search)
# return the solution
breadth_fs_sol_2 = AMS21_n6_2.solution

The airport resumed service at 23:45
Iteration: 1 at depth 1
Check if the depth is consistent withe the node state.
There should be 0.5 timeslot iterated.
frontier length: 1
Iteration: 2 at depth 1
Check if the depth is consistent withe the node state.
There should be 0.5 timeslot iterated.
frontier length: 2
Iteration: 3 at depth 1
Check if the depth is consistent withe the node state.
There should be 0.5 timeslot iterated.
frontier length: 3
Iteration: 4 at depth 1
Check if the depth is consistent withe the node state.
There should be 0.5 timeslot iterated.
frontier length: 4
Iteration: 5 at depth 1
Check if the depth is consistent withe the node state.
There should be 1.0 timeslot iterated.
frontier length: 5
Iteration: 6 at depth 1
Check if the depth is consistent withe the node state.
There should be 1.0 timeslot iterated.
frontier length: 6
Iteration: 7 at depth 1
Check if the depth is consistent withe the node state.
There should be 1.0 timeslot iterated.
frontier length: 7
...

In [6]:
# inspect the result
n_sol =len(breadth_fs_sol_2)
print(f"There are in total {n_sol} number of solution.")
# unzip the solution list
max_util = breadth_fs_sol_2[0][0]
sol_max_score = [node for util, node in breadth_fs_sol_2 if util == max_util]
print(f"The maximum utility of all solution is {max_util}, with {len(sol_max_score)} yielding this utility")

There are in total 1614 number of solution.
The maximum utility of all solution is -75.0, with 18 yielding this utility


As expected, the algorithm takes a long time to return: even in this reduced case of rescheduling only 6 flights, it needs more than 15 seconds to complete.
It returns all 1614 solutions, 18 of which share the maximum utility across all solutions, -75.

In fact, the time complexity of breadth-first search puts heavy demands on computation, and it is simply not feasible to rely on the algorithm to resolve the scheduling problem when many flights have to be rescheduled: empirically, rescheduling 10 flights takes more than **20 min**. Because of these computational limits, we do not discuss such cases in this project.

Nonetheless, this highlights a major drawback of applying breadth-first search in a real-world scenario:
* If only a few flights need rescheduling, a simple rule that assigns the earliest available slot to the earliest scheduled plane would not lose much utility.
* If many flights need rescheduling, the algorithm simply cannot return a solution within the time available before the airport has to resume operation.
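
The simple rule in the first bullet can be sketched as a greedy, first-come-first-served baseline. The helper `first_come_first_served` is hypothetical and assumes 5-minute slots from the resume time, up to `n_runway` landings per slot, no landing before the scheduled time, and utility equal to minus the total delay in minutes; diversion via `max_delay` is not modelled.

```python
from datetime import datetime, timedelta


def first_come_first_served(sched, resume, n_runway=1, slot=timedelta(minutes=5)):
    """Greedy baseline (hypothetical helper): earliest scheduled flight takes the earliest free slot.

    sched  -- dict mapping flight code -> scheduled arrival (datetime)
    resume -- datetime at which the airport reopens
    Returns (utility = -total delay in minutes, {flight: new time}).
    """
    used = {}   # slot start -> number of runways already taken in that slot
    plan = {}
    for code, t_sch in sorted(sched.items(), key=lambda kv: kv[1]):
        # first slot on the grid starting at or after both `resume` and `t_sch`
        steps = max(0, -(-(t_sch - resume) // slot))   # ceiling division of timedeltas
        t = resume + steps * slot
        while used.get(t, 0) >= n_runway:              # slot full: try the next one
            t += slot
        used[t] = used.get(t, 0) + 1
        plan[code] = t
    total_delay = sum((plan[c] - sched[c]).total_seconds() / 60 for c in plan)
    return -total_delay, plan


sched = {
    "HV 6114 Transavia": datetime(2023, 12, 21, 23, 25),
    "KL 1608 KLM": datetime(2023, 12, 21, 23, 25),
    "HV 5806 Transavia": datetime(2023, 12, 21, 23, 30),
    "HV 6902 Transavia": datetime(2023, 12, 21, 23, 40),
    "HV 5666 Transavia": datetime(2023, 12, 21, 23, 50),
    "HV 5218 Transavia": datetime(2023, 12, 21, 23, 55),
}
util_1rw, _ = first_come_first_served(sched, datetime(2023, 12, 21, 23, 40), n_runway=1)
util_2rw, _ = first_come_first_served(sched, datetime(2023, 12, 21, 23, 45), n_runway=2)
print(util_1rw, util_2rw)
```

For the six flights used in Tests 2 and 3, this greedy rule gives -75 with two runways reopening at 23:45 and -90 with one runway reopening at 23:40, the same as the best utilities breadth-first search reports in those tests, at a cost of a single sort.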

## Test 3: Model Comparison
In this section, we compare breadth-first search with uniform-cost search. As discussed above, breadth-first search requires a lot of computation, so for a fair comparison of the solutions the two algorithms return, we limit the problem to rescheduling only 6 flights on one runway.

### Breadth-First Search

In [7]:
# subset
df_subset_3 = df_subset.tail(6).reset_index(drop = True)
# instantiate the problem
AMS21_n5_3 = reschedule_problem(df_subset_3, n_runway = 1,max_delay=120, disruption_dur= 15)
# solve the problem
AMS21_n5_3.solve(breadth_first_search)
# the breadth first search returns a list of solution(s)
breadth_fs_sol_3 = AMS21_n5_3.solution

The airport resumed service at 23:40
Iteration: 1 at depth 1
Check if the depth is consistent withe the node state.
There should be 1.0 timeslot iterated.
frontier length: 1
Iteration: 2 at depth 1
Check if the depth is consistent withe the node state.
There should be 1.0 timeslot iterated.
frontier length: 2
Iteration: 3 at depth 1
Check if the depth is consistent withe the node state.
There should be 1.0 timeslot iterated.
frontier length: 3
Iteration: 4 at depth 1
Check if the depth is consistent withe the node state.
There should be 1.0 timeslot iterated.
frontier length: 4
Iteration: 5 at depth 2
Check if the depth is consistent withe the node state.
There should be 2.0 timeslot iterated.
frontier length: 4
Iteration: 6 at depth 2
Check if the depth is consistent withe the node state.
There should be 2.0 timeslot iterated.
frontier length: 5
Iteration: 7 at depth 2
Check if the depth is consistent withe the node state.
There should be 2.0 timeslot iterated.
frontier length: 6
...

In [8]:
# inspect the result
n_sol =len(breadth_fs_sol_3)
print(f"There are in total {n_sol} number of solution.")
# unzip the solution list
max_util = breadth_fs_sol_3[0][0]
sol_max_score = [node for util, node in breadth_fs_sol_3 if util == max_util]
print(f"The maximum utility of all solution is {max_util}, with {len(sol_max_score)} yielding this utility")

There are in total 216 number of solution.
The maximum utility of all solution is -90.0, with 216 yielding this utility


In [9]:
# print the optimal solution
breadth_fs_sol_3[0][1].state

| | time_new | util |
|---|---|---|
| HV 6114 Transavia | 2023 12 21 23:40 | -15.0 |
| HV 6902 Transavia | 2023 12 21 23:45 | -5.0 |
| KL 1608 KLM | 2023 12 21 23:50 | -25.0 |
| HV 5218 Transavia | 2023 12 21 23:55 | 0.0 |
| HV 5806 Transavia | 2023 12 22 00:00 | -30.0 |
| HV 5666 Transavia | 2023 12 22 00:05 | -15.0 |


### Uniform Cost Search

In [10]:
# Instantiate the problem and return the result
AMS21_n5_3_2 = reschedule_problem(df_subset_3, n_runway=1, max_delay=120, disruption_dur=15)
AMS21_n5_3_2.solve(best_first_graph_search)
AMS21_n5_3_2_result = AMS21_n5_3_2.display()

The airport resumed service at 23:40


In [11]:
AMS21_n5_3_2_result

| | code | time_sch | pass_load | time_new | util | time_dff |
|---|---|---|---|---|---|---|
| 0 | HV 6114 Transavia | 2023-12-21 23:25:00 | 300 | 2023-12-21 23:45:00 | -20.0 | 20.0 |
| 1 | KL 1608 KLM | 2023-12-21 23:25:00 | 309 | 2023-12-21 23:40:00 | -15.0 | 15.0 |
| 2 | HV 5806 Transavia | 2023-12-21 23:30:00 | 310 | 2023-12-21 23:50:00 | -20.0 | 20.0 |
| 3 | HV 6902 Transavia | 2023-12-21 23:40:00 | 316 | 2023-12-21 23:55:00 | -15.0 | 15.0 |
| 4 | HV 5666 Transavia | 2023-12-21 23:50:00 | 347 | 2023-12-22 00:00:00 | -10.0 | 10.0 |
| 5 | HV 5218 Transavia | 2023-12-21 23:55:00 | 304 | 2023-12-22 00:05:00 | -10.0 | 10.0 |


### Model Evaluation

In [12]:
# parse the result
compare_df = pd.merge(breadth_fs_sol_3[0][1].state, AMS21_n5_3_2_result[['code',"time_sch",'time_new','util']],
                left_index = True, right_on = 'code', suffixes = ("_breadth","_uniform"))

In [13]:
# reorder and print result
compare_df[["code","time_sch",'time_new_breadth', 'util_breadth', 'time_new_uniform',
       'util_uniform']].sort_values("time_sch")

| | code | time_sch | time_new_breadth | util_breadth | time_new_uniform | util_uniform |
|---|---|---|---|---|---|---|
| 0 | HV 6114 Transavia | 2023-12-21 23:25:00 | 2023 12 21 23:40 | -15.0 | 2023-12-21 23:45:00 | -20.0 |
| 1 | KL 1608 KLM | 2023-12-21 23:25:00 | 2023 12 21 23:50 | -25.0 | 2023-12-21 23:40:00 | -15.0 |
| 2 | HV 5806 Transavia | 2023-12-21 23:30:00 | 2023 12 22 00:00 | -30.0 | 2023-12-21 23:50:00 | -20.0 |
| 3 | HV 6902 Transavia | 2023-12-21 23:40:00 | 2023 12 21 23:45 | -5.0 | 2023-12-21 23:55:00 | -15.0 |
| 4 | HV 5666 Transavia | 2023-12-21 23:50:00 | 2023 12 22 00:05 | -15.0 | 2023-12-22 00:00:00 | -10.0 |
| 5 | HV 5218 Transavia | 2023-12-21 23:55:00 | 2023 12 21 23:55 | 0.0 | 2023-12-22 00:05:00 | -10.0 |


In [14]:
# compare the utility
compare_df[['util_breadth','util_uniform']].sum()

util_breadth   -90.0
util_uniform   -90.0
dtype: float64

In this simple case, breadth-first search does not outperform uniform-cost search: both reach a total utility of -90, and all 216 breadth-first solutions share that utility.
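
Why the solutions tie can be checked with a short calculation, assuming (as the tables above suggest) that every solution places the six flights in the six consecutive slots from 23:40 to 00:05. The total delay then depends only on which slots are used, not on the order in which flights fill them. Measuring times in minutes after 23:00:

$$\sum_{i=1}^{6}\left(t^{new}_i - t^{sch}_i\right) = \sum_{i=1}^{6} t^{new}_i - \sum_{i=1}^{6} t^{sch}_i = (40+45+50+55+60+65) - (25+25+30+40+50+55) = 315 - 225 = 90$$

This matches the total utility of -90 for both searches. Once each flight's delay is multiplied by its own weight, as in the passenger-weighted utilities of the next tests, the sum no longer splits this way and the order of flights matters.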

## Test 4: Passenger-Weighted Utility
Instead of using only the delay, let's see whether defining utility differently produces a difference in performance between breadth-first search and uniform-cost search. We use the `reschedule_custom_u` class to set up the problem. It behaves like the `reschedule_problem` class used above; the only difference is how the utility of each rescheduled flight is computed, through a user-supplied function (`util_f`) of:
* the delay
* the passenger load

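There are different ways to weight the utility by passenger load. One proposal multiplies the delay by a weight that grows with the passenger load and stays between 0 and 1:
$$u = \text{delay} \cdot \frac{2}{\pi}\arctan\left(\frac{pass\_load}{200}\right)$$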
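
As a sanity check of how the arctan weight maps onto the recorded utilities, the sketch below evaluates it for the six flights used in Tests 3 to 5. The name `passenger_weight` is hypothetical, and passing the delay as a negative number of minutes is an assumption inferred from the negative utilities in the outputs.

```python
import numpy as np


def passenger_weight(pass_load, scale=200):
    """Arctan passenger weight: 0 for an empty flight, approaching 1 as the load grows."""
    return 2 / np.pi * np.arctan(np.asarray(pass_load, dtype=float) / scale)


# passenger loads of the six flights used in Tests 3 to 5
loads = np.array([300, 309, 310, 316, 347, 304])
weights = passenger_weight(loads)
print(np.round(weights, 3))

# assumed convention: delay passed as a negative number of minutes
# (KL 1608 KLM, 309 passengers, lands at 23:55 instead of 23:25)
print(-30 * passenger_weight(309))
```

The last line should print a value close to -19.029, the utility recorded for KL 1608 KLM in the breadth-first solution of Test 4, which supports the negative-delay convention.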


In [38]:
def util_f(delay, pass_load):
    # scale the passenger load, then map it to a weight in [0, 1) with arctan
    scaled_pass_load = pass_load / 200
    util = delay * np.arctan(scaled_pass_load) / np.pi * 2
    return util


### Breadth-First Search

In [39]:
# subset
df_subset_4 = df_subset.tail(6).reset_index(drop = True)
# instantiate the problem
AMS21_n6_4 = reschedule_custom_u(df_subset_4,util_f, n_runway = 1,max_delay=120, disruption_dur= 15)
# solve the problem
AMS21_n6_4.solve(breadth_first_search)
# the breadth first search returns a list of solution(s)
breadth_fs_sol_4 = AMS21_n6_4.solution

# inspect the result
n_sol =len(breadth_fs_sol_4)
print(f"There are in total {n_sol} number of solution.")
# unzip the solution list
max_util = breadth_fs_sol_4[0][0]
sol_max_score = [node for util, node in breadth_fs_sol_4 if util == max_util]
print(f"The maximum utility of all solution is {max_util}, with {len(sol_max_score)} yielding this utility")

The airport resumed service at 23:40
Iteration: 1 at depth 1
Check if the depth is consistent withe the node state.
There should be 1.0 timeslot iterated.
frontier length: 1
Iteration: 2 at depth 1
Check if the depth is consistent withe the node state.
There should be 1.0 timeslot iterated.
frontier length: 2
Iteration: 3 at depth 1
Check if the depth is consistent withe the node state.
There should be 1.0 timeslot iterated.
frontier length: 3
Iteration: 4 at depth 1
Check if the depth is consistent withe the node state.
There should be 1.0 timeslot iterated.
frontier length: 4
Iteration: 5 at depth 2
Check if the depth is consistent withe the node state.
There should be 2.0 timeslot iterated.
frontier length: 4
Iteration: 6 at depth 2
Check if the depth is consistent withe the node state.
There should be 2.0 timeslot iterated.
frontier length: 5
Iteration: 7 at depth 2
Check if the depth is consistent withe the node state.
There should be 2.0 timeslot iterated.
frontier length: 6
...

In [40]:
# print the optimal solution
breadth_fs_sol_4[0][1].state

| | time_new | util |
|---|---|---|
| HV 6902 Transavia | 2023 12 21 23:40 | 0.0 |
| HV 5806 Transavia | 2023 12 21 23:45 | -9.528576 |
| HV 5666 Transavia | 2023 12 21 23:50 | 0.0 |
| KL 1608 KLM | 2023 12 21 23:55 | -19.029023 |
| HV 5218 Transavia | 2023 12 22 00:00 | -3.147738 |
| HV 6114 Transavia | 2023 12 22 00:05 | -25.026637 |


### Uniform Cost Search

In [41]:
# Instantiate the problem and return the result
AMS21_n6_4_2 = reschedule_custom_u(df_subset_4, util_f, n_runway=1, max_delay=120, disruption_dur=15)
AMS21_n6_4_2.solve(best_first_graph_search)
AMS21_n6_4_2_result = AMS21_n6_4_2.display()

The airport resumed service at 23:40


### Model Evaluation

In [42]:
# parse the result
compare_df = pd.merge(breadth_fs_sol_4[0][1].state, AMS21_n6_4_2_result[['code',"time_sch",'time_new','util']],
                left_index = True, right_on = 'code', suffixes = ("_breadth","_uniform"))
# reorder and print result
compare_df[["code","time_sch",'time_new_breadth', 'util_breadth', 'time_new_uniform',
       'util_uniform']].sort_values("time_sch")

| | code | time_sch | time_new_breadth | util_breadth | time_new_uniform | util_uniform |
|---|---|---|---|---|---|---|
| 1 | KL 1608 KLM | 2023-12-21 23:25:00 | 2023 12 21 23:55 | -19.029023 | 2023-12-21 23:40:00 | -9.514512 |
| 0 | HV 6114 Transavia | 2023-12-21 23:25:00 | 2023 12 22 00:05 | -25.026637 | 2023-12-21 23:45:00 | -12.513318 |
| 2 | HV 5806 Transavia | 2023-12-21 23:30:00 | 2023 12 21 23:45 | -9.528576 | 2023-12-21 23:50:00 | -12.704768 |
| 3 | HV 6902 Transavia | 2023-12-21 23:40:00 | 2023 12 21 23:40 | 0.0 | 2023-12-21 23:55:00 | -9.611636 |
| 4 | HV 5666 Transavia | 2023-12-21 23:50:00 | 2023 12 21 23:50 | 0.0 | 2023-12-22 00:00:00 | -6.671354 |
| 5 | HV 5218 Transavia | 2023-12-21 23:55:00 | 2023 12 22 00:00 | -3.147738 | 2023-12-22 00:05:00 | -6.295477 |


In [43]:
# compare the utility
compare_df[['util_breadth','util_uniform']].sum()

util_breadth   -56.731975
util_uniform   -57.311066
dtype: float64

Here the uniform-cost search does not return the optimal solution: its total utility (-57.311066) is lower than that of the best breadth-first solution (-56.731975).

## Test 5: Exponential Passenger Weight
Let's try a utility function with the passenger weight defined as follows:
$$u = \text{delay} \cdot \frac{e^{pass\_load/200} - 1}{e^{pass\_load/200}}$$
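
Writing $x = pass\_load/200$ for the scaled load used in `util_f`, this weight simplifies to a saturating exponential that, like the arctan weight, is 0 for an empty flight and approaches 1 for large loads. For KL 1608 KLM (309 passengers, 30 minutes late in the breadth-first solution below), it reproduces the recorded utility of about -23.60:

$$\frac{e^{x}-1}{e^{x}} = 1 - e^{-x}, \qquad x = \frac{309}{200} = 1.545, \qquad 1 - e^{-1.545} \approx 0.787, \qquad 30 \times 0.787 \approx 23.60$$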

In [21]:
def util_f(delay, pass_load):
    # weight (e^x - 1) / e^x with x = pass_load / 200, which lies in [0, 1)
    scaled_pass_load = pass_load / 200
    util = delay * (np.e ** scaled_pass_load - 1) / np.e ** scaled_pass_load
    return util

In [22]:
# subset
df_subset_5 = df_subset.tail(6).reset_index(drop = True)
# instantiate the problem
AMS21_n6_5 = reschedule_custom_u(df_subset_5, util_f, n_runway = 1,max_delay=120, disruption_dur= 15)
# solve the problem
AMS21_n6_5.solve(breadth_first_search)
# the breadth first search returns a list of solution(s)
breadth_fs_sol_5 = AMS21_n6_5.solution

# inspect the result
n_sol =len(breadth_fs_sol_5)
print(f"There are in total {n_sol} number of solution.")
# unzip the solution list
max_util = breadth_fs_sol_5[0][0]
sol_max_score = [node for util, node in breadth_fs_sol_5 if util == max_util]
print(f"The maximum utility of all solution is {max_util}, with {len(sol_max_score)} yielding this utility")

The airport resumed service at 23:40
Iteration: 1 at depth 1
Check if the depth is consistent withe the node state.
There should be 1.0 timeslot iterated.
frontier length: 1
Iteration: 2 at depth 1
Check if the depth is consistent withe the node state.
There should be 1.0 timeslot iterated.
frontier length: 2
Iteration: 3 at depth 1
Check if the depth is consistent withe the node state.
There should be 1.0 timeslot iterated.
frontier length: 3
Iteration: 4 at depth 1
Check if the depth is consistent withe the node state.
There should be 1.0 timeslot iterated.
frontier length: 4
Iteration: 5 at depth 2
Check if the depth is consistent withe the node state.
There should be 2.0 timeslot iterated.
frontier length: 4
Iteration: 6 at depth 2
Check if the depth is consistent withe the node state.
There should be 2.0 timeslot iterated.
frontier length: 5
Iteration: 7 at depth 2
Check if the depth is consistent withe the node state.
There should be 2.0 timeslot iterated.
frontier length: 6
...

In [23]:
# Instantiate the problem and return the result
AMS21_n6_5_2 = reschedule_custom_u(df_subset_5, util_f, n_runway=1, max_delay=120, disruption_dur=15)
AMS21_n6_5_2.solve(best_first_graph_search)
AMS21_n6_5_2_result = AMS21_n6_5_2.display()

The airport resumed service at 23:40


### Model Evaluation

In [24]:
# parse the result
compare_df = pd.merge(breadth_fs_sol_5[0][1].state, AMS21_n6_5_2_result[['code',"time_sch",'time_new','util']],
                left_index = True, right_on = 'code', suffixes = ("_breadth","_uniform"))
# reorder and print result
compare_df[["code","time_sch",'time_new_breadth', 'util_breadth', 'time_new_uniform',
       'util_uniform']].sort_values("time_sch")

| | code | time_sch | time_new_breadth | util_breadth | time_new_uniform | util_uniform |
|---|---|---|---|---|---|---|
| 1 | KL 1608 KLM | 2023-12-21 23:25:00 | 2023 12 21 23:55 | -23.600644 | 2023-12-21 23:40:00 | -11.800322 |
| 0 | HV 6114 Transavia | 2023-12-21 23:25:00 | 2023 12 22 00:05 | -31.074794 | 2023-12-21 23:45:00 | -15.537397 |
| 2 | HV 5806 Transavia | 2023-12-21 23:30:00 | 2023 12 21 23:45 | -11.81628 | 2023-12-21 23:50:00 | -15.755041 |
| 3 | HV 6902 Transavia | 2023-12-21 23:40:00 | 2023 12 21 23:40 | 0.0 | 2023-12-21 23:55:00 | -11.910374 |
| 4 | HV 5666 Transavia | 2023-12-21 23:50:00 | 2023 12 21 23:50 | 0.0 | 2023-12-22 00:00:00 | -8.235998 |
| 5 | HV 5218 Transavia | 2023-12-21 23:55:00 | 2023 12 22 00:00 | -3.906441 | 2023-12-22 00:05:00 | -7.812881 |


In [25]:
# compare the utility
compare_df[['util_breadth','util_uniform']].sum()

util_breadth   -70.398158
util_uniform   -71.052012
dtype: float64

## Heuristic Search


To do list

* Consider the case for heuristic tree search

In [None]:
trial_df = df_subset.copy()

In [None]:
# flag rows whose time_diff (in seconds) is below -300, as a vectorised comparison
trial_df['time_diff'] < -300

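## Showcase the Diverting Assumption
Here we rerun the uniform-cost search of Test 3 with `max_delay` lowered to 10 minutes, so that any flight that would be delayed by more than 10 minutes is diverted. In the output, diverted flights are marked `diverted`, receive a utility of -10 (that is, `-max_delay`), and show a placeholder `time_new` of 1970-01-01.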

In [24]:
# Instantiate the problem and return the result
AMS21_n5_3_2 = reschedule_problem(df_subset_3, n_runway=1, max_delay=10, disruption_dur=15)
AMS21_n5_3_2.solve(best_first_graph_search)
AMS21_n5_3_2_result = AMS21_n5_3_2.display()

The airport resumed service at 23:40


In [26]:
AMS21_n5_3_2_result

| | code | time_sch | pass_load | time_new | util | time_dff |
|---|---|---|---|---|---|---|
| 0 | HV 6114 Transavia | 2023-12-21 23:25:00 | 300 | 1970-01-01 00:00:00 | -10.0 | diverted |
| 1 | KL 1608 KLM | 2023-12-21 23:25:00 | 309 | 1970-01-01 00:00:00 | -10.0 | diverted |
| 2 | HV 5806 Transavia | 2023-12-21 23:30:00 | 310 | 2023-12-21 23:40:00 | -10.0 | 10.0 |
| 3 | HV 6902 Transavia | 2023-12-21 23:40:00 | 316 | 2023-12-21 23:45:00 | -5.0 | 5.0 |
| 4 | HV 5666 Transavia | 2023-12-21 23:50:00 | 347 | 2023-12-21 23:50:00 | 0.0 | -0.0 |
| 5 | HV 5218 Transavia | 2023-12-21 23:55:00 | 304 | 2023-12-21 23:55:00 | 0.0 | -0.0 |
