## Testing CSP Algorithm
This notebook is used to create a reviewed flight schedule. It should take the input:
* the original departure / landing time\
* id of the flight

In [187]:
import pandas as pd
import constraint
import time
import numpy as np

In [2]:
data = pd.read_csv("./data/19DEC2023_AMS.csv", parse_dates= ["time_sch","time_act"])

  data = pd.read_csv("./data/19DEC2023_AMS.csv", parse_dates= ["time_sch","time_act"])
  data = pd.read_csv("./data/19DEC2023_AMS.csv", parse_dates= ["time_sch","time_act"])


Note that the data contains flights scheduled for the morning hours to the earning hours of the preceding day.
We will first sort the values by the scheduled time `time_sch` to identify those late night flights

In [3]:
data_sorted = data.sort_values("time_sch").reset_index(drop = True)

For the purpose of testing, we will only subset a small chunk of the flight schedule.

In [15]:
# retrieve the boolean index for flightrs scheduled between 10pm - 3am
# index = data_sorted['time_sch'].map(lambda x: x.hour>=  23) 
# subset the data
# data_subset = data[index]
# print(f"There are {len(data_subset)} scheduled flights in this dataset.")


# alternatively get the tail
data_subset_n5 = data_sorted.tail(5)


## Set Up 1: Naive Case
**Assumptions**
Now assume that the airport has to be shut down for 30 minutes between 22:00 to 22:30. Note also that after 23:00, the airport can only operate on **1 runway**. We also assume the airport only have enough staff to handle any take-off and landing until 01:00 for simplicity.
We also assume that flights can at best arrive on schedule but never earlier than schedule.
We will **not** consider if airborne flight has enough fuel to cruise in the air or not for this first simulation.

We will first run a naive algorithm that test for the `constraint` package in solving the question. 
Each flight can be considered as a varaible. For simplicity, we express the domain as the time (in minute) after the airport continue to operate (22:30). Given the assumption we made, the domain should take the minimum value of $f(x)=max(0,time\_sch)$ and the maximum value of 90.

In [19]:
# initiate the problem
flight_schedule = constraint.Problem()

# define global var
n_runway = 1
time_lag = 5
op_hr = 23
op_min = 30
max_time = 30

# add a variable for each flight
for key, flight in data_subset_n5.iterrows():
    print(f"Adding variable for {flight['code']} that was scheduled to use the runway at {flight['time_sch'].hour}:{flight['time_sch'].minute:02d}")
    # compute the relative time of schedule take-off/landing
    min_time = max((flight['time_sch'].hour - op_hr) * 60 + flight['time_sch'].minute -op_min,0)
    flight_schedule.addVariable(flight['code'], range(min_time,max_time,time_lag ))

Adding variable for HV 6118 Transavia that was scheduled to use the runway at 23:25
Adding variable for KL 1608 KLM that was scheduled to use the runway at 23:25
Adding variable for HV 6902 Transavia that was scheduled to use the runway at 23:40
Adding variable for HV 5356 Transavia that was scheduled to use the runway at 23:45
Adding variable for HV 6120 Transavia that was scheduled to use the runway at 23:50


In [120]:
# define constraint - no more than 2 flight at a given time 
def not_same_time(*flights):
    """
    check that no two flights are using the runway at the same time
    """
    global n_runway
    schedule = [flight for flight in flights]
    schedule_counter = pd.Series(schedule).value_counts()
    if schedule_counter.max() > n_runway:
        # assumed only one runway is operable
        return None
    else:
        return True


# add the constraint
flight_schedule.addConstraint(not_same_time,[flight['code'] for key, flight in  data_subset_n5.iterrows()])


In [121]:
# get the solution
sol_1 = flight_schedule.getSolutions()

# print the solutions
print(len(sol_1))
print(sol_1[0])


48
{'HV 6120 Transavia': 25, 'HV 5356 Transavia': 20, 'HV 6902 Transavia': 15, 'HV 6118 Transavia': 10, 'KL 1608 KLM': 0}


In the case of only 5 flights, the algorithm returns 48 solutions in about 10 secods. For   As the number of flight increases, the processing time to get all the solution increases exponentially.

## Set up 2
We can now try to build a more complex case.
* Consider whether a flight is departure or arrival.
* Consider the aircraft load
* Consider the possibility of diversion

We will first need to manipulate the data to add these complexity. For the departure/arrival variable, this is already implicitly stated in the dataframe. For the load, however, the existing data frame do not have any data for it. We will randomly assign a continuous variable as a simulation case.


In [122]:
# create a shallow copy
df_2 = data.copy()

# add a binary departure variable
df_2['depature'] = df_2['orig'].map(lambda orig: True if orig == "Amsterdam" else False)

# add a random variable of flight passenger load
np.random.seed(2024)
df_2['pass_load'] = np.random.normal(300,50,size =len(df_2))



Under this set up, we can define each flight as a variable with a domain of either a integer value or `none` for diverted flight.

In this set up, let's consider the following constraints:
* There could only be maximum of 1 flights at each time slot.
* Each flight can only be delayed by a maximum of 80 minutes.
* (potentially) the cumulative number of take off/landing should only differ by 10 (this ensures that the landing plane always have a gate)

Let's assume the airport is shut between 23:00 to 23:59. We only take a further subset to limit the run time.

In [None]:
# define global var
n_runway = 1
time_lag = 5
op_hr = 24
op_min = 0
max_time = 60 # time for airport still under operation in minute
max_diversion = 0.2
max_delay_minute = 80

In [199]:
# subset for flights scheduled after 23
df2_subset = df_2[df_2['time_sch'].map(lambda x: x.hour >= 23)].sample(frac = 0.5)

In [201]:
# maximum 2 flights at each time slot (defined in part 1)
# def not_same_time(*flights)

# each flight can only be delayed by 1.5 hours to iterate through flight
def max_delay(flight_var):
    # parse the scheduled time in minute relative to time of airport resumption
    global df2_subset
    global flight_code
    global max_delay_minute
    time_sch = df2_subset[df2_subset['code'] == flight_code]['time_sch'].reset_index().loc[0,'time_sch']
    time_sch = (time_sch.hour - op_hr)* 60 + time_sch.minute - op_min
    if (flight_var is None) or (flight_var - time_sch <= max_delay_minute):
        return True
    else:
        return None

# maximum number of diversion
def max_n_divert(*flights):
    global max_diversion # in decimal unit
    schedule = pd.Series([flight for flight in flights])
    n_resch = schedule.count() # number of rescheduled flights
    n_sch = len(schedule)
    if 1- n_resch / n_sch < max_diversion:
        return True

# cumulative take off and landing can only differ by 10


In [202]:
# initiate the problem
csp2 = constraint.Problem()

# add a variable for each flight
for key, flight in df2_subset.iterrows():
    print(f"Adding variable for {flight['code']} that was scheduled to use the runway at {flight['time_sch'].hour}:{flight['time_sch'].minute:02d}")
    # compute the relative time of schedule take-off/landing
    min_time = max((flight['time_sch'].hour - op_hr) * 60 + flight['time_sch'].minute -op_min,0)
    domain = list(range(min_time,max_time, time_lag))
    domain.append(None)
    print(type(domain))
    print(domain)
    # add the variable
    csp2.addVariable(flight['code'], domain)

# add slots constraint
csp2.addConstraint(not_same_time,[flight['code'] for key, flight in  df2_subset.iterrows()])

# add max delay constraint
for key, flight in df2_subset.iterrows():
    flight_code = flight['code']
    csp2.addConstraint(max_delay,[flight['code']])

# add max n delay constraint
    csp2.addConstraint(max_n_divert,[flight['code'] for key, flight in  df2_subset.iterrows()])

Adding variable for KL 1608 KLM that was scheduled to use the runway at 23:25
<class 'list'>
[0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, None]
Adding variable for KL 980 KLM that was scheduled to use the runway at 23:00
<class 'list'>
[0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, None]
Adding variable for KL 1706 KLM that was scheduled to use the runway at 23:00
<class 'list'>
[0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, None]
Adding variable for KL 1834 KLM that was scheduled to use the runway at 23:20
<class 'list'>
[0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, None]
Adding variable for HV 6120 Transavia that was scheduled to use the runway at 23:50
<class 'list'>
[0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, None]
Adding variable for KL 1118 KLM that was scheduled to use the runway at 23:00
<class 'list'>
[0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, None]
Adding variable for HV 5140 Transavia that was scheduled to use the runway at 23:00
<class 'list'>
[0, 5, 10, 15, 20, 2

To test the time complexity of the algorithm, let's first just get one solution:

In [203]:
sol2_first = csp2.getSolution()

Even with just getting one solution, algorithm requires more than 3 minutes to yield the first result.

In [204]:
sol2_first

{'HV 5140 Transavia': None,
 'HV 6120 Transavia': 40,
 'KL 1118 KLM': 35,
 'KL 1136 KLM': 30,
 'KL 1608 KLM': 25,
 'KL 1706 KLM': 20,
 'KL 1834 KLM': 15,
 'KL 980 KLM': 10}

In [None]:
sol2 = csp2.getSolutions()