
This notebook defines an agreed algorithm for unconditional Q forecasting.

In [1]:
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Exogenous Inputs to Q Forecast

## Timeframes

The booking horizon is divided into timeframes of arbitrary, not necessarily equal,
length. Without loss of generality, we assume each timeframe is a whole number of 
days. Each timeframe can be identified by a `DCP_INDEX`, which starts at 0 and increases
by one sequentially through the timeframes, or by a `TF` identity, which gives the
number of days from departure at the beginning of the timeframe.


In [2]:
timeframes = pd.Series([21, 14, 7], name="TF").rename_axis(index="DCP_INDEX")
timeframes.to_frame()

Unnamed: 0_level_0,TF
DCP_INDEX,Unnamed: 1_level_1
0,21
1,14
2,7
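
As a small illustration of the two identifiers, the sketch below recovers the length of each timeframe in days from the `TF` values. It assumes the final timeframe runs all the way to departure (day 0), which the text above does not state explicitly; `next_start` and `timeframe_days` are names introduced only for this example.

```python
import pandas as pd

timeframes = pd.Series([21, 14, 7], name="TF").rename_axis(index="DCP_INDEX")

# Start of the following timeframe; assumption: the last one ends at departure (0)
next_start = timeframes.shift(-1, fill_value=0)

# Length of each timeframe in days, indexed by DCP_INDEX
timeframe_days = (timeframes - next_start).rename("DAYS")
print(timeframe_days.to_frame())
```

With the timeframes used in this notebook, every timeframe is one week long.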


## Frat5 Curve

This curve defines, for each timeframe, the ratio of a fare to the lowest fare at which
half of customers would be willing to buy up to that higher fare.


In [3]:
frat_5_curve = pd.Series([1.20, 1.63, 2.83], index=timeframes, name="FRAT5")
frat_5_curve.to_frame()

Unnamed: 0_level_0,FRAT5
TF,Unnamed: 1_level_1
21,1.2
14,1.63
7,2.83


## Max Cap

The "max cap" is a limiting parameter that controls the maximum level of "Q" inflation for any 
unit of actual sales.  Under Q forecasting, the Frat5 value in early timeframes is generally low
enough that a single sale of a high-value fare implies hundreds or thousands of non-purchasing
customers.  The cap limits the inflation applied to those high-value sales, so the forecast
is not skewed by a very small number of high-value observations.

In [4]:
max_cap = 10
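
To see why the cap matters, the sketch below computes the inflation implied by a single Y0 sale in the first timeframe, with and without the cap. It uses the exponential sellup form that appears later in this notebook (`supc` and `sellup_prob_func`); the variable names are chosen for this example only.

```python
import numpy as np

max_cap = 10
frat5 = 1.20                        # Frat5 value for the TF=21 timeframe
top_fare, bottom_fare = 500, 150    # Y0 and Y5 prices

# Sellup probability of the top fare, relative to the bottom fare
sellup_constant = np.log(2) / (frat5 - 1)
p_top = np.exp(-sellup_constant * (top_fare / bottom_fare - 1))

uncapped = 1.0 / p_top              # bottom-class customers implied by one sale
capped = min(uncapped, max_cap)     # value actually used once the cap applies
print(f"sellup prob {p_top:.6f}, uncapped {uncapped:.0f}, capped {capped}")
```

Without the cap, one Y0 sale in the first timeframe would stand in for several thousand Q-equivalent customers; with `max_cap = 10` it counts as ten.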

## Fare Classes and Prices

In [5]:
fares = pd.Series(
    {
        "Y0": 500,
        "Y1": 400,
        "Y2": 300,
        "Y3": 225,
        "Y4": 175,
        "Y5": 150,
    },
    name="PRICE",
).rename_axis(index="FARECLASS")

assert fares.is_monotonic_decreasing

fares.to_frame()

Unnamed: 0_level_0,PRICE
FARECLASS,Unnamed: 1_level_1
Y0,500
Y1,400
Y2,300
Y3,225
Y4,175
Y5,150


## Sales History

We presume we have recorded historical sales by fare class for 26 prior 
sampled days. We also have recorded whether each fare was available
for sale or not, so we can differentiate two different zero-sales states:
the fare class was not available, or it was available
but we simply failed to sell it.

Q forecasting assumes a fully fenceless/unrestricted market, 
with the (optional) exception of advance purchase restrictions, which
apply to the time of purchase but do not otherwise differentiate fare
classes.  In this environment, there will theoretically be no sales 
above the lowest available fare class at any moment, as there is no 
reason for any customer to purchase a fare class higher than the 
lowest (least expensive) available.  In practice, sometimes "stuff happens"
and other fare classes somehow end up getting sold, but we ignore this 
for simulation. 

In this example, the Y5 class has an AP restriction at 14 days (i.e. after
the first timeframe) and the Y4 class has an AP restriction at 7 days 
(i.e. after the second timeframe).

For ease of exposition in this notebook, we commingle the sales and closure data 
here into one data structure, indicating a closure with X and a number of sales
with an integer, so that "0" means we could have sold a fare being offered but
did not, and "X" means the fare was not offered (and thus never sold).
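
The encoding can be illustrated on a single made-up sample day. The sketch below splits the combined structure into sales and closures, finds the lowest open fare class in each timeframe, and checks that all sales sit in that class, as the fenceless assumption requires. The numbers and the name `lowest_open` are illustrative, not taken from the histories used below.

```python
import numpy as np
import pandas as pd

X = np.nan  # closure marker, as in this notebook

# One sample day: rows are fare classes (most to least expensive),
# columns are timeframes. Values are illustrative only.
raw = pd.DataFrame(
    {21: [0, 0, 7], 14: [0, 1, X], 7: [2, X, X]},
    index=pd.Index(["Y3", "Y4", "Y5"], name="FARECLASS"),
).rename_axis(columns="TF")

closures = raw.isna()                # True where the fare was not offered
sales = raw.fillna(0).astype(int)    # closed cells carry zero sales

# Lowest open class per timeframe: first open row when scanning from the bottom
lowest_open = (~closures).iloc[::-1].idxmax()

# In a fenceless market, all sales should sit in that class
for tf, cls in lowest_open.items():
    assert sales[tf].drop(cls).sum() == 0
print(lowest_open)
```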


In [6]:
# some tools for working with the data

X = np.nan


def format_sales_and_closure(**raw_sales):
    """Convert raw sales data to separate sales and closure dataframes."""
    sales = pd.concat(
        {
            k: pd.DataFrame(v)
            .rename_axis(index="SAMPLE", columns="TF")
            .rename(columns=timeframes)
            for k, v in raw_sales.items()
        },
        names=["FARECLASS"],
    )
    closures = sales.isna()
    sales = sales.fillna(0).astype(int)

    # check closures are ordered consistent with monotonicity rules,
    # so that the closure of any fare class in any timeframe requires
    # the closure of all less expensive fare classes in the same timeframe
    assert (closures.loc["Y5"] >= closures.loc["Y4"]).all(axis=None)
    assert (closures.loc["Y4"] >= closures.loc["Y3"]).all(axis=None)
    assert (closures.loc["Y3"] >= closures.loc["Y2"]).all(axis=None)
    assert (closures.loc["Y2"] >= closures.loc["Y1"]).all(axis=None)
    assert (closures.loc["Y1"] >= closures.loc["Y0"]).all(axis=None)

    return sales, closures

In [7]:
# Clean Sales Data
#
# This sales history represents a scenario where all sales are made at the
# lowest available fare class, and the identity of the lowest fare class
# changes only at DCPs and not in the middle of a timeframe.  This is an
# artificial construct applied here to simplify the exposition for a base
# algorithm.

# fmt: off

sales_clean, closures_clean = format_sales_and_closure(
    Y5=[
        [ 7, X, X],  # 00
        [ 7, X, X],  # 01
        [ 6, X, X],  # 02
        [ X, X, X],  # 03
        [19, X, X],  # 04
        [ X, X, X],  # 05
        [11, X, X],  # 06
        [ X, X, X],  # 07
        [17, X, X],  # 08
        [ 8, X, X],  # 09
        [ X, X, X],  # 10
        [23, X, X],  # 11
        [28, X, X],  # 12
        [17, X, X],  # 13
        [13, X, X],  # 14
        [ X, X, X],  # 15
        [11, X, X],  # 16
        [18, X, X],  # 17
        [ X, X, X],  # 18
        [ X, X, X],  # 19
        [ X, X, X],  # 20
        [ X, X, X],  # 21
        [ X, X, X],  # 22
        [ X, X, X],  # 23
        [ X, X, X],  # 24
        [ X, X, X],  # 25
    ],
    Y4=[
        [0, 1, X],  # 00
        [0, 1, X],  # 01
        [0, 1, X],  # 02
        [3, 1, X],  # 03
        [0, 2, X],  # 04
        [5, 1, X],  # 05
        [0, 1, X],  # 06
        [6, 2, X],  # 07
        [0, 1, X],  # 08
        [0, 1, X],  # 09
        [9, X, X],  # 10
        [0, 1, X],  # 11
        [0, X, X],  # 12
        [0, 1, X],  # 13
        [0, 2, X],  # 14
        [4, X, X],  # 15
        [0, 2, X],  # 16
        [0, 1, X],  # 17
        [0, 2, X],  # 18
        [0, 1, X],  # 19
        [0, 1, X],  # 20
        [9, 2, X],  # 21
        [0, 1, X],  # 22
        [X, X, X],  # 23
        [0, 1, X],  # 24
        [X, X, X],  # 25
    ],
    Y3=[
        [0, 0, 1],  # 00
        [0, 0, 2],  # 01
        [0, 0, 0],  # 02
        [0, 0, 1],  # 03
        [0, 0, X],  # 04
        [0, 0, 1],  # 05
        [0, 0, 3],  # 06
        [0, 0, 2],  # 07
        [0, 0, 1],  # 08
        [0, 0, 2],  # 09
        [0, 1, 3],  # 10
        [0, 0, 1],  # 11
        [0, 5, X],  # 12
        [0, 0, 1],  # 13
        [0, 0, 2],  # 14
        [0, 2, 1],  # 15
        [0, 0, 1],  # 16
        [0, 0, X],  # 17
        [0, 0, X],  # 18
        [0, 0, 1],  # 19
        [0, 0, X],  # 20
        [0, 0, X],  # 21
        [0, 0, 0],  # 22
        [X, X, 1],  # 23
        [0, 0, X],  # 24
        [4, 4, 1],  # 25
    ],
    Y2=[
        [0, 0, 0],  # 00
        [0, 0, 0],  # 01
        [0, 0, 0],  # 02
        [0, 0, 0],  # 03
        [0, 0, X],  # 04
        [0, 0, 0],  # 05
        [0, 0, 0],  # 06
        [0, 0, 0],  # 07
        [0, 0, 0],  # 08
        [0, 0, 0],  # 09
        [0, 0, 0],  # 10
        [0, 0, 0],  # 11
        [0, 0, X],  # 12
        [0, 0, 0],  # 13
        [0, 0, 0],  # 14
        [0, 0, 0],  # 15
        [0, 0, 0],  # 16
        [0, 0, X],  # 17
        [0, 0, 1],  # 18
        [0, 0, 0],  # 19
        [0, 0, 2],  # 20
        [0, 0, 1],  # 21
        [0, 0, 0],  # 22
        [1, 0, 0],  # 23
        [0, 0, 1],  # 24
        [0, 0, 0],  # 25
    ],
    Y1=[
        [0, 0, 0],  # 00
        [0, 0, 0],  # 01
        [0, 0, 0],  # 02
        [0, 0, 0],  # 03
        [0, 0, 1],  # 04
        [0, 0, 0],  # 05
        [0, 0, 0],  # 06
        [0, 0, 0],  # 07
        [0, 0, 0],  # 08
        [0, 0, 0],  # 09
        [0, 0, 0],  # 10
        [0, 0, 0],  # 11
        [0, 0, X],  # 12
        [0, 0, 0],  # 13
        [0, 0, 0],  # 14
        [0, 0, 0],  # 15
        [0, 0, 0],  # 16
        [0, 0, 1],  # 17
        [0, 0, 0],  # 18
        [0, 0, 0],  # 19
        [0, 0, 0],  # 20
        [0, 0, 0],  # 21
        [0, 0, 0],  # 22
        [0, 0, 0],  # 23
        [0, 0, 0],  # 24
        [0, 0, 0],  # 25
    ],
    Y0=[
        [0, 0, 0],  # 00
        [0, 0, 0],  # 01
        [0, 0, 0],  # 02
        [0, 0, 0],  # 03
        [0, 0, 0],  # 04
        [0, 0, 0],  # 05
        [0, 0, 0],  # 06
        [0, 0, 0],  # 07
        [0, 0, 0],  # 08
        [0, 0, 0],  # 09
        [0, 0, 0],  # 10
        [0, 0, 0],  # 11
        [0, 0, 1],  # 12
        [0, 0, 0],  # 13
        [0, 0, 0],  # 14
        [0, 0, 0],  # 15
        [0, 0, 0],  # 16
        [0, 0, 0],  # 17
        [0, 0, 0],  # 18
        [0, 0, 0],  # 19
        [0, 0, 0],  # 20
        [0, 0, 0],  # 21
        [0, 0, 0],  # 22
        [0, 0, 0],  # 23
        [0, 0, 0],  # 24
        [0, 0, 0],  # 25
    ],
)

# fmt: on

In [8]:
# Heavy Sales Data
#
# This sales history represents a scenario where all sales are made at the
# lowest available fare class, and the identity of the lowest fare class
# changes only at DCPs and not in the middle of a timeframe.  This is an
# artificial construct applied here to simplify the exposition for a base
# algorithm.  In addition, in every timeframe of every sample, there is at
# least one sale in the lowest available fare class (i.e. no sample has zero
# total sales in any timeframe).  This scenario is used for some technical
# diagnostics and is not a realistic scenario.

# fmt: off

sales_heavy, closures_heavy = format_sales_and_closure(
    Y5=[
        [ 7, X, X],  # 00
        [ 7, X, X],  # 01
        [ 6, X, X],  # 02
        [ X, X, X],  # 03
        [19, X, X],  # 04
        [ X, X, X],  # 05
        [11, X, X],  # 06
        [ X, X, X],  # 07
        [17, X, X],  # 08
        [ 8, X, X],  # 09
        [ X, X, X],  # 10
        [23, X, X],  # 11
        [28, X, X],  # 12
        [17, X, X],  # 13
        [13, X, X],  # 14
        [ X, X, X],  # 15
        [11, X, X],  # 16
        [18, X, X],  # 17
        [ X, X, X],  # 18
        [ X, X, X],  # 19
        [ X, X, X],  # 20
        [ X, X, X],  # 21
        [ X, X, X],  # 22
        [ X, X, X],  # 23
        [ X, X, X],  # 24
        [ X, X, X],  # 25
    ],
    Y4=[
        [0, 1, X],  # 00
        [0, 1, X],  # 01
        [0, 1, X],  # 02
        [3, 1, X],  # 03
        [0, 2, X],  # 04
        [5, 1, X],  # 05
        [0, 1, X],  # 06
        [6, 2, X],  # 07
        [0, 1, X],  # 08
        [0, 1, X],  # 09
        [9, X, X],  # 10
        [0, 1, X],  # 11
        [0, X, X],  # 12
        [0, 1, X],  # 13
        [0, 2, X],  # 14
        [4, X, X],  # 15
        [0, 2, X],  # 16
        [0, 1, X],  # 17
        [3, 2, X],  # 18
        [2, 1, X],  # 19
        [4, 1, X],  # 20
        [9, 2, X],  # 21
        [2, 1, X],  # 22
        [X, X, X],  # 23
        [1, 1, X],  # 24
        [X, X, X],  # 25
    ],
    Y3=[
        [0, 0, 1],  # 00
        [0, 0, 2],  # 01
        [0, 0, 3],  # 02
        [0, 0, 1],  # 03
        [0, 0, X],  # 04
        [0, 0, 1],  # 05
        [0, 0, 3],  # 06
        [0, 0, 2],  # 07
        [0, 0, 1],  # 08
        [0, 0, 2],  # 09
        [0, 1, 3],  # 10
        [0, 0, 1],  # 11
        [0, 5, X],  # 12
        [0, 0, 1],  # 13
        [0, 0, 2],  # 14
        [0, 2, 1],  # 15
        [0, 0, 1],  # 16
        [0, 0, X],  # 17
        [0, 0, X],  # 18
        [0, 0, 1],  # 19
        [0, 0, X],  # 20
        [0, 0, X],  # 21
        [0, 0, 1],  # 22
        [X, X, 1],  # 23
        [0, 0, X],  # 24
        [4, 4, 1],  # 25
    ],
    Y2=[
        [0, 0, 0],  # 00
        [0, 0, 0],  # 01
        [0, 0, 0],  # 02
        [0, 0, 0],  # 03
        [0, 0, X],  # 04
        [0, 0, 0],  # 05
        [0, 0, 0],  # 06
        [0, 0, 0],  # 07
        [0, 0, 0],  # 08
        [0, 0, 0],  # 09
        [0, 0, 0],  # 10
        [0, 0, 0],  # 11
        [0, 0, X],  # 12
        [0, 0, 0],  # 13
        [0, 0, 0],  # 14
        [0, 0, 0],  # 15
        [0, 0, 0],  # 16
        [0, 0, X],  # 17
        [0, 0, 1],  # 18
        [0, 0, 0],  # 19
        [0, 0, 2],  # 20
        [0, 0, 1],  # 21
        [0, 0, 0],  # 22
        [1, 2, 0],  # 23
        [0, 0, 1],  # 24
        [0, 0, 0],  # 25
    ],
    Y1=[
        [0, 0, 0],  # 00
        [0, 0, 0],  # 01
        [0, 0, 0],  # 02
        [0, 0, 0],  # 03
        [0, 0, 1],  # 04
        [0, 0, 0],  # 05
        [0, 0, 0],  # 06
        [0, 0, 0],  # 07
        [0, 0, 0],  # 08
        [0, 0, 0],  # 09
        [0, 0, 0],  # 10
        [0, 0, 0],  # 11
        [0, 0, X],  # 12
        [0, 0, 0],  # 13
        [0, 0, 0],  # 14
        [0, 0, 0],  # 15
        [0, 0, 0],  # 16
        [0, 0, 1],  # 17
        [0, 0, 0],  # 18
        [0, 0, 0],  # 19
        [0, 0, 0],  # 20
        [0, 0, 0],  # 21
        [0, 0, 0],  # 22
        [0, 0, 0],  # 23
        [0, 0, 0],  # 24
        [0, 0, 0],  # 25
    ],
    Y0=[
        [0, 0, 0],  # 00
        [0, 0, 0],  # 01
        [0, 0, 0],  # 02
        [0, 0, 0],  # 03
        [0, 0, 0],  # 04
        [0, 0, 0],  # 05
        [0, 0, 0],  # 06
        [0, 0, 0],  # 07
        [0, 0, 0],  # 08
        [0, 0, 0],  # 09
        [0, 0, 0],  # 10
        [0, 0, 0],  # 11
        [0, 0, 1],  # 12
        [0, 0, 0],  # 13
        [0, 0, 0],  # 14
        [0, 0, 0],  # 15
        [0, 0, 0],  # 16
        [0, 0, 0],  # 17
        [0, 0, 0],  # 18
        [0, 0, 0],  # 19
        [0, 0, 0],  # 20
        [0, 0, 0],  # 21
        [0, 0, 0],  # 22
        [0, 0, 0],  # 23
        [0, 0, 0],  # 24
        [0, 0, 0],  # 25
    ],
)

# fmt: on

assert sales_heavy.groupby("SAMPLE").sum().min(axis=None) > 0

In [9]:
# check that sales data matches other algorithm examples

import import_ipynb
import importlib
cf = importlib.import_module("conditional-forecasting")
pd.testing.assert_frame_equal(cf.sales_clean, sales_clean)
pd.testing.assert_frame_equal(cf.sales_heavy, sales_heavy)

Forecast Mean: [12.48363676  1.83805027  1.6379777 ]
Fcst Variance: [33.811018    1.19089358  0.5034093 ]
Forecast Mean: [11.6605633   1.9381824   1.65515212]
Fcst Variance: [34.24086023  1.19706163  0.50358652]
Forecast Mean: [10.83738941  2.04752026  1.68058334]
Fcst Variance: [35.53059669  1.21788619  0.50449993]
Forecast Mean: [12.26926131  1.906933    1.66088499]
Fcst Variance: [33.84017763  1.1938125   0.50372458]
Forecast Mean: [12.48363676  1.83805027  1.6379777 ]
Fcst Variance: [35.15717185  1.30627819  0.61879392]
Forecast Mean: [13.05507655  1.9520325   1.7559949 ]
Fcst Variance: [39.93497886  1.23429957  0.46775185]


In [10]:
# Select Scenario for Notebook
sales, closures = sales_clean, closures_clean

# Calculated Values

In [11]:
def supc(t):
    """Sellup constant implied by Frat5 value(s) `t`: ln(2) / (t - 1)."""
    return -np.log(0.5) / (t - 1)


supc(frat_5_curve)

TF
21    3.465736
14    1.100234
7     0.378769
Name: FRAT5, dtype: float64
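
The form of `supc` follows if sellup is assumed to decay exponentially in the fare ratio, which is how `sellup_prob_func` below uses it. Writing $r$ for the ratio of a fare to the lowest fare and $c_t$ for the sellup constant in timeframe $t$ (symbols introduced here for illustration), the Frat5 definition pins down the constant:

$$
P_t(r) = e^{-c_t (r - 1)}, \qquad P_t(\mathrm{Frat5}_t) = \tfrac{1}{2}
\;\Longrightarrow\;
c_t = \frac{\ln 2}{\mathrm{Frat5}_t - 1} = \frac{-\ln(0.5)}{\mathrm{Frat5}_t - 1}.
$$

For the first timeframe, $\ln 2 / (1.20 - 1) \approx 3.4657$, matching the output above.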

In [12]:
def outer_product(s1: pd.Series, s2: pd.Series):
    """Compute the outer product of two Series as a DataFrame."""
    return pd.DataFrame(np.outer(s1, s2), index=s1.index, columns=s2.index).rename_axis(
        index=s1.index.name, columns=s2.index.name
    )

### Sellup Probability

The sellup probability starts with a customer who would definitely purchase the bottom 
fare class if it were available.  For each fare class that might be the lowest currently
available, it gives the probability that this customer would purchase that fare class
instead.  So, for the bottom class, the value is 1.0,
while for progressively higher-priced fares the value drops.  This sellup probability
can be computed from the Frat5 value (or other sellup computation) and the list of
all possible fare prices.

In [13]:
def sellup_prob_func(fares, frat_5_curve):
    minimum_fare = fares.min()
    df = outer_product(
        -((fares / minimum_fare) - 1),
        supc(frat_5_curve),
    )
    return np.exp(df)


sellup_prob = sellup_prob_func(fares, frat_5_curve)
sellup_prob

TF,21,14,7
FARECLASS,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Y0,0.000308,0.076749,0.413212
Y1,0.0031,0.159818,0.53191
Y2,0.03125,0.332793,0.684704
Y3,0.176777,0.576882,0.827468
Y4,0.561231,0.832458,0.938823
Y5,1.0,1.0,1.0
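
As a check on a single cell of this table, the sketch below recomputes the sellup probability for Y3 in the TF=14 timeframe by hand, using only the Frat5 value and the two fares involved. The variable names are introduced for this example.

```python
import numpy as np

frat5_tf14 = 1.63               # Frat5 value for TF=14
fare_y3, fare_y5 = 225, 150     # Y3 price and the lowest (Y5) price

sellup_constant = -np.log(0.5) / (frat5_tf14 - 1)   # same form as supc()
fare_ratio = fare_y3 / fare_y5                      # 1.5
p_y3 = np.exp(-sellup_constant * (fare_ratio - 1))

# At a fare ratio equal to Frat5, the probability is one half by construction
p_at_frat5 = np.exp(-sellup_constant * (frat5_tf14 - 1))

print(round(p_y3, 6), p_at_frat5)
```

The first value should agree with the Y3 entry in the TF=14 column above, up to rounding.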


### Net Sellup Probability

The net sellup probability transforms the sellup probability, expressing the
fraction of customers in any timeframe who would be expected to purchase a 
given fare class but *not* anything more expensive.  Put another way, it is
the probability of losing the sale if you close a particular fare class. 

In [14]:
def net_sellup_prob_func(fares, frat_5_curve):
    """Differences in sellup rates."""
    # compute from the arguments rather than relying on the global `sellup_prob`
    sellup_prob = sellup_prob_func(fares, frat_5_curve)
    differences = sellup_prob.diff(axis=0)
    # use fillna, to set the values of the top fare class
    # equal to their gross values (i.e. diff from zero)
    return differences.fillna(sellup_prob)


net_sellup_prob = net_sellup_prob_func(fares, frat_5_curve)
net_sellup_prob

TF,21,14,7
FARECLASS,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Y0,0.000308,0.076749,0.413212
Y1,0.002793,0.083068,0.118698
Y2,0.02815,0.172976,0.152794
Y3,0.145527,0.244089,0.142765
Y4,0.384454,0.255576,0.111355
Y5,0.438769,0.167542,0.061177
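
Because each net value is the difference between adjacent sellup probabilities, the net values in any timeframe telescope to the bottom-class sellup probability, which is 1. The sketch below shows this for TF=7, with the sellup probabilities copied (rounded) from the sellup table above; `sellup_tf7` and `net_tf7` are names used only here.

```python
import pandas as pd

# Sellup probabilities for TF=7, copied (rounded) from the sellup table
sellup_tf7 = pd.Series(
    {"Y0": 0.413212, "Y1": 0.531910, "Y2": 0.684704,
     "Y3": 0.827468, "Y4": 0.938823, "Y5": 1.0},
).rename_axis("FARECLASS")

# Difference from the next more expensive class;
# the top class has no predecessor, so it keeps its gross value
net_tf7 = sellup_tf7.diff().fillna(sellup_tf7)

print(net_tf7.round(6))
print("total:", round(net_tf7.sum(), 6))
```

Every customer who would buy the bottom class is therefore assigned to exactly one "highest class they would still buy".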


### KI fare adjustment


In [15]:
def fare_adj_ki_func(sellup_prob, fares, scale_factor=1):
    """Fare adjustment using KI algorithm."""
    df = sellup_prob.mul(fares, axis=0)
    df = df.diff().div(sellup_prob.diff())
    df = df.T.fillna(fares).T
    if scale_factor != 1:
        df = df.mul(scale_factor).add(fares * (1 - scale_factor), axis=0)
    return df


fare_adj_ki = fare_adj_ki_func(sellup_prob, fares)
fare_adj_ki

TF,21,14,7
FARECLASS,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Y0,500.0,500.0,500.0
Y1,388.986018,307.607025,51.878172
Y2,288.986018,207.607025,-48.121828
Y3,208.894707,122.744306,-134.702735
Y4,152.009402,62.140631,-196.545717
Y5,118.022407,25.783507,-233.651297
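
Read directly from `fare_adj_ki_func` above, the adjustment can be written as follows, where $f_k$ is the price of fare class $k$, $P_{k,t}$ is its sellup probability in timeframe $t$, classes are indexed from most expensive ($k = 0$) to least expensive, and $s$ is `scale_factor`. The symbols are introduced here for illustration.

$$
f^{\mathrm{adj}}_{k,t} = \frac{f_k P_{k,t} - f_{k-1} P_{k-1,t}}{P_{k,t} - P_{k-1,t}},
\qquad
f^{\mathrm{adj}}_{0,t} = f_0,
\qquad
f^{\mathrm{adj},s}_{k,t} = s\, f^{\mathrm{adj}}_{k,t} + (1 - s)\, f_k .
$$

For example, for Y4 at TF=14 the rounded table values give $(175 \times 0.832458 - 225 \times 0.576882) / (0.832458 - 0.576882) \approx 62.14$, in line with the output above.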


# Q Forecast

The "Q" forecast is a forecast of how much demand would have been observed if the lowest
fare class were always open.

In [16]:
q_inflation_factors = np.minimum(1.0 / sellup_prob, max_cap)
q_inflation_factors

TF,21,14,7
FARECLASS,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Y0,10.0,10.0,2.420065
Y1,10.0,6.257137,1.880018
Y2,10.0,3.004868,1.460486
Y3,5.656854,1.733455,1.208506
Y4,1.781797,1.201262,1.065163
Y5,1.0,1.0,1.0


## Q-Equivalent History

The Q demand history is constructed by inflating priceable sales in each fare class, 
and aggregating the result by summing over all fare classes.


In [17]:
q_history = (
    q_inflation_factors * sales
).groupby("SAMPLE").sum()

q_history

TF,21,14,7
SAMPLE,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
0,7.0,1.201262,1.208506
1,7.0,1.201262,2.417011
2,6.0,1.201262,0.0
3,5.345392,1.201262,1.208506
4,19.0,2.402523,1.880018
5,8.908987,1.201262,1.208506
6,11.0,1.201262,3.625517
7,10.690785,2.402523,2.417011
8,17.0,1.201262,1.208506
9,8.0,1.201262,2.417011
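
To make the inflation step concrete, the sketch below rebuilds the TF=21 entries for samples 0 and 3 of the clean history in long form. The inflation factors are copied (rounded) from the table above, only the three cheapest classes are listed because the others have no TF=21 sales in these samples, and the names `factors_tf21`, `sales_long` and `q_equiv` are specific to this example.

```python
import pandas as pd

# Capped inflation factors for TF=21, copied (rounded) from the table above
factors_tf21 = pd.Series({"Y3": 5.656854, "Y4": 1.781797, "Y5": 1.0})

# TF=21 sales for samples 0 and 3 of the clean history, in long form
sales_long = pd.DataFrame(
    {
        "SAMPLE": [0, 0, 0, 3, 3, 3],
        "FARECLASS": ["Y3", "Y4", "Y5", "Y3", "Y4", "Y5"],
        "SALES": [0, 0, 7, 0, 3, 0],
    }
)

# Inflate each sale by the factor for its fare class, then sum per sample
sales_long["Q_EQUIV"] = sales_long["SALES"] * sales_long["FARECLASS"].map(factors_tf21)
q_equiv = sales_long.groupby("SAMPLE")["Q_EQUIV"].sum()
print(q_equiv)
```

Sample 0 sold seven Y5 seats, which count one for one. Sample 3 sold three Y4 seats while Y5 was closed, so each counts as roughly 1.78 Q-equivalent customers.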


The Q history is detruncated only in instances where the entire PATH is 
sold out (i.e. when the highest fare class of the path is closed).  That
is assumed not to happen in this example, but if it did, the detruncation
would be applied to `q_history` at this point.

Next, within each timeframe, we compute the mean and variance of the Q demand.

In [18]:
q_history.mean()

TF
21    10.837389
14     2.047520
7      1.680583
dtype: float64

In [19]:
q_history.var()

TF
21    61.227067
14     3.420050
7      0.805810
dtype: float64

## Q Forecast Partitioned

The mean and variance are partitioned across the various fare classes on the path.

In [20]:
q_partitioned_mean_raw = net_sellup_prob * q_history.mean()
q_partitioned_mean_raw

TF,21,14,7
FARECLASS,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Y0,0.003334,0.157146,0.694437
Y1,0.030267,0.170084,0.199481
Y2,0.305068,0.354171,0.256783
Y3,1.577129,0.499777,0.239928
Y4,4.166481,0.523297,0.187141
Y5,4.75511,0.343045,0.102813


In [21]:
q_partitioned_var_raw = net_sellup_prob * q_history.var()
q_partitioned_var_raw

TF,21,14,7
FARECLASS,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Y0,0.018833,0.262486,0.332971
Y1,0.170995,0.284098,0.095648
Y2,1.723518,0.591586,0.123123
Y3,8.910173,0.834797,0.115041
Y4,23.539011,0.874082,0.089731
Y5,26.864538,0.573001,0.049297
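
Since the net sellup probabilities in a timeframe sum to one, partitioning in this way simply redistributes the total across fare classes. The sketch below checks this for the TF=21 mean, using rounded values copied from the net sellup table and from `q_history.mean()` earlier in this notebook; the variable names are illustrative.

```python
import pandas as pd

# Net sellup probabilities for TF=21, copied (rounded) from the net sellup table
net_tf21 = pd.Series(
    {"Y0": 0.000308, "Y1": 0.002793, "Y2": 0.028150,
     "Y3": 0.145527, "Y4": 0.384454, "Y5": 0.438769},
)
q_mean_tf21 = 10.837389  # q_history.mean() for TF=21

# Each class receives its share of the Q-equivalent mean
partitioned = net_tf21 * q_mean_tf21
print(partitioned.round(6))
print("sum of parts:", round(partitioned.sum(), 4))
```

The parts add back up to the Q-equivalent mean, apart from rounding in the copied inputs.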


These mean and variance levels are zeroed out wherever the adjusted fare is negative,
and as appropriate for AP closures.

In [22]:
q_partitioned_mean = q_partitioned_mean_raw.where(fare_adj_ki > 0, 0)
q_partitioned_mean.loc["Y5", 14:] = 0 
q_partitioned_mean.loc["Y4", 7:] = 0 
q_partitioned_mean

TF,21,14,7
FARECLASS,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Y0,0.003334,0.157146,0.694437
Y1,0.030267,0.170084,0.199481
Y2,0.305068,0.354171,0.0
Y3,1.577129,0.499777,0.0
Y4,4.166481,0.523297,0.0
Y5,4.75511,0.0,0.0


In [23]:
q_partitioned_var = q_partitioned_var_raw.where(fare_adj_ki > 0, 0)
q_partitioned_var.loc["Y5", 14:] = 0
q_partitioned_var.loc["Y4", 7:] = 0
q_partitioned_var

TF,21,14,7
FARECLASS,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Y0,0.018833,0.262486,0.332971
Y1,0.170995,0.284098,0.095648
Y2,1.723518,0.591586,0.0
Y3,8.910173,0.834797,0.0
Y4,23.539011,0.874082,0.0
Y5,26.864538,0.0,0.0


## Total Forecast to Departure

Take the cumulative sum (backward) for the forecast mean and variance 
to get the total forecast mean and variance to departure.

In [24]:
total_forecast_mean_to_departure = (
    q_partitioned_mean.iloc[:, ::-1].cumsum(axis=1).iloc[:, ::-1]
)
total_forecast_mean_to_departure

TF,21,14,7
FARECLASS,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Y0,0.854917,0.851583,0.694437
Y1,0.399832,0.369565,0.199481
Y2,0.65924,0.354171,0.0
Y3,2.076907,0.499777,0.0
Y4,4.689778,0.523297,0.0
Y5,4.75511,0.0,0.0
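
The reverse-cumsum idiom used above can be read in three steps: flip the timeframe columns so the last timeframe comes first, accumulate, then flip back. The sketch below applies it to two rows of the zeroed partitioned mean, with values copied from the earlier table; `part` and `to_departure` are names used only in this example.

```python
import pandas as pd

# Zeroed partitioned means for two fare classes, copied from the earlier table
part = pd.DataFrame(
    {21: [1.577129, 4.755110], 14: [0.499777, 0.0], 7: [0.0, 0.0]},
    index=pd.Index(["Y3", "Y5"], name="FARECLASS"),
).rename_axis(columns="TF")

# Reverse the columns, accumulate left to right, then restore the order
to_departure = part.iloc[:, ::-1].cumsum(axis=1).iloc[:, ::-1]
print(to_departure)
```

Each cell now holds the demand expected from the start of that timeframe through departure, so the TF=21 column is the total over all three timeframes.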


In [25]:
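total_forecast_var_to_departure = (
    # NOTE: this sums the raw variance (`q_partitioned_var_raw`), not the zeroed
    # `q_partitioned_var`, so unlike the mean it still includes closed classes
    q_partitioned_var_raw.iloc[:, ::-1].cumsum(axis=1).iloc[:, ::-1]
)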
total_forecast_var_to_departure

TF,21,14,7
FARECLASS,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Y0,0.61429,0.595457,0.332971
Y1,0.55074,0.379745,0.095648
Y2,2.438227,0.714709,0.123123
Y3,9.860011,0.949838,0.115041
Y4,24.502824,0.963813,0.089731
Y5,27.486836,0.622298,0.049297


In [26]:
total_forecast_stdev_to_departure = np.sqrt(total_forecast_var_to_departure)
total_forecast_stdev_to_departure

TF,21,14,7
FARECLASS,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Y0,0.783767,0.771658,0.577036
Y1,0.742119,0.616235,0.30927
Y2,1.561482,0.845405,0.350889
Y3,3.140065,0.974596,0.339177
Y4,4.950033,0.98174,0.299551
Y5,5.242789,0.788859,0.222029


# Simulation Study

Here we simulate a choice process that exactly follows the assumptions laid out above, and
measure how well the Q forecast outputs represent the simulated outputs.

In [135]:
tf = 21

In [136]:
mean_arrivals = q_history.mean().loc[tf]
stdev_arrivals = q_history.std().loc[tf]
mean_arrivals, stdev_arrivals

(np.float64(10.837389411695153), np.float64(7.8247726637837856))

In [145]:
prng = np.random.default_rng(42)
random_q_demand = (
    prng.normal(mean_arrivals, stdev_arrivals, size=10_000_000)
    .round()
    .clip(0, None)
    .astype(int)
)
random_q_demand

array([13,  3, 17, ..., 12,  3, 15], shape=(10000000,))

In [146]:
random_q_demand.mean(), random_q_demand.std()

(np.float64(11.1333477), np.float64(7.273886271512957))

In [138]:
remaining_demand = random_q_demand.copy()
remaining_prob = 1.0
random_Y_demand = {}

for cls in fares.index:
    sup = net_sellup_prob.loc[cls, tf] / remaining_prob
    random_Y_demand[cls] = prng.binomial(remaining_demand, sup)
    remaining_demand -= random_Y_demand[cls]
    remaining_prob -= net_sellup_prob.loc[cls, tf]
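
The loop above splits each simulated arrival count across fare classes with a sequence of conditional binomial draws, which is a standard way to sample a multinomial split one category at a time. The sketch below repeats the same pattern on a small hypothetical example (three classes `A`, `B`, `C` with made-up probabilities and a fixed arrival count) so the mechanics can be checked in isolation. The clip of the conditional probability to 1.0 is a defensive addition for floating-point rounding and is not part of the cell above.

```python
import numpy as np

prng = np.random.default_rng(0)

# Hypothetical net probabilities that sum to 1 (not this notebook's values)
net_prob = {"A": 0.5, "B": 0.25, "C": 0.25}
total_demand = np.full(100_000, 20)   # 20 arrivals in every replication

remaining_demand = total_demand.copy()
remaining_prob = 1.0
draws = {}
for cls, p in net_prob.items():
    # probability of choosing `cls`, given none of the earlier classes was chosen
    conditional_p = min(p / remaining_prob, 1.0)
    draws[cls] = prng.binomial(remaining_demand, conditional_p)
    remaining_demand = remaining_demand - draws[cls]
    remaining_prob -= p

# every arrival ends up in exactly one class
assert (sum(draws.values()) == total_demand).all()
print({cls: d.mean() for cls, d in draws.items()})
```

The per-class means should land close to the arrival count times each net probability.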

In [139]:
simulated = pd.DataFrame(random_Y_demand)
pd.concat(
    {"mean": simulated.mean(), "var": simulated.var()}, axis=1
).rename_axis(index="FARECLASS")

Unnamed: 0_level_0,mean,var
FARECLASS,Unnamed: 1_level_1,Unnamed: 2_level_1
Y0,0.003404,0.003405
Y1,0.031068,0.0314
Y2,0.31333,0.346401
Y3,1.619737,2.504856
Y4,4.280132,10.452813
Y5,4.885677,12.930894


In [140]:
pd.concat({"mean": q_partitioned_mean[tf], "var":q_partitioned_var[tf]}, axis=1)

Unnamed: 0_level_0,mean,var
FARECLASS,Unnamed: 1_level_1,Unnamed: 2_level_1
Y0,0.003334,0.018833
Y1,0.030267,0.170995
Y2,0.305068,1.723518
Y3,1.577129,8.910173
Y4,4.166481,23.539011
Y5,4.75511,26.864538
