# Bike Share Rebalancing With Mathematical Optimization

Bike share systems have become an effective commuting method globally for everyday urban dwellers as well as tourists.

Citi-Bike in NYC, the largest bike share network, had 1,588 active stations and 25,575 active bikes in July 2022.

Over 3 million rides were completed in July 2022 across NYC, Hoboken, and Jersey City, with around 150,000 active annual members.

During rush hours, many stations have a high demand for bikes, which means their out-flow of bikes is greater than their in-flow.

Meanwhile, other stations have a high demand for docks (riders return their bikes to these stations), which means their in-flow of bikes is greater than their out-flow.

Lack of available bikes or docks in high-demand stations can cause major imbalance in the bike sharing network and result in customer dissatisfaction and lost revenue.

To tackle this problem, bikes are relocated between stations to create a balance between supply and demand.

## Problem Statement and Solution Approach

Using historical Citi-bike data in the NYC and Jersey area during July 2022, we would like to know:
- What is the demand for bikes per hour at each station during the first week of August?
- Knowing the demand, how can we minimize loss of sale?

Loss of sale is caused by a lack of bikes when customers demand them. So, bikes should be transferred from stations with higher in-flow of bikes to those with higher out-flow of bikes.

So, first, the number of bikes to be added to or removed from each station during each hour should be determined. Then, the physical transfer of bikes between stations should be scheduled.

In this notebook, we'll focus on the first part and at the end, discuss how the second part can be solved.
We'll use a mixture of Machine Learning (ML) and Mathematical Optimization (MO) to solve this problem. 

**Solution Approach**
The solution approach consists of two steps:
- **Step 1**: We use the historical Citi-bike data in NYC and Jersey area during July 2022 and use an ML model to predict the number of in-flow and out-flow of bikes per hour at each station for the first week of August. This is done in [predict_bike_flow](predict_bike_flow.ipynb) Notebook.
- **Step 2**: We use an MO model to decide how many bikes should be added to or removed from each station during each hour so that the total loss of sale is minimized.

To ensure that everyone can run the notebook with the Gurobi restricted license, we reduce the size of the data. To achieve that, we focus on the top 50 stations during the morning rush hours (7 am to 9 am).

The top stations are chosen using the PageRank algorithm.
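The notebook does not show how the PageRank ranking is computed. As a rough sketch, assuming stations are treated as nodes and ride counts between them as weighted edges (the `trips` dictionary, the station labels, and the `pagerank` helper below are made up for illustration), a plain power iteration could look like this:

```python
def pagerank(trips, damping=0.85, iters=100):
    """Rank stations from ride counts.

    trips maps (start_station, end_station) -> number of rides (hypothetical data).
    """
    nodes = sorted({station for pair in trips for station in pair})
    n = len(nodes)

    # Total number of rides leaving each station
    out_weight = {node: 0.0 for node in nodes}
    for (src, _dst), rides in trips.items():
        out_weight[src] += rides

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iters):
        # Rank held by stations with no outgoing rides is spread evenly
        dangling = sum(rank[node] for node in nodes if out_weight[node] == 0)
        new_rank = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        # Each station passes its rank along its outgoing rides, in proportion to ride counts
        for (src, dst), rides in trips.items():
            new_rank[dst] += damping * rank[src] * rides / out_weight[src]
        rank = new_rank
    return rank


# Hypothetical ride counts between three stations
trips = {('A', 'B'): 30, ('B', 'A'): 10, ('C', 'A'): 20, ('C', 'B'): 5, ('B', 'C'): 5}
ranks = pagerank(trips)
top_two = sorted(ranks, key=ranks.get, reverse=True)[:2]
print(top_two)
```

In this reading, a station ranks highly when many rides end there, especially rides coming from other highly ranked stations. The actual selection of the top 50 stations may use a different graph definition or library.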

# Install Required Packages

In [None]:
%pip install gurobipy
%pip install pandas

# For Google Colab Only

If you'd like to run the notebook in Google Colab, follow these steps:
- Click on [this link](https://colab.research.google.com/github/decision-spot/bike_share/blob/main/bike_rebalancing.ipynb). This should open the notebook in Google Colab.
- To get all the files, run the following cells to clone the repo and change the current working directory.

In [None]:
!git clone https://github.com/decision-spot/bike_share.git

In [None]:
import os
from google.colab import files
os.chdir('bike_share')

# Import Packages

In [1]:
import datetime
import gurobipy as gp
import pandas as pd
from gurobipy import GRB

# Optimization Problem

## Problem Definition

We want to minimize the total loss of sale. Loss of sale at each station and in each hour can be defined as the difference between the total demand of bikes (number of bikes that start their trip from the station) and total supply of bikes.

Total supply consists of the number of bikes that end their trip at the station, plus the existing bikes at the station (a.k.a. inventory), plus the bikes added to that station in that hour through bike transfers, minus the bikes removed from it.
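As a quick illustration of this definition, the sketch below computes the loss for a single station and hour. The helper `lost_sales` and the numbers are illustrative only and are not part of the notebook's model code.

```python
def lost_sales(demand, arrivals, inventory, added=0, removed=0):
    """Unmet demand at one station in one hour; never negative."""
    supply = arrivals + inventory + added - removed
    return max(0, demand - supply)


# 11 riders want a bike, 10 bikes arrive, the station starts empty, no transfers:
# one rider is turned away.
print(lost_sales(demand=11, arrivals=10, inventory=0))  # 1

# Adding two bikes from the reserve covers the shortfall.
print(lost_sales(demand=11, arrivals=10, inventory=0, added=2))  # 0
```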

**Assumptions:**
- Inventory at the beginning of first hour (in our case, hour 7) is zero.
- At any given hour, we have access to a limited number of bikes that can be added to the stations in hope of helping reduce the imbalance without yet transferring the bikes between stations (since this analysis is during morning rush hours).

## Load Required Data

In [2]:
stations = pd.read_csv('top_stations.csv', index_col='station')
stations.head()

station,capacity,lat,lon,region
Cleveland Pl & Spring St,33,40.722104,-73.997249,71
Grand Army Plaza & Central Park S,96,40.764397,-73.973715,71
Broadway & E 14 St,114,40.734546,-73.990741,71
Lafayette St & E 8 St,47,40.730207,-73.991026,71
Norfolk St & Broome St,59,40.717227,-73.988021,71


In [3]:
stations_flow = pd.read_csv('stations_flow.csv')
stations_flow['datetime'] = pd.to_datetime(stations_flow['datetime'])
stations_flow.head()

Unnamed: 0,station,datetime,start_forecast,end_forecast
0,1 Ave & E 62 St,2022-08-01 00:00:00,4.0,5.0
1,1 Ave & E 62 St,2022-08-01 01:00:00,2.0,3.0
2,1 Ave & E 62 St,2022-08-01 02:00:00,4.0,2.0
3,1 Ave & E 62 St,2022-08-01 03:00:00,2.0,2.0
4,1 Ave & E 62 St,2022-08-01 04:00:00,2.0,2.0


The `stations_flow` data contains the predictions for the first 5 days of August 2022, for every hour of the day. Our analysis covers the morning rush hours, from 7 to 9 am. We can also run our MO model daily. For now, we focus only on the first day; at the end, we will show the full model and how it can be run daily.

In [4]:
# Pandas will give a few SettingWithCopyWarning here when new columns are created. 
# They are false alarms. So, we suppress them.
pd.options.mode.chained_assignment = None
morning_flow = stations_flow[stations_flow['datetime'].dt.hour.between(7, 9)]
morning_flow['date'] = morning_flow['datetime'].dt.date
morning_flow['time'] = morning_flow['datetime'].dt.hour
# For now, let's run the MO model for the first date: 08/01/2022
flow_df = morning_flow.loc[morning_flow['date'] == datetime.date(2022, 8, 1)]
flow_df.set_index(['station', 'time'], inplace=True)
flow_df.head()

station,time,datetime,start_forecast,end_forecast,date
1 Ave & E 62 St,7,2022-08-01 07:00:00,11.0,10.0,2022-08-01
1 Ave & E 62 St,8,2022-08-01 08:00:00,14.0,14.0,2022-08-01
1 Ave & E 62 St,9,2022-08-01 09:00:00,17.0,20.0,2022-08-01
1 Ave & E 68 St,7,2022-08-01 07:00:00,17.0,32.0,2022-08-01
1 Ave & E 68 St,8,2022-08-01 08:00:00,22.0,41.0,2022-08-01


## Problem Formulation

Let's define the notation for the MO model. We want to run the model for every hour and every station. So, let's define the following two sets:

**Sets**
- $I\quad$: Set of stations
- $T\quad$: Set of hours 

We also have the following information from `stations` and `flow_df` dataframes:

**Parameters**
- $e_{i,t}\quad$: number of bikes that end their trip at station $i$ at hour $t$ (a.k.a. supply)
- $s_{i,t}\quad$: number of bikes that start their trip at station $i$ at hour $t$ (a.k.a. demand)
- $c_{i}\quad$: capacity of station $i$

We know it's not easy to transfer bikes between stations during rush hours and heavy traffic. To mitigate loss due to unavailable bikes at high-demand stations, we assume there is a small reserve of bikes available at the beginning of each hour that we can allocate to the stations. We denote this reserve by $N$:
- $N\quad$: Number of bikes at hand that we can assign to stations at a given hour

In [5]:
num_bikes = 25  # N: Number of bikes at hand that we can assign to stations at a given hour
station_names = list(stations.index)  # set I
time_rng = morning_flow['time'].drop_duplicates().values  # set T

In [6]:
station_time = flow_df.index  # pair of (i,t) index
start_forecast = flow_df.start_forecast  # s
end_forecast = flow_df.end_forecast  # e
capacity = stations.capacity  # c

## Decision Variables

First of all, we would like to find out how many bikes should be added to or removed from each station during each hour. We denote the number of bikes added by $y_{i,t}$ and the number removed by $z_{i,t}$:
- $y_{i,t}\quad$: number of bikes to be added to station $i$ at hour $t$
- $z_{i,t}\quad$: number of bikes to be removed from station $i$ at hour $t$

Another variable is the inventory of bikes at each station at the beginning of each hour. Since the inventory depends on the values of $y_{i,t}$ and $z_{i,t}$, it is another decision variable. We denote it by $q_{i,t}$:
- $q_{i,t}\quad$: inventory of bikes at station $i$ at the beginning of hour $t$

For simplicity, we assume that there is no inventory at the beginning of the first hour. In our case, this means no initial inventory at 7 am.

The goal of the model is to reduce the lost sale at each station per hour. This value also depends on the decision variables $y_{i,t}$ and $z_{i,t}$ and, as a result, is a decision variable itself. We denote it by $l_{i,t}$:
- $l_{i,t}\quad$: lost sale at station $i$ at hour $t$

With these initial decision variables, we can start the MO model:

In [8]:
mdl = gp.Model('bike_rebalancing')
# Variables

y = mdl.addVars(station_time, lb=0, vtype=GRB.CONTINUOUS, name='y')
# y = mdl.addVars(station_names, time_rng, lb=0, vtype=GRB.CONTINUOUS, name='y')  # alternatively
z = mdl.addVars(station_time, lb=0, vtype=GRB.CONTINUOUS, name='z')
q = mdl.addVars(station_time, lb=0, vtype=GRB.CONTINUOUS, name='q')
l = mdl.addVars(station_time, lb=0, vtype=GRB.CONTINUOUS, name='l')

## Constraints

First, we set lower and upper bounds on the number of bikes that can be added to or removed from a station in each hour.

In each hour, we have $s_{i,t}$ bikes that start their trip from the station and $e_{i,t}$ bikes that end their trip at the station. If $s_{i,t} \ge e_{i,t}$, demand is at least as large as supply. In this case, we may choose to add some bikes to this station to reduce the loss (we can also add nothing). One thing to remember is that we cannot add more bikes than the station's capacity.

On the other hand, if $s_{i,t} \le e_{i,t}$, we may choose to remove some of the excess bikes from that station (we can also remove nothing). However, if $e_{i,t} \ge s_{i,t} + c_i$, the net number of bikes arriving at the station reaches or exceeds the station's capacity. In that case, we must remove some bikes to avoid overflow.

These observations help us define the bounds on $y_{i,t}$ and $z_{i,t}$.
For adding bikes, the bounds are:

\begin{align}
&0 \le y_{i,t} \le c_i &\quad \forall i \in I, t \in T \tag{1}\\
\end{align}

For removing bikes, the upperbound is:

\begin{align}
&z_{i,t} \le \max(0, e_{i,t} - s_{i,t}) &\quad \forall i \in I, t \in T \tag{2}\\
\end{align}

And the lowerbound is:

\begin{align}
&z_{i,t} \ge  e_{i,t} - s_{i,t} - c_i &\quad \forall i \in I, t \in T \tag{3}\\
\end{align}
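To make bounds (2) and (3) concrete, the following sketch computes the allowed range of $z_{i,t}$ for a single station and hour. The helper `removal_bounds` and the sample numbers are illustrative and not part of the notebook's code; the lower bound from (3) is combined here with the non-negativity of $z_{i,t}$.

```python
def removal_bounds(e, s, c):
    """Return (lower, upper) bounds on the bikes removed in one station-hour.

    e: bikes ending their trip at the station
    s: bikes starting their trip at the station
    c: station capacity
    """
    lower = max(0, e - s - c)  # constraint (3), together with z >= 0
    upper = max(0, e - s)      # constraint (2)
    return lower, upper


# More arrivals than departures: removing up to 15 bikes is allowed, none is required.
print(removal_bounds(e=32, s=17, c=62))  # (0, 15)

# More departures than arrivals: nothing can be removed.
print(removal_bounds(e=10, s=11, c=45))  # (0, 0)

# Hypothetical overflow case: at least 20 bikes must be removed.
print(removal_bounds(e=80, s=10, c=50))  # (20, 70)
```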

In [9]:
# UB of y
mdl.addConstrs((y[i, t] <= capacity[i] for i, t in station_time), 'ub_y')

# LB of z
mdl.addConstrs(
    (z[i, t] >= (end_forecast[i, t] - start_forecast[i, t] - capacity[i])
     for i, t in station_time), 'lb_z')
#     for i in station_names for t in time_rng), 'lb_z')

# UB of z
mdl.addConstrs(
        (z[i, t] <= max(0, end_forecast[i, t] - start_forecast[i, t])
         for i, t in station_time), 'ub_z');  # the trailing ";" stops the cell from printing the constraints

Next, we set up the initial inventory (i.e. inventory at hour $t_0$) to be 0. 

\begin{align}
&q_{i,t_0} = 0 &\quad \forall i \in I \tag{4}\\
\end{align}

In [10]:
t0 = 7
# setting up initial inventory
mdl.addConstrs((q[i, t0] == 0 for i in stations.index), name='initial_inv');

Our next constraint is the definition of the inventory at a station at the beginning of an hour.
The inventory at the beginning of an hour is the previous hour's inventory, plus all the bikes that arrived at or were added to the station during the previous hour, minus all the bikes that left or were removed. Since the demand for bikes at a station may exceed the total supply (which causes loss of sale), this quantity can be negative, so we take its maximum with zero to ensure that inventory is non-negative.

\begin{align}
&q_{i,t} = \max(0, e_{i,t-1} + y_{i,t-1} + q_{i,t-1} - s_{i,t-1} - z_{i,t-1}) &\quad \forall i \in I, t \in T/t_0 \tag{5}\\
\end{align}

To write this constraint in Gurobi, we can take advantage of Gurobi's [`max_()`](https://www.gurobi.com/documentation/9.5/refman/py_max_.html) function. This function accepts a list of decision variables, and if desired, a constant. Our constant is 0. But $e_{i,t-1} + y_{i,t-1} + q_{i,t-1} - s_{i,t-1} - z_{i,t-1}$ is a linear expression and not a decision variable. However, the fix is very simple. We can define an auxiliary variable that is equal to the linear expression and then use that auxiliary variable in the inventory's definition. In other words, we first create $a_{i,t}$ as a new decision variable:

- $a_{i,t}\quad$: auxiliary variable

Then, we define $a_{i,t}$ as:

\begin{align}
&a_{i,t} = e_{i,t-1} + y_{i,t-1} + q_{i,t-1} - s_{i,t-1} - z_{i,t-1} &\quad \forall i \in I, t \in T/t_0 \tag{6}\\
\end{align}

Equation 5 can then be simplified to:

\begin{align}
&q_{i,t} = \max(0, a_{i,t}) &\quad \forall i \in I, t \in T/t_0 \tag{7}\\
\end{align}

We replace equation 5 with equations 6 and 7.
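Before encoding this in Gurobi, the recursion can be checked by hand. The sketch below rolls the inventory forward for one station using plain Python; `next_inventory` and the hourly numbers are illustrative assumptions, not output of the model.

```python
def next_inventory(q, e, s, y=0, z=0):
    """Inventory at the start of the next hour.

    q: inventory at the start of the current hour
    e, s: bikes ending / starting their trip in the current hour
    y, z: bikes added / removed in the current hour
    """
    a = e + y + q - s - z  # auxiliary value, may be negative
    return max(0, a)       # inventory cannot be negative


# Illustrative station: (hour, e, s, y, z)
hours = [(7, 7, 9, 7, 0), (8, 5, 18, 8, 0)]

q = 0  # no inventory at the beginning of hour 7
inventory_by_hour = {7: q}
for hour, e, s, y, z in hours:
    q = next_inventory(q, e, s, y, z)
    inventory_by_hour[hour + 1] = q

print(inventory_by_hour)  # {7: 0, 8: 5, 9: 0}
```

At hour 7 the station ends with 7 + 7 + 0 - 9 = 5 bikes, which becomes the inventory at hour 8. At hour 8 the value 5 + 8 + 5 - 18 is exactly 0, so the station starts hour 9 empty.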

In [11]:
a = mdl.addVars(station_time, lb=-GRB.INFINITY, vtype=GRB.CONTINUOUS, name='a')  # auxiliary variable
# definition of auxiliary variable
mdl.addConstrs((a[i, t] == end_forecast[i, t - 1] + y[i, t - 1] + q[i, t - 1]
                - start_forecast[i, t - 1] - z[i, t - 1]
                for i, t in station_time if t != t0), name='aux_def')
# definition of inventory
mdl.addConstrs((q[i, t] == gp.max_(a[i, t], 0) for i, t in station_time if t!=t0), name='inv_def');

The next constraint is an upperbound on the inventory at each station. We need to ensure the inventory does not exceed the capacity of the station.

\begin{align}
&q_{i,t} \le c_i &\quad \forall i \in I, t \in T \tag{8}\\
\end{align}

In [12]:
# UB of inventory
mdl.addConstrs((q[i, t] <= capacity[i] for i, t in station_time), 'ub_inv');

Next, we need to define how the loss of sale is calculated. Loss of sale is the difference between the demand and all the supply of bikes at a station.
- Demand is every bike that leaves a station, in any shape or form: bikes that start their trip there ($s_{i,t}$) and bikes removed by a transfer ($z_{i,t}$).
- Supply is every bike that is available at a station, in any shape or form: bikes that end their trip there ($e_{i,t}$), bikes added by a transfer ($y_{i,t}$), and the beginning inventory ($q_{i,t}$).

Of course, if the supply is greater than the demand, there is no loss. So, we need to ensure that loss only considers non-negative values. This can be achieved by:

\begin{align}
&l_{i,t} = \max(0, s_{i,t} + z_{i,t} - e_{i,t} - y_{i,t} - q_{i,t}) &\quad \forall i \in I, t \in T \tag{9}\\
\end{align}

In [13]:
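# loss definition: l >= expression together with l >= 0 (its lower bound).
# Because l is minimized in the objective, this is equivalent to the max in equation 9.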
mdl.addConstrs(
    (l[i, t] >= start_forecast[i, t] + z[i, t]
     - end_forecast[i, t] - y[i, t] - q[i, t]
     for i, t in station_time), 'loss_def');

We assumed that we have a small reserve of bikes at the beginning of each hour to allocate to stations. This limit is on the total number of bikes added to the stations.

\begin{align}
&\sum_{i} y_{i,t} \le N &\quad \forall t \in T \tag{10}\\
\end{align}

In [14]:
# limit on number of bikes added
mdl.addConstrs((y.sum('*', t) <= num_bikes for t in time_rng), name='total_bikes');

## Objective

The objective is to minimize total loss of sale. 

$$\min \sum_{i,t} l_{i,t}$$

In [15]:
mdl.setObjective(l.sum(), GRB.MINIMIZE)

We can now tell Gurobi that the model is complete and it can solve the problem.

In [16]:
mdl.optimize()

Gurobi Optimizer version 9.5.1 build v9.5.1rc2 (win64)
Thread count: 4 physical cores, 8 logical processors, using up to 8 threads
Optimize a model with 813 rows, 675 columns and 1620 nonzeros
Model fingerprint: 0xecda0867
Model has 90 general constraints
Variable types: 675 continuous, 0 integer (0 binary)
Coefficient statistics:
  Matrix range     [1e+00, 1e+00]
  Objective range  [1e+00, 1e+00]
  Bounds range     [0e+00, 0e+00]
  RHS range        [1e+00, 1e+02]
Presolve removed 750 rows and 580 columns
Presolve time: 0.00s
Presolved: 63 rows, 95 columns, 201 nonzeros
Variable types: 81 continuous, 14 integer (14 binary)
Found heuristic solution: objective 40.0000000

Root relaxation: objective 1.368348e+01, 62 iterations, 0.00 seconds (0.00 work units)

    Nodes    |    Current Node    |     Objective Bounds      |     Work
 Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time

     0     0   13.68348    0    9   40.00000   13.68348  65.8%     -    0s
     0 

## Post Processing

In [17]:
df = pd.DataFrame()
if mdl.status == GRB.Status.OPTIMAL:
    df = flow_df.copy()
    df = df.merge(stations[['capacity']], left_on='station', right_index=True)
    df[['bikes_added', 'bikes_removed', 'loss_sale', 'beginning_inventory']] = 0
    for k, v in y.items():
        df.loc[k, 'bikes_added'] = v.x
        df.loc[k, 'bikes_removed'] = z[k].x
        df.loc[k, 'beginning_inventory'] = q[k].x
        df.loc[k, 'loss_sale'] = l[k].x
    df.reset_index(inplace=True)
    print(f'Total Loss : {mdl.objVal}')
    total_bikes_added = df.groupby('time')['bikes_added'].sum()
    print(f'Total number of bikes added in each hour:\n {total_bikes_added}')
else:
    print('Could not find a feasible solution!')
df.head(10)

Total Loss : 40.0
Total number of bikes added in each hour:
 time
7    25
8    25
9    25
Name: bikes_added, dtype: int64


Unnamed: 0,station,time,datetime,start_forecast,end_forecast,date,capacity,bikes_added,bikes_removed,loss_sale,beginning_inventory
0,1 Ave & E 62 St,7,2022-08-01 07:00:00,11.0,10.0,2022-08-01,45,0,0,1,0
1,1 Ave & E 62 St,8,2022-08-01 08:00:00,14.0,14.0,2022-08-01,45,0,0,0,0
2,1 Ave & E 62 St,9,2022-08-01 09:00:00,17.0,20.0,2022-08-01,45,0,0,0,0
3,1 Ave & E 68 St,7,2022-08-01 07:00:00,17.0,32.0,2022-08-01,62,0,15,0,0
4,1 Ave & E 68 St,8,2022-08-01 08:00:00,22.0,41.0,2022-08-01,62,0,19,0,0
5,1 Ave & E 68 St,9,2022-08-01 09:00:00,28.0,48.0,2022-08-01,62,0,0,0,0
6,1 Ave & E 78 St,7,2022-08-01 07:00:00,9.0,7.0,2022-08-01,57,7,0,0,0
7,1 Ave & E 78 St,8,2022-08-01 08:00:00,18.0,5.0,2022-08-01,57,8,0,0,5
8,1 Ave & E 78 St,9,2022-08-01 09:00:00,18.0,7.0,2022-08-01,57,11,0,0,0
9,10 Ave & W 14 St,7,2022-08-01 07:00:00,3.0,5.0,2022-08-01,55,0,2,0,0


## Putting it all together

**Sets**
- $I\quad$: Set of stations
- $T\quad$: Set of hours 

**Parameters**
- $e_{i,t}\quad$: number of bikes that end their trip at station $i$ at hour $t$ (a.k.a. supply)
- $s_{i,t}\quad$: number of bikes that start their trip at station $i$ at hour $t$ (a.k.a. demand)
- $c_{i}\quad$: capacity of station $i$
- $N\quad$: Number of bikes at hand that we can assign to different stations at a given hour

**Variables**
- $y_{i,t}\quad$: number of bikes to be added to station $i$ at hour $t$
- $z_{i,t}\quad$: number of bikes to be removed from station $i$ at hour $t$
- $q_{i,t}\quad$: inventory of bikes at station $i$ at the beginning of hour $t$
- $l_{i,t}\quad$: lost sale at station $i$ at hour $t$
- $a_{i,t}\quad$: auxiliary variable needed to define inventory's relationship with other variables. We use this in the code

\begin{align}
&\min \sum_{i,t} l_{i,t}&\\
\mbox{s.t: }\\
&y_{i,t} \le c_i &\quad \forall i \in I, t \in T \tag{1}\\
&z_{i,t} \le \max(0, e_{i,t} - s_{i,t}) &\quad \forall i \in I, t \in T \tag{2}\\
&z_{i,t} \ge  e_{i,t} - s_{i,t} - c_i &\quad \forall i \in I, t \in T \tag{3}\\
&q_{i,t_0} = 0 &\quad \forall i \in I \tag{4}\\
&a_{i,t} = e_{i,t-1} + y_{i,t-1} + q_{i,t-1} - s_{i,t-1} - z_{i,t-1} &\quad \forall i \in I, t \in T/t_0 \tag{5}\\
&q_{i,t} = \max(0, a_{i,t}) &\quad \forall i \in I, t \in T/t_0 \tag{6}\\
&q_{i,t} \le c_i &\quad \forall i \in I, t \in T \tag{7}\\
&l_{i,t} = \max(0, s_{i,t} + z_{i,t} - e_{i,t} - y_{i,t} - q_{i,t}) &\quad \forall i \in I, t \in T \tag{8}\\
&\sum_{i} y_{i,t} \le N &\quad \forall t \in T \tag{9}\\
&y_{i,t}, z_{i,t}, q_{i,t}, l_{i,t} \ge0 &\quad \forall i \in I, t \in T \tag{10}\\
&a_{i,t} \quad \mbox{urs} &\quad \forall i \in I, t \in T \tag{11}\\
\end{align}

In [18]:
def bike_rebalancing(flow_df, num_bikes):
    station_time = flow_df.index  # pair of (i,t) index
    start_forecast = flow_df.start_forecast  # s
    end_forecast = flow_df.end_forecast  # e
    capacity = stations.capacity  # c
    mdl = gp.Model('bike_rebalancing')

    # Variables
    y = mdl.addVars(station_time, lb=0, vtype=GRB.CONTINUOUS, name='y')
    z = mdl.addVars(station_time, lb=0, vtype=GRB.CONTINUOUS, name='z')
    q = mdl.addVars(station_time, lb=0, vtype=GRB.CONTINUOUS, name='q')
    l = mdl.addVars(station_time, lb=0, vtype=GRB.CONTINUOUS, name='l')
    a = mdl.addVars(station_time, lb=-GRB.INFINITY, vtype=GRB.CONTINUOUS, name='a')

    # Constraints
    mdl.addConstrs((y[i, t] <= capacity[i] for i, t in station_time), 'ub_y')

    mdl.addConstrs(
        (z[i, t] >= (end_forecast[i, t] - start_forecast[i, t] - capacity[i])
         for i, t in station_time), 'lb_z')
    
    mdl.addConstrs(
        (z[i, t] <= max(0, end_forecast[i, t] - start_forecast[i, t])
         for i, t in station_time), 'ub_z')

    t0 = 7
    mdl.addConstrs((q[i, t0] == 0 for i in stations.index), name='initial_inv')

    mdl.addConstrs((a[i, t] == end_forecast[i, t - 1] + y[i, t - 1] + q[i, t - 1]
                    - start_forecast[i, t - 1] - z[i, t - 1]
                    for i, t in station_time if t != t0), name='aux_def')

    mdl.addConstrs((q[i, t] == gp.max_(a[i, t], 0) for i, t in station_time if t != t0), name='inv_def')

    mdl.addConstrs((q[i, t] <= capacity[i] for i, t in station_time), 'ub_inv')

    mdl.addConstrs((l[i, t] >= start_forecast[i, t] + z[i, t]
                    - end_forecast[i, t] - y[i, t] - q[i, t] for i, t in station_time), 'loss_def')

    mdl.addConstrs((y.sum('*', t) <= num_bikes for t in time_rng), name='total_bikes')

    # Objectives
    mdl.setObjective(l.sum(), GRB.MINIMIZE)
    mdl.optimize()

    # create output
    df = pd.DataFrame()
    if mdl.status == GRB.Status.OPTIMAL:
        df = flow_df.copy()
        df = df.merge(stations[['capacity']], left_on='station', right_index=True)
        df[['bikes_added', 'bikes_removed', 'loss_sale', 'beginning_inventory']] = 0
        for k, v in y.items():
            df.loc[k, 'bikes_added'] = v.x
            df.loc[k, 'bikes_removed'] = z[k].x
            df.loc[k, 'beginning_inventory'] = q[k].x
            df.loc[k, 'loss_sale'] = l[k].x
        df.reset_index(inplace=True)
        print(f'Total Loss : {mdl.objVal}')
        total_bikes_added = df.groupby('time')['bikes_added'].sum()
        print(f'Total number of bikes added in each hour:\n {total_bikes_added}')
    else:
        print('Could not find a feasible solution!')
    return df

In [19]:
# run the model daily
all_outputs = []
g = morning_flow.set_index(['station', 'time']).groupby('date')
for x in g.groups:
    df = g.get_group(x)
    odf = bike_rebalancing(df, num_bikes)
    all_outputs.append(odf)
output_df = pd.concat(all_outputs)

Gurobi Optimizer version 9.5.1 build v9.5.1rc2 (win64)
Thread count: 4 physical cores, 8 logical processors, using up to 8 threads
Optimize a model with 813 rows, 675 columns and 1620 nonzeros
Model fingerprint: 0xecda0867
Model has 90 general constraints
Variable types: 675 continuous, 0 integer (0 binary)
Coefficient statistics:
  Matrix range     [1e+00, 1e+00]
  Objective range  [1e+00, 1e+00]
  Bounds range     [0e+00, 0e+00]
  RHS range        [1e+00, 1e+02]
Presolve removed 750 rows and 580 columns
Presolve time: 0.00s
Presolved: 63 rows, 95 columns, 201 nonzeros
Variable types: 81 continuous, 14 integer (14 binary)
Found heuristic solution: objective 40.0000000

Root relaxation: objective 1.368348e+01, 62 iterations, 0.00 seconds (0.00 work units)

    Nodes    |    Current Node    |     Objective Bounds      |     Work
 Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time

     0     0   13.68348    0    9   40.00000   13.68348  65.8%     -    0s
     0 

In [20]:
output_df.head(10)

Unnamed: 0,station,time,datetime,start_forecast,end_forecast,date,capacity,bikes_added,bikes_removed,loss_sale,beginning_inventory
0,1 Ave & E 62 St,7,2022-08-01 07:00:00,11.0,10.0,2022-08-01,45,0.0,0,1.0,0.0
1,1 Ave & E 62 St,8,2022-08-01 08:00:00,14.0,14.0,2022-08-01,45,0.0,0,0.0,0.0
2,1 Ave & E 62 St,9,2022-08-01 09:00:00,17.0,20.0,2022-08-01,45,0.0,0,0.0,0.0
3,1 Ave & E 68 St,7,2022-08-01 07:00:00,17.0,32.0,2022-08-01,62,0.0,15,0.0,0.0
4,1 Ave & E 68 St,8,2022-08-01 08:00:00,22.0,41.0,2022-08-01,62,0.0,19,0.0,0.0
5,1 Ave & E 68 St,9,2022-08-01 09:00:00,28.0,48.0,2022-08-01,62,0.0,0,0.0,0.0
6,1 Ave & E 78 St,7,2022-08-01 07:00:00,9.0,7.0,2022-08-01,57,7.0,0,0.0,0.0
7,1 Ave & E 78 St,8,2022-08-01 08:00:00,18.0,5.0,2022-08-01,57,8.0,0,0.0,5.0
8,1 Ave & E 78 St,9,2022-08-01 09:00:00,18.0,7.0,2022-08-01,57,11.0,0,0.0,0.0
9,10 Ave & W 14 St,7,2022-08-01 07:00:00,3.0,5.0,2022-08-01,55,0.0,2,0.0,0.0


# Model Enhancement

Looking at the `output_df`, you may notice stations that see a transfer of bikes (addition or removal) at every hour. Even worse are stations where bikes are removed in one hour and then added back in the next hour. We know these additions and removals consume time and money. What can we do to avoid this situation?

One way to formulate this is to introduce a fixed cost for the use of the truck (or the bicycle trailer) that transfers the bikes and then add this term to the objective function. With a cost associated with each transfer, the model is incentivized to use fewer transfers.

So first, we should calculate the number of times a transfer has occurred. Any time bikes are added to a station or removed from a station, a transfer has happened. So, we need a way to link addition of bikes and removal of bikes to a transfer. One way to achieve this is to introduce two new binary variables as follows:

- $x_{i,t}\quad$: 1 if any bike is added to station $i$ at hour $t$; 0 otherwise
- $w_{i,t}\quad$: 1 if any bike is removed from station $i$ at hour $t$; 0 otherwise

Next, we need to establish the relationship between $y_{i,t}$ and $x_{i,t}$, and between $z_{i,t}$ and $w_{i,t}$. Basically, we want to say:
if $y_{i,t} > 0$, then $x_{i,t} = 1$ and if $z_{i,t} > 0$, then $w_{i,t} = 1$.

We introduce the following two constraints:

\begin{align}
&y_{i,t} \le M x_{i,t} &\quad \forall i \in I, t \in T \tag{11}\\
&z_{i,t} \le M w_{i,t} &\quad \forall i \in I, t \in T \tag{12}\\
\end{align}

where $M$ is a large number.

Constraint 11 ensures that if $y_{i,t} > 0$, then $x_{i,t} = 1$. But by itself, this constraint cannot force $x_{i,t} = 0$ when $y_{i,t} = 0$. The same is true of constraint 12: it ensures that $z_{i,t} > 0$ forces $w_{i,t} = 1$, but it cannot force $w_{i,t} = 0$ when $z_{i,t} = 0$.

This can be achieved by the objective function.

Total number of transfers is equal to sum of $x_{i,t}$ and $w_{i,t}$ and our goal is to minimize number of transfers. So, we add these terms to the objective function. In other words, our new objective function is:

$$\min \sum_{i,t} (l_{i,t} + x_{i,t} + w_{i,t})$$

Since minimizing total transfers is desired, the objective function tries to make both $x_{i,t}$ and $w_{i,t}$ as small as possible (or zero, in this case). Along with constraints 11 and 12, this means that for cases where $x_{i,t}$ and $w_{i,t}$ can take either 0 or 1, objective function forces them to get a value of 0. Moreover, since any extra transfer causes either $x_{i,t}$ or $w_{i,t}$ to be 1, the model is incentivized to have fewer transfers in order to minimize the objective function.
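The interplay between the big-M constraint and the minimizing objective can be mimicked in a few lines of plain Python. This is only a sketch of the logic, not Gurobi code; the helper `cheapest_indicator` is a hypothetical name introduced for illustration.

```python
def cheapest_indicator(amount, big_m):
    """Smallest binary value x that satisfies amount <= big_m * x.

    A minimizing objective would pick exactly this value for the indicator.
    """
    for x in (0, 1):
        if amount <= big_m * x:
            return x
    raise ValueError('big_m is too small for this amount')


print(cheapest_indicator(0, big_m=1000))  # 0: no bikes moved, no transfer counted
print(cheapest_indicator(7, big_m=1000))  # 1: any positive amount counts as one transfer
```

The `ValueError` branch shows why $M$ must be large enough: if it were smaller than a legitimate value of $y_{i,t}$ or $z_{i,t}$, the constraint would cut that value off. Since $y_{i,t} \le c_i$, the station capacity is one valid choice of $M$ for constraint 11.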

Of course, you can make this model even more generic by having a coefficient for each term in the objective function (think of them as cost).

In [21]:
x = mdl.addVars(station_time, vtype=GRB.BINARY, name='x')  # 1 if y_{i,t} > 0
w = mdl.addVars(station_time, vtype=GRB.BINARY, name='w')  # 1 if z_{i,t} > 0

big_m = 1000  # large number
# relation between y and x
mdl.addConstrs((y[i, t] <= big_m * x[i, t] for i, t in station_time), 'rel_y_x')
# relation between z and w
mdl.addConstrs((z[i, t] <= big_m * w[i, t] for i, t in station_time), 'rel_z_w')

# new objective
obj = l.sum() + (x.sum() + w.sum())
mdl.setObjective(obj, GRB.MINIMIZE)
mdl.optimize()

Gurobi Optimizer version 9.5.1 build v9.5.1rc2 (win64)
Thread count: 4 physical cores, 8 logical processors, using up to 8 threads
Optimize a model with 1083 rows, 945 columns and 2160 nonzeros
Model fingerprint: 0x361303a6
Model has 90 general constraints
Variable types: 675 continuous, 270 integer (270 binary)
Coefficient statistics:
  Matrix range     [1e+00, 1e+03]
  Objective range  [1e+00, 1e+00]
  Bounds range     [1e+00, 1e+00]
  RHS range        [1e+00, 1e+02]

MIP start from previous solve produced solution with objective 52 (0.01s)
MIP start from previous solve produced solution with objective 49 (0.03s)
Loaded MIP start from previous solve with objective 49

Presolve removed 974 rows and 789 columns
Presolve time: 0.00s
Presolved: 109 rows, 156 columns, 320 nonzeros
Variable types: 99 continuous, 57 integer (57 binary)

Root relaxation: objective 1.700045e+01, 114 iterations, 0.00 seconds (0.00 work units)

    Nodes    |    Current Node    |     Objective Bounds      |    

## Post Processing

In [22]:
df = pd.DataFrame()
if mdl.status == GRB.Status.OPTIMAL:
    df = flow_df.copy()
    df = df.merge(stations[['capacity']], left_on='station', right_index=True)
    df[['bikes_added', 'bikes_removed', 'loss_sale', 'beginning_inventory']] = 0
    for k, v in y.items():
        df.loc[k, 'bikes_added'] = v.x
        df.loc[k, 'bikes_removed'] = z[k].x
        df.loc[k, 'beginning_inventory'] = q[k].x
        df.loc[k, 'loss_sale'] = l[k].x
    df.reset_index(inplace=True)
    # Note: in the enhanced model, mdl.objVal is the lost sales plus the number of transfers
    print(f'Total Loss : {mdl.objVal}')
    total_bikes_added = df.groupby('time')['bikes_added'].sum()
    print(f'Total number of bikes added in each hour:\n {total_bikes_added}')
else:
    print('Could not find a feasible solution!')
df.head(10)

Total Loss : 49.0
Total number of bikes added in each hour:
 time
7    25
8    25
9    25
Name: bikes_added, dtype: int64


Unnamed: 0,station,time,datetime,start_forecast,end_forecast,date,capacity,bikes_added,bikes_removed,loss_sale,beginning_inventory
0,1 Ave & E 62 St,7,2022-08-01 07:00:00,11.0,10.0,2022-08-01,45,0,0,1.0,0
1,1 Ave & E 62 St,8,2022-08-01 08:00:00,14.0,14.0,2022-08-01,45,0,0,0.0,0
2,1 Ave & E 62 St,9,2022-08-01 09:00:00,17.0,20.0,2022-08-01,45,0,0,0.0,0
3,1 Ave & E 68 St,7,2022-08-01 07:00:00,17.0,32.0,2022-08-01,62,0,0,0.0,0
4,1 Ave & E 68 St,8,2022-08-01 08:00:00,22.0,41.0,2022-08-01,62,0,0,0.0,15
5,1 Ave & E 68 St,9,2022-08-01 09:00:00,28.0,48.0,2022-08-01,62,0,0,0.0,34
6,1 Ave & E 78 St,7,2022-08-01 07:00:00,9.0,7.0,2022-08-01,57,0,0,2.0,0
7,1 Ave & E 78 St,8,2022-08-01 08:00:00,18.0,5.0,2022-08-01,57,18,0,0.0,0
8,1 Ave & E 78 St,9,2022-08-01 09:00:00,18.0,7.0,2022-08-01,57,6,0,0.0,5
9,10 Ave & W 14 St,7,2022-08-01 07:00:00,3.0,5.0,2022-08-01,55,0,0,0.0,0


# Extra

## Scenario Analysis

An important requirement in many MO problems is scenario analysis, or what-if analysis.
Generally speaking, in what-if analysis, we're interested in knowing how the solution changes under various scenarios.
Think about our case here.
- How does the solution change if the number of reserved bikes increases or decreases by 10%?
- In the enhanced model, what happens if the cost of lost sale changes? How about the cost of visiting a location for adding or removing bikes?
- What if a new station is added close to the busiest station?
- What if we want to ensure that every station has at least 2 available bikes at the beginning of each hour?

Our model here is still a simple model. But you can imagine the value that the scenario analysis can provide. It can enable you to answer many questions by creating and comparing different scenarios and evaluating their outcomes, so that you can assess their impacts on the business goals. To learn more, you can check this [multi-scenario example](https://www.gurobi.com/documentation/9.5/examples/multiscenario_py.html) from Gurobi.
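A simple way to organize such a comparison is to loop over the scenarios and collect one summary number per run. The sketch below assumes a hypothetical callable `solve(num_bikes)` that returns the total loss; in this notebook it could be a thin wrapper around `bike_rebalancing`, but here a toy stand-in is used so the example runs without Gurobi.

```python
def run_scenarios(solve, base_bikes, pct_changes):
    """Evaluate the solver under percentage changes of the bike reserve.

    solve: callable taking the reserve size and returning a summary value
    base_bikes: reserve size in the baseline scenario
    pct_changes: percentage changes to try, e.g. [-10, 0, 10]
    """
    results = {}
    for pct in pct_changes:
        # Integer arithmetic, rounded down to a whole number of bikes
        num_bikes = base_bikes * (100 + pct) // 100
        results[pct] = solve(num_bikes)
    return results


def toy_solver(num_bikes):
    """Stand-in for the optimization model: each reserve bike avoids one lost sale."""
    return max(0, 60 - num_bikes)


results = run_scenarios(toy_solver, base_bikes=25, pct_changes=[-10, 0, 10])
print(results)  # {-10: 38, 0: 35, 10: 33}
```

The toy solver is deliberately simplistic; with the real model, the same loop would show how sensitive the total loss is to the size of the reserve.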

## How is this problem solved in reality?

Once we know how many bikes are needed at each station, the bikes need to be physically moved from one station to another.

During rush hours, the traffic is already heavy. So, bicycle trailers (that can usually hold 5 bicycles) are used to move bikes around.

During lighter hours (mainly at night), the bikes are transferred using trucks.

In either case, some bikes should be removed from stations with more in-flow of bikes and transferred to stations with more out-flow of bikes to balance them out.
This problem, where trucks need to go from one station to another and either pick up bikes or deliver them, is itself another mathematical optimization problem. 

In this problem, we need to ensure that all the stations that have a pickup or delivery, are visited during a certain time window and the goal can be to do this with minimum number of trucks or minimum transportation cost (for example, fuel cost plus the cost for using the truck). This problem is a variation of the famous Vehicle Routing Problem (VRP) with pickup and delivery.
To learn more, check out [this webinar](https://www.gurobi.com/resource/how-to-synchronize-complex-routing-operations-synched-vrps-with-gurobi/).