## Demonstrating a capacitated p-median location-allocation model

This is a Jupyter Notebook demonstrating the newly developed capacitated p-median location-allocation model, using sample data from UCL IOE for student teacher placements at schools. For more details on the background, please see the associated [paper](agile-short-paper/short-paper.pdf) presented at AGILE 2023.

Demonstration input and output data have been provided. See the information below for more details.

### Install code to bring in dependencies

Run `python -m pip install -e .` in a terminal to install the dependencies.
If you are running this in a Jupyter notebook, remember to run this in the terminal inside the correct environment / notebook, and then restart the kernel.

### Import packages

In [None]:
from pathlib import Path

import numpy as np
import pandas as pd
import pulp
import spopt
from spopt.locate import PMedian

from scripts import create_allocation_map

### Import data

In [None]:
_file_location = Path().resolve()

In [None]:
students_df = pd.read_csv(_file_location / "data" / "example_subject_students.csv")
schools_df = pd.read_csv(_file_location / "data" / "example_subject_schools.csv")

Pre-processing is done with the code from <https://github.com/UCL/ioe-student-school-allocation/>.

By default, this step can be skipped, and you can read in the provided example data (`data/example_subject_student_school_journeys.csv`) instead.

If you are using a Jupyter Notebook, remember to run the install code in the terminal in the Jupyter Notebook environment.

Install using `pip`:

```sh
python -m pip install --upgrade pip
python -m pip install -e .
```

You will also need a TfL API key, available from
<https://api-portal.tfl.gov.uk/>. It is set by the `export TFL_APP_KEY=` line in `.envrc_sample`.

To add it, first copy the sample file:
```sh
cp .envrc_sample .envrc
```
Then put your key after `export TFL_APP_KEY=` in `.envrc`, and load it by running
```sh
source .envrc
```
(re-run this whenever you change `.envrc`). You can check that it has worked by running
`echo $TFL_APP_KEY`.

Use the same approach to add `OPENROUTESERVICE_API_KEY`, available from <https://openrouteservice.org/>.

For this example, we then also need to remove the `OPENROUTESERVICE_BASE_URL` by running
`unset OPENROUTESERVICE_BASE_URL`
and set the number of cores by running:
`export N_CORES=1`.
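Before running the pre-processing, a small Python check can confirm that the variables above are visible. It only sees the environment of the process it runs in, so run it from the same shell you ran `source .envrc` in (for example with `python`), not from a notebook kernel started elsewhere. `missing_env_vars` is a hypothetical helper, not part of the project code.

```python
import os
from collections.abc import Mapping, Sequence

# Variables the steps above ask you to set (OPENROUTESERVICE_BASE_URL should be unset).
REQUIRED_VARS = ("TFL_APP_KEY", "OPENROUTESERVICE_API_KEY", "N_CORES")


def missing_env_vars(
    names: Sequence[str], environ: Mapping[str, str] = os.environ
) -> list[str]:
    """Return the names that are unset or empty in ``environ``."""
    return [name for name in names if not environ.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Not set:", ", ".join(missing))
if "OPENROUTESERVICE_BASE_URL" in os.environ:
    print("OPENROUTESERVICE_BASE_URL is still set; run `unset OPENROUTESERVICE_BASE_URL`")
```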

Run the pre-processing using
```sh
tfl example_subject
```

Example running times:
- 25 min on 1 core for 10 students, 70 schools, 3 failures
- 14 min on 4 cores for 20 students, 70 schools
- 12 min on 8 cores for 20 students, 70 schools, 10 failures
- 6 min on 16 cores for 19 students, 70 schools, 2 failures

If this works, it will create a file `data/example_subject_student_school_journeys.csv`. If it doesn't work, you can use the sample data available in the same location. 

### Read in journey data

In [None]:
example_subject_time = pd.read_csv(
    _file_location / "data" / "example_subject_student_school_journeys.csv"
)

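Set a large placeholder travel time for student-school pairs without a successful journey, so that the optimiser avoids allocating students to those schools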

In [None]:
LARGE_VALUE_PLACEHOLDER = 10_000

Create a student-by-school cost matrix (travel times) from the journey data using a pivot table

In [None]:
example_subject_time_table = (
    example_subject_time.pivot_table(
        columns="school",
        fill_value=LARGE_VALUE_PLACEHOLDER,
        index="student",
        sort=False,
        values="time",
    )
    .astype(int)
    .values
)
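To show what the pivot does, here is the same call on a hypothetical three-row journey table: each row of the result is a student, each column a school, and any pair with no successful journey gets `LARGE_VALUE_PLACEHOLDER`. This assumes real travel times are far below 10,000, so the placeholder always looks much worse to the optimiser than a real journey.

```python
import pandas as pd

LARGE_VALUE_PLACEHOLDER = 10_000

# Hypothetical long-format journeys: one row per successful student-school journey.
# Student "s2" has no journey to school "B".
journeys = pd.DataFrame(
    {
        "student": ["s1", "s1", "s2"],
        "school": ["A", "B", "A"],
        "time": [30.0, 45.0, 20.0],
    }
)

cost_matrix = (
    journeys.pivot_table(
        columns="school",
        fill_value=LARGE_VALUE_PLACEHOLDER,  # fills the missing s2-B pair
        index="student",
        sort=False,
        values="time",
    )
    .astype(int)
    .values
)
print(cost_matrix)
```

The result is `[[30, 45], [20, 10000]]`: row `s2`, column `B` holds the placeholder.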

In [None]:
print(example_subject_time_table)

Clean data for the model

Define a function that cleans the school and student dataframes
so that only the students and schools with successful journeys are kept

In [None]:
def data_clean(
    df: pd.DataFrame, id_col: str, time_col: str, time: pd.DataFrame
) -> pd.DataFrame:
    # Keep only the rows of df whose ID appears in the journey data
    ids_to_remove = set(df[id_col]) - set(time[time_col])
    mask = ~df[id_col].isin(ids_to_remove)
    # Renumber the rows 0..n-1 so that row labels equal row positions
    return df[mask].reset_index(drop=True)
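A toy run of `data_clean` on hypothetical data (the function is repeated so the snippet runs on its own): student `s3` has no successful journey, so it is dropped and the remaining rows are renumbered from 0.

```python
import pandas as pd


def data_clean(
    df: pd.DataFrame, id_col: str, time_col: str, time: pd.DataFrame
) -> pd.DataFrame:
    # Drop rows whose ID never appears in the journey data, then renumber 0..n-1
    ids_to_remove = set(df[id_col]) - set(time[time_col])
    mask = ~df[id_col].isin(ids_to_remove)
    return df[mask].reset_index(drop=True)


# Hypothetical toy data: "s3" has no successful journey.
students = pd.DataFrame({"ST: ID": ["s1", "s2", "s3"]})
journeys = pd.DataFrame(
    {"student": ["s1", "s2"], "school": ["A", "A"], "time": [30, 20]}
)

cleaned = data_clean(students, "ST: ID", "student", journeys)
print(cleaned)
```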

In [None]:
schools_df_clean = data_clean(
    schools_df, "SE2 PP: Code", "school", example_subject_time
)
students_df_clean = data_clean(students_df, "ST: ID", "student", example_subject_time)

Check that the cleaned data matches the dimensions of the cost matrix

In [None]:
assert len(schools_df_clean) == len(example_subject_time_table[0])
assert len(students_df_clean) == len(example_subject_time_table)
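The assertions above only compare lengths. The allocation step later treats `cli2fac[i]` as the result for row `i` of `students_df_clean` and looks schools up by position in `schools_df_clean`, so it also relies on the pivot's row and column order matching those dataframes. A minimal sketch of an order check, using a hypothetical helper `check_alignment` (not part of the project code):

```python
import pandas as pd


def check_alignment(
    journeys: pd.DataFrame,
    students: pd.DataFrame,
    schools: pd.DataFrame,
    student_id_col: str,
    school_id_col: str,
) -> bool:
    """Return True if the pivot's row/column order matches the cleaned tables."""
    # Same pivot layout as the cost matrix (fill_value does not affect ordering)
    pivot = journeys.pivot_table(
        columns="school", index="student", values="time", sort=False
    )
    rows_match = list(pivot.index) == list(students[student_id_col])
    cols_match = list(pivot.columns) == list(schools[school_id_col])
    return rows_match and cols_match


# Hypothetical toy data
journeys = pd.DataFrame(
    {"student": ["s1", "s1", "s2"], "school": ["A", "B", "A"], "time": [30, 45, 20]}
)
schools = pd.DataFrame({"SE2 PP: Code": ["A", "B"]})
students_ok = pd.DataFrame({"ST: ID": ["s1", "s2"]})
students_reordered = pd.DataFrame({"ST: ID": ["s2", "s1"]})
```

In the notebook it could be called as `check_alignment(example_subject_time, students_df_clean, schools_df_clean, "ST: ID", "SE2 PP: Code")`.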

`spopt` version. Currently (as at 06/06/2023) the new code developed for `spopt` is not integrated into the main `spopt` release. There is a PR in progress for this at <https://github.com/pysal/spopt/pull/374>.

The version we need to use should be `0.1.dev975+g1e3c727` or similar (the `0.1.dev` bit is key).

In [None]:
print(spopt.__version__)

If the above reports `0.5.0`, then this is the main `spopt` package without the new capacitated p-median options. In this case, you need to install the new version manually using:
`python3 -m pip install spopt@git+https://github.com/rongboxu/spopt`
If you are running this in a Jupyter notebook, remember to run this in the terminal inside the correct environment / notebook, and then restart the kernel.
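The version check can also be done in code. This sketch encodes the rule stated above (a version string starting with `0.1.dev` indicates the development build with the capacitated options); `is_capacitated_build` is a hypothetical helper and the rule is only as reliable as that naming convention.

```python
def is_capacitated_build(version: str) -> bool:
    # Heuristic from the note above: the development build reports a version
    # such as "0.1.dev975+g1e3c727", while the main release reports e.g. "0.5.0".
    return version.startswith("0.1.dev")
```

Usage: `import spopt` then `is_capacitated_build(spopt.__version__)`.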

Data preparation

a. Set the demand at each demand point (student). In the IOE case, each student has a demand of 1.

In [None]:
demand = np.ones(len(students_df_clean))

b. Pick out the predefined facilities: the priority 1 schools.
Please note that the name of the priority column varies by subject; for maths it is `MAT priority`, and so on.

In [None]:
schools_priority_1 = schools_df_clean[
    schools_df_clean["MAT priority"] == 1
].index.tolist()
schools_priority_1_arr = np.array(schools_priority_1)
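Because the priority column name changes by subject, the selection can be wrapped in a function that takes the column name. This is a sketch with a hypothetical helper `predefined_facilities`; it returns positional indices, which match the `.index.tolist()` result above only because `data_clean` renumbers rows from 0.

```python
import numpy as np
import pandas as pd


def predefined_facilities(schools: pd.DataFrame, priority_col: str) -> np.ndarray:
    """Positional indices of priority 1 schools (assumes a 0..n-1 index)."""
    return np.flatnonzero(schools[priority_col].to_numpy() == 1)


# Hypothetical toy data; for another subject, pass that subject's priority column.
schools = pd.DataFrame({"SE2 PP: Code": ["A", "B", "C"], "MAT priority": [1, 2, 1]})
print(predefined_facilities(schools, "MAT priority"))
```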

c. Set the facility capacities, taken from the `Count` column

In [None]:
capacities_arr = np.array(schools_df_clean["Count"])

Run the model


If you get this error:

>Problem is infeasible. The predefined facilities can't be
>fulfilled, because their capacity is larger than the total
>demand 10.0.

this is because the total capacity of the predefined facilities (the priority 1 schools passed as `predefined_facilities_arr`) is larger than the number of students who need placements. You need fewer priority 1 school places, or more students.

`fulfill_predefined_fac` must be `True`; it is used to guarantee that the priority 1
schools will be filled.
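The condition behind that error can be checked before calling the solver. This sketch tests it, plus the basic requirement that total capacity covers total demand (assuming every student must be allocated). `check_capacity` is a hypothetical helper, not part of `spopt`.

```python
import numpy as np


def check_capacity(demand, capacities, predefined) -> list[str]:
    """Return human-readable problems; an empty list means both checks pass."""
    demand = np.asarray(demand)
    capacities = np.asarray(capacities)
    predefined = np.asarray(predefined, dtype=int)  # int dtype so [] still indexes
    problems = []
    total_demand = demand.sum()
    predefined_capacity = capacities[predefined].sum()
    # Mirrors the infeasibility message: predefined capacity must not exceed demand
    if predefined_capacity > total_demand:
        problems.append(
            f"priority 1 capacity {predefined_capacity} exceeds total demand {total_demand}"
        )
    # Every unit of demand needs a place somewhere
    if capacities.sum() < total_demand:
        problems.append(
            f"total capacity {capacities.sum()} is below total demand {total_demand}"
        )
    return problems
```

In the notebook it could be called as `check_capacity(demand, capacities_arr, schools_priority_1_arr)`.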

In [None]:
#solver = pulp.PULP_CBC_CMD()
solver = pulp.COIN_CMD()
pmedian_from_cost_matrix = PMedian.from_cost_matrix(
    example_subject_time_table,
    demand,
    p_facilities=len(students_df_clean),
    predefined_facilities_arr=schools_priority_1_arr,
    facility_capacities=capacities_arr,
    fulfill_predefined_fac=True,
)
pmedian_from_cost_matrix = pmedian_from_cost_matrix.solve(solver)

There is a choice of solvers. PuLP's bundled CBC solver (`solver = pulp.PULP_CBC_CMD()`) is the default, but an alternative is COIN (`solver = pulp.COIN_CMD()`). To switch, comment out one of the first two lines of the code above and uncomment the other.
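To make concrete what the solver is asked to do, here is an exhaustive-search sketch on a toy cost matrix with unit demand. It is not how `spopt` solves the model (`spopt` builds an integer programme for PuLP). The constraints are our reading of the setup: each school's load stays within its capacity, at most `p` schools receive students, and, following the error message above, each predefined school is filled exactly to capacity. The search grows as schools to the power of students, so it is only usable on toy sizes.

```python
from itertools import product

import numpy as np


def brute_force_capacitated_pmedian(
    cost: np.ndarray,
    capacities: np.ndarray,
    p: int,
    predefined: tuple[int, ...] = (),
):
    """Return (total cost, assignment) of the cheapest feasible assignment.

    assignment[i] is the school index for student i; (None, None) if infeasible.
    """
    n_students, n_schools = cost.shape
    best_total, best_assignment = None, None
    for assignment in product(range(n_schools), repeat=n_students):
        loads = np.bincount(assignment, minlength=n_schools)
        if np.any(loads > capacities):  # capacity constraint
            continue
        if np.count_nonzero(loads) > p:  # at most p schools used
            continue
        if any(loads[j] != capacities[j] for j in predefined):  # fill predefined
            continue
        total = sum(cost[i, j] for i, j in enumerate(assignment))
        if best_total is None or total < best_total:
            best_total, best_assignment = total, assignment
    return best_total, best_assignment


cost = np.array([[1, 5], [2, 1], [4, 3]])  # hypothetical 3 students x 2 schools
capacities = np.array([2, 2])
print(brute_force_capacitated_pmedian(cost, capacities, p=2))
```

With these numbers the cheapest allocation costs 5 (students to schools `(0, 1, 1)`); requiring school 0 to be filled (`predefined=(0,)`) raises the cost to 6.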

Save the match result

In [None]:
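# Copy so that adding a column does not modify students_df_clean in place
match_df = students_df_clean.copy()

for i in range(len(students_df_clean)):
    # cli2fac[i] lists the facility (school) indices allocated to student i
    school_index = pmedian_from_cost_matrix.cli2fac[i]
    match_df.loc[i, "allocation_school_id"] = schools_df_clean.loc[
        school_index[0], "SE2 PP: Code"
    ]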
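The same lookup can be done without a row-by-row loop. This sketch uses hypothetical toy inputs shaped the way the code above uses `cli2fac` (one list of school indices per student, with the first entry taken as the allocation) and assumes every student received a school.

```python
import numpy as np
import pandas as pd

# Hypothetical toy inputs
cli2fac = [[2], [0], [2]]
schools = pd.DataFrame({"SE2 PP: Code": ["A", "B", "C"]})
students = pd.DataFrame({"ST: ID": ["s1", "s2", "s3"]})

# Take the first allocated school index for each student, then look up its code
school_positions = np.array([facs[0] for facs in cli2fac])
match = students.copy()
match["allocation_school_id"] = schools["SE2 PP: Code"].to_numpy()[school_positions]
print(match)
```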

In [None]:
match_df.to_csv(_file_location / "data" / "example_subject_matches.csv")

Run the map creation script (`scripts/create_allocation_map.py`)

In [None]:
create_allocation_map.main("example_subject")