# Introduction to RCPSP

What is RCPSP (Resource-Constrained Project Scheduling Problem)?

- $M$ activities or tasks in a project (instance)
- Precedence constraints: 

    > If activity $j\in[1,M]$ is a successor of activity $i\in[1,M]$, then activity $i$ must be completed before activity $j$ can be started

- Resource constraints: 

    > Each project is assigned a set of $K$ renewable resources, where each resource $k$ is available in $R_{k}$ units for the entire duration of the project. Each activity may require one or more of these resources to be completed. While scheduling the activities, the usage of resource $k$ at any time cannot exceed $R_{k}$ units.
    
- Each activity $j$ takes $d_{j}$ time units to complete.

- The overall goal of the problem is usually to minimize the makespan.

Here we focus on *single-mode RCPSP with renewable resources*, but other variants of the problem exist:
- multi-mode: a task can be performed in several ways (modes), each with its own duration and resource needs. In this case, the choice of the mode is part of the solution.
- mix of renewable and non-renewable resources.
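
To make the definition concrete, here is a minimal sketch that checks the precedence and resource constraints of a schedule and computes its makespan. The toy instance (4 tasks, a single renewable resource of capacity 2) and the helper names `is_feasible` and `makespan` are hypothetical and only serve as an illustration; they are not part of discrete-optimization.

```python
# Hypothetical toy instance: 4 tasks and one renewable resource of capacity 2.
durations = {1: 2, 2: 3, 3: 2, 4: 1}  # d_j
successors = {1: [2, 3], 2: [4], 3: [4], 4: []}  # precedence constraints
demands = {1: 1, 2: 1, 3: 2, 4: 1}  # units of the resource used by each task
capacity = 2  # R_k for the single resource


def is_feasible(start):
    """Check precedence and resource constraints for the given start times."""
    end = {j: start[j] + durations[j] for j in durations}
    # Precedence: a successor cannot start before its predecessor ends.
    for i, succs in successors.items():
        for j in succs:
            if start[j] < end[i]:
                return False
    # Resource: at each time step, the total usage must not exceed the capacity.
    for t in range(max(end.values())):
        usage = sum(demands[j] for j in durations if start[j] <= t < end[j])
        if usage > capacity:
            return False
    return True


def makespan(start):
    """Completion time of the last task."""
    return max(start[j] + durations[j] for j in durations)


sequential = {1: 0, 2: 2, 3: 5, 4: 7}  # tasks run one after the other
overlapping = {1: 0, 2: 2, 3: 2, 4: 5}  # tasks 2 and 3 overlap: 3 units needed

print(is_feasible(sequential), makespan(sequential))
print(is_feasible(overlapping))
```

The second schedule respects every precedence but is rejected because tasks 2 and 3 together would need more units than the capacity allows.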


## Prerequisites


Concerning the Python kernel to use for this notebook:
- If running locally, be sure to use an environment with discrete-optimization and minizinc.
- If running on colab, the next cell does it for you.
- If running on binder, the environment should be ready.


In [None]:
# On Colab: install the library
on_colab = "google.colab" in str(get_ipython())
if on_colab:
    import importlib
    import os
    import sys  # noqa: avoid having this import removed by pycln

    !{sys.executable} -m pip install -U pip

    # uninstall google protobuf conflicting with ray and sb3
    ! pip uninstall -y protobuf

    # install dev version for dev doc, or release version for release doc
    !{sys.executable} -m pip install git+https://github.com/airbus/discrete-optimization@master#egg=discrete-optimization

    # be sure to load the proper cffi (downgraded compared to the one initially on colab)
    import cffi

    importlib.reload(cffi)

    # install and configure minizinc
    !curl -o minizinc.AppImage -L https://github.com/MiniZinc/MiniZincIDE/releases/download/2.6.3/MiniZincIDE-2.6.3-x86_64.AppImage
    !chmod +x minizinc.AppImage
    !./minizinc.AppImage --appimage-extract
    os.environ["PATH"] = f"{os.getcwd()}/squashfs-root/usr/bin/:{os.environ['PATH']}"
    os.environ["LD_LIBRARY_PATH"] = (
        f"{os.getcwd()}/squashfs-root/usr/lib/:{os.environ['LD_LIBRARY_PATH']}"
    )

### Imports

In [None]:
import logging
import random

import matplotlib.pyplot as plt
import nest_asyncio
import networkx as nx
import numpy as np

from discrete_optimization.datasets import fetch_data_from_psplib

# Main module for RCPSP Model
from discrete_optimization.rcpsp.rcpsp_model import RCPSPSolution
from discrete_optimization.rcpsp.rcpsp_parser import get_data_available, parse_file
from discrete_optimization.rcpsp.rcpsp_utils import (
    Graph,
    compute_graph_rcpsp,
    plot_ressource_view,
    plot_task_gantt,
)

# patch asyncio so that applications using async functions can run in jupyter
nest_asyncio.apply()

# set logging level
logging.basicConfig(level=logging.INFO)

### Download datasets

If they are not yet available, we download the datasets from [psplib](https://www.om-db.wi.tum.de/psplib/data.html).

In [None]:
needed_datasets = ["j301_1.sm"]
download_needed = False
try:
    files_available_paths = get_data_available()
    for dataset in needed_datasets:
        if len([f for f in files_available_paths if dataset in f]) == 0:
            download_needed = True
            break
except Exception:
    download_needed = True

if download_needed:
    fetch_data_from_psplib()

### Set random seed (for reproducible results in this notebook)

In [None]:
def set_random_seed(seed=42):
    random.seed(seed)
    np.random.seed(seed)


set_random_seed()

## View input data

Here we use the RCPSP instance described in the file `j301_1.sm`.

### File structure
Let us have a look at that file.

In [None]:
filepath = [f for f in get_data_available() if "j301_1.sm" in f][0]
with open(filepath, "rt") as f:
    print(f.read())

There are 32 jobs, including the source task (1) and the sink task (32). 

- The first part of the file describes the precedence constraints:
  > Task $1$ must finish before tasks $2$, $3$, and $4$ start.
  
- The second part of the file details the duration and resource usage of each task:
  > Task $3$ lasts 4 time units and requires 10 units of $R_1$
  

### Parsing file
We parse the file to get an RCPSP model object from the discrete-optimization library.

In [None]:
filepath = [f for f in get_data_available() if "j301_1.sm" in f][0]
rcpsp_model = parse_file(filepath)
print(type(rcpsp_model))
print(rcpsp_model)
print("Nb jobs : ", rcpsp_model.n_jobs)
print("Precedences : ", rcpsp_model.successors)
print("Resources Availability : ", rcpsp_model.resources)

### Precedence graph
We can draw the precedence graph:

In [None]:
# compute graph
graph: Graph = compute_graph_rcpsp(rcpsp_model)
graph_nx = graph.to_networkx()
# compute positions
dfs = nx.dfs_tree(G=graph_nx, source=1, depth_limit=10)
shortest_path_length = nx.shortest_path_length(dfs, 1)
length_to_nodes = {}
position = {}
for node in sorted(shortest_path_length, key=lambda x: shortest_path_length[x]):
    length = shortest_path_length[node]
    # move to the next column while the current one already holds more than 5 nodes
    while length in length_to_nodes and len(length_to_nodes[length]) > 5:
        length += 1
    if length not in length_to_nodes:
        length_to_nodes[length] = []
    length_to_nodes[length].append(node)
    position[node] = (length, len(length_to_nodes[length]))

# different color for source and sink task
sink_source_color = "#FFB000"
normal_task_color = "#648FFF"
node_color = len(graph_nx) * [normal_task_color]
node_color[0] = sink_source_color
node_color[-1] = sink_source_color

# plot
nx.draw_networkx(graph_nx, pos=position, node_color=node_color)
plt.show()

### Critical path 
We can compute the longest path from the source task to the sink task, which gives a lower bound on the makespan. This path is usually called the critical path.
When we computed the graph in the previous cell, each edge stored the minimum duration of its origin task in the `min_duration` attribute, and the opposite of this number in the `minus_min_duration` attribute.

In [None]:
print(graph.edges[5])

This means that to fulfill the (2, 15) precedence, task 2 must be completed first, which takes at least 8 time units. Let's compute the critical path.

In [None]:
path = nx.astar_path(
    G=graph_nx,
    source=1,
    target=rcpsp_model.n_jobs,
    heuristic=lambda x, y: -100 if x != rcpsp_model.n_jobs else 0,
    weight="minus_min_duration",
)
edges = [(n1, n2) for n1, n2 in zip(path[:-1], path[1:])]
duration = sum(graph_nx[n[0]][n[1]]["min_duration"] for n in edges)
print("Duration of critical path : ", duration)

We therefore know that the makespan will be at least 38, because the tasks on the critical path must necessarily be executed sequentially, and the sum of their durations is 38. We can visualize this path in the precedence graph:

In [None]:
fig, ax = plt.subplots(1)
nx.draw_networkx(graph_nx, pos=position, node_color=node_color, ax=ax)
nx.draw_networkx_edges(graph_nx, pos=position, edgelist=edges, edge_color="r", ax=ax)
plt.show()
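
Since a precedence graph is acyclic, a longest path can also be obtained by dynamic programming over a topological order of the tasks. The block below is a minimal, standard-library sketch of that idea on a hypothetical 5-task project (tasks 1 and 5 play the role of source and sink). As in the graph above, each edge is weighted by the duration of its origin task. The function names are illustrative and do not belong to discrete-optimization or networkx.

```python
from collections import deque

# Hypothetical toy project: task -> duration and task -> successors.
# Tasks 1 and 5 play the role of the dummy source and sink.
durations = {1: 0, 2: 3, 3: 2, 4: 4, 5: 0}
successors = {1: [2, 3], 2: [4], 3: [4], 4: [5], 5: []}


def topological_order(successors):
    """Kahn's algorithm: repeatedly output a task with no unprocessed predecessor."""
    indegree = {j: 0 for j in successors}
    for succs in successors.values():
        for j in succs:
            indegree[j] += 1
    queue = deque(j for j, deg in indegree.items() if deg == 0)
    order = []
    while queue:
        i = queue.popleft()
        order.append(i)
        for j in successors[i]:
            indegree[j] -= 1
            if indegree[j] == 0:
                queue.append(j)
    return order


def longest_path(durations, successors, source, sink):
    """Return (length, path) of the longest source-to-sink path."""
    dist = {source: 0}  # longest known distance from the source
    best_pred = {}  # predecessor achieving that distance
    for i in topological_order(successors):
        if i not in dist:
            continue  # not reachable from the source
        for j in successors[i]:
            candidate = dist[i] + durations[i]  # edge weight = duration of i
            if j not in dist or candidate > dist[j]:
                dist[j] = candidate
                best_pred[j] = i
    # Walk back from the sink to rebuild the path.
    path = [sink]
    while path[-1] != source:
        path.append(best_pred[path[-1]])
    return dist[sink], path[::-1]


length, path = longest_path(durations, successors, source=1, sink=5)
print(length, path)
```

On this toy project the longest path goes through tasks 2 and 4, so no schedule can be shorter than the sum of their durations.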

## Another procedure to compute the critical path or minimum project duration

The critical path can also be computed by the graph procedure described in https://www.youtube.com/watch?v=4oDLMs11Exs. It is quite simple: a forward and a backward exploration of the graph. In the end, it provides the earliest start date and earliest finish date of each task, and their counterparts (for an optimal schedule ignoring the resource requirements): the latest start date and latest finish date.

In [None]:
from discrete_optimization.rcpsp.solver.cpm import CPM

solver = CPM(problem=rcpsp_model)
critical_path = solver.run_classic_cpm()
edges = [(pi, pi1) for pi, pi1 in zip(critical_path[:-1], critical_path[1:])]
print(solver.map_node[rcpsp_model.sink_task])

Critical tasks are those without slack, i.e. whose earliest and latest start dates are equal (and likewise for the finish dates).

In [None]:
fig, ax = plt.subplots(1)
nx.draw_networkx(graph_nx, pos=position, node_color=node_color, ax=ax)
nx.draw_networkx_edges(graph_nx, pos=position, edgelist=edges, edge_color="r", ax=ax)
plt.show()

We find the same result as before.
The CPM object gives us more information about the problem than the pure longest-path computation. Let's have a look:

In [None]:
for task in rcpsp_model.tasks_list:
    print(f"CPM output for task {task} : {solver.map_node[task]}")

We have access to all the labels computed by the forward and backward passes of the critical path method.
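
For readers who want to see the two passes spelled out, here is a minimal sketch on a hypothetical 5-task project. It is not the implementation of the `CPM` solver; the function `cpm` and the label names `es`, `ef`, `ls`, `lf` are assumptions chosen for this illustration, and resources are ignored, as in the procedure described above.

```python
# Hypothetical toy project; tasks 1 and 5 are the dummy source and sink.
durations = {1: 0, 2: 3, 3: 2, 4: 4, 5: 0}
successors = {1: [2, 3], 2: [4], 3: [4], 4: [5], 5: []}
order = [1, 2, 3, 4, 5]  # a topological order of the toy graph


def cpm(durations, successors, order):
    """Forward and backward passes of the critical path method."""
    predecessors = {j: [i for i in order if j in successors[i]] for j in order}
    # Forward pass: a task starts once all its predecessors have finished.
    es, ef = {}, {}
    for j in order:
        es[j] = max((ef[i] for i in predecessors[j]), default=0)
        ef[j] = es[j] + durations[j]
    horizon = max(ef.values())
    # Backward pass: a task must finish before any successor's latest start.
    ls, lf = {}, {}
    for j in reversed(order):
        lf[j] = min((ls[s] for s in successors[j]), default=horizon)
        ls[j] = lf[j] - durations[j]
    return es, ef, ls, lf


es, ef, ls, lf = cpm(durations, successors, order)
# Critical tasks have no slack: earliest and latest start dates coincide.
critical = [j for j in order if es[j] == ls[j]]
for j in order:
    print(f"task {j}: ES={es[j]} EF={ef[j]} LS={ls[j]} LF={lf[j]}")
print("critical tasks:", critical)
```

In this toy project, task 3 can be delayed by one time unit without delaying the sink, so it is the only task left out of the critical list.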

## Compute a "Dummy" solution for RCPSP
A solution can be defined as a permutation of jobs, which is then transformed into a feasible schedule, if possible, using the SGS routine (serial schedule generation scheme). It schedules the activities one by one, following the permutation order, each as early as its predecessors and the resource availability allow.
The algorithm is the following.

![image](img/sgs.png)
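
As a complement to the figure, here is a minimal sketch of a serial SGS for a single renewable resource. It is not the routine used inside `RCPSPSolution`; the function `serial_sgs` and the toy instance are hypothetical, and the sketch assumes that no task requires more units than the capacity.

```python
# Hypothetical toy instance: 4 tasks and one renewable resource of capacity 2.
durations = {1: 2, 2: 3, 3: 2, 4: 1}
successors = {1: [2, 3], 2: [4], 3: [4], 4: []}
demands = {1: 1, 2: 1, 3: 2, 4: 1}
capacity = 2


def serial_sgs(priority, durations, successors, demands, capacity):
    """Build start times from a priority list (assumes demands[j] <= capacity)."""
    predecessors = {j: [] for j in durations}
    for i, succs in successors.items():
        for j in succs:
            predecessors[j].append(i)
    # Running all tasks one after the other always fits in this horizon.
    horizon = sum(durations.values())
    usage = [0] * (horizon + 1)  # resource usage per time step
    start, end = {}, {}
    remaining = list(priority)
    while remaining:
        # First task in priority order whose predecessors are all scheduled.
        j = next(t for t in remaining if all(p in end for p in predecessors[t]))
        remaining.remove(j)
        # Earliest time allowed by the precedence constraints...
        t = max((end[p] for p in predecessors[j]), default=0)
        # ...then delay until the resource is free for the whole duration.
        while any(usage[u] + demands[j] > capacity for u in range(t, t + durations[j])):
            t += 1
        for u in range(t, t + durations[j]):
            usage[u] += demands[j]
        start[j], end[j] = t, t + durations[j]
    return start


# The priority list does not need to respect the precedence constraints.
schedule = serial_sgs([3, 1, 2, 4], durations, successors, demands, capacity)
print(schedule)
```

Task 3 comes first in the priority list but has to wait for task 1; once scheduled, it uses the full capacity and pushes task 2 back.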

### Compute a schedule from a given jobs permutation

In [None]:
permutation = list(range(rcpsp_model.n_jobs_non_dummy))
# We just pick a random permutation of the non-dummy jobs (indices 0 to n_jobs_non_dummy - 1)
random.shuffle(permutation)
print(f"priority list given to sgs : {permutation}")
mode_list = [1 for i in range(rcpsp_model.n_jobs)]
rcpsp_sol = RCPSPSolution(
    problem=rcpsp_model, rcpsp_permutation=permutation, rcpsp_modes=mode_list
)
print("schedule feasible: ", rcpsp_sol.rcpsp_schedule_feasible)
print("schedule: ", rcpsp_sol.rcpsp_schedule)
print("rcpsp_modes:", rcpsp_sol.rcpsp_modes)
fitnesses = rcpsp_model.evaluate(rcpsp_sol)
print("fitnesses: ", fitnesses)
resource_consumption = rcpsp_model.compute_resource_consumption(rcpsp_sol)
print("resource_consumption: ", resource_consumption)
print("mean_resource_reserve:", rcpsp_sol.compute_mean_resource_reserve())

### Plotting the solution

#### Resource consumption over time

In [None]:
fig_resource_view = plot_ressource_view(
    rcpsp_model=rcpsp_model, rcpsp_sol=rcpsp_sol, title_figure="Dummy solution"
)

#### Task view
We can also plot the schedule from a task point of view: each line of the plot shows when the task is executed.

In [None]:
fig_gantt = plot_task_gantt(rcpsp_model=rcpsp_model, rcpsp_sol=rcpsp_sol)

## Conclusion

In this notebook, you have been introduced to the RCPSP, a classical scheduling problem with precedence constraints and resource constraints.
We have illustrated the precedence graph and ways of computing the longest path, which gives a lower bound on the total duration of the schedule.
Finally, we introduced a method called *SGS* that computes a feasible schedule from a priority list of tasks.

In the following notebooks, you will be introduced to scheduling solvers that provide good-quality schedules using different paradigms:
- [greedy heuristics](RCPSP%20%232%20Heuristics%20Solving.ipynb)
- [Metaheuristics and genetic algorithm](RCPSP%20%233%20Local%20search.ipynb)
- [Linear programming](RCPSP%20%234%20Linear%20programming.ipynb)
- [Constraint programming](RCPSP%20%235%20Constraint%20Programming.ipynb)
- [Large Neighborhood search](RCPSP%20%236%20Large%20Neighbourhood%20Search%20.ipynb)
