# MIS
In a graph, an independent set (also known as a stable set) is a set of vertices such that no two vertices in the set are adjacent. A maximal independent set is an independent set that is not a proper subset of any other independent set; in other words, no further vertex can be added to it without breaking independence. In this notebook, we will build a learning-based model to find the maximal independent set in a graph.

<img src="img/Independent_set_graph.png" alt="Solved MIS" style="width:400px; height:400px;">
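To make the definition concrete, here is a minimal sketch in plain Julia that checks independence and maximality on a toy star graph. The graph and the helper names `is_independent` and `is_maximal_independent` are illustrative assumptions and are not part of SeaPearl.

```julia
# Toy star graph as adjacency lists (vertex => neighbours); hypothetical example data.
adjacency = Dict(
    1 => [2, 3, 4],
    2 => [1],
    3 => [1],
    4 => [1],
)

# A set is independent if no two of its vertices are adjacent.
is_independent(adj, S) = all(!(v in S) for u in S for v in adj[u])

# An independent set is maximal if every outside vertex has a neighbour in the set,
# i.e. no outside vertex can be added without breaking independence.
function is_maximal_independent(adj, S)
    is_independent(adj, S) || return false
    return all(any(v in S for v in adj[u]) for u in keys(adj) if !(u in S))
end

is_maximal_independent(adjacency, Set([1]))        # maximal, but only of size 1
is_maximal_independent(adjacency, Set([2, 3, 4]))  # maximal and of the largest possible size
is_maximal_independent(adjacency, Set([2, 3]))     # not maximal: vertex 4 can still be added
```

The star graph shows that a maximal set is not necessarily the largest one: `{1}` cannot be extended, yet `{2, 3, 4}` is three times bigger.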

SeaPearl currently supports learning-based value selection models trained with Reinforcement Learning. To train Reinforcement Learning agents, we start by generating training instances and we let the agent learn a value selection heuristic. Like all other Reinforcement Learning tasks, we need to define a reward function, a state representation and an action space. Finally, we also need to build a neural network that will learn the value selection heuristic. All of these components will be defined in the following sections.

## Setup
We will begin by activating the environment and importing the necessary packages.

In [37]:
using Revise
using Pkg
Pkg.activate("../../../")
Pkg.instantiate()
using SeaPearl
using Flux
using LightGraphs
using Random
using ReinforcementLearning
using CSV
const RL = ReinforcementLearning

[32m[1m  Activating[22m[39m project at `c:\Users\leobo\Desktop\École\Poly\SeaPearl\SeaPearlZoo.jl`


ReinforcementLearning

## Generating instances

SeaPearl provides a number of instance generators, including one for the MIS problem. Under the hood, this generator creates Barabási-Albert graphs with `n` vertices. Each graph is grown by adding new vertices to an initial graph that has `k` vertices. Each new vertex is connected by `k` edges to `k` different vertices already present in the graph, chosen by preferential attachment (vertices with a higher degree are more likely to be picked). The resulting graphs are undirected.

<img src="img/450px-Barabasi_albert_graph.png" alt="Barabasi-Albert Graph" style="width:600px; height:200px;">
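The growth process described above can be sketched in plain Julia as follows. This is an illustration under stated assumptions, not SeaPearl's generator: the function name `preferential_attachment_edges` is hypothetical, the initial `k` vertices are assumed to start without edges, and the first new vertex is assumed to attach to all of them.

```julia
using Random

# Hypothetical helper: returns the edge list of a graph on `n` vertices grown from `k` initial vertices.
function preferential_attachment_edges(n::Int, k::Int; rng=Random.default_rng())
    @assert 1 <= k < n
    edges = Tuple{Int,Int}[]
    endpoints = Int[]  # every vertex appears once per incident edge
    for new in (k + 1):n
        targets = Set{Int}()
        if isempty(endpoints)
            union!(targets, 1:k)  # first new vertex: attach to all initial vertices
        else
            while length(targets) < k
                # Sampling from `endpoints` picks a vertex with probability proportional to its degree
                push!(targets, rand(rng, endpoints))
            end
        end
        for t in targets
            push!(edges, (t, new))
            push!(endpoints, t, new)
        end
    end
    return edges
end

edges = preferential_attachment_edges(10, 4; rng=MersenneTwister(123))
length(edges)  # each of the n - k = 6 new vertices adds k = 4 edges, so 24 edges
```

Keeping one entry per edge endpoint in `endpoints` is a simple way to obtain degree-proportional sampling without maintaining explicit probabilities.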

In [2]:
numInitialVertices = 4
numNewVertices = 10
instance_generator = SeaPearl.MaximumIndependentSetGenerator(numNewVertices, numInitialVertices)

SeaPearl.MaximumIndependentSetGenerator(10, 4)

## The Reinforcement Learning setup

### The state representation
In SeaPearl, the state $s_t$ is defined as a pair $s_t = (P_t, x_t)$, with $P_t$ a partially solved combinatorial optimization problem and $x_t$ a variable selected at time $t$ of an episode. A terminal state is reached if all variables are fixed or if a failure is detected.

### The action space
Given a state $s_t = (P_t, x_t)$, an action $a_t$ represents the selection of a value $v$ for the variable $x_t$. The action space is defined as the set of all possible values for the variable $x_t$ at time $t$.

### The transition function
Given a state $s_t = (P_t, x_t)$ and an action $a_t = v$, the transition function consists of three steps:
1. The value $v$ is assigned to variable $x_t$ (i.e., $D_{t+1}(x_t) = \{v\}$).
2. The fix-point operation is applied to $P_t$ to prune the domains (i.e., $P_{t+1} = \text{fixPoint}(P_t)$).
3. The next variable to branch on is selected (i.e., $x_{t+1} = \text{nextVariable}(P_{t+1})$).

This results in a new state $s_{t+1} = (P_{t+1}, x_{t+1})$.
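The three steps can be sketched on a toy MIS model in plain Julia. Everything here is an illustrative assumption rather than SeaPearl's implementation: domains are stored as sets of integers, `x[i] = 1` means vertex `i` is in the independent set, the only constraints are `x[u] + x[v] <= 1` for each edge, and failure detection (an empty domain) is left out for brevity.

```julia
# Toy path graph 1 - 2 - 3 (hypothetical example data)
edges = [(1, 2), (2, 3)]

# Propagate x[u] + x[v] <= 1 until no domain changes (a tiny fix-point loop).
function fix_point!(domains, edges)
    changed = true
    while changed
        changed = false
        for (u, v) in edges, (a, b) in ((u, v), (v, u))
            # If a is fixed to 1, value 1 must be removed from the domain of b
            if domains[a] == Set([1]) && 1 in domains[b]
                delete!(domains[b], 1)
                changed = true
            end
        end
    end
    return domains
end

# Pick the first variable that is not fixed yet, or `nothing` when all are fixed.
function next_variable(domains)
    for i in sort(collect(keys(domains)))
        length(domains[i]) > 1 && return i
    end
    return nothing
end

function transition(domains, edges, x, v)
    new_domains = deepcopy(domains)
    new_domains[x] = Set([v])                        # step 1: assign v to x_t
    fix_point!(new_domains, edges)                   # step 2: prune the domains
    return new_domains, next_variable(new_domains)   # step 3: select the next variable
end

domains = Dict(i => Set([0, 1]) for i in 1:3)
new_domains, x_next = transition(domains, edges, 2, 1)
```

Selecting vertex 2 fixes both of its neighbours to 0, so every variable is bound and `x_next` is `nothing`: the episode has reached a terminal state after a single action.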


### The reward function
SeaPearl uses a "propagation-based reward". As the goal of the agent is to quickly find a good solution, it needs to learn to prune the search space effectively and move toward promising regions. An intuitive way to configure the reward is to give the agent the objective value, but this information is only available at the end of an episode, which makes the reward signal extremely sparse. To address this problem, SeaPearl uses both an intermediate reward and a final reward. The intermediate reward rewards the pruning of high values from the objective variable's domain and penalizes the pruning of low values from it. Mathematically, the rewards are defined as follows:

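$$
\begin{aligned}
r_t^{ub} &= \lvert \{ v \in D_t(x_{\text{obj}}) \mid v \notin D_{t+1}(x_{\text{obj}}) \land v > \max(D_{t+1}(x_{\text{obj}})) \} \rvert \\
r_t^{lb} &= \lvert \{ v \in D_t(x_{\text{obj}}) \mid v \notin D_{t+1}(x_{\text{obj}}) \land v < \min(D_{t+1}(x_{\text{obj}})) \} \rvert \\
r^{mid}_t &= \frac{r_t^{ub} - r_t^{lb}}{\lvert D_1(x_{\text{obj}}) \rvert} \\
r^{end}_t &= \begin{cases} -1 & \text{if an infeasible solution is found} \\ 0 & \text{otherwise} \end{cases} \\
r_{acc} &= \frac{\sum_{t=1}^{T} (r^{mid}_t + r^{end}_t)}{T-1}
\end{aligned}
$$

Here $r_t^{ub}$ and $r_t^{lb}$ count the values pruned above the new maximum and below the new minimum of the objective domain, $D_1(x_{\text{obj}})$ is the initial objective domain, and $T$ is the number of steps in the episode.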
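As a worked example of the intermediate reward, the sketch below computes it for one propagation step in plain Julia. It assumes that pruned values are compared against the bounds of the new domain $D_{t+1}(x_{\text{obj}})$ and that this domain is not empty; the function name `intermediate_reward` and the domains are hypothetical and do not come from SeaPearl.

```julia
# Count the values pruned above the new maximum (r_ub) and below the new minimum (r_lb)
# of the objective domain, then normalise by the size of the initial domain.
# Assumption: D_next is non-empty (an empty domain would be a failure, handled by the final reward).
function intermediate_reward(D_t, D_next, initial_size)
    removed = setdiff(D_t, D_next)
    r_ub = count(v -> v > maximum(D_next), removed)
    r_lb = count(v -> v < minimum(D_next), removed)
    return (r_ub - r_lb) / initial_size
end

D_1 = Set(0:9)  # hypothetical initial objective domain, |D_1| = 10
D_2 = Set(2:6)  # after propagation: 7, 8, 9 pruned from the top and 0, 1 from the bottom
r_mid = intermediate_reward(D_1, D_2, length(D_1))  # (3 - 2) / 10 = 0.1
```

Three values are pruned from the top and two from the bottom, so the step earns a small positive reward.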

## Implementation

We will now begin to implement the MIS problem in SeaPearl. To start, we will define the reward, which comes directly from the mathematical definition above. It is implemented in the `GeneralReward` of SeaPearl.


In [6]:
reward = SeaPearl.GeneralReward

SeaPearl.GeneralReward

## The Neural Network

Next up, we need to define the neural network that will learn the value selection function. As the problems can differ in size, the use of graph neural networks (GNNs) is particularly appropriate. GNNs are a class of neural networks that operate on graphs, which means we need to convert the problem instances to graphs. In SeaPearl, we use tripartite graphs, which are graphs with three types of nodes: variables, values and constraints. There is one node for every variable, for every value and for every constraint. The edges are defined as follows:
 - There is an edge between a variable and a value if the value is in the domain of the variable.
 - There is an edge between a variable and a constraint if the variable appears in the constraint.

Nodes have the following features:
 - Values have a one-hot encoding of their value. For example, if the values appearing in the domains are $\{1, 2, 3\}$, then they are encoded as $[1, 0, 0]$, $[0, 1, 0]$ and $[0, 0, 1]$, respectively.
 - Constraints have a one-hot encoding of their type (i.e., the kind of constraint).

Other features can be used in the graph and we will define a featurization for it later on.
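The tripartite structure can be illustrated on a tiny model in plain Julia. The model below (two binary variables linked by one constraint) and all names are hypothetical; SeaPearl builds its own graph internally.

```julia
# Hypothetical toy model: two variables and one constraint linking them
domains = Dict("x1" => [0, 1], "x2" => [0, 1])
constraints = Dict("c1" => ["x1", "x2"])  # e.g. x1 + x2 <= 1 for one edge of the MIS graph

# Variable-value edges: one per value in the domain of each variable
variable_value_edges = [(x, v) for (x, dom) in domains for v in dom]

# Variable-constraint edges: one per variable appearing in each constraint
variable_constraint_edges = [(x, c) for (c, vars) in constraints for x in vars]

# One-hot encoding of each value over the sorted list of all values
all_values = sort(unique(v for dom in values(domains) for v in dom))
one_hot(v) = [Int(v == w) for w in all_values]

one_hot(0)  # [1, 0]
one_hot(1)  # [0, 1]
```

This toy graph has 2 variable nodes, 2 value nodes and 1 constraint node, connected by 4 variable-value edges and 2 variable-constraint edges.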

## Setting up the experiment

In the next cell, we will set up the experiment. We will create structs for the agent and the experiment. We will also define the hyperparameters of the experiment.

In [7]:
"""MisAgentConfig holds parameters for the configuration of the RL agent that will be used"""
struct MisAgentConfig
    gamma::Float32
    batch_size::Int
    output_size::Int
    update_horizon::Int
    min_replay_history::Int
    update_freq::Int
    target_update_freq::Int
    trajectory_capacity::Int
end

"""MisExperimentSettings holds parameters for the configuration of the experiment"""
struct MisExperimentSettings
    nbEpisodes::Int
    restartPerInstances::Int
    evalFreq::Int
    nbInstances::Int
    nbRandomHeuristics::Int
    nbNewVertices::Int
    nbInitialVertices::Int
    seedEval::Int
end

agent_config = MisAgentConfig(0.99f0, 64, instance_generator.n, 4, 400, 1, 100, 2000)
mis_settings = MisExperimentSettings(100, 1, 10, 10, 4, numNewVertices, numInitialVertices, 123)

MisExperimentSettings(100, 1, 10, 10, 4, 10, 4, 123)

## Further configuration

Next up, we will define additional configurations for the experiment:
- The random seed
- Number of steps per episode
- The update horizon
- The device to use (CPU or GPU)
- The evaluation frequency
- The steps for the explorer
- The parameter initialization function
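For the settings used in this notebook (10 new vertices, 4 initial vertices, 100 episodes), the derived quantities can be worked out by hand. The sketch below mirrors the formulas of the next cell with plain variables and floating-point division instead of `//`; the variable names are illustrative.

```julia
nb_new_vertices, nb_initial_vertices, nb_episodes = 10, 4, 100

# 10 / 2 + 4 = 9 steps per episode
n_step_per_episode = round(Int, nb_new_vertices / 2) + nb_initial_vertices

# 9 / 2 = 4.5, which Julia rounds to the nearest even integer: 4
update_horizon = round(Int, n_step_per_episode / 2)

# 100 * 9 / 2 = 450 steps over which the exploration rate decays
step_explorer = floor(Int, nb_episodes * n_step_per_episode / 2)
```

The exploration rate therefore decays over roughly half of the total number of training steps.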

In [15]:
n_step_per_episode = Int(round(mis_settings.nbNewVertices // 2)) + mis_settings.nbInitialVertices
update_horizon = Int(round(n_step_per_episode // 2))
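device = cpu # change to `gpu` if you have a GPU

if device == gpu
    # This branch requires `using CUDA` and a GPU index `numDevice`, neither of which is defined in this notebook
    CUDA.device!(numDevice)
end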

evalFreq = mis_settings.evalFreq
step_explorer = Int(floor(mis_settings.nbEpisodes * n_step_per_episode / 2))
generator = instance_generator
eval_generator = generator

rngExp = MersenneTwister(mis_settings.seedEval)
init = Flux.glorot_uniform(MersenneTwister(mis_settings.seedEval))

#1 (generic function with 1 method)

## Defining the Neural Network

In the next cell, we will define the neural network used by the RL agent. We will be using Graph Neural Networks (GNNs) as the learnable architecture. The inputs will be graphs as defined earlier. The features are defined in a Dict and contain elements coming from the problem instance.

## The GNN model

The model used in this example is the Heterogeneous Graph Convolutional Network (HetGCN) proposed by Zhang et al. in [Heterogeneous Graph Neural Network](https://arxiv.org/abs/2003.01332). The model is defined as follows:
For each relation type $r \in R$, a layer computes:

$$H^{(l)}_{r} = \sigma(D_{r}^{-\frac{1}{2}}A_{r}D_{r}^{-\frac{1}{2}}H^{(l-1)}_{r}W^{(l)}_{r})$$

where:
- $H^{(l)}_{r}$ denotes the feature matrix of nodes of type $r$ in layer $l$.
- $A_{r}$ is the adjacency matrix for the relation $r$.
- $D_{r}$ is the degree matrix for the adjacency matrix $A_{r}$.
- $W^{(l)}_{r}$ is a layer-specific learnable weight matrix for nodes of type $r$.
- $\sigma$ is the activation function, such as ReLU.

Finally, to obtain a comprehensive feature representation of nodes, the feature representations obtained from the different relations are concatenated:

$$H^{(l)} = \text{CONCAT}(H^{(l)}_{r_1}, H^{(l)}_{r_2}, \dots, H^{(l)}_{r_n})$$

where $r_1, r_2, \dots, r_n$ are the different types of relations.

In SeaPearl, the heterogeneous GCN is implemented in the `HeterogeneousCPNN` type; the cell below uses its `HeterogeneousFullFeaturedCPNN` variant.
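The propagation rule above can be sketched for a single relation with Julia's standard `LinearAlgebra` library. This is a minimal illustration and not SeaPearl's `HeterogeneousGraphConv`: the function name `gcn_layer`, the square adjacency matrix, the row-per-node feature layout and the all-ones weights are assumptions made for the example.

```julia
using LinearAlgebra

# One propagation step for one relation: σ(D^{-1/2} A D^{-1/2} H W), with ReLU as default σ.
# Rows of H are nodes, columns are features.
function gcn_layer(A, H, W; σ=x -> max(x, zero(x)))
    deg = vec(sum(A, dims=2))
    d_inv_sqrt = [d > 0 ? 1 / sqrt(d) : 0.0 for d in deg]  # isolated nodes get a zero row
    A_norm = Diagonal(d_inv_sqrt) * A * Diagonal(d_inv_sqrt)
    return σ.(A_norm * H * W)
end

A = [0.0 1.0 1.0; 1.0 0.0 0.0; 1.0 0.0 0.0]  # star: node 1 linked to nodes 2 and 3
H = Matrix{Float64}(I, 3, 3)                 # one-hot input features
W = ones(3, 2)                               # hypothetical weights
H_r = gcn_layer(A, H, W)                     # 3 × 2 matrix

# Per-relation outputs are concatenated feature-wise
# (the same relation is reused twice here, purely for illustration)
H_out = hcat(H_r, H_r)                       # 3 × 4 matrix
```

The symmetric normalization keeps high-degree nodes from dominating the aggregation, which matters on Barabási-Albert graphs where a few hubs have many neighbours.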

In [13]:
struct HeterogeneousModel{A,B}
    Inputlayer::A
    Middlelayers::Vector{B}
end

# The size of the input features for each type of node (variable, constraint, value), respectively
feature_size = [6, 5, 2]

"""
    get_dense_chain(in, mid, out, n_layers, σ=Flux.identity; init=Flux.glorot_uniform)

Create a chain of dense layers for a neural network.

# Arguments
- `in::Int`: The size of the input layer.
- `mid::Int`: The size of the intermediate layers.
- `out::Int`: The size of the output layer.
- `n_layers::Int`: The number of layers in the chain.
- `σ::Function=Flux.identity`: The activation function to use.
- `init::Function=Flux.glorot_uniform`: The initialization method to use.

# Returns
A `Flux.Chain` object representing the chain of dense layers.

# Examples
```julia
julia> get_dense_chain(10, 20, 5, 3)
Chain(Dense(10, 20, σ), Dense(20, 20, σ), Dense(20, 5))
```
"""
function get_dense_chain(in, mid, out, n_layers, σ=Flux.identity; init=Flux.glorot_uniform)
    @assert n_layers >= 1
    layers = []
    if n_layers == 1
        push!(layers, Flux.Dense(in, out, init=init))
    elseif n_layers == 2
        push!(layers, Flux.Dense(in, mid, σ, init=init))
        push!(layers, Flux.Dense(mid, out, init=init))
    else
        push!(layers, Flux.Dense(in, mid, σ, init=init))
        for i in 2:(n_layers-1)
            push!(layers, Flux.Dense(mid, mid, σ, init=init))
        end
        push!(layers, Flux.Dense(mid, out, init=init))
    end
    return Flux.Chain(layers...)
end

# Builds the SeaPearl HeterogeneousFullFeaturedCPNN model
function build_model(; 
        feature_size,
        conv_size=8,
        dense_size=16,
        output_size=1,
        n_layers_graph=3,
        n_layers_node=2,
        n_layers_output=2,
        pool=SeaPearl.meanPooling(),
        σ=Flux.leakyrelu,
        init=Flux.glorot_uniform,
        device=cpu
    )
    input_layer = SeaPearl.HeterogeneousGraphConvInit(feature_size, conv_size, σ, init=init) # input layer
    middle_layers = SeaPearl.HeterogeneousGraphConv[] # middle layers
    for i in 1:n_layers_graph-1
        push!(middle_layers, SeaPearl.HeterogeneousGraphConv(conv_size => conv_size, feature_size, σ, pool=pool, init=init))
    end
    # Note: `output_layer` is built here but is not passed to the model constructed below
    output_layer = SeaPearl.HeterogeneousGraphConv(conv_size => output_size, feature_size, σ, pool=pool, init=init) # output layer
    dense_layers = get_dense_chain(conv_size, dense_size, dense_size, n_layers_node, σ, init=init) # dense layers
    # Define the final output layer
    final_output_layer = get_dense_chain(2 * dense_size, dense_size, output_size, n_layers_output, σ, init=init)

    # Build the model
    model = SeaPearl.HeterogeneousFullFeaturedCPNN(
        HeterogeneousModel(input_layer, middle_layers),
        dense_layers,
        Flux.Chain(),
        final_output_layer
    ) |> device

    return model
end

build_model (generic function with 1 method)

## The State Representation

The state representation is defined by the `HeterogeneousStateRepresentation` type, which is parameterized by a featurization and a trajectory state. In this example, we will use `DefaultFeaturization` and `HeterogeneousTrajectoryState`. `DefaultFeaturization` is the featurization used by default in SeaPearl and allows the user to select the graph features they want. The available features are the following.
### Variable Features:
- node_number_of_neighbors
- variable_initial_domain_size
- variable_domain_size
- variable_is_bound
- variable_is_branchable
- variable_is_objective
- variable_assigned_value
### Constraint Features
- node_number_of_neighbors
- constraint_activity
- nb_involved_constraint_propagation
- nb_not_bounded_variable
- constraint_type
### Value Features
- node_number_of_neighbors
- values_raw
- values_onehot

The featurization is used to convert the problem instance to a graph. The `HeterogeneousTrajectoryState` state representation is a state representation that is used to represent the state at a given point in the resolution of the problem. It contains the variable that is branched on, the feature graph at that point and the available values.

In [10]:
# Defines the features that will be used
chosen_features = Dict(
    "node_number_of_neighbors" => true,
    "constraint_type" => true,
    "constraint_activity" => true,
    "nb_not_bounded_variable" => true,
    "variable_initial_domain_size" => true,
    "variable_domain_size" => true,
    "variable_is_objective" => true,
    "variable_assigned_value" => true,
    "variable_is_bound" => true,
    "values_raw" => true
)
SR_heterogeneous = SeaPearl.HeterogeneousStateRepresentation{SeaPearl.DefaultFeaturization,SeaPearl.HeterogeneousTrajectoryState}

SeaPearl.HeterogeneousStateRepresentation{SeaPearl.DefaultFeaturization, SeaPearl.HeterogeneousTrajectoryState}

## The Agent, Replay Buffer and Exploration Policy

We now have:
- The reward function
- An instance generator
- A GNN
- And all the settings we need!

We now need to define:
- How trajectories will be stored (in a circular buffer)
- The exploration policy (ε-greedy)
- The agent (DQN)
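To illustrate the exploration policy, here is a minimal ε-greedy sketch in plain Julia that respects a mask of legal actions. The function names and the exponential decay schedule are assumptions made for this example; the notebook itself relies on `RL.EpsilonGreedyExplorer`, whose exact schedule may differ.

```julia
using Random

# Hypothetical schedule: the exploration rate decays exponentially from ϵ_init towards ϵ_stable
epsilon(step; ϵ_init=1.0, ϵ_stable=0.01, decay_steps=450) =
    ϵ_stable + (ϵ_init - ϵ_stable) * exp(-step / decay_steps)

# With probability ϵ pick a random legal action, otherwise the legal action with the highest Q-value
function epsilon_greedy(q_values, legal_mask, ϵ; rng=Random.default_rng())
    legal = findall(legal_mask)
    rand(rng) < ϵ && return rand(rng, legal)
    return legal[argmax(q_values[legal])]
end

epsilon(0)                                                 # 1.0: fully random at the start
epsilon_greedy([0.2, 0.9, 0.5], [true, false, true], 0.0)  # returns 3, since action 2 is illegal
```

Masking illegal actions before taking the `argmax` matters in a CP solver, because values already pruned from the domain of the current variable must never be selected.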

In [26]:
"""
    get_heterogeneous_slart_trajectory(; capacity, n_actions)

Create a circular buffer for storing trajectories in the context of reinforcement learning where not all actions are legal. 
SLART stands for State, Legal Actions, Reward, Terminal.

# Arguments
- `capacity::Int`: The maximum number of trajectories that can be stored in the buffer.
- `n_actions::Int`: The number of possible actions that can be taken at each time step.

# Returns
A `CircularArraySLARTTrajectory` object with the specified capacity and legal actions mask, and an empty state buffer.
"""
function get_heterogeneous_slart_trajectory(; capacity, n_actions)
    return RL.CircularArraySLARTTrajectory(
        capacity=capacity,
        state=SeaPearl.HeterogeneousTrajectoryState[] => (),
        legal_actions_mask=Vector{Bool} => (n_actions,),
    )
end

function get_heterogeneous_agent(; get_explorer, batch_size=16, update_horizon, min_replay_history, update_freq=1, target_update_freq=200, γ=0.999f0, get_heterogeneous_trajectory, get_heterogeneous_nn)
    return RL.Agent(
        policy=RL.QBasedPolicy(
            learner=get_heterogeneous_learner(batch_size, update_horizon, min_replay_history, update_freq, target_update_freq, get_heterogeneous_nn, γ),
            explorer=get_explorer(),
        ),
        trajectory=get_heterogeneous_trajectory()
    )
end

function get_heterogeneous_learner(batch_size, update_horizon, min_replay_history, update_freq, target_update_freq, get_heterogeneous_nn, γ)
    return RL.DQNLearner(
        approximator=RL.NeuralNetworkApproximator(
            model=get_heterogeneous_nn(),
            optimizer=ADAM()
        ),
        target_approximator=RL.NeuralNetworkApproximator(
            model=get_heterogeneous_nn(),
            optimizer=ADAM()
        ),
        loss_func=Flux.Losses.huber_loss,
        batch_size=batch_size,
        update_horizon=update_horizon,
        min_replay_history=min_replay_history,
        update_freq=update_freq,
        target_update_freq=target_update_freq,
        γ=γ
    )
end

Flux.@functor HeterogeneousModel
# Manual alternative to the `@functor` macro above, kept for reference (disabled):
# function Flux.functor(::Type{<:HeterogeneousModel}, m)
#     return (m.Inputlayer, m.Middlelayers), ls -> HeterogeneousModel(ls[1], ls[2])
# end

function (m::HeterogeneousModel)(fg)
    original_fg = deepcopy(fg)
    out = m.Inputlayer(fg)
    for layer in m.Middlelayers
        out = layer(out, original_fg)
    end
    return out
end

"""
    get_epsilon_greedy_explorer(decay_steps, ϵ_stable; rng=nothing)

Create an epsilon-greedy explorer for use in reinforcement learning.

# Arguments
- `decay_steps::Int`: The number of steps over which to decay the exploration rate.
- `ϵ_stable::Real`: The minimum exploration rate to use after decay.
- `rng::AbstractRNG`: (optional) A random number generator to use for sampling actions.

# Returns
An `EpsilonGreedyExplorer` object with the specified exploration rate decay and random number generator.
"""
function get_epsilon_greedy_explorer(decay_steps, ϵ_stable; rng=nothing)
    if isnothing(rng)
        return RL.EpsilonGreedyExplorer(
            ϵ_stable=ϵ_stable,
            kind=:exp,
            decay_steps=decay_steps,
            step=1
        )
    else
        return RL.EpsilonGreedyExplorer(
            ϵ_stable=ϵ_stable,
            kind=:exp,
            decay_steps=decay_steps,
            step=1,
            rng=rng
        )
    end
end
pool = SeaPearl.meanPooling()

agent = get_heterogeneous_agent(;
    get_heterogeneous_trajectory=() -> get_heterogeneous_slart_trajectory(capacity=agent_config.trajectory_capacity, n_actions=2),
    get_explorer=() -> get_epsilon_greedy_explorer(step_explorer, 0.01; rng=rngExp),
    batch_size=agent_config.batch_size,
    update_horizon=update_horizon,
    min_replay_history=Int(round(16 * n_step_per_episode // 2)),
    update_freq=agent_config.update_freq,
    target_update_freq=agent_config.target_update_freq,
    get_heterogeneous_nn=() -> build_model(
        feature_size=feature_size,
        conv_size=8,
        dense_size=16,
        output_size=1,
        n_layers_graph=3,
        n_layers_node=3,
        n_layers_output=2,
        pool=pool,
        σ=NNlib.leakyrelu,
        init=init,
        device=device
    ),
    γ=0.99f0
)

learned_heuristic = SeaPearl.SimpleLearnedHeuristic{SR_heterogeneous,reward,SeaPearl.FixedOutput}(agent; chosen_features=chosen_features)

SeaPearl.SimpleLearnedHeuristic{SeaPearl.HeterogeneousStateRepresentation{SeaPearl.DefaultFeaturization, SeaPearl.HeterogeneousTrajectoryState}, SeaPearl.GeneralReward, SeaPearl.FixedOutput}(typename(Agent)
├─ policy => typename(QBasedPolicy)
│  ├─ learner => typename(DQNLearner)
│  │  ├─ approximator => typename(NeuralNetworkApproximator)
│  │  │  ├─ model => typename(SeaPearl.HeterogeneousFullFeaturedCPNN)
│  │  │  │  ├─ graphChain => typename(HeterogeneousModel)
│  │  │  │  │  ├─ Inputlayer => typename(SeaPearl.HeterogeneousGraphConvInit)
│  │  │  │  │  │  ├─ weightsvar => 8×6 Matrix{Float32}
│  │  │  │  │  │  ├─ weightscon => 8×5 Matrix{Float32}
│  │  │  │  │  │  ├─ weightsval => 8×2 Matrix{Float32}
│  │  │  │  │  │  ├─ biasvar => 8-element Vector{Float32}
│  │  │  │  │  │  ├─ biascon => 8-element Vector{Float32}
│  │  │  │  │  │  ├─ biasval => 8-element Vector{Float32}
│  │  │  │  │  │  └─ σ => typename(typeof(leakyrelu))
│  │  │  │  │  └─ Middlelayers => 2-element Vector{SeaPearl

## Setting up comparisons and running the experiment

We now have everything we need to run the experiment. We will run the experiment for 100 episodes (`mis_settings.nbEpisodes`) and compare the performance of the learned agent with that of random agents and of a policy that always selects the maximum value available.


In [27]:
selectMax(x::SeaPearl.IntVar; cpmodel=nothing) = SeaPearl.maximum(x.domain)
heuristic_max = SeaPearl.BasicHeuristic(selectMax)

function select_random_value(x::SeaPearl.IntVar; cpmodel=nothing)
    selected_number = rand(1:length(x.domain))
    i = 1
    for value in x.domain
        if i == selected_number
            return value
        end
        i += 1
    end
    @assert false "This should not happen"
end

randomHeuristics = []
for i in 1:mis_settings.nbRandomHeuristics
    push!(randomHeuristics, SeaPearl.BasicHeuristic(select_random_value))
end

valueSelectionArray = [learned_heuristic, heuristic_max]
append!(valueSelectionArray, randomHeuristics)
variableSelection = SeaPearl.MinDomainVariableSelection{false}()

SeaPearl.MinDomainVariableSelection{false}()

## Solving the problem

Let's finally build a function that will help solve the problem.

In [38]:
# `now` comes from Dates, `JSON.print` from JSON and `@save` from BSON;
# all three packages are assumed to be installed in the active project.
using Dates
using JSON
using BSON: @save

# Note: this function reads the globals `valueSelectionArray` and `variableSelection` defined above.
function solve_learning_mis(
    agent::RL.Agent,
    agent_config::MisAgentConfig,
    mis_settings::MisExperimentSettings,
    instance_generator::SeaPearl.AbstractModelGenerator,
    save_experiment_parameters::Bool=false,
    save_model::Bool=false
)

    # Create the output directory whenever something has to be written to disk
    dir = ""
    if save_experiment_parameters || save_model
        experiment_time = now()
        dir = mkdir(string("exp_", Base.replace("$(round(experiment_time, Dates.Second(3)))", ":" => "-")))
    end

    if save_experiment_parameters
        # `get_experiment_parameters` is not defined in this notebook and must be provided by the user
        experiment_parameters = get_experiment_parameters(agent, agent_config, mis_settings)
        open(dir * "/params.json", "w") do file
            JSON.print(file, experiment_parameters)
        end
    end

    metricsArray, eval_metricsArray = SeaPearl.train!(
        valueSelectionArray=valueSelectionArray,
        generator=instance_generator,
        nbEpisodes=mis_settings.nbEpisodes,
        strategy=SeaPearl.DFSearch(),
        variableHeuristic=variableSelection,
        out_solver=true,
        verbose=false,
        evaluator=SeaPearl.SameInstancesEvaluator(valueSelectionArray, instance_generator; evalFreq=mis_settings.evalFreq, nbInstances=mis_settings.nbInstances),
        restartPerInstances=mis_settings.restartPerInstances
    )
    if save_model
        model = agent.policy.learner.approximator
        @save dir * "/model_gc" * string(instance_generator.n) * ".bson" model
    end

    return metricsArray, eval_metricsArray
end

In [39]:
metricsArray, eval_metricsArray = solve_learning_mis(agent, agent_config, mis_settings, instance_generator)

Switching to agent : SeaPearl.SimpleLearnedHeuristic{SeaPearl.HeterogeneousStateRepresentation{SeaPearl.DefaultFeaturization, SeaPearl.HeterogeneousTrajectoryState}, SeaPearl.GeneralReward, SeaPearl.FixedOutput}Switching to agent : SeaPearl.BasicHeuristicSwitching to agent : SeaPearl.BasicHeuristicSwitching to agent : SeaPearl.BasicHeuristicSwitching to agent : SeaPearl.BasicHeuristicSwitching to agent : SeaPearl.BasicHeuristicSwitching to agent : SeaPearl.SimpleLearnedHeuristic{SeaPearl.HeterogeneousStateRepresentation{SeaPearl.DefaultFeaturization, SeaPearl.HeterogeneousTrajectoryState}, SeaPearl.GeneralReward, SeaPearl.FixedOutput}

0.0%┣                                              ┫ 0/100 [00:00<00:00, -0s/it]


1.0%┣▍                                         ┫ 1/100 [00:03<Inf:Inf, InfGs/it]
2.0%┣█                                              ┫ 2/100 [00:03<05:17, 3s/it]
3.0%┣█▍                                             ┫ 3/100 [00:03<02:48, 2s/it]
4.0%┣█▉                                             ┫ 4/100 [00:04<01:57, 1s/it]
5.0%┣██▍                                            ┫ 5/100 [00:04<01:31, 1it/s]
6.0%┣██▉                                            ┫ 6/100 [00:04<01:15, 1it/s]
7.0%┣███▎                                           ┫ 7/100 [00:04<01:08, 1it/s]
8.0%┣███▊                                           ┫ 8/100 [00:05<00:59, 2it/s]
9.0%┣████▎                                          ┫ 9/100 [00:05<00:54, 2it/s]

10.0%┣████▌                                        ┫ 10/100 [00:05<00:50, 2it/s]

11.0%┣█████                                        ┫ 11/100 [00:06<00:51, 2it/s]
12.0%┣█████▍                                       ┫ 12/100 [00:06<00:48, 2it/s]
13.0%┣█████▉                                       ┫ 13/100 [00:06<00:45, 2it/s]
14.0%┣██████▎                                      ┫ 14/100 [00:06<00:43, 2it/s]
15.0%┣██████▊                                      ┫ 15/100 [00:07<00:40, 2it/s]
16.0%┣███████▏                                     ┫ 16/100 [00:07<00:38, 2it/s]
17.0%┣███████▋                                     ┫ 17/100 [00:07<00:37, 2it/s]
18.0%┣████████                                     ┫ 18/100 [00:07<00:35, 2it/s]
19.0%┣████████▌                                    ┫ 19/100 [00:07<00:34, 2it/s]

20.0%┣█████████                                    ┫ 20/100 [00:08<00:32, 2it/s]

21.0%┣█████████▌                                   ┫ 21/100 [00:08<00:34, 2it/s]
22.0%┣██████████                                   ┫ 22/100 [00:09<00:32, 2it/s]
23.0%┣██████████▍                                  ┫ 23/100 [00:09<00:31, 2it/s]
24.0%┣██████████▉                                  ┫ 24/100 [00:09<00:30, 3it/s]
25.0%┣███████████▎                                 ┫ 25/100 [00:09<00:29, 3it/s]
26.0%┣███████████▊                                 ┫ 26/100 [00:10<00:28, 3it/s]
27.0%┣████████████▏                                ┫ 27/100 [00:10<00:27, 3it/s]
28.0%┣████████████▋                                ┫ 28/100 [00:10<00:27, 3it/s]
29.0%┣█████████████                                ┫ 29/100 [00:10<00:26, 3it/s]


30.0%┣█████████████▌                               ┫ 30/100 [00:10<00:25, 3it/s]
31.0%┣██████████████                               ┫ 31/100 [00:11<00:25, 3it/s]
32.0%┣██████████████▍                              ┫ 32/100 [00:11<00:25, 3it/s]
33.0%┣██████████████▉                              ┫ 33/100 [00:11<00:24, 3it/s]
34.0%┣███████████████▎                             ┫ 34/100 [00:12<00:23, 3it/s]
35.0%┣███████████████▊                             ┫ 35/100 [00:12<00:23, 3it/s]
36.0%┣████████████████▏                            ┫ 36/100 [00:12<00:22, 3it/s]
37.0%┣████████████████▋                            ┫ 37/100 [00:12<00:21, 3it/s]
38.0%┣█████████████████                            ┫ 38/100 [00:13<00:21, 3it/s]
39.0%┣█████████████████▌                           ┫ 39/100 [00:13<00:20, 3it/s]

40.0%┣██████████████████                           ┫ 40/100 [00:13<00:20, 3it/s]

41.0%┣██████████████████▌                          ┫ 41/100 [00:14<00:20, 3it/s]
42.0%┣███████████████████                          ┫ 42/100 [00:14<00:20, 3it/s]
43.0%┣███████████████████▍                         ┫ 43/100 [00:14<00:19, 3it/s]
44.0%┣███████████████████▉                         ┫ 44/100 [00:14<00:19, 3it/s]
45.0%┣████████████████████▎                        ┫ 45/100 [00:15<00:18, 3it/s]
46.0%┣████████████████████▊                        ┫ 46/100 [00:15<00:18, 3it/s]
47.0%┣█████████████████████▏                       ┫ 47/100 [00:15<00:17, 3it/s]
48.0%┣█████████████████████▋                       ┫ 48/100 [00:15<00:17, 3it/s]
49.0%┣██████████████████████                       ┫ 49/100 [00:15<00:16, 3it/s]


50.0%┣██████████████████████▌                      ┫ 50/100 [00:16<00:16, 3it/s]

51.0%┣███████████████████████                      ┫ 51/100 [00:16<00:16, 3it/s]
52.0%┣███████████████████████▍                     ┫ 52/100 [00:17<00:16, 3it/s]
53.0%┣███████████████████████▉                     ┫ 53/100 [00:17<00:15, 3it/s]
54.0%┣████████████████████████▎                    ┫ 54/100 [00:17<00:15, 3it/s]
55.0%┣████████████████████████▊                    ┫ 55/100 [00:17<00:14, 3it/s]
56.0%┣█████████████████████████▏                   ┫ 56/100 [00:18<00:14, 3it/s]
57.0%┣█████████████████████████▋                   ┫ 57/100 [00:18<00:14, 3it/s]
58.0%┣██████████████████████████                   ┫ 58/100 [00:18<00:13, 3it/s]
59.0%┣██████████████████████████▌                  ┫ 59/100 [00:18<00:13, 3it/s]

60.0%┣███████████████████████████                  ┫ 60/100 [00:18<00:12, 3it/s]

61.0%┣███████████████████████████▌                 ┫ 61/100 [00:19<00:12, 3it/s]
62.0%┣████████████████████████████                 ┫ 62/100 [00:19<00:12, 3it/s]
63.0%┣████████████████████████████▍                ┫ 63/100 [00:19<00:12, 3it/s]
64.0%┣████████████████████████████▉                ┫ 64/100 [00:20<00:11, 3it/s]
65.0%┣█████████████████████████████▎               ┫ 65/100 [00:20<00:11, 3it/s]
66.0%┣█████████████████████████████▊               ┫ 66/100 [00:20<00:10, 3it/s]
67.0%┣██████████████████████████████▏              ┫ 67/100 [00:20<00:10, 3it/s]
68.0%┣██████████████████████████████▋              ┫ 68/100 [00:20<00:10, 3it/s]
69.0%┣███████████████████████████████              ┫ 69/100 [00:21<00:09, 3it/s]


Switching to agent : SeaPearl.SimpleLearnedHeuristic{SeaPearl.HeterogeneousStateRepresentation{SeaPearl.DefaultFeaturization, SeaPearl.HeterogeneousTrajectoryState}, SeaPearl.GeneralReward, SeaPearl.FixedOutput}

70.0%┣███████████████████████████████▌             ┫ 70/100 [00:21<00:09, 3it/s]


Switching to agent : SeaPearl.BasicHeuristic
Switching to agent : SeaPearl.BasicHeuristic
Switching to agent : SeaPearl.BasicHeuristic
Switching to agent : SeaPearl.BasicHeuristic
Switching to agent : SeaPearl.BasicHeuristic

71.0%┣████████████████████████████████             ┫ 71/100 [00:22<00:09, 3it/s]
72.0%┣████████████████████████████████▍            ┫ 72/100 [00:22<00:09, 3it/s]
73.0%┣████████████████████████████████▉            ┫ 73/100 [00:22<00:08, 3it/s]
74.0%┣█████████████████████████████████▎           ┫ 74/100 [00:22<00:08, 3it/s]
75.0%┣█████████████████████████████████▊           ┫ 75/100 [00:23<00:08, 3it/s]
76.0%┣██████████████████████████████████▏          ┫ 76/100 [00:23<00:07, 3it/s]
77.0%┣██████████████████████████████████▋          ┫ 77/100 [00:23<00:07, 3it/s]
78.0%┣███████████████████████████████████          ┫ 78/100 [00:23<00:07, 3it/s]
79.0%┣███████████████████████████████████▌         ┫ 79/100 [00:23<00:06, 3it/s]


Switching to agent : SeaPearl.SimpleLearnedHeuristic{SeaPearl.HeterogeneousStateRepresentation{SeaPearl.DefaultFeaturization, SeaPearl.HeterogeneousTrajectoryState}, SeaPearl.GeneralReward, SeaPearl.FixedOutput}
Switching to agent : SeaPearl.BasicHeuristic
Switching to agent : SeaPearl.BasicHeuristic
Switching to agent : SeaPearl.BasicHeuristic
Switching to agent : SeaPearl.BasicHeuristic
Switching to agent : SeaPearl.BasicHeuristic

80.0%┣████████████████████████████████████         ┫ 80/100 [00:23<00:06, 3it/s]

81.0%┣████████████████████████████████████▌        ┫ 81/100 [00:24<00:06, 3it/s]
82.0%┣█████████████████████████████████████        ┫ 82/100 [00:24<00:05, 3it/s]
83.0%┣█████████████████████████████████████▍       ┫ 83/100 [00:25<00:05, 3it/s]
84.0%┣█████████████████████████████████████▉       ┫ 84/100 [00:25<00:05, 3it/s]
85.0%┣██████████████████████████████████████▎      ┫ 85/100 [00:25<00:04, 3it/s]
86.0%┣██████████████████████████████████████▊      ┫ 86/100 [00:25<00:04, 3it/s]
87.0%┣███████████████████████████████████████▏     ┫ 87/100 [00:25<00:04, 3it/s]
88.0%┣███████████████████████████████████████▋     ┫ 88/100 [00:26<00:04, 3it/s]
89.0%┣████████████████████████████████████████     ┫ 89/100 [00:26<00:03, 3it/s]

Switching to agent : SeaPearl.SimpleLearnedHeuristic{SeaPearl.HeterogeneousStateRepresentation{SeaPearl.DefaultFeaturization, SeaPearl.HeterogeneousTrajectoryState}, SeaPearl.GeneralReward, SeaPearl.FixedOutput}
Switching to agent : SeaPearl.BasicHeuristic
Switching to agent : SeaPearl.BasicHeuristic
Switching to agent : SeaPearl.BasicHeuristic
Switching to agent : SeaPearl.BasicHeuristic
Switching to agent : SeaPearl.BasicHeuristic

90.0%┣████████████████████████████████████████▌    ┫ 90/100 [00:26<00:03, 3it/s]

91.0%┣█████████████████████████████████████████    ┫ 91/100 [00:27<00:03, 3it/s]
92.0%┣█████████████████████████████████████████▍   ┫ 92/100 [00:27<00:02, 3it/s]
93.0%┣█████████████████████████████████████████▉   ┫ 93/100 [00:27<00:02, 3it/s]
94.0%┣██████████████████████████████████████████▎  ┫ 94/100 [00:28<00:02, 3it/s]
95.0%┣██████████████████████████████████████████▊  ┫ 95/100 [00:28<00:01, 3it/s]
96.0%┣███████████████████████████████████████████▏ ┫ 96/100 [00:28<00:01, 3it/s]
97.0%┣███████████████████████████████████████████▋ ┫ 97/100 [00:28<00:01, 3it/s]
98.0%┣████████████████████████████████████████████ ┫ 98/100 [00:29<00:01, 3it/s]
99.0%┣████████████████████████████████████████████▌┫ 99/100 [00:29<00:00, 3it/s]

Switching to agent : SeaPearl.SimpleLearnedHeuristic{SeaPearl.HeterogeneousStateRepresentation{SeaPearl.DefaultFeaturization, SeaPearl.HeterogeneousTrajectoryState}, SeaPearl.GeneralReward, SeaPearl.FixedOutput}
Switching to agent : SeaPearl.BasicHeuristic
Switching to agent : SeaPearl.BasicHeuristic
Switching to agent : SeaPearl.BasicHeuristic
Switching to agent : SeaPearl.BasicHeuristic
Switching to agent : SeaPearl.BasicHeuristic


100.0%┣███████████████████████████████████████████┫ 100/100 [00:29<00:00, 3it/s]


(SeaPearl.AbstractMetrics[SeaPearl.BasicMetrics{SeaPearl.TakeObjective, SeaPearl.SimpleLearnedHeuristic{SeaPearl.HeterogeneousStateRepresentation{SeaPearl.DefaultFeaturization, SeaPearl.HeterogeneousTrajectoryState}, SeaPearl.GeneralReward, SeaPearl.FixedOutput}}(SeaPearl.SimpleLearnedHeuristic{SeaPearl.HeterogeneousStateRepresentation{SeaPearl.DefaultFeaturization, SeaPearl.HeterogeneousTrajectoryState}, SeaPearl.GeneralReward, SeaPearl.FixedOutput}(typename(Agent)
├─ policy => typename(QBasedPolicy)
│  ├─ learner => typename(DQNLearner)
│  │  ├─ approximator => typename(NeuralNetworkApproximator)
│  │  │  ├─ model => typename(SeaPearl.HeterogeneousFullFeaturedCPNN)
│  │  │  │  ├─ graphChain => typename(HeterogeneousModel)
│  │  │  │  │  ├─ Inputlayer => typename(SeaPearl.HeterogeneousGraphConvInit)
│  │  │  │  │  │  ├─ weightsvar => 8×6 Matrix{Float32}
│  │  │  │  │  │  ├─ weightscon => 8×5 Matrix{Float32}
│  │  │  │  │  │  ├─ weightsval => 8×2 Matrix{Float32}
│  │  │  │  │  │  ├─ bi

In [36]:
eval_metricsArray

10×6 Matrix{SeaPearl.AbstractMetrics}:
 BasicMetrics{TakeObjective, SimpleLearnedHeuristic{HeterogeneousStateRepresentation{DefaultFeaturization, HeterogeneousTrajectoryState}, GeneralReward, FixedOutput}}(SimpleLearnedHeuristic{HeterogeneousStateRepresentation{DefaultFeaturization, HeterogeneousTrajectoryState}, GeneralReward, FixedOutput}(typename(Agent)
├─ policy => typename(QBasedPolicy)
│  ├─ learner => typename(DQNLearner)
│  │  ├─ approximator => typename(NeuralNetworkApproximator)
│  │  │  ├─ model => typename(HeterogeneousFullFeaturedCPNN)
│  │  │  │  ├─ graphChain => typename(HeterogeneousModel)
│  │  │  │  │  ├─ Inputlayer => typename(HeterogeneousGraphConvInit)
│  │  │  │  │  │  ├─ weightsvar => 8×6 Matrix{Float32}
│  │  │  │  │  │  ├─ weightscon => 8×5 Matrix{Float32}
│  │  │  │  │  │  ├─ weightsval => 8×2 Matrix{Float32}
│  │  │  │  │  │  ├─ biasvar => 8-element Vector{Float32}
│  │  │  │  │  │  ├─ biascon => 8-element Vector{Float32}
│  │  │  │  │  │  ├─ biasval => 8-ele