# Branching with Imitation Learning and a GNN

In this tutorial we will reproduce a simplified version of the paper of Gasse et al. (2019) on learning to branch with Ecole with `pytorch` and `pytorch geometric`. We collect strong branching examples on randomly generated maximum set covering instances, then train a graph neural network with bipartite state encodings to imitate the expert by classification. Finally, we will evaluate the quality of the policy.

This tutorial requires the following libraries. The version numbers used when writing this tutorial are given in parentheses.
- `python` (3.8.2)
- `numpy` (1.19.4)
- `pytorch` (1.7.0)
- `pytorch-geometric` (1.6.2)
- `ecole` (0.4.2)

The tutorial was designed with the provided version numbers.

## 1. Data collection

Our first step will be to run explore-then-strong-branch on randomly generated maximum set covering instances, and save the branching decisions to build a dataset. We will also record the state of the branch-and-bound process as a bipartite graph, which is already implemented in Ecole with the same features as Gasse et al. (2019).

In [1]:
import gzip
import pickle
import numpy as np
import ecole
from pathlib import Path

MAX_SAMPLES = 1000

We will use the Ecole-provided set cover instance generator.

In [2]:
instances = ecole.instance.SetCoverGenerator(n_rows=500, n_cols=1000, density=0.05)

The explore-then-strong-branch scheme described in the paper is not implemented by default in Ecole, but we can easily write this branching rule in python, which showcasees the flexibility of the library.

In [3]:
class ExploreThenStrongBranch:
    def __init__(self, expert_probability):
        self.expert_probability = expert_probability
        self.pseudocosts_function = ecole.observation.Pseudocosts()
        self.strong_branching_function = ecole.observation.StrongBranchingScores()
    
    def before_reset(self, model):
        self.pseudocosts_function.before_reset(model)
        self.strong_branching_function.before_reset(model)
    
    def extract(self, model, done):
        probabilities = [1-self.expert_probability, self.expert_probability]
        expert_chosen = bool(np.random.choice(np.arange(2), p=probabilities))
        if expert_chosen:
            return (self.strong_branching_function.extract(model, done), True)
        else:
            return (self.pseudocosts_function.extract(model, done), False)

We can now create the environment with the correct parameters (no restarts, 1h time limit, 5% expert sampling probability).

In [4]:
scip_parameters = {'separating/maxrounds': 0, 'presolving/maxrestarts': 0, 'limits/time': 3600}
env = ecole.environment.Branching(observation_function=(ExploreThenStrongBranch(expert_probability=0.05), 
                                                        ecole.observation.NodeBipartite()), 
                                  scip_params=scip_parameters)

Now we loop over the instances, following the strong branching expert 5% of the time and saving its decision, until enough samples are collected.

In [5]:
episode_counter = 0
sample_counter = 0
max_samples_reached = False
Path('samples/').mkdir(exist_ok=True)
env.seed(0)

while not max_samples_reached:
    episode_counter += 1
    
    observation, action_set, _, done, _ = env.reset(next(instances))
    while not done:
        scores, node_observation = observation
        scores, scores_are_expert = scores
        node_observation = (node_observation.row_features,
                            (node_observation.edge_features.indices, 
                             node_observation.edge_features.values),
                            node_observation.column_features)

        action = action_set[scores[action_set].argmax()]

        if scores_are_expert and not max_samples_reached:
            sample_counter += 1
            data = [node_observation, action, action_set, scores]
            filename = f'samples/sample_{sample_counter}.pkl'

            with gzip.open(filename, 'wb') as f:
                pickle.dump(data, f)
            
            if sample_counter == MAX_SAMPLES:
                max_samples_reached = True

        observation, action_set, _, done, _ = env.step(action)
    
    print(f"Episode {episode_counter}, {sample_counter} samples collected so far")

Episode 1, 2 samples collected so far
Episode 2, 12 samples collected so far
Episode 3, 18 samples collected so far
Episode 4, 25 samples collected so far
Episode 5, 40 samples collected so far
Episode 6, 67 samples collected so far
Episode 7, 72 samples collected so far
Episode 8, 75 samples collected so far
Episode 9, 77 samples collected so far
Episode 10, 79 samples collected so far
Episode 11, 111 samples collected so far
Episode 12, 136 samples collected so far
Episode 13, 153 samples collected so far
Episode 14, 156 samples collected so far
Episode 15, 157 samples collected so far
Episode 16, 158 samples collected so far
Episode 17, 165 samples collected so far
Episode 18, 182 samples collected so far
Episode 19, 193 samples collected so far
Episode 20, 221 samples collected so far
Episode 21, 235 samples collected so far
Episode 22, 252 samples collected so far
Episode 23, 258 samples collected so far
Episode 24, 262 samples collected so far
Episode 25, 263 samples collected so

# 2. Train a GNN

Our next step is to train a GNN classifier on these collected samples to predict similar choices to strong branching.

In [6]:
import torch
import torch.nn.functional as F
import torch_geometric

LEARNING_RATE = 0.001
MAX_EPOCHS = 100
PATIENCE = 10
EARLY_STOPPING = 20
DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

We will first define pytorch geometric data classes to handle the bipartite graph data.

In [7]:
class BipartiteNodeData(torch_geometric.data.Data):
    def __init__(self, constraint_features, edge_indices, edge_features, variable_features,
                 candidates, candidate_choice, candidate_scores):
        super().__init__()
        self.constraint_features = constraint_features
        self.edge_index = edge_indices
        self.edge_attr = edge_features
        self.variable_features = variable_features
        self.candidates = candidates
        self.nb_candidates = len(candidates)
        self.candidate_choices = candidate_choice
        self.candidate_scores = candidate_scores

    def __inc__(self, key, value):
        if key == 'edge_index':
            return torch.tensor([[self.constraint_features.size(0)], [self.variable_features.size(0)]])
        elif key == 'candidates':
            return self.variable_features.size(0)
        else:
            return super().__inc__(key, value)


class GraphDataset(torch_geometric.data.Dataset):
    def __init__(self, sample_files):
        super().__init__(root=None, transform=None, pre_transform=None)
        self.sample_files = sample_files

    def len(self):
        return len(self.sample_files)

    def get(self, index):
        with gzip.open(self.sample_files[index], 'rb') as f:
            sample = pickle.load(f)

        sample_observation, sample_action, sample_action_set, sample_scores = sample

        constraint_features, (edge_indices, edge_features), variable_features = sample_observation
        constraint_features = torch.FloatTensor(constraint_features)
        edge_indices = torch.LongTensor(edge_indices.astype(np.int32))
        edge_features = torch.FloatTensor(np.expand_dims(edge_features, axis=-1))
        variable_features = torch.FloatTensor(variable_features)

        candidates = torch.LongTensor(np.array(sample_action_set, dtype=np.int32))
        candidate_choice = torch.where(candidates == sample_action)[0][0]  # action index relative to candidates
        candidate_scores = torch.FloatTensor([sample_scores[j] for j in candidates])

        graph = BipartiteNodeData(constraint_features, edge_indices, edge_features, variable_features,
                                  candidates, candidate_choice, candidate_scores)
        graph.num_nodes = constraint_features.shape[0]+variable_features.shape[0]
        return graph

We can then prepare the data loaders.

In [8]:
sample_files = [str(path) for path in Path('samples/').glob('sample_*.pkl')]
train_files = sample_files[:int(0.8*len(sample_files))]
valid_files = sample_files[int(0.8*len(sample_files)):]

train_data = GraphDataset(train_files)
train_loader = torch_geometric.data.DataLoader(train_data, 32, shuffle=True)
valid_data = GraphDataset(valid_files)
valid_loader = torch_geometric.data.DataLoader(valid_data, 128, shuffle=False)

Next, we will define our graph neural network architecture.

In [9]:
class GNNPolicy(torch.nn.Module):
    def __init__(self):
        super().__init__()
        emb_size = 64
        cons_nfeats = 5
        edge_nfeats = 1
        var_nfeats = 19

        # CONSTRAINT EMBEDDING
        self.cons_embedding = torch.nn.Sequential(
            torch.nn.LayerNorm(cons_nfeats),
            torch.nn.Linear(cons_nfeats, emb_size),
            torch.nn.ReLU(),
            torch.nn.Linear(emb_size, emb_size),
            torch.nn.ReLU(),
        )

        # EDGE EMBEDDING
        self.edge_embedding = torch.nn.Sequential(
            torch.nn.LayerNorm(edge_nfeats),
        )

        # VARIABLE EMBEDDING
        self.var_embedding = torch.nn.Sequential(
            torch.nn.LayerNorm(var_nfeats),
            torch.nn.Linear(var_nfeats, emb_size),
            torch.nn.ReLU(),
            torch.nn.Linear(emb_size, emb_size),
            torch.nn.ReLU(),
        )

        self.conv_v_to_c = BipartiteGraphConvolution()
        self.conv_c_to_v = BipartiteGraphConvolution()

        self.output_module = torch.nn.Sequential(
            torch.nn.Linear(emb_size, emb_size),
            torch.nn.ReLU(),
            torch.nn.Linear(emb_size, 1, bias=False),
        )

    def forward(self, constraint_features, edge_indices, edge_features, variable_features):
        reversed_edge_indices = torch.stack([edge_indices[1], edge_indices[0]], dim=0)
        
        constraint_features = self.cons_embedding(constraint_features)
        edge_features = self.edge_embedding(edge_features)
        variable_features = self.var_embedding(variable_features)

        constraint_features = self.conv_v_to_c(variable_features, reversed_edge_indices, edge_features, constraint_features)
        variable_features = self.conv_c_to_v(constraint_features, edge_indices, edge_features, variable_features)

        output = self.output_module(variable_features).squeeze(-1)
        return output
    

class BipartiteGraphConvolution(torch_geometric.nn.MessagePassing):
    def __init__(self):
        super().__init__('add')
        emb_size = 64
        
        self.feature_module_left = torch.nn.Sequential(
            torch.nn.Linear(emb_size, emb_size)
        )
        self.feature_module_edge = torch.nn.Sequential(
            torch.nn.Linear(1, emb_size, bias=False)
        )
        self.feature_module_right = torch.nn.Sequential(
            torch.nn.Linear(emb_size, emb_size, bias=False)
        )
        self.feature_module_final = torch.nn.Sequential(
            torch.nn.LayerNorm(emb_size),
            torch.nn.ReLU(),
            torch.nn.Linear(emb_size, emb_size)
        )
        
        self.post_conv_module = torch.nn.Sequential(
            torch.nn.LayerNorm(emb_size)
        )

        # output_layers
        self.output_module = torch.nn.Sequential(
            torch.nn.Linear(2*emb_size, emb_size),
            torch.nn.ReLU(),
            torch.nn.Linear(emb_size, emb_size),
        )

    def forward(self, left_features, edge_indices, edge_features, right_features):
        output = self.propagate(edge_indices, size=(left_features.shape[0], right_features.shape[0]), 
                                node_features=(left_features, right_features), edge_features=edge_features)
        return self.output_module(torch.cat([self.post_conv_module(output), right_features], dim=-1))

    def message(self, node_features_i, node_features_j, edge_features):
        output = self.feature_module_final(self.feature_module_left(node_features_i) 
                                           + self.feature_module_edge(edge_features) 
                                           + self.feature_module_right(node_features_j))
        return output
    

policy = GNNPolicy().to(DEVICE)

With this model we can predict a probability distribution over actions as follows.

In [10]:
observation = train_data[0].to(DEVICE)

logits = policy(observation.constraint_features, observation.edge_index, observation.edge_attr, observation.variable_features)
action_distribution = F.softmax(logits[observation.candidates], dim=-1)

print(action_distribution)

tensor([0.0105, 0.0104, 0.0105, 0.0104, 0.0104, 0.0105, 0.0105, 0.0105, 0.0104,
        0.0105, 0.0105, 0.0104, 0.0105, 0.0104, 0.0104, 0.0104, 0.0105, 0.0104,
        0.0104, 0.0104, 0.0104, 0.0105, 0.0104, 0.0104, 0.0104, 0.0105, 0.0104,
        0.0105, 0.0105, 0.0105, 0.0104, 0.0104, 0.0105, 0.0104, 0.0105, 0.0105,
        0.0105, 0.0104, 0.0104, 0.0105, 0.0105, 0.0105, 0.0105, 0.0104, 0.0104,
        0.0104, 0.0104, 0.0104, 0.0105, 0.0104, 0.0105, 0.0105, 0.0105, 0.0105,
        0.0105, 0.0104, 0.0104, 0.0105, 0.0104, 0.0104, 0.0104, 0.0105, 0.0104,
        0.0104, 0.0104, 0.0104, 0.0104, 0.0105, 0.0104, 0.0105, 0.0104, 0.0105,
        0.0104, 0.0105, 0.0104, 0.0105, 0.0103, 0.0104, 0.0104, 0.0104, 0.0104,
        0.0104, 0.0105, 0.0104, 0.0104, 0.0104, 0.0104, 0.0104, 0.0104, 0.0104,
        0.0104, 0.0105, 0.0104, 0.0105, 0.0104, 0.0105], device='cuda:0',
       grad_fn=<SoftmaxBackward>)


As can be seen, with randomly initialized weights, the initial distributions tend to be close to uniform.
Next, we will define two helper functions: one to train or evaluate the model on a whole epoch and compute metrics for monitoring, and one for padding tensors when doing predictions on a batch of graphs of potentially different number of variables.

In [11]:
def process(policy, data_loader, optimizer=None):
    mean_loss = 0
    mean_kacc = np.zeros(len([1, 3, 5, 10]))

    n_samples_processed = 0
    with torch.set_grad_enabled(optimizer is not None):
        for batch in data_loader:
            batch = batch.to(DEVICE)
            logits = policy(batch.constraint_features, batch.edge_index, batch.edge_attr, batch.variable_features)
            logits = pad_tensor(logits[batch.candidates], batch.nb_candidates)
            loss = F.cross_entropy(logits, batch.candidate_choices)

            if optimizer is not None:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

            true_scores = pad_tensor(batch.candidate_scores, batch.nb_candidates)
            true_bestscore = true_scores.max(dim=-1, keepdims=True).values

            kacc = []
            for k in [1, 3, 5, 10]:
                if logits.size()[-1] < k:
                    kacc.append(1.0)
                    continue
                pred_top_k = logits.topk(k).indices
                pred_top_k_true_scores = true_scores.gather(-1, pred_top_k)
                accuracy = (pred_top_k_true_scores == true_bestscore).any(dim=-1).float().mean().item()
                kacc.append(accuracy)
            kacc = np.asarray(kacc)

            mean_loss += loss.item() * batch.num_graphs
            mean_kacc += kacc * batch.num_graphs
            n_samples_processed += batch.num_graphs

    mean_loss /= n_samples_processed
    mean_kacc /= n_samples_processed
    return mean_loss, mean_kacc


def pad_tensor(input_, pad_sizes, pad_value=-1e8):
    max_pad_size = pad_sizes.max()
    output = input_.split(pad_sizes.cpu().numpy().tolist())
    output = torch.stack([F.pad(slice_, (0, max_pad_size-slice_.size(0)), 'constant', pad_value)
                          for slice_ in output], dim=0)
    return output

After this, we can actually create the model and train it.

In [12]:
optimizer = torch.optim.Adam(policy.parameters(), lr=LEARNING_RATE)
best_loss = np.inf
for epoch in range(MAX_EPOCHS + 1):
    train_loss, train_kacc = process(policy, train_loader, optimizer)
    print(f"TRAIN LOSS: {train_loss:0.3f} " + "".join([f" acc@{k}: {acc:0.3f}" for k, acc in zip([1, 3, 5, 10], train_kacc)]))

    valid_loss, valid_kacc = process(policy, valid_loader, None)
    print(f"VALID LOSS: {valid_loss:0.3f} " + "".join([f" acc@{k}: {acc:0.3f}" for k, acc in zip([1, 3, 5, 10], valid_kacc)]))

    if valid_loss < best_loss:
        plateau_count = 0
        best_loss = valid_loss
        torch.save(policy.state_dict(), 'trained_params.pkl')
        print(f"  best model so far")
    else:
        plateau_count += 1
        if plateau_count % EARLY_STOPPING == 0:
            print(f"  {plateau_count} epochs without improvement, early stopping")
            break
        if plateau_count % PATIENCE == 0:
            optimizer.param_groups[0]['lr'] *= 0.2
            print(f"  {plateau_count} epochs without improvement, decreasing learning rate to {optimizer.param_groups[0]['lr']}")

        
policy.load_state_dict(torch.load('trained_params.pkl'))
valid_loss, valid_kacc = process(policy, valid_loader, None)
print(f"BEST VALID LOSS: {valid_loss:0.3f} " + "".join([f" acc@{k}: {acc:0.3f}" for k, acc in zip([1, 3, 5, 10], valid_kacc)]))

TRAIN LOSS: 4.288  acc@1: 0.265 acc@3: 0.424 acc@5: 0.495 acc@10: 0.620
VALID LOSS: 3.957  acc@1: 0.285 acc@3: 0.485 acc@5: 0.560 acc@10: 0.645
  best model so far
TRAIN LOSS: 3.701  acc@1: 0.411 acc@3: 0.569 acc@5: 0.635 acc@10: 0.728
VALID LOSS: 3.524  acc@1: 0.490 acc@3: 0.615 acc@5: 0.700 acc@10: 0.810
  best model so far
TRAIN LOSS: 3.483  acc@1: 0.472 acc@3: 0.615 acc@5: 0.691 acc@10: 0.814
VALID LOSS: 3.581  acc@1: 0.455 acc@3: 0.580 acc@5: 0.660 acc@10: 0.805
TRAIN LOSS: 3.486  acc@1: 0.456 acc@3: 0.619 acc@5: 0.699 acc@10: 0.810
VALID LOSS: 3.496  acc@1: 0.485 acc@3: 0.590 acc@5: 0.690 acc@10: 0.800
  best model so far
TRAIN LOSS: 3.465  acc@1: 0.465 acc@3: 0.626 acc@5: 0.703 acc@10: 0.820
VALID LOSS: 3.520  acc@1: 0.465 acc@3: 0.620 acc@5: 0.695 acc@10: 0.795
TRAIN LOSS: 3.456  acc@1: 0.466 acc@3: 0.625 acc@5: 0.698 acc@10: 0.810
VALID LOSS: 3.452  acc@1: 0.485 acc@3: 0.620 acc@5: 0.690 acc@10: 0.820
  best model so far
TRAIN LOSS: 3.465  acc@1: 0.474 acc@3: 0.623 acc@5: 0.69

# 3 Evaluation

Finally, we can evaluate the performance of the model. We first define appropriate environments. For benchmarking purposes, we include a trivial environment that merely runs SCIP.

In [13]:
instances = ecole.instance.SetCoverGenerator(n_rows=500, n_cols=1000, density=0.05)
scip_parameters = {'separating/maxrounds': 0, 'presolving/maxrestarts': 0, 'limits/time': 3600}
env = ecole.environment.Branching(observation_function=ecole.observation.NodeBipartite(), 
                                  information_function={"nb_nodes": ecole.reward.NNodes(), "time": ecole.reward.SolvingTime()}, 
                                  scip_params=scip_parameters)
default_env = ecole.environment.Configuring(observation_function=None,
                                            information_function={"nb_nodes": ecole.reward.NNodes(), "time": ecole.reward.SolvingTime()}, 
                                            scip_params=scip_parameters)

Then we can simply follow the environments, taking steps appropriately according to the GNN policy.

In [14]:
for instance_count, instance in zip(range(20), instances):
    nb_nodes, time = 0, 0

    observation, action_set, _, done, info = env.reset(instance)
    while not done:
        with torch.no_grad():
            observation = (torch.from_numpy(observation.row_features.astype(np.float32)).to(DEVICE),
                           torch.from_numpy(observation.edge_features.indices.astype(np.int64)).to(DEVICE), 
                           torch.from_numpy(observation.edge_features.values.astype(np.float32)).view(-1, 1).to(DEVICE),
                           torch.from_numpy(observation.column_features.astype(np.float32)).to(DEVICE))
            logits = policy(*observation)
            action = action_set[logits[action_set.astype(np.int64)].argmax()]
            observation, action_set, _, done, info = env.step(action)
        nb_nodes += info['nb_nodes']
        time += info['time']

    # Run SCIP's default brancher
    default_env.reset(instance.copy_orig())
    _, _, _, _, default_info = default_env.step({})

    print(f"Instance {instance_count: >3} | SCIP nb nodes {int(default_info['nb_nodes']): >4d} "
          f"| GNN nb nodes {int(nb_nodes): >4d} | Gain {100*(1-nb_nodes/default_info['nb_nodes']): >8.2f}%")
    print(f"             | SCIP time   {default_info['time']: >6.2f} "
          f"| GNN time   {time: >6.2f} | Gain {100*(1-time/default_info['time']): >8.2f}%")

Instance   0 | SCIP nb nodes  956 | GNN nb nodes 1294 | Gain   -35.36%
             | SCIP time    10.27 | GNN time     6.40 | Gain    37.68%
Instance   1 | SCIP nb nodes   88 | GNN nb nodes  308 | Gain  -250.00%
             | SCIP time     5.64 | GNN time     1.74 | Gain    69.15%
Instance   2 | SCIP nb nodes   37 | GNN nb nodes  270 | Gain  -629.73%
             | SCIP time     5.17 | GNN time     1.34 | Gain    74.08%
Instance   3 | SCIP nb nodes  145 | GNN nb nodes  268 | Gain   -84.83%
             | SCIP time     6.55 | GNN time     1.55 | Gain    76.34%
Instance   4 | SCIP nb nodes   34 | GNN nb nodes  248 | Gain  -629.41%
             | SCIP time     4.30 | GNN time     1.14 | Gain    73.49%
Instance   5 | SCIP nb nodes   17 | GNN nb nodes  110 | Gain  -547.06%
             | SCIP time     4.39 | GNN time     0.59 | Gain    86.56%
Instance   6 | SCIP nb nodes   69 | GNN nb nodes  196 | Gain  -184.06%
             | SCIP time     5.22 | GNN time     1.18 | Gain    77.39%
Instan

We can also evaluate on instances larger and harder than those trained on, say with 600 rather than 500 contraints.
In addition, we showcase that the cumulative number of nodes and time required to solve an instance can also be computed directly using the `.cumsum()` method.

In [15]:
instances = ecole.instance.SetCoverGenerator(n_rows=600, n_cols=1000, density=0.05)
scip_parameters = {'separating/maxrounds': 0, 'presolving/maxrestarts': 0, 'limits/time': 3600}
env = ecole.environment.Branching(observation_function=ecole.observation.NodeBipartite(), 
                                  information_function={"nb_nodes": ecole.reward.NNodes().cumsum(), 
                                                        "time": ecole.reward.SolvingTime().cumsum()}, 
                                  scip_params=scip_parameters)
default_env = ecole.environment.Configuring(observation_function=None,
                                            information_function={"nb_nodes": ecole.reward.NNodes().cumsum(), 
                                                                  "time": ecole.reward.SolvingTime().cumsum()}, 
                                            scip_params=scip_parameters)

for instance_count, instance in zip(range(20), instances):
    observation, action_set, _, done, info = env.reset(instance)
    while not done:
        with torch.no_grad():
            observation = (torch.from_numpy(observation.row_features.astype(np.float32)).to(DEVICE),
                           torch.from_numpy(observation.edge_features.indices.astype(np.int64)).to(DEVICE), 
                           torch.from_numpy(observation.edge_features.values.astype(np.float32)).view(-1, 1).to(DEVICE),
                           torch.from_numpy(observation.column_features.astype(np.float32)).to(DEVICE))
            logits = policy(*observation)
            action = action_set[logits[action_set.astype(np.int64)].argmax()]
            observation, action_set, _, done, info = env.step(action)
    nb_nodes = info['nb_nodes']
    time = info['time']

    # Run SCIP's default brancher
    default_env.reset(instance.copy_orig())
    _, _, _, _, default_info = default_env.step({})

    print(f"Instance {instance_count: >3} | SCIP nb nodes {int(default_info['nb_nodes']): >4d} "
          f"| GNN nb nodes {int(nb_nodes): >4d} | Gain {100*(1-nb_nodes/default_info['nb_nodes']): >8.2f}%")
    print(f"             | SCIP time   {default_info['time']: >6.2f} "
          f"| GNN time   {time: >6.2f} | Gain {100*(1-time/default_info['time']): >8.2f}%")

Instance   0 | SCIP nb nodes   93 | GNN nb nodes  502 | Gain  -439.78%
             | SCIP time     5.07 | GNN time     4.72 | Gain     6.90%
Instance   1 | SCIP nb nodes 1601 | GNN nb nodes 3109 | Gain   -94.19%
             | SCIP time    14.64 | GNN time    20.19 | Gain   -37.91%
Instance   2 | SCIP nb nodes  361 | GNN nb nodes  878 | Gain  -143.21%
             | SCIP time     9.74 | GNN time     8.84 | Gain     9.24%
Instance   3 | SCIP nb nodes    5 | GNN nb nodes   29 | Gain  -480.00%
             | SCIP time     4.29 | GNN time     3.07 | Gain    28.44%
Instance   4 | SCIP nb nodes  641 | GNN nb nodes 1616 | Gain  -152.11%
             | SCIP time    11.48 | GNN time    12.78 | Gain   -11.32%
Instance   5 | SCIP nb nodes   11 | GNN nb nodes  151 | Gain -1272.73%
             | SCIP time     5.29 | GNN time     4.03 | Gain    23.82%
Instance   6 | SCIP nb nodes    5 | GNN nb nodes   47 | Gain  -840.00%
             | SCIP time     3.29 | GNN time     2.63 | Gain    20.06%
Instan