# Branching with Imitation Learning and a GNN

In this tutorial we reproduce a simplified version of the paper of Gasse et al. (2019) on learning to branch, using Ecole together with `pytorch` and `pytorch geometric`. We collect strong branching examples on randomly generated set covering instances, then train a graph neural network with bipartite state encodings to imitate the expert by classification. Finally, we evaluate the quality of the resulting policy.

The biggest difference with Gasse et al. (2019) is that only n=1,000 training examples of expert decisions are collected for training, to keep the time needed to run the tutorial reasonable. As a consequence, the resulting policy is undertrained and is not competitive with SCIP's default branching rule.

Users who are interested in reproducing competitive performance should use a larger sample size, such as the n=100,000 samples used for training in the paper. In that case, we strongly recommend parallelizing data collection, as in the original Gasse et al. (2019) code.

### Requirements
This tutorial requires the following libraries. The version numbers used when writing this tutorial are given in parentheses.
- `python` (3.8.2)
- `numpy` (1.19.4)
- `pytorch` (1.7.0)
- `pytorch-geometric` (1.6.2)
- `ecole` (0.6.0)

The tutorial was designed with the provided version numbers.

## 1. Data collection

Our first step will be to run explore-then-strong-branch on randomly generated set covering instances, and save the branching decisions to build a dataset. We will also record the state of the branch-and-bound process as a bipartite graph, an observation that is already implemented in Ecole with the same features as Gasse et al. (2019).

In [1]:
import gzip
import pickle
import numpy as np
import ecole
from pathlib import Path

MAX_SAMPLES = 1000

We will use the Ecole-provided set cover instance generator.

In [2]:
instances = ecole.instance.SetCoverGenerator(n_rows=500, n_cols=1000, density=0.05)

The explore-then-strong-branch scheme described in the paper is not implemented by default in Ecole. In this scheme, to diversify the states in which we collect examples of strong branching behavior, we mostly follow a weak but cheap expert (pseudocost branching) and only occasionally call the strong expert (strong branching). This also ensures that samples are closer to being independent and identically distributed.

This can be realized in Ecole by creating a custom observation function that randomly computes and returns either the pseudocost scores (cheap) or the strong branching scores (expensive). It also showcases Ecole's extensibility: a custom observation function can be written and used directly in Python.

In [3]:
class ExploreThenStrongBranch:
    """
    This custom observation function class will randomly return either strong branching scores (expensive expert) 
    or pseudocost scores (weak expert for exploration) when called at every node.
    """
    def __init__(self, expert_probability):
        self.expert_probability = expert_probability
        self.pseudocosts_function = ecole.observation.Pseudocosts()
        self.strong_branching_function = ecole.observation.StrongBranchingScores()
    
    def before_reset(self, model):
        """
        This function will be called at initialization of the environment (before dynamics are reset).
        """
        self.pseudocosts_function.before_reset(model)
        self.strong_branching_function.before_reset(model)
    
    def extract(self, model, done):
        """
        Should we return strong branching or pseudocost scores at this node?
        """
        probabilities = [1-self.expert_probability, self.expert_probability]
        expert_chosen = bool(np.random.choice(np.arange(2), p=probabilities))
        if expert_chosen:
            return (self.strong_branching_function.extract(model, done), True)
        else:
            return (self.pseudocosts_function.extract(model, done), False)
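
To get a feel for how often the strong expert is consulted, here is a minimal sketch, independent of Ecole, that simulates the coin flip made in `extract`. The function name `count_expert_calls` and the node count are illustrative assumptions and are not part of the tutorial's code.

```python
import random

def count_expert_calls(n_nodes, expert_probability, seed=0):
    """Simulate the per-node coin flip and count how often the expert is chosen."""
    rng = random.Random(seed)
    return sum(rng.random() < expert_probability for _ in range(n_nodes))

n_nodes = 10_000  # hypothetical number of visited nodes
calls = count_expert_calls(n_nodes, expert_probability=0.05)
print(f"expert consulted at {calls} of {n_nodes} nodes ({calls / n_nodes:.1%})")
```

With a 5% expert probability, only about one node in twenty yields a training sample, which is why many episodes are needed to collect the dataset.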

We can now create the environment with the correct parameters (no restarts, 1h time limit, 5% expert sampling probability).

Besides the (pseudocost or strong branching) scores, our environment will return the node bipartite graph representation of 
branch-and-bound states used in Gasse et al. (2019), using the `ecole.observation.NodeBipartite` observation function.
On one side of that bipartite graph, nodes represent the variables of the problem, with a vector encoding features of 
that variable. On the other side of the bipartite graph, nodes represent the constraints of the problem, similarly with 
a vector encoding features of that constraint. An edge links a variable and a constraint node if the variable participates 
in that constraint, that is, its coefficient is nonzero in that constraint. The constraint coefficient is attached as an
attribute of the edge.
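
As a rough illustration of this encoding, the sketch below builds the edge list of a tiny, made-up constraint matrix with NumPy. The matrix `A` and the variable names are illustrative assumptions. Ecole computes the actual observation internally, with richer node features than shown here.

```python
import numpy as np

# Hypothetical constraint matrix: 2 constraints (rows) x 3 variables (columns)
A = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 4.0]])

# One edge per nonzero coefficient:
# row 0 holds constraint indices, row 1 holds variable indices
cons_idx, var_idx = np.nonzero(A)
edge_indices = np.stack([cons_idx, var_idx])
edge_values = A[cons_idx, var_idx]  # coefficient attached to each edge

print(edge_indices)
print(edge_values)
```

The constraint-first, variable-second layout of `edge_indices` mirrors the one the training code in this tutorial relies on.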

In [4]:
# We can pass custom SCIP parameters easily
scip_parameters = {'separating/maxrounds': 0, 'presolving/maxrestarts': 0, 'limits/time': 3600}

# Note how we can tuple observation functions to return complex state information
env = ecole.environment.Branching(observation_function=(ExploreThenStrongBranch(expert_probability=0.05), 
                                                        ecole.observation.NodeBipartite()), 
                                  scip_params=scip_parameters)

# This will seed the environment for reproducibility
env.seed(0)

Now we loop over the instances, following the strong branching expert 5% of the time and saving its decision, until enough samples are collected.

In [5]:
episode_counter, sample_counter = 0, 0
Path('samples/').mkdir(exist_ok=True)

# We will solve problems (run episodes) until we have saved enough samples
max_samples_reached = False
while not max_samples_reached:
    episode_counter += 1
    
    observation, action_set, _, done, _ = env.reset(next(instances))
    while not done:
        (scores, scores_are_expert), node_observation = observation
        action = action_set[scores[action_set].argmax()]

        # Only save samples if they are coming from the expert (strong branching)
        if scores_are_expert and not max_samples_reached:
            sample_counter += 1
            data = [node_observation, action, action_set, scores]
            filename = f'samples/sample_{sample_counter}.pkl'

            with gzip.open(filename, 'wb') as f:
                pickle.dump(data, f)
            
            # If we collected enough samples, we finish the current episode but stop saving samples
            if sample_counter == MAX_SAMPLES:
                max_samples_reached = True

        observation, action_set, _, done, _ = env.step(action)
    
    print(f"Episode {episode_counter}, {sample_counter} samples collected so far")

Episode 1, 2 samples collected so far
Episode 2, 2 samples collected so far
Episode 3, 2 samples collected so far
Episode 4, 7 samples collected so far
Episode 5, 7 samples collected so far
Episode 6, 7 samples collected so far
Episode 7, 7 samples collected so far
Episode 8, 8 samples collected so far
Episode 9, 8 samples collected so far
Episode 10, 8 samples collected so far
Episode 11, 8 samples collected so far
Episode 12, 9 samples collected so far
Episode 13, 9 samples collected so far
Episode 14, 10 samples collected so far
Episode 15, 13 samples collected so far
Episode 16, 13 samples collected so far
Episode 17, 16 samples collected so far
Episode 18, 24 samples collected so far
Episode 19, 24 samples collected so far
Episode 20, 57 samples collected so far
Episode 21, 59 samples collected so far
Episode 22, 59 samples collected so far
Episode 23, 60 samples collected so far
Episode 24, 60 samples collected so far
Episode 25, 60 samples collected so far
Episode 26, 60 samples
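
The action selection line in the loop, `action_set[scores[action_set].argmax()]`, can look cryptic at first. The sketch below replays it on made-up data. The score values are hypothetical, and marking non-candidates with NaN is an assumption made for illustration. What matters is that only candidate entries are indexed.

```python
import numpy as np

# Hypothetical scores over 6 variables; NaN marks variables assumed not to be candidates
scores = np.array([np.nan, 0.2, np.nan, 0.9, 0.5, np.nan])
action_set = np.array([1, 3, 4])  # indices of the candidate variables

best_position = scores[action_set].argmax()  # position within the candidate list
action = action_set[best_position]           # index of the variable to branch on

print(best_position, action)
```

The `argmax` is taken over the candidate list, so its result has to be mapped back through `action_set` to obtain a variable index.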

## 2. Train a GNN

Our next step is to train a GNN classifier on these collected samples to predict similar choices to strong branching.

In [6]:
import torch
import torch.nn.functional as F
import torch_geometric

LEARNING_RATE = 0.001
NB_EPOCHS = 50
PATIENCE = 10
EARLY_STOPPING = 20
DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

We will first define pytorch geometric data classes to handle the bipartite graph data.

In [7]:
class BipartiteNodeData(torch_geometric.data.Data):
    """
    This class encodes a node bipartite graph observation as returned by the `ecole.observation.NodeBipartite` 
    observation function in a format understood by the pytorch geometric data handlers.
    """
    def __init__(self, constraint_features, edge_indices, edge_features, variable_features,
                 candidates, candidate_choice, candidate_scores):
        super().__init__()
        self.constraint_features = torch.FloatTensor(constraint_features)
        self.edge_index = torch.LongTensor(edge_indices.astype(np.int64))
        self.edge_attr = torch.FloatTensor(edge_features).unsqueeze(1)
        self.variable_features = torch.FloatTensor(variable_features)
        self.candidates = candidates
        self.nb_candidates = len(candidates)
        self.candidate_choices = candidate_choice
        self.candidate_scores = candidate_scores

    def __inc__(self, key, value):
        """
        We overload the pytorch geometric method that tells how to increment indices when concatenating graphs 
        for those entries (edge index, candidates) for which this is not obvious.
        """
        if key == 'edge_index':
            return torch.tensor([[self.constraint_features.size(0)], [self.variable_features.size(0)]])
        elif key == 'candidates':
            return self.variable_features.size(0)
        else:
            return super().__inc__(key, value)


class GraphDataset(torch_geometric.data.Dataset):
    """
    This class encodes a collection of graphs, as well as a method to load such graphs from the disk.
    It can be used in turn by the data loaders provided by pytorch geometric.
    """
    def __init__(self, sample_files):
        super().__init__(root=None, transform=None, pre_transform=None)
        self.sample_files = sample_files

    def len(self):
        return len(self.sample_files)

    def get(self, index):
        """
        This method loads a node bipartite graph observation as saved on the disk during data collection.
        """
        with gzip.open(self.sample_files[index], 'rb') as f:
            sample = pickle.load(f)

        sample_observation, sample_action, sample_action_set, sample_scores = sample
        
        # We note on which variables we were allowed to branch, the scores as well as the choice 
        # taken by strong branching (relative to the candidates)
        candidates = torch.LongTensor(np.array(sample_action_set, dtype=np.int32))
        candidate_scores = torch.FloatTensor([sample_scores[j] for j in candidates])
        candidate_choice = torch.where(candidates == sample_action)[0][0]

        graph = BipartiteNodeData(sample_observation.row_features, sample_observation.edge_features.indices, 
                                  sample_observation.edge_features.values, sample_observation.column_features,
                                  candidates, candidate_choice, candidate_scores)
        
        # We must tell pytorch geometric how many nodes there are, for indexing purposes
        graph.num_nodes = sample_observation.row_features.shape[0]+sample_observation.column_features.shape[0]
        
        return graph
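
To see why `__inc__` returns a pair of offsets for `edge_index`, consider what happens when graphs are concatenated into a batch: constraint indices must be shifted by the number of constraints seen so far, and variable indices by the number of variables seen so far. The NumPy sketch below mimics that bookkeeping on two made-up graphs. The helper `concatenate_edges` is a hypothetical illustration and not how pytorch geometric is implemented.

```python
import numpy as np

# Two hypothetical graphs: (n_constraints, n_variables, edge_index)
g1 = (2, 3, np.array([[0, 1], [2, 0]]))
g2 = (1, 2, np.array([[0, 0], [0, 1]]))

def concatenate_edges(graphs):
    """Shift each graph's edge indices by the node counts of the graphs before it."""
    offset = np.zeros((2, 1), dtype=np.int64)  # (constraint offset, variable offset)
    shifted = []
    for n_cons, n_vars, edge_index in graphs:
        shifted.append(edge_index + offset)
        offset = offset + np.array([[n_cons], [n_vars]])
    return np.concatenate(shifted, axis=1)

batched = concatenate_edges([g1, g2])
print(batched)
```

The edges of the second graph end up pointing at constraint 2 and variables 3 and 4, which are its own nodes in the concatenated numbering.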

We can then prepare the data loaders.

In [8]:
sample_files = [str(path) for path in Path('samples/').glob('sample_*.pkl')]
train_files = sample_files[:int(0.8*len(sample_files))]
valid_files = sample_files[int(0.8*len(sample_files)):]

train_data = GraphDataset(train_files)
train_loader = torch_geometric.data.DataLoader(train_data, batch_size=32, shuffle=True)
valid_data = GraphDataset(valid_files)
valid_loader = torch_geometric.data.DataLoader(valid_data, batch_size=128, shuffle=False)
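
`Path.glob` does not guarantee any particular order, so the split above may differ from run to run. One possible way to make it reproducible, sketched below with hypothetical file names, is to sort the paths and shuffle them with a fixed seed before splitting. The helper `split_files` is an illustration and not part of the tutorial.

```python
import random

def split_files(files, train_fraction=0.8, seed=0):
    """Sort for a stable starting order, shuffle with a fixed seed, then split."""
    files = sorted(files)
    random.Random(seed).shuffle(files)
    cut = int(train_fraction * len(files))
    return files[:cut], files[cut:]

# Hypothetical file names standing in for the collected samples
files = [f"samples/sample_{i}.pkl" for i in range(1, 11)]
train_files, valid_files = split_files(files)
print(len(train_files), len(valid_files))
```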

Next, we will define our graph neural network architecture.

In [9]:
class GNNPolicy(torch.nn.Module):
    def __init__(self):
        super().__init__()
        emb_size = 64
        cons_nfeats = 5
        edge_nfeats = 1
        var_nfeats = 19

        # CONSTRAINT EMBEDDING
        self.cons_embedding = torch.nn.Sequential(
            torch.nn.LayerNorm(cons_nfeats),
            torch.nn.Linear(cons_nfeats, emb_size),
            torch.nn.ReLU(),
            torch.nn.Linear(emb_size, emb_size),
            torch.nn.ReLU(),
        )

        # EDGE EMBEDDING
        self.edge_embedding = torch.nn.Sequential(
            torch.nn.LayerNorm(edge_nfeats),
        )

        # VARIABLE EMBEDDING
        self.var_embedding = torch.nn.Sequential(
            torch.nn.LayerNorm(var_nfeats),
            torch.nn.Linear(var_nfeats, emb_size),
            torch.nn.ReLU(),
            torch.nn.Linear(emb_size, emb_size),
            torch.nn.ReLU(),
        )

        self.conv_v_to_c = BipartiteGraphConvolution()
        self.conv_c_to_v = BipartiteGraphConvolution()

        self.output_module = torch.nn.Sequential(
            torch.nn.Linear(emb_size, emb_size),
            torch.nn.ReLU(),
            torch.nn.Linear(emb_size, 1, bias=False),
        )

    def forward(self, constraint_features, edge_indices, edge_features, variable_features):
        reversed_edge_indices = torch.stack([edge_indices[1], edge_indices[0]], dim=0)
        
        # First step: linear embedding layers to a common dimension (64)
        constraint_features = self.cons_embedding(constraint_features)
        edge_features = self.edge_embedding(edge_features)
        variable_features = self.var_embedding(variable_features)

        # Two half convolutions
        constraint_features = self.conv_v_to_c(variable_features, reversed_edge_indices, edge_features, constraint_features)
        variable_features = self.conv_c_to_v(constraint_features, edge_indices, edge_features, variable_features)

        # A final MLP on the variable features
        output = self.output_module(variable_features).squeeze(-1)
        return output
    

class BipartiteGraphConvolution(torch_geometric.nn.MessagePassing):
    """
    The bipartite graph convolution is already provided by pytorch geometric and we merely need 
    to provide the exact form of the messages being passed.
    """
    def __init__(self):
        super().__init__('add')
        emb_size = 64
        
        self.feature_module_left = torch.nn.Sequential(
            torch.nn.Linear(emb_size, emb_size)
        )
        self.feature_module_edge = torch.nn.Sequential(
            torch.nn.Linear(1, emb_size, bias=False)
        )
        self.feature_module_right = torch.nn.Sequential(
            torch.nn.Linear(emb_size, emb_size, bias=False)
        )
        self.feature_module_final = torch.nn.Sequential(
            torch.nn.LayerNorm(emb_size),
            torch.nn.ReLU(),
            torch.nn.Linear(emb_size, emb_size)
        )
        
        self.post_conv_module = torch.nn.Sequential(
            torch.nn.LayerNorm(emb_size)
        )

        # output_layers
        self.output_module = torch.nn.Sequential(
            torch.nn.Linear(2*emb_size, emb_size),
            torch.nn.ReLU(),
            torch.nn.Linear(emb_size, emb_size),
        )

    def forward(self, left_features, edge_indices, edge_features, right_features):
        """
        This method sends the messages, computed in the message method.
        """
        output = self.propagate(edge_indices, size=(left_features.shape[0], right_features.shape[0]), 
                                node_features=(left_features, right_features), edge_features=edge_features)
        return self.output_module(torch.cat([self.post_conv_module(output), right_features], dim=-1))

    def message(self, node_features_i, node_features_j, edge_features):
        output = self.feature_module_final(self.feature_module_left(node_features_i) 
                                           + self.feature_module_edge(edge_features) 
                                           + self.feature_module_right(node_features_j))
        return output
    

policy = GNNPolicy().to(DEVICE)
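
The core of each half convolution is the `'add'` aggregation: every target node sums the messages arriving along its edges. The NumPy sketch below shows only that aggregation step on a made-up graph. It is a deliberate simplification: the learned linear layers and normalizations of `BipartiteGraphConvolution` are omitted, and a message is assumed to be the source features scaled by the edge coefficient.

```python
import numpy as np

def half_convolution(source_features, edge_index, edge_values, n_targets):
    """Sum, for each target node, the messages sent along its incoming edges.

    Simplification: a message is the source node's feature vector scaled by
    the edge coefficient; the tutorial's learned layers are omitted.
    """
    src, dst = edge_index
    messages = source_features[src] * edge_values[:, None]
    aggregated = np.zeros((n_targets, source_features.shape[1]))
    np.add.at(aggregated, dst, messages)  # unbuffered add handles repeated targets
    return aggregated

# Hypothetical graph: 3 variables (sources) sending to 2 constraints (targets)
variable_features = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
edge_index = np.array([[0, 2, 1, 2],   # source (variable) of each edge
                       [0, 0, 1, 1]])  # target (constraint) of each edge
edge_values = np.array([1.0, 2.0, 3.0, 4.0])

aggregated = half_convolution(variable_features, edge_index, edge_values, n_targets=2)
print(aggregated)
```

Here the sources are variables and the targets are constraints, which corresponds to the first half convolution. The second one runs the same operation in the opposite direction.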

With this model we can predict a probability distribution over actions as follows.

In [10]:
observation = train_data[0].to(DEVICE)

logits = policy(observation.constraint_features, observation.edge_index, observation.edge_attr, observation.variable_features)
action_distribution = F.softmax(logits[observation.candidates], dim=-1)

print(action_distribution)

tensor([0.0159, 0.0159, 0.0159, 0.0158, 0.0158, 0.0159, 0.0159, 0.0159, 0.0159,
        0.0159, 0.0159, 0.0159, 0.0158, 0.0158, 0.0159, 0.0158, 0.0159, 0.0159,
        0.0159, 0.0159, 0.0159, 0.0159, 0.0159, 0.0159, 0.0159, 0.0159, 0.0159,
        0.0158, 0.0159, 0.0159, 0.0159, 0.0158, 0.0159, 0.0159, 0.0159, 0.0159,
        0.0158, 0.0158, 0.0159, 0.0159, 0.0158, 0.0159, 0.0159, 0.0158, 0.0159,
        0.0159, 0.0159, 0.0159, 0.0159, 0.0158, 0.0158, 0.0159, 0.0159, 0.0159,
        0.0159, 0.0159, 0.0159, 0.0159, 0.0159, 0.0159, 0.0158, 0.0159, 0.0159],
       grad_fn=<SoftmaxBackward>)
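
The printed distribution has 63 entries, all close to 1/63 ≈ 0.0159, which is what a softmax yields when the candidate logits are nearly equal. The NumPy sketch below reproduces this effect with made-up near-zero logits standing in for the output of an untrained network. The logit values are an assumption for illustration.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D array."""
    shifted = logits - logits.max()
    exp = np.exp(shifted)
    return exp / exp.sum()

n_candidates = 63  # number of entries in the printed distribution
rng = np.random.default_rng(0)
logits = 0.001 * rng.standard_normal(n_candidates)  # hypothetical near-zero logits

probabilities = softmax(logits)
print(round(1 / n_candidates, 4))  # 0.0159
print(probabilities.min(), probabilities.max())
```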


As can be seen, with randomly initialized weights, the initial distributions tend to be close to uniform.
Next, we will define two helper functions: one to train or evaluate the model over a whole epoch and compute metrics for monitoring, and one to pad tensors when making predictions on a batch of graphs with potentially different numbers of variables.

In [11]:
def process(policy, data_loader, optimizer=None):
    """
    This function will process a whole epoch of training or validation, depending on whether an optimizer is provided.
    """
    mean_loss = 0
    mean_acc = 0

    n_samples_processed = 0
    with torch.set_grad_enabled(optimizer is not None):
        for batch in data_loader:
            batch = batch.to(DEVICE)
            # Compute the logits (i.e. pre-softmax activations) according to the policy on the concatenated graphs
            logits = policy(batch.constraint_features, batch.edge_index, batch.edge_attr, batch.variable_features)
            # Index the results by the candidates, and split and pad them
            logits = pad_tensor(logits[batch.candidates], batch.nb_candidates)
            # Compute the usual cross-entropy classification loss
            loss = F.cross_entropy(logits, batch.candidate_choices)

            if optimizer is not None:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

            true_scores = pad_tensor(batch.candidate_scores, batch.nb_candidates)
            true_bestscore = true_scores.max(dim=-1, keepdims=True).values
            
            predicted_bestindex = logits.max(dim=-1, keepdims=True).indices
            accuracy = (true_scores.gather(-1, predicted_bestindex) == true_bestscore).float().mean().item()

            mean_loss += loss.item() * batch.num_graphs
            mean_acc += accuracy * batch.num_graphs
            n_samples_processed += batch.num_graphs

    mean_loss /= n_samples_processed
    mean_acc /= n_samples_processed
    return mean_loss, mean_acc


def pad_tensor(input_, pad_sizes, pad_value=-1e8):
    """
    This utility function splits a tensor and pads each split to make them all the same size, then stacks them.
    """
    max_pad_size = pad_sizes.max()
    output = input_.split(pad_sizes.cpu().numpy().tolist())
    output = torch.stack([F.pad(slice_, (0, max_pad_size-slice_.size(0)), 'constant', pad_value)
                          for slice_ in output], dim=0)
    return output
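
The padding value of `-1e8` is chosen so that padded slots are effectively ignored by the softmax inside the cross-entropy loss. The NumPy sketch below illustrates the split-and-pad idea and its effect on made-up logits. The helper `pad_rows` is a hypothetical stand-in for `pad_tensor`, not a replacement for it.

```python
import numpy as np

def pad_rows(values, sizes, pad_value=-1e8):
    """Split a flat array into chunks of the given sizes and pad them to equal length."""
    chunks = np.split(values, np.cumsum(sizes)[:-1])
    width = max(sizes)
    return np.stack([np.pad(c, (0, width - len(c)), constant_values=pad_value)
                     for c in chunks])

# Hypothetical logits for two graphs with 3 and 2 candidates respectively
flat_logits = np.array([0.5, 1.0, -0.5, 2.0, 0.0])
padded = pad_rows(flat_logits, sizes=[3, 2])

# Row-wise softmax: the padded slot receives (numerically) zero probability
exp = np.exp(padded - padded.max(axis=1, keepdims=True))
probabilities = exp / exp.sum(axis=1, keepdims=True)
print(padded.shape, probabilities[1, 2])
```

Because the padded entry is so negative, its exponential underflows to zero and the remaining probabilities in that row still sum to one.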

After this, we can create the optimizer and train the model.

In [12]:
optimizer = torch.optim.Adam(policy.parameters(), lr=LEARNING_RATE)
for epoch in range(NB_EPOCHS):
    print(f"Epoch {epoch+1}")
    
    train_loss, train_acc = process(policy, train_loader, optimizer)
    print(f"Train loss: {train_loss:0.3f}, accuracy {train_acc:0.3f}" )

    valid_loss, valid_acc = process(policy, valid_loader, None)
    print(f"Valid loss: {valid_loss:0.3f}, accuracy {valid_acc:0.3f}" )

torch.save(policy.state_dict(), 'trained_params.pkl')

Epoch 1
Train loss: 3.963, accuracy 0.367
Valid loss: 3.555, accuracy 0.425
Epoch 2
Train loss: 3.429, accuracy 0.485
Valid loss: 3.412, accuracy 0.440
Epoch 3
Train loss: 3.373, accuracy 0.490
Valid loss: 3.439, accuracy 0.445
Epoch 4
Train loss: 3.392, accuracy 0.492
Valid loss: 3.376, accuracy 0.460
Epoch 5
Train loss: 3.370, accuracy 0.490
Valid loss: 3.435, accuracy 0.425
Epoch 6
Train loss: 3.351, accuracy 0.496
Valid loss: 3.375, accuracy 0.425
Epoch 7
Train loss: 3.332, accuracy 0.487
Valid loss: 3.446, accuracy 0.445
Epoch 8
Train loss: 3.319, accuracy 0.509
Valid loss: 3.419, accuracy 0.430
Epoch 9
Train loss: 3.336, accuracy 0.501
Valid loss: 3.427, accuracy 0.485
Epoch 10
Train loss: 3.336, accuracy 0.499
Valid loss: 3.469, accuracy 0.425
Epoch 11
Train loss: 3.359, accuracy 0.494
Valid loss: 3.381, accuracy 0.430
Epoch 12
Train loss: 3.332, accuracy 0.501
Valid loss: 3.453, accuracy 0.455
Epoch 13
Train loss: 3.315, accuracy 0.500
Valid loss: 3.400, accuracy 0.485
Epoch 14
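
The accuracy reported in this log is computed in `process` by checking whether the candidate selected by the policy attains the best strong branching score, so a prediction that hits any of several tied top candidates counts as correct. The NumPy sketch below replays that rule on made-up scores and logits. The function name `tie_aware_accuracy` is a hypothetical label for illustration.

```python
import numpy as np

def tie_aware_accuracy(true_scores, logits):
    """Fraction of rows where the predicted candidate attains the best expert score."""
    predicted = logits.argmax(axis=1)
    chosen_score = true_scores[np.arange(len(true_scores)), predicted]
    return float((chosen_score == true_scores.max(axis=1)).mean())

# Hypothetical expert scores and policy logits for two samples with 3 candidates each
true_scores = np.array([[0.9, 0.9, 0.1],   # candidates 0 and 1 are tied for best
                        [0.2, 0.7, 0.4]])
logits = np.array([[0.1, 2.0, 0.3],        # picks candidate 1: counts as correct
                   [1.5, 0.2, 0.1]])       # picks candidate 0: incorrect

accuracy = tie_aware_accuracy(true_scores, logits)
print(accuracy)
```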

## 3. Evaluation

Finally, we can evaluate the performance of the model. We first define appropriate environments. For benchmarking purposes, we include a trivial environment that merely runs SCIP.

In [13]:
scip_parameters = {'separating/maxrounds': 0, 'presolving/maxrestarts': 0, 'limits/time': 3600}
env = ecole.environment.Branching(observation_function=ecole.observation.NodeBipartite(), 
                                  information_function={"nb_nodes": ecole.reward.NNodes(), 
                                                        "time": ecole.reward.SolvingTime()}, 
                                  scip_params=scip_parameters)
default_env = ecole.environment.Configuring(observation_function=None,
                                            information_function={"nb_nodes": ecole.reward.NNodes(), 
                                                                  "time": ecole.reward.SolvingTime()}, 
                                            scip_params=scip_parameters)

Then we can simply follow the environments, taking steps appropriately according to the GNN policy.

In [14]:
instances = ecole.instance.SetCoverGenerator(n_rows=500, n_cols=1000, density=0.05)
for instance_count, instance in zip(range(20), instances):
    # Run the GNN brancher
    nb_nodes, time = 0, 0
    observation, action_set, _, done, info = env.reset(instance)
    nb_nodes += info['nb_nodes']
    time += info['time']
    while not done:
        with torch.no_grad():
            observation = (torch.from_numpy(observation.row_features.astype(np.float32)).to(DEVICE),
                           torch.from_numpy(observation.edge_features.indices.astype(np.int64)).to(DEVICE), 
                           torch.from_numpy(observation.edge_features.values.astype(np.float32)).view(-1, 1).to(DEVICE),
                           torch.from_numpy(observation.column_features.astype(np.float32)).to(DEVICE))
            logits = policy(*observation)
            action = action_set[logits[action_set.astype(np.int64)].argmax()]
            observation, action_set, _, done, info = env.step(action)
        nb_nodes += info['nb_nodes']
        time += info['time']

    # Run SCIP's default brancher
    default_env.reset(instance)
    _, _, _, _, default_info = default_env.step({})
    
    print(f"Instance {instance_count: >3} | SCIP nb nodes    {int(default_info['nb_nodes']): >4d}  | SCIP time   {default_info['time']: >6.2f} ")
    print(f"             | GNN  nb nodes    {int(nb_nodes): >4d}  | GNN  time   {time: >6.2f} ")
    print(f"             | Gain         {100*(1-nb_nodes/default_info['nb_nodes']): >8.2f}% | Gain      {100*(1-time/default_info['time']): >8.2f}%")

Instance   0 | SCIP nb nodes       3  | SCIP time     2.42 
             | GNN  nb nodes      47  | GNN  time     3.23 
             | Gain         -1466.67% | Gain        -33.69%
Instance   1 | SCIP nb nodes      13  | SCIP time     4.20 
             | GNN  nb nodes     145  | GNN  time     7.29 
             | Gain         -1015.38% | Gain        -73.39%
Instance   2 | SCIP nb nodes       1  | SCIP time     0.37 
             | GNN  nb nodes       1  | GNN  time     0.32 
             | Gain             0.00% | Gain         15.18%
Instance   3 | SCIP nb nodes      17  | SCIP time     2.41 
             | GNN  nb nodes     123  | GNN  time     4.58 
             | Gain          -623.53% | Gain        -89.74%
Instance   4 | SCIP nb nodes      11  | SCIP time     2.06 
             | GNN  nb nodes      91  | GNN  time     3.33 
             | Gain          -727.27% | Gain        -61.42%
Instance   5 | SCIP nb nodes       1  | SCIP time     1.16 
             | GNN  nb nodes       9  | 
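
The "Gain" column is the relative improvement over SCIP's default brancher, so a negative value means the GNN policy did worse. As a worked check, the small helper below (a hypothetical name, introduced for illustration) recomputes the node gain from the counts printed for instance 0. The time gains cannot be reproduced exactly this way, since the log shows times rounded to two decimals.

```python
def gain_percent(policy_value, baseline_value):
    """Relative improvement of the policy over the baseline, in percent."""
    return 100 * (1 - policy_value / baseline_value)

# Node counts printed for instance 0: 47 nodes for the GNN, 3 for SCIP
print(f"{gain_percent(47, 3):.2f}%")  # -1466.67%
```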

We can also evaluate on instances larger and harder than those trained on, say with 600 rather than 500 constraints.
In addition, we showcase that the cumulative number of nodes and time required to solve an instance can also be computed directly using the `.cumsum()` method.

In [15]:
instances = ecole.instance.SetCoverGenerator(n_rows=600, n_cols=1000, density=0.05)
scip_parameters = {'separating/maxrounds': 0, 'presolving/maxrestarts': 0, 'limits/time': 3600}
env = ecole.environment.Branching(observation_function=ecole.observation.NodeBipartite(), 
                                  information_function={"nb_nodes": ecole.reward.NNodes().cumsum(), 
                                                        "time": ecole.reward.SolvingTime().cumsum()}, 
                                  scip_params=scip_parameters)
default_env = ecole.environment.Configuring(observation_function=None,
                                            information_function={"nb_nodes": ecole.reward.NNodes().cumsum(), 
                                                                  "time": ecole.reward.SolvingTime().cumsum()}, 
                                            scip_params=scip_parameters)

for instance_count, instance in zip(range(20), instances):
    # Run the GNN brancher
    observation, action_set, _, done, info = env.reset(instance)
    while not done:
        with torch.no_grad():
            observation = (torch.from_numpy(observation.row_features.astype(np.float32)).to(DEVICE),
                           torch.from_numpy(observation.edge_features.indices.astype(np.int64)).to(DEVICE), 
                           torch.from_numpy(observation.edge_features.values.astype(np.float32)).view(-1, 1).to(DEVICE),
                           torch.from_numpy(observation.column_features.astype(np.float32)).to(DEVICE))
            logits = policy(*observation)
            action = action_set[logits[action_set.astype(np.int64)].argmax()]
            observation, action_set, _, done, info = env.step(action)
    nb_nodes = info['nb_nodes']
    time = info['time']

    # Run SCIP's default brancher
    default_env.reset(instance)
    _, _, _, _, default_info = default_env.step({})

    print(f"Instance {instance_count: >3} | SCIP nb nodes    {int(default_info['nb_nodes']): >4d}  | SCIP time   {default_info['time']: >6.2f} ")
    print(f"             | GNN  nb nodes    {int(nb_nodes): >4d}  | GNN  time   {time: >6.2f} ")
    print(f"             | Gain         {100*(1-nb_nodes/default_info['nb_nodes']): >8.2f}% | Gain      {100*(1-time/default_info['time']): >8.2f}%")

Instance   0 | SCIP nb nodes       7  | SCIP time     3.10 
             | GNN  nb nodes      79  | GNN  time     4.27 
             | Gain         -1028.57% | Gain        -37.85%
Instance   1 | SCIP nb nodes       7  | SCIP time     2.72 
             | GNN  nb nodes      59  | GNN  time     3.26 
             | Gain          -742.86% | Gain        -19.87%
Instance   2 | SCIP nb nodes       3  | SCIP time     2.75 
             | GNN  nb nodes      17  | GNN  time     2.35 
             | Gain          -466.67% | Gain         14.46%
Instance   3 | SCIP nb nodes       9  | SCIP time     2.67 
             | GNN  nb nodes      89  | GNN  time     4.16 
             | Gain          -888.89% | Gain        -55.91%
Instance   4 | SCIP nb nodes      19  | SCIP time     3.64 
             | GNN  nb nodes     115  | GNN  time     5.58 
             | Gain          -505.26% | Gain        -53.45%
Instance   5 | SCIP nb nodes      23  | SCIP time     3.39 
             | GNN  nb nodes      83  | 
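
The difference between this loop and the earlier one is only where the accumulation happens: previously the per-step values were summed by hand, whereas here the information function already reports running totals, so the last value read is the total. The plain Python sketch below illustrates the equivalence on made-up per-step values. It does not call Ecole, and the numbers are hypothetical.

```python
from itertools import accumulate

# Hypothetical per-step node counts, as a non-cumulative reward would report them
per_step_nodes = [1, 2, 2, 4]

# Manual accumulation, as in the earlier evaluation loop
total = 0
for value in per_step_nodes:
    total += value

# Running totals, which is what a cumulative reward reports at each step
running_totals = list(accumulate(per_step_nodes))
print(total, running_totals[-1])
```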

### References

Gasse, M., Chételat, D., Ferroni, N., Charlin, L. and Lodi, A. (2019). Exact combinatorial optimization with graph convolutional neural networks. In Advances in Neural Information Processing Systems (pp. 15580-15592).