If we pre-solve an instance before giving it to the agent and then, at each step, include the final solution's variable values in the variable features we give to the NN, does this at least improve the agent?

Ideally, we want to show that the agent can beat strong branching and/or Gasse et al. 2019 imitation_100k. By including the final solution's variable values in the observation features, the hope is that the agent can learn that, e.g., branching decisions which would bound a variable away from its optimal value are poor. This assumes that the agent can learn these dynamics.
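
To make the intuition above concrete, here is a minimal sketch of which child of a branching decision still contains a known solution value. The helper name `child_keeping_solution` and the tolerance are hypothetical, for illustration only; they are not part of retro_branching or ecole.

```python
import math

def child_keeping_solution(lp_value, sol_value, tol=1e-6):
    """Return which child ('down' or 'up') of branching on a variable with
    fractional LP value `lp_value` still contains `sol_value`.

    Hypothetical helper for illustration only.
    """
    down_ub = math.floor(lp_value)  # down child adds: x <= floor(lp_value)
    up_lb = math.ceil(lp_value)     # up child adds:   x >= ceil(lp_value)
    if sol_value <= down_ub + tol:
        return 'down'
    if sol_value >= up_lb - tol:
        return 'up'
    return None  # solution value is itself fractional w.r.t. this split

# binary variable with LP value 0.4 whose pre-solved value is 1:
# only the up child (x >= 1) keeps the pre-solved solution reachable
print(child_keeping_solution(0.4, 1.0))
```

The other child bounds the variable away from its labelled value, which is the kind of signal we hope the agent can pick up from the extra feature.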

 

In [None]:
%load_ext autoreload
%autoreload
from retro_branching.environments import EcoleBranching, EcoleConfiguring
from retro_branching.agents import StrongBranchingAgent, PseudocostBranchingAgent

import ecole
import numpy as np

First, let's check that solving an instance with EcoleConfiguring leads to the same result as using e.g. strong branching in our normal ecole branching environment. If it does, then it is valid to use EcoleConfiguring to pre-solve, which will be faster.

In [None]:
# agent
agent = PseudocostBranchingAgent() # PseudocostBranchingAgent() StrongBranchingAgent()

# envs
env = EcoleBranching(observation_function='default',
                      information_function='default',
                      reward_function='default',
                      scip_params='default')
env.seed(0)
configuring_env = EcoleConfiguring(observation_function='default',
                                  information_function='default',
                                  scip_params='default')
configuring_env.seed(0)

# instances
instances = ecole.instance.SetCoverGenerator(n_rows=500, n_cols=1000, density=0.05)

In [None]:
num_episodes = 10
for ep in range(num_episodes):
    print(f'> Episode {ep} <')
    
    # find an instance not pre-solved by environment
    obs = None
    while obs is None:
        env.seed(0)
        instance = next(instances)
        instance_before_reset = instance.copy_orig()
        agent.before_reset(instance_before_reset.copy_orig())
        obs, action_set, reward, done, info = env.reset(instance)
    configuring_env.seed(0)
    _, _, _, _, configuring_info = configuring_env.reset(instance_before_reset.copy_orig())
    
    # pre-solve instance with configuring env
    _, _, _, _, configuring_info = configuring_env.step({})
    
    # solve instance with standard branching env
    while not done:
        action, action_idx = agent.action_select(action_set, env.model, done)
        obs, action_set, reward, done, info = env.step(action)
    
    configuring_m = configuring_env.model.as_pyscipopt()
    config_vars = configuring_m.getVars()
    config_sol = configuring_m.getBestSol()
    config_dual, config_primal, config_gap = configuring_m.getDualbound(), configuring_m.getPrimalbound(), configuring_m.getGap()
    
    m = env.model.as_pyscipopt()
    branching_vars = m.getVars()
    branching_sol = m.getBestSol()
    branching_dual, branching_primal, branching_gap = m.getDualbound(), m.getPrimalbound(), m.getGap()
    
    print('Configuring env num_nodes: {} | dual: {} | primal: {} | gap: {}'.format(configuring_info['num_nodes'], config_dual, config_primal, config_gap))
    print('Branching env num_nodes: {} | dual: {} | primal: {} | gap: {}'.format(info['num_nodes'], branching_dual, branching_primal, branching_gap))
    
    for config_var, branching_var in zip(config_vars, branching_vars):
        config_val, branching_val = config_sol[config_var], branching_sol[branching_var]
        # compare with a tolerance, since solution values are floating-point numbers
        equal = np.isclose(config_val, branching_val)
        print('config/branching var {}/{}: configuring env={}, branching_env={} -> equal={}'.format(config_var, branching_var, config_val, branching_val, equal))
        if not equal:
            raise Exception('Diff vals found')

The study above found that, around 20-30% of the time, the final solution variable values differ for the same instance even though the optimality (i.e. the primal and dual bounds) is the same, so some problems can have several different optimal solutions. This is fine, but it does mean that certain branching decisions which our pre-solving labelling method would label as 'bad' may not actually be bad. However, we are only guiding the agent by labelling the obs with one possible solution, so this doesn't really matter.
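
As a toy illustration of how one instance can have several optimal solutions with the same objective value, the sketch below brute-forces a made-up 2-row, 3-column set cover problem. The matrix `A` and costs `c` are invented for this example and are unrelated to the generated instances above.

```python
import itertools
import numpy as np

# toy set cover (made up): rows are elements, columns are sets
A = np.array([[1, 0, 1],
              [0, 1, 1]])
c = np.array([1, 1, 2])  # cost of selecting each set

# enumerate all binary assignments and keep those covering every element
feasible = [np.array(x) for x in itertools.product([0, 1], repeat=3)
            if (A @ np.array(x) >= 1).all()]

# find the optimal cost and every assignment that attains it
best = int(min(c @ x for x in feasible))
optima = [tuple(int(v) for v in x) for x in feasible if c @ x == best]

print(f'optimal cost: {best} | optimal solutions: {optima}')
```

Here two different assignments reach the same optimal cost, so labelling the variables with either one marks the other's selections as "away from the optimum" even though both are equally good.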

On the whole, we can be confident that the EcoleConfiguring() environment can be used to pre-solve the problems.

In [None]:
num_episodes = 1
for ep in range(num_episodes):
    print(f'> Episode {ep} <')
    
    # find an instance not pre-solved by environment
    obs = None
    while obs is None:
        env.seed(0)
        instance = next(instances)
        instance_before_reset = instance.copy_orig()
        agent.before_reset(instance_before_reset.copy_orig())
        obs, action_set, reward, done, info = env.reset(instance)
    configuring_env.seed(0)
    _, _, _, _, _ = configuring_env.reset(instance_before_reset.copy_orig())
    
    # pre-solve instance with configuring env
    _, _, _, _, _ = configuring_env.step({})
    
    # label vars in obs with final solution values
    m = configuring_env.model.as_pyscipopt()
    solution = m.getBestSol()
    # one solution value per variable, in m.getVars() order (1D array, so no transpose is needed)
    sol_vals = np.array([solution[var] for var in m.getVars()])
    obs.column_features = np.column_stack((obs.column_features, sol_vals))
    
    

The above has been implemented in retro_branching. Sanity checking:

In [None]:
%autoreload
from retro_branching.environments import EcoleBranching, EcoleConfiguring

# normal env no labeling
env = EcoleBranching()
env.seed(0)
obs, action_set, reward, done, info = env.reset(instance_before_reset.copy_orig())

# env w/ labeling
env2 = EcoleBranching(observation_function='label_solution_values')
env2.seed(0)
obs2, action_set2, reward2, done2, info2 = env2.reset(instance_before_reset.copy_orig())

In [None]:
print(f'env w/o labeling: {obs.column_features.shape} | env w/ labeling: {obs2.column_features.shape}')

Check that assigning the env2 variable values would solve the env instance.

In [None]:
# get pre-solved solution objective value
presolve_solution = env2.observation_function.presolve_solution
print(f'Pre-solved solution objective value: {env2.observation_function.presolve_m.getSolObjVal(presolve_solution)}')

# check pre-solved solution objective value in normal env
m = env.model.as_pyscipopt()
print(f'Pre-solved solution objective value in normal env: {m.getSolObjVal(presolve_solution)}')

# # assign values to env solution and check that they solve the problem
# m = env.model.as_pyscipopt()
# print(f'dual/primal/gap: {m.getDualbound()} {m.getPrimalbound()} {m.getGap()}')
# sol = m.getBestSol()
# for var in m.getVars():
#     print(f'Setting variable {var} to {presolve_solution[var]}')
#     m.setSolVal(sol, var, presolve_solution[var])
# print(f'dual/primal/gap: {m.getDualbound()} {m.getPrimalbound()} {m.getGap()}')

As shown, the objective function values are the same, so we can be confident that we have correctly labelled the variables with their final values and that the ordering etc. has not changed.
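
The reasoning behind this check can be sketched with plain NumPy: if the solution values were attached to the wrong variables, the objective value would generally change. The cost vector, solution, and permutation below are made up for illustration. Note that the check is necessary rather than sufficient, since swapping values between variables with equal cost (or equal value) would not change the objective.

```python
import numpy as np

# made-up objective coefficients and a made-up binary solution
c = np.array([3.0, 5.0, 2.0, 7.0])
x = np.array([1.0, 0.0, 1.0, 0.0])

# objective value with the correct variable ordering
obj = float(c @ x)

# objective value if the solution values were attached to the wrong variables
perm = np.array([1, 0, 3, 2])
obj_permuted = float(c @ x[perm])

print(f'correct ordering: {obj} | permuted ordering: {obj_permuted}')
```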