<a href="https://colab.research.google.com/github/ClaireZixiWang/learn2cut/blob/main/example.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## See the README.md file for further details about the project and the environment.

### State-Action Description

#### State
State s is an array with five components:

* s[0]: constraint matrix $A$ of the current LP ($\max -c^Tx \text{ s.t. } Ax \le b$). Its dimension is $m \times n$; check it by printing s[0].shape. Here $n$ is the (fixed) number of variables: for the 60-by-60 instances used in this project, $n$ stays fixed at 60. $m$ is the current number of constraints. Initially, $m$ equals the number of constraints in the IP instance (for instances generated with --num-c=60, $m$ is 60 at the first step), but $m$ increases by one at every step of the episode, because taking an action adds one new constraint (cut).
* s[1]: rhs $b$ of the current LP ($Ax \le b$). Its dimension is $m$, the number of rows of $A$.
* s[2]: coefficient vector $c$ of the LP objective ($-c^Tx$). Its dimension equals the number of variables, $n$.
* s[3], s[4]: Gomory cuts available in the current round of Gomory's cutting-plane algorithm. Each cut $i$ has the form $D_i x \le d_i$: s[3] gives the cut matrix $D$ (dimension $k \times n$) and s[4] gives the rhs $d$ (dimension $k$). The number of available cuts $k$ changes from round to round; you can find it by printing the size of the last component of the state, i.e., s[4].size or s[-1].size.
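A quick way to confirm the component shapes listed above is to unpack a state and check them against each other. The helper `describe_state` below is a hypothetical name, and it assumes `b` and `d` come back as 1-D arrays (if the environment returns column vectors, the assertions would need adjusting). The demo uses random arrays of the 60-by-60 size as a stand-in for a real environment state; with the real environment you would pass `env.reset()` instead.

```python
import numpy as np

def describe_state(s):
    """Unpack a state (A, b, c, D, d) and check that the shapes agree; returns (m, n, k)."""
    A, b, c, D, d = s
    m, n = A.shape  # m: current number of constraints, n: fixed number of variables
    k = d.size      # number of Gomory cuts (actions) available this round
    assert b.shape == (m,), "rhs b should have one entry per constraint"
    assert c.shape == (n,), "objective c should have one entry per variable"
    assert D.shape == (k, n), "each cut row D_i should have n coefficients"
    return m, n, k

# Random stand-in for an initial state of a 60x60 instance with 59 candidate cuts
rng = np.random.default_rng(0)
s = (rng.random((60, 60)), rng.random(60), rng.random(60),
     rng.random((59, 60)), rng.random(59))
print(describe_state(s))  # (60, 60, 59)
```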

#### Actions
There are $k$ = s[4].size actions available in each state $s$; the $i$-th action corresponds to the $i$-th cut, $D_i x \le d_i$, in s[3], s[4].
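Conceptually, taking action $i$ appends the cut $D_i x \le d_i$ as a new row of the LP, which is why $m$ grows by one per step. The sketch below shows only that bookkeeping with NumPy; `apply_cut` is a hypothetical helper, and the real environment additionally re-solves the LP and generates a fresh set of cuts, which is not modeled here.

```python
import numpy as np

def apply_cut(A, b, D, d, i):
    """Append cut i (row D[i], rhs d[i]) to the constraint system Ax <= b."""
    A_new = np.vstack([A, D[i]])   # (m + 1, n): the cut becomes the last row
    b_new = np.append(b, d[i])     # (m + 1,)
    return A_new, b_new

# Random stand-in for a 60x60 LP with 59 candidate cuts
rng = np.random.default_rng(1)
A, b = rng.random((60, 60)), rng.random(60)
D, d = rng.random((59, 60)), rng.random(59)
A2, b2 = apply_cut(A, b, D, d, i=54)
print(A2.shape, b2.shape)  # (61, 60) (61,)
```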

### ***QUESTIONS***:
1. By "current" LP, do you mean the LP the agent was solving in the last state, i.e., the LP with all the added constraints?
  * ==> I think so.
2. What is meant by Gomory cuts being *available*? As in, after running the simplex method, the *variables* that you can choose to cut on?
  * Yes, I think so.
3. Isn't the number of variables ($n$) changing in the C-G cutting-plane method?
  * No. As the spec says, **$n$ is the fixed number of variables**.
  * If you look at the cutting-plane lecture notes, you can see that after each step, the dummy variable is not added to the constraint. Those variables are only there for the sake of the LP solver (simplex method) and are not really relevant for us.
    * This is not correct. I think they are still very much relevant; it's just that, among the 60 variables, I think many are placeholders for dummy variables, so that $n$ stays fixed and we don't have to worry about using an LSTM. Each row [a, b] then always has size n+1, so we can use a fixed-input-size network.
    * But I still need to verify this placeholder interpretation with the TA.
4. What are the dimensions of s[3] and s[4]? Where are all the "available" cuts stored, and along which dimension?
  * Each row of $D$ is an available cut: each $D_i x \le d_i$ is a C-G cut obtained from the simplex method. So $D$ = s[3] is $k \times n$ and $d$ = s[4] has $k$ entries, one per cut.
5. Pointing toward the slides: why does the number of constraints $m$ increase by 1 in each step if you can choose *multiple* cuts in one step? (Or does the algorithm choose only one cut at a time? Is that a more vanilla version to start with, to be extended to multiple cuts per step later?)
6. What is meant by each "instance"?
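The fixed-input-size argument about rows of size n+1 can be illustrated directly: since every constraint row $[A_i, b_i]$ and every cut row $[D_j, d_j]$ has length $n+1$, a network whose weights act row by row can handle any $m$ and $k$. The sketch below is one possible design (attention-style scoring with random, untrained weights), not the project's model; `cut_probabilities`, `W_con`, `W_cut`, and the embedding size 16 are all hypothetical.

```python
import numpy as np

def cut_probabilities(A, b, D, d, W_con, W_cut):
    """Return a probability for each of the k candidate cuts.

    W_con and W_cut have shape (n + 1, h), so they do not depend on m or k.
    """
    cons = np.hstack([A, b[:, None]])   # (m, n + 1): rows [A_i, b_i]
    cuts = np.hstack([D, d[:, None]])   # (k, n + 1): rows [D_j, d_j]
    g = cons @ W_con                    # (m, h) constraint embeddings
    h = cuts @ W_cut                    # (k, h) cut embeddings
    scores = (h @ g.T).mean(axis=1)     # (k,) mean similarity of each cut to the constraints
    scores = scores - scores.max()      # stabilize the softmax
    p = np.exp(scores)
    return p / p.sum()

rng = np.random.default_rng(0)
n, hdim = 60, 16
W_con = rng.normal(size=(n + 1, hdim))
W_cut = rng.normal(size=(n + 1, hdim))

# The same weights work as m grows and k changes between steps
for m, k in [(60, 59), (61, 63)]:
    p = cut_probabilities(rng.random((m, n)), rng.random(m),
                          rng.random((k, n)), rng.random(k), W_con, W_cut)
    a = rng.choice(k, p=p)              # sampled cut index, like the random action in the loop
    print(m, k, p.shape, a)
```

A design like this answers the LSTM concern only if the per-row representation is meaningful; whether the 60 variables include padding for dummy variables is still the open question above.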

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
%cd drive/MyDrive/IEOR_RL/Project_learn2cut
%pwd

/content/drive/MyDrive/IEOR_RL/Project_learn2cut


'/content/drive/MyDrive/IEOR_RL/Project_learn2cut'

In [None]:
!pwd

/content/drive/MyDrive/IEOR_RL/Project_learn2cut


In [None]:
!pip install -i https://pypi.gurobi.com gurobipy

Looking in indexes: https://pypi.gurobi.com
Collecting gurobipy
  Downloading gurobipy-9.5.1-cp37-cp37m-manylinux2014_x86_64.whl (11.5 MB)
     |████████████████████████████████| 11.5 MB 5.9 MB/s
Installing collected packages: gurobipy
Successfully installed gurobipy-9.5.1


In [None]:
!pip install wandb -qqq

     |████████████████████████████████| 1.8 MB 7.4 MB/s
     |████████████████████████████████| 144 kB 46.0 MB/s
     |████████████████████████████████| 181 kB 66.0 MB/s
     |████████████████████████████████| 63 kB 1.6 MB/s
  Building wheel for pathtools (setup.py) ... done


In [None]:
# MLP model for the policy:
#   model.forward
#   model.train --> What is in this function? What are the function arguments?
# Q-value model
#   Can I just use the one in Lab4? What does it mean? What do the states and actions mean? --> Print out s and r to check
#   What is a Q-value in our setup?
#   How do I use this?
#   (What's the baseline function?)
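One way to make the Q-value and baseline questions above concrete, assuming a REINFORCE-style policy-gradient learner (an assumption, not something the project specifies): the Monte Carlo estimate of $Q(s_t, a_t)$ is the discounted return-to-go $G_t = \sum_{t' \ge t} \gamma^{t'-t} r_{t'}$ built from the per-step rewards `r` returned by `env.step`, and a baseline is any quantity that does not depend on the action and is subtracted from $G_t$ to reduce variance. The sketch below uses the simplest choice, the mean return over a batch of episodes; `returns_to_go`, `advantages`, and `gamma` are hypothetical names, and the reward sequences are made up for illustration.

```python
import numpy as np

def returns_to_go(rewards, gamma=0.99):
    """Discounted return G_t = r_t + gamma * G_{t+1}, computed backwards over one episode."""
    G = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        G[t] = running
    return G

def advantages(episode_rewards, gamma=0.99):
    """Subtract a constant baseline (mean return over all steps in the batch) from each G_t."""
    all_G = [returns_to_go(r, gamma) for r in episode_rewards]
    baseline = np.mean(np.concatenate(all_G))
    return [G - baseline for G in all_G]

# Two short, made-up reward sequences
adv = advantages([[1.0, 1.0, 1.0], [0.5, 0.0]], gamma=0.5)
print(adv)
```

A learned state-value network $V(s)$ could replace the constant mean as a state-dependent baseline; the policy-gradient update would then weight $\nabla \log \pi(a_t \mid s_t)$ by these advantages.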

In [None]:
import gymenv_v2
from gymenv_v2 import make_multiple_env
import numpy as np


import wandb
wandb.login()
run=wandb.init(project="finalproject", entity="ieor4575-spring2022", tags=["training-easy"])
#run=wandb.init(project="finalproject", entity="ieor-4575", tags=["training-hard"])
#run=wandb.init(project="finalproject", entity="ieor-4575", tags=["test"])

### TRAINING

# Setup: You may generate your own instances on which you train the cutting agent.
custom_config = {
    "load_dir"        : 'instances/randomip_n60_m60',   # this is the location of the randomly generated instances (you may specify a different directory)
    "idx_list"        : list(range(20)),                # take the first 20 instances from the directory
    "timelimit"       : 50,                             # the maximum horizon length is 50
    "reward_type"     : 'obj'                           # DO NOT CHANGE reward_type
}

# Easy Setup: Use the following environment settings. We will evaluate your agent with the same easy config below:
easy_config = {
    "load_dir"        : 'instances/train_10_n60_m60',
    "idx_list"        : list(range(10)),
    "timelimit"       : 50,
    "reward_type"     : 'obj'
}

# Hard Setup: Use the following environment settings. We will evaluate your agent with the same hard config below:
hard_config = {
    "load_dir"        : 'instances/train_100_n60_m60',
    "idx_list"        : list(range(99)),
    "timelimit"       : 50,
    "reward_type"     : 'obj'
}

if __name__ == "__main__":
    # create env
    env = make_multiple_env(**easy_config) 

    for e in range(20):
        # gym loop
        s = env.reset()   # samples a random instance every time env.reset() is called
        d = False
        t = 0
        repisode = 0

        while not d:
            #Take a random action
            a = np.random.randint(0, s[-1].size, 1)            # s[-1].size shows the number of actions, i.e., cuts available at state s
            
            #simulate the environment to get the next state
            s, r, d, _ = env.step(list(a))
            print('episode', e, 'step', t, 'reward', r, 'action space size', s[-1].size, 'action', a[0])
            
            A, b, c0, cuts_a, cuts_b = s
            #print(A.shape, b.shape, c0.shape, cuts_a.shape, cuts_b.shape)

            t += 1
            repisode += r

            #wandb logging
            wandb.log({"Training reward (easy config)" : repisode})
            #make sure to use the correct tag in wandb.init in the initialization on top




[34m[1mwandb[0m: You can find your API key in your browser here: https://wandb.ai/authorize


wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit: ··········


[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc
[34m[1mwandb[0m: Currently logged in as: [33mclaire-zixi-wang[0m ([33mieor4575-spring2022[0m). Use [1m`wandb login --relogin`[0m to force relogin


loading training instances, dir instances/train_10_n60_m60 idx 0
loading training instances, dir instances/train_10_n60_m60 idx 1
loading training instances, dir instances/train_10_n60_m60 idx 2
loading training instances, dir instances/train_10_n60_m60 idx 3
loading training instances, dir instances/train_10_n60_m60 idx 4
loading training instances, dir instances/train_10_n60_m60 idx 5
loading training instances, dir instances/train_10_n60_m60 idx 6
loading training instances, dir instances/train_10_n60_m60 idx 7
loading training instances, dir instances/train_10_n60_m60 idx 8
loading training instances, dir instances/train_10_n60_m60 idx 9
Restricted license - for non-production use only - expires 2023-10-25
episode 0 step 0 reward 0.04913616336898485 action space size 59 action 54
episode 0 step 1 reward 0.009015209293920634 action space size 63 action 45
episode 0 step 2 reward 0.018220236171600845 action space size 62 action 47
episode 0 step 3 reward 0.0009171547822006687 action 