## See README.md file for further details about the project and the environment.

### State-Action Description

### State
The state s is an array with five components:

* s[0]: constraint matrix $A$ of the current LP ($\max -c^Tx \text{ s.t. } Ax \le b$). Its dimension is $m \times n$; you can check it by printing s[0].shape. Here $n$ is the (fixed) number of variables. For instances of size 60 by 60 used in the above command, $n$ remains fixed at 60. $m$ is the current number of constraints. Initially, $m$ equals the number of constraints in the IP instance (for instances generated with --num-c=60, $m$ is 60 at the first step), but $m$ increases by one at every step of the episode, since each action adds one new constraint (cut).
* s[1]: rhs $b$ for the current LP ($Ax\le b$). Its dimension is $m$, the number of rows of $A$.
* s[2]: coefficient vector $c$ from the LP objective ($-c^Tx$). Dimension same as the number of variables, i.e., $n$.
* s[3],  s[4]: Gomory cuts available in the current round of Gomory's cutting plane algorithm. Each cut $i$ is of the form $D_i x\le d_i$. s[3] gives the matrix $D$ of cuts (of dimension $k \times n$) and s[4] gives the rhs $d$ (of dimension $k$). The number of available cuts $k$ changes from round to round; you can find it by printing the size of the last component of the state, i.e., s[4].size or s[-1].size.
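
The layout described above can be checked without loading the environment. The sketch below builds a synthetic state with the documented shapes and unpacks it. `make_mock_state` and `describe_state` are hypothetical helpers introduced only for illustration, and the random entries are placeholders, not real LP data.

```python
import numpy as np

def make_mock_state(m=60, n=60, k=5, seed=0):
    """Hypothetical helper: synthetic state with the documented layout."""
    rng = np.random.default_rng(seed)
    A = rng.integers(0, 10, size=(m, n)).astype(float)  # s[0]: constraint matrix, m x n
    b = rng.integers(10, 100, size=m).astype(float)     # s[1]: rhs of Ax <= b, length m
    c = rng.integers(1, 10, size=n).astype(float)       # s[2]: objective coefficients, length n
    D = rng.normal(size=(k, n))                         # s[3]: candidate cut matrix, k x n
    d = rng.normal(size=k)                              # s[4]: candidate cut rhs, length k
    return A, b, c, D, d

def describe_state(s):
    """Hypothetical helper: unpack a state and check that its shapes agree."""
    A, b, c, D, d = s
    m, n = A.shape
    k = d.size
    assert b.shape == (m,)    # one rhs entry per constraint
    assert c.shape == (n,)    # one objective coefficient per variable
    assert D.shape == (k, n)  # one row per available cut
    return {"m": m, "n": n, "k": k}

s = make_mock_state()
info = describe_state(s)
print(info)
```

Running the same `describe_state` on a state returned by the real environment is a quick way to confirm that $m$ grows by one per step while $n$ stays fixed.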

### Actions
There are k=s[4].size actions available in each state $s$, with the $i^{th}$ action corresponding to the $i^{th}$ cut, i.e., the inequality $D_i x\le d_i$ given by row $i$ of s[3] and entry $i$ of s[4].
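
As a minimal sketch of how an action maps to a cut, the snippet below samples a uniformly random action index and looks up the corresponding inequality. `random_action` is a hypothetical helper, and the all-zero state is a stand-in with $k=7$ cuts. Returning a one-element list mirrors the way the training loop in this notebook calls `env.step`; that the environment accepts a plain Python int inside that list is an assumption.

```python
import numpy as np

def random_action(s, rng):
    """Hypothetical helper: pick one of the k available cuts uniformly at random."""
    k = s[-1].size                    # number of available cuts = number of actions
    return [int(rng.integers(0, k))]  # one-element list holding the chosen index

rng = np.random.default_rng(0)

# Stand-in state with n = 60 variables, m = 60 constraints and k = 7 candidate cuts.
mock_state = (np.zeros((60, 60)), np.zeros(60), np.zeros(60),
              np.zeros((7, 60)), np.zeros(7))

a = random_action(mock_state, rng)

# Action i selects the cut D_i x <= d_i.
D, d = mock_state[3], mock_state[4]
cut_lhs, cut_rhs = D[a[0]], d[a[0]]
print("chosen action", a[0], "cut coefficients shape", cut_lhs.shape)
```

Because $k$ changes from state to state, the action index has to be drawn from the current state's `s[-1].size` at every step rather than from a fixed range.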

In [None]:
#Run below after copying the folder "Project_learn2cut" to your google drive

#You will need to allow google drive to mount

from google.colab import drive
drive.mount('/content/drive')
from google.colab import files

#IMPORTANT change below to 
#!cp -av /content/drive/<path>  /content/ 
#where <path> is the path to folder Project_learn2cut in your google drive. You can click on the folder icon on left and navigate to the path of this folder under drive/MyDrive to find the path.

!cp -av /content/drive/MyDrive/Colab\ Notebooks/IEOR\ 4575\ Spring\ 2022\ public/Project_learn2cut/* /content/

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
'/content/drive/MyDrive/Colab Notebooks/IEOR 4575 Spring 2022 public/Project_learn2cut/example.ipynb' -> '/content/example.ipynb'
'/content/drive/MyDrive/Colab Notebooks/IEOR 4575 Spring 2022 public/Project_learn2cut/example.py' -> '/content/example.py'
'/content/drive/MyDrive/Colab Notebooks/IEOR 4575 Spring 2022 public/Project_learn2cut/findgaps.py' -> '/content/findgaps.py'
'/content/drive/MyDrive/Colab Notebooks/IEOR 4575 Spring 2022 public/Project_learn2cut/generate_randomip.py' -> '/content/generate_randomip.py'
'/content/drive/MyDrive/Colab Notebooks/IEOR 4575 Spring 2022 public/Project_learn2cut/gurobiutils.py' -> '/content/gurobiutils.py'
'/content/drive/MyDrive/Colab Notebooks/IEOR 4575 Spring 2022 public/Project_learn2cut/gymenv.py' -> '/content/gymenv.py'
'/content/drive/MyDrive/Colab Notebooks/IEOR 4575 Spring 2022 public/Project_learn2cut/gymenv

In [None]:
!pip install -i https://pypi.gurobi.com gurobipy

Looking in indexes: https://pypi.gurobi.com


In [None]:
!pip install wandb -qqq

In [None]:
import gymenv_v2
from gymenv_v2 import make_multiple_env
import numpy as np


import wandb
wandb.login()
run=wandb.init(project="finalproject", entity="ieor4575-spring2022", tags=["training-easy"])
#run=wandb.init(project="finalproject", entity="ieor-4575", tags=["training-hard"])
#run=wandb.init(project="finalproject", entity="ieor-4575", tags=["test"])

### TRAINING

# Setup: You may generate your own instances on which you train the cutting agent.
custom_config = {
    "load_dir"        : 'instances/randomip_n60_m60',   # this is the location of the randomly generated instances (you may specify a different directory)
    "idx_list"        : list(range(20)),                # take the first 20 instances from the directory
    "timelimit"       : 50,                             # the maximum horizon length is 50
    "reward_type"     : 'obj'                           # DO NOT CHANGE reward_type
}

# Easy Setup: Use the following environment settings. We will evaluate your agent with the same easy config below:
easy_config = {
    "load_dir"        : 'instances/train_10_n60_m60',
    "idx_list"        : list(range(10)),
    "timelimit"       : 50,
    "reward_type"     : 'obj'
}

# Hard Setup: Use the following environment settings. We will evaluate your agent with the same hard config below:
hard_config = {
    "load_dir"        : 'instances/train_100_n60_m60',
    "idx_list"        : list(range(99)),
    "timelimit"       : 50,
    "reward_type"     : 'obj'
}

if __name__ == "__main__":
    # create env
    env = make_multiple_env(**easy_config) 

    for e in range(20):
        # gym loop
        s = env.reset()   # samples a random instance every time env.reset() is called
        d = False
        t = 0
        repisode = 0

        while not d:
            #Take a random action
            a = np.random.randint(0, s[-1].size, 1)            # s[-1].size shows the number of actions, i.e., cuts available at state s
            
            #simulate the environment to get the next state
            s, r, d, _ = env.step(list(a))
            print('episode', e, 'step', t, 'reward', r, 'action space size', s[-1].size, 'action', a[0])
            
            A, b, c0, cuts_a, cuts_b = s
            #print(A.shape, b.shape, c0.shape, cuts_a.shape, cuts_b.shape)

            t += 1
            repisode += r

            # wandb logging
            wandb.log({"Training reward (easy config)" : repisode})
            # make sure to use the correct tag in wandb.init in the initialization on top



wandb: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit: ··········
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc
wandb: Currently logged in as: agrawals (use `wandb login --relogin` to force relogin)


loading training instances, dir instances/train_10_n60_m60 idx 0
loading training instances, dir instances/train_10_n60_m60 idx 1
loading training instances, dir instances/train_10_n60_m60 idx 2
loading training instances, dir instances/train_10_n60_m60 idx 3
loading training instances, dir instances/train_10_n60_m60 idx 4
loading training instances, dir instances/train_10_n60_m60 idx 5
loading training instances, dir instances/train_10_n60_m60 idx 6
loading training instances, dir instances/train_10_n60_m60 idx 7
loading training instances, dir instances/train_10_n60_m60 idx 8
loading training instances, dir instances/train_10_n60_m60 idx 9
Restricted license - for non-production use only - expires 2023-10-25
episode 0 step 0 reward 0.004097137678400031 action space size 62 action 2
episode 0 step 1 reward 0.003368279641108529 action space size 63 action 5
episode 0 step 2 reward 0.010062164006740204 action space size 63 action 15
episode 0 step 3 reward 0.00034169465379818575 action 