In [1]:
import sympy as sp

In [2]:
from deepgroebner.pg import PPOAgent
from deepgroebner.networks import ParallelMultilayerPerceptron
from deepgroebner.ideals import FixedIdealGenerator
from deepgroebner.buchberger import BuchbergerEnv, LeadMonomialsWrapper

First create the list of polynomials using SymPy, and then construct a `BuchbergerEnv` which always starts with the given list.

In [3]:
R, x, y, z = sp.ring('x,y,z', sp.FF(32003), 'grevlex')
F = [x + y + z, x*y + y*z + x*z, x*y*z - 1]
ideal_gen = FixedIdealGenerator(F)
env = BuchbergerEnv(ideal_gen)

You can play with the environment yourself using the `reset` and `step` methods. The `step` method takes in a choice of one of the available pairs, and returns a tuple of `(next_state, reward, done, info)`. This follows the interface of [OpenAI Gym](https://gym.openai.com/).

In [4]:
env.reset()

([x + y + z, x*y + x*z + y*z, x*y*z + 32002 mod 32003], {(0, 1), (0, 2)})

In [5]:
env.step((0, 1))

(([x + y + z, x*y + x*z + y*z, x*y*z + 32002 mod 32003, y**2 + y*z + z**2],
  {(0, 2)}),
 -2,
 False,
 {})
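
The new generator `y**2 + y*z + z**2` can be reproduced by hand. The sketch below assumes the environment performs the standard Buchberger step for the chosen pair (form the S-polynomial, then reduce it modulo the current generators); the notebook itself does not state this. The names `f0`, `f1`, `f2`, `s`, and `r` are only for illustration.

```python
import sympy as sp

R, x, y, z = sp.ring('x,y,z', sp.FF(32003), 'grevlex')
f0, f1, f2 = x + y + z, x*y + y*z + x*z, x*y*z - 1

# Pair (0, 1): the lead monomials are x and x*y, so their lcm is x*y.
# Both lead coefficients are 1, so the S-polynomial is y*f0 - f1.
s = y*f0 - f1

# Reduce the S-polynomial modulo the current generators.
r = s.rem([f0, f1, f2])

# Compare with the polynomial appended to the list in the output above.
print(r == y**2 + y*z + z**2)
```
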

In [6]:
env.step((0, 2))

(([x + y + z,
   x*y + x*z + y*z,
   x*y*z + 32002 mod 32003,
   y**2 + y*z + z**2,
   z**3 + 32002 mod 32003],
  set()),
 -2,
 True,
 {})

Now find where you saved the weights of a previously trained model. Make sure `n` (the number of variables), `k` (the number of lead monomials visible), and `policy_hl` (the size of hidden layers in the network) are all the same as in training.

The current `run.py` script saves the weights in subdirectories of `data/runs/` that are named from the date and a hash of the parameters.

In [7]:
n = 3
k = 2
policy_hl = [128]

# replace with your saved policy weights file
filename = "../data/runs/run1/policy-50.h5"

With these numbers fixed, we can construct the agent and the wrapped environment that it acts on. The state of the wrapped environment is a matrix with one row per pair, and an action is a choice of row.

In [8]:
wrapped_env = LeadMonomialsWrapper(env, k=k)
network = ParallelMultilayerPerceptron(2 * k * n, policy_hl)
agent = PPOAgent(network)
agent.load_policy_weights(filename)

Use the environment's `reset` and `step` methods as above, together with the agent's `act` method, which returns the agent's choice of action, to step through a computation.

In [9]:
state = wrapped_env.reset()
state

array([[1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1],
       [1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0]])
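
Each row has `2 * k * n = 12` entries. Judging from the output above, a row appears to hold the exponent vectors of the first `k` monomials of each polynomial in the pair, laid out one after another; this layout is inferred from the printed state rather than taken from the library documentation. A minimal sketch that splits a row back into exponent vectors (`decode_row` is a hypothetical helper, not part of `deepgroebner`):

```python
def decode_row(row, n, k):
    """Split one state row into two lists of k exponent vectors of length n."""
    row = [int(v) for v in row]
    if len(row) != 2 * k * n:
        raise ValueError("expected a row of length 2 * k * n")
    # Cut the row into consecutive chunks of n exponents each.
    monomials = [tuple(row[i:i + n]) for i in range(0, len(row), n)]
    # The first k chunks belong to one polynomial, the last k to the other.
    return monomials[:k], monomials[k:]

first, second = decode_row([1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1], n=3, k=2)
print(first)   # [(1, 0, 0), (0, 1, 0)], the monomials x and y
print(second)  # [(1, 1, 0), (1, 0, 1)], the monomials x*y and x*z
```

Under this reading, the first row describes the pair `(x + y + z, x*y + x*z + y*z)`, and the trailing zeros in the second row are the exponent vector of the constant term of `x*y*z - 1`.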

In [10]:
action = agent.act(state)
action

1

In [11]:
state, reward, done, info = wrapped_env.step(action)
state, reward, done, info

(array([[1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1]]), -1, False, {})

You can also view the state of the original unwrapped environment.

In [12]:
wrapped_env.env.G, wrapped_env.env.P

([x + y + z,
  x*y + x*z + y*z,
  x*y*z + 32002 mod 32003,
  y**2*z + y*z**2 + 1 mod 32003],
 {(0, 1)})
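
The manual steps above can be collected into a loop that runs until the environment reports `done`. This is a minimal sketch that relies only on the `reset`, `step`, and `act` calls shown in this notebook; `run_episode` and `max_steps` are hypothetical names introduced here, not part of `deepgroebner`.

```python
def run_episode(env, agent, max_steps=1000):
    """Let the agent act until the episode ends; return (total reward, steps)."""
    state = env.reset()
    total_reward, steps, done = 0, 0, False
    while not done and steps < max_steps:
        action = agent.act(state)                   # agent picks a row (pair)
        state, reward, done, _ = env.step(action)   # Gym-style 4-tuple
        total_reward += reward
        steps += 1
    return total_reward, steps
```

With the objects built earlier, `run_episode(wrapped_env, agent)` would return the accumulated reward and the number of pairs processed. The `max_steps` cap is only a safeguard against a run that never terminates.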