License

```
Copyright (c) Facebook, Inc. and its affiliates.

This source code is licensed under the MIT license found in the
LICENSE file in the root directory of this source tree.
```

# Using CompilerGym environments with RLlib

In this notebook we will use [RLlib](https://docs.ray.io/en/master/rllib.html) to train an agent for CompilerGym's [LLVM environment](https://facebookresearch.github.io/CompilerGym/llvm/index.html). RLlib is a popular library for scalable reinforcement learning, built on [Ray](https://docs.ray.io/en/master/index.html). It provides distributed implementations of several standard reinforcement learning algorithms.

Our goal is not to produce the best agent, but to demonstrate how to integrate CompilerGym with RLlib. It will take about 20 minutes to work through. Let's get started!

## Installation

We'll begin by installing the `compiler_gym` and `ray` packages:

In [8]:
!pip install compiler_gym 'ray[default,rllib]' &>/dev/null || echo "Install failed!"

# Print the versions of the libraries that we are using:
import compiler_gym
import ray

print("compiler_gym version:", compiler_gym.__version__)
print("ray version:", ray.__version__)

compiler_gym version: 0.2.3
ray version: 1.9.0


## Defining an Environment

Next we will define the environment to use for our experiments. For the purposes of a simple demo we will use a custom `perf-v0` environment, backed by a custom compiler service (`compiler2_service`) and registered below with CompilerGym's `register()` function. We consider two simplifying constraints:

1. Using only a small subset of the command line flag action space. The `ConstrainedCommandline` wrapper can do this, but the `make_env()` cell below only imports it and does not apply it, so the agent sees the full action space.
2. Clipping the length of episodes to a maximum number of steps (five, via `TimeLimit`).

To keep things simple we will define a `make_env()` helper function to create our environment, and use the [compiler_gym.wrappers](https://facebookresearch.github.io/CompilerGym/compiler_gym/wrappers.html) API to implement these constraints. There is quite a lot going on in these cells, so be sure to read through the comments for an explanation!
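To make the episode-length constraint concrete, here is a simplified, pure-Python sketch of what a time-limit wrapper does. It is not CompilerGym's `TimeLimit` implementation: `CountingEnv` and `SimpleTimeLimit` are hypothetical names, and the sketch assumes the Gym-style `reset()`/`step()` API that returns `(observation, reward, done, info)`.

```python
from typing import Any, Dict, Tuple


class CountingEnv:
    """Hypothetical toy environment: each step returns reward 1.0 and the
    episode never ends on its own."""

    def __init__(self) -> None:
        self.steps = 0

    def reset(self) -> int:
        self.steps = 0
        return self.steps

    def step(self, action: int) -> Tuple[int, float, bool, Dict[str, Any]]:
        self.steps += 1
        return self.steps, 1.0, False, {}


class SimpleTimeLimit:
    """Simplified stand-in for a time-limit wrapper: forces done=True once
    max_episode_steps actions have been taken in the current episode."""

    def __init__(self, env: CountingEnv, max_episode_steps: int) -> None:
        self.env = env
        self.max_episode_steps = max_episode_steps
        self.elapsed_steps = 0

    def reset(self) -> int:
        self.elapsed_steps = 0  # a new episode restarts the step budget
        return self.env.reset()

    def step(self, action: int) -> Tuple[int, float, bool, Dict[str, Any]]:
        observation, reward, done, info = self.env.step(action)
        self.elapsed_steps += 1
        if self.elapsed_steps >= self.max_episode_steps:
            done = True  # end the episode regardless of the inner env
        return observation, reward, done, info


toy_env = SimpleTimeLimit(CountingEnv(), max_episode_steps=5)
toy_env.reset()
done, episode_reward, steps = False, 0.0, 0
while not done:
    _, reward, done, _ = toy_env.step(action=0)
    episode_reward += reward
    steps += 1
print("Episode length:", steps, "reward:", episode_reward)  # Episode length: 5 reward: 5.0
```

Wrapping rather than subclassing keeps the constraint independent of the underlying environment, which is why the same pattern composes with other wrappers.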

In [15]:
import sys
import os
os.environ.setdefault("COMPILER_GYM_USER_DATA", "/home/dx4/tools/compiler2/compiler2_service/benchmarks")



'/home/dx4/tools/compiler2/compiler2_service/benchmarks'

In [10]:
from compiler_gym.spaces import Reward
from compiler_gym.third_party import llvm
from compiler_gym.util.logging import init_logging
from compiler_gym.util.registration import register
from compiler_gym.util.runfiles_path import runfiles_path, site_data_path
from compiler_gym.service.connection import ServiceError
import compiler2_service.utils

In [11]:
from compiler_gym.envs.llvm.datasets import (
    AnghaBenchDataset,
    BlasDataset,
    CBenchDataset,
    CBenchLegacyDataset,
    CBenchLegacyDataset2,
    CHStoneDataset,
    CsmithDataset,
    NPBDataset,
)

from compiler2_service.agent_py.rewards import perf_reward, runtime_reward
from compiler2_service.agent_py.datasets import poj104_dataset, hpctoolkit_dataset


def register_env():
    register(
        id="perf-v0",
        entry_point="compiler_gym.envs:CompilerEnv",
        kwargs={
            "service": compiler2_service.utils.COMPILER2_SERVICE_PY,
            "rewards": [
                perf_reward.Reward(),
                runtime_reward.Reward(),
            ],
            "datasets": [
                hpctoolkit_dataset.Dataset(),
                CBenchDataset(site_data_path("llvm-v0")),
                CsmithDataset(site_data_path("llvm-v0"), sort_order=0),
                NPBDataset(site_data_path("llvm-v0"), sort_order=0),
                BlasDataset(site_data_path("llvm-v0"), sort_order=0),
                AnghaBenchDataset(site_data_path("llvm-v0"), sort_order=0),
                CHStoneDataset(site_data_path("llvm-v0"), sort_order=0),
            ],
        },
    )

register_env()

/home/dx4/.local/share/compiler_gym/llvm-v0/bin/clang: /lib64/libtinfo.so.5: no version information available (required by /home/dx4/.local/share/compiler_gym/llvm-v0/bin/clang)


In [12]:
from compiler_gym.wrappers import ConstrainedCommandline, TimeLimit
from ray import tune

def make_env() -> compiler_gym.envs.CompilerEnv:
    """Make the reinforcement learning environment for this experiment."""
    # We use the custom "perf-v0" environment registered above. We request the
    # "perf" observation space from the compiler2 service and the "perf" reward
    # space (presumably provided by the perf_reward.Reward() registered above).
    env = compiler_gym.make(
        "perf-v0",
        # action_space="llvm-autophase",
        observation_space="perf",
        reward_space="perf",
    )

    # Clip episodes to at most 5 steps. Note that ConstrainedCommandline
    # (imported above) is not applied, so the agent uses the full action space.
    env = TimeLimit(env, max_episode_steps=5)
    return env

In [16]:
# Let's create an environment and print a few attributes just to check that we 
# have everything set up the way that we would like.
with make_env() as env:
    print("Action space:", env.action_space)
    print("Observation space:", env.observation_space)
    print("Reward space:", env.reward_space)

What is the path /home/dx4/tools/miniconda3/lib/python3.8/site-packages/compiler_gym_examples-0.2.3-py3.8.egg/compiler2_service/service_py/example_service.py
Is that file:  True


Using backend: pytorch
./compiler_gym-llvm-service: /lib64/libtinfo.so.5: no version information available (required by ./compiler_gym-llvm-service)


Action space: NamedDiscrete([-add-discriminators, -adce, -aggressive-instcombine, -alignment-from-assumptions, -always-inline, -argpromotion, -attributor, -barrier, -bdce, -break-crit-edges, -simplifycfg, -callsite-splitting, -called-value-propagation, -canonicalize-aliases, -consthoist, -constmerge, -constprop, -coro-cleanup, -coro-early, -coro-elide, -coro-split, -correlated-propagation, -cross-dso-cfi, -deadargelim, -dce, -die, -dse, -reg2mem, -div-rem-pairs, -early-cse-memssa, -early-cse, -elim-avail-extern, -ee-instrument, -flattencfg, -float2int, -forceattrs, -inline, -insert-gcov-profiling, -gvn-hoist, -gvn, -globaldce, -globalopt, -globalsplit, -guard-widening, -hotcoldsplit, -ipconstprop, -ipsccp, -indvars, -irce, -infer-address-spaces, -inferattrs, -inject-tli-mappings, -instsimplify, -instcombine, -instnamer, -jump-threading, -lcssa, -licm, -libcalls-shrinkwrap, -load-store-vectorizer, -loop-data-prefetch, -loop-deletion, -loop-distribute, -loop-fusion, -loop-guard-widening,

## Datasets

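Now that we have an environment, we will need a set of programs to train on. In CompilerGym, these programs are called *benchmarks*. CompilerGym ships with [several sets of benchmarks](https://facebookresearch.github.io/CompilerGym/llvm/index.html#datasets), and our custom environment registers an additional `hpctoolkit-cpu-v0` dataset. Here we will take up to 55 benchmarks from the `hpctoolkit-cpu-v0` dataset and divide them into a training set (the first 50) and a validation set (the rest). We will use `chstone-v0` as a holdout test set. Note that in the run shown below the dataset yields only four benchmarks, so all of them end up in the training set and the validation set is empty.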
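The split relies on `itertools.islice()`, which stops early without raising an error when the iterator runs out. The sketch below reproduces that behavior in plain Python; `split_benchmarks()` and the benchmark URIs are hypothetical stand-ins, and it mirrors the "4 training, 0 validation" counts printed by the next cell.

```python
from itertools import islice
from typing import Iterable, List, Tuple


def split_benchmarks(
    benchmarks: Iterable[str], n_total: int, n_train: int
) -> Tuple[List[str], List[str]]:
    """Take at most n_total items and split them into train/validation lists."""
    # islice() simply yields fewer items if the iterator is shorter than n_total.
    selected = list(islice(benchmarks, n_total))
    return selected[:n_train], selected[n_train:]


# Hypothetical URIs standing in for a small dataset with only 4 programs.
small_dataset = (f"benchmark://example-v0/prog{i}" for i in range(4))
toy_train, toy_val = split_benchmarks(small_dataset, n_total=55, n_train=50)
print(len(toy_train), len(toy_val))  # 4 0
```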

In [17]:
from itertools import islice

with make_env() as env:
  # The two datasets we will be using:
  hpctoolkit = env.datasets["benchmark://hpctoolkit-cpu-v0"]
  chstone = env.datasets["chstone-v0"]

  # Each dataset has a `benchmarks()` method that returns an iterator over the
  # benchmarks within the dataset. Here we will use iterator slicing to grab a
  # handful of benchmarks for training and validation.
  train_benchmarks = list(islice(hpctoolkit.benchmarks(), 55))
  train_benchmarks, val_benchmarks = train_benchmarks[:50], train_benchmarks[50:]
  # We will use the entire chstone-v0 dataset for testing.
  test_benchmarks = list(chstone.benchmarks())

print("Number of benchmarks for training:", len(train_benchmarks))
print("Number of benchmarks for validation:", len(val_benchmarks))
print("Number of benchmarks for testing:", len(test_benchmarks))



Number of benchmarks for training: 4
Number of benchmarks for validation: 0
Number of benchmarks for testing: 12


## Registering the environment with RLlib

Now that we have our environment and training benchmarks, we can register the environment for use with RLlib. To do this we will define a second `make_training_env()` helper that uses the [CycleOverBenchmarks](https://facebookresearch.github.io/CompilerGym/compiler_gym/wrappers.html#compiler_gym.wrappers.CycleOverBenchmarks) wrapper so that the environment cycles through all of the training benchmarks, moving on to the next one at each episode. We then call `tune.register_env()` to register the environment under the name `"compiler_gym"`.
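The cycling behavior can be modeled in a few lines with `itertools.cycle()`. This is a simplified sketch of the idea, not the actual `CycleOverBenchmarks` implementation; `ToyCyclingEnv` and `make_toy_training_env()` are hypothetical names (chosen so they do not shadow the real `make_training_env()`), and the benchmark names are made up. The creator function takes one argument because RLlib passes an `env_config` to the function registered with `tune.register_env()`.

```python
from itertools import cycle
from typing import Any, Iterable, Optional


class ToyCyclingEnv:
    """Hypothetical stand-in for an environment wrapped so that each reset()
    selects the next benchmark from a fixed list, wrapping around at the end."""

    def __init__(self, benchmarks: Iterable[str]) -> None:
        self._benchmarks = cycle(list(benchmarks))
        self.benchmark: Optional[str] = None

    def reset(self) -> str:
        self.benchmark = next(self._benchmarks)
        return self.benchmark


def make_toy_training_env(env_config: Any = None) -> ToyCyclingEnv:
    """Creator function in the shape RLlib expects: it receives an env_config
    argument, which this sketch ignores."""
    del env_config
    return ToyCyclingEnv(["bench-a", "bench-b", "bench-c"])


toy_cycling_env = make_toy_training_env({})
episodes = [toy_cycling_env.reset() for _ in range(4)]
print(episodes)  # ['bench-a', 'bench-b', 'bench-c', 'bench-a']
```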

In [18]:
from compiler_gym.wrappers import CycleOverBenchmarks

def make_training_env(*args) -> compiler_gym.envs.CompilerEnv:
  """Make a reinforcement learning environment that cycles over the
  set of training benchmarks in use.
  """
  del args  # Unused env_config argument passed by ray
  return CycleOverBenchmarks(make_env(), train_benchmarks)

tune.register_env("compiler_gym", make_training_env)

In [19]:
# Let's cycle through a few calls to reset() to demonstrate that this environment
# selects a new benchmark for each episode.
with make_training_env() as env:
  env.reset()
  print(env.benchmark)
  env.reset()
  print(env.benchmark)
  env.reset()
  print(env.benchmark)


## Run the training loop

Now that we have the environment set up, let's run a training loop. Here we will use RLlib's [Proximal Policy Optimization](https://docs.ray.io/en/master/rllib-algorithms.html#ppo) implementation and run a very short training loop, just for demonstrative purposes.
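As a back-of-the-envelope check on how long this run is: `make_env()` clips episodes to 5 steps and the config below sets `train_batch_size` to 5, so each PPO iteration consumes roughly one episode's worth of environment steps, and the `episodes_total: 500` stopping condition corresponds to roughly 500 iterations. The helper is hypothetical arithmetic that assumes every episode runs the full five steps; RLlib's actual sampling may differ (for example, a batch can straddle an episode boundary).

```python
import math


def estimate_training_iterations(
    episodes_total: int, max_episode_steps: int, train_batch_size: int
) -> int:
    """Rough number of training iterations needed to reach episodes_total,
    assuming every episode runs the full max_episode_steps (a simplification)."""
    total_env_steps = episodes_total * max_episode_steps
    return math.ceil(total_env_steps / train_batch_size)


# Values from this notebook: 500 episodes, TimeLimit of 5 steps, batch size 5.
print(estimate_training_iterations(500, 5, 5))  # 500
```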

In [None]:
import ray
from ray.rllib.agents.ppo import PPOTrainer

# (Re)Start the ray runtime.
if ray.is_initialized():
  ray.shutdown()
ray.init(include_dashboard=False, ignore_reinit_error=True)

tune.register_env("compiler_gym", make_training_env)

analysis = tune.run(
    PPOTrainer,
    checkpoint_at_end=True,
    stop={
        "episodes_total": 500,
    },
    config={
        "seed": 0xCC,
        "num_workers": 1,
        # Specify the environment to use, where "compiler_gym" is the name we 
        # passed to tune.register_env().
        "env": "compiler_gym",
        # Reduce the size of the batch/trajectory lengths to match our short 
        # training run.
        "rollout_fragment_length": 5,
        "train_batch_size": 5,
        "sgd_minibatch_size": 5,
    }
)

## Evaluate the agent

After running the training loop we can create a new agent that has exploration disabled, restore it from the training checkpoint, and then use it for running inference tests.
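Setting `"explore": False` makes the policy act deterministically. For a discrete action space like this one, that roughly amounts to taking the most probable action instead of sampling from the policy's action distribution. The plain-Python sketch below illustrates the difference with made-up logits; `softmax()` and `select_action()` are hypothetical helpers that model the idea, not RLlib's internal code.

```python
import math
import random
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert action logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(logits: Sequence[float], explore: bool, rng: random.Random) -> int:
    """Sample from the policy when exploring; otherwise take the most likely action."""
    probs = softmax(logits)
    if explore:
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return max(range(len(probs)), key=probs.__getitem__)


rng = random.Random(0)
logits = [0.1, 2.0, 0.5]  # hypothetical policy output for three actions
print(select_action(logits, explore=False, rng=rng))  # 1
```

With exploration disabled, repeated runs on the same benchmark produce the same action sequence, which makes evaluation results reproducible.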

In [None]:
agent = PPOTrainer(
    env="compiler_gym",
    config={
        "num_workers": 1,
        "seed": 0xCC,
        # For inference we disable the stochastic exploration that is used during
        # training.
        "explore": False,
    },
)

# We only made a single checkpoint at the end of training, so restore that. In
# practice we may have many checkpoints that we will select from using 
# performance on the validation set.
checkpoint = analysis.get_best_checkpoint(
    metric="episode_reward_mean", 
    mode="max", 
    trial=analysis.trials[0]
)

agent.restore(checkpoint)
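The comment above mentions selecting among many checkpoints using validation performance. A minimal sketch of that selection step, assuming each checkpoint has already been evaluated on the validation benchmarks (for example with a helper like `run_agent_on_benchmarks()` defined in the next cell), could look like this. `select_checkpoint()` and the checkpoint names are hypothetical; checkpoints with no validation rewards are skipped, which matters here because the validation split is empty in this run.

```python
from statistics import mean
from typing import Dict, List


def select_checkpoint(val_rewards_by_checkpoint: Dict[str, List[float]]) -> str:
    """Return the checkpoint whose mean validation reward is highest."""
    # Ignore checkpoints that have no validation rewards to average.
    candidates = {ckpt: r for ckpt, r in val_rewards_by_checkpoint.items() if r}
    if not candidates:
        raise ValueError("no checkpoint has any validation rewards")
    return max(candidates, key=lambda ckpt: mean(candidates[ckpt]))


# Hypothetical checkpoint names and per-benchmark validation rewards.
scores = {
    "checkpoint_000100": [0.2, 0.4],
    "checkpoint_000200": [0.5, 0.3],
    "checkpoint_000300": [0.1, 0.1],
}
print(select_checkpoint(scores))  # checkpoint_000200
```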

In [None]:
# Let's define a helper function to make it easy to evaluate the agent's
# performance on a set of benchmarks.

def run_agent_on_benchmarks(benchmarks):
  """Run agent on a list of benchmarks and return a list of cumulative rewards."""
  with make_env() as env:
    rewards = []
    for i, benchmark in enumerate(benchmarks, start=1):
      observation, done = env.reset(benchmark=benchmark), False
      while not done:
        action = agent.compute_action(observation)
        observation, _, done, _ = env.step(action)
      rewards.append(env.episode_reward)
      print(f"[{i}/{len(benchmarks)}] {env.state}")

  return rewards

# Evaluate agent performance on the validation set.
val_rewards = run_agent_on_benchmarks(val_benchmarks)
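With the split used in this notebook, `val_benchmarks` is empty, so `val_rewards` will be an empty list. A small, hypothetical helper like `summarize_rewards()` below can report per-set statistics while handling that case explicitly instead of failing on an empty input.

```python
from statistics import mean
from typing import Optional, Sequence


def summarize_rewards(rewards: Sequence[float]) -> Optional[dict]:
    """Summarize per-benchmark episode rewards; returns None for an empty set."""
    if not rewards:
        return None
    return {"n": len(rewards), "mean": mean(rewards), "min": min(rewards), "max": max(rewards)}


print(summarize_rewards([]))  # None
print(summarize_rewards([1.0, 2.0, 3.0]))  # {'n': 3, 'mean': 2.0, 'min': 1.0, 'max': 3.0}
```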

In [None]:
# Evaluate agent performance on the holdout test set.
test_rewards = run_agent_on_benchmarks(test_benchmarks)

In [None]:
# Finally, let's plot our results to see how we did!
from matplotlib import pyplot as plt

def plot_results(x, y, name, ax):
  plt.sca(ax)
  plt.bar(range(len(y)), y)
  plt.ylabel("Reward (higher is better)")
  plt.xticks(range(len(x)), x, rotation=90)
  plt.title(f"Performance on {name} set")

fig, (ax1, ax2) = plt.subplots(1, 2)
fig.set_size_inches(13, 3)
plot_results(val_benchmarks, val_rewards, "val", ax1)
plot_results(test_benchmarks, test_rewards, "test", ax2)
plt.show()

That's it for this demonstration! Check out the [documentation site](https://facebookresearch.github.io/CompilerGym/) for more details and the API reference. If you encounter any problems, please [file an issue](https://github.com/facebookresearch/CompilerGym/issues).