# Artifacts Evaluation Instructions: #26 Automating Reinforcement Learning Architecture Design for Code Optimization

## Preliminaries

This interactive Jupyter notebook provides a small-scale demo for showcasing the task definition, client RL search, client RL parameter tuning and evaluation of case studies reported in the paper.

The main results of our CC 2022 paper compare the performance of our Supersonic-tuned client RL against prior search-based techniques. 
The evaluation presented in the paper ran on a much larger dataset and for longer. This notebook instead contains minimal working examples that can be evaluated in a reasonable amount of time. 

## Instructions for Experimental Workflow:

Before you start, please make a copy of the notebook: go to the landing page, select the checkbox next to the notebook titled *main.ipynb*, then click "**Duplicate**".

Click the name of the newly created Jupyter Notebook, e.g. **AE_Intel-Copy1.ipynb**. Next, select "**Kernel**" > "**Restart & Clear Output**". Then, repeatedly press the play button (the tooltip is "run cell, select below") to step through each cell of the notebook.

Alternatively, select each cell in turn and use **Cell** > **Run Cells** from the menu to run specific cells. Note that some cells depend on previous cells being executed. If any errors occur, make sure all previous cells have been executed.

## Important Notes

**Some cells can take more than half an hour to complete; please wait for the results before stepping to the next cell.** 

High load can lead to a long wait for results. This may occur if multiple reviewers are simultaneously trying to generate results. 

The experiments are customisable as the code provided in the Jupyter Notebook can be edited on the spot. Simply type your changes into the code blocks and re-run using **Cell > Run Cells** from the menu.

## Links to The Paper

For each step, we note the section number of the submitted version where the relevant technique is described or data is presented.

The main results are presented in Figures 3-6 of the submitted paper. 

# Demo 1: Client RL Search

This demo corresponds to the simplified search space definition example given in Figure 2. Note that we have refactored the code, so there are small changes in the API. This is a small-scale demo for case study 4 (superoptimization). The full-scale evaluation used in the paper takes over 24 hours to run. 

## Step 1. Task definition

A compiler developer can use the Supersonic API to define the optimization problem. This is done by creating an RL policy interface. The definition includes a list of client RL components for the meta-optimizer to search over.

#### *statefs*: 

This parameter defines the candidate state functions. 

- *Word2vec* for embedding programs with a Word2vec model;
- *Doc2vec* for embedding programs with a Doc2vec model;
- *Bert* for embedding programs with a BERT model;
- *Actionhistory* for applying a hash function to the action history to represent the program.
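The *Actionhistory* state is described only as a hash of the action history. The sketch below shows one generic way to do that, feature hashing into a fixed-length count vector. It assumes actions are identified by string names. The function `action_history_state` and the action names are hypothetical; this is not SuperSonic's actual hash function.

```python
import hashlib
from typing import List, Sequence


def action_history_state(history: Sequence[str], dim: int = 16) -> List[int]:
    """Hypothetical sketch: map action names to a fixed-length vector of bucket counts."""
    state = [0] * dim
    for action in history:
        # MD5 gives the same bucket on every run, unlike the salted built-in hash() for str.
        digest = hashlib.md5(action.encode("utf-8")).hexdigest()
        state[int(digest, 16) % dim] += 1
    return state


# Made-up action names; repeated actions land in the same bucket.
print(action_history_state(["swap", "insert", "swap"], dim=8))
```

A fixed `dim` keeps the state size constant however long the history grows. The trade-off is that distinct actions can collide in the same bucket.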

#### *rewards*: 

This parameter defines the candidate reward functions. 

- *relative_measure* for using a relative measure, such as the speedup or the code size reduction ratio, as the reward;
- *tan* for processing the original feedback with a tan function, e.g. tan(run time);
- *func* for processing the original feedback with a mapping function, e.g. better performance gives 1 and worse performance gives 0;
- *weight* for assigning weights to different kinds of feedback, e.g. `weight_1*runtime + weight_2*memory_use + weight_3*code_size`.
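To make the four reward options concrete, here is a minimal sketch of each as a plain Python function. It assumes that lower raw values (runtime, memory, size) are better. It also assumes *relative_measure* is a ratio `baseline / optimized`, and that *tan* applies `math.tan` literally, as in the `tan(run time)` example. All function names are hypothetical; SuperSonic's own reward code may differ.

```python
import math


def relative_measure(baseline: float, optimized: float) -> float:
    """Ratio such as speedup: baseline runtime / optimized runtime (> 1 means better)."""
    return baseline / optimized


def tan_reward(feedback: float) -> float:
    """Process the raw feedback with tan(), as in the tan(run time) example."""
    return math.tan(feedback)


def func_reward(baseline: float, optimized: float) -> int:
    """Map feedback to 1 if performance improved (lower is better), else 0."""
    return 1 if optimized < baseline else 0


def weight_reward(runtime, memory_use, code_size, weights=(1.0, 1.0, 1.0)):
    """Weighted sum w1*runtime + w2*memory_use + w3*code_size; weight signs are up to the caller."""
    w1, w2, w3 = weights
    return w1 * runtime + w2 * memory_use + w3 * code_size


print(relative_measure(2.0, 1.0))  # 2.0
print(func_reward(2.0, 1.0))  # 1
print(weight_reward(1.0, 2.0, 3.0, weights=(0.5, 0.25, 0.0)))  # 1.0
```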

#### *rl_algs*: 

This parameter defines the candidate RL algorithms. Possible values are:

- *"MCTS", "PPO", "APPO", "A2C", "DQN", "QLearning", "MARWIL", "PG", "SimpleQ", "A3C", "ARS", "ES", "BC"*. We do not recommend A3C, ARS or ES because of their long search times. 
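Conceptually, the candidate lists combine into a search space of (state, action, reward, algorithm) combinations. The sketch below builds that space with `itertools.product`, using the same candidate lists and the same dictionary keys (`StatList`, `ActList`, `RewList`, `AlgList`) as the next code cell and its output. The helper `enumerate_search_space` is hypothetical and is not SuperSonic's implementation. A plain product of these lists gives 3 * 1 * 4 * 4 = 48 combinations, while `PolicyDefined()` reports its own count, so treat this only as an illustration.

```python
from itertools import product

statefs = ["Word2vec", "Doc2vec", "Bert"]
rewards = ["relative_measure", "tan", "func", "weight"]
rl_algs = ["MCTS", "PPO", "DQN", "QLearning"]
actions = ["init"]


def enumerate_search_space(states, acts, rews, algs):
    """Return one dict per combination, using the key names printed by PolicyDefined()."""
    return [
        {"StatList": s, "ActList": a, "RewList": r, "AlgList": alg}
        for s, a, r, alg in product(states, acts, rews, algs)
    ]


space = enumerate_search_space(statefs, actions, rewards, rl_algs)
print(len(space))  # 48
print(space[0])
```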


In [1]:
import SuperSonic.policy_definition.policy_define as policy_define

#Candidate state functions
statefs=["Word2vec", "Doc2vec", "Bert"]
#Candidate reward functions
rewards=["relative_measure", "tan", "func", "weight"]
#Candidate RL algorithms
rl_algs=["MCTS", "PPO", "DQN", "QLearning"]
actions=["init"]

PolicyCandidate = policy_define.SuperOptimizer(
        StateFunctions=statefs,
        RewardFunctions=rewards,
        RLAlgorithms=rl_algs,
        ActionFunctions=actions,
    ).PolicyDefined()

PolicySearchSpace = PolicyCandidate[0]
PolicySearchCount = PolicyCandidate[1]

print("The search space consists of %d combinations of RL components." % PolicySearchCount)
print("A few examples include:")
print("---------------------------------------------------------------------------------------------------------")
for i in range(len(PolicySearchSpace)):
    # Print every other combination (about half of them) to keep the log short.
    if i%2==0:
        print(PolicySearchSpace[i])
print("---------------------------------------------------------------------------------------------------------\n")

The search space consists of 47 combinations of RL components.
A few examples include:
---------------------------------------------------------------------------------------------------------
{'StatList': 'Word2vec', 'ActList': 'init', 'RewList': 'relative_measure', 'AlgList': 'MCTS'}
{'StatList': 'Word2vec', 'ActList': 'init', 'RewList': 'relative_measure', 'AlgList': 'DQN'}
{'StatList': 'Word2vec', 'ActList': 'init', 'RewList': 'tan', 'AlgList': 'MCTS'}
{'StatList': 'Word2vec', 'ActList': 'init', 'RewList': 'tan', 'AlgList': 'DQN'}
{'StatList': 'Word2vec', 'ActList': 'init', 'RewList': 'func', 'AlgList': 'MCTS'}
{'StatList': 'Word2vec', 'ActList': 'init', 'RewList': 'func', 'AlgList': 'DQN'}
{'StatList': 'Word2vec', 'ActList': 'init', 'RewList': 'weight', 'AlgList': 'MCTS'}
{'StatList': 'Word2vec', 'ActList': 'init', 'RewList': 'weight', 'AlgList': 'DQN'}
{'StatList': 'Doc2vec', 'ActList': 'init', 'RewList': 'relative_measure', 'AlgList': 'MCTS'}
{'StatList': 'Doc2vec', 'ActList': '

## Step 2. Client RL search

The `policy_search()` function invokes the SuperSonic meta-optimizer to search for the client RL architecture. The number of trials spent on the client RL search can be limited with the *iterations* parameter. 

As before, this example targets case study 4 (superoptimization) and uses the Hacker benchmarks. 

**approximate runtime ~ 30 minutes (please wait before moving to the next cell)**

##### Note: 
You may encounter errors reporting failed tests. This is because we reduce the number of client RL search steps to keep the search time manageable for the demo. Such failures did occur during our full-scale evaluation. 

#### *mode*: 
This selects which Supersonic module to run. Possible values are:
- *policy* for demonstrating the client RL policy search module;
- *config* for demonstrating the parameter tuning module;
- *deploy* for demonstrating the deployment of a tuned client RL.

#### *iterations*: 
This defines the number of trials for client RL search, parameter tuning or deployment. In the paper we set it to 100 for parameter tuning and 1000+ for deployment, but the demo uses smaller numbers.
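Since some runs take half an hour, it can help to check the documented *mode* and *iterations* values before starting a long run. The helper `check_demo_args` below is hypothetical and not part of SuperSonic; `demo_run` may already perform its own validation. It only encodes the three modes listed above and the requirement that the trial count be a positive integer.

```python
VALID_MODES = ("policy", "config", "deploy")


def check_demo_args(mode: str, iterations: int) -> None:
    """Hypothetical pre-flight check for the documented mode/iterations values."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    if not isinstance(iterations, int) or iterations < 1:
        raise ValueError(f"iterations must be a positive integer, got {iterations!r}")


check_demo_args("policy", 1)  # passes silently, matching the call in the next cell
```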


In [2]:
from demo import demo_run

#Wait until you see the finish message before moving to the next cell.
demo_run(mode="policy",iterations=1)

Start RL Client searching on superoptimization (please wait - approximate runtime = 30 minutes)...
---------------------------------------------------------------------------------------------------------
Policy search engine started...

Start client RL training...
exist_policy:{'StatList': 'Word2vec', 'ActList': 'init', 'RewList': 'weight', 'AlgList': 'QLearning'}
Opened database successfully
== Status ==
Memory usage on this node: 31.2/125.6 GiB
Using FIFO scheduling algorithm.
Resources requested: 1/50 CPUs, 0/0 GPUs, 0.0/59.23 GiB heap, 0.0/20.26 GiB objects
Result logdir: /home/sys/SUPERSONIC/SuperSonic/logs/model_save/SAC
Number of trials: 1 (1 RUNNING)
+--------------------------+----------+-------+
| Trial name               | status   | loc   |
|--------------------------+----------+-------|
| SAC_stoke_rl_b90af_00000 | RUNNING  |       |
+--------------------------+----------+-------+


== Status ==
Memory usage on this node: 31.6/125.6 GiB
Using FIFO scheduling algorithm.
Re


== Status ==
Memory usage on this node: 31.7/125.6 GiB
Using FIFO scheduling algorithm.
Resources requested: 1/50 CPUs, 0/0 GPUs, 0.0/59.23 GiB heap, 0.0/20.26 GiB objects
Result logdir: /home/sys/SUPERSONIC/SuperSonic/logs/model_save/SAC
Number of trials: 1 (1 RUNNING)
+--------------------------+----------+-------------------+--------+------------------+------+----------+
| Trial name               | status   | loc               |   iter |   total time (s) |   ts |   reward |
|--------------------------+----------+-------------------+--------+------------------+------+----------|
| SAC_stoke_rl_b90af_00000 | RUNNING  | 172.17.0.7:101361 |     19 |          79.5913 |  950 |        1 |
+--------------------------+----------+-------------------+--------+------------------+------+----------+


== Status ==
Memory usage on this node: 31.7/125.6 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/50 CPUs, 0/0 GPUs, 0.0/59.23 GiB heap, 0.0/20.26 GiB objects
Result logdir: /home/sys


('Trials did not complete', [SAC_stoke_rl_f34cd_00000])
Iteration numbers : 2
exist_policy:{'StatList': 'Doc2vec', 'ActList': 'init', 'RewList': 'weight', 'AlgList': 'QLearning'}
Opened database successfully
== Status ==
Memory usage on this node: 31.4/125.6 GiB
Using FIFO scheduling algorithm.
Resources requested: 1/50 CPUs, 0/0 GPUs, 0.0/58.89 GiB heap, 0.0/20.17 GiB objects
Result logdir: /home/sys/SUPERSONIC/SuperSonic/logs/model_save/SAC
Number of trials: 1 (1 RUNNING)
+--------------------------+----------+-------+
| Trial name               | status   | loc   |
|--------------------------+----------+-------|
| SAC_stoke_rl_1367c_00000 | RUNNING  |       |
+--------------------------+----------+-------+


== Status ==
Memory usage on this node: 31.7/125.6 GiB
Using FIFO scheduling algorithm.
Resources requested: 1/50 CPUs, 0/0 GPUs, 0.0/58.89 GiB heap, 0.0/20.17 GiB objects
Result logdir: /home/sys/SUPERSONIC/SuperSonic/logs/model_save/SAC
Number of trials: 1 (1 RUNNING)
+------


== Status ==
Memory usage on this node: 31.7/125.6 GiB
Using FIFO scheduling algorithm.
Resources requested: 1/50 CPUs, 0/0 GPUs, 0.0/58.89 GiB heap, 0.0/20.17 GiB objects
Result logdir: /home/sys/SUPERSONIC/SuperSonic/logs/model_save/SAC
Number of trials: 1 (1 RUNNING)
+--------------------------+----------+-------------------+--------+------------------+------+----------+
| Trial name               | status   | loc               |   iter |   total time (s) |   ts |   reward |
|--------------------------+----------+-------------------+--------+------------------+------+----------|
| SAC_stoke_rl_1367c_00000 | RUNNING  | 172.17.0.7:102139 |     19 |          76.7059 |  950 |    0.995 |
+--------------------------+----------+-------------------+--------+------------------+------+----------+


== Status ==
Memory usage on this node: 31.7/125.6 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/50 CPUs, 0/0 GPUs, 0.0/58.89 GiB heap, 0.0/20.17 GiB objects
Result logdir: /home/sys


== Status ==
Memory usage on this node: 31.8/125.6 GiB
Using FIFO scheduling algorithm.
Resources requested: 1/50 CPUs, 0/0 GPUs, 0.0/58.79 GiB heap, 0.0/20.12 GiB objects
Result logdir: /home/sys/SUPERSONIC/SuperSonic/logs/model_save/SAC
Number of trials: 1 (1 RUNNING)
+--------------------------+----------+-------------------+--------+------------------+------+----------+
| Trial name               | status   | loc               |   iter |   total time (s) |   ts |   reward |
|--------------------------+----------+-------------------+--------+------------------+------+----------|
| SAC_stoke_rl_4c14c_00000 | RUNNING  | 172.17.0.7:103051 |     15 |          62.1972 |  750 |        1 |
+--------------------------+----------+-------------------+--------+------------------+------+----------+


== Status ==
Memory usage on this node: 31.8/125.6 GiB
Using FIFO scheduling algorithm.
Resources requested: 1/50 CPUs, 0/0 GPUs, 0.0/58.79 GiB heap, 0.0/20.12 GiB objects
Result logdir: /home/sys


== Status ==
Memory usage on this node: 31.9/125.6 GiB
Using FIFO scheduling algorithm.
Resources requested: 1/50 CPUs, 0/0 GPUs, 0.0/58.79 GiB heap, 0.0/20.12 GiB objects
Result logdir: /home/sys/SUPERSONIC/SuperSonic/logs/model_save/SAC
Number of trials: 1 (1 RUNNING)
+--------------------------+----------+-------------------+--------+------------------+------+----------+
| Trial name               | status   | loc               |   iter |   total time (s) |   ts |   reward |
|--------------------------+----------+-------------------+--------+------------------+------+----------|
| SAC_stoke_rl_869c8_00000 | RUNNING  | 172.17.0.7:103864 |      9 |           37.462 |  450 |        1 |
+--------------------------+----------+-------------------+--------+------------------+------+----------+


== Status ==
Memory usage on this node: 31.4/125.6 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/50 CPUs, 0/0 GPUs, 0.0/58.79 GiB heap, 0.0/20.12 GiB objects
Result logdir: /home/sys

---------------------------------------------------------------------------------------------------------
Client RL search finished.


## Step 3. Client RL Parameter Tuning

After an RL architecture is chosen, Supersonic fine-tunes a set of model-specific hyperparameters of the selected client RL.

This demo shows how to fine-tune the chosen client RL on the hacker benchmark dataset for superoptimization.

*approximate runtime ~ 20 minutes*
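The log of the next cell reports the `AsyncHyperBand` scheduler (from Ray Tune), which stops poorly performing trials early. Its core idea is successive halving. The sketch below is a simplified, synchronous version of that idea, not the asynchronous scheduler itself:

- sample several configurations;
- evaluate them with a small budget;
- keep the best 1/`eta` and multiply the budget by `eta`;
- repeat until one configuration remains.

The hyperparameter is named after the `learning_starts` column in the log, but the scoring function is made up for illustration. `successive_halving` is a hypothetical helper.

```python
import random


def successive_halving(sample_config, evaluate, n_configs=16, min_budget=1, eta=4, seed=0):
    """Simplified synchronous successive halving.

    sample_config(rng) -> config; evaluate(config, budget) -> score (higher is better).
    Each round keeps the top 1/eta configurations and multiplies their budget by eta.
    """
    rng = random.Random(seed)
    configs = [sample_config(rng) for _ in range(n_configs)]
    budget = min_budget
    while len(configs) > 1:
        ranked = sorted(configs, key=lambda c: evaluate(c, budget), reverse=True)
        configs = ranked[: max(1, len(configs) // eta)]
        budget *= eta
    return configs[0]


# Toy objective (made up): the score peaks when learning_starts is 0.3 and ignores the budget.
best = successive_halving(
    sample_config=lambda rng: {"learning_starts": rng.random()},
    evaluate=lambda cfg, budget: -abs(cfg["learning_starts"] - 0.3),
)
print(best)
```

In a real tuner, `evaluate` would train the client RL for `budget` iterations. Only the survivors of each round receive the larger budget, which is where the savings come from.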

In [3]:
from demo import demo_run

#Wait until you see the finish message before moving to the next cell.
demo_run(mode="config", iterations=10)

Start client RL parameter tuning (please wait - approximate runtime = 20 minutes)...
---------------------------------------------------------------------------------------------------------
config search engine started...

== Status ==
Memory usage on this node: 33.0/125.6 GiB
Using AsyncHyperBand: num_stopped=0
Bracket: Iter 256.000: None | Iter 64.000: None | Iter 16.000: None | Iter 4.000: None | Iter 1.000: None
Resources requested: 1/104 CPUs, 0/0 GPUs, 0.0/59.23 GiB heap, 0.0/20.26 GiB objects
Result logdir: /home/sys/SUPERSONIC/SuperSonic/logs/model_save/SAC
Number of trials: 1 (1 RUNNING)
+--------------------------+----------+-------+-------------------+
| Trial name               | status   | loc   |   learning_starts |
|--------------------------+----------+-------+-------------------|
| SAC_stoke_rl_4bef6_00000 | RUNNING  |       |         0.0676315 |
+--------------------------+----------+-------+-------------------+


== Status ==
Memory usage on this node: 33.3/125.6 Gi


== Status ==
Memory usage on this node: 33.5/125.6 GiB
Using AsyncHyperBand: num_stopped=0
Bracket: Iter 256.000: None | Iter 64.000: None | Iter 16.000: None | Iter 4.000: None | Iter 1.000: None
Resources requested: 1/104 CPUs, 0/0 GPUs, 0.0/58.98 GiB heap, 0.0/20.17 GiB objects
Result logdir: /home/sys/SUPERSONIC/SuperSonic/logs/model_save/SAC
Number of trials: 1 (1 RUNNING)
+--------------------------+----------+-------+-------------------+
| Trial name               | status   | loc   |   learning_starts |
|--------------------------+----------+-------+-------------------|
| SAC_stoke_rl_6b42a_00000 | RUNNING  |       |          0.593558 |
+--------------------------+----------+-------+-------------------+


== Status ==
Memory usage on this node: 33.5/125.6 GiB
Using AsyncHyperBand: num_stopped=0
Bracket: Iter 256.000: None | Iter 64.000: None | Iter 16.000: None | Iter 4.000: None | Iter 1.000: 1.0
Resources requested: 1/104 CPUs, 0/0 GPUs, 0.0/58.98 GiB heap, 0.0/20.17 GiB obj


== Status ==
Memory usage on this node: 33.7/125.6 GiB
Using AsyncHyperBand: num_stopped=0
Bracket: Iter 256.000: None | Iter 64.000: None | Iter 16.000: None | Iter 4.000: None | Iter 1.000: 1.0
Resources requested: 1/104 CPUs, 0/0 GPUs, 0.0/58.84 GiB heap, 0.0/20.12 GiB objects
Result logdir: /home/sys/SUPERSONIC/SuperSonic/logs/model_save/SAC
Number of trials: 1 (1 RUNNING)
+--------------------------+----------+-----------------+-------------------+--------+------------------+------+----------+
| Trial name               | status   | loc             |   learning_starts |   iter |   total time (s) |   ts |   reward |
|--------------------------+----------+-----------------+-------------------+--------+------------------+------+----------|
| SAC_stoke_rl_8b9be_00000 | RUNNING  | 172.17.0.7:4635 |          0.378585 |      1 |          5.27222 |   50 |        1 |
+--------------------------+----------+-----------------+-------------------+--------+------------------+------+----------+


== Status ==
Memory usage on this node: 33.7/125.6 GiB
Using AsyncHyperBand: num_stopped=0
Bracket: Iter 256.000: None | Iter 64.000: None | Iter 16.000: None | Iter 4.000: None | Iter 1.000: 1.0
Resources requested: 1/104 CPUs, 0/0 GPUs, 0.0/58.79 GiB heap, 0.0/20.12 GiB objects
Result logdir: /home/sys/SUPERSONIC/SuperSonic/logs/model_save/SAC
Number of trials: 1 (1 RUNNING)
+--------------------------+----------+-----------------+-------------------+--------+------------------+------+----------+
| Trial name               | status   | loc             |   learning_starts |   iter |   total time (s) |   ts |   reward |
|--------------------------+----------+-----------------+-------------------+--------+------------------+------+----------|
| SAC_stoke_rl_a4f91_00000 | RUNNING  | 172.17.0.7:6117 |         0.0209298 |      1 |          5.19478 |   50 |        1 |
+--------------------------+----------+-----------------+-------------------+--------+------------------+------+----------+

== Status ==
Memory usage on this node: 33.7/125.6 GiB
Using AsyncHyperBand: num_stopped=0
Bracket: Iter 256.000: None | Iter 64.000: None | Iter 16.000: None | Iter 4.000: None | Iter 1.000: 1.0
Resources requested: 1/104 CPUs, 0/0 GPUs, 0.0/58.79 GiB heap, 0.0/20.12 GiB objects
Result logdir: /home/sys/SUPERSONIC/SuperSonic/logs/model_save/SAC
Number of trials: 1 (1 RUNNING)
+--------------------------+----------+-----------------+-------------------+--------+------------------+------+----------+
| Trial name               | status   | loc             |   learning_starts |   iter |   total time (s) |   ts |   reward |
|--------------------------+----------+-----------------+-------------------+--------+------------------+------+----------|
| SAC_stoke_rl_c5e7d_00000 | RUNNING  | 172.17.0.7:9131 |          0.822631 |      1 |          5.31607 |   50 |        1 |
+--------------------------+----------+-----------------+-------------------+--------+------------------+------+----------+


---------------------------------------------------------------------------------------------------------
Finish client RL parameter tuning.


## Step 4. Client RL Deployment

The tuned client RL and its parameters were saved in the previous step; they can be shipped with a compiler to optimize unseen programs at deployment time. 

This demo shows how to apply the saved client RL to optimize a test program for superoptimization. 

*approximate runtime ~ 15 minutes*
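The on-disk format SuperSonic uses for the saved client RL is not described here. As a minimal sketch of the idea of shipping a chosen policy with its tuned hyperparameters, the example below assumes a simple JSON file. It uses the policy dictionary format printed in the search log and an example `learning_starts` value from the tuning log; the file name and the `save_client_rl`/`load_client_rl` helpers are hypothetical.

```python
import json
import os
import tempfile


def save_client_rl(path, policy, params):
    """Persist the chosen RL components and tuned hyperparameters as JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"policy": policy, "params": params}, f, indent=2)


def load_client_rl(path):
    """Load the policy components and hyperparameters saved by save_client_rl."""
    with open(path, encoding="utf-8") as f:
        saved = json.load(f)
    return saved["policy"], saved["params"]


policy = {"StatList": "Word2vec", "ActList": "init", "RewList": "weight", "AlgList": "QLearning"}
params = {"learning_starts": 0.378585}  # example value from the tuning log, not necessarily the best
path = os.path.join(tempfile.gettempdir(), "client_rl.json")
save_client_rl(path, policy, params)
print(load_client_rl(path))
```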

In [4]:
from demo import demo_run

#Wait until you see the finish message before moving to the next cell.
demo_run(mode="deploy", iterations=10,benchmark='p20')

Environment rebuilding for case study Superoptimization (~3 mins)
Environment rebuilding successful!
---------------------------------------------------------------------------------------------------------
Start client RL demo (please wait - approximate runtime = 15 minutes)...
---------------------------------------------------------------------------------------------------------
deploying the best config and policy to RL client and optimize the program...

Opened database successfully
== Status ==
Memory usage on this node: 31.2/125.6 GiB
Using FIFO scheduling algorithm.
Resources requested: 1/50 CPUs, 0/0 GPUs, 0.0/59.38 GiB heap, 0.0/20.31 GiB objects
Result logdir: /home/sys/SUPERSONIC/SuperSonic/logs/model_save/SAC
Number of trials: 1 (1 RUNNING)
+--------------------------+----------+-------+
| Trial name               | status   | loc   |
|--------------------------+----------+-------|
| SAC_stoke_rl_75b41_00000 | RUNNING  |       |
+--------------------------+----------+----

# Demo 2: Experimental Evaluation

Here, we provide a small-scale evaluation to showcase how the Supersonic-chosen RL works on four case studies. A full-scale evaluation, which takes more than a week to run, is provided through the Docker image (with detailed instructions on our project GitHub page). 

### Case Study 1: Optimizing Image Pipelines (Section 5.1)

This task aims to improve the optimization heuristic of the Halide compiler framework. Halide is a domain-specific language and compiler for image processing pipelines (or graphs) with multiple computation stages. A Halide program separates the expression of the computation kernels and the application processing pipeline from the pipeline’s schedule. Here, the schedule defines the order of execution and placement of data on the hardware. The goal of this task is to automatically synthesize schedules to minimize the execution time of the benchmark.

### Client RL Deployment Demo

This demo shows how to apply the saved client RL to optimize a test program for Optimizing Image Pipelines. 

#### *benchmark*
harris, interpolate, hist, max_filter, unsharp, nl_mean, lens_blur, local_laplacian, conv_layer, st_chain

*approximate runtime ~ 20 minutes*

In [None]:
from demo import demo_run

#Wait until you see the finish message before moving to the next cell.
demo_run(mode="deploy", iterations=10, case_study = 'case_1', benchmark = 'harris')

Environment rebuilding for case study Optimizing Image Pipelines. (~3 mins)
Environment rebuilding successful!
---------------------------------------------------------------------------------------------------------
Start client RL demo (please wait - approximate runtime = 15 minutes)...
---------------------------------------------------------------------------------------------------------
deploying the best config and policy to RL client and optimize the program...

Opened database successfully
== Status ==
Memory usage on this node: 31.2/125.6 GiB
Using FIFO scheduling algorithm.
Resources requested: 1/50 CPUs, 0/0 GPUs, 0.0/59.33 GiB heap, 0.0/20.26 GiB objects
Result logdir: /home/sys/SUPERSONIC/SuperSonic/logs/model_save/SAC
Number of trials: 1 (1 RUNNING)
+--------------------------+----------+-------+
| Trial name               | status   | loc   |
|--------------------------+----------+-------|
| SAC_stoke_rl_6207a_00000 | RUNNING  |       |
+--------------------------+-----

#### Performance evaluation on benchmarks

The results correspond to Figure 3 of the submitted manuscript.

*approximate runtime ~30 minutes for one benchmark*

#### *data_nums*: 
This defines the number of test benchmarks (1-10).

#### *times*: 
The number of times to execute each benchmark (we set it to 100 in the paper, which takes more than 10 hours).

Note: There may be other processes running at the same time on the same server, which could introduce noise to the data. 
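One common way to reduce such noise is to time each program several times and compare medians, which are less sensitive to occasional slow runs than means. How `CalculateHalideDemo` aggregates repeated runs is not stated here. The sketch below is a generic illustration with toy stand-in workloads; the `median_runtime` and `speedup` helpers are hypothetical.

```python
import statistics
import time


def median_runtime(fn, times=5):
    """Run fn() `times` times and return the median wall-clock time in seconds."""
    samples = []
    for _ in range(times):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speedup(baseline_fn, tuned_fn, times=5):
    """Speedup of tuned_fn over baseline_fn (> 1 means the tuned version is faster)."""
    return median_runtime(baseline_fn, times) / median_runtime(tuned_fn, times)


# Toy stand-ins for a baseline and a tuned program (the second does 10x less work).
slow = lambda: sum(i * i for i in range(200_000))
fast = lambda: sum(i * i for i in range(20_000))
print(f"speedup ~ {speedup(slow, fast):.1f}x")
```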

In [None]:
import warnings
warnings.filterwarnings("ignore")
from AE.utils.Calculate import CalculateHalideDemo
print("Start tuning on Optimizing Image Pipelines (please wait - approximate runtime = 30 minutes per benchmark)...")
print("---------------------------------------------------------------------------------------------------------")
HalideSingleDemo = CalculateHalideDemo()
HalideSingleDemo.ExecHalideMultipleDemo(times=5,data_nums=10)
print("---------------------------------------------------------------------------------------------------------")
print("Showing the results under different search time budgets (higher is better)")
print("---------------------------------------------------------------------------------------------------------")
HalideSingleDemo.PrintTable("time")
print("---------------------------------------------------------------------------------------------------------")
print("Showing the results under different search iteration budgets (higher is better)")
print("---------------------------------------------------------------------------------------------------------")
HalideSingleDemo.PrintTable("cycle")
print("---------------------------------------------------------------------------------------------------------")
print("Performance evaluation finished.")

#### Full-scale evaluation data

We now plot diagrams using the full-scale evaluation data (it would take too long to run the experiment live). The results correspond to Figure 3 (Section 5.1) of the submitted manuscript.
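The figures report per-benchmark speedups over a baseline. When several speedups must be summarized by one number, the geometric mean is the usual choice for ratios, because a 2x speedup and a 0.5x slowdown cancel out. Whether the plotting code computes such a summary is not stated here. The sketch below is a generic helper (`geometric_mean` is our own name); Python 3.8+ also provides `statistics.geometric_mean`.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive ratios such as per-benchmark speedups."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty list of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


print(geometric_mean([2.0, 0.5]))
print(geometric_mean([1.0, 4.0]))
```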

In [None]:
from AE.utils.Calculate import CalculateHalideDemo
HalideDemo = CalculateHalideDemo()
# Output Figure 3(a): speedup over Halide master for optimizing image pipelines under different search times (on an Intel machine).
HalideDemo.plot("time")
# Output Figure 3(b): speedup over Halide master for optimizing image pipelines under different search iterations (on an Intel machine).
HalideDemo.plot("cycle")

### Case Study 2: Neural Network Code Generation (Section 5.2)

This task targets DNN back-end code generation: finding a good schedule (e.g., instruction orders and data placement) that reduces execution time on a multi-core CPU.

This demo corresponds to Figure 4 of the submitted manuscript.

*approximate runtime = 10 minutes for one benchmark*
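The next cell reports results under different search iteration budgets (`cycles`) and search time budgets (`times`). A natural reading of such a table is "the best result found so far within each budget". The sketch below computes that from a search trace. The trace values are invented toy numbers, not results from the paper. The helper `best_under_budgets` and the trace format are hypothetical.

```python
def best_under_budgets(trace, budgets, key):
    """For each budget, return the best speedup found within that budget (None if nothing ran).

    trace: list of dicts with 'time' (s), 'iteration' and 'speedup', in search order.
    key: 'time' or 'iteration', i.e. which kind of budget `budgets` refers to.
    """
    results = {}
    for budget in budgets:
        within = [step["speedup"] for step in trace if step[key] <= budget]
        results[budget] = max(within) if within else None
    return results


# Invented toy trace for illustration only.
trace = [
    {"time": 5.0, "iteration": 1, "speedup": 1.02},
    {"time": 18.0, "iteration": 3, "speedup": 1.10},
    {"time": 35.0, "iteration": 9, "speedup": 1.07},
    {"time": 55.0, "iteration": 14, "speedup": 1.21},
]
print(best_under_budgets(trace, [3, 12, 16], key="iteration"))  # {3: 1.1, 12: 1.1, 16: 1.21}
print(best_under_budgets(trace, [20, 40, 60], key="time"))  # {20: 1.1, 40: 1.1, 60: 1.21}
```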



In [None]:
from AE.utils.Calculate_zjq import CalculateTvmDemo
'''
Class CalculateTvmDemo(): TVM evaluation routine
Func execTvmDemo(): evaluation interface

'''
print("Start tuning benchmarks on Neural Network Code Generation (please wait - approximate runtime = 10 minutes per benchmark)...")
print("---------------------------------------------------------------------------------------------------------")
TvmDemo = CalculateTvmDemo("cycle") 
TvmDemo.execTvmDemo(cycles = [3,12,16])
print("Showing the results under different search iteration budgets (higher is better)")
print("---------------------------------------------------------------------------------------------------------")
TvmDemo.printTable()
print("---------------------------------------------------------------------------------------------------------")
print("Performing evaluation under different search time constraints.")
TvmDemo = CalculateTvmDemo("time")
TvmDemo.execTvmDemo(times = [20,40,60])
print("Showing the results under different search time budgets (higher is better)")
print("---------------------------------------------------------------------------------------------------------")
TvmDemo.printTable()
print("---------------------------------------------------------------------------------------------------------")
print("Performance evaluation finished.")

#### Full-scale evaluation data

We now plot the diagrams using the full-scale evaluation data (it would take too long to run the experiment live). The results correspond to Figure 4 (Section 5.2) of the submitted manuscript.

In [None]:
from AE.utils.Calculate_zjq import CalculateTvmDemo
TvmDemo = CalculateTvmDemo()
# Output Figure 4(a): speedup over TVM default for neural network code generation under different search times (on an Intel machine).
TvmDemo.plot("time")
# Output Figure 4(b): speedup over TVM default for neural network code generation under different search iterations (on an Intel machine).
TvmDemo.plot("cycle")

### Case Study 3: Code Size Reduction (Section 5.3)

This task is concerned with determining the LLVM passes and their order to minimize the code size.

This demo corresponds to Figure 5 of the submitted manuscript.

*approximate runtime = 10 minutes for one benchmark*

### Client RL Deployment Demo

This demo shows how to apply the saved client RL to optimize a test program for Code Size Reduction. 

*approximate runtime ~ 15 minutes*

#### *benchmark*
mandel-text, revertBits, hello, spellcheck, wordfreq, reversefile, lists1, misr, whetstone, queens, lists, dt, wc, linpack-pc, perlin, objinst, puzzle, smallpt, blowfish, bzip2, crc32, dijkstra, ghostscript, gsm, ispell, jpeg-c, jpeg-d, lame, patricia, qsort, sha, stringsearch, stringsearch2, susan, tiff2bw, tiffdither, tiffmedian

*approximate runtime ~ 20 minutes*

In [None]:
from demo import demo_run

#Wait until you see the finish message before moving to the next cell.
demo_run(mode="deploy", iterations=10, case_study = 'case_3', benchmark = 'mandel-text')

#### Performance evaluation on benchmarks

The results correspond to Figure 5 of the submitted manuscript.

*approximate runtime ~30 minutes for one benchmark*

#### *times*: 
The number of times to execute each benchmark (we set it to 100 in the paper, which takes more than 10 hours).

Note: There may be other processes running at the same time on the same server, which could introduce noise to the data. 

#### *data_nums*: 
This defines the number of benchmarks on which to calculate the code size reduction (1-43). Besides the benchmarks listed above, the dataset also includes 'adpcm', 'dry', 'bitcount', 'recursive', 'rijndael' and 'tiff2rgba', for a total of 43 benchmarks.

In [None]:
from AE.utils.Calculate import CalculateCSRDemo
'''
Class CalculateCSRDemo(setting_list, set_up): code size reduction evaluation routine
:param setting_list: for 'time' the setting is [50,100,200,3600] and for 'cycle' it is [100,200,400,7200]
:param set_up: the kind of budget, either 'time' (i.e., the search overhead) or 'cycle' (i.e., the number of search iterations).
Func ExecCSRSingleDemo(data): this function performs code size reduction optimization on benchmarks. Our dataset of 43 benchmarks includes: mandel-text, revertBits, hello, spellcheck, wordfreq, reversefile, lists1, misr, whetstone, queens, lists, dt, wc, linpack-pc,
perlin, objinst, puzzle, smallpt, blowfish, bzip2, crc32, dijkstra, ghostscript, gsm, ispell, jpeg-c, jpeg-d, lame, patricia, qsort, sha, stringsearch, stringsearch2, susan, tiff2bw, tiffdither and tiffmedian. 
'''
print("Start tuning benchmarks on Code Size Reduction (please wait - approximate runtime = 10 minutes per benchmark)...")
print("---------------------------------------------------------------------------------------------------------")
CSRTimeDemo = CalculateCSRDemo(["50","100","200","3600"],"time")
CSRCycleDemo = CalculateCSRDemo(["100","200","400","7200"],"cycle")
CSRTimeDemo.ExecCSRMultipleDemo(data_nums=10)
print("Showing the results under different search time budgets (higher is better)")
print("---------------------------------------------------------------------------------------------------------")
CSRTimeDemo.PrintTable()
print("---------------------------------------------------------------------------------------------------------")
CSRCycleDemo.ExecCSRMultipleDemo(data_nums=10)
print("Showing the results under different search iteration budgets (higher is better)")
print("---------------------------------------------------------------------------------------------------------")
CSRCycleDemo.PrintTable()
print("---------------------------------------------------------------------------------------------------------")
print("Performance evaluation finished.")

#### Full-scale evaluation data

We now plot the diagrams using the full-scale evaluation data (it would take too long to run the experiment live). The results correspond to Figure 5 (Section 5.3) of the submitted manuscript.
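Figure 5 reports the reduction over LLVM `-Oz`. Assuming the reduction is measured as the fraction by which the tuned binary is smaller than the `-Oz` binary, it can be computed as sketched below. The size unit (bytes, instructions, ...) is not stated here and is an assumption, and `reduction_over_oz` is a hypothetical helper.

```python
def reduction_over_oz(oz_size: int, tuned_size: int) -> float:
    """Fractional code size reduction relative to the -Oz baseline (positive = smaller than -Oz)."""
    if oz_size <= 0:
        raise ValueError("baseline size must be positive")
    return (oz_size - tuned_size) / oz_size


print(reduction_over_oz(1000, 950))  # 0.05, i.e. 5% smaller than -Oz
```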

In [None]:
# Output Figure 5(a): reduction over LLVM -Oz for code size reduction under different search time.
CSRTimeDemo.plot()
# Output Figure 5(b): reduction over LLVM -Oz for code size reduction under different iterations.
CSRCycleDemo.plot()

### Case Study 4: Superoptimization (Section 5.4)

This classical compiler optimization task finds a valid code sequence that maximizes the performance of a loop-free sequence of instructions. Superoptimization is an expensive optimization technique, as the number of possible configurations grows exponentially with the number of instructions to be optimized.
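The exponential growth can be made concrete with a simplified count. If each of `n` instruction slots may hold any of `k` candidate instructions (ignoring operands), there are k^n candidate sequences of exactly length `n`. The choice of 100 candidate instructions below is purely illustrative, and both helpers are hypothetical.

```python
def num_sequences(num_opcodes: int, length: int) -> int:
    """Straight-line programs of exactly `length` instructions, `num_opcodes` choices per slot."""
    return num_opcodes ** length


def num_sequences_up_to(num_opcodes: int, max_length: int) -> int:
    """Total candidate programs of length 1..max_length."""
    return sum(num_opcodes ** n for n in range(1, max_length + 1))


for n in (1, 2, 4, 8):
    print(n, num_sequences(100, n))  # 100, 10000, 100000000, 10000000000000000
```

Doubling the sequence length squares the number of candidates, which is why exhaustive search quickly becomes infeasible and a learned search policy is attractive.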

This demo corresponds to Figure 6 of the submitted manuscript.

*approximate runtime = 10 minutes for one benchmark*

### Client RL Deployment Demo

This demo shows how to apply the saved client RL to optimize a test program for Superoptimization. 

*approximate runtime ~ 15 minutes*

#### *benchmark*

p01 - p25

In [None]:
from demo import demo_run

#Wait until you see the finish message before moving to the next cell.
demo_run(mode="deploy", iterations=10, case_study = 'case_4', benchmark = 'p20')

#### Performance evaluation on benchmarks

The results correspond to Figure 6 of the submitted manuscript.

*approximate runtime ~30 minutes for one benchmark*

#### *times*: 
The number of times to execute each benchmark (we set it to 100 in the paper, which takes more than 10 hours).

Note: There may be other processes running at the same time on the same server, which could introduce noise to the data. 

#### *data_nums*: 
This defines the number of benchmarks on which to calculate the speedup. In this case study, the benchmarks are p01 - p25.
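Assuming *data_nums* selects the first N benchmarks in order (how `ExecStokeMultipleDemo` actually picks them is not stated here), the benchmark names it covers can be generated as below. The helper `hacker_benchmarks` is hypothetical.

```python
def hacker_benchmarks(data_nums: int) -> list:
    """Return the first `data_nums` benchmark names from p01..p25."""
    if not 1 <= data_nums <= 25:
        raise ValueError("data_nums must be between 1 and 25")
    return [f"p{i:02d}" for i in range(1, data_nums + 1)]


print(hacker_benchmarks(3))  # ['p01', 'p02', 'p03']
```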

In [None]:
# Evaluate the selected benchmarks under different time and iteration budgets
from AE.utils.Calculate import CalculateStokeDemo
print("Start tuning benchmarks on Superoptimization (please wait - approximate runtime = 10 minutes per benchmark)...")
print("---------------------------------------------------------------------------------------------------------")
StokeSingleDemo = CalculateStokeDemo()
# To evaluate a single benchmark instead, uncomment the following line:
# StokeSingleDemo.ExecStokeSingleDemo(data="p20",times=1)
StokeSingleDemo.ExecStokeMultipleDemo(times=10,data_nums=10)
print("---------------------------------------------------------------------------------------------------------")

print("Showing the results under different search time budgets (higher is better)")
print("---------------------------------------------------------------------------------------------------------")
StokeSingleDemo.PrintTable("time")
print("---------------------------------------------------------------------------------------------------------")
print("Showing the results under different search iteration budgets (higher is better)")
print("---------------------------------------------------------------------------------------------------------")
StokeSingleDemo.PrintTable("cycle")
print("---------------------------------------------------------------------------------------------------------")

#Wait until you see this message before moving to the next cell.
print("Performance evaluation finished.")

#### Full-scale evaluation data

We now plot the diagrams using the full-scale evaluation data (it would take too long to run the experiment live). The results correspond to Figure 6 (Section 5.4) of the submitted manuscript.

In [None]:
from AE.utils.Calculate import CalculateStokeDemo
StokeDemo = CalculateStokeDemo()
# Output Figure 6(a): speedup over LLVM/GCC -O3 for superoptimization under different search times (on an Intel machine).
StokeDemo.plot("time")
# Output Figure 6(b): speedup over LLVM/GCC -O3 for superoptimization under different search iterations (on an Intel machine).
StokeDemo.plot("cycle")