In [1]:
import am4pa

In [2]:
from am4pa.runners import RunnerVariants
from am4pa.runners import ExecuteSubProcess
from am4pa.runners import bcolors

from am4pa.data_integration import DataCollector

from am4pa.data_proccessing import CaseDurationsManager
from am4pa.data_proccessing import FilterOnKPIs
from am4pa.data_proccessing import get_trace_durations

### Runner:

Runners are objects that execute an external command (such as calling a Python, Julia or Bash script).
Running the external command outputs a csv file in a format that can be handled by the data_integration submodule. The inputs are the names of the script files that execute the required operation.

The scripts can be executed locally, interactively in the backend, or submitted to a batch system.
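
As a rough illustration of the runner idea (not the `am4pa` implementation), the sketch below launches an external command with Python's standard `subprocess` module and returns its exit code. The helper name `run_external` and the one-line stand-in command are assumptions made for this example.

```python
import subprocess
import sys


def run_external(cmd):
    """Run an external command given as a list of arguments; return its exit code."""
    print(cmd)  # echo the command before running it
    completed = subprocess.run(cmd, capture_output=True, text=True)
    # Forward whatever the external script printed.
    if completed.stdout:
        print(completed.stdout, end="")
    return completed.returncode


# Stand-in for a generation or measurement script: a one-line Python program.
exit_code = run_external([sys.executable, "-c", "print('Generated Variants.')"])
print(exit_code)
```

An exit code of `0` conventionally signals success. The `0` shown after the runner calls in this notebook is consistent with that convention, although this is an inference from the output rather than documented behaviour.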

#### Input:
An implementation that generates and measures variant codes for an input linear algebra expression. The generated code has timestamps inserted before and after the kernel calls.



## Local Measurements

#### 1. Generate variants for an instance of a linear algebra expression

An instance refers to a linear algebra expression with specific operand sizes.

In [3]:
operand_sizes = ["75","75","6","75","75"]
script_dir = "sample_generation/"
runner_local = RunnerVariants(operand_sizes, script_dir)

Here, ```generate-variants-linnea.py``` is a script file that generates variant codes using the Linnea interface.

In [4]:
runner_local.generate_variants_for_measurements(generation_script="generate-variants-linnea.py")

['python', 'sample_generation/generate-variants-linnea.py', '75', '75', '6', '75', '75', '--threads=4']
New solution:.............2.02e+05
No further generation steps possible.
----------------------------------
Number of nodes:                 8
Solution nodes:                  1
Data:                     1.78e+04
Best solution:            2.02e+05
Intensity:                    11.4
Number of algorithms:            6
Generated Variants.
Success: Local run: Generate variants


0

#### 2. Measure variants

Executing ```generate-variants-linnea.py``` creates a subdirectory ```experiments/75_75_6_75_75/```, which consists of the generated code and a number of scripts.
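
The directory name appears to be derived from the operand sizes. The sketch below reproduces that naming under the assumption that the sizes are simply joined with underscores; it is inferred from the path shown here and is not taken from the generator script itself.

```python
import posixpath

operand_sizes = ["75", "75", "6", "75", "75"]
script_dir = "sample_generation/"

# Assumption: the instance directory name is the operand sizes joined by "_".
instance_name = "_".join(operand_sizes)
instance_dir = posixpath.join(script_dir, "experiments", instance_name)
print(instance_dir)
```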

In [15]:
!ls sample_generation/experiments/75_75_6_75_75/

case_table.csv                  logs
compute-ranks.py                operand_generator.jl
event_meta_table.csv            runner.jl
experiments                     submit.sh
generate-measurements-script.py variants


```runner.jl``` is a script that executes all the variants once and outputs a file ```run_times.csv``` containing the run times for each variant.

In [16]:
runner_local.measure_variants(app="julia", runner_script="runner.jl")

Running Measurements locally
['julia', 'sample_generation/experiments/75_75_6_75_75/runner.jl']
Success: Local run: Measurements from runner.jl


0

In [17]:
!ls sample_generation/experiments/75_75_6_75_75/

case_table.csv                  operand_generator.jl
compute-ranks.py                run_times.csv
event_meta_table.csv            runner.jl
experiments                     submit.sh
generate-measurements-script.py variants
logs


```generate-measurements-script.py``` is a script that generates a measurement script with a specific identifier (run_id), which repeats a given set of variants a given number of times (reps). For instance, the resulting script for identifier 0 is ```runner_competing_0.jl```.

In [18]:
measurements_script = "generate-measurements-script.py"
variants = ['algorithm0', 'algorithm1']
reps = 3
run_id = 0

In [19]:
runner_local.generate_measurements_script(measurements_script, variants, run_id, reps)

['python', 'sample_generation/experiments/75_75_6_75_75/generate-measurements-script.py', '--algs', 'algorithm0', 'algorithm1', '--rep', '3', '--threads', '4', '--id', '0']
Success: Local run: Generate Measurement script 0


0
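
To make the mapping from the Python arguments to the printed command explicit, the following sketch reassembles the same argument list. The function `build_measurement_cmd` is hypothetical and only mirrors the command echoed above; the default of four threads is taken from that output, and how `am4pa` actually sets it is not shown in this notebook.

```python
def build_measurement_cmd(script_path, variants, reps, run_id, threads=4):
    """Assemble the argument list for the measurement-script generator."""
    cmd = ["python", script_path, "--algs"]
    cmd += list(variants)  # one argument per variant name
    cmd += ["--rep", str(reps), "--threads", str(threads), "--id", str(run_id)]
    return cmd


cmd = build_measurement_cmd(
    "sample_generation/experiments/75_75_6_75_75/generate-measurements-script.py",
    ["algorithm0", "algorithm1"],
    reps=3,
    run_id=0,
)
print(cmd)
```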

In [20]:
!ls sample_generation/experiments/75_75_6_75_75/

case_table.csv                  operand_generator.jl
compute-ranks.py                run_times.csv
event_meta_table.csv            runner.jl
experiments                     runner_competing_0.jl
generate-measurements-script.py submit.sh
logs                            variants


Executing ```runner_competing_0.jl``` measures the variants and outputs a csv file ```run_times_competing_0.csv``` containing the execution time measurements.

In [22]:
runner_competing_script = "runner_competing_0.jl"
runner_local.measure_variants(app="julia", runner_script=runner_competing_script)

Running Measurements locally
['julia', 'sample_generation/experiments/75_75_6_75_75/runner_competing_0.jl']
Success: Local run: Measurements from runner_competing_0.jl


0

In [23]:
!ls sample_generation/experiments/75_75_6_75_75/

case_table.csv                  run_times.csv
compute-ranks.py                run_times_competing_0.csv
event_meta_table.csv            runner.jl
experiments                     runner_competing_0.jl
generate-measurements-script.py submit.sh
logs                            variants
operand_generator.jl


### Data integration

This module converts the csv files generated by the runner to pandas data frames.

The input is the directory containing the csv files.

In [24]:
dc_local = DataCollector("sample_generation/experiments/75_75_6_75_75/")

For instance, let's read ```case_table.csv```:

In [26]:
dc_local.get_table("case_table.csv")

Unnamed: 0,case:concept:name,case:flops,case:num_kernels
0,algorithm1,202000.0,3
1,algorithm5,1760000.0,3
2,algorithm4,1760000.0,3
3,algorithm0,202000.0,3
4,algorithm3,979000.0,3
5,algorithm2,979000.0,3


```DataCollector``` offers methods to read the files without having to input the file names, provided the file names adhere to the ***PA4Algs*** standards.

In [29]:
case_table = dc_local.get_case_table()
case_table

Unnamed: 0,case:concept:name,case:flops,case:num_kernels
0,algorithm1,202000.0,3
1,algorithm5,1760000.0,3
2,algorithm4,1760000.0,3
3,algorithm0,202000.0,3
4,algorithm3,979000.0,3
5,algorithm2,979000.0,3


In [30]:
measurements_table = dc_local.get_runtimes_table()
measurements_table

Unnamed: 0,case:concept:name,concept:name,concept:flops,concept:operation,concept:kernel,timestamp:start,timestamp:end
0,algorithm1,gemm_6.75e+04,67500.0,tmp1 = (A B),"gemm!('N', 'N', 1.0, ml0, ml1, 0.0, ml4)",1658504000.0,1658504000.0
1,algorithm1,gemm_6.75e+04,67500.0,tmp3 = (C D),"gemm!('N', 'N', 1.0, ml2, ml3, 0.0, ml5)",1658504000.0,1658504000.0
2,algorithm1,gemm_6.75e+04,67500.0,tmp6 = (tmp1 tmp3),"gemm!('N', 'N', 1.0, ml4, ml5, 0.0, ml6)",1658504000.0,1658504000.0
3,algorithm5,gemm_6.75e+04,67500.0,tmp2 = (B C),"gemm!('N', 'N', 1.0, ml1, ml2, 0.0, ml4)",1658504000.0,1658504000.0
4,algorithm5,gemm_8.44e+05,844000.0,tmp4 = (A tmp2),"gemm!('N', 'N', 1.0, ml0, ml4, 0.0, ml5)",1658504000.0,1658504000.0
5,algorithm5,gemm_8.44e+05,844000.0,tmp6 = (tmp4 D),"gemm!('N', 'N', 1.0, ml5, ml3, 0.0, ml6)",1658504000.0,1658504000.0
6,algorithm4,gemm_6.75e+04,67500.0,tmp2 = (B C),"gemm!('N', 'N', 1.0, ml1, ml2, 0.0, ml4)",1658504000.0,1658504000.0
7,algorithm4,gemm_8.44e+05,844000.0,tmp5 = (tmp2 D),"gemm!('N', 'N', 1.0, ml4, ml3, 0.0, ml5)",1658504000.0,1658504000.0
8,algorithm4,gemm_8.44e+05,844000.0,tmp6 = (A tmp5),"gemm!('N', 'N', 1.0, ml0, ml5, 0.0, ml6)",1658504000.0,1658504000.0
9,algorithm0,gemm_6.75e+04,67500.0,tmp3 = (C D),"gemm!('N', 'N', 1.0, ml2, ml3, 0.0, ml4)",1658504000.0,1658504000.0


### Data processing

The ```data_proccessing``` module takes as input the data frames of the case table and the measurement tables (according to the PA4Algs standards) and performs a number of data processing operations. 

#### FilterOnKPIs

The ```FilterOnKPIs``` class selects the variants that have the lowest FLOP count or whose execution time lies within a certain threshold of the minimum observed execution time.

In [31]:
filterAlgs = FilterOnKPIs(case_table, measurements_table)

In [33]:
filterAlgs.filter_on_flops_and_rel_duration(1.2)

Unnamed: 0,case:concept:name,case:timestamp:start,case:timestamp:end,case:duration,case:flops,case:num_kernels,case:rel-flops,case:rel-duration
0,algorithm1,1658504000.0,1658504000.0,3.9e-05,202000.0,3,0.0,0.006135
3,algorithm0,1658504000.0,1658504000.0,3.9e-05,202000.0,3,0.0,0.0
4,algorithm3,1658504000.0,1658504000.0,8.2e-05,979000.0,3,3.846535,1.110429
5,algorithm2,1658504000.0,1658504000.0,8.2e-05,979000.0,3,3.846535,1.110429
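
The relative columns in this table can be reproduced with a small sketch. It assumes that a relative value is computed as `(value - min) / min`, which matches the `case:rel-flops` entries shown, and that a variant is kept if it has the minimum FLOP count or if its relative duration is below the threshold. Both the helper names and the keep rule are assumptions inferred from the output, not the `FilterOnKPIs` source.

```python
def relative_to_min(values):
    """Return (v - min) / min for every value in the list."""
    smallest = min(values)
    return [(v - smallest) / smallest for v in values]


# FLOP counts taken from the case table above.
flops = {
    "algorithm1": 202000.0,
    "algorithm5": 1760000.0,
    "algorithm4": 1760000.0,
    "algorithm0": 202000.0,
    "algorithm3": 979000.0,
    "algorithm2": 979000.0,
}
rel_flops = dict(zip(flops, relative_to_min(list(flops.values()))))
print(round(rel_flops["algorithm3"], 6))


def keep_variant(rel_flops_value, rel_duration_value, threshold):
    # Assumed rule: keep a variant if it has the minimum FLOP count
    # or if its relative duration is below the threshold.
    return rel_flops_value == 0.0 or rel_duration_value < threshold


# Relative durations copied from the table above.
rel_duration = {"algorithm1": 0.006135, "algorithm0": 0.0,
                "algorithm3": 1.110429, "algorithm2": 1.110429}
kept = [name for name, rd in rel_duration.items()
        if keep_variant(rel_flops[name], rd, threshold=1.2)]
print(kept)
```

Under the same assumed rule, lowering the threshold to `1.0` would keep only `algorithm1` and `algorithm0`.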


In [34]:
filterAlgs.filter_on_best_flops()

Unnamed: 0,case:concept:name,case:timestamp:start,case:timestamp:end,case:duration,case:flops,case:num_kernels,case:rel-flops,case:rel-duration
0,algorithm1,1658504000.0,1658504000.0,3.9e-05,202000.0,3,0.0,0.006135
3,algorithm0,1658504000.0,1658504000.0,3.9e-05,202000.0,3,0.0,0.0


In [35]:
filterAlgs.get_alg_seq_sorted_on_duration()

['algorithm0',
 'algorithm1',
 'algorithm3',
 'algorithm2',
 'algorithm4',
 'algorithm5']

#### CaseDurationsManager

In practice, measurements from multiple tables have to be aggregated. For instance, in order to measure the relative performance of algorithms in PA4Algs (Algorithm Ranking), the variants are measured iteratively, and the execution times from different csv files have to be combined into a single data frame. The ```CaseDurationsManager``` class is used for this purpose.

In [42]:
cm = CaseDurationsManager()

run_times_table0 = dc_local.get_runtimes_competing_table(0)
cm.add_case_durations(run_times_table0)

cm.case_durations

Unnamed: 0,case:concept:name,case:timestamp:start,case:timestamp:end,case:duration
0,algorithm0_00,1658505000.0,1658505000.0,0.000802
1,algorithm1_01,1658505000.0,1658505000.0,1.9e-05
2,algorithm0_01,1658505000.0,1658505000.0,2.1e-05
3,algorithm1_02,1658505000.0,1658505000.0,2.2e-05
4,algorithm0_02,1658505000.0,1658505000.0,3.6e-05
5,algorithm1_00,1658505000.0,1658505000.0,3.5e-05


For instance, let us combine the measurements from ```run_times_competing_0.csv``` and ```run_times_competing_1.csv```. ```run_times_competing_0.csv``` is already available. Now let's measure again to obtain ```run_times_competing_1.csv```.

In [37]:
runner_local.generate_measurements_script(measurements_script, variants, 1, reps)
runner_competing_script = "runner_competing_1.jl"
runner_local.measure_variants(app="julia", runner_script=runner_competing_script)

['python', 'sample_generation/experiments/75_75_6_75_75/generate-measurements-script.py', '--algs', 'algorithm0', 'algorithm1', '--rep', '3', '--threads', '4', '--id', '1']
Success: Local run: Generate Measurement script 1
Running Measurements locally
['julia', 'sample_generation/experiments/75_75_6_75_75/runner_competing_1.jl']
Success: Local run: Measurements from runner_competing_1.jl


0

In [43]:
run_times_table1 = dc_local.get_runtimes_competing_table(1)

In [44]:
cm.add_case_durations(run_times_table1)

In [45]:
cm.case_durations

Unnamed: 0,case:concept:name,case:timestamp:start,case:timestamp:end,case:duration
0,algorithm0_00,1658505000.0,1658505000.0,0.000802
1,algorithm1_01,1658505000.0,1658505000.0,1.9e-05
2,algorithm0_01,1658505000.0,1658505000.0,2.1e-05
3,algorithm1_02,1658505000.0,1658505000.0,2.2e-05
4,algorithm0_02,1658505000.0,1658505000.0,3.6e-05
5,algorithm1_00,1658505000.0,1658505000.0,3.5e-05
6,algorithm1_12,1658506000.0,1658506000.0,0.000811
7,algorithm0_12,1658506000.0,1658506000.0,2.7e-05
8,algorithm1_11,1658506000.0,1658506000.0,3.2e-05
9,algorithm0_11,1658506000.0,1658506000.0,3.2e-05


```CaseDurationsManager``` also outputs the measurements as a dictionary in the following format, which is required in order to rank the algorithms using ***PA4Algs*** (Algorithm Ranking).

In [46]:
cm.get_alg_measurements()

{'algorithm0': [0.0008020401000976562,
  2.09808349609375e-05,
  3.600120544433594e-05,
  2.7179718017578125e-05,
  3.1948089599609375e-05,
  4.506111145019531e-05],
 'algorithm1': [1.9073486328125e-05,
  2.2172927856445312e-05,
  3.504753112792969e-05,
  0.0008111000061035156,
  3.1948089599609375e-05,
  4.8160552978515625e-05]}
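
The dictionary format can be illustrated with a short sketch that groups the case durations by algorithm name. It assumes that the algorithm name is the part of `case:concept:name` before the last underscore; the helper `group_by_algorithm` is hypothetical, and the input uses only the rounded durations of the rows displayed in the table above.

```python
def group_by_algorithm(case_rows):
    """Group durations by algorithm name, dropping the trailing '_<suffix>'."""
    grouped = {}
    for case_name, duration in case_rows:
        alg = case_name.rsplit("_", 1)[0]
        grouped.setdefault(alg, []).append(duration)
    return grouped


# Case names and rounded durations copied from the rows displayed above.
case_rows = [
    ("algorithm0_00", 0.000802), ("algorithm1_01", 1.9e-05),
    ("algorithm0_01", 2.1e-05), ("algorithm1_02", 2.2e-05),
    ("algorithm0_02", 3.6e-05), ("algorithm1_00", 3.5e-05),
    ("algorithm1_12", 0.000811), ("algorithm0_12", 2.7e-05),
    ("algorithm1_11", 3.2e-05), ("algorithm0_11", 3.2e-05),
]
measurements = group_by_algorithm(case_rows)
print(measurements)
```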

#### Deleting the directory and the csv files generated by the runner

In [47]:
runner_local.clean()

In [49]:
!ls sample_generation/experiments/

## Executing Measurements on the Backend

#### 1. Instantiate a backend manager

In [77]:
from backend_manager import BackendManager,Commands
import os

In [51]:
bm = BackendManager(server="login18-1.hpc.itc.rwth-aachen.de", uname="as641651")
bm.connect()
cmds = Commands(source="~/.analyzer")

#### 2. Generate variants

In [52]:
operand_sizes = ["75","75","6","75","75"]
script_dir = "sample_generation/" # the path to the directory in the backend
generation_script = "generate-variants-linnea.py"
runner = RunnerVariants(operand_sizes, script_dir,backend_manager=bm, backend_commands=cmds)

In [53]:
ret = runner.generate_variants_for_measurements(generation_script=generation_script)

source ~/.analyzer; cd sample_generation; python generate-variants-linnea.py 75 75 6 75 75 --threads=4
['New solution:.............2.02e+05\n', 'No further generation steps possible.\n', '----------------------------------\n', 'Number of nodes:                 8\n', 'Solution nodes:                  1\n', 'Data:                     1.78e+04\n', 'Best solution:            2.02e+05\n', 'Intensity:                    11.4\n', 'Number of algorithms:            6\n', 'Generated Variants.\n']
Success: Backend interactive run: Generate variants


#### 3. Measure variants

In [54]:
runner.measure_variants(app="julia", runner_script="runner.jl")

Running Measurements Backend interactive
source ~/.analyzer; cd sample_generation/experiments/75_75_6_75_75; julia runner.jl 
[]
Success: Backend interactive run: Measurements from runner.jl


0
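
The backend commands echoed above chain three steps into a single shell line: sourcing the environment file, changing into the working directory, and calling the script. The sketch below rebuilds such a line. The function `compose_backend_cmd` is hypothetical and only mirrors the printed command; it is not the `Commands` or `BackendManager` implementation.

```python
def compose_backend_cmd(source_file, work_dir, app, script, args=()):
    """Chain environment setup, directory change and the script call into one shell line."""
    call = " ".join([app, script] + list(args))
    return "source {}; cd {}; {}".format(source_file, work_dir, call)


remote_cmd = compose_backend_cmd(
    source_file="~/.analyzer",
    work_dir="sample_generation/experiments/75_75_6_75_75",
    app="julia",
    script="runner.jl",
)
print(remote_cmd)
```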

#### 4. Read Tables

To this end, instantiate the ```DataCollector``` class by passing the backend manager. The class synchronizes data from the runner's operands directory in the backend to a local directory.

In [55]:
local_dir = "sample_generation/cluster/"
backend_dir = runner.operands_dir

In [56]:
dc_backend = DataCollector(local_dir,backend_dir,bm)

In [60]:
ct_backend = dc_backend.get_case_table()
ct_backend

Unnamed: 0,case:concept:name,case:flops,case:num_kernels
0,algorithm0,202000.0,3
1,algorithm2,979000.0,3
2,algorithm4,1760000.0,3
3,algorithm1,202000.0,3
4,algorithm5,1760000.0,3
5,algorithm3,979000.0,3


In [62]:
mt_backend = dc_backend.get_runtimes_table()
mt_backend

Unnamed: 0,case:concept:name,concept:name,concept:flops,concept:operation,concept:kernel,timestamp:start,timestamp:end
0,algorithm0,gemm_6.75e+04,67500.0,tmp3 = (C D),"gemm!('N', 'N', 1.0, ml2, ml3, 0.0, ml4)",1658507000.0,1658507000.0
1,algorithm0,gemm_6.75e+04,67500.0,tmp1 = (A B),"gemm!('N', 'N', 1.0, ml0, ml1, 0.0, ml5)",1658507000.0,1658507000.0
2,algorithm0,gemm_6.75e+04,67500.0,tmp6 = (tmp1 tmp3),"gemm!('N', 'N', 1.0, ml5, ml4, 0.0, ml6)",1658507000.0,1658507000.0
3,algorithm2,gemm_6.75e+04,67500.0,tmp1 = (A B),"gemm!('N', 'N', 1.0, ml0, ml1, 0.0, ml4)",1658507000.0,1658507000.0
4,algorithm2,gemm_6.75e+04,67500.0,tmp4 = (tmp1 C),"gemm!('N', 'N', 1.0, ml4, ml2, 0.0, ml5)",1658507000.0,1658507000.0
5,algorithm2,gemm_8.44e+05,844000.0,tmp6 = (tmp4 D),"gemm!('N', 'N', 1.0, ml5, ml3, 0.0, ml6)",1658507000.0,1658507000.0
6,algorithm4,gemm_6.75e+04,67500.0,tmp2 = (B C),"gemm!('N', 'N', 1.0, ml1, ml2, 0.0, ml4)",1658507000.0,1658507000.0
7,algorithm4,gemm_8.44e+05,844000.0,tmp5 = (tmp2 D),"gemm!('N', 'N', 1.0, ml4, ml3, 0.0, ml5)",1658507000.0,1658507000.0
8,algorithm4,gemm_8.44e+05,844000.0,tmp6 = (A tmp5),"gemm!('N', 'N', 1.0, ml0, ml5, 0.0, ml6)",1658507000.0,1658507000.0
9,algorithm1,gemm_6.75e+04,67500.0,tmp1 = (A B),"gemm!('N', 'N', 1.0, ml0, ml1, 0.0, ml4)",1658507000.0,1658507000.0


#### 5. Filter for competing variants

In [63]:
filterAlgs_backend = FilterOnKPIs(ct_backend, mt_backend)

In [64]:
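# Caution: this call uses `filterAlgs` (built from the local tables above), so the output below
# reflects the local measurements. Call `filterAlgs_backend.filter_on_flops_and_rel_duration(1.2)`
# to filter the backend measurements instead.
competing_algs_table = filterAlgs.filter_on_flops_and_rel_duration(1.2)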
competing_algs_table

Unnamed: 0,case:concept:name,case:timestamp:start,case:timestamp:end,case:duration,case:flops,case:num_kernels,case:rel-flops,case:rel-duration
0,algorithm1,1658504000.0,1658504000.0,3.9e-05,202000.0,3,0.0,0.006135
3,algorithm0,1658504000.0,1658504000.0,3.9e-05,202000.0,3,0.0,0.0
4,algorithm3,1658504000.0,1658504000.0,8.2e-05,979000.0,3,3.846535,1.110429
5,algorithm2,1658504000.0,1658504000.0,8.2e-05,979000.0,3,3.846535,1.110429


In [66]:
competing_algs = list(competing_algs_table['case:concept:name'])
competing_algs

['algorithm1', 'algorithm0', 'algorithm3', 'algorithm2']

#### 6. Generate measurement scripts for the competing variants in the backend

In [67]:
measurements_script = "generate-measurements-script.py"
variants = competing_algs
reps = 3
run_id = 0

In [68]:
runner.generate_measurements_script(measurements_script, variants, run_id, reps)

source ~/.analyzer; cd sample_generation/experiments/75_75_6_75_75; python generate-measurements-script.py --algs algorithm1 algorithm0 algorithm3 algorithm2 --rep 3 --threads 4 --id 0
[]
Success: Backend interactive run: Generate Measurement script 0


0

#### 7. Run the measurement script in the backend

In [69]:
runner_competing_script = "runner_competing_0.jl"
runner.measure_variants(app="julia", runner_script=runner_competing_script)

Running Measurements Backend interactive
source ~/.analyzer; cd sample_generation/experiments/75_75_6_75_75; julia runner_competing_0.jl 
[]
Success: Backend interactive run: Measurements from runner_competing_0.jl


0

### Execute the measurement script in a batch system

Generate measurement script

In [72]:
run_id = 1
runner.generate_measurements_script(measurements_script, variants, run_id, reps)

source ~/.analyzer; cd sample_generation/experiments/75_75_6_75_75; python generate-measurements-script.py --algs algorithm1 algorithm0 algorithm3 algorithm2 --rep 3 --threads 4 --id 1
[]
Success: Backend interactive run: Generate Measurement script 1


0

Submit job

In [74]:
submit_cmd = "sbatch submit.sh"
runner_competing_script = "runner_competing_1.jl"
runner.measure_variants(app="julia", runner_script=runner_competing_script, submit_cmd=submit_cmd)

Running Measurements Backend batch
source ~/.analyzer; cd sample_generation/experiments/75_75_6_75_75; sbatch submit.sh julia 'runner_competing_1.jl '
['Submitted batch job 28808825\n']
Success: Backend batch run: Measurements from runner_competing_1.jl


0

Check job status

In [75]:
bm.check_slrum_status(runner.job_name)

          28808825        ih               75_75_6_75_75_T4 as641651  RUNNING       0:01   3:00:00      1 linuxihdc074



2
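
If the job state is needed programmatically, a status line like the one printed above can be split into fields. The sketch below assumes the column order job id, partition, name, user, state, time, time limit, nodes, node list; the actual layout depends on how the queue is queried, and `parse_job_line` is a hypothetical helper, not part of `BackendManager`.

```python
def parse_job_line(line):
    """Split one whitespace-separated job status line into named fields."""
    # Assumed column order; the real layout depends on the queue query format.
    columns = ["job_id", "partition", "name", "user", "state",
               "time", "time_limit", "nodes", "nodelist"]
    return dict(zip(columns, line.split()))


status_line = ("          28808825        ih               75_75_6_75_75_T4 "
               "as641651  RUNNING       0:01   3:00:00      1 linuxihdc074")
job = parse_job_line(status_line)
print(job["job_id"], job["state"])
```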

Check if file exists

In [78]:
bm.check_if_file_exists(os.path.join(runner.operands_dir, "run_times_competing_1.csv"))

True

#### 8. Collect data

In [79]:
mt_competing_0 = dc_backend.get_runtimes_competing_table(0)
mt_competing_1 = dc_backend.get_runtimes_competing_table(1)
mt_competing_0.head()

scp as641651@login18-1.hpc.itc.rwth-aachen.de:sample_generation/experiments/75_75_6_75_75/run_times_competing_1.csv sample_generation/cluster/
b''


Unnamed: 0,case:concept:name,concept:name,concept:flops,concept:operation,concept:kernel,timestamp:start,timestamp:end
0,algorithm1_02,gemm_6.75e+04,67500.0,tmp1 = (A B),"gemm!('N', 'N', 1.0, ml0, ml1, 0.0, ml4)",1658508000.0,1658508000.0
1,algorithm1_02,gemm_6.75e+04,67500.0,tmp3 = (C D),"gemm!('N', 'N', 1.0, ml2, ml3, 0.0, ml5)",1658508000.0,1658508000.0
2,algorithm1_02,gemm_6.75e+04,67500.0,tmp6 = (tmp1 tmp3),"gemm!('N', 'N', 1.0, ml4, ml5, 0.0, ml6)",1658508000.0,1658508000.0
3,algorithm2_02,gemm_6.75e+04,67500.0,tmp1 = (A B),"gemm!('N', 'N', 1.0, ml0, ml1, 0.0, ml4)",1658508000.0,1658508000.0
4,algorithm2_02,gemm_6.75e+04,67500.0,tmp4 = (tmp1 C),"gemm!('N', 'N', 1.0, ml4, ml2, 0.0, ml5)",1658508000.0,1658508000.0
