# Overview
This tutorial demonstrates how to use the Timeloop Front End (TimeloopFE) to run
Timeloop & Accelergy. TimeloopFE is a Python front-end interface for Timeloop
that lets users gather YAML inputs, edit them, and run Timeloop & Accelergy.

## Prerequisites
We recommend that you first complete the Timeloop & Accelergy tutorials and
familiarize yourself with the Timeloop & Accelergy APIs. There are cheatsheets
available in the cheatsheets directory to provide a quick reference for each
of the files.

We'll start by importing TimeloopFE and setting up the paths to the YAML files.

In [1]:
# Set up imports
import os
import timeloopfe.v4 as tl

# Define relative paths
ARCH_PATH = f"{os.curdir}/inputs/arch.yaml"
COMPONENTS_PATH = f"{os.curdir}/inputs/components.yaml"
PROBLEM_PATH = f"{os.curdir}/inputs/problem.yaml"
MAPPER_PATH = f"{os.curdir}/inputs/mapper.yaml"
VARIABLES_PATH = f"{os.curdir}/inputs/variables.yaml"
TOP_PATH = f"{os.curdir}/top.yaml.jinja"
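
Before gathering the inputs, it can help to confirm that every path points to a real file. The helper below is a minimal sketch using only the standard library. `find_missing` is a hypothetical name that is not part of TimeloopFE, and the temporary directory only stands in for the `inputs/` directory used in this tutorial.

```python
import os
import tempfile


def find_missing(paths):
    """Return the subset of `paths` that do not point to an existing file."""
    return [p for p in paths if not os.path.isfile(p)]


# Demonstration with a throwaway directory standing in for ./inputs
with tempfile.TemporaryDirectory() as tmp:
    present = os.path.join(tmp, "arch.yaml")
    absent = os.path.join(tmp, "problem.yaml")
    with open(present, "w") as f:
        f.write("architecture: {}\n")
    missing = find_missing([present, absent])
    missing_names = [os.path.basename(p) for p in missing]
    print(missing_names)
```

In the tutorial itself, the same check could be applied to the list `[ARCH_PATH, COMPONENTS_PATH, PROBLEM_PATH, MAPPER_PATH, VARIABLES_PATH, TOP_PATH]`.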

# Setting up a TimeloopFE Specification 

TimeloopFE top-level objects are called `Specification`s. A `Specification`
collects all the inputs needed to run Timeloop+Accelergy and exposes them to
Python for editing and processing.

## Gathering YAML Inputs & Running Timeloop
We will create a `Specification` object by gathering the following YAML files:

- `ARCH_PATH` describes the hardware architecture, including storage,
  computation, and networks. It also describes any mapping restrictions that the
  architecture imposes.
- `COMPONENTS_PATH` describes compound components that may be used in the
  architecture file. A compound component is a collection of components that are
  grouped together into a single component (*e.g.,* a buffer and adder grouped
  together as a smartbuffer).
- `PROBLEM_PATH` describes the problem being mapped to the architecture as an
  extended Einsum expression.
- `MAPPER_PATH` describes how the Timeloop mapper should search, and includes
  parameters such as the search algorithm, timeouts, and number of threads.
- `VARIABLES_PATH` defines global variables that can be referenced in the
  architecture, problem, and component(s) files.
  
Let's initialize a `Specification` object with these files and run the Timeloop
mapper. We'll run the mapper in the outputs/ directory, then print out the
summary output from the Timeloop output stats file.

In [2]:
spec = tl.Specification.from_yaml_files(
    ARCH_PATH,
    COMPONENTS_PATH,
    MAPPER_PATH,
    PROBLEM_PATH,
    VARIABLES_PATH,
)  # Gather YAML files into a Python object
tl.call_mapper(spec, output_dir=f"{os.curdir}/outputs")  # Run the Timeloop mapper
stats = open("outputs/timeloop-mapper.stats.txt").read()
print(stats[stats.index("Summary Stats") :])



Summary Stats
-------------
GFLOPs (@1GHz): 15.94
Utilization: 100.00%
Cycles: 1073741824
Energy: 56245.75 uJ
EDP(J*cycle): 6.04e+07
Area: 0.00 mm^2

Computes = 8589934592
fJ/Compute
    mac                          = 3275.00
    buffer                       = 710.37
    DRAM                         = 2562.50
    Total                        = 6547.87
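
When comparing several runs, it is often more convenient to work with numbers than with printed text. The following is a minimal sketch that pulls a few headline values out of the summary with regular expressions. `parse_summary_stats` is a hypothetical helper, not a TimeloopFE function, and the sample text is copied from the output above. It also cross-checks the reported EDP, assuming EDP is energy in joules multiplied by cycles, as the `J*cycle` label suggests.

```python
import re

SAMPLE_STATS = """\
Summary Stats
-------------
GFLOPs (@1GHz): 15.94
Utilization: 100.00%
Cycles: 1073741824
Energy: 56245.75 uJ
EDP(J*cycle): 6.04e+07
Area: 0.00 mm^2

Computes = 8589934592
"""


def parse_summary_stats(stats_text):
    """Extract a few headline numbers from the 'Summary Stats' block."""
    summary = stats_text[stats_text.index("Summary Stats"):]
    patterns = {
        "cycles": r"^Cycles:\s*(\d+)",
        "energy_uJ": r"^Energy:\s*([\d.]+)\s*uJ",
        "utilization_pct": r"^Utilization:\s*([\d.]+)%",
        "computes": r"^Computes\s*=\s*(\d+)",
    }
    result = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, summary, flags=re.MULTILINE)
        if match is not None:
            text = match.group(1)
            # Keep integers exact; parse anything with a decimal point as float
            result[key] = float(text) if "." in text else int(text)
    return result


parsed = parse_summary_stats(SAMPLE_STATS)
edp = parsed["energy_uJ"] * 1e-6 * parsed["cycles"]  # uJ -> J, then J * cycles
print(parsed)
print(f"EDP = {edp:.2e} J*cycle")
```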




## Timeloop Output Files

The outputs directory has many useful files with information on the simulation. Some important files include the mapping file, logs & per-component energy/area estimation tables from Accelergy, and detailed XML descriptions of all access counts.

Let's explore some of the outputs that are in the outputs/ directory.


In [3]:
def read_and_indent(path, n_lines=30, start_at: str = None, end_at: str = None):
    content = open(path).read()
    if start_at is not None:
        content = content[content.index(start_at) :]
    if end_at is not None:
        content = content[: content.index(end_at) + len(end_at)]
    content = content.split("\n")
    content = content[:n_lines] if n_lines > 0 else content[n_lines:]
    return "\t" + "\n\t".join(content)


print(f"Mapping:")
print(read_and_indent("outputs/timeloop-mapper.map.txt"))

Mapping:
	DRAM [ Weights:16384 (16384) Inputs:67108864 (67108864) Outputs:67108864 (67108864) ] 
	-------------------------------------------------------------------------------------
	| for Q in [0:64)
	|   for N in [0:8)
	|     for M in [0:4)
	|       for P in [0:128)
	
	buffer [ Weights:4096 (4096) Inputs:1024 (1024) Outputs:256 (256) ] 
	-------------------------------------------------------------------
	|         for M in [0:4)
	|           for N in [0:4)
	|             for Q in [0:2)
	|               for C in [0:128)
	
	inter_PE_spatial [ ] 
	--------------------
	|                 for M in [0:8) (Spatial-X)
	
	reg [ Weights:1 (1) Inputs:1 (1) Outputs:1 (1) ] 
	------------------------------------------------
	|                   << Compute >>
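
The loop nest in the mapping can be checked against the summary statistics. The sketch below multiplies the loop bounds copied from the mapping above. For this particular output, the product of all loop bounds equals the reported `Computes`, and the product of the temporal loops equals the reported `Cycles`. This is an observation about this example at 100% utilization, not a general guarantee. `loop_bound_products` is a hypothetical helper.

```python
import re
from collections import defaultdict

MAPPING_TEXT = """\
| for Q in [0:64)
|   for N in [0:8)
|     for M in [0:4)
|       for P in [0:128)
|         for M in [0:4)
|           for N in [0:4)
|             for Q in [0:2)
|               for C in [0:128)
|                 for M in [0:8) (Spatial-X)
"""

# Matches "for <dim> in [0:<bound>)" with an optional "(Spatial" marker on the same line
LOOP_RE = re.compile(r"for (\w+) in \[0:(\d+)\)([ \t]*\(Spatial)?")


def loop_bound_products(mapping_text):
    """Return (per-dimension products, temporal product, spatial product)."""
    per_dim = defaultdict(lambda: 1)
    temporal = 1
    spatial = 1
    for match in LOOP_RE.finditer(mapping_text):
        dim, bound, is_spatial = match.group(1), int(match.group(2)), match.group(3)
        per_dim[dim] *= bound
        if is_spatial:
            spatial *= bound
        else:
            temporal *= bound
    return dict(per_dim), temporal, spatial


per_dim, temporal, spatial = loop_bound_products(MAPPING_TEXT)
print(per_dim)
print("temporal product:", temporal)
print("spatial product:", spatial)
print("total iterations:", temporal * spatial)
```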
	


In [4]:
print(f"Accelergy log:")
print(read_and_indent("outputs/timeloop-mapper.accelergy.log", -60))


Accelergy log:
	2024-03-28 10:44:25 INFO        | CACTI output will be written to /home/tanner/research/cimloop/venv/share/accelergy/estimation_plug_ins/accelergy-cacti-plug-in/cacti_inputs_outputs/tmpd9lcnois
	2024-03-28 10:44:25 INFO        | CACTI executable not found at /home/tanner/research/cimloop/venv/share/accelergy/estimation_plug_ins/accelergy-cacti-plug-in/cacti/cacti
	2024-03-28 10:44:25 INFO        | Calling: cd /home/tanner/research/cimloop/venv/share/accelergy/estimation_plug_ins/accelergy-cacti-plug-in ; ./cacti -infile /home/tanner/research/cimloop/venv/share/accelergy/estimation_plug_ins/accelergy-cacti-plug-in/cacti_inputs_outputs/tmpo3h57vl2 >> /home/tanner/research/cimloop/venv/share/accelergy/estimation_plug_ins/accelergy-cacti-plug-in/cacti_inputs_outputs/tmpd9lcnois 2>&1
	2024-03-28 10:44:25 INFO        | Cache bandwidth: 32.0 bits/cycle
	2024-03-28 10:44:25 INFO        | Cache bandwidth: 331.75576555234744 bits/second
	2024-03-28 10:44:25 INFO        
	2024-03-
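
The call above passes `-60` as `n_lines`, which selects lines from the end of the file instead of the start. The sketch below repeats the helper's logic on a small temporary file to show how a positive `n_lines`, a negative `n_lines`, and the `start_at`/`end_at` arguments behave. The file name and contents are made up for the demonstration.

```python
import os
import tempfile


def read_and_indent(path, n_lines=30, start_at=None, end_at=None):
    # Same logic as the helper defined earlier in this tutorial
    with open(path) as f:
        content = f.read()
    if start_at is not None:
        content = content[content.index(start_at):]
    if end_at is not None:
        content = content[: content.index(end_at) + len(end_at)]
    lines = content.split("\n")
    # Positive n_lines keeps the first lines; zero or negative slices from the end
    lines = lines[:n_lines] if n_lines > 0 else lines[n_lines:]
    return "\t" + "\n\t".join(lines)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.log")
    with open(path, "w") as f:
        f.write("\n".join(f"line {i}" for i in range(1, 6)))
    head = read_and_indent(path, 2)    # first two lines
    tail = read_and_indent(path, -2)   # last two lines
    middle = read_and_indent(path, 99, start_at="line 2", end_at="line 4")

print(head)
print(tail)
print(middle)
```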

In [5]:
print(f"Energy reference table:")
print(read_and_indent("outputs/timeloop-mapper.ERT.yaml"))


Energy reference table:
	ERT:
	    version: '0.4'
	    tables:
	      - name: system_top_level.DRAM[1..1]
	        actions:
	          - name: update
	            arguments:
	                global_cycle_seconds: 1e-09
	                action_latency_cycles: 1
	            energy: 512.0
	          - name: read
	            arguments:
	                global_cycle_seconds: 1e-09
	                action_latency_cycles: 1
	            energy: 512.0
	          - name: write
	            arguments:
	                global_cycle_seconds: 1e-09
	                action_latency_cycles: 1
	            energy: 512.0
	          - name: leak
	            arguments:
	                global_cycle_seconds: 1e-09
	                action_latency_cycles: 1
	            energy: 0.0
	      - name: system_top_level.buffer[1..1]
	        actions:
	          - name: update
	            arguments:
	                global_cycle_seconds: 1e-09


In [6]:
print(f"Area reference table:")
print(read_and_indent("outputs/timeloop-mapper.ART.yaml"))


Area reference table:
	ART:
	    version: '0.4'
	    tables:
	      - name: system_top_level.DRAM[1..1]
	        area: 0.0
	      - name: system_top_level.buffer[1..1]
	        area: 43608.4
	      - name: system_top_level.inter_PE_spatial[1..1]
	        area: 1.0
	      - name: system_top_level.reg[1..8]
	        area: 0
	      - name: system_top_level.mac[1..8]
	        area: 1726.5
	


We can get more detailed logs and energy/area estimation information by
running tl.call_accelergy_verbose.

In [7]:
tl.call_accelergy_verbose(
    spec,
    output_dir=f"{os.curdir}/outputs",
    log_to=f"{os.curdir}/outputs/accelergy_verbose.log",
)
print(f"Verbose Accelergy log:")
print(read_and_indent("outputs/accelergy_verbose.log", -60))


Verbose Accelergy log:
	2024-03-28 10:44:26 DEBUG       |  | ADC Plug-In does not support aladdin_multiplier.None. Supported classes: ['adc', 'pim_adc', 'sar_adc', 'array_adc', 'pim_array_adc', 'cim_array_adc', 'cim_adc'].
	2024-03-28 10:44:26 DEBUG       | DigitalAnalogConverterX2XLadder with accuracy 0% estimating accuracy:
	2024-03-28 10:44:26 DEBUG       |  | Accuracy is 0%. Not supported.
	2024-03-28 10:44:26 DEBUG       |  | Class name aladdin_multiplier is not supported. Supported class names: ['dac_x2x_ladder']
	2024-03-28 10:44:26 DEBUG       | DigitalAnalogConverter_C2C with accuracy 0% estimating accuracy:
	2024-03-28 10:44:26 DEBUG       |  | Accuracy is 0%. Not supported.
	2024-03-28 10:44:26 DEBUG       |  | Class name aladdin_multiplier is not supported. Supported class names: ['dac_c2c_ladder']
	2024-03-28 10:44:26 DEBUG       | DigitalAnalogConverter_R2R with accuracy 0% estimating accuracy:
	2024-03-28 10:44:26 DEBUG       |  | Accuracy is 0%. Not supported.
	2024-03-

In [8]:
print(f"Verbose energy reference table:")
print(read_and_indent("outputs/ERT.yaml"))


Verbose energy reference table:
	ERT:
	    version: '0.4'
	    tables:
	      - name: system_top_level.DRAM[1..1]
	        actions:
	          - name: update
	            arguments:
	                global_cycle_seconds: 1e-09
	                action_latency_cycles: 1
	            energy: 512.0
	          - name: read
	            arguments:
	                global_cycle_seconds: 1e-09
	                action_latency_cycles: 1
	            energy: 512.0
	          - name: write
	            arguments:
	                global_cycle_seconds: 1e-09
	                action_latency_cycles: 1
	            energy: 512.0
	          - name: leak
	            arguments:
	                global_cycle_seconds: 1e-09
	                action_latency_cycles: 1
	            energy: 0.0
	      - name: system_top_level.buffer[1..1]
	        actions:
	          - name: update
	            arguments:
	                global_cycle_seconds: 1e-09


In [9]:
print(f"Verbose area reference table:")
print(read_and_indent("outputs/ART.yaml"))


Verbose area reference table:
	ART:
	    version: '0.4'
	    tables:
	      - name: system_top_level.DRAM[1..1]
	        area: 0.0
	      - name: system_top_level.buffer[1..1]
	        area: 43608.4
	      - name: system_top_level.inter_PE_spatial[1..1]
	        area: 1.0
	      - name: system_top_level.reg[1..8]
	        area: 0
	      - name: system_top_level.mac[1..8]
	        area: 1726.5
	


## Jinja2 + YAML Input

The front-end supports [Jinja2](https://jinja.palletsprojects.com/en/3.1.x/)
templating that may be used to automate some aspects of YAML file writing.
We will use a Jinja template to gather the top files so we don't have to list
each of them in Python.


In [10]:
print(f"Top path contents:")
print(read_and_indent("top.yaml.jinja", 99999))
spec = tl.Specification.from_yaml_files(TOP_PATH)

Top path contents:
	# This file will be converted into a YAML file by Jinja2 templating.
	{{add_to_path(cwd() ~ '/inputs')}}
	
	# Grab the necessary top keys from each file and put them here
	architecture: {{include('arch.yaml', 'architecture')}}
	components:   {{include('components.yaml', 'components')}}
	variables:    {{include('variables.yaml', 'variables')}}
	mapper:       {{include('mapper.yaml', 'mapper')}}
	problem:      {{include(problem|default('problem.yaml'), 'problem')}}
	


The top-level file gathered each of the YAML inputs, letting us specify just one
path in Python. Top level files can perform more advanced functions, such as
defining variables, importing scripts, or setting environment variables. Any
input file can use Jinja2 templating, and we can mix-and-match Jinja2 files with
standard YAML.

We call the Timeloop mapper using the Jinja2-templated specification below, yielding the
same output as the previous call.

In [11]:
tl.call_mapper(spec, output_dir=f"{os.curdir}/outputs")  # Run the Timeloop mapper
stats = open("outputs/timeloop-mapper.stats.txt").read()  # Re-read the stats written by this run
print(stats[stats.index("Summary Stats") :])

Summary Stats
-------------
GFLOPs (@1GHz): 15.94
Utilization: 100.00%
Cycles: 1073741824
Energy: 56245.75 uJ
EDP(J*cycle): 6.04e+07
Area: 0.00 mm^2

Computes = 8589934592
fJ/Compute
    mac                          = 3275.00
    buffer                       = 710.37
    DRAM                         = 2562.50
    Total                        = 6547.87




# Editing the Specification
TimeloopFE lets users edit the specification in Python. This is useful for
automating design space exploration, making small changes, or editing
the problem programmatically. Let's take a look at some of the ways we can edit
the specification.

## Modifying Mapper Parameters
Let's see if we can find a better mapping by changing the mapper parameters. We'll start by printing out the mapper parameters defined in the `MAPPER_PATH` file.

In [12]:
print(open(MAPPER_PATH).read())

mapper:
  version: 0.4
  optimization_metrics: [ edp ]
  live_status: False
  num_threads: 4
  search_size:       100    # Max valid mappings per-thread
  victory_condition: 10000  # Exit once a mapping is better than this number of
                            # valid mappings in a row
  timeout: 10000            # Max invalid mappings in a row
  max_permutations_per_if_visit: 4 # We fix permutations with the Greedy Mapper
  algorithm: random_pruned # linear_pruned
  max_temporal_loops_in_a_mapping: 9



It looks like the YAML file set search_size to only 100... let's see if we can
get a better mapping by searching 1000 mappings. We'll initialize a
specification, then edit the search_size parameter. 

We can access any piece of the specification by indexing relative to the
top-level object. For example, we can access the mapper with `spec.mapper`, the
architecture with `spec.architecture`, and the problem with `spec.problem`.
Within each, we can further index into the Specification using the keys that
correspond to the YAML objects in the input files. The organization of the
Python objects is the same as that of the YAML objects, so any indexing that
would work on the input YAML files will also work in Python.


In [13]:
spec = tl.Specification.from_yaml_files(TOP_PATH)
spec.mapper.search_size = 1000
tl.call_mapper(spec, output_dir=f"{os.curdir}/outputs")  # Run the Timeloop mapper
stats = open("outputs/timeloop-mapper.stats.txt").read()
print(stats[stats.index("Summary Stats") :])

Summary Stats
-------------
GFLOPs (@1GHz): 15.94
Utilization: 100.00%
Cycles: 1073741824
Energy: 55710.75 uJ
EDP(J*cycle): 5.98e+07
Area: 0.00 mm^2

Computes = 8589934592
fJ/Compute
    mac                          = 3275.00
    buffer                       = 709.61
    DRAM                         = 2500.98
    Total                        = 6485.58




We found a lower overall energy by searching more mappings!
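
To see where the improvement comes from, we can compare the two runs numerically. The sketch below uses values copied from the two summaries above (`search_size` of 100 versus 1000). The variable names are only for illustration.

```python
# Total energy in uJ, copied from the two summaries above
baseline_uJ = 56245.75  # search_size = 100
searched_uJ = 55710.75  # search_size = 1000

saved_uJ = baseline_uJ - searched_uJ
saved_pct = 100 * saved_uJ / baseline_uJ
print(f"Saved {saved_uJ:.2f} uJ ({saved_pct:.2f}%)")

# Per-component energy in fJ/compute, copied from the same summaries
baseline_fj = {"mac": 3275.00, "buffer": 710.37, "DRAM": 2562.50}
searched_fj = {"mac": 3275.00, "buffer": 709.61, "DRAM": 2500.98}

delta_fj = {name: round(baseline_fj[name] - searched_fj[name], 2) for name in baseline_fj}
print(delta_fj)
```

In this example, nearly all of the savings come from fewer DRAM accesses per compute, while the MAC energy is unchanged.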

## Modifying the Architecture and Problem

The architecture & problem can be edited in a similar way to the mapper. Let's
change the architecture to work on bigger problems. We'll double the number of
PEs, double the global buffer size, and double the batch size (dimension N) in the problem.

Like before, we'll initialize a specification. We then index into the problem and architecture to edit our components.

The architecture includes a `find` function that lets us find components by name.

In [14]:
spec = tl.Specification.from_yaml_files(TOP_PATH)
spec.problem.instance["N"] *= 2
spec.architecture.find("buffer").attributes["depth"] *= 2
spec.architecture.find("PE").spatial.meshX *= 2
spec.mapper.search_size = 1000
tl.call_mapper(spec, output_dir=f"{os.curdir}/outputs")  # Run the Timeloop mapper
stats = open("outputs/timeloop-mapper.stats.txt").read()
print(stats[stats.index("Summary Stats") :])

Summary Stats
-------------
GFLOPs (@1GHz): 31.88
Utilization: 100.00%
Cycles: 1073741824
Energy: 99983.28 uJ
EDP(J*cycle): 1.07e+08
Area: 0.00 mm^2

Computes = 17179869184
fJ/Compute
    mac                          = 3275.00
    buffer                       = 1044.73
    DRAM                         = 1500.06
    Total                        = 5819.79
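
A quick sanity check on this result is to divide the reported computes by the reported cycles. The sketch below uses numbers copied from the summaries above and compares against the earlier run that also used a `search_size` of 1000. The variable names are only for illustration.

```python
cycles = 1073741824  # identical in both runs

computes_before = 8589934592   # original architecture and problem
computes_after = 17179869184   # doubled PEs, buffer depth, and batch size

per_cycle_before = computes_before // cycles
per_cycle_after = computes_after // cycles
print(per_cycle_before, per_cycle_after)

# Total energy per compute in fJ, copied from the summaries above
fj_before = 6485.58
fj_after = 5819.79
drop_pct = 100 * (fj_before - fj_after) / fj_before
print(f"Energy per compute dropped by about {drop_pct:.1f}%")
```

The computes per cycle double along with the number of PEs, which is consistent with the reported 100% utilization in both runs.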




## The `get_nodes_of_type` and `find` Functions
The `get_nodes_of_type` and `find` functions can be used to find nodes in the
specification. The `get_nodes_of_type` function returns a list of all nodes of a
given type, starting from a given node. The `find` function is only for the
architecture, returning the architecture component or container that matches a
given name. We can use these functions to traverse the specification and search
for particular nodes.

In [15]:
spec = tl.Specification.from_yaml_files(TOP_PATH)

# List all dataspace constraints
for ds_constraint in spec.get_nodes_of_type(tl.constraints.Dataspace):
    print(f'Found dataspace constraint: {ds_constraint}')
    
# Print DRAM from the architecture
print(f'DRAM: {spec.architecture.find("DRAM")}')
print(f'DRAM Attributes: {spec.architecture.find("DRAM").attributes}')

Found dataspace constraint: dataspace constraint(target=) Specification[architecture].Architecture[nodes].ArchNodes[0].Container(system)[constraints].ConstraintGroup[dataspace].Dataspace
Found dataspace constraint: dataspace constraint(target=) Specification[architecture].Architecture[nodes].ArchNodes[1].Storage(DRAM)[constraints].ConstraintGroup[dataspace].Dataspace
Found dataspace constraint: dataspace constraint(target=) Specification[architecture].Architecture[nodes].ArchNodes[2].Storage(buffer)[constraints].ConstraintGroup[dataspace].Dataspace
Found dataspace constraint: dataspace constraint(target=) Specification[architecture].Architecture[nodes].ArchNodes[3].Container(PE)[constraints].ConstraintGroup[dataspace].Dataspace
Found dataspace constraint: dataspace constraint(target=) Specification[architecture].Architecture[nodes].ArchNodes[4].Storage(reg)[constraints].ConstraintGroup[dataspace].Dataspace
Found dataspace constraint: dataspace constraint(target=) Specification[architec
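
To illustrate the idea behind a type-based search, here is a toy depth-first traversal over nested dictionaries and lists. It is only a sketch of the general technique and is not how TimeloopFE implements `get_nodes_of_type`. The `nodes_of_type` function, the `Dataspace` class, and `toy_spec` are all hypothetical stand-ins.

```python
def nodes_of_type(root, wanted_type):
    """Collect every object of `wanted_type` reachable from `root`, depth first."""
    found = []
    if isinstance(root, wanted_type):
        found.append(root)
    if isinstance(root, dict):
        children = root.values()
    elif isinstance(root, (list, tuple)):
        children = root
    else:
        children = ()  # Leaf values such as str or int have no children
    for child in children:
        found.extend(nodes_of_type(child, wanted_type))
    return found


class Dataspace(dict):
    """Toy stand-in for a dataspace constraint node."""


toy_spec = {
    "architecture": [
        {"name": "DRAM", "constraints": {"dataspace": Dataspace(keep=["Weights"])}},
        {"name": "buffer", "constraints": {"dataspace": Dataspace(keep=["Inputs"])}},
    ],
    "mapper": {"search_size": 100},
}

matches = nodes_of_type(toy_spec, Dataspace)
print(len(matches))
print(matches[0]["keep"])
```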

## Modifying Specifications with Jinja2 Templating
Jinja2 expressions can be used to parameterize the specification. For example,
the variables file defines the DATAWIDTH variable with a Jinja2 expression that
falls back to a default value. The architecture file then uses DATAWIDTH to set
the datawidth of the buffers.

Of course, the variables file can be edited in Python as well.

In [16]:
print(f"Variables file:")
print(f'\t{read_and_indent(VARIABLES_PATH, start_at="DATAWIDTH")}')
print(f"Setting variable with Jinja2")

spec = tl.Specification.from_yaml_files(TOP_PATH, jinja_parse_data={"datawidth_jinja": 16})

tl.call_mapper(spec, output_dir=f"{os.curdir}/outputs")  # Run the Timeloop mapper
stats = open("outputs/timeloop-mapper.stats.txt").read()
print(stats[stats.index("Summary Stats") :])

# OR edit the variables file directly
spec.variables["DATAWIDTH"] = 32

Variables file:
		DATAWIDTH: {{datawidth_jinja|default(8)}} # 8 bits is the default
Setting variable with Jinja2
Summary Stats
-------------
GFLOPs (@1GHz): 15.94
Utilization: 100.00%
Cycles: 1073741824
Energy: 129148.55 uJ
EDP(J*cycle): 1.39e+08
Area: 0.00 mm^2

Computes = 8589934592
fJ/Compute
    mac                          = 6550.00
    buffer                       = 1453.62
    DRAM                         = 7031.25
    Total                        = 15034.87




Jinja2 can also be used to enable or disable components in the architecture file.
Here, the `reg` register is wrapped in a Jinja2 conditional and is enabled by
default. We can disable it by setting the `reg_enabled` Jinja2 variable to `False`.

Of course, the architecture file can be edited in Python as well.

In [17]:
print(f"Architecture file:")
print(f'{read_and_indent(ARCH_PATH, start_at="{%", end_at="{% endif %}")}')
spec = tl.Specification.from_yaml_files(TOP_PATH, jinja_parse_data={"reg_enabled": False})
spec.mapper.search_size = 1000
tl.call_mapper(spec, output_dir=f"{os.curdir}/outputs")  # Run the Timeloop mapper
stats = open("outputs/timeloop-mapper.stats.txt").read()
print(stats[stats.index("Summary Stats") :])

# OR edit the architecture file directly
spec = tl.Specification.from_yaml_files(TOP_PATH)
spec.architecture.find("reg").enabled = False


Architecture file:
	{% if reg_enabled|default(True) %}
	  - !Component # Global buffer for inputs & outputs
	    name: reg
	    class: SRAM
	    subclass: register
	    attributes: 
	      datawidth: DATAWIDTH
	      depth: 1
	      width: datawidth * 3
	    constraints:
	      dataspace: {keep: [Inputs, Outputs, Weights]}
	      temporal: {factors_only: []}
	    {% endif %}
Summary Stats
-------------
GFLOPs (@1GHz): 15.94
Utilization: 100.00%
Cycles: 1073741824
Energy: 67000.06 uJ
EDP(J*cycle): 7.19e+07
Area: 0.00 mm^2

Computes = 8589934592
fJ/Compute
    mac                          = 3275.00
    buffer                       = 2024.83
    DRAM                         = 2500.00
    Total                        = 7799.83
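
The two template patterns used in this section, a `default` filter and an `if` block, can be tried in isolation with the `jinja2` package. The sketch below renders simplified one-line versions of the templates shown above. It does not go through TimeloopFE, so helpers such as `include` and `add_to_path` from the top file are not available here.

```python
from jinja2 import Template

# A variable with a fallback value, as in the variables file
variables_line = Template("DATAWIDTH: {{ datawidth_jinja|default(8) }}")
default_render = variables_line.render()
override_render = variables_line.render(datawidth_jinja=16)
print(default_render)
print(override_render)

# A block that is emitted unless reg_enabled is set to False,
# a one-line simplification of the conditional in the architecture file
reg_block = Template("{% if reg_enabled|default(True) %}- name: reg{% endif %}")
enabled_render = reg_block.render()
disabled_render = reg_block.render(reg_enabled=False)
print(repr(enabled_render))
print(repr(disabled_render))
```

Values passed to `render` here play the same role as the entries of `jinja_parse_data` in the cells above.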




# What can be edited with Python?

TimeloopFE can edit any piece of the specification that is exposed in Python. To
see what can be edited, we can use the `get_property_tree` and
`get_property_table` functions. These functions show all the properties that can
be edited, as well as any required types, default values, and other information.

To see a list of every property that can be edited in Python, we can use
`get_property_tree` or `get_property_table` with no arguments.


In [18]:
# Get a recursive list of all the properties that can be edited in Python
print(tl.get_property_tree(tl.components.CompoundComponent))


[KEY_OR_TAG]: [EXPECTED_TYPE] [REQUIRED or = DEFAULT_VALUE]
├─ SUBNODES (If applicable)

CompoundComponent
├─ '*ignore*': None Optional
├─ 'name': str REQUIRED
├─ 'attributes': ComponentAttributes = '{}'
│  ├─ '*ignore*': None Optional
│  └─ '**': None Optional
├─ 'subcomponents': SubcomponentList = '[]'
│  ├─ 'ignore': None 
│  └─ '': Subcomponent 
│     ├─ '*ignore*': None Optional
│     ├─ 'name': str REQUIRED
│     ├─ 'attributes': ComponentAttributes = '{}'
│     │  ├─ '*ignore*': None Optional
│     │  └─ '**': None Optional
│     └─ 'area_share': Number/str = '1'
└─ 'actions': ActionsList = '[]'
   ├─ 'ignore': None 
   └─ '': Action 
      ├─ '*ignore*': None Optional
      ├─ 'name': str REQUIRED
      ├─ 'arguments': DictNode = '{}'
      │  └─ '*ignore*': None Optional
      └─ 'subcomponents': ActionSubcomponentsList = '[]'
         ├─ 'ignore': None 
         └─ '': SubcomponentActionGroup 
            ├─ '*ignore*': None Optional
            ├─ 'name': str REQUIRED
      

In [19]:
# Get a table of all the properties that can be edited in Python for this element
print(tl.get_property_table(tl.components.CompoundComponent))

# print(tl.doc.get_property_tree())   # These print every property for every class
# print(tl.doc.get_property_table())  # in the specification. This is a lot of text!




==== CompoundComponent ====
  KEY                      ,REQUIRED_TYPE            ,DEFAULT                  ,CALLFUNC                 ,SET_FROM                 
  ignore                   ,None                     ,None                     ,None                     ,None                     
  name                     ,str                      ,REQUIRED                 ,None                     ,None                     
  attributes               ,ComponentAttributes      ,{}                       ,ComponentAttributes      ,None                     
  subcomponents            ,SubcomponentList         ,[]                       ,SubcomponentList         ,None                     
  actions                  ,ActionsList              ,[]                       ,ActionsList              ,None                     
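
Because the property table is printed as comma-separated text, it can be turned into Python dictionaries for filtering. The sketch below parses the rows copied from the output above and lists the required keys. `parse_property_table` is a hypothetical helper, and the plain comma split assumes that no cell contains a comma.

```python
TABLE = """\
KEY                      ,REQUIRED_TYPE            ,DEFAULT                  ,CALLFUNC                 ,SET_FROM
ignore                   ,None                     ,None                     ,None                     ,None
name                     ,str                      ,REQUIRED                 ,None                     ,None
attributes               ,ComponentAttributes      ,{}                       ,ComponentAttributes      ,None
subcomponents            ,SubcomponentList         ,[]                       ,SubcomponentList         ,None
actions                  ,ActionsList              ,[]                       ,ActionsList              ,None
"""


def parse_property_table(text):
    """Turn the padded, comma-separated table into a list of row dictionaries."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = [cell.strip() for cell in lines[0].split(",")]
    rows = []
    for line in lines[1:]:
        cells = [cell.strip() for cell in line.split(",")]
        rows.append(dict(zip(header, cells)))
    return rows


rows = parse_property_table(TABLE)
required = [row["KEY"] for row in rows if row["DEFAULT"] == "REQUIRED"]
print(required)
```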
