# Your name:
Cindy Yang

In [1]:
import pandas as pd
import numpy as np

## Part 2: Full System (PE Array, Global Buffer, NoC) Modeling
Now that you are familiar with the simple PE setup, let’s look at a full system as shown in the figure below. This design is composed of two levels of on-chip storage: the global buffer and the local scratchpads in each PE, as described in part 1. Each datatype is sent via a network from the global buffer to the PE array, and there are inter-PE networks that are capable of sending various data types within the array. We provide you with the loop nest of this design in the figure below.

<br>
<div class="row">
  <div class="column">
    <img align="left" src="designs/system_manual/figures/system_arch.png" alt="Full System  Architecture Diagram" style="margin:50px 0px 0px 50px; width:40%">
  </div>
  <div class="column">
    <img  align="left"  src="designs/system_manual/figures/system_loopnest.png" alt="System Loopnest" style="width:50%">
  </div>
</div>

## Question 2  Manual Exploration of the Mapspace

### Question 2.1
You are provided with a PE array that has 16 PEs. Assume you can design different architectures and associated mappings for every layer shape (i.e. both architecture yaml and mapping yaml can change across layer shapes). 

Specifically, you can select the height and width of the PE array as long as the total number of PEs equals 16, while keeping the other architectural attributes the same. Two possible architecture descriptions for this question are provided in the `designs/system_manual/arch` folder: `system_arch_1x16.yaml` and `system_arch_2x8.yaml`.


1. Please examine the provided architecture descriptions, and modify `system_arch_4x4.yaml` to create an architecture description that has the same buffer sizes and a PE array of physical dimension 4x4. Which hardware attributes do you need to change?

   I changed the meshX and meshY attributes of the scratchpad.

   

2. In 1 or 2 sentences, explain why running the same workload on architectures with different physical PE array dimensions might result in different performance (*e.g.,* energy, throughput)?

   Different PE array dimensions allow different spatial mappings and affect the communication patterns between PEs. A more efficient design maximizes local communication between spatially close PEs and minimizes data movement between the PEs and the global buffer.
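
As a quick sanity check on the design space in Question 2.1, the sketch below enumerates the rectangular array shapes that use exactly 16 PEs. The helper name `array_shapes` is hypothetical and not part of Timeloop; it only lists shapes with height <= width, so mirrored orientations such as 8x2 are deliberately left out.

```python
import math

def array_shapes(total_pes):
    """Return every (height, width) with height <= width and height * width == total_pes."""
    shapes = []
    for height in range(1, math.isqrt(total_pes) + 1):
        if total_pes % height == 0:
            shapes.append((height, total_pes // height))
    return shapes

shapes = array_shapes(16)
for height, width in shapes:
    print(f"{height}x{width}")
```

The three shapes it prints are the ones named in the question: 1x16, 2x8, and 4x4.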


### Question 2.2
In this question, we would like you to find the best architecture (among the three architectures in question 2.1) and the associated mapping that has the highest throughput (minimizes the number of cycles) for `layer_shapes/tiny_layer.yaml`. If two architectures result in the same throughput, choose the one that's less energy consuming.
  
<font color=blue> <b>Your mapping has to agree with the loop nest provided above. To simplify your search, please further assume that: </b>
    
   - input channels can only be spatially mapped to the rows of the PE array and output channels can only be spatially mapped to the columns of the PE array.
    
   - PE scratchpads only store filter weights 
    
</font>

A sample mapping for `system_arch_1x16.yaml` is provided in `designs/system_manual/map`. You can add more mappings in this directory. As a reminder, to simulate the architecture with a specified mapping, run the command below (*replace the architecture file name, mapping file name, and layer shape file name appropriately*); see also the command cell below.

```timeloop-model arch/<arch_file_name>.yaml arch/components/*  map/<mapping_file_name>.yaml ../../layer_shapes/<layer_shape_file_name>.yaml``` 


Please fill in the table below to provide your answer.

In [63]:
%%bash
cd designs/system_manual/
timeloop-model arch/system_arch_2x8.yaml arch/components/*  map/mapping_4x4.yaml ../../layer_shapes/tiny_layer.yaml


execute:/usr/local/bin/accelergy arch/system_arch_2x8.yaml arch/components/mac_compute.yaml arch/components/reg_storage.yaml arch/components/smart_storage.yaml map/mapping_4x4.yaml ../../layer_shapes/tiny_layer.yaml --oprefix timeloop-model. -o ./ > timeloop-model.accelergy.log 2>&1
Generate Accelergy ART (area reference table) to replace internal area model.
Utilization = 1.00 | pJ/MACC =   14.728




In [64]:
# the Question 2.2 chart
d = {'problem': ['tiny_layer'],  # fill in your answer here
     'architecture name': ['2x8'], # fill in your answer here
     'number of cycles': [8100],   # fill in your answer here
     'total energy (uJ)': [1.91],  # fill in your answer here
     'M3': [2],
     'N3': [2],
     'C3': [9],
     'M2': [1],
     'N2': [1],
     'C2': [1],
     'M1': [8],
     'C1': [2],
     'N0': [1]
    }
df = pd.DataFrame(data=d)
print(df.to_string(index=False, justify='center'))

 problem   architecture name  number of cycles  total energy (uJ)  M3  N3  C3  M2  N2  C2  M1  C1  N0
tiny_layer        2x8               8100              1.91         2   2   9   1   1   1   8   2   1 
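
The numbers in the chart above can be cross-checked with a few lines of arithmetic. This is a minimal sketch, assuming one MAC per PE per cycle and that total energy equals pJ/MACC times the number of MACs; the layer dimensions are the ones that appear elsewhere in this notebook (M, C, R, S in the Question 4.2 answer; N, P, Q in the mapper logs).

```python
import math

# tiny_layer dimensions as they appear elsewhere in this notebook.
layer = {'M': 16, 'C': 18, 'R': 3, 'S': 3, 'P': 5, 'Q': 5, 'N': 2}

# Loop factors from the Question 2.2 chart, grouped by problem dimension.
factors = {
    'M': [2, 1, 8],   # M3, M2, M1
    'C': [9, 1, 2],   # C3, C2, C1
    'N': [2, 1, 1],   # N3, N2, N0
}

# The factors of each dimension must multiply back to the layer dimension.
for dim, fs in factors.items():
    assert math.prod(fs) == layer[dim], f"{dim} factors do not cover {layer[dim]}"

total_macs = math.prod(layer.values())
num_pes = 16
utilization = 1.00      # reported by timeloop-model above
pj_per_macc = 14.728    # reported by timeloop-model above

# Assumption: each active PE performs one MAC per cycle.
cycles = total_macs / (num_pes * utilization)
energy_uj = total_macs * pj_per_macc * 1e-6

print(total_macs, cycles, round(energy_uj, 2))
```

Under these assumptions the result agrees with the chart: 129600 MACs, 8100 cycles, and about 1.91 uJ.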


## Question 3 Mapspace Exploration with Timeloop

Manually generating the best mapping for each architecture and layer shape is rather time-consuming, even if the search is performed under a tightly constrained mapspace, *e.g.,* the one in question 2.2. Therefore, timeloop provides automatic mapspace search functionality when appropriate mapspace constraints are given.

To perform an automatic mapspace search, you need to provide a mapspace constraint file as an input. A mapspace constraint file specifies the limitations imposed by your dataflow or hardware structures. An example mapspace constraint file for the loop nest above can be found in `designs/system_auto/constraints/example_constraints.yaml`. To automatically search the mapspace with the constraints file, run the following command (also shown in the command cell below):

`timeloop-mapper arch/system_arch_2x8.yaml arch/components/* constraints/example_constraints.yaml mapper/mapper.yaml ../../layer_shapes/tiny_layer.yaml`

*The search should take less than 5 minutes to finish. If you run this command from the shell instead of running the cell below, you can also terminate it whenever you want by pressing Ctrl+C (you will need to wait for timeloop to finish the remaining computations after you send the signal; each terminated thread will have a dash next to its id).*

In [33]:
%%bash
cd designs/system_auto/
timeloop-mapper arch/system_arch_2x8.yaml arch/components/* constraints/example_constraints.yaml mapper/mapper.yaml ../../layer_shapes/tiny_layer.yaml

  _______                __                
 /_  __(_)___ ___  ___  / /___  ____  ____ 
  / / / / __ `__ \/ _ \/ / __ \/ __ \/ __ \
 / / / / / / / / /  __/ / /_/ / /_/ / /_/ /
/_/ /_/_/ /_/ /_/\___/_/\____/\____/ .___/ 
                                  /_/      

Problem configuration complete.
execute:/usr/local/bin/accelergy arch/system_arch_2x8.yaml arch/components/mac_compute.yaml arch/components/reg_storage.yaml arch/components/smart_storage.yaml constraints/example_constraints.yaml mapper/mapper.yaml ../../layer_shapes/tiny_layer.yaml --oprefix timeloop-mapper. -o ./ > timeloop-mapper.accelergy.log 2>&1
Generate Accelergy ERT (energy reference table) to replace internal energy model.
Generate Accelergy ART (area reference table) to replace internal area model.
Architecture configuration complete.
Using threads = 8
Mapper configuration complete.
Initializing Index Factorization subspace.
  Factorization options along problem dimension C = 18
  Factorization options along problem 

[  2] Utilization = 0.12 | pJ/MACC =   38.208 | L5[WIO] M16 C9 - L4[WIO] N2 C2X - L3[W] R3 S3 P5 Q5 - L2[W] Q1 - L1[I] Q1 - L0[O] Q1 
[  3] Utilization = 0.06 | pJ/MACC =   39.815 | L5[WIO] M16 C6 - L4[WIO] C3 - L3[W] R3 S3 N2 P5 Q5 - L2[W] Q1 - L1[I] Q1 - L0[O] Q1 
[  4] Utilization = 0.12 | pJ/MACC =   26.724 | L5[WIO] N2 M8 C9 - L4[WIO] M2 C2X - L3[W] R3 S3 P5 Q5 - L2[W] Q1 - L1[I] Q1 - L0[O] Q1 
[  0] Utilization = 0.25 | pJ/MACC =   16.542 | L5[WIO] N2 M2 C18 - L4[WIO] M2 M4Y - L3[W] R3 S3 P5 Q5 - L2[W] Q1 - L1[I] Q1 - L0[O] Q1 
[  6] Utilization = 0.12 | pJ/MACC =   24.131 | L5[WIO] M8 C9 - L4[WIO] N2 M2 C2X - L3[W] R3 S3 P5 Q5 - L2[W] Q1 - L1[I] Q1 - L0[O] Q1 
[  5] Utilization = 0.06 | pJ/MACC =   28.336 | L5[WIO] N2 M8 C9 - L4[WIO] M2 C2 - L3[W] R3 S3 P5 Q5 - L2[W] Q1 - L1[I] Q1 - L0[O] Q1 
[  1] Utilization = 0.06 | pJ/MACC =   28.336 | L5[WIO] N2 M8 C3 - L4[WIO] M2 C6 - L3[W] R3 S3 P5 Q5 - L2[W] Q1 - L1[I] Q1 - L0[O] Q1 
[  4] Utilization = 0.12 | pJ/MACC =   17.075 | L5[WIO
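
When many progress lines scroll by, it can help to parse them and rank the candidates. The sketch below is one possible approach, not a Timeloop feature: the regular expression and the helper `parse_progress` are hypothetical, the bracketed number is treated as the thread id, and utilization is used as a proxy for throughput with pJ/MACC as the tie-breaker. The statistics written by the mapper remain the authoritative source.

```python
import re

# Two lines copied from the mapper output above.
log = """\
[  0] Utilization = 0.25 | pJ/MACC =   16.542 | L5[WIO] N2 M2 C18 - L4[WIO] M2 M4Y - L3[W] R3 S3 P5 Q5 - L2[W] Q1 - L1[I] Q1 - L0[O] Q1
[  4] Utilization = 0.12 | pJ/MACC =   26.724 | L5[WIO] N2 M8 C9 - L4[WIO] M2 C2X - L3[W] R3 S3 P5 Q5 - L2[W] Q1 - L1[I] Q1 - L0[O] Q1
"""

# Matches "[ id] Utilization = u | pJ/MACC = e | mapping".
PROGRESS = re.compile(
    r"\[\s*(\d+)\]\s+Utilization = ([\d.]+) \| pJ/MACC =\s+([\d.]+) \| (.*)"
)

def parse_progress(text):
    """Return one dict per progress line; other lines are ignored."""
    rows = []
    for line in text.splitlines():
        match = PROGRESS.match(line.strip())
        if match:
            rows.append({
                'thread': int(match.group(1)),
                'utilization': float(match.group(2)),
                'pj_per_macc': float(match.group(3)),
                'mapping': match.group(4).strip(),
            })
    return rows

rows = parse_progress(log)
# Highest utilization first; lower energy per MAC breaks ties.
best = max(rows, key=lambda r: (r['utilization'], -r['pj_per_macc']))
print(best['thread'], best['utilization'], best['pj_per_macc'])
```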

### Question 3.1

In this question, we have provided you with a much more relaxed constraint file `designs/system_auto/constraints/relaxed_constraints.yaml`. 

    
    
1. Please examine the file, and list two additional relaxations of the mapspace constraints in `relaxed_constraints.yaml` compared to `example_constraints.yaml` (*Note: there are more than two relaxations, but you only need to list two*).
 
   1. We removed the temporal constraints on M, C in the scratchpad.
   2. We removed temporal constraints on the DRAM.
    

2. Please copy and paste your implementation of the 4x4 architecture from the `system_manual/arch` folder to the `system_auto/arch` folder. With the `relaxed_constraints.yaml` constraint file, run auto-search on the possible architectures (among 1x16, 2x8, and 4x4) for `tiny_layer`, `depth_wise`, and `point_wise`. Find the architecture that has the highest throughput. If two architectures result in the same throughput, choose the one that's less energy consuming. Please fill in the chart below. 


In [57]:
%%bash
cd designs/system_auto/
timeloop-mapper arch/system_arch_2x8.yaml arch/components/*  constraints/relaxed_constraints.yaml mapper/mapper.yaml ../../layer_shapes/point_wise.yaml

  _______                __                
 /_  __(_)___ ___  ___  / /___  ____  ____ 
  / / / / __ `__ \/ _ \/ / __ \/ __ \/ __ \
 / / / / / / / / /  __/ / /_/ / /_/ / /_/ /
/_/ /_/_/ /_/ /_/\___/_/\____/\____/ .___/ 
                                  /_/      

Problem configuration complete.
execute:/usr/local/bin/accelergy arch/system_arch_2x8.yaml arch/components/mac_compute.yaml arch/components/reg_storage.yaml arch/components/smart_storage.yaml constraints/relaxed_constraints.yaml mapper/mapper.yaml ../../layer_shapes/point_wise.yaml --oprefix timeloop-mapper. -o ./ > timeloop-mapper.accelergy.log 2>&1
Generate Accelergy ERT (energy reference table) to replace internal energy model.
Generate Accelergy ART (area reference table) to replace internal area model.
Architecture configuration complete.
Using threads = 8
Mapper configuration complete.
Initializing Index Factorization subspace.
  Factorization options along problem dimension C = 40
  Factorization options along problem 

[  1] Utilization = 0.06 | pJ/MACC =   37.060 | L5[WIO] N2 M2 C2 - L4[WIO] C6 - L3[W] M10 P5 Q5 - L2[W] Q1 - L1[I] Q1 - L0[O] Q1 
[  4] Utilization = 0.06 | pJ/MACC =   86.312 | L5[WIO] M10 C6 - L4[WIO] N2 M2 - L3[W] C2 P5 Q5 - L2[W] Q1 - L1[I] Q1 - L0[O] Q1 
[  0] Utilization = 0.31 | pJ/MACC =   29.289 | L5[WIO] N2 M4 - L4[WIO] Q1 M5Y - L3[W] C12 P5 Q5 - L2[W] Q1 - L1[I] Q1 - L0[O] Q1 
[  5] Utilization = 0.12 | pJ/MACC =   35.447 | L5[WIO] N2 M2 C6 - L4[WIO] Q1 C2X - L3[W] M10 P5 Q5 - L2[W] Q1 - L1[I] Q1 - L0[O] Q1 
[  4] Utilization = 0.06 | pJ/MACC =   50.021 | L5[WIO] N2 M4 C6 - L4[WIO] Q1 - L3[W] M5 C2 P5 Q5 - L2[W] Q1 - L1[I] Q1 - L0[O] Q1 
[  3] Utilization = 0.12 | pJ/MACC =   35.447 | L5[WIO] N2 M2 C2 - L4[WIO] C3 C2X - L3[W] M10 P5 Q5 - L2[W] Q1 - L1[I] Q1 - L0[O] Q1 
[  2] Utilization = 0.62 | pJ/MACC =   27.677 | L5[WIO] N2 M4 - L4[WIO] Q1 M5Y C2X - L3[W] C6 P5 Q5 - L2[W] Q1 - L1[I] Q1 - L0[O] Q1 
[  6] Utilization = 0.31 | pJ/MACC =   29.289 | L5[WIO] N2 M4 - L4[WIO] C12

In [65]:
# the Question 3.1.2 chart
d = {'problem': ['tiny_layer', 'depth_wise', 'point_wise'],  
     'architecture name': ['2x8' , '1x16', '4x4'], # fill in your answer here
     'number of cycles': [ 8100, 2700, 750],    # fill in your answer here
     'total energy (uJ)': [ 1.91, 0.84, 0.32],   # fill in your answer here
    }
df = pd.DataFrame(data=d)
print(df.to_string(index=False, justify='center'))

 problem   architecture name  number of cycles  total energy (uJ)
tiny_layer         2x8              8100              1.91       
depth_wise        1x16              2700              0.84       
point_wise         4x4               750              0.32       


### Question 3.2
Your circuit designer has told you that it is too expensive to have a separate architecture for each layer shape. You must now have a fixed architecture (i.e. fixed height and width of the PE array). Based on this specific architecture, you can change the mapping according to different layer shapes. 

What is the best architecture that achieves the **highest average throughput** of those three layer shapes among all the architectures explored in question 3.1? Please fill in the chart below.



In [66]:
# the Question 3.2 chart
d = {'problem': ['tiny_layer', 'depth_wise', 'point_wise'],  
     'architecture name': ['2x8','2x8','2x8'], # fill in your answer here
     'number of cycles': [8100,4050,1200],    # fill in your answer here
     'total energy (uJ)': [1.91,0.86,0.33],   # fill in your answer here
    }
df = pd.DataFrame(data=d)
print(df.to_string(index=False, justify='center'))

 problem   architecture name  number of cycles  total energy (uJ)
tiny_layer        2x8               8100              1.91       
depth_wise        2x8               4050              0.86       
point_wise        2x8               1200              0.33       
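
To see what fixing the array shape costs, the sketch below compares the per-layer best results from the Question 3.1 chart with the fixed 2x8 results above. It assumes the three layers run back to back and uses total cycles as the figure of merit, which is one reading of "average throughput"; the 1x16 and 4x4 fixed-architecture numbers are not recorded in this notebook, so they are not compared here.

```python
# (architecture, cycles) per layer, copied from the Question 3.1 chart.
best_per_layer = {
    'tiny_layer': ('2x8', 8100),
    'depth_wise': ('1x16', 2700),
    'point_wise': ('4x4', 750),
}

# Cycles per layer on the fixed 2x8 array, copied from the Question 3.2 chart.
fixed_2x8 = {'tiny_layer': 8100, 'depth_wise': 4050, 'point_wise': 1200}

total_best = sum(cycles for _, cycles in best_per_layer.values())
total_fixed = sum(fixed_2x8.values())
slowdown = total_fixed / total_best

print(f"per-layer best: {total_best} cycles")
print(f"fixed 2x8:      {total_fixed} cycles ({slowdown:.2f}x)")
```

With these numbers the fixed 2x8 design needs 13350 cycles against 11550 for the per-layer best choices.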


## Question 4 Architectures with New Technologies

So far, we have been looking at conventional architectures based on digital VLSI designs. There are also many DNN accelerator designs that are based on emerging technologies, such as optical DNN accelerators and processing-in-memory (PIM) DNN accelerators. In this question, we are going to evaluate a PIM DNN accelerator design. The PIM design can be found at `designs/PIM` .

### Question 4.1 
Please take a look at the architecture description and the compound component descriptions at `designs/PIM/arch`. You will notice that the compound components are much more complicated than the ones presented before. Examine the `designs/PIM/arch/components/ADC_SimpleMulticast` class YAML definition and the hierarchical tree description below. What are the missing subcomponent names? We have provided one subcomponent name for you; please follow the convention and provide your answers in the cell below.

*Hint: to find the definition of a sub-compound-component, you need to find its class definition in another file stored in the component folder*

<img align="left" src="designs/PIM/figures/simplemulticast_tree.png" alt="Full System  Architecture Diagram" style="margin:0px 0px 0px 0px; width:70%">

- (1): A2D_converter
- (2): digital_accumulator
- (3): ADC
- (4): SH[0..total_SHs-1]
- (5): S_A
- (6): outputBuffer

### Question 4.2
Navigate to the folder `designs/PIM`. 


1. Run `accelergy arch/`. Recall that this command generates the energy and area characterizations of the architecture. Examine the output files, and fill in the table below.

*Hint: mac compute energy should not be a large number, e.g., >100. If it is, you probably restarted/recreated the docker container and therefore erased the PIM plug-in path added by the accelergyTables command in the readme. Please rerun:*

```
accelergyTables -r /home/workspace/<your-lab4-folder-name>/PIM_estimation_tables
```

2. Our PIM accelerator programs the weights into the memory cells (i.e., each PE is responsible for storing one 16-bit weight value in its scratchpad) and does not reload weights during the run of a layer (reflected in the constraints file). Calculate the number of PEs needed to store all the weights for `problems/tiny_layer.yaml`. 

   M=16, C=18, R=S=3, so there are 2592 weights and 2592 PEs needed to store all the weights.
   

3.  Run `timeloop-mapper arch/*.yaml arch/components/*.yaml mapper/* constraints/* ../../layer_shapes/tiny_layer.yaml`. Is timeloop able to find any mappings? If not, in 1 or 2 sentences, explain why. If yes, provide the number of cycles and total energy consumption for running the workload.

  No. Timeloop cannot find any mappings because the architecture has only 64 PEs, which are not enough to hold all 2592 weights; every candidate mapping is rejected as a fanout violation.
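
The weight count and the capacity argument can be checked directly. This is a small sketch that assumes one weight per PE, as the question states; the 64-PE figure comes from the answer above and the 144*18 array is the `PIM_large` design used in Question 4.3.

```python
# tiny_layer weight tensor dimensions (output channels, input channels, filter height, filter width).
M, C, R, S = 16, 18, 3, 3
num_weights = M * C * R * S

pes_small = 64          # PIM design in Question 4.2
pes_large = 144 * 18    # PIM_large design in Question 4.3

for name, pes in [('PIM', pes_small), ('PIM_large', pes_large)]:
    # One weight per PE, so the array must have at least as many PEs as weights.
    fits = pes >= num_weights
    print(f"{name}: {pes} PEs, {num_weights} weights, fits = {fits}")
```

The small array falls short by a factor of 40.5, while the 144*18 array holds exactly 2592 weights.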
  

In [60]:
%%bash
cd designs/PIM/
timeloop-mapper arch/*.yaml arch/components/*.yaml mapper/* constraints/* ../../layer_shapes/tiny_layer.yaml

  _______                __                
 /_  __(_)___ ___  ___  / /___  ____  ____ 
  / / / / __ `__ \/ _ \/ / __ \/ __ \/ __ \
 / / / / / / / / /  __/ / /_/ / /_/ / /_/ /
/_/ /_/_/ /_/ /_/\___/_/\____/\____/ .___/ 
                                  /_/      

Problem configuration complete.
execute:/usr/local/bin/accelergy arch/system_PIM.yaml arch/components/A2D_conversion_system.yaml arch/components/ADC_SimpleMulticast.yaml arch/components/D2A_conversion_system.yaml arch/components/DAC_SimpleMulticast.yaml arch/components/digital_accumulation_system.yaml arch/components/memcell_compute.yaml arch/components/storage.yaml mapper/mapper.yaml constraints/constraints.yaml ../../layer_shapes/tiny_layer.yaml --oprefix timeloop-mapper. -o ./ > timeloop-mapper.accelergy.log 2>&1
Generate Accelergy ERT (energy reference table) to replace internal energy model.
Generate Accelergy ART (area reference table) to replace internal area model.
Architecture configuration complete.
Using threads = 

[  0] STATEMENT: 15000 invalid mappings (15000 fanout, 0 capacity) found since the last valid mapping, terminating search.
[  4] STATEMENT: 15000 invalid mappings (15000 fanout, 0 capacity) found since the last valid mapping, terminating search.
[  3] STATEMENT: 15000 invalid mappings (15000 fanout, 0 capacity) found since the last valid mapping, terminating search.
[  7] STATEMENT: 15000 invalid mappings (15000 fanout, 0 capacity) found since the last valid mapping, terminating search.
[  1] STATEMENT: 15000 invalid mappings (15000 fanout, 0 capacity) found since the last valid mapping, terminating search.
[  2] STATEMENT: 15000 invalid mappings (15000 fanout, 0 capacity) found since the last valid mapping, terminating search.
[  5] STATEMENT: 15000 invalid mappings (15000 fanout, 0 capacity) found since the last valid mapping, terminating search.
[  6] STATEMENT: 15000 invalid mappings (15000 fanout, 0 capacity) found since the last valid mapping, terminating search.


In [None]:
# the Question 4.2.1 chart
d = {'scratchpad access energy (pJ)': [],   # fill in your answer here
     'mac compute energy (pJ)': [],         # fill in your answer here  
     'D2A_NoC average energy (pJ)': [],     # fill in your answer here
     'A2D_NoC average energy (pJ)': [],     # fill in your answer here
    }
df = pd.DataFrame(data=d)
print(df.to_string(index=False, justify='center'))

### Question 4.3

Navigate to `designs/PIM_large`. 

In this folder, we provide you with an architecture with a larger PE array of size 144*18. 

1. Run `timeloop-mapper arch/*.yaml arch/components/*.yaml mapper/* constraints/* ../../layer_shapes/tiny_layer.yaml`. Is timeloop able to find any mappings? If not, in 1 or 2 sentences, explain why. If yes, provide the number of cycles and total energy consumption for running the workload.

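  Yes. The mapper output below reports utilization 1.00 at 4.447 pJ/MACC. With the 2 x 5 x 5 = 50 temporal iterations (N, P, Q) in these mappings and 129,600 total MACs, this corresponds to about 50 cycles and roughly 0.58 uJ of total energy.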
 


In [61]:
%%bash
cd designs/PIM_large/
timeloop-mapper arch/*.yaml arch/components/*.yaml mapper/* constraints/* ../../layer_shapes/tiny_layer.yaml

  _______                __                
 /_  __(_)___ ___  ___  / /___  ____  ____ 
  / / / / __ `__ \/ _ \/ / __ \/ __ \/ __ \
 / / / / / / / / /  __/ / /_/ / /_/ / /_/ /
/_/ /_/_/ /_/ /_/\___/_/\____/\____/ .___/ 
                                  /_/      

Problem configuration complete.
execute:/usr/local/bin/accelergy arch/system_PIM.yaml arch/components/A2D_conversion_system.yaml arch/components/ADC_SimpleMulticast.yaml arch/components/D2A_conversion_system.yaml arch/components/DAC_SimpleMulticast.yaml arch/components/digital_accumulation_system.yaml arch/components/memcell_compute.yaml arch/components/storage.yaml mapper/mapper.yaml constraints/constraints.yaml ../../layer_shapes/tiny_layer.yaml --oprefix timeloop-mapper. -o ./ > timeloop-mapper.accelergy.log 2>&1
Generate Accelergy ERT (energy reference table) to replace internal energy model.
Generate Accelergy ART (area reference table) to replace internal area model.
Architecture configuration complete.
Using threads = 

[  1] Utilization = 1.00 | pJ/MACC =    4.447 | L3[IO] P5 - L2[IO] Q5 - L1[] N2 M16Y C18X S3X R3X - L0[W] Q1 
[  2] Utilization = 1.00 | pJ/MACC =    4.447 | L3[IO] P5 - L2[IO] Q5 N2 - L1[] Q1 M16Y C18X S3X R3X - L0[W] Q1 
[  5] Utilization = 1.00 | pJ/MACC =    4.447 | L3[IO] P5 - L2[IO] Q1 - L1[] Q5 N2 M16Y C18X S3X R3X - L0[W] Q1 
[  6] Utilization = 1.00 | pJ/MACC =    4.447 | L3[IO] Q5 - L2[IO] Q1 - L1[] N2 M16Y C18X S3X R3X - L0[W] P5 
[  4] Utilization = 1.00 | pJ/MACC =    4.447 | L3[IO] Q5 - L2[IO] Q1 - L1[] Q1 M16Y C18X S3X R3X - L0[W] P5 N2 
[  7] Utilization = 1.00 | pJ/MACC =    4.447 | L3[IO] P5 - L2[IO] Q1 - L1[] Q1 M16Y C18X S3X R3X - L0[W] Q5 N2 
[  3] Utilization = 1.00 | pJ/MACC =    4.447 | L3[IO] N2 P5 - L2[IO] Q5 - L1[] Q1 M16Y C18X S3X R3X - L0[W] Q1 
[  0] Utilization = 1.00 | pJ/MACC =    4.447 | L3[IO] Q5 - L2[IO] P5 - L1[] N2 M16Y C18X S3X R3X - L0[W] Q1 
[  0] STATEMENT: 800 suboptimal mappings found since the last upgrade, terminating search.
[  7] STATEMEN

2. Your circuit designer has invented a very low-power 8-bit ADC, which consumes only half of the energy per conversion. We call this type of ADC the `low_power_SAR` ADC. You decide to model a design that integrates this new `low_power_SAR` ADC unit. Please perform the following updates and fill in the table below.


 - Update the `designs/PIM_large/arch/components/A2D_conversion_system.yaml` appropriately to replace the old `SAR` ADC with the new `low_power_SAR` ADC.
 
 - Update the energy tables at `PIM_estimation_tables/32nm_data/data/ADC.csv` for the 9-bit `low_power_SAR` ADC used in this design.

*Hint: mac compute energy should not be a large number, e.g., >100. If it is, you probably restarted/recreated the docker container and therefore erased the PIM plug-in path added by the accelergyTables command in the readme. Please rerun:*

```
accelergyTables -r /home/workspace/<your-lab4-folder-name>/PIM_estimation_tables
```

In [69]:
%%bash
cd designs/PIM_large/
timeloop-mapper arch/*.yaml arch/components/*.yaml mapper/* constraints/* ../../layer_shapes/tiny_layer.yaml

  _______                __                
 /_  __(_)___ ___  ___  / /___  ____  ____ 
  / / / / __ `__ \/ _ \/ / __ \/ __ \/ __ \
 / / / / / / / / /  __/ / /_/ / /_/ / /_/ /
/_/ /_/_/ /_/ /_/\___/_/\____/\____/ .___/ 
                                  /_/      

Problem configuration complete.
execute:/usr/local/bin/accelergy arch/system_PIM.yaml arch/components/A2D_conversion_system.yaml arch/components/ADC_SimpleMulticast.yaml arch/components/D2A_conversion_system.yaml arch/components/DAC_SimpleMulticast.yaml arch/components/digital_accumulation_system.yaml arch/components/memcell_compute.yaml arch/components/storage.yaml mapper/mapper.yaml constraints/constraints.yaml ../../layer_shapes/tiny_layer.yaml --oprefix timeloop-mapper. -o ./ > timeloop-mapper.accelergy.log 2>&1
Generate Accelergy ERT (energy reference table) to replace internal energy model.
Generate Accelergy ART (area reference table) to replace internal area model.
Architecture configuration complete.
Using threads = 

[  0] Utilization = 1.00 | pJ/MACC =    3.626 | L3[IO] Q5 - L2[IO] P5 - L1[] N2 M16Y C18X S3X R3X - L0[W] Q1 
[  5] Utilization = 1.00 | pJ/MACC =    3.626 | L3[IO] P5 - L2[IO] Q1 - L1[] Q5 N2 M16Y C18X S3X R3X - L0[W] Q1 
[  1] Utilization = 1.00 | pJ/MACC =    3.626 | L3[IO] P5 - L2[IO] Q5 - L1[] N2 M16Y C18X S3X R3X - L0[W] Q1 
[  2] Utilization = 1.00 | pJ/MACC =    3.626 | L3[IO] P5 - L2[IO] Q5 N2 - L1[] Q1 M16Y C18X S3X R3X - L0[W] Q1 
[  6] Utilization = 1.00 | pJ/MACC =    3.626 | L3[IO] Q5 - L2[IO] Q1 - L1[] N2 M16Y C18X S3X R3X - L0[W] P5 
[  4] Utilization = 1.00 | pJ/MACC =    3.626 | L3[IO] Q5 - L2[IO] Q1 - L1[] Q1 M16Y C18X S3X R3X - L0[W] P5 N2 
[  7] Utilization = 1.00 | pJ/MACC =    3.626 | L3[IO] P5 - L2[IO] Q1 - L1[] Q1 M16Y C18X S3X R3X - L0[W] Q5 N2 
[  3] Utilization = 1.00 | pJ/MACC =    3.626 | L3[IO] N2 P5 - L2[IO] Q5 - L1[] Q1 M16Y C18X S3X R3X - L0[W] Q1 
[  0] STATEMENT: 800 suboptimal mappings found since the last upgrade, terminating search.
[  4] STATEMEN

In [70]:
# the Question 4.3.2 chart
# 3.760
print('\n== Static Hardware Properties ==')
d = {'scratchpad access energy': ['0.00 pJ'],         # fill in your answer here
     '  mac compute energy': ['33177.60 pJ'],      # fill in your answer here
     '  D2A_NoC average energy': ['3110.40 pJ'],  # fill in your answer here
     '  A2D_NoC average energy': ['93593.60 pJ'],  # fill in your answer here
    }
df = pd.DataFrame(data=d)
print(df.to_string(index=False, justify='center'))

print('\n== Runtime Stats ==')
d = {'total cycles running tiny_layer':[50],        # fill in your answer here
     '  total energy running tiny_layer':['0.47 uJ'] # fill in your answer here
    }
df = pd.DataFrame(data=d)
print(df.to_string(index=False, justify='center'))


== Static Hardware Properties ==
scratchpad access energy   mac compute energy   D2A_NoC average energy   A2D_NoC average energy
        0.00 pJ              33177.60 pJ             3110.40 pJ              93593.60 pJ       

== Runtime Stats ==
 total cycles running tiny_layer   total energy running tiny_layer
               50                             0.47 uJ             
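
The runtime stats above can be related to the two mapper runs with a little arithmetic. This is a sketch under two assumptions: total energy equals pJ/MACC times the number of MACs, and the cycle count equals the product of the temporal loop bounds (N2, P5, Q5) because all of M, C, R, and S are mapped spatially.

```python
import math

# tiny_layer dimensions as they appear in this notebook.
layer = {'M': 16, 'C': 18, 'R': 3, 'S': 3, 'P': 5, 'Q': 5, 'N': 2}
total_macs = math.prod(layer.values())

pj_sar = 4.447        # pJ/MACC with the original SAR ADC
pj_low_power = 3.626  # pJ/MACC with the low_power_SAR ADC

energy_sar_uj = total_macs * pj_sar * 1e-6
energy_low_power_uj = total_macs * pj_low_power * 1e-6
saving = 1 - pj_low_power / pj_sar

# Only N, P, and Q remain as temporal loops in the reported mappings.
cycles = layer['N'] * layer['P'] * layer['Q']

print(f"cycles: {cycles}")
print(f"SAR: {energy_sar_uj:.2f} uJ, low_power_SAR: {energy_low_power_uj:.2f} uJ")
print(f"total energy saving: {saving:.1%}")
```

Halving the ADC conversion energy lowers total energy by roughly 18 percent rather than 50 percent, since the ADC is only one of several contributors.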
