# Your name: 

This notebook implements Timeloop/Accelergy-based energy estimation for the neural network model you trained. This part must be run inside the Docker image we provide and does not require GPU support.

One way to reduce profiling time is to design a model with repeated layers, since layers with the same architecture only need to be profiled once.
The profiler also automatically saves the information for each profiled layer to the .json file specified by `profiled_lib_dir`, so the next time the same layer is profiled, the results are available immediately.
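The caching idea described above can be sketched as follows. This is only an illustration of the general pattern, not the provided profiler's actual implementation: the helper names `layer_key` and `profile_with_cache`, the key format, and the placeholder result values are all hypothetical.

```python
import json
import os
import tempfile

def layer_key(layer_type, **shape):
    """Build a deterministic string key from a layer's type and shape (hypothetical format)."""
    return layer_type + ":" + json.dumps(shape, sort_keys=True)

def profile_with_cache(key, run_profiler, lib_path):
    """Return (result, cache_hit). Runs `run_profiler` only if `key` is not in the JSON file."""
    lib = {}
    if os.path.exists(lib_path):
        with open(lib_path) as f:
            lib = json.load(f)
    if key in lib:
        return lib[key], True
    result = run_profiler()
    lib[key] = result
    with open(lib_path, "w") as f:
        json.dump(lib, f, indent=2)
    return result, False

# Stand-in for an expensive Timeloop run; the returned numbers are placeholders.
calls = []
def fake_profiler():
    calls.append(1)
    return {"energy": 1.0, "cycle": 10}

lib_path = os.path.join(tempfile.mkdtemp(), "profiled_lib.json")
key = layer_key("Linear", in_features=84, out_features=84)
first, hit1 = profile_with_cache(key, fake_profiler, lib_path)
second, hit2 = profile_with_cache(key, fake_profiler, lib_path)
print(hit1, hit2, len(calls))
```

The second lookup is served from the JSON file, so the stand-in profiler runs only once. Two layers with identical shapes would produce the same key and share one entry in the same way.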


### 1. Load the model

In [16]:
import torch
import torch.nn as nn
import torch.nn.functional as F

# Change this model class to the architecture you used
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()

In [17]:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, 5)
        self.conv1_bn = nn.BatchNorm2d(16)  # must match conv1's 16 output channels
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(16, 32, 5)
        self.conv2_bn = nn.BatchNorm2d(32)
        self.conv3 = nn.Conv2d(32, 64, 3)
        self.conv3_bn = nn.BatchNorm2d(64)
        self.fc1 = nn.Linear(64 * 8 * 8, 120)
        self.fc1_bn = nn.BatchNorm1d(120)        
        self.dropout1 = nn.Dropout(p=0.5)        
        self.fc2 = nn.Linear(120, 84)
        self.fc2_bn = nn.BatchNorm1d(84)
        self.dropout2 = nn.Dropout(p=0.5)        
        self.fc3 = nn.Linear(84, 84)
        self.fc3_bn = nn.BatchNorm1d(84)
        self.dropout3 = nn.Dropout(p=0.5)        
        self.fc4 = nn.Linear(84, 10)
        self.fc4_bn = nn.BatchNorm1d(10)
        self.dropout4 = nn.Dropout(p=0.5)        

        
    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = F.relu(self.conv2(x))
        x = F.relu(self.conv3(x))
        x = x.view(-1, 64 * 8 * 8)
        #x = self.dropout1(x)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        x = self.fc4(x)
        return x


net = Net()

criterion = nn.CrossEntropyLoss()
#optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
#optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=0.9)
#optimizer = optim.Adam([var1, var2], lr=0.0001)

### 2. Run the Profiler to estimate the peak activation size

In [18]:
from profiler import count_activation_size
peak_activation_size = count_activation_size(
    net=net,
    input_size=(1, 3, 32, 32),
)

print(f"Peak Activation Sizes: {peak_activation_size} Byte")

Peak Activation Sizes: 62464.0 Byte
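One way to arrive at the 62464 bytes reported above is sketched below. It rests on assumptions that the source does not confirm: that only `nn.Conv2d` and `nn.Linear` layers are counted, that each layer's footprint is its input plus its output, and that every element takes 4 bytes (float32). The helper `conv_out` is introduced here for illustration only.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1

BYTES_PER_ELEMENT = 4  # assumption: float32 activations

# Spatial sizes through the second Net defined above, for a 3x32x32 input.
h1 = conv_out(32, 5)            # conv1: 32 -> 28
hp = conv_out(h1, 2, stride=2)  # pool:  28 -> 14
h2 = conv_out(hp, 5)            # conv2: 14 -> 10
h3 = conv_out(h2, 3)            # conv3: 10 -> 8

# (name, input elements, output elements) for each Conv2d/Linear layer.
layers = [
    ("conv1", 3 * 32 * 32, 16 * h1 * h1),
    ("conv2", 16 * hp * hp, 32 * h2 * h2),
    ("conv3", 32 * h2 * h2, 64 * h3 * h3),
    ("fc1", 64 * h3 * h3, 120),
    ("fc2", 120, 84),
    ("fc3", 84, 84),
    ("fc4", 84, 10),
]

sizes = {name: (n_in + n_out) * BYTES_PER_ELEMENT for name, n_in, n_out in layers}
peak_name = max(sizes, key=sizes.get)
peak_bytes = sizes[peak_name]
print(peak_name, peak_bytes)
```

Under these assumptions the peak falls on `conv1`, whose 3072 input elements plus 12544 output elements occupy 62464 bytes. The same shape walk also confirms that the flattened input to `fc1` is 64 * 8 * 8.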


### 3. Run the Profiler for Timeloop/Accelergy

In [19]:
from profiler import Profiler
from datetime import date

today = date.today()
sub_dir = "network-" + today.strftime("%b-%d-%Y")

profiler = Profiler(
    top_dir='workloads',
    sub_dir=sub_dir,
    timeloop_dir='simple_weight_stationary',
    model=net,
    input_size=(3, 32, 32),
    batch_size=1,
    convert_fc=True,
    exception_module_names=[],
    profiled_lib_dir="profiled_lib.json"
)

layer_wise, overall = profiler.profile()

for layer_id, info in layer_wise.items():
    print(f"Name: {info['name']} \t Energy: {info['energy']:.2f} \t Cycle: {info['cycle']} \t Number of same architecture layers: {info['num']}")
    
print(f"\nTotal Energy: {overall['total_energy']/1e9:.8f} mJ \nTotal Cycles: {overall['total_cycle']/1e6:.8f} Million")

print(f"MACs: {overall['macs']}\nNum of Parameters: {overall['num_params']} \nPeak Activation Size: {overall['activation_size']} Byte")




converting nn.Conv2d and nn.Linear in network-Feb-27-2022 model ...
workload file --> /home/workspace/lab2/workloads/network-Feb-27-2022/network-Feb-27-2022_layer1.yaml
workload file --> /home/workspace/lab2/workloads/network-Feb-27-2022/network-Feb-27-2022_layer2.yaml
workload file --> /home/workspace/lab2/workloads/network-Feb-27-2022/network-Feb-27-2022_layer3.yaml
workload file --> /home/workspace/lab2/workloads/network-Feb-27-2022/network-Feb-27-2022_layer4.yaml
workload file --> /home/workspace/lab2/workloads/network-Feb-27-2022/network-Feb-27-2022_layer5.yaml
workload file --> /home/workspace/lab2/workloads/network-Feb-27-2022/network-Feb-27-2022_layer6.yaml
workload file --> /home/workspace/lab2/workloads/network-Feb-27-2022/network-Feb-27-2022_layer7.yaml
conversion complete!

running timeloop to get energy and latency...


100%|██████████| 4/4 [00:47<00:00, 11.87s/it]

timeloop running finished!
Name: /home/workspace/lab2/workloads/network-Feb-27-2022/network-Feb-27-2022_layer1 	 Energy: 12423679.32 	 Cycle: 19600 	 Number of same architecture layers: 1
Name: /home/workspace/lab2/workloads/network-Feb-27-2022/network-Feb-27-2022_layer2 	 Energy: 33959447.72 	 Cycle: 5000 	 Number of same architecture layers: 1
Name: /home/workspace/lab2/workloads/network-Feb-27-2022/network-Feb-27-2022_layer3 	 Energy: 16209900.67 	 Cycle: 4608 	 Number of same architecture layers: 1
Name: /home/workspace/lab2/workloads/network-Feb-27-2022/network-Feb-27-2022_layer4 	 Energy: 69751849.65 	 Cycle: 2048 	 Number of same architecture layers: 1
Name: /home/workspace/lab2/workloads/network-Feb-27-2022/network-Feb-27-2022_layer4 	 Energy: 1478502.96 	 Cycle: 60 	 Number of same architecture layers: 1
Name: /home/workspace/lab2/workloads/network-Feb-27-2022/network-Feb-27-2022_layer6 	 Energy: 1037421.21 	 Cycle: 36 	 Number of same architecture layers: 1
Name: /home/worksp
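As a cross-check on the `MACs` and `Num of Parameters` values the profiler prints, the counts for the convolutional and fully connected layers can be worked out by hand. The sketch below uses the standard formulas for the second `Net` defined above and a 3x32x32 input. The helper names are hypothetical, and the profiler may count differently (for example, it may include the unused BatchNorm parameters), so its totals need not match these.

```python
def conv_macs(c_in, c_out, kernel, h_out, w_out):
    """Multiply-accumulates for a square-kernel Conv2d layer."""
    return c_out * h_out * w_out * c_in * kernel * kernel

def conv_params(c_in, c_out, kernel):
    """Weights plus one bias per output channel."""
    return c_out * c_in * kernel * kernel + c_out

def linear_macs(n_in, n_out):
    """Multiply-accumulates for a Linear layer."""
    return n_in * n_out

def linear_params(n_in, n_out):
    """Weights plus one bias per output feature."""
    return n_in * n_out + n_out

# (name, MACs, parameters); output sizes follow from 32 -> 28 -> pool 14 -> 10 -> 8.
layers = [
    ("conv1", conv_macs(3, 16, 5, 28, 28), conv_params(3, 16, 5)),
    ("conv2", conv_macs(16, 32, 5, 10, 10), conv_params(16, 32, 5)),
    ("conv3", conv_macs(32, 64, 3, 8, 8), conv_params(32, 64, 3)),
    ("fc1", linear_macs(64 * 8 * 8, 120), linear_params(64 * 8 * 8, 120)),
    ("fc2", linear_macs(120, 84), linear_params(120, 84)),
    ("fc3", linear_macs(84, 84), linear_params(84, 84)),
    ("fc4", linear_macs(84, 10), linear_params(84, 10)),
]

total_macs = sum(macs for _, macs, _ in layers)
total_params = sum(params for _, _, params in layers)

for name, macs, params in layers:
    print(f"{name}: {macs} MACs, {params} parameters")
print(f"Total: {total_macs} MACs, {total_params} parameters")
```

A breakdown like this shows where the compute and the weights concentrate: `conv2` and `conv3` account for most of the MACs, while `fc1` holds most of the parameters.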


