# Evaluation of a Convolutional Neural Network Architecture
Course project for the class _IoT Based Smart Systems_, carried out by _Riccardo Maria Pesce_ during Academic Year _2021-2022_, under the kind supervision of Professor _Maurizio Palesi_.

## Introduction

### Motivation
Thanks to the scientific and technological advancements that have taken place in the past few years, AI techniques have been employed with great success in many different fields.

While Deep Learning models are still trained in the cloud (using state-of-the-art machines with specialized hardware such as _GPUs_ or _TPUs_), it is becoming increasingly common to perform inference on the edge, i.e. on the device itself, so as to reduce latency and optimize bandwidth usage, a relevant issue for constrained devices.

### Objective
The objective of this thesis is to analyze the performance of a _CNN_ (Convolutional Neural Network) architecture on constrained hardware, and to see how the mapping affects performance in terms of throughput and energy consumption.
In particular, performance improvements can be obtained in the following ways:

* By reducing data movements, since communication is nowadays more expensive than computation in terms of energy. We can reduce data movements either by reducing the number of memory accesses, for instance by employing DRAM located near the _PEs_ (Processing Elements), or by compressing data so that fewer bits are needed to represent it, thus making each data movement cheaper. These observations show that __memory is the main bottleneck__.

* By maximizing the parallelism of the PEs.
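
The two ways of reducing data movements described above can be illustrated with a rough back-of-the-envelope sketch. The function `memory_traffic_bits` and all the numbers used with it are hypothetical and chosen only for illustration; they do not come from any measured hardware.

```python
def memory_traffic_bits(num_values, uses_per_value, bits_per_value, reuse_locally):
    """Estimate the bits moved from main memory for one group of values.

    Hypothetical model: without local reuse, every use of a value triggers
    a fetch; with local reuse, each value is fetched only once and then
    kept close to the PEs.
    """
    fetches_per_value = 1 if reuse_locally else uses_per_value
    return num_values * fetches_per_value * bits_per_value


# Illustrative numbers only: 1000 values, each used 10 times.
baseline = memory_traffic_bits(1000, 10, 32, reuse_locally=False)
with_reuse = memory_traffic_bits(1000, 10, 32, reuse_locally=True)
with_reuse_and_8_bits = memory_traffic_bits(1000, 10, 8, reuse_locally=True)

print(baseline, with_reuse, with_reuse_and_8_bits)
```

In this toy model, fewer fetches and fewer bits per value multiply together, which is why both techniques are usually combined.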

### Mapping
Mapping defines the order of execution of the MAC (multiply-accumulate) operations. The ordering can be either _temporal_, when operations are mapped serially onto the same PE (i.e. the temporal order of execution), or _spatial_, when operations are mapped onto multiple PEs to execute in parallel.
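
As a minimal sketch of the difference, consider a dot product, which is a plain sequence of MAC operations. The helper names `mac_temporal` and `mac_spatial` are assumptions made for this example, and the PEs are only simulated with ordinary loops; the step count stands in for latency and ignores the cost of the final reduction.

```python
def mac_temporal(weights, inputs):
    """Temporal mapping: a single PE performs every MAC, one after another."""
    acc = 0
    steps = 0
    for w, x in zip(weights, inputs):
        acc += w * x
        steps += 1
    return acc, steps


def mac_spatial(weights, inputs, num_pes):
    """Spatial mapping: the MACs are distributed over num_pes PEs.

    Each simulated PE still iterates temporally over its own slice;
    the PEs themselves are assumed to run in parallel.
    """
    partial_sums = []
    steps_per_pe = []
    for pe in range(num_pes):
        acc = 0
        steps = 0
        # PE number `pe` handles elements pe, pe + num_pes, pe + 2 * num_pes, ...
        for i in range(pe, len(weights), num_pes):
            acc += weights[i] * inputs[i]
            steps += 1
        partial_sums.append(acc)
        steps_per_pe.append(steps)
    # With parallel PEs, latency is set by the busiest PE.
    return sum(partial_sums), max(steps_per_pe)


print(mac_temporal([1, 2, 3, 4], [5, 6, 7, 8]))
print(mac_spatial([1, 2, 3, 4], [5, 6, 7, 8], num_pes=2))
```

Both functions compute the same result; only the number of serial steps changes with the mapping.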

### Introduction to Timeloop and Accelergy
In order to correctly design a DNN accelerator we need to cater for the different DNN architectures, and for each of them we have to find an optimal mapping of its workload onto a specific hardware architecture.
This is what Timeloop and Accelergy do, and in particular:
* Timeloop generates a characterization of the energy efficiency of each workload, through a mapper which finds the optimal way to schedule operations on a specified architecture. To do so, Timeloop uses a concise and unified representation of the core elements that are generally found in DNN accelerators.
* Accelergy, on the basis of the characterization created above, provides a reasonably accurate estimate of energy consumption.
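
To give an intuition of what searching for a mapping means, the following toy sketch enumerates every way of splitting a single loop into a spatial factor (number of PEs used) and a temporal factor (serial steps per PE), and keeps the one with the lowest score. This is not Timeloop's algorithm or cost model: `enumerate_mappings`, `toy_score`, `best_mapping` and the `pe_overhead` parameter are all hypothetical, and the score is an arbitrary mix of cycles and energy chosen only to make the trade-off visible.

```python
def enumerate_mappings(loop_bound, num_pes):
    """Yield (spatial, temporal) pairs with spatial * temporal == loop_bound."""
    for spatial in range(1, min(loop_bound, num_pes) + 1):
        if loop_bound % spatial == 0:
            yield spatial, loop_bound // spatial


def toy_score(spatial, temporal, pe_overhead=1):
    """Hypothetical score (lower is better), in arbitrary units."""
    cycles = temporal  # serial steps on each PE
    # One unit of energy per MAC, plus a fixed overhead per active PE.
    energy = spatial * temporal + pe_overhead * spatial
    return cycles + energy


def best_mapping(loop_bound, num_pes, score=toy_score):
    """Exhaustively search the candidate mappings and return the best one."""
    return min(enumerate_mappings(loop_bound, num_pes), key=lambda m: score(*m))


print(list(enumerate_mappings(12, 8)))
print(best_mapping(16, 16))
```

Real mappers explore a far larger space (several loops, several memory levels, loop orderings) and rely on a detailed energy model, but the principle of enumerating candidates and scoring them is the same.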

## ResNet-50

### Rationale

In this thesis, we want to simulate