# Vision Transformer Accelerator ASIC for In-Ear Sleep Staging

by Tristan Robitaille

Supervisor: Professor Xilin Liu

April 2024

# B.A.Sc. Thesis





## ESC499 Engineering Science Thesis

### Tristan Robitaille

Student number: 1006343397

Email: tristan.robitaille@mail.utoronto.ca

Supervisor: Professor Xilin Liu Email: xilinliu@ece.utoronto.ca

April 12th, 2024

### Abstract

Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Ut purus elit, vestibulum ut, placerat ac, adipiscing vitae, felis. Curabitur dictum gravida mauris. Nam arcu libero, nonummy eget, consectetuer id, vulputate a, magna. Donec vehicula augue eu neque. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Mauris ut leo. Cras viverra metus rhoncus sem. Nulla et lectus vestibulum urna fringilla ultrices. Phasellus eu tellus sit amet tortor gravida placerat. Integer sapien est, iaculis in, pretium quis, viverra ac, nunc. Praesent eget sem vel leo ultrices bibendum. Aenean faucibus. Morbi dolor nulla, malesuada eu, pulvinar at, mollis ac, nulla. Curabitur auctor semper nulla. Donec varius orci eget risus. Duis nibh mi, congue eu, accumsan eleifend, sagittis quis, diam. Duis eget orci sit amet orci dignissim rutrum.

Keywords: Sleep staging, ASIC accelerator, vision transformer, computer architecture

## Acknowledgements

I would like to express my gratitude to my supervisor, Prof. Xilin Liu, for his guidance and support throughout the project. He has given me the freedom to explore new ideas and has provided me with the support and tools I needed.

I would also like to thank my father, Claude Robitaille, for letting me remotely use his workstation to train the model and run the accuracy study. He has also helped review the code for the functional simulation.

In addition, I owe much to the professors who taught me the fundamentals of computer architecture at the University of Toronto: Profs. Jason Anderson, Natalie Enright Jerger, Andreas Moshovos and Mark C. Jeffrey.

Throughout this project, I have made extensive use of the Compute Canada cluster, which has provided me with the computational resources I needed to run the simulations and train the model. I would like to thank the staff at Compute Canada for their initiative. I am also appreciative of the tools provided by the Canadian Microelectronics Corporation, which have been instrumental in the hardware implementation of the accelerator.

I would also like to acknowledge the work of Professors Lisa Romkey and Alan Chong, who organized this thesis project for us and ensured a structured and productive environment.

Finally, I would like to thank my family and friends for their support and encouragement throughout this project. I am grateful for their patience and understanding during this time.

## Contents

- 1 Introduction
- 2 Background
- 3 How to Design an AI Accelerator
  - 3.1 Model Prototyping
  - 3.2 Accelerator Functional Simulation
  - 3.3 Accelerator Hardware Implementation
- 4 Design Overview
  - 4.1 Vision Transformer
  - 4.2 Accelerator Architecture
    - 4.2.1 Centralized vs. Distributed Architecture
    - 4.2.2 Master Architecture
    - 4.2.3 Data and Control Bus
    - 4.2.4 Compute-in-Memory: Fixed-Point Accuracy
    - 4.2.5 Compute-in-Memory: Memory
    - 4.2.6 Compute-in-Memory: Compute Modules
  - 4.3 A Note About Software-Hardware Co-Design
- 5 Evaluation of Performance Metrics
  - 5.1 Vision Transformer
  - 5.2 Accelerator
- 6 Future Work
- 7 Conclusion
- A Codebase Statistics
- B Reflection on Learnings and Experience Gained

## List of Figures

## List of Tables

## List of Abbreviations

**ASIC** Application-Specific Integrated Circuit

**CMOS** Complementary Metal Oxide Semiconductor

**HDL** Hardware Description Language

### 1 Introduction

See [1]. I am making an Application-Specific Integrated Circuit (ASIC). It's small, low-power and fast. It's better than Google's.

### 2 Background

### 3 How to Design an AI Accelerator

Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Ut purus elit, vestibulum ut, placerat ac, adipiscing vitae, felis. Curabitur dictum gravida mauris. Nam arcu libero, nonummy eget, consectetuer id, vulputate a, magna. Donec vehicula augue eu neque. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Mauris ut leo. Cras viverra metus rhoncus sem. Nulla et lectus vestibulum urna fringilla ultrices. Phasellus eu tellus sit amet tortor gravida placerat. Integer sapien est, iaculis in, pretium quis, viverra ac, nunc. Praesent eget sem vel leo ultrices bibendum. Aenean faucibus. Morbi dolor nulla, malesuada eu, pulvinar at, mollis ac, nulla. Curabitur auctor semper nulla. Donec varius orci eget risus. Duis nibh mi, congue eu, accumsan eleifend, sagittis quis, diam. Duis eget orci sit amet orci dignissim rutrum.

#### 3.1 Model Prototyping

Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Ut purus elit, vestibulum ut, placerat ac, adipiscing vitae, felis. Curabitur dictum gravida mauris. Nam arcu libero, nonummy eget, consectetuer id, vulputate a, magna. Donec vehicula augue eu neque. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Mauris ut leo. Cras viverra metus rhoncus sem. Nulla et lectus vestibulum urna fringilla ultrices. Phasellus eu tellus sit amet tortor gravida placerat. Integer sapien est, iaculis in, pretium quis, viverra ac, nunc. Praesent eget sem vel leo ultrices bibendum. Aenean faucibus. Morbi dolor nulla, malesuada eu, pulvinar at, mollis ac, nulla. Curabitur auctor semper nulla. Donec varius orci eget risus. Duis nibh mi, congue eu, accumsan eleifend, sagittis quis, diam. Duis eget orci sit amet orci dignissim rutrum.

#### 3.2 Accelerator Functional Simulation

#### 3.3 Accelerator Hardware Implementation

### 4 Design Overview

Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Ut purus elit, vestibulum ut, placerat ac, adipiscing vitae, felis. Curabitur dictum gravida mauris. Nam arcu libero, nonummy eget, consectetuer id, vulputate a, magna. Donec vehicula augue eu neque. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Mauris ut leo. Cras viverra metus rhoncus sem. Nulla et lectus vestibulum urna fringilla ultrices. Phasellus eu tellus sit amet tortor gravida placerat. Integer sapien est, iaculis in, pretium quis, viverra ac, nunc. Praesent eget sem vel leo ultrices bibendum. Aenean faucibus. Morbi dolor nulla, malesuada eu, pulvinar at, mollis ac, nulla. Curabitur auctor semper nulla. Donec varius orci eget risus. Duis nibh mi, congue eu, accumsan eleifend, sagittis quis, diam. Duis eget orci sit amet orci dignissim rutrum.

#### 4.1 Vision Transformer

Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Ut purus elit, vestibulum ut, placerat ac, adipiscing vitae, felis. Curabitur dictum gravida mauris. Nam arcu libero, nonummy eget, consectetuer id, vulputate a, magna. Donec vehicula augue eu neque. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Mauris ut leo. Cras viverra metus rhoncus sem. Nulla et lectus vestibulum urna fringilla ultrices. Phasellus eu tellus sit amet tortor gravida placerat. Integer sapien est, iaculis in, pretium quis, viverra ac, nunc. Praesent eget sem vel leo ultrices bibendum. Aenean faucibus. Morbi dolor nulla, malesuada eu, pulvinar at, mollis ac, nulla. Curabitur auctor semper nulla. Donec varius orci eget risus. Duis nibh mi, congue eu, accumsan eleifend, sagittis quis, diam. Duis eget orci sit amet orci dignissim rutrum.

#### 4.2 Accelerator Architecture

- 4.2.1 Centralized vs. Distributed Architecture
- 4.2.2 Master Architecture
- 4.2.3 Data and Control Bus
- 4.2.4 Compute-in-Memory: Fixed-Point Accuracy
- 4.2.5 Compute-in-Memory: Memory
- 4.2.6 Compute-in-Memory: Compute Modules

#### 4.3 A Note About Software-Hardware Co-Design

Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Ut purus elit, vestibulum ut, placerat ac, adipiscing vitae, felis. Curabitur dictum gravida mauris. Nam arcu libero, nonummy eget, consectetuer id, vulputate a, magna. Donec vehicula augue eu neque. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Mauris ut leo. Cras viverra metus rhoncus sem. Nulla et lectus vestibulum urna fringilla ultrices. Phasellus eu tellus sit amet tortor gravida placerat. Integer sapien est, iaculis in, pretium quis, viverra ac, nunc. Praesent eget sem vel leo ultrices bibendum. Aenean faucibus. Morbi dolor nulla, malesuada eu, pulvinar at, mollis ac, nulla. Curabitur auctor semper nulla. Donec varius orci eget risus. Duis nibh mi, congue eu, accumsan eleifend, sagittis quis, diam. Duis eget orci sit amet orci dignissim rutrum.

### 5 Evaluation of Performance Metrics

Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Ut purus elit, vestibulum ut, placerat ac, adipiscing vitae, felis. Curabitur dictum gravida mauris. Nam arcu libero, nonummy eget, consectetuer id, vulputate a, magna. Donec vehicula augue eu neque. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Mauris ut leo. Cras viverra metus rhoncus sem. Nulla et lectus vestibulum urna fringilla ultrices. Phasellus eu tellus sit amet tortor gravida placerat. Integer sapien est, iaculis in, pretium quis, viverra ac, nunc. Praesent eget sem vel leo ultrices bibendum. Aenean faucibus. Morbi dolor nulla, malesuada eu, pulvinar at, mollis ac, nulla. Curabitur auctor semper nulla. Donec varius orci eget risus. Duis nibh mi, congue eu, accumsan eleifend, sagittis quis, diam. Duis eget orci sit amet orci dignissim rutrum.

#### 5.1 Vision Transformer

Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Ut purus elit, vestibulum ut, placerat ac, adipiscing vitae, felis. Curabitur dictum gravida mauris. Nam arcu libero, nonummy eget, consectetuer id, vulputate a, magna. Donec vehicula augue eu neque. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Mauris ut leo. Cras viverra metus rhoncus sem. Nulla et lectus vestibulum urna fringilla ultrices. Phasellus eu tellus sit amet tortor gravida placerat. Integer sapien est, iaculis in, pretium quis, viverra ac, nunc. Praesent eget sem vel leo ultrices bibendum. Aenean faucibus. Morbi dolor nulla, malesuada eu, pulvinar at, mollis ac, nulla. Curabitur auctor semper nulla. Donec varius orci eget risus. Duis nibh mi, congue eu, accumsan eleifend, sagittis quis, diam. Duis eget orci sit amet orci dignissim rutrum.

#### 5.2 Accelerator

### 6 Future Work

### 7 Conclusion

## References

[1] Xilin Liu and Andrew G Richardson. "Edge deep learning for neural implants: a case study of seizure detection and prediction". In: *Journal of Neural Engineering* 18.4 (2021), p. 046034.

## A Codebase Statistics

Readers may find it useful to see the size of the codebase needed to develop a project of similar scale. The code for this project is available in my GitHub repository. The following table breaks down the number of files and lines of code by file type.

Table I: Line and file count per file type in the codebase.

| File type     | File count | Line count | Percent of total |
|---------------|------------|------------|------------------|
| Python        | 12         | 3000       | 33.7%            |
| SystemVerilog | 12         | 2500       | 30.4%            |
| C++           | 12         | 1250       | 18.9%            |
| TeX           | 12         | 670        | 8.2%             |
| Shell         | 12         | 300        | 4.3%             |
| Other         | 12         | 20         | 4.5%             |
| Total         | 60         | 13,000     | 100%             |

In addition, there have been 200 commits to the repository.
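A breakdown like the one in Table I could be generated automatically. The following is a minimal Python sketch, not the script used to produce the table: the extension-to-label mapping `EXTENSION_LABELS` and the function names `count_lines` and `format_table` are assumptions made for illustration, and the choice of which extensions count as which file type is hypothetical.

```python
from collections import Counter
from pathlib import Path

# Hypothetical mapping from file extension to the labels used in Table I.
EXTENSION_LABELS = {
    ".py": "Python",
    ".sv": "SystemVerilog",
    ".cpp": "C++",
    ".hpp": "C++",
    ".tex": "TeX",
    ".sh": "Shell",
}


def count_lines(root):
    """Return (file count per type, line count per type) for files under root."""
    file_counts = Counter()
    line_counts = Counter()
    for path in Path(root).rglob("*"):
        # Skip directories and anything inside the Git metadata folder.
        if not path.is_file() or ".git" in path.parts:
            continue
        label = EXTENSION_LABELS.get(path.suffix.lower(), "Other")
        # Undecodable bytes are ignored so that stray binary files do not abort the scan.
        with path.open("r", errors="ignore") as handle:
            n_lines = sum(1 for _ in handle)
        file_counts[label] += 1
        line_counts[label] += n_lines
    return file_counts, line_counts


def format_table(file_counts, line_counts):
    """Render the counts as a markdown table, largest line count first."""
    total = sum(line_counts.values())
    rows = [
        "| File type | File count | Line count | Percent of total |",
        "|---|---|---|---|",
    ]
    for label, lines in line_counts.most_common():
        percent = 100.0 * lines / total if total else 0.0
        rows.append(f"| {label} | {file_counts[label]} | {lines} | {percent:.1f}% |")
    rows.append(f"| Total | {sum(file_counts.values())} | {total} | 100% |")
    return "\n".join(rows)


if __name__ == "__main__":
    files, lines = count_lines(".")
    print(format_table(files, lines))
```

Run from the repository root, the script prints one row per file type plus a total row. Computing the totals from the per-type counts keeps the file counts, line counts and percentages consistent with one another.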

## B Reflection on Learnings and Experience Gained

