In this project, a Tensor Processing Unit (TPU) is implemented on an FPGA.

Hardware Accelerators and Tensor Processing Units (TPUs) in Modern Computing

The advancement of computing technology faces challenges from the diminishing returns of Moore's Law and the exponential growth of data. As a result, the computing needs of both client and server applications are outpacing the capabilities of general-purpose processors. To address this, specialized hardware co-processors that focus on specific tasks have emerged as a solution. These hardware accelerators can perform complex computations 10-100 times faster than general-purpose processors while consuming less power. Given the rising cost of power across the computing spectrum, hardware accelerators have been widely adopted.

In parallel, data is being created at an exponential rate, driving the need for algorithms that can extract valuable insights from it. Neural networks, deployed in embedded, mobile, and data-center applications, have become the dominant way of achieving this goal. However, neural networks rely heavily on matrix multiplication and convolution, which place a significant burden on general-purpose processors. This demand has paved the way for Tensor Processing Units (TPUs), which were first commercialized by Google for data-center use.

It is important to note that neural networks operate in two phases: training and inference (prediction). While both phases are computationally intensive and the subject of cutting-edge research, the focus here is primarily on the inference phase, since a neural network spends most of its lifetime making predictions rather than being trained.

Current TPUs implement the matrix multiplication and convolution operations mentioned above, along with the common activation functions used in neural networks. This is typically achieved with a highly pipelined datapath known as a systolic array, which allows an NxN matrix multiplication to complete in roughly 2N cycles. Like most hardware accelerators, TPUs must move large volumes of data efficiently between the accelerator and the host, which can introduce undesired overhead. They also require high on-chip memory bandwidth, since the systolic array needs substantial input and output bandwidth to sustain computation.
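
To make the systolic-array dataflow concrete, the following is a minimal NumPy sketch (not the HDL in this repository) of an output-stationary NxN systolic array: operands enter at the array edges with a one-cycle skew per row/column, hop to a neighbouring processing element (PE) each cycle, and every PE performs one multiply-accumulate per cycle. The function name and the output-stationary schedule are illustrative assumptions; this simple fill-compute-drain schedule finishes in 3N - 2 cycles, the same O(N) behaviour as the roughly 2N-cycle figure quoted above.

```python
import numpy as np

def systolic_matmul(A, B):
    """Cycle-level sketch of an output-stationary N x N systolic array.

    PE(i, j) accumulates one entry of C = A @ B. Rows of A stream in from
    the left and columns of B stream in from the top, each skewed by one
    cycle per row/column, and operands hop one PE per cycle.
    """
    N = A.shape[0]
    C = np.zeros((N, N))
    a_reg = np.zeros((N, N))   # operand held in each PE's horizontal register
    b_reg = np.zeros((N, N))   # operand held in each PE's vertical register

    cycles = 3 * N - 2         # fill + compute + drain for this schedule
    for t in range(cycles):
        # Operands advance one PE per cycle: A values move right, B values move down.
        a_reg[:, 1:] = a_reg[:, :-1].copy()
        b_reg[1:, :] = b_reg[:-1, :].copy()
        # Inject new operands at the array edges with the per-row/column skew.
        for i in range(N):
            k = t - i
            a_reg[i, 0] = A[i, k] if 0 <= k < N else 0.0
        for j in range(N):
            k = t - j
            b_reg[0, j] = B[k, j] if 0 <= k < N else 0.0
        # Every PE multiplies the operands it currently holds and accumulates locally.
        C += a_reg * b_reg
    return C, cycles

# Quick check against a reference matrix product.
N = 8
A = np.random.rand(N, N)
B = np.random.rand(N, N)
C, cycles = systolic_matmul(A, B)
assert np.allclose(C, A @ B)
print(f"{N}x{N} product completed in {cycles} cycles")
```

Once the array fills, a full wavefront of PEs is active every cycle, so the total latency grows linearly with N even though the number of multiply-accumulates grows as N^3; that pipelining is what makes the roughly-2N-cycle latency possible.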

Google introduced the concept of TPUs in 2013 when their customers began using speech-to-text services extensively. Over a 15-month period, Google designed and fabricated their first-generation TPU, which yielded significant performance improvements in neural network inference. Google now utilizes TPUs as add-in cards in their own data centers.

This overview highlights the significance of hardware accelerators and TPUs in modern computing: the challenges faced by general-purpose processors, the increasing reliance on neural networks, and the role of TPUs in accelerating their computations.
