Dongjoon Park, 박동준 dopark@seas.upenn.edu

Implementation of Computation Group, University of Pennsylvania





How many of you have heard of "FPGAs"?



- FPGA: Field-Programmable Gate Array
- Compare with other hardware platforms...



- + Good at computationally intensive tasks
- + Flexible; can be reprogrammed after manufacturing






- Problem
  - FPGA <u>compilation</u> takes forever

Compilation: the process of translating user code (C, C++, Verilog) into a form that the hardware can interpret and execute



- CPU, GPU (Software): milliseconds, seconds, minutes
- FPGA (Hardware): minutes, hours, days



- Problem
  - FPGA <u>compilation</u> takes forever
  - You have a design... and you wait an hour to test it on the FPGA
  - ... and then you discover a small change you need to make
  - Now you must wait another hour to test the modified design?





- Idea: Fast separate compilations on FPGA
  - Divide-and-conquer strategy!



Vendor tools (from AMD, Intel): slow monolithic compilation



Fast separate compilations in parallel
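The contrast between the two flows above can be sketched with a toy timing model. This is only an illustration under stated assumptions: the partition names and per-partition minutes are made up, the monolithic flow is approximated as the sum of the per-partition times (a simplification, not a measurement of any vendor tool), and the helper names `monolithic_time` and `separate_time` are hypothetical.

```python
import heapq

def monolithic_time(minutes_per_partition: dict) -> float:
    """Toy model: one compile job handles the whole design, so the costs add up."""
    return sum(minutes_per_partition.values())

def separate_time(minutes_per_partition: dict, workers: int) -> float:
    """Toy model: each partition compiles on its own; jobs are spread over `workers`.

    Uses a greedy longest-job-first schedule and returns the resulting
    wall-clock time (the finish time of the busiest worker).
    """
    if workers < 1:
        raise ValueError("need at least one worker")
    finish_times = [0.0] * workers          # min-heap of per-worker finish times
    heapq.heapify(finish_times)
    for minutes in sorted(minutes_per_partition.values(), reverse=True):
        earliest = heapq.heappop(finish_times)   # worker that frees up first
        heapq.heappush(finish_times, earliest + minutes)
    return max(finish_times)

# Illustrative numbers only (minutes per partition).
design = {"fetch": 20, "filter": 15, "transform": 15, "writeback": 10}
print(monolithic_time(design))             # whole design in one job
print(separate_time(design, workers=4))    # one worker per partition
print(separate_time(design, workers=2))    # fewer workers than partitions
```

With one worker per partition, the modelled wall-clock time drops from the sum of the partition times to the time of the slowest partition, which is the intuition behind divide-and-conquer compilation.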



- Idea: Fast separate compilations on FPGA
  - Divide-and-conquer strategy!
  - Incremental Refinement in SW
    - We never come up with the final design in one shot
    - Start from something that's barely functional... then incrementally add functionality
    - Not practical on FPGAs today because each design iteration takes so long
  - With our strategy, SW-like Incremental Refinement on FPGA is possible! → Better design productivity!
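One way to picture why separate compilation enables software-like incremental refinement: if each module is compiled on its own, only the modules whose source changed need to be rebuilt after a small edit. The sketch below is a minimal illustration of that bookkeeping, not the actual tool flow; the module names, the source strings, and the helpers `fingerprint` and `modules_to_recompile` are all hypothetical.

```python
import hashlib

def fingerprint(source: str) -> str:
    """Content hash of one module's source text."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()

def modules_to_recompile(sources: dict, cache: dict) -> list:
    """Return the modules whose source changed since the last build.

    `cache` maps module name -> fingerprint from the previous build and is
    updated in place, so the next call only reports newer changes.
    """
    changed = []
    for name, source in sources.items():
        digest = fingerprint(source)
        if cache.get(name) != digest:
            changed.append(name)
            cache[name] = digest
    return changed

# Hypothetical three-module design.
sources = {"fetch": "v1", "filter": "v1", "writeback": "v1"}
cache = {}

print(modules_to_recompile(sources, cache))  # first build: every module
sources["filter"] = "v2"                     # a small change to one module
print(modules_to_recompile(sources, cache))  # only the edited module
```

In a monolithic flow the same one-module edit would trigger a full recompile of the whole design; here the unchanged modules are skipped.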





- Broader Impact? (at a high level)
  - e.g. AI Chips

- Hardware development is time-consuming...
- With our modular design methodology, it can be accelerated!
- → Better cell phones, laptops, ChatGPT, everything!

|                                   | Tensor Processing Unit products <sup>[13][14][15]</sup> |            |            |                           |                       |         |
|-----------------------------------|---------------------------------------------------------|------------|------------|---------------------------|-----------------------|---------|
|                                   | TPUv1                                                   | TPUv2      | TPUv3      | TPUv4 <sup>[14][16]</sup> | TPUv5 <sup>[17]</sup> | Edge v1 |
| Date introduced                   | 2016                                                    | 2017       | 2018       | 2021                      | 2023                  | 2018    |
| Process node                      | 28 nm                                                   | 16 nm      | 16 nm      | 7 nm                      | Unstated              |         |
| Die size (mm²)                    | 331                                                     | < 625      | < 700      | < 400                     | Unstated              |         |
| On-chip memory (MiB)              | 28                                                      | 32         | 32         | 32                        | 48                    |         |
| Clock speed (MHz)                 | 700                                                     | 700        | 940        | 1050                      | Unstated              |         |
| Memory                            | 8 GiB DDR3                                              | 16 GiB HBM | 32 GiB HBM | 32 GiB HBM                | 16 GB HBM             |         |
| Memory bandwidth                  | 34 GB/s                                                 | 600 GB/s   | 900 GB/s   | 1200 GB/s                 | 819 GB/s              |         |
| TDP (W)                           | 75                                                      | 280        | 220        | 170                       | Not Listed            | 2       |
| TOPS (Tera Operations Per Second) | 23                                                      | 45         | 123        | 275                       | 393                   | 4       |
| TOPS/W                            | 0.31                                                    | 0.16       | 0.56       | 1.62                      | Not Listed            | 2       |

<Google TPU products<sup>[1]</sup>>





