# ANG LI

Email: angl (at) princeton (dot) edu
Website: https://angl-dev.github.io

### **RESEARCH INTERESTS**

I am interested in all aspects of computer architecture and digital VLSI design, especially heterogeneous and reconfigurable systems for both high-performance and low-power applications. I enjoy building chips to validate and evaluate my ideas with high fidelity. I am also an advocate of open-source hardware and research, as they increase research credibility and reproducibility and encourage community-wide collaboration.

#### **EDUCATION**

Princeton University, Princeton, NJ, USA

Ph.D. Candidate in Electrical and Computer Engineering May 2023 (Expected)

Advisor: Prof. David Wentzlaff

Princeton University, Princeton, NJ, USA

M.A. in Electrical Engineering Jun. 2018

Advisor: Prof. David Wentzlaff

Tsinghua University, Beijing, China

B.A. in Electrical Engineering Jun. 2016

Minor: Economics

Georgia Institute of Technology, Atlanta, GA, USA

Exchange Student, Department of Electrical and Computer Engineering Aug. – Dec. 2013

### PUBLICATIONS AND PATENTS

[CICC'23] (To appear) Ting-Jung Chang, Ang Li (Equal Contribution), Fei Gao, Tuan Ta, Georgios Tziantzioulis, Yanghui Ou, Moyang Wang, Jinzheng Tu, Kaifeng Xu, Paul J. Jackson, August Ning, Grigory Chirkov, Marcelo Orenes-Vera, Shady Agwa, Xiaoyu Yan, Eric Tang, Jonathan Balkind, Christopher Batten, and David Wentzlaff, "CIFER: A 12nm, 16mm², 22-Core SoC with a 1541 LUT6/mm², 1.92 MOPS/LUT, Fully Synthesizable, Cache-Coherent, Embedded FPGA", 2023 IEEE Custom Integrated Circuits Conference (CICC), Apr. 2023

[CICC'23] (To appear) Fei Gao, Ting-Jung Chang, Ang Li, Marcelo Orenes-Vera, Davide Giri, Paul Jackson, August Ning, Georgios Tziantzioulis, Joseph Zuckerman, Jinzheng Tu, Kaifeng Xu, Grigory Chirkov, Gabriele Tombesi, Jonathan Balkind, Margaret Martonosi, Luca Carloni, and David Wentzlaff, "DECADES: A 67mm², 1.46TOPS, 55 Giga Cache-Coherent 64-bit RISC-V Instructions per second, Heterogeneous Manycore SoC with 109 Tiles including Accelerators, Intelligent Storage, and eFPGA in 12nm FinFET", 2023 IEEE Custom Integrated Circuits Conference (CICC), Apr. 2023

[HPCA'23] (To appear) Ang Li, August Ning, and David Wentzlaff, "Duet: Creating Harmony between Processors and Embedded FPGAs", 29th IEEE International Symposium on High-Performance Computer Architecture, Feb. 2023

[FPGA'21] Ang Li, and David Wentzlaff, "PRGA: An Open-Source FPGA Research and Prototyping Framework", 29th ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Feb. 2021

[FPL'20] Ang Li, Ting-Jung Chang, and David Wentzlaff, "Automated Design of FPGAs Facilitated by Cycle-Free Routing", 30th International Conference on Field-Programmable Logic and Applications, Aug./Sep. 2020

[IEEE Micro] Jonathan Balkind, Ting-Jung Chang, Paul J. Jackson, Georgios Tziantzioulis, **Ang Li**, Fei Gao, Alexey Lavrov, Grigory Chirkov, Jinzheng Tu, Mohammad Shahrad, and David Wentzlaff, "OpenPiton at 5: A Nexus for Open and Agile Hardware Design", IEEE Micro Vol. 40, No. 4, Jul./Aug. 2020

[ASPLOS'20] Jonathan Balkind, Katie Lim, Michael Schaffner, Fei Gao, Grigory Chirkov, Ang Li, Alexey Lavrov, Tri M. Nguyen, Yaosheng Fu, Florian Zaruba, Kunal Gulati, Luca Benini, and David Wentzlaff, "BYOC: A "Bring Your Own Core" Framework for Heterogeneous-ISA Research", 25th International Conference on Architectural Support for Programming Languages and Operating Systems, Mar. 2020

[ISLPED'15] Shuangchen Li, Ang Li, Yuan Zhe, Yongpan Liu, Peng Li, Guangyu Sun, Yu Wang, Huazhong Yang, and Yuan Xie, "Leveraging emerging nonvolatile memory in high-level synthesis with loop transformations", International Symposium on Low Power Electronics and Design, Jul. 2015

[ASPDAC'15] Shuangchen Li, Ang Li, Yongpan Liu, Yuan Xie, and Huazhong Yang, "Nonvolatile memory allocation and hierarchy optimization for high-level synthesis", 20th Asia and South Pacific Design Automation Conference (ASPDAC'15), Jan. 2015

[Patent] Xiang Xie, Lifei Ren, Ang Li, Yanjun Han, Guolin Li, Jun Hu, Zhong Lv, Wei Song, Yi Zheng, and Zihua Wang, "A Touch Interacting System and Method Based on Adaptive Layered Structured Light", Chinese National Invention Patent, No. 2013103145347, Jul. 2013

#### POSTERS AND WORKSHOPS

[FPGA'20] Ang Li, and David Wentzlaff, "Cycle-Free FPGA Routing Graphs", 28th ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Feb. 2020

[OSDA'19] Ang Li, and David Wentzlaff, "PRGA: An Open-source Framework for Building and Using Custom FPGAs", 1st Workshop on Open-Source Design Automation (OSDA), Mar. 2019

[WOSET'18] Jonathan Balkind, Alexey Lavrov, Michael McKeown, Yaosheng Fu, Tri Nguyen, Mohammad Shahrad, Ang Li, Katie Lim, Yanqi Zhou, Ting-Jung Chang, Paul Jackson, Adi Fuchs, Samuel Payne, Xiaohua Liang, Matthew Matl, and David Wentzlaff, "OpenPiton: An Emerging Standard for Open-Source EDA Tool Development", Workshop on Open-Source EDA Technology, Nov. 2018

#### TALKS AND PRESENTATIONS

"Efficient, Programmable, and Manufacturable Hardware: The Case for Synthesizable FPGAs"

Invited talk at the Intel/VMware Crossroads 3D-FPGA Academic Research Center, Virtual Nov. 2022
Invited talk at University of California, Santa Barbara, Virtual Dec. 2022

"PRGA: An Open-Source FPGA Research and Prototyping Framework"

The 29th ACM/SIGDA Int'l Symposium on Field-Programmable Gate Arrays (FPGA'21), Virtual Feb. 2021
The 1st Workshop on Open-Source Design Automation (OSDA), Florence, Italy Mar. 2019

"Automated Design of FPGAs Facilitated by Cycle-Free Routing"

The 30th International Conference on Field-Programmable Logic and Applications (FPL'20), Virtual Aug. 2020

## PROJECT EXPERIENCE

### **Duet:** Harmonious CPU-FPGA Integration for Fine-Grained Acceleration

Mar. 2021 – Present

Research Assistant, Princeton Parallel Group, Princeton University

Hardware acceleration based on embedded FPGAs balances flexibility and performance/efficiency, yet the conventional *coarse-grained acceleration* paradigm that offloads algorithms in their entirety is ill-suited for dynamic and/or irregular applications. Duet promotes eFPGAs to be equal peers with many-core processors in a hardware-coherent cache system. By innovating the interface between the on-chip network and the eFPGAs, Duet enables *fine-grained*, *collaborative execution* of the processors and the eFPGA-emulated accelerators. A paper on Duet has been accepted to HPCA'23. The RTL model of Duet is open-source and available at <a href="https://github.com/PrincetonUniversity/Duet">https://github.com/PrincetonUniversity/Duet</a>. A gem5-based simulator for Duet is open-source and available at <a href="https://github.com/angl-dev/gem5-duet">https://github.com/angl-dev/gem5-duet</a>.

### **PRGA:** Princeton Reconfigurable Gate Array

Oct. 2017 - Present

Research Assistant, Princeton Parallel Group, Princeton University

A silicon-proven, open-source project for generating customized, synthesizable FPGAs with a bespoke RTL-to-bitstream CAD toolchain [FPGA'21]. I also proposed the cycle-free FPGA routing graph, which enables constraint-driven, hierarchical optimization using off-the-shelf digital EDA tools [FPL'20]. PRGA has been used in three chip tape-outs (details below). PRGA is open-source and available at <a href="https://github.com/PrincetonUniversity/prga">https://github.com/PrincetonUniversity/prga</a>.

### CIFER: Hetero-Granular Architecture Prototype Chip Tape-out

Nov. 2019 - Present

Research Assistant, Princeton Parallel Group, Princeton University; in collaboration with Computer System Laboratory, Cornell University

A heterogeneous, cache-coherent SoC integrating OS-capable processors, MIMD tiny-core clusters, and eFPGA fabrics, covering both ends of the parallelization-specialization spectrum via collaborative execution. The prototype chip was taped out in 12nm FinFET and tested in the lab. A paper on CIFER has been accepted to CICC'23. Advised by Prof. David Wentzlaff.

### **DECADES:** Tiled Heterogeneous Architecture Prototype Chip Tape-out

Dec. 2020 - Present

Research Assistant, Princeton Parallel Group, Princeton University; in collaboration with Martonosi Research Group, Princeton University, and System-Level Design Group, Columbia University

A heterogeneous, cache-coherent SoC with OS-capable processors, specialized accelerators, intelligent storage units, bit-serial SIMD cores, and eFPGAs. The prototype chip was taped out in 12nm FinFET and tested in the lab. A paper on DECADES has been accepted to CICC'23. Advised by Prof. David Wentzlaff.

### ORDER: An SoC Built with Open-Source Hardware, PDK, & EDA

Feb. - May 2022

Research Assistant, Princeton Parallel Group, Princeton University

An RV32I + 512-LUT4/FF SoC designed with open-source hardware frameworks (including PRGA), synthesized using an open-source EDA flow (OpenROAD) and an open-source PDK (SKY130). ORDER was selected for the OpenMPW-6 free shuttle and is in fabrication. ORDER is open-source and available at <a href="https://github.com/angl-dev/caravel_mpw5_prga">https://github.com/angl-dev/caravel_mpw5_prga</a> (including a tapeout-ready GDS). Advised by Prof. David Wentzlaff.

### Near-Peak-Bandwidth, All-to-All, Many-FPGA Communication over UDP/IP

Jun. – Sep. 2019

Research Intern, Microsoft Research, WA, USA

Proposed and implemented an all-to-all communication mechanism for a many-FPGA system over a mostly private, stable network, achieving near-peak bandwidth (~98%) of a full-duplex network switch. By synchronizing FPGA clocks, characterizing clock error, and tolerating *clock drift* and PLL variance, the mechanism lets the FPGAs run in lockstep epochs and saturate network links in a *time-division multiplexing* manner. Advised by Dr. Michael Papamichael.

### Hardware Transactional Memory on OpenPiton

Jan. - Jun. 2018

Research Assistant, Princeton Parallel Group, Princeton University

Implemented an in-cache hardware transactional memory (HTM) on OpenPiton. The HTM employs lazy version management and lazy conflict detection. It uses each processor's private cache to buffer the read-/write-set of a transaction and commits to the last-level cache if a transaction is validated. Advised by Prof. David Wentzlaff.

### Real-World OCR with Gated-RNN and MD-LSTM

Aug. 2015 - Feb. 2016

Research Intern, Sensetime Co., Ltd., Beijing, China

Implemented Gated-RNN and MD-LSTM on the Caffe deep learning framework.

### High-Level Synthesis with Non-Volatile Memory

Apr. 2013 – Jan. 2015

Research Assistant, Nanoscale Integrated Circuits and System Lab, Tsinghua University

Proposed an algorithm to optimize loop transformation for NVM-SRAM hybrid on-chip buffer allocation. Advised by Prof. Yongpan Liu.

### Hardware Model Research on HICAMP

Jun. 2014 – Sep. 2014

Research Intern, Computer Systems Laboratory, Stanford University

Hierarchical Immutable Content-Addressable Memory Processor (HICAMP) is an architecture that organizes the memory in a tree-like structure with content-addressability and data deduplication. I proposed and implemented a fast *compare* instruction exploiting the content-addressable tree. I improved the *iteration register file* (similar to a private cache) with intra- and inter-processor coherence. I proposed and implemented an out-of-order-commit transaction manager for the hardware-supported, software-implemented transactional memory based on HICAMP. Advised by Prof. David Cheriton.

### Interactive Projection System Based on Structured Light

Nov. 2012 – Jul. 2013

Research Assistant, Nanoscale Integrated Circuits and System Lab, Tsinghua University

Proposed an algorithm to recognize users' interactions (multi-finger tapping, dragging, pinching, etc.) using one projector and one camera, without depth sensors. Advised by Prof. Xiang Xie.

#### **TEACHING AND MENTORING**

ECE 462/562 (also COS 462) – Design of Very Large-Scale Integrated (VLSI) Systems *Teaching Assistant* 

Fall 2022

Co-designed the final project on creating a minimal-area, DRC/LVS-clean, 4x4 SRAM block.

ECE 475/575 (also COS 475) – Computer Architecture *Teaching Assistant* 

Fall 2018

Upgraded the labs from implementing the PARC ISA to the RISC-V (RV32IM) ISA. Materials that I developed are still used in the course today.

Google Summer of Code

Summer 2020

FOSSi Mentor

Ansh Puvvada, Automating Hardware and Bitstream Verification for PRGA with cocotb

Co-Advisory of Undergraduate Research

2019 – Present

Jaebyeok Yoon, Architecture and Physical Design of Specialized FPGAs

Marlon Escobar, CPU-FPGA Integration

Kevin Liu, Creating Multimode Logic Elements for a Reconfigurable Gate Array

#### **AWARDS AND HONORS**

First Prize Scholarship for Excellent Student (10 out of 300+) Oct. 2013

Top prize in 7th "Challenge Cup" Beijing Undergraduates' Extracurricular Technology Innovation Competition (40 out of 500+) Jul. 2013

First Prize Scholarship for Excellent Student (10 out of 300+) Oct. 2012

### **REFERENCES**

Contact info available upon request.

Dr. Michael Papamichael, Microsoft Research

Prof. David Wentzlaff, Department of Electrical and Computer Engineering, Princeton University

Prof. Christopher Batten, School of Electrical and Computer Engineering, Cornell University

Prof. Vaughn Betz, Department of Electrical and Computer Engineering, University of Toronto

Prof. Michael Taylor, Department of Electrical & Computer Engineering, University of Washington