# Hang Hu

(+41) 76-593-40-17 | hanghu@student.ethz.ch

## **EDUCATION**

#### **ETH Zurich**

Zurich, CH

M.S. in Computer Science

Sep. 2022 - present

- Relevant Coursework: Cloud Computing Architecture, Data Management Systems, Design of Parallel and High-Performance Computing, Principles of Distributed Computing, Advanced System Lab, Hardware Acceleration for Data Processing.

#### **Huazhong University of Science and Technology**

Wuhan, CN

B.E. in Computer Science and Technology (ACM class)

Sep. 2018 - Jun. 2022

- **Overall GPA**: 91.9 / 100
- Relevant Coursework: Computing Theory, Parallel and Sequential Data Structure and Algorithms, Parallel Programming Principle and Practice, Algorithmic Design and Analysis, Computer Architecture, Operating System, Database System.

## **INTERESTS**

My research interests center on systems and computer architecture, including parallel and distributed computing, memory architecture, and tools that improve the efficiency of data management and processing. I am also open to broader topics, such as heterogeneous systems and systems for machine learning.

## **RESEARCH EXPERIENCE**

#### **Efficient Graph-based Vector Search via Hardware-Algorithm Co-design**

Zurich, CH

Advisor: Prof. Gustavo Alonso, ETH Zurich

Sep. 2023 - Jun. 2024

- Designed Falcon, a graph-based vector search (GVS) accelerator containing hardware building blocks for various GVS operators, with generalizability across various types of graphs and datasets.
- Collaborated to propose DST, an accelerator-optimized graph traversal algorithm designed to minimize GVS latency.
- Conducted an in-depth study of the performance limitations arising from intra- and inter-query parallelism on CPU and GPU platforms.
- Comprehensively evaluated FPGA prototypes of Falcon and DST, demonstrating up to 4.3× and 19.5× lower latency than CPU- and GPU-based GVS systems, respectively.

#### **Direction-Optimized Parallel BFS for Dynamic Graphs**

Atlanta, US

Advisor: Prof. Hyesoon Kim, Georgia Institute of Technology

Jul. 2021 - Feb. 2022

- Implemented direction-optimized parallel BFS, including the top-down, bottom-up, and hybrid approaches, on a distributed-memory model.
- Applied several optimizations to the parallel BFS, including restructured MPI communication, dynamic data structures, and partitioning methods.
- Conducted a thorough evaluation on emerging hardware, analyzing the impact of hardware architecture, optimization direction, and processor count on BFS performance.

#### **Application of Graph Processing in ATPG**

Wuhan, CN

Advisor: Prof. Hai Jin, Huazhong University of Science and Technology

Sep. 2020 - Mar. 2021

- Self-studied graph processing and Automatic Test Pattern Generation (ATPG) algorithms, including the D, PODEM, and FAN algorithms.
- Converted ATPG algorithms into parallel graph algorithms in a shared-memory graph processing framework, Ligra.
- Collaborated with teammates to build an efficient ATPG tool and fault simulator utilizing the parallelized FAN algorithm.
- Achieved higher efficiency and lower memory usage than ATALANTA, a comparable ATPG and simulation tool developed at Virginia Tech, while maintaining over 99% fault coverage.

## **PROJECTS**

#### **Enhancing the DaCe Programming Framework with OpenMP Tasking**

Zurich, CH

Advisor: Prof. Torsten Hoefler, ETH Zurich

Oct. 2022 - Jan. 2023

- Designed a heuristic for choosing between the OpenMP for-loop and tasking models for parallelism, together with a cut-off mechanism, within the Data-Centric (DaCe) programming framework to optimize performance.
- Introduced and investigated a new metric, imbalance degree, which quantifies the variation in computational workloads and is used in the heuristic rules.
- Achieved a 1.5× to 2× speedup with the new CPU tasking backend compared with the previous for-loop backend.

#### **Computer Organization Practice**

Wuhan, CN

Advisor: Prof. Zhihu Tan, Huazhong University of Science and Technology

Oct. 2020 - Nov. 2020

- Implemented a MIPS storage system including a 4-way set-associative cache.
- Implemented a 5-stage pipelined MIPS CPU and a single-cycle MIPS CPU in Logisim with support for three-level nested interrupts; both run a bubble sort program correctly.

## **SCHOLARSHIP & AWARDS**

| Year | Award                        | Awarded by                                    |
|------|------------------------------|-----------------------------------------------|
| 2022 | **Outstanding Graduates**    | Huazhong University of Science and Technology |
| 2021 | Sangfor Scholarship          | Sangfor Technologies                          |
| 2019 | National Scholarship         | Chinese Educational Bureau                    |
| 2019 | **University Merit Student** | Huazhong University of Science and Technology |