# Yi-Hsi (Eric) Lu

2401 Longview Street Suite 306, Austin, TX 78705 | 512-202-0064 | donkilu@utexas.edu

#### **OBJECTIVE**

Graduate student seeking a spring 2017 co-op or a 2017 full-time graduate position in areas such as computer architecture, hardware verification, performance evaluation, or software development.

#### **EDUCATION**

## **The University of Texas at Austin**, Austin, TX

**Master of Science in Electrical and Computer Engineering**

**Expected Grad:** May 2017 | **GPA:** 3.82/4.00

- Track: Computer Architecture and Embedded Processors
- Related Coursework: Computer Architecture, Embedded System Design/Modelling, High Speed Computer Arithmetic, Computer Graphics, Compilers, Locality & Parallelism, Computer Performance Evaluation & Benchmark
- Current Coursework: Digital System Verification

## **National Taiwan University (NTU)**, Taipei, Taiwan

**Bachelor of Science in Electrical Engineering**

June 2014 | **GPA:** 4.05/4.30

- Class rank: 22/198
- Related Coursework: Algorithms, Data Structures, Digital System Design, IC Design Lab

#### **PROFESSIONAL EXPERIENCE**

## **NVIDIA Corp.**, Austin, Texas

Jan 2017 – Present

Software Engineer Intern

- Maintained and optimized internal development tools built with C++ and Python.

## **UT Austin, Department of ECE**, under Dr. Ahmed Tewfik, Austin, Texas

July 2016 – August 2016

Graduate Research Assistant

- Surveyed the literature on positioning technologies, designed 3 types of experiments, and tested 240 points across 8 buildings to evaluate mobile location service performance on campus.
- Presented the current state of E911 and the experimental results to UTPD and the ITS department. Confirmed the reliability of safety apps and delivered several suggestions to UT authorities to enhance campus safety.

## **VIVOTEK Inc.**, New Taipei City, Taiwan

July 2013 – August 2013

Summer Intern

- Developed a JavaScript test platform for automated product stress testing.
- Redesigned the GUI and wrote a maintenance manual for the platform.

## **Integrated Silicon Solution Inc.**, Hsinchu, Taiwan

July 2012 – August 2012

Verification Intern

- Helped revise the DRAM testing program memtest86+ to support 10 new test cases and more detailed error reports.
- Conducted tests to evaluate the new test cases' ability to locate defective bits on DRAMs. Selected 3 effective patterns for use in the post-silicon validation process.

#### **PROJECTS**

## **Parallel Transparency Rendering**

October 2016

- Parallelized a transparency rendering function in CUDA. Implemented the Dynamic Fragment Buffer algorithm proposed by NVIDIA. Further exploited locality through coarse-grained geometric decomposition.
- Ran 35 times faster than a serial CPU implementation when rendering 100k circles and 440 times faster when rendering 10k snowflakes.

## **Optimization of Matrix-Matrix Multiply**

April 2016

- Optimized a matrix-matrix multiply function in C using data copying, cache blocking, register tiling, and vectorization with Intel AVX extensions.
- Achieved 10.6 GFLOPS on matrices of size 4096, 36 times faster than naïve multiplication. Reached 26.5 GFLOPS after modifying the cleanup code.

## **Evaluation of Re-Reference Interval Prediction Policy (RRIP)**

December 2015

- Evaluated the performance of the RRIP cache replacement policy using the gem5 CPU simulator and 5 benchmarks from the SPEC CPU2006 suite.
- Demonstrated a slight (1.7%) miss-rate improvement of RRIP over LRU.

## **Comparison of Digit-Recurrence Dividers**

December 2015

- Implemented 32-bit restoring, non-restoring, radix-2 SRT, and radix-4 SRT dividers in Verilog. Synthesized all four with Design Vision and FreePDK45 to compare their area, speed, and power.
- Concluded that the SRT dividers consume more energy because of the extra registers needed for on-the-fly conversion.

## **Pipelined 32-bit MIPS Processor**

Spring 2013

- Built a 32-bit MIPS processor in RTL-level Verilog. Implemented a 5-stage pipeline, direct-mapped L1 and L2 caches, forwarding networks, and a saturating-counter branch predictor to support 30 common MIPS instructions.
- Synthesized and verified the design with Synopsys Design Vision and Cadence NCSim. Passed post-synthesis simulation with a 5 ns clock period (200 MHz).

#### **LEADERSHIP**

- Publicist, Taiwanese Student Association at UT Austin (May 2016 – Present)
- Political Warfare Officer, Armor Training Command, R.O.C. Army (August 2014 – July 2015)
- Vice President, NTU Art Club (September 2012 – May 2013)

#### **SKILLS**

- Programming Languages: Proficient in C, C++; Exposure to Verilog, x86 Assembly, CUDA, Java
- Web Development: Proficient in HTML, PHP, JavaScript, CSS; Exposure to MySQL
- Spoken Languages: Mandarin Chinese, English