## Charles Lo

Contact Information

E-mail: charles@charleslo.net

Google Scholar: https://scholar.google.ca/citations?user=sxzUvqoAAAAJ

LinkedIn: https://www.linkedin.com/in/charles-lo/

**EDUCATION** 

University of Toronto

Doctor of Philosophy in Computer Engineering

2013 - March 2020

Thesis: Improving Hardware Design Reuse through Design-Space Exploration

Master of Applied Science in Computer Engineering

2010 - 2012

Thesis: A High-Performance Architecture for Training Viola-Jones Object Detectors

Bachelor of Applied Science in Engineering Science

2005 - 2010

Major in Electrical Engineering

**EXPERIENCE**

### Senior Digital Hardware Design Engineer

May 2020-Present

Innatera Nanosystems

• Designed digital hardware for neuromorphic systems.

#### Hardware Designer

Apr-May 2020

ArchES Computing Systems

• Developed hardware and software for high-speed FPGA-CPU communication interfaces.

Ph.D. Research

2013-2020

University of Toronto

- Developed probabilistic models based on Gaussian processes to improve Bayesian optimization of hardware designs. Proposed methods integrate domain-specific design information to dramatically speed up design-space exploration of IP parameters.
- Implemented inference and hyperparameter tuning of Gaussian processes in Python 3 using C-based extensions and NumPy to run on a CPU-based computer cluster.
- Proposed constraint-based system design framework in Python to collect and enumerate possible designs when composing IP from multiple vendors. The framework was effective for rapidly constraining design spaces for further exploration.
- Assisted other graduate students with technical challenges and defining the scope of their work.

Teaching Assistant

2013-2018

University of Toronto

- Mentored groups of 2-4 students weekly while they developed FPGA design projects over 3- and 8-week periods. Designs targeted Intel Cyclone or Xilinx Artix FPGAs; held nine such teaching assistantships between 2013 and 2018.
- Created course assignments and materials for Xilinx FPGAs covering Vivado HLS, the MicroBlaze Ethernet subsystem and FPGA primitive inference.
- Developed and documented a reference design for a camera module that was used in student projects.
- Designed shell platform and reference designs for a convolutional neural network assignment.
  The platform supported partial reconfiguration of HLS sub-systems to enable sharing of FPGAs by multiple students and provided on-chip debug using Xilinx integrated logic analyzers (ILAs), external DDR4 memory support and a PCIe interface.

### Graduate Course Project, Advanced Machine Learning

2014

Image Labelling using Feature Learning and Boltzmann Machine-Augmented CRFs

- Trained a fully-connected neural network to learn features of image segments (superpixels).
- Experimented with fully-connected Conditional Random Fields and Restricted Boltzmann Machines to smooth labelling over a scene.

Heterogeneous Stream Computing in SAVI

- Proposed a method, inspired by Software Defined Networking, of mapping streaming task graphs onto virtualized FPGA/CPU resources in a cloud environment.
- Preliminary prototype designed with x86 virtual machines, virtualized FPGA kernels and OpenFlow.

### Engineering Intern - Xilinx Research Labs

2012-2013

Xilinx Inc.

- Worked with a small research team to define and develop methods of integrating FPGA accelerators in the heterogeneous OpenCL programming framework. The methods became part of the Xilinx SDAccel product.
- Designed accelerated Sobel edge detection system using embedded C and custom hardware on an ARM-based SoC FPGA platform.
- Supervised a junior intern and guided their work on improving the SDAccel user interface.

#### M.A.Sc. Research

2010-2012

University of Toronto

- Developed a high-performance architecture for accelerating training of Viola-Jones object detectors targeting a PCIe-connected Xilinx Virtex-6 FPGA that provided 14-fold speed-up over a multi-threaded, CPU-based OpenCV implementation.
- Designed a systolic array architecture to provide high throughput and take advantage of parallelism during computation.
- Proposed and implemented pre-processing of input elements in off-chip memory to ensure high utilization of processing engines.
- Scaled and floorplanned the array up to 30 processing elements (72% LUT utilization) to meet a 200 MHz clock frequency target.

### Graduate Course Project, Introduction to Machine Learning

2010

Nonlinear Dimensionality Reduction for Music Feature Extraction

- Experimented with PCA, Autoencoders, LLE and t-SNE for compressing music feature representations.
- Found that t-SNE gave the best classification and clustering performance.

### Undergraduate Research

2009-2010

University of Toronto

- Developed a high-performance multi-FPGA system targeting four Xilinx Virtex-5 FPGAs for accelerating Restricted Boltzmann Machine neural networks.
- Leveraged an embedded message-passing interface (MPI) network for flexible communication between processors and processing engines across FPGAs.
- Designed an instruction-based DMA core that allowed off-chip memory access across the network.
- Proposed a weight storage mechanism to pack larger neural networks in on-chip memory.

### Electronic Design Engineer (Internship)

2008-2009

Advanced Micro Devices

- Assisted in the design of new discrete graphics solutions including schematic capture, PCB layout, BOM management and signal measurements.
- Interfaced with other engineers in a cross-functional team to resolve issues including signal integrity, electromagnetic compliance and power requirements.
- Developed scripts in Linux and Windows automating diagnostic tests to improve the efficiency of the graphics board debugging and design process.

### AWARDS AND SCHOLARSHIPS

| Award                                   | Year |
|-----------------------------------------|------|
| Doctoral Completion Award - \$15,000    | 2018 |
| Huawei Prize - \$5,000                  | 2017 |
| Ontario Graduate Scholarship - \$15,000 | 2017 |
| Ontario Graduate Scholarship - \$15,000 | 2014 |

### REFEREED JOURNAL PUBLICATIONS

- Danyao Wang, **Charles Lo**, Jasmina Vasiljevic, Natalie Enright Jerger and J. Gregory Steffan. DART: A Programmable Architecture for NoC Simulation on FPGAs. *IEEE Transactions on Computers*, 2014
- Naif Tarafdar, Nariman Eskandari, Varun Sharma, **Charles Lo** and Paul Chow. Galapagos: A Full Stack Approach to FPGA Integration in the Cloud. *IEEE Micro*, 2018

## REFEREED CONFERENCE PUBLICATIONS

- Charles Lo and Paul Chow. Hierarchical Modelling of Generators in Design-Space Exploration. 28th International Symposium on Field-Programmable Custom Computing Machines (FCCM'20), 2020
- Charles Lo and Paul Chow. Multi-Fidelity Optimization for High-Level Synthesis Directives. 28th International Conference on Field Programmable Logic and Applications (FPL'18), 2018
- Charles Lo and Paul Chow. Model-Based Optimization of High Level Synthesis Directives. 26th International Conference on Field Programmable Logic and Applications (FPL'16), 2016 (acceptance rate: 21%)
- Charles Lo and Paul Chow. A High-Performance Architecture for Training Viola-Jones Object Detectors. *International Conference on Field-Programmable Technology (FPT'12)*, 2012 (acceptance rate: 21%)
- Zhongduo Lin, Charles Lo and Paul Chow. K-means Implementation on FPGA for High-Dimensional Data Using Triangle Inequality. 22nd International Conference on Field Programmable Logic and Applications (FPL'12), 2012 (acceptance rate: 28%)
- Charles Lo and Paul Chow. Building a Multi-FPGA Virtualized Restricted Boltzmann Machine Architecture Using Embedded MPI. 19th ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA'11), 2011 (acceptance rate: 26%)

#### PATENTS

- H. Styles, J. Fifield, R. Wittig, P. James-Roxby, S. Santan, D. Varma, F. Martinez Vallina, S. Zhou, C. Lo, "Heterogeneous multiprocessor program compilation targeting programmable integrated circuit," US Patent #9,218,443, Issued December 2015
- H. Styles, J. Fifield, R. Wittig, P. James-Roxby, S. Santan, D. Varma, F. Martinez Vallina, S. Zhou, C. Lo, "Heterogeneous multiprocessor platform targeting programmable integrated circuits," US Patent #9,846,660, Issued December 2017

#### PRESENTATIONS

- **Charles Lo** and Paul Chow. Multi-Fidelity Optimization for High-Level Synthesis Directives. At the 28th International Conference on Field Programmable Logic and Applications (FPL'18), Dublin, Ireland, 2018
- Charles Lo and Paul Chow. Model-Based Optimization of High Level Synthesis Directives. At the 26th International Conference on Field Programmable Logic and Applications (FPL'16), Lausanne, Switzerland, 2016
- Charles Lo and Paul Chow. A High Performance Architecture for Training Viola-Jones Object Detectors. At the Connections: University of Toronto Graduate Symposium, Toronto, Canada, 2012
- Charles Lo and Paul Chow. Building a Multi-FPGA Virtualized Restricted Boltzmann Machine Architecture Using Embedded MPI. At the Connections: University of Toronto Graduate Symposium, Seoul, Korea, 2011
- **Charles Lo** and Paul Chow. Building a Multi-FPGA Virtualized Restricted Boltzmann Machine Architecture Using Embedded MPI. At the *University of Toronto FPGA Seminar*, Toronto, Canada, 2011
- **Charles Lo** and Paul Chow. Building a Multi-FPGA Virtualized Restricted Boltzmann Machine Architecture Using Embedded MPI. At the *CMC Microsystems 2010 Annual Symposium TEXPO Demonstration*, Ottawa, Canada, 2010

## OTHER PROFESSIONAL DEVELOPMENT

Five Workshops English Language and Writing Support, University of Toronto

Prewriting Strategies for Developing and Organizing Your Ideas 2014

Four Workshops English Language and Writing Support, University of Toronto

## TEACHING EXPERIENCE

Computer Organization,

Teaching Assistant

ECE352, 2010

(3rd Year Undergraduates)

Monitored labs and marked exams covering embedded programming and implementation of a simple processor in Verilog.

Digital and Computer Systems,

Teaching Assistant

ECE253, 2011

(2nd Year Undergraduates)

Monitored labs and marked exams.

Digital Systems,

Teaching Assistant

ECE241, 2013, 2014, 2015, 2016, 2017

(2nd Year Undergraduates)

Monitored labs, marked exams and covered introductory lecture. Mentored students during 3-week design projects using Verilog to implement hardware designs.

Computer Hardware,

Teaching Assistant

ECE342, 2014

(2nd Year Undergraduates)

Monitored labs and marked exams covering advanced digital hardware designs.

Digital Systems Design,

Teaching Assistant

ECE532, 2015, 2016, 2017, 2018

(4th Year Undergraduates & Graduate Students)

Mentored students during 8-week design projects. Developed course assignments and lectured on FPGA design concepts. Developed and delivered reference design for new camera peripheral.

Digital Systems Design for Systems-on-Chip,

Teaching Assistant

ECE1373, 2016, 2017, 2018

(Graduate Students)

Created assignments covering high-level synthesis as well as supporting testing methodology and code. Delivered an FPGA shell platform that allowed students to share FPGA resources.