# Soumil Gupta

Computer Architect | (669) 246-2622 | Soumilgupta2@gmail.com | Pleasanton, CA | linkedin.com/in/soumil-gupta-ce

## **EDUCATION**

#### University of Illinois, Urbana-Champaign

May 2025

Bachelor of Science in Computer Engineering | GPA 3.80/4 | 3x Dean's List

#### Relevant Coursework:

- Hardware: Computer Architecture, Digital Systems, VLSI System Design, IC Device Theory and Fabrication, Semiconductor Electronics, Digital Signal Processing, Analog Signal Processing
- Software: Operating Systems, Applied Parallel Programming, Communication Networks, Algorithms & Models of Computation, Data Structures & Algorithms, Computer Systems & Programming
- AI/ML: Artificial Intelligence, Applied Machine Learning, Deep Learning for CV, IoT & Cognitive Computing

## **TECHNICAL SKILLS**

Languages: C++, C, C#, Python, Java, x86 Assembly, SystemVerilog

Frameworks/Libraries: TensorFlow, Keras, PyTorch, NumPy, SciPy, scikit-learn, OpenCV, Pandas

Tools/Environments: CUDA, Linux, GDB, Vim, Git, QEMU, Jupyter, LaTeX

Hardware: Intel Quartus, AMD Xilinx, Verdi, ModelSim/QuestaSim, RVFI, Spike, Synopsys VCS, Synopsys DC, Spyglass, Cadence Virtuoso, Chipyard, Verilator, Gem5, Chisel, FIRRTL, Cocotb, PyRTL, Veryl, UVM

Robotics/Other: ROS 1/2, RViz, Gazebo, MATLAB, Simulink, Unity, Autodesk Inventor, SolidWorks

## **WORK EXPERIENCE**

### Research Intern | Samsung

Apr 2024 - Present

- Writing, verifying, laying out, and taping out a concept SoC with RISC-V cores, a MESA mapping controller, and a CGRA.
- Developed a custom ISA for the MESA controller to map CPU programs onto any CGRA hardware accelerator.
- Built a <u>custom hardware verification framework</u>, complete with a golden model, to compare DUT vs. golden execution traces.
- Verified HDL modules for multiple IPs including modules on AXI protocol integration and Register Files.
- Deployed and validated HDL code using ModelSim and Verilator simulation and Vivado.
- Participated in weekly scrums with Samsung SAIT group, developing strong teamwork and communication skills.

### Undergraduate Teaching Assistant | Computer Systems & Programming Course, UIUC

Aug 2024 - Present

- Led weekly office hours, mentoring 5-10 students per session, encouraging development of critical programming and problem-solving skills, and guiding students in developing algorithms and debugging with GDB.
- <u>Taught C & C++ programming fundamentals</u>, file/device I/O, memory, signals & interrupt handling, and concurrency.

### Researcher | Autonomous & Unmanned Vehicle Systems Laboratory (AUVSL), UIUC

Dec 2022 - Present

- Led a team to design and implement an outdoor robot localization solution leveraging ultra-wideband (UWB) antennas by writing Python & C++ ROS 2 packages, deployable on Clearpath Husky & Jackal platforms.
- <u>Developed PyTorch scripts</u> to tune Adaptive Neuro-Fuzzy Inference System (ANFIS) parameters and implemented the SHFAF localization algorithm, improving localization accuracy by 82% to within 10 cm.
- Wrote localization software and scripts in Python, using popular libraries to analyze and process sensor data.
- <u>Customized C firmware</u> and used the SPI communication protocol to interface with Decawave UWB sensors.
- <u>Co-authored an IEEE paper</u> on localization methods. <u>Presented progress updates and results</u> to the CERL organization under the U.S. Army Corps of Engineers, exhibiting strong critical thinking and communication skills.

### Researcher | Virtual-Reality Immersive Laboratory, UIUC

Dec 2021 - Aug 2022

- Developed a VR application in C# for the Oculus platform to help college students visualize concepts in electromagnetism.
- Assisted a professor in improving existing Unity apps to provide a better user experience.

## HARDWARE SYSTEMS PROJECT EXPERIENCE

## RV32IM N-Way Superscalar Out-of-Order Explicit Register Renaming-Based CPU

May 2024

- Designed & verified from scratch an out-of-order, N-way parameterizable processor for the RISC-V 32-bit ISA with M-extension, achieving an IPC of 0.51 on a standard benchmark while consuming 62.7 mW at a 325 MHz clock frequency.
- Implemented speculative branching with branch prediction overriding, plus perceptron and Gshare branch predictors.
- Implemented cache features including next-line & stride prefetchers and a post-commit store buffer with write coalescing.
- Integrated Synopsys IPs, including a sequential divider, and wrote an advanced Dadda multiplier and a shift-add multiplier.
- Wrote the full processor in SystemVerilog; debugged with Verdi, Spike, and RVFI; used Python to generate test programs.

- Authored a FreePDK 45nm standard library (logic, muxes, flip-flops) with schematic, layout, and Liberty timing views.
- Manually built a hierarchical datapath library with a metal-2/3 power grid & metal-4/6 global routing.
- Architected a 1-bit slice integrating an ALU, a write-low/read-high register file, and cascaded compare logic; tiled 32 slices into a 5-stage RV32I datapath that meets byte-addressable memory and branch timing specs.
- Verified the extracted netlist with a SystemVerilog harness, passing reference and self-written RV32I assembly suites.
- In addition to the manual layout, wrote Tcl scripts to build a second processor, automating the layout process (floorplanning, placement, routing, & DRC/LVS checks) and minimizing area for the entire processor.

## FPGA-Accelerated DNN for Autonomous Traffic Sign Recognition

Apr 2025

- Designed a hardware-accelerated DNN for the AMD PYNQ-Z2 FPGA board using Xilinx Vivado HLS.
- Implemented convolution, activation, and pooling layers with HLS pragmas to optimize dataflow and pipelining.
- Integrated the accelerator via AXI DMA and MicroBlaze, and automated image feeding through a Python script.
- Leveraged the ScaleHLS framework for hardware–software co-design, balancing accuracy with FPGA resource limits.
- Benchmarked energy consumption and compute times against a Kaggle YOLOv8 model on Raspberry Pi + Edge TPU.

## IC Fabrication - Course-Based Project at UIUC

Dec 2024

- Independently processed a silicon wafer from start to finish in a cleanroom environment, performing every step, from RCA cleaning and oxidation to photolithography, etching, doping, and final metal lift-off.
- Utilized a five-mask sequence to fabricate complex devices (MOSFETs, BJTs, diodes, logic gates).
- Acquired in-depth knowledge of oxidation, photolithography, diffusion, chemical vapor deposition, ion beam processing, and annealing.

## Digital Audio Workstation (DAW) with Audio Processor on FPGA

May 2023

- Independently created a functional DAW with keyboard input and VGA output, capable of reading, writing, manipulating, and mixing sounds to create music.
- Designed and implemented the DAW with an audio codec hardware interface, .wav file read/write, an FSM, & audio effects.
- Researched documentation to incorporate Intel FPGA peripherals and unit-tested functionality.

## AI/ML PROJECT EXPERIENCE

## LeNet-5 Implementation in CUDA

Apr 2025

- Designed and implemented a version of the LeNet-5 CNN to run on a cluster of NVIDIA A40 GPUs.
- Built convolution layers with feature unrolling, matrix multiplication, and permutation steps.
- Optimized performance through kernel fusion of unrolling, matrix multiplication, and result permutation.
- Profiled and fine-tuned GPU kernels using Nsight Systems & Nsight Compute.
- Ensured correctness and reproducibility with automated testing, Gprof (CPU), and compute-sanitizer (GPU).

## ResNet Implementation for Image Classification

Feb 2025

- Implemented a deep residual neural network (ResNet) architecture from scratch to classify images (e.g., MNIST-like data).
- Incorporated skip connections to combat vanishing gradients and enable stable training of deeper CNN layers.
- Employed TensorFlow/Keras for model development, training, and hyperparameter optimization; used NumPy and Matplotlib for data preparation, analysis, and visualization.
- Employed TensorFlow Lite to deploy trained models on Raspberry Pi + Google Edge TPU for IoT applications.

## Deep Reinforcement Learning for Atari Breakout

Apr 2024

- Implemented DQN and Double DQN agents with Python, PyTorch, and OpenAI Gym to master Atari Breakout.
- Developed a recurrent state mechanism to maintain historical context, enhancing decision-making during gameplay.
- Tuned hyperparameters and executed training on a GPU-based Linux cluster to achieve target mean scores.
- Incorporated LSTM modules to further improve temporal dependency handling and performance.

## SOFTWARE PROJECT EXPERIENCE

## Unix-Like Operating System

Dec 2023

- Engineered a UNIX-like, single-core kernel for 32-bit hardware fully from scratch using C and x86 Assembly.
- Implemented a paging-only virtual memory system, writable file system, software context switching, terminal switching, and hardware interrupts and exception handling.
- Developed device drivers for keyboard, mouse, & real-time clock, and built an interactive shell for executing system calls.
- Used advanced GDB techniques to efficiently debug and resolve system-level issues.
- Emulated and tested the OS in a QEMU virtual environment, ensuring safe and efficient debugging.