# Harshith Kantamneni

kantamneniharshith@gmail.com | +1 (414) 916-5799 | linkedin.com/in/hk4231 | Portfolio | Madison, Wisconsin 53726, USA

M.S. ECE student specializing in ML-accelerated systems, GPU performance optimization, and embedded architecture. Actively seeking full-time roles in GPU performance, deep learning systems, and ML + hardware co-design.

## **EDUCATION**

### University of Wisconsin-Madison

Madison, USA

M.S. in Electrical and Computer Engineering

Sep 2024 - Dec 2025

- Relevant Coursework: Advanced Computer Architecture II, High Performance Computing, Machine Learning, Computer Architecture, Digital System Design

### Vellore Institute of Technology

Amaravati, India

B. Tech in Electronics and Communication Engineering

Nov 2020 - May 2024

## **CERTIFICATIONS**

- NVIDIA Deep Learning Institute (DLI) – Getting Started with Accelerated Computing using CUDA C++, 2025

## **PROJECTS**

### ML-Guided CUDA Kernel Configuration

Jan 2025 - May 2025

Python, PyTorch, CUDA

GitHub

- Developed a PyTorch model to predict optimal CUDA launch configurations based on matrix size.
- Integrated the predictions into the kernel launcher, achieving a 30% speedup over static tuning.
- Benchmarked kernel performance across varied matrix sizes and validated prediction stability.

### TDG Partition Size Prediction

Jan 2025 - May 2025

Python, XGBoost

GitHub

- Engineered TDG and matrix workload features to train an XGBoost model that predicts optimal partition sizes.
- Reduced configuration tuning time by 25% with <5% error relative to exhaustive search.
- Validated model generalization across multiple workloads and runtime conditions.

### 5-Stage Pipelined RISC Processor

Aug 2024 – Dec 2024

Verilog, ModelSim

- Designed a pipelined WISC-F24 ISA processor with hazard detection and full forwarding logic.
- Achieved 100% instruction coverage using a cycle-accurate ModelSim testbench.

### Knight's Tour FSM Design

Aug 2024 – Dec 2024

SystemVerilog, Quartus Prime, ModelSim

- Built a pipelined FSM in Verilog to solve the Knight's Tour, achieving timing closure at 333 MHz on an FPGA.
- Integrated UART/SPI interfaces for Bluetooth-based Knight movement control.
- Verified functional and timing correctness with post-synthesis gate-level simulations.

### Embedded CO/CO₂ Monitoring System

Aug 2024 – Dec 2024

FreeRTOS, C, PSoC 6, Altium Designer

- Developed an embedded home monitoring system with real-time sensor data acquisition and network logging.
- Wrote custom I2C drivers for the SCD-41 (CO₂) and MQ-7 (CO) sensors.
- Designed and assembled a 2-layer PCB with minimal EMI and robust signal integrity.

## **EXPERIENCE**

### Society for Space Education, Research and Development

Bengaluru, India

Research Intern

Nov 2021 - Jan 2022

- Programmed ESP32 firmware for environmental telemetry and GPS data acquisition on a balloon payload.
- Optimized the startup sequence, reducing boot latency by 30%.

## **SKILLS**

- Languages: C, C++, Python, Verilog, SystemVerilog
- Tools: CUDA, PyTorch, Nsight, ModelSim, Quartus, Altium Designer, MATLAB, Simulink, Git, TensorRT
- Domains: ML for Hardware, GPU Performance Optimization, Embedded Systems, Computer Architecture
- Protocols: I2C, SPI, UART, CAN, WiFi, Bluetooth