
1SHAMAY1/GPU_Compute_Core


Project 2: GPU Compute Core & Ray-Tracing Accelerator

Overview

This project implements a custom SIMT GPU Streaming Multiprocessor (SM) integrated with a dedicated hardware Ray-Tracing Compute Unit (RTCU). It demonstrates hardware/software co-design for graphics acceleration and is motivated by the absence of dedicated ray-tracing hardware blocks in earlier Apple Silicon GPU generations.

graph LR
    SM[GPU Streaming Multiprocessor] -->|1024-bit Packed Ray| RTCU[Ray-Tracing Compute Unit]
    RTCU -->|Intersection Test| BVH[Unified Memory / BVH Nodes]
    RTCU -->|Hit/Miss Results| SM

Architecture Components

1. gpu_sm_core.sv (SIMT Multiprocessor)

Implements a 32-lane vector streaming multiprocessor.

  • SIMT Execution Mask: Tracks the per-thread active mask across the 32-lane warp.
  • Instruction Decoder: Decodes custom graphics instructions and uses a custom opcode (0x7B) to dispatch ray-tracing operations directly to the co-processor.
  • Packed Array Interface: Flattens one 32-bit value per thread across the warp (e.g., 32 threads × one 32-bit float coordinate = 1024 bits) into a packed vector, giving a synthesizable, high-bandwidth interface to the RTCU.
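To illustrate the packed array interface, the Python sketch below packs one float32 per lane into a single 1024-bit integer and unpacks it again. The lane ordering (lane 0 in bits [31:0]) and the zeroing of inactive lanes under the SIMT mask are assumptions for illustration; the RTL may order or mask lanes differently, and pack_lanes/unpack_lanes are hypothetical helper names.

```python
import struct

LANES = 32      # warp width stated in the README
LANE_BITS = 32  # one IEEE-754 single-precision value per lane


def pack_lanes(values, active_mask=0xFFFFFFFF):
    """Pack 32 float32 lane values into one 1024-bit Python integer.

    Assumptions (not taken from the RTL): lane i occupies bits
    [32*i+31 : 32*i], and lanes whose active-mask bit is clear are zero.
    """
    if len(values) != LANES:
        raise ValueError(f"expected {LANES} lane values, got {len(values)}")
    packed = 0
    for lane, value in enumerate(values):
        if (active_mask >> lane) & 1:
            # Reinterpret the float32 bit pattern as an unsigned 32-bit integer.
            (bits,) = struct.unpack("<I", struct.pack("<f", value))
            packed |= bits << (LANE_BITS * lane)
    return packed


def unpack_lanes(packed):
    """Inverse of pack_lanes: slice a 1024-bit integer back into 32 floats."""
    lane_mask = (1 << LANE_BITS) - 1
    values = []
    for lane in range(LANES):
        bits = (packed >> (LANE_BITS * lane)) & lane_mask
        (value,) = struct.unpack("<f", struct.pack("<I", bits))
        values.append(value)
    return values


# Example: a hypothetical rtcu_ray_dir_x payload with only lanes 0-15 active.
dir_x = [float(i) for i in range(LANES)]
bus = pack_lanes(dir_x, active_mask=0x0000FFFF)
print(bus.bit_length() <= 1024, unpack_lanes(bus)[1], unpack_lanes(bus)[16])
# prints: True 1.0 0.0
```

In this sketch a full ray would need one such bus per coordinate component (origin x/y/z and direction x/y/z), which matches the per-component bus names such as rtcu_ray_dir_x that appear in the synthesized schematic.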

2. rtcu_core.sv (Ray-Tracing Compute Unit)

A dedicated hardware accelerator designed to perform parallel Bounding Volume Hierarchy (BVH) node traversal and ray-triangle intersection tests.

  • Memory Interface: Fetches scene geometry and BVH node data directly from Unified Memory over a 256-bit wide bus.
  • Pipelined Traversal: Uses an internal state machine (IDLE, FETCH_BVH, INT_BOX, FETCH_TRI, INT_TRI) to walk the spatial index: it fetches BVH nodes and tests the ray against each node's bounding box, then fetches and tests the triangles stored at leaf nodes.
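The traversal loop can be modeled in software as follows. This is a sketch, not the RTCU's RTL: the Node layout, the depth-first stack, the slab box test, and the Möller-Trumbore triangle test are assumptions chosen for illustration, and each loop iteration stands for one visit to a state rather than one clock cycle (real hardware may spend several cycles per state, e.g. waiting on 256-bit memory beats).

```python
import math
from dataclasses import dataclass, field
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    FETCH_BVH = auto()
    INT_BOX = auto()
    FETCH_TRI = auto()
    INT_TRI = auto()


@dataclass
class Node:
    lo: tuple                                      # AABB min corner (x, y, z)
    hi: tuple                                      # AABB max corner (x, y, z)
    children: list = field(default_factory=list)   # child node indices
    tris: list = field(default_factory=list)       # leaf triangles (v0, v1, v2)


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def hit_box(o, d, lo, hi, t_max):
    """Slab test: does the ray overlap the box for some t in [0, t_max]?"""
    t0, t1 = 0.0, t_max
    for i in range(3):
        if d[i] == 0.0:
            # Ray parallel to this slab: it must already lie inside it.
            if not lo[i] <= o[i] <= hi[i]:
                return False
            continue
        ta, tb = (lo[i] - o[i]) / d[i], (hi[i] - o[i]) / d[i]
        t0, t1 = max(t0, min(ta, tb)), min(t1, max(ta, tb))
        if t0 > t1:
            return False
    return True


def hit_tri(o, d, tri, eps=1e-9):
    """Moller-Trumbore test: distance t to the triangle, or None on a miss."""
    v0, v1, v2 = tri
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(d, e2)
    det = dot(e1, p)
    if abs(det) < eps:          # ray parallel to the triangle plane
        return None
    s = sub(o, v0)
    q = cross(s, e1)
    u, v, t = dot(s, p) / det, dot(d, q) / det, dot(e2, q) / det
    if u < 0.0 or v < 0.0 or u + v > 1.0 or t <= eps:
        return None
    return t


def traverse(nodes, o, d):
    """Walk the BVH one FSM state per iteration; return (closest t, state trace)."""
    state, stack, best = State.FETCH_BVH, [0], math.inf   # a dispatch leaves IDLE
    node, tri, pending, trace = None, None, [], []
    while state is not State.IDLE:
        trace.append(state.name)
        if state is State.FETCH_BVH:
            node = nodes[stack.pop()]                      # models a node read
            state = State.INT_BOX
        elif state is State.INT_BOX:
            # Pass the closest hit so far as t_max to prune farther subtrees.
            if hit_box(o, d, node.lo, node.hi, best):
                stack.extend(node.children)
                pending = list(node.tris)
            state = State.FETCH_TRI if pending else State.FETCH_BVH
        elif state is State.FETCH_TRI:
            tri = pending.pop()                            # models a triangle read
            state = State.INT_TRI
        elif state is State.INT_TRI:
            t = hit_tri(o, d, tri)
            if t is not None and t < best:
                best = t
            state = State.FETCH_TRI if pending else State.FETCH_BVH
        if state is State.FETCH_BVH and not stack:
            state = State.IDLE                             # nothing left to visit
    return best, trace
```

Passing the current closest hit as t_max to the box test lets the walk skip any subtree that lies entirely behind a triangle already hit, which is the usual reason a traversal unit keeps the running hit distance in a register.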

3. sim/gpu_sim.py (Architectural Simulator)

A cycle-accurate architectural simulator written in Python.

  • Functional Math Model: Implements vector math, camera ray generation, and bounding box/triangle intersection tests.
  • Output: Path-traces a scene with spherical geometry and shadows and writes the result directly to a render.bmp file, which is used to visually verify the correctness of the rendering pipeline.
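Camera ray generation and sphere intersection, two pieces of the functional math model, might look like the following. The pinhole camera at the origin looking down -z, the 60-degree default field of view, and the names camera_ray/hit_sphere are assumptions for illustration, not taken from gpu_sim.py.

```python
import math


def camera_ray(px, py, width, height, fov_deg=60.0):
    """Primary ray through the centre of pixel (px, py).

    Assumed camera model: pinhole at the origin looking down -z, +y up,
    pixel (0, 0) at the top-left, vertical field of view fov_deg.
    Returns (origin, unit_direction).
    """
    scale = math.tan(math.radians(fov_deg) / 2.0)
    aspect = width / height
    # Map the pixel centre to [-1, 1] screen space, then scale by the FOV.
    x = (2.0 * (px + 0.5) / width - 1.0) * aspect * scale
    y = (1.0 - 2.0 * (py + 0.5) / height) * scale
    norm = math.sqrt(x * x + y * y + 1.0)
    return (0.0, 0.0, 0.0), (x / norm, y / norm, -1.0 / norm)


def hit_sphere(origin, direction, center, radius, eps=1e-6):
    """Nearest t > eps where the ray hits the sphere, or None.

    Solves |origin + t*direction - center|^2 = radius^2, assuming a unit
    direction so the quadratic reduces to t^2 + 2bt + c = 0.
    """
    oc = tuple(o_i - c_i for o_i, c_i in zip(origin, center))
    b = sum(d_i * v_i for d_i, v_i in zip(direction, oc))
    c = sum(v_i * v_i for v_i in oc) - radius * radius
    disc = b * b - c
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    for t in (-b - root, -b + root):   # nearer root first
        if t > eps:
            return t
    return None
```

A shadow test can reuse hit_sphere: cast a ray from the hit point toward the light and treat the point as shadowed if any sphere is hit closer than the light.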

Verification & Simulation

Testbench: tb/tb_gpu_core.sv

Verifies the SM-to-RTCU interface:

  1. Simulates an instruction fetch containing the custom RT opcode (0x7B).
  2. Checks that the SM decodes the opcode and asserts the rtcu_dispatch_valid signal.
  3. Verifies that the 1024-bit packed ray coordinates (origin and direction vectors) are properly driven onto the bus.
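A Python golden model of the dispatch check in steps 1 and 2 is sketched below. It assumes the opcode occupies fetch_instr bits [6:0], consistent with the 7-bit comparison against 7'b1111011 described for the synthesized schematic; expected_dispatch_valid, check_dispatch, and the tuple capture format are hypothetical.

```python
RT_OPCODE = 0x7B    # custom ray-tracing opcode (7'b1111011) from the README
OPCODE_MASK = 0x7F  # assumption: the opcode field is fetch_instr[6:0]


def expected_dispatch_valid(fetch_instr, fetch_valid):
    """Golden model: rtcu_dispatch_valid = fetch_valid && (opcode == 0x7B)."""
    return bool(fetch_valid) and (fetch_instr & OPCODE_MASK) == RT_OPCODE


def check_dispatch(samples):
    """Return the samples whose observed rtcu_dispatch_valid disagrees with the model.

    Each sample is a hypothetical (fetch_instr, fetch_valid, observed_valid)
    tuple captured from the RTL simulation.
    """
    return [s for s in samples
            if expected_dispatch_valid(s[0], s[1]) != bool(s[2])]


samples = [
    (0x0000007B, 1, 1),   # RT opcode with a valid fetch: dispatch expected
    (0x0000007B, 0, 0),   # RT opcode but no valid fetch: no dispatch
    (0x00000033, 1, 0),   # some other opcode: no dispatch
]
print(check_dispatch(samples))   # prints: []
```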

How to Run:

  • RTL Simulation: Run the design directly on EDA Playground using this pre-configured link: 👉 Live EDA Playground Simulator

    Alternatively, copy tb_gpu_core.sv and the source design files into the playground manually, select Aldec Riviera Pro, and click Run.

  • Visual Simulator: Execute python sim/gpu_sim.py in your local terminal to run the architectural path-tracer and generate the visual render.bmp output.
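After running the visual simulator, the output file can be sanity-checked with the standard library alone. This sketch assumes render.bmp is written to the current working directory with a BITMAPINFOHEADER-style DIB header; read_bmp_header is a hypothetical helper, not part of the repository.

```python
import struct


def read_bmp_header(path="render.bmp"):
    """Return (width, height, bits_per_pixel) read from a BMP file header.

    Assumes a BITMAPINFOHEADER-style DIB header (width/height at byte
    offsets 18/22, bit depth at 28). A negative height marks a top-down
    bitmap, so its absolute value is reported.
    """
    with open(path, "rb") as f:
        header = f.read(30)
    if len(header) < 30 or header[:2] != b"BM":
        raise ValueError(f"{path} does not look like a BMP file")
    width, height = struct.unpack_from("<ii", header, 18)
    (bits_per_pixel,) = struct.unpack_from("<H", header, 28)
    return width, abs(height), bits_per_pixel
```

Calling read_bmp_header() from the directory containing render.bmp returns the image dimensions, which should match the resolution the simulator was configured to render.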


Understanding the Synthesized Schematic (Block Diagram)

The included docs/Schematic_gpu_top.pdf, produced by the Yosys synthesis suite, shows the top-level hardware routing between the Streaming Multiprocessor (SM) and the Ray-Tracing Compute Unit (RTCU). When reviewing this schematic, note the following symbolic representations:

  • Octagonal Nodes: Represent the physical input and output ports of the gpu_top module.
  • Comparator Box ($eq): You can trace the fetch_instr bus directly into a comparator checking against 7'b1111011 (the binary representation of the 0x7B custom Ray-Tracing opcode).
  • AND Gate Box ($logic_and): The output of the comparator is logically ANDed with fetch_valid to generate the rtcu_dispatch_valid handshake signal.
  • 1024-bit Bus Routing: The wide data buses carrying the 32-lane packed ray origins and directions (e.g., rtcu_ray_dir_x) connect the SM directly to the RTCU with no intervening logic cells, so they add no combinational logic delay at the top level.
  • Hardware Handshake: A constant 1'1 (logic true) node drives the rtcu_dispatch_ready port, so the coprocessor is always ready: a dispatch handshake completes in any cycle in which rtcu_dispatch_valid is asserted.
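The effect of tying rtcu_dispatch_ready high can be illustrated with a small cycle-level model of a valid/ready channel. This is a sketch under the assumption that the SM holds rtcu_dispatch_valid high while it has a ray packet pending; simulate_dispatch and the ready_fn callback are hypothetical.

```python
def simulate_dispatch(packets, ready_fn, max_cycles=1000):
    """Cycle-level sketch of the SM -> RTCU valid/ready dispatch channel.

    packets:  number of ray packets the SM issues back-to-back.
    ready_fn: callable(cycle) -> bool modelling rtcu_dispatch_ready.
    Assumes the SM holds rtcu_dispatch_valid high while a packet is pending,
    so a packet is accepted on every cycle where ready is also high.
    Returns the cycle numbers on which packets were accepted.
    """
    accepted = []
    for cycle in range(max_cycles):
        if len(accepted) == packets:
            break
        if ready_fn(cycle):          # valid is high here, so valid && ready fires
            accepted.append(cycle)
    return accepted


# Ready tied to the constant 1'1, as in the synthesized schematic.
print(simulate_dispatch(4, lambda cycle: True))            # prints: [0, 1, 2, 3]
# A hypothetical back-pressured RTCU that is ready only on odd cycles.
print(simulate_dispatch(4, lambda cycle: cycle % 2 == 1))  # prints: [1, 3, 5, 7]
```

With ready tied high, back-to-back packets are accepted on consecutive cycles with no stalls; any back-pressure on ready would stretch the same transfer over more cycles.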
