Tiny-DL-Inference


A micro deep learning inference engine built from scratch on WebGPU. The project demonstrates AI infrastructure optimization techniques including kernel fusion, memory layout optimization, and an Im2Col-based convolution-to-GEMM transformation.

Features

  • WebGPU-based Compute: Hand-written WGSL shaders for all neural network operators
  • Core Operators: Conv2d, MaxPool, ReLU, Softmax, Dense, Flatten
  • Kernel Fusion: Fused Conv2d+Bias+ReLU operator for reduced memory traffic
  • Memory Layout Optimization: Support for both NCHW and NHWC layouts
  • Im2Col Algorithm: Convolution-to-GEMM transformation for performance
  • Property-Based Testing: Comprehensive test suite using fast-check
  • MNIST Support: End-to-end digit classification demo

Project Structure

src/
├── core/           # Core infrastructure (GPUContext, Tensor, errors)
├── operators/      # Neural network operators
├── engine/         # Inference engine and model loader
└── utils/          # Utilities (benchmark, im2col, CPU reference)

tests/
├── core/           # Core component tests
└── operators/      # Operator tests with property-based testing

Installation

npm install

Building

npm run build

Testing

# Run all tests
npm test

# Run tests in watch mode
npm run test:watch

# Generate coverage report
npm run test:coverage

Usage

import { InferenceEngine, ModelLoader, Tensor } from 'tiny-dl-inference';

// Initialize engine
const engine = new InferenceEngine();
await engine.initialize();

// Load model
const loader = new ModelLoader();
const model = await loader.loadFromJSON('model.json');
await engine.loadModel(model);

// Run inference (Tensor.fromArray takes the GPUContext from src/core)
const input = Tensor.fromArray(context, imageData, [1, 1, 28, 28]);
const output = await engine.infer(input);
const result = await output.download();

console.log('Predictions:', result);

Key Optimizations

1. Kernel Fusion

Combines Conv2d + Bias + ReLU into a single GPU kernel, cutting activation memory traffic by 3x (see the shader sketch after this list):

  • Non-fused: 6 memory operations across three kernel launches (3 tensor reads + 3 tensor writes)
  • Fused: 2 memory operations in one launch (1 tensor read + 1 tensor write)
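
As an illustration, the WGSL sketch below shows where the fusion happens; the binding layout, index math, and names are hypothetical, and the real shader in src/operators is more involved:

// Hypothetical sketch of a fused Conv2d+Bias+ReLU shader; the real
// WGSL in src/operators binds buffers and computes indices differently.
// The point: bias add and ReLU happen in registers, so no intermediate
// tensor is ever written to or read back from global memory.
const fusedConvBiasReluWGSL = /* wgsl */ `
  @group(0) @binding(0) var<storage, read>       input  : array<f32>;
  @group(0) @binding(1) var<storage, read>       weight : array<f32>;
  @group(0) @binding(2) var<storage, read>       bias   : array<f32>;
  @group(0) @binding(3) var<storage, read_write> output : array<f32>;

  @compute @workgroup_size(64)
  fn main(@builtin(global_invocation_id) gid: vec3<u32>) {
    // ... per-output-element index math and the C*kH*kW reduction
    //     loop (elided here) accumulate the convolution sum into acc ...
    var acc: f32 = 0.0;
    let c_out: u32 = 0u; // derived from gid in the real shader
    // Fused epilogue: bias + ReLU without leaving registers.
    output[gid.x] = max(acc + bias[c_out], 0.0);
  }
`;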

2. Memory Layout

  • NCHW: PyTorch-style layout [Batch, Channel, Height, Width]
  • NHWC: TensorFlow-style layout [Batch, Height, Width, Channel]
  • NHWC provides better memory coalescing on GPU for spatial operations (compare the offset formulas below)
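
Concretely, the two layouts differ only in how a logical index (n, c, h, w) maps to a flat buffer offset. The helpers below are illustrative, not part of the library API:

// Flat-buffer offset of element (n, c, h, w) under each layout.
function offsetNCHW(n: number, c: number, h: number, w: number,
                    C: number, H: number, W: number): number {
  return ((n * C + c) * H + h) * W + w;
}

function offsetNHWC(n: number, c: number, h: number, w: number,
                    C: number, H: number, W: number): number {
  return ((n * H + h) * W + w) * C + c;
}

// In NHWC, all C channel values of a pixel sit in adjacent floats, so
// neighboring GPU threads reading neighboring (w, c) positions issue
// contiguous (coalesced) memory accesses.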

3. Im2Col Algorithm

Transforms convolution into matrix multiplication (GEMM) for better GPU utilization:

Input  [N, C, H, W]   → Im2Col  → [C*kH*kW, N*outH*outW]
Weight [K, C, kH, kW] → Reshape → [K, C*kH*kW]
Output = GEMM(Weight, Im2Col(Input))   → [K, N*outH*outW]
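
A minimal CPU sketch of this transformation, assuming stride 1 and no padding (the actual utility in src/utils also handles stride and padding):

// Unfold an NCHW input into the [C*kH*kW, N*outH*outW] matrix above.
function im2col(input: Float32Array, N: number, C: number,
                H: number, W: number, kH: number, kW: number): Float32Array {
  const outH = H - kH + 1, outW = W - kW + 1;
  const rows = C * kH * kW, cols = N * outH * outW;
  const out = new Float32Array(rows * cols);
  for (let n = 0; n < N; n++)
    for (let c = 0; c < C; c++)
      for (let kh = 0; kh < kH; kh++)
        for (let kw = 0; kw < kW; kw++) {
          const row = (c * kH + kh) * kW + kw;       // which weight element
          for (let oh = 0; oh < outH; oh++)
            for (let ow = 0; ow < outW; ow++) {
              const col = (n * outH + oh) * outW + ow; // which output pixel
              const src = ((n * C + c) * H + (oh + kh)) * W + (ow + kw);
              out[row * cols + col] = input[src];
            }
        }
  return out;
}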

Testing Strategy

The project uses a dual testing approach:

  • Unit Tests: Specific examples and edge cases
  • Property Tests: Universal properties checked against randomly generated inputs (100+ runs each; an example follows the property list below)

Correctness Properties

  1. Tensor Data Round-Trip: Upload → Download preserves data
  2. Layout Conversion Round-Trip: NCHW → NHWC → NCHW preserves data
  3. ReLU Element-wise Correctness: output[i] = max(0, input[i])
  4. Softmax Output Validity: All values in [0,1], sum to 1.0
  5. Softmax Numerical Stability: No NaN/Infinity for large inputs
  6. MaxPool Output Shape: Correct shape calculation
  7. MaxPool Correctness: Selects maximum in pooling window
  8. Conv2d Output Shape: Correct shape with stride/padding
  9. Conv2d Correctness: Matches CPU reference implementation
  10. Kernel Fusion Equivalence: Fused = Sequential execution
  11. Layout-Independent Conv2d: Same results across layouts
  12. MNIST Output Distribution: Valid probability distribution
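
For example, property 3 (ReLU element-wise correctness) looks roughly like this with fast-check. This sketch validates a CPU stand-in so the property structure is visible; the real tests in tests/operators dispatch the WebGPU operator instead, and the names here are illustrative:

import fc from 'fast-check';
import { describe, it } from 'vitest';

// CPU stand-in for the GPU ReLU operator (illustrative only).
const reluCPU = (input: Float32Array): Float32Array =>
  input.map((x) => Math.max(0, x));

describe('ReLU element-wise correctness (sketch)', () => {
  it('satisfies output[i] = max(0, input[i]) for random inputs', () => {
    fc.assert(
      fc.property(
        fc.array(fc.float({ noNaN: true }), { minLength: 1, maxLength: 1024 }),
        (values) => {
          const input = new Float32Array(values);
          const output = reluCPU(input);
          return input.every((x, i) => output[i] === Math.max(0, x));
        },
      ),
      { numRuns: 100 }, // matches the 100+ runs noted above
    );
  });
});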

Performance Benchmarking

import { Benchmark } from 'tiny-dl-inference';

const benchmark = new Benchmark();

// Measure operator performance
const result = await benchmark.measureOperator(operator, inputs, params, 100);
console.log(`Execution time: ${result.executionTimeMs}ms`);

// Compare fused vs non-fused
const comparison = await benchmark.compareFusion(input, weight, bias, fusedOp, separateOps, params);
console.log(`Speedup: ${comparison.separate.executionTimeMs / comparison.fused.executionTimeMs}x`);

// Compare memory layouts
const layoutComparison = await benchmark.compareLayouts(opNCHW, opNHWC, inputNCHW, inputNHWC, params);

Technical Highlights

  • Zero Runtime Dependencies: No TensorFlow.js or ONNX Runtime
  • Type-Safe: Full TypeScript implementation
  • Modular Architecture: Easy to extend with new operators
  • Comprehensive Testing: Property-based testing ensures correctness
  • Educational: Clear code demonstrating AI infrastructure concepts

Browser Compatibility

Requires WebGPU support:

  • Chrome 113+
  • Edge 113+
  • Safari 18+ (macOS Sonoma+)
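
A minimal capability check before calling engine.initialize(); navigator.gpu and requestAdapter() are the standard WebGPU entry points (TypeScript picks up their declarations from @webgpu/types):

// Fail fast with a clear message when WebGPU is unavailable.
if (!('gpu' in navigator)) {
  throw new Error('WebGPU is not supported in this browser');
}
const adapter = await navigator.gpu.requestAdapter();
if (!adapter) {
  throw new Error('WebGPU is supported, but no GPU adapter is available');
}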

License

MIT

Author

Built as a demonstration of AI infrastructure and GPU computing expertise.
