Tiny-DL-Inference


A micro deep learning inference engine built from scratch on WebGPU. The project demonstrates AI infrastructure optimization techniques including kernel fusion, memory layout optimization, and an Im2Col-based convolution-to-GEMM transformation.

Features

  • WebGPU-based Compute: Hand-written WGSL shaders for all neural network operators
  • Core Operators: Conv2d, MaxPool, ReLU, Softmax, Dense, Flatten
  • Kernel Fusion: Fused Conv2d+Bias+ReLU operator for reduced memory traffic
  • Memory Layout Optimization: Support for both NCHW and NHWC layouts
  • Im2Col Algorithm: Convolution-to-GEMM transformation for performance
  • Property-Based Testing: Comprehensive test suite using fast-check
  • MNIST Support: End-to-end digit classification demo

Project Structure

src/
├── core/           # Core infrastructure (GPUContext, Tensor, errors)
├── operators/      # Neural network operators
├── engine/         # Inference engine and model loader
└── utils/          # Utilities (benchmark, im2col, CPU reference)

tests/
├── core/           # Core component tests
└── operators/      # Operator tests with property-based testing

Installation

npm install

Building

npm run build

Testing

# Run all tests
npm test

# Run tests in watch mode
npm run test:watch

# Generate coverage report
npm run test:coverage

Usage

import { InferenceEngine, ModelLoader, Tensor } from 'tiny-dl-inference';

// Initialize engine
const engine = new InferenceEngine();
await engine.initialize();

// Load model
const loader = new ModelLoader();
const model = await loader.loadFromJSON('model.json');
await engine.loadModel(model);

// Run inference (Tensor.fromArray takes the GPUContext from src/core)
const input = Tensor.fromArray(context, imageData, [1, 1, 28, 28]);
const output = await engine.infer(input);
const result = await output.download();

console.log('Predictions:', result);

Key Optimizations

1. Kernel Fusion

Combines Conv2d + Bias + ReLU into a single GPU kernel, cutting activation memory traffic by 3x (see the shader sketch after this list):

  • Non-fused: 6 memory operations across three kernel launches (3 tensor reads + 3 tensor writes)
  • Fused: 2 memory operations in one launch (1 tensor read + 1 tensor write)
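
As an illustration, the WGSL sketch below shows where the fusion happens; the binding layout, index math, and names are hypothetical, and the real shader in src/operators is more involved:

// Hypothetical sketch of a fused Conv2d+Bias+ReLU shader; the real
// WGSL in src/operators binds buffers and computes indices differently.
// The point: bias add and ReLU happen in registers, so no intermediate
// tensor is ever written to or read back from global memory.
const fusedConvBiasReluWGSL = /* wgsl */ `
  @group(0) @binding(0) var<storage, read>       input  : array<f32>;
  @group(0) @binding(1) var<storage, read>       weight : array<f32>;
  @group(0) @binding(2) var<storage, read>       bias   : array<f32>;
  @group(0) @binding(3) var<storage, read_write> output : array<f32>;

  @compute @workgroup_size(64)
  fn main(@builtin(global_invocation_id) gid: vec3<u32>) {
    // ... per-output-element index math and the C*kH*kW reduction
    //     loop (elided here) accumulate the convolution sum into acc ...
    var acc: f32 = 0.0;
    let c_out: u32 = 0u; // derived from gid in the real shader
    // Fused epilogue: bias + ReLU without leaving registers.
    output[gid.x] = max(acc + bias[c_out], 0.0);
  }
`;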

2. Memory Layout

  • NCHW: PyTorch-style layout [Batch, Channel, Height, Width]
  • NHWC: TensorFlow-style layout [Batch, Height, Width, Channel]
  • NHWC provides better memory coalescing on GPU for spatial operations (compare the offset formulas below)
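
Concretely, the two layouts differ only in how a logical index (n, c, h, w) maps to a flat buffer offset. The helpers below are illustrative, not part of the library API:

// Flat-buffer offset of element (n, c, h, w) under each layout.
function offsetNCHW(n: number, c: number, h: number, w: number,
                    C: number, H: number, W: number): number {
  return ((n * C + c) * H + h) * W + w;
}

function offsetNHWC(n: number, c: number, h: number, w: number,
                    C: number, H: number, W: number): number {
  return ((n * H + h) * W + w) * C + c;
}

// In NHWC, all C channel values of a pixel sit in adjacent floats, so
// neighboring GPU threads reading neighboring (w, c) positions issue
// contiguous (coalesced) memory accesses.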

3. Im2Col Algorithm

Transforms convolution into matrix multiplication (GEMM) for better GPU utilization:

Input  [N, C, H, W]   → Im2Col  → [C*kH*kW, N*outH*outW]
Weight [K, C, kH, kW] → Reshape → [K, C*kH*kW]
Output = GEMM(Weight, Im2Col(Input))   → [K, N*outH*outW]
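
A minimal CPU sketch of this transformation, assuming stride 1 and no padding (the actual utility in src/utils also handles stride and padding):

// Unfold an NCHW input into the [C*kH*kW, N*outH*outW] matrix above.
function im2col(input: Float32Array, N: number, C: number,
                H: number, W: number, kH: number, kW: number): Float32Array {
  const outH = H - kH + 1, outW = W - kW + 1;
  const rows = C * kH * kW, cols = N * outH * outW;
  const out = new Float32Array(rows * cols);
  for (let n = 0; n < N; n++)
    for (let c = 0; c < C; c++)
      for (let kh = 0; kh < kH; kh++)
        for (let kw = 0; kw < kW; kw++) {
          const row = (c * kH + kh) * kW + kw;       // which weight element
          for (let oh = 0; oh < outH; oh++)
            for (let ow = 0; ow < outW; ow++) {
              const col = (n * outH + oh) * outW + ow; // which output pixel
              const src = ((n * C + c) * H + (oh + kh)) * W + (ow + kw);
              out[row * cols + col] = input[src];
            }
        }
  return out;
}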

Testing Strategy

The project uses a dual testing approach:

  • Unit Tests: Specific examples and edge cases
  • Property Tests: Universal properties checked against randomly generated inputs (100+ runs each; an example follows the property list below)

Correctness Properties

  1. Tensor Data Round-Trip: Upload → Download preserves data
  2. Layout Conversion Round-Trip: NCHW → NHWC → NCHW preserves data
  3. ReLU Element-wise Correctness: output[i] = max(0, input[i])
  4. Softmax Output Validity: All values in [0,1], sum to 1.0
  5. Softmax Numerical Stability: No NaN/Infinity for large inputs
  6. MaxPool Output Shape: Correct shape calculation
  7. MaxPool Correctness: Selects maximum in pooling window
  8. Conv2d Output Shape: Correct shape with stride/padding
  9. Conv2d Correctness: Matches CPU reference implementation
  10. Kernel Fusion Equivalence: Fused = Sequential execution
  11. Layout-Independent Conv2d: Same results across layouts
  12. MNIST Output Distribution: Valid probability distribution
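
For example, property 3 (ReLU element-wise correctness) looks roughly like this with fast-check. This sketch validates a CPU stand-in so the property structure is visible; the real tests in tests/operators dispatch the WebGPU operator instead, and the names here are illustrative:

import fc from 'fast-check';
import { describe, it } from 'vitest';

// CPU stand-in for the GPU ReLU operator (illustrative only).
const reluCPU = (input: Float32Array): Float32Array =>
  input.map((x) => Math.max(0, x));

describe('ReLU element-wise correctness (sketch)', () => {
  it('satisfies output[i] = max(0, input[i]) for random inputs', () => {
    fc.assert(
      fc.property(
        fc.array(fc.float({ noNaN: true }), { minLength: 1, maxLength: 1024 }),
        (values) => {
          const input = new Float32Array(values);
          const output = reluCPU(input);
          return input.every((x, i) => output[i] === Math.max(0, x));
        },
      ),
      { numRuns: 100 }, // matches the 100+ runs noted above
    );
  });
});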

Performance Benchmarking

import { Benchmark } from 'tiny-dl-inference';

const benchmark = new Benchmark();

// Measure operator performance
const result = await benchmark.measureOperator(operator, inputs, params, 100);
console.log(`Execution time: ${result.executionTimeMs}ms`);

// Compare fused vs non-fused
const comparison = await benchmark.compareFusion(input, weight, bias, fusedOp, separateOps, params);
console.log(`Speedup: ${comparison.separate.executionTimeMs / comparison.fused.executionTimeMs}x`);

// Compare memory layouts
const layoutComparison = await benchmark.compareLayouts(opNCHW, opNHWC, inputNCHW, inputNHWC, params);

Technical Highlights

  • Zero Runtime Dependencies: No TensorFlow.js or ONNX Runtime
  • Type-Safe: Full TypeScript implementation
  • Modular Architecture: Easy to extend with new operators
  • Comprehensive Testing: Property-based testing ensures correctness
  • Educational: Clear code demonstrating AI infrastructure concepts

Browser Compatibility

Requires WebGPU support:

  • Chrome 113+
  • Edge 113+
  • Safari 18+ (macOS Sonoma+)
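
A minimal capability check before calling engine.initialize(); navigator.gpu and requestAdapter() are the standard WebGPU entry points (TypeScript picks up their declarations from @webgpu/types):

// Fail fast with a clear message when WebGPU is unavailable.
if (!('gpu' in navigator)) {
  throw new Error('WebGPU is not supported in this browser');
}
const adapter = await navigator.gpu.requestAdapter();
if (!adapter) {
  throw new Error('WebGPU is supported, but no GPU adapter is available');
}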

License

MIT

Author

Built as a demonstration of AI infrastructure and GPU computing expertise.
