A production-quality 2D game engine built from scratch in modern C++, focusing on low-latency systems programming, geometric algorithms, and performance optimization.
This engine is designed to showcase:
- Low-latency C++ mastery: Cache-aware programming, SIMD optimization, custom memory management
- Geometric algorithms: Novel spatial data structures with theoretical analysis
- Performance engineering: Profiling-driven optimization, nanosecond-level measurements
- Systems thinking: Memory layouts, concurrency, real-time constraints
Performance Target: Consistent 60fps (about 16.7ms frame budget) with 10,000+ dynamic entities
- Hardware-accelerated 2D rendering (OpenGL/DirectX backend)
- Sprite batching and texture atlasing
- Camera system with viewport culling
- Particle systems
- Debug visualization tools
- Multiple spatial partitioning strategies:
  - Quadtree with dynamic balancing
  - Bounding Volume Hierarchy (BVH)
  - Spatial hash grid
  - Novel cache-oblivious variant (research contribution)
- Broad-phase and narrow-phase collision detection
- Continuous collision detection (CCD)
- SIMD-optimized geometric primitives
- Data-oriented design for cache efficiency
- Component arrays with tight memory layouts
- System-based update architecture
- Memory pool allocators
- Custom memory allocators (pool, stack, frame allocators)
- SIMD vectorization (SSE/AVX) for math operations
- Multithreaded job system for parallel processing
- Lock-free spatial queries
- Profiling and instrumentation framework
- Cycle-accurate timing (rdtsc)
Goal: Establish solid foundation with clean architecture and basic functionality
- CMake build system with proper dependency management
- Cross-platform window creation (SDL3)
- OpenGL context setup and basic rendering
- Input handling system (keyboard, mouse)
- Git repository structure and .gitignore
Deliverable: Window opens, clears to color, handles input ✅ COMPLETE
- Vector2/3/4 classes with operator overloading
- Matrix2x2, 3x3, 4x4 classes
- Quaternion for rotations (deferred; depends on later additions to the math library)
- Geometric primitives (AABB, Circle)
- Intersection tests (AABB-AABB, Circle-Circle, AABB-Circle, Line-AABB with Liang-Barsky, Line-Circle)
- Unit tests for all math operations
Deliverable: Comprehensive math library with test coverage ✅ MOSTLY COMPLETE
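The intersection tests listed above can be sketched in a few lines each. This is a minimal illustration, not the engine's actual code: the `Vec2`, `AABB`, and `Circle` structs and the function names `intersects` and `segmentIntersects` are assumptions made for the example, and the engine's own math types may look different.

```cpp
#include <algorithm>

// Hypothetical minimal types; the engine's own Vector2/AABB/Circle may differ.
struct Vec2 { float x, y; };
struct AABB { Vec2 min, max; };
struct Circle { Vec2 center; float radius; };

// AABB-AABB: the boxes overlap when their intervals overlap on both axes.
inline bool intersects(const AABB& a, const AABB& b) {
    return a.min.x <= b.max.x && a.max.x >= b.min.x &&
           a.min.y <= b.max.y && a.max.y >= b.min.y;
}

// Circle-Circle: compare squared center distance with the squared radius sum
// so no square root is needed.
inline bool intersects(const Circle& a, const Circle& b) {
    const float dx = a.center.x - b.center.x;
    const float dy = a.center.y - b.center.y;
    const float r = a.radius + b.radius;
    return dx * dx + dy * dy <= r * r;
}

// AABB-Circle: clamp the circle center onto the box to find the closest
// point, then compare that point's squared distance with the squared radius.
inline bool intersects(const AABB& box, const Circle& c) {
    const float cx = std::max(box.min.x, std::min(c.center.x, box.max.x));
    const float cy = std::max(box.min.y, std::min(c.center.y, box.max.y));
    const float dx = c.center.x - cx;
    const float dy = c.center.y - cy;
    return dx * dx + dy * dy <= c.radius * c.radius;
}

// Line-AABB with Liang-Barsky: clip the parametric segment
// p0 + t * (p1 - p0), t in [0, 1], against the four box edges.
inline bool segmentIntersects(const Vec2& p0, const Vec2& p1, const AABB& box) {
    float t0 = 0.0f, t1 = 1.0f;
    const float dx = p1.x - p0.x, dy = p1.y - p0.y;
    const float p[4] = { -dx, dx, -dy, dy };
    const float q[4] = { p0.x - box.min.x, box.max.x - p0.x,
                         p0.y - box.min.y, box.max.y - p0.y };
    for (int i = 0; i < 4; ++i) {
        if (p[i] == 0.0f) {
            if (q[i] < 0.0f) return false;      // parallel to and outside this edge
        } else {
            const float t = q[i] / p[i];
            if (p[i] < 0.0f) {                  // segment enters through this edge
                if (t > t1) return false;
                if (t > t0) t0 = t;
            } else {                            // segment leaves through this edge
                if (t < t0) return false;
                if (t < t1) t1 = t;
            }
        }
    }
    return true;
}
```

All four tests treat touching shapes as intersecting; whether the engine uses inclusive or exclusive comparisons is a design choice this sketch does not settle.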
- Entity manager with ID generation
- Component storage (consider SoA vs AoS tradeoffs)
- System base class and update loop
- Basic components (Transform, Sprite, RigidBody)
- Scene graph management
- Basic logging system (optional - for debugging)
Deliverable: Can create entities, add components, update systems
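To make the SoA vs AoS tradeoff concrete, here is a minimal sketch of entity ID generation plus a Structure-of-Arrays component store with one system function running over it. The names `Entity`, `EntityManager`, `TransformAoS`, `TransformSoA`, and `integrate` are hypothetical and chosen for the example; ID recycling and component removal are left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Entity = std::uint32_t;  // hypothetical ID type

// Hands out monotonically increasing IDs (no recycling in this sketch).
class EntityManager {
public:
    Entity create() { return next_++; }
private:
    Entity next_ = 0;
};

// AoS: one struct per entity. A loop that only reads positions still drags
// the velocities through the cache.
struct TransformAoS { float x, y, vx, vy; };

// SoA: one contiguous array per field, indexed by a dense slot number.
struct TransformSoA {
    std::vector<Entity> owner;
    std::vector<float> x, y, vx, vy;

    // Appends a component and returns its dense index.
    std::size_t add(Entity e, float px, float py, float velX, float velY) {
        owner.push_back(e);
        x.push_back(px);
        y.push_back(py);
        vx.push_back(velX);
        vy.push_back(velY);
        return owner.size() - 1;
    }

    std::size_t size() const { return owner.size(); }
};

// A "system" here is just a function that walks the component arrays.
inline void integrate(TransformSoA& t, float dt) {
    for (std::size_t i = 0; i < t.size(); ++i) {
        t.x[i] += t.vx[i] * dt;
        t.y[i] += t.vy[i] * dt;
    }
}
```

The SoA loop touches four tightly packed float arrays, which is also the layout that vectorizes most easily; the AoS struct is kept only for comparison.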
Goal: Production-quality rendering with batching and optimization
- Texture loading and management (stb_image)
- Shader compilation and management (GLSL 330)
- Sprite class with transform properties
- SpriteRenderer with VAO/VBO setup
- Orthographic projection matrix for 2D
- Matrix transformations (translate, scale, rotate)
- Sprite batching optimization
- Texture atlas generation
- Z-ordering and layer management
Deliverable: Can render 1000+ sprites efficiently ✅ COMPLETE
Technical Implementation:
- Texture system using stb_image with PNG/JPG/BMP support
- GLSL shaders with vertex/fragment compilation and linking
- Row-major matrices uploaded with the transpose flag set to GL_TRUE, because OpenGL expects column-major data by default
- VAO/VBO setup for quad rendering with position + UV coordinates
- Color tinting support via uniform shader variables
- Sprite batching system: beginBatch/submitSprite/endBatch workflow
- Dynamic VBO: GL_DYNAMIC_DRAW for efficient batch updates
- Single draw call: All sprites rendered in one glDrawArrays call
- Performance: 1000 sprites @ 166-168 FPS (single draw call vs 1000 individual calls)
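The beginBatch/submitSprite/endBatch workflow can be illustrated on the CPU side alone. The sketch below only accumulates vertices and leaves out every OpenGL call so that it stays self-contained; the `Vertex` and `Sprite` layouts and the `SpriteBatch` class are assumptions for the example, and the engine's real batcher may use indexed quads or a different vertex format.

```cpp
#include <cstddef>
#include <vector>

struct Vertex { float x, y, u, v; };  // position + UV
struct Sprite { float x, y, w, h; };  // hypothetical: bottom-left corner plus size

class SpriteBatch {
public:
    // Starts a new batch; clear() keeps the vector's capacity, so a
    // steady-state frame does not reallocate.
    void beginBatch() { vertices_.clear(); }

    // Appends two triangles (6 vertices) covering the sprite's quad.
    void submitSprite(const Sprite& s) {
        const float x0 = s.x, y0 = s.y;
        const float x1 = s.x + s.w, y1 = s.y + s.h;
        const Vertex quad[6] = {
            {x0, y0, 0.0f, 0.0f}, {x1, y0, 1.0f, 0.0f}, {x1, y1, 1.0f, 1.0f},
            {x0, y0, 0.0f, 0.0f}, {x1, y1, 1.0f, 1.0f}, {x0, y1, 0.0f, 1.0f},
        };
        vertices_.insert(vertices_.end(), quad, quad + 6);
    }

    // Returns the number of vertices to draw. A real renderer would upload
    // vertices_ into the GL_DYNAMIC_DRAW VBO here and issue one glDrawArrays.
    std::size_t endBatch() const { return vertices_.size(); }

    const std::vector<Vertex>& vertices() const { return vertices_; }

private:
    std::vector<Vertex> vertices_;
};
```

Submitting 1000 sprites produces 6000 vertices in one contiguous buffer, which is what allows the whole frame to go out in a single draw call instead of 1000.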
- 2D camera with viewport transformations
- Frustum culling optimization
- Particle system with GPU instancing
- Debug rendering (shapes, lines, text)
- Frame buffer objects for post-processing
- Performance profiling of render pipeline
Deliverable: Camera controls, culled rendering, visual effects
Goal: RESEARCH FOCUS - Novel spatial data structures with theoretical analysis
- Classic quadtree implementation
- Dynamic insertion/deletion
- Range queries and frustum queries
- Memory layout optimization (cache-line awareness)
- Benchmark against naive O(n²) approach
Deliverable: Working quadtree with performance analysis
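As a reference point for the classic quadtree, here is a minimal point quadtree with insertion and range queries. It is a pointer-based sketch and deliberately ignores the cache-line-aware layout mentioned above; the node capacity of 4, the depth limit of 8, and all type names (`Point`, `Rect`, `Quadtree`) are assumptions for the example.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

struct Point { float x, y; };

struct Rect {
    float x, y, w, h;  // min corner plus size, half-open on the max side
    bool contains(const Point& p) const {
        return p.x >= x && p.x < x + w && p.y >= y && p.y < y + h;
    }
    bool overlaps(const Rect& o) const {
        return x < o.x + o.w && o.x < x + w && y < o.y + o.h && o.y < y + h;
    }
};

class Quadtree {
public:
    explicit Quadtree(const Rect& bounds, int depth = 0)
        : bounds_(bounds), depth_(depth) {}

    // Returns false when the point lies outside this node's bounds.
    bool insert(const Point& p) {
        if (!bounds_.contains(p)) return false;
        if (!children_[0]) {
            if (points_.size() < kCapacity || depth_ >= kMaxDepth) {
                points_.push_back(p);
                return true;
            }
            split();
        }
        insertIntoChildren(p);
        return true;
    }

    // Appends every stored point inside range to out, pruning subtrees
    // whose bounds do not overlap the range.
    void query(const Rect& range, std::vector<Point>& out) const {
        if (!bounds_.overlaps(range)) return;
        for (const Point& p : points_)
            if (range.contains(p)) out.push_back(p);
        if (children_[0])
            for (const auto& c : children_) c->query(range, out);
    }

private:
    static constexpr std::size_t kCapacity = 4;
    static constexpr int kMaxDepth = 8;

    void insertIntoChildren(const Point& p) {
        for (auto& c : children_)
            if (c->insert(p)) return;
        points_.push_back(p);  // rounding edge case: keep the point at this level
    }

    void split() {
        const float hw = bounds_.w * 0.5f, hh = bounds_.h * 0.5f;
        const Rect quads[4] = {
            {bounds_.x,      bounds_.y,      hw, hh},
            {bounds_.x + hw, bounds_.y,      hw, hh},
            {bounds_.x,      bounds_.y + hh, hw, hh},
            {bounds_.x + hw, bounds_.y + hh, hw, hh},
        };
        for (int i = 0; i < 4; ++i)
            children_[i].reset(new Quadtree(quads[i], depth_ + 1));
        std::vector<Point> old;
        old.swap(points_);
        for (const Point& p : old) insertIntoChildren(p);  // push points down
    }

    Rect bounds_;
    int depth_;
    std::vector<Point> points_;
    std::unique_ptr<Quadtree> children_[4];
};
```

For the benchmark against the naive O(n²) approach, the same queries can be answered by scanning a flat `std::vector<Point>`, which also doubles as a correctness check for the tree.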
- Bounding Volume Hierarchy (top-down construction)
- Spatial hash grid
- Uniform grid with configurable cell size
- Comparative benchmark of all three approaches
- Analysis: best/worst case scenarios for each
Deliverable: Three spatial partitioning options with benchmarks
- Design cache-oblivious variant of quadtree/BVH
- Theoretical complexity analysis (write proofs)
- Implementation with careful memory layout
- Cache miss profiling (perf/vtune)
- Comparison with traditional approaches
- Write technical paper/report (8-12 pages)
Deliverable: Research paper + implementation + benchmarks
- Broad-phase using best spatial structure
- Narrow-phase: SAT (Separating Axis Theorem)
- Narrow-phase: GJK algorithm for convex shapes
- Continuous collision detection (swept shapes)
- Collision response and resolution
- Contact manifold generation
Deliverable: Full collision system with <1ms for 10k entities
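For the SAT narrow phase, two convex polygons are disjoint exactly when some edge normal of either polygon separates their projections. The sketch below implements only that boolean test, with no contact manifold or penetration depth; `Vec2`, `Polygon`, `project`, `hasSeparatingAxis`, and `satOverlap` are hypothetical names introduced for the example.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

struct Vec2 { float x, y; };
using Polygon = std::vector<Vec2>;  // convex, vertices listed in order

// Projects every vertex onto axis and returns the covered interval [lo, hi].
inline void project(const Polygon& poly, const Vec2& axis, float& lo, float& hi) {
    lo = std::numeric_limits<float>::max();
    hi = std::numeric_limits<float>::lowest();
    for (const Vec2& v : poly) {
        const float d = v.x * axis.x + v.y * axis.y;
        if (d < lo) lo = d;
        if (d > hi) hi = d;
    }
}

// Tests the edge normals of a as candidate separating axes.
inline bool hasSeparatingAxis(const Polygon& a, const Polygon& b) {
    for (std::size_t i = 0; i < a.size(); ++i) {
        const Vec2& p = a[i];
        const Vec2& q = a[(i + 1) % a.size()];
        // Edge normal; normalization is unnecessary for a yes/no answer.
        const Vec2 axis = { -(q.y - p.y), q.x - p.x };
        float aLo, aHi, bLo, bHi;
        project(a, axis, aLo, aHi);
        project(b, axis, bLo, bHi);
        if (aHi < bLo || bHi < aLo) return true;  // gap found on this axis
    }
    return false;
}

// The polygons overlap only if neither one supplies a separating axis.
inline bool satOverlap(const Polygon& a, const Polygon& b) {
    return !hasSeparatingAxis(a, b) && !hasSeparatingAxis(b, a);
}
```

A useful test case is a diamond placed near the corner of a square: their bounding boxes overlap, so the broad phase reports the pair, but the diamond's diagonal edge normal separates them and the narrow phase rejects it.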
Goal: HFT-LEVEL FOCUS - Achieve microsecond-level latency targets
- SIMD Vector2/Vector3 operations (SSE/AVX)
- Batch collision detection (4-8 at once)
- Matrix operations with SIMD
- Benchmark: scalar vs SIMD (aim for 4-8x speedup)
- Assembly inspection of critical paths
- Alignment requirements and padding
Deliverable: 4x+ speedup on geometric operations
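Batch collision detection maps naturally onto SSE when the boxes are stored in SoA form: one query box can be tested against four candidates in a handful of instructions. The sketch below assumes an x86 target with SSE and falls back to the scalar loop elsewhere; the `AABB4` layout and the function names `overlapMaskScalar` and `overlapMask` are assumptions made for the example.

```cpp
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define SKETCH_HAS_SSE 1
#endif

// Four AABBs in SoA form; each array is 16-byte aligned for _mm_load_ps.
struct AABB4 {
    alignas(16) float minX[4];
    alignas(16) float minY[4];
    alignas(16) float maxX[4];
    alignas(16) float maxY[4];
};

// Scalar reference: bit i of the result is set when box i overlaps the query.
inline int overlapMaskScalar(float qMinX, float qMinY, float qMaxX, float qMaxY,
                             const AABB4& b) {
    int mask = 0;
    for (int i = 0; i < 4; ++i) {
        const bool hit = b.minX[i] <= qMaxX && b.maxX[i] >= qMinX &&
                         b.minY[i] <= qMaxY && b.maxY[i] >= qMinY;
        if (hit) mask |= 1 << i;
    }
    return mask;
}

// SIMD version: the same four comparisons per box, done for all four boxes at once.
inline int overlapMask(float qMinX, float qMinY, float qMaxX, float qMaxY,
                       const AABB4& b) {
#ifdef SKETCH_HAS_SSE
    const __m128 xOk = _mm_and_ps(
        _mm_cmple_ps(_mm_load_ps(b.minX), _mm_set1_ps(qMaxX)),
        _mm_cmpge_ps(_mm_load_ps(b.maxX), _mm_set1_ps(qMinX)));
    const __m128 yOk = _mm_and_ps(
        _mm_cmple_ps(_mm_load_ps(b.minY), _mm_set1_ps(qMaxY)),
        _mm_cmpge_ps(_mm_load_ps(b.maxY), _mm_set1_ps(qMinY)));
    // movemask gathers the sign bit of each lane into the low 4 bits.
    return _mm_movemask_ps(_mm_and_ps(xOk, yOk));
#else
    return overlapMaskScalar(qMinX, qMinY, qMaxX, qMaxY, b);
#endif
}
```

Keeping the scalar version next to the SIMD one gives both a correctness oracle and the baseline for the scalar-vs-SIMD benchmark. The AVX variant would widen the same pattern to eight boxes with `__m256`.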
- Custom allocators (pool allocator for entities)
- Stack allocator for temporary data
- Frame allocator for per-frame allocations
- Memory arena for subsystems
- SoA (Structure of Arrays) layout for hot data
- Cache-line alignment for frequently accessed data
- Memory profiling (heap allocations per frame = 0)
Deliverable: Zero runtime allocations, optimized memory layout
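A frame allocator is a bump pointer over one preallocated buffer: allocation is an add and a compare, and everything is released at once by resetting the offset at the end of the frame. The sketch below shows the idea; the `FrameAllocator` class and its method names are assumptions for the example, and it does not run destructors, so it suits trivially destructible per-frame data.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>

class FrameAllocator {
public:
    // The one heap allocation happens here, at initialization time.
    explicit FrameAllocator(std::size_t capacity)
        : buffer_(new unsigned char[capacity]), capacity_(capacity), offset_(0) {}

    // Returns nullptr when the buffer is exhausted.
    // alignment must be a power of two.
    void* allocate(std::size_t size,
                   std::size_t alignment = alignof(std::max_align_t)) {
        const std::uintptr_t base = reinterpret_cast<std::uintptr_t>(buffer_.get());
        const std::uintptr_t current = base + offset_;
        const std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
        const std::uintptr_t aligned = (current + mask) & ~mask;  // round up
        const std::size_t start = static_cast<std::size_t>(aligned - base);
        if (start > capacity_ || size > capacity_ - start) return nullptr;
        offset_ = start + size;
        return buffer_.get() + start;
    }

    // Frees everything at once; call once per frame.
    void reset() { offset_ = 0; }

    std::size_t used() const { return offset_; }

private:
    std::unique_ptr<unsigned char[]> buffer_;
    std::size_t capacity_;
    std::size_t offset_;
};
```

Because `reset()` only rewinds an offset, the steady-state frame loop performs no heap allocations, which is what the "heap allocations per frame = 0" target measures.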
- Job system with work-stealing queues
- Parallel broad-phase collision detection
- Lock-free spatial structure queries
- Thread pool management
- Benchmark: scaling across 4-8 cores
- Avoid false sharing (cache line padding)
- Race condition testing and validation
Deliverable: Near-linear scaling across cores
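False sharing is easiest to see in per-worker counters: if two workers' counters share a cache line, every write by one invalidates the other's copy even though they never touch the same variable. The sketch below pads each counter to its own line; the 64-byte line size is an assumption that holds on most x86-64 CPUs but should be checked for the target, and `PaddedCounter` and `PerThreadCounters` are hypothetical names.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines. Verify for the actual target CPU.
constexpr std::size_t kCacheLine = 64;

// alignas pads the struct to a full line, so adjacent array elements
// never share a cache line.
struct alignas(kCacheLine) PaddedCounter {
    std::atomic<std::uint64_t> value{0};
};
static_assert(sizeof(PaddedCounter) == kCacheLine, "one counter per cache line");
static_assert(alignof(PaddedCounter) == kCacheLine, "counters start on a line boundary");

template <std::size_t N>
struct PerThreadCounters {
    PaddedCounter slots[N];

    // Each worker only ever writes its own slot.
    void add(std::size_t worker, std::uint64_t n) {
        slots[worker].value.fetch_add(n, std::memory_order_relaxed);
    }

    // Sums all slots. Relaxed loads are enough once the workers have been
    // joined, since join() already provides the needed synchronization.
    std::uint64_t total() const {
        std::uint64_t sum = 0;
        for (std::size_t i = 0; i < N; ++i)
            sum += slots[i].value.load(std::memory_order_relaxed);
        return sum;
    }
};
```

In a job system, worker `i` would call `add(i, ...)` from its own thread and the main thread would call `total()` after joining. Removing the `alignas` and re-running the core-scaling benchmark is a direct way to measure how much false sharing costs.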
Goal: Production features and polish
- Velocity and acceleration integration
- Constraint solver (distance, hinge)
- Friction and restitution
- Stable simulation with fixed timestep
- Deterministic physics (optional: fixed-point math)
Deliverable: Stable physics simulation
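A fixed timestep is usually implemented with a time accumulator: variable frame time goes in, a whole number of fixed-size physics steps comes out, and the remainder carries over to the next frame. The sketch below pairs that with semi-implicit Euler integration; `Body`, `step`, and `FixedTimestep` are hypothetical names, and the step cap of 8 is an arbitrary choice for the example.

```cpp
struct Body { float y, vy; };

// Semi-implicit Euler: update velocity first, then move with the new velocity.
inline void step(Body& b, float gravity, float dt) {
    b.vy += gravity * dt;
    b.y  += b.vy * dt;
}

class FixedTimestep {
public:
    explicit FixedTimestep(double dt) : dt_(dt), accumulator_(0.0) {}

    // Adds the elapsed frame time and returns how many fixed steps to run.
    int advance(double frameTime, int maxSteps = 8) {
        accumulator_ += frameTime;
        int steps = 0;
        while (accumulator_ >= dt_ && steps < maxSteps) {
            accumulator_ -= dt_;
            ++steps;
        }
        // Still behind after maxSteps: drop the backlog rather than let a
        // long stall snowball into ever more steps per frame.
        if (accumulator_ >= dt_) accumulator_ = 0.0;
        return steps;
    }

    // Fraction of a step left over, usable for render interpolation.
    double alpha() const { return accumulator_ / dt_; }

    double dt() const { return dt_; }

private:
    double dt_;
    double accumulator_;
};
```

In the main loop this becomes `for (int i = clock.advance(frameTime); i > 0; --i) step(body, gravity, dt);`. Since every step uses the same `dt`, the simulation gives the same result for the same inputs regardless of frame rate, which is a prerequisite for the deterministic physics goal.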
- Asset loading system (JSON/custom format)
- Texture packer for atlas generation
- Scene serialization/deserialization
- Hot-reload system for code/assets
- Level editor (ImGui-based)
Deliverable: Complete asset workflow
- Audio system (OpenAL/FMOD)
- Sound effects and music playback
- Shader effects (bloom, distortion)
- Screen shake and camera effects
- Tweening/easing system
Deliverable: Polish and juice features
Goal: Professional presentation for portfolio
- Physics sandbox demo (10k+ entities stress test)
- Particle effects showcase
- Platformer game prototype
- Top-down shooter prototype
- Each demo showcases different features
Deliverable: 3-4 playable demos
- Automated performance tests
- Comparison graphs (spatial structures)
- Frame time analysis and percentiles
- Memory usage profiling
- Scaling tests (1k, 5k, 10k, 50k entities)
- Generate performance report
Deliverable: Comprehensive benchmark results
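Frame time percentiles say more than an average FPS figure, because a single 33ms spike disappears in a mean but shows up at p99. Here is a minimal sketch using the nearest-rank definition; the function name `percentile` and the choice of nearest-rank over interpolation are assumptions for the example.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it. Takes the samples by value so
// the caller's data is not reordered. Returns 0 for an empty input.
inline double percentile(std::vector<double> samples, double p) {
    if (samples.empty()) return 0.0;
    std::sort(samples.begin(), samples.end());
    const double n = static_cast<double>(samples.size());
    const double rank = std::ceil(p * n / 100.0);
    const std::size_t index =
        rank < 1.0 ? 0 : static_cast<std::size_t>(rank) - 1;
    return samples[std::min(index, samples.size() - 1)];
}
```

For the ten frame times {16, 17, 15, 33, 16, 16, 17, 16, 15, 16} ms, the median is 16 ms while p99 is 33 ms, which exposes the one dropped frame that an average of 17.7 ms hides.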
- API documentation (Doxygen)
- Architecture documentation
- Performance analysis document
- README with build instructions
- Code cleanup and commenting
- Demo video/GIFs for GitHub
Deliverable: Professional portfolio piece
- SoA (Structure of Arrays) vs AoS (Array of Structures) tradeoffs
- Cache line alignment and false sharing prevention
- Prefetching strategies in spatial queries
- Memory arena and custom allocator design
- Vectorized geometric operations (SSE/AVX)
- Alignment requirements and performance considerations
- Batch processing for collision detection
- Assembly-level optimization analysis
- Lock-free spatial structure design
- Work-stealing job system
- Memory ordering and synchronization
- Scaling analysis across multiple cores
- Cache-oblivious spatial data structure design
- Theoretical complexity analysis with proofs
- Experimental comparison of spatial partitioning approaches
- Performance characterization across diverse workloads
- Frame time: <16ms (60fps) with 10k entities
- Broad-phase collision: <1ms for 10k entities
- Narrow-phase per pair: <100ns average
- Spatial structure update: <500μs for 1k moving entities
- Memory allocations per frame: 0 (after initialization)
- Collision pairs processed: >1M pairs/sec
- Entities rendered: >50k sprites at 60fps
- Particle count: >100k particles with GPU instancing
- Memory per entity: <128 bytes average
- Cache miss rate: <5% on hot paths
- Memory fragmentation: <10% after 1 hour runtime
- Language: C++17/20
- Build System: CMake
- Compiler: MSVC/GCC/Clang with maximum optimizations
- Rendering: OpenGL 4.5+ / DirectX 11
- Windowing: SDL3 (used for window creation in Phase 1), with GLFW as an alternative
- Math: Custom library + GLM for verification
- Image Loading: stb_image
- Audio: OpenAL or FMOD
- UI: Dear ImGui (for tools/debug)
- Testing: Google Test
- Profiling: Tracy, Optick, or custom instrumentation
- Memory: Valgrind, AddressSanitizer
- Cache Analysis: perf (Linux), Intel VTune, or AMD μProf
- Assembly Inspection: Compiler Explorer, objdump
# Clone repository
git clone https://github.com/yourusername/2d-game-engine.git
cd 2d-game-engine
# Create build directory
mkdir build && cd build
# Configure (Release mode for performance testing)
cmake .. -DCMAKE_BUILD_TYPE=Release
# Build
cmake --build . --config Release -j8
# Run tests
ctest --output-on-failure
# Run demo
./bin/demo_sandbox
2d-game-engine/
├── engine/
│ ├── core/ # ECS, memory, threading
│ ├── math/ # Vector, matrix, geometric primitives
│ ├── physics/ # Collision detection, spatial structures
│ ├── renderer/ # Rendering pipeline, batching
│ ├── audio/ # Audio system
│ └── utils/ # Logging, profiling, utilities
├── tools/ # Asset pipeline, level editor
├── demos/ # Sample games and stress tests
├── tests/ # Unit and integration tests
├── benchmarks/ # Performance benchmarking suite
├── docs/ # Documentation and research papers
└── external/ # Third-party dependencies
- Designed and implemented a cache-oblivious spatial partitioning algorithm with provable O(n log n) construction and O(log n + k) query complexity
- Built high-performance C++ engine with sub-millisecond broad-phase collision detection for 10k+ entities
- Achieved 4-8x speedup through SIMD vectorization and cache-aware memory layouts
- Implemented lock-free concurrent spatial queries with near-linear scaling across cores
- Profiled every hot path to cycle-level accuracy using rdtsc and hardware counters
- Optimized to handle 50k+ sprites at 60fps with dynamic lighting and particle effects
- Networking: Deterministic lockstep for multiplayer
- Scripting: Lua/ChaiScript integration for gameplay
- Advanced Rendering: Normal maps, lighting, shadows
- Editor: Full-featured level editor with undo/redo
- Mobile: iOS/Android port with touch controls
- WebAssembly: Browser-based demos
- "Game Engine Architecture" - Jason Gregory
- "Real-Time Collision Detection" - Christer Ericson
- "Game Programming Patterns" - Robert Nystrom
- "Computer Graphics: Principles and Practice" - Foley et al.
- "Computational Geometry: Algorithms and Applications" - de Berg et al.
- "Cache-Oblivious Algorithms" - Frigo et al.
- "Fast BVH Construction on GPUs" - Lauterbach et al.
- Dynamic spatial data structures papers from SIGGRAPH/I3D
- CS 106B/X (Stanford) - Programming Abstractions
- 6.172 (MIT) - Performance Engineering of Software Systems
- CS 148 (Stanford) - Computer Graphics
MIT License - See LICENSE file for details
Your Name
- Email: sarvik.student.cd.eee24@itbhu.ac.in
- LinkedIn: linkedin.com/in/sarvik1807
- Portfolio: sarvik.tech
- Phase 1-2 (Weeks 1-5): Foundation + Rendering → Functional engine
- Phase 3 (Weeks 6-9): Geometric algorithms → Research contribution
- Phase 4 (Weeks 10-12): Optimization → HFT-level performance
- Phase 5-6 (Weeks 13-18): Features + Polish → Portfolio piece
Total Duration: 18 weeks (roughly 4 months) for complete project
Minimum Viable: 12 weeks (Phases 1-4) for strong portfolio piece
Last Updated: December 2025
Current Status: Week 4 COMPLETE! Fully functional sprite rendering system with batching. Successfully rendering 1000 sprites at 166-168 FPS using single draw call optimization.