Mini DataFrame Engine is a learning-driven project focused on implementing core data structures from scratch in Python.
The objective is to gain deep control over data representation, API design, and algorithmic trade-offs by building a minimal DataFrame-like abstraction without relying on external libraries such as pandas.
This project is part of a broader engineering roadmap toward strong foundations in Python internals, data structures, and systems design.
Modern data tools abstract away many important implementation details. This project intentionally removes those abstractions to understand:
Internal memory representation of tabular data
Column-oriented vs row-oriented trade-offs
API ergonomics
Complexity analysis of operations
Type consistency and validation strategies
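To make the row-oriented versus column-oriented trade-off concrete, here is a minimal sketch that is not taken from the project code. The helper names `rows_to_columns` and `columns_to_rows` and the sample records are hypothetical, chosen only for illustration.

```python
from typing import Any

# Hypothetical type aliases for the two layouts.
Row = dict[str, Any]
Columns = dict[str, list[Any]]


def rows_to_columns(records: list[Row]) -> Columns:
    """Pivot a list of row dicts into a dict of column lists."""
    if not records:
        return {}
    names = list(records[0])
    return {name: [record[name] for record in records] for name in names}


def columns_to_rows(columns: Columns) -> list[Row]:
    """Pivot a dict of equally long column lists back into row dicts."""
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*columns.values())]


# Illustrative sample data (assumed, not from the project).
records: list[Row] = [
    {"name": "a", "score": 10},
    {"name": "b", "score": 20},
    {"name": "c", "score": 30},
]
columns = rows_to_columns(records)

# Reading one whole column: a single dict lookup in the column layout,
# but a pass over every record in the row layout.
total_from_columns = sum(columns["score"])
total_from_rows = sum(record["score"] for record in records)

# Reading one whole row: a single list index in the row layout,
# but one lookup per column in the column layout.
second_row = {name: values[1] for name, values in columns.items()}
```

Column scans (sums, filters on one field) favour the column layout, while fetching or appending whole records favours the row layout. Which one the engine should use internally depends on which operations dominate.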
The implementation follows strict engineering discipline:
Typed code (mypy strict mode)
Automated linting (ruff)
Unit testing (pytest)
Continuous Integration (GitHub Actions)
Python 3.12
Ruff (lint + formatting)
Mypy (strict typing)
Pytest (testing framework)
GitHub Actions (CI)
The engine follows a minimal modular design:
DataFrame core abstraction
Column storage layer
Operation layer (selection, filtering, projection)
Validation and typing utilities
Design principles:
Explicit over implicit
Predictable performance characteristics
No hidden magic
Clear separation between data model and operations
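One possible shape for this layering is sketched below. It is an assumption made for illustration, not the project's actual API: the names `Column`, `DataFrame`, `project`, `select_rows`, and `filter_rows` are hypothetical, and the type check is deliberately simplistic (it rejects any mix of Python types within a column, including `None` alongside other values).

```python
from collections.abc import Callable, Sequence
from typing import Any


class Column:
    """Storage layer: a named list of values that all share one type."""

    def __init__(self, name: str, values: Sequence[Any]) -> None:
        kinds = {type(value) for value in values}
        if len(kinds) > 1:
            found = sorted(kind.__name__ for kind in kinds)
            raise TypeError(f"column {name!r} mixes types: {found}")
        self.name = name
        self.values = list(values)

    def __len__(self) -> int:
        return len(self.values)


class DataFrame:
    """Data model only: equally long columns, keyed by name."""

    def __init__(self, data: dict[str, Sequence[Any]]) -> None:
        self.columns = {name: Column(name, vals) for name, vals in data.items()}
        lengths = {len(column) for column in self.columns.values()}
        if len(lengths) > 1:
            raise ValueError(f"columns differ in length: {sorted(lengths)}")
        self.n_rows = lengths.pop() if lengths else 0

    def to_dict(self) -> dict[str, list[Any]]:
        return {name: list(col.values) for name, col in self.columns.items()}


# Operation layer: free functions that return new frames and never
# mutate their input, keeping operations separate from the data model.
def project(frame: DataFrame, names: Sequence[str]) -> DataFrame:
    """Keep only the named columns."""
    return DataFrame({name: frame.columns[name].values for name in names})


def select_rows(frame: DataFrame, indices: Sequence[int]) -> DataFrame:
    """Keep only the rows at the given positions, in the given order."""
    return DataFrame(
        {
            name: [col.values[i] for i in indices]
            for name, col in frame.columns.items()
        }
    )


def filter_rows(
    frame: DataFrame, name: str, predicate: Callable[[Any], bool]
) -> DataFrame:
    """Keep the rows whose value in column `name` satisfies `predicate`."""
    values = frame.columns[name].values
    keep = [i for i, value in enumerate(values) if predicate(value)]
    return select_rows(frame, keep)


frame = DataFrame({"name": ["a", "b", "c"], "score": [10, 20, 30]})
high = filter_rows(frame, "score", lambda score: score >= 20)
names_only = project(high, ["name"])
```

Validation happens once, at construction time, so every operation can assume well-formed input. Each operation here is a single pass over the affected columns, which keeps its cost easy to reason about.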
The checks applied are described in the TOML file. The project must pass the following commands before any commit:
make lint
make type
make test
Quality is enforced through:
Static analysis (ruff)
Strict type checking (mypy)
Unit tests for all public methods
CI pipeline validating every push
Ruff configuration:
line-length = 88
target-version = "py312"
select = ["E", "F", "I", "N", "UP", "B", "SIM"]
Phase 1: Minimal DataFrame structure, column validation, row selection, basic filtering
Phase 2: Vectorized operations, grouping logic, memory efficiency exploration
Phase 3: Performance benchmarking, internal optimization experiments
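As a rough idea of what the Phase 2 grouping logic could look like over column-oriented data, here is a hash-based sketch. The function name `group_by`, its signature, and the sample data are assumptions made for illustration. The roadmap does not specify a design.

```python
from collections import defaultdict
from collections.abc import Callable
from typing import Any


def group_by(
    columns: dict[str, list[Any]],
    key: str,
    value: str,
    aggregate: Callable[[list[Any]], Any],
) -> dict[Any, Any]:
    """Bucket `value` entries by their `key` entry, then aggregate each bucket."""
    groups: defaultdict[Any, list[Any]] = defaultdict(list)
    # Single pass over two parallel columns; keys must be hashable.
    for group_key, item in zip(columns[key], columns[value]):
        groups[group_key].append(item)
    return {group_key: aggregate(items) for group_key, items in groups.items()}


# Illustrative sample data (assumed, not from the project).
data = {"team": ["x", "y", "x", "y", "x"], "score": [1, 2, 3, 4, 5]}
totals = group_by(data, "team", "score", sum)
counts = group_by(data, "team", "score", len)
```

This variant makes one pass over the data and stores every group's values before aggregating, so it trades memory for simplicity. A streaming aggregate that keeps only a running result per group would be one of the memory-efficiency alternatives worth comparing.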
Version Log
2026-02-23: Initiate the project. First commit with the base code and basic tests.