CristianRods/mini-dataframe-engine

MINI DATAFRAME ENGINE

Overview

Mini DataFrame Engine is a learning-driven project focused on implementing core data structures from scratch in Python.

The objective is to gain deep control over data representation, API design, and algorithmic trade-offs by building a minimal DataFrame-like abstraction without relying on external libraries such as pandas.

This project is part of a broader engineering roadmap toward strong foundations in Python internals, data structures, and systems design.

Motivation

Modern data tools abstract away many important implementation details. This project intentionally removes those abstractions to understand:

Internal memory representation of tabular data

Column-oriented vs row-oriented trade-offs

API ergonomics

Complexity analysis of operations

Type consistency and validation strategies
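As a rough illustration of the row-oriented vs column-oriented trade-off listed above, the sketch below stores the same small table both ways using only built-in containers. The sample data and the helper names (`mean_age_rows`, `mean_age_columns`, `get_row`) are invented for this example and are not taken from the project's code.

```python
from typing import Any

# Row-oriented: one dict per record. Fetching a full record is a single
# index lookup, but aggregating one field visits every record object.
rows: list[dict[str, Any]] = [
    {"name": "Ada", "age": 36},
    {"name": "Linus", "age": 54},
    {"name": "Grace", "age": 85},
]

# Column-oriented: one list per field. Aggregating a field reads a single
# list, but rebuilding a record needs one lookup per column.
columns: dict[str, list[Any]] = {
    "name": ["Ada", "Linus", "Grace"],
    "age": [36, 54, 85],
}


def mean_age_rows(data: list[dict[str, Any]]) -> float:
    """Average the 'age' field by walking every row."""
    return sum(record["age"] for record in data) / len(data)


def mean_age_columns(data: dict[str, list[Any]]) -> float:
    """Average the 'age' field by reading one column."""
    ages = data["age"]
    return sum(ages) / len(ages)


def get_row(data: dict[str, list[Any]], index: int) -> dict[str, Any]:
    """Rebuild a single record from column storage."""
    return {name: values[index] for name, values in data.items()}
```

Both layouts give the same answers; what changes is which operation is cheap. Column storage favors per-field scans and aggregations, while row storage favors whole-record access.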

The implementation follows strict engineering discipline:

Typed code (mypy strict mode)

Automated linting (ruff)

Unit testing (pytest)

Continuous Integration (GitHub Actions)

Technical Stack

Python 3.12

Ruff (lint + formatting)

Mypy (strict typing)

Pytest (testing framework)

GitHub Actions (CI)

Architecture / Design Goals

The engine follows a minimal modular design:

DataFrame core abstraction

Column storage layer

Operation layer (selection, filtering, projection)

Validation and typing utilities

Design principles:

Explicit over implicit

Predictable performance characteristics

No hidden magic

Clear separation between data model and operations
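One possible shape for these layers is sketched below: a validation helper, a column-backed `DataFrame` class, and three operations (row selection, filtering, projection). Every name here (`validate_columns`, `take`, `filter`, `project`, `Row`) is hypothetical and chosen for illustration; the repository's actual API may differ. The type check is deliberately strict and rejects any column that mixes value types, which is an assumption, not a documented rule of the project.

```python
from __future__ import annotations

from collections.abc import Callable, Mapping, Sequence
from typing import Any

Row = dict[str, Any]


def validate_columns(data: Mapping[str, Sequence[Any]]) -> int:
    """Check equal lengths and one value type per column; return the row count."""
    lengths = {len(values) for values in data.values()}
    if len(lengths) > 1:
        raise ValueError(f"columns differ in length: {sorted(lengths)}")
    for name, values in data.items():
        if len({type(value) for value in values}) > 1:
            raise TypeError(f"column {name!r} mixes value types")
    return lengths.pop() if lengths else 0


class DataFrame:
    """Hypothetical minimal table backed by one list per column."""

    def __init__(self, data: Mapping[str, Sequence[Any]]) -> None:
        self._n_rows = validate_columns(data)
        # Copy the inputs so later mutation by the caller cannot leak in.
        self._columns: dict[str, list[Any]] = {
            name: list(values) for name, values in data.items()
        }

    @property
    def shape(self) -> tuple[int, int]:
        """Return (number of rows, number of columns)."""
        return (self._n_rows, len(self._columns))

    def column(self, name: str) -> list[Any]:
        """Return a copy of one column's values."""
        return list(self._columns[name])

    def row(self, index: int) -> Row:
        """Rebuild one record; costs one lookup per column."""
        return {name: values[index] for name, values in self._columns.items()}

    def take(self, indices: Sequence[int]) -> DataFrame:
        """Row selection: keep the rows at the given positions."""
        return DataFrame(
            {name: [values[i] for i in indices]
             for name, values in self._columns.items()}
        )

    def filter(self, predicate: Callable[[Row], bool]) -> DataFrame:
        """Filtering: keep the rows for which the predicate is true."""
        keep = [i for i in range(self._n_rows) if predicate(self.row(i))]
        return self.take(keep)

    def project(self, names: Sequence[str]) -> DataFrame:
        """Projection: keep only the named columns."""
        missing = [name for name in names if name not in self._columns]
        if missing:
            raise KeyError(f"unknown columns: {missing}")
        return DataFrame({name: self._columns[name] for name in names})


people = DataFrame({"name": ["Ada", "Linus", "Grace"], "age": [36, 54, 85]})
over_50 = people.filter(lambda row: row["age"] > 50).project(["name"])
print(over_50.column("name"))  # ['Linus', 'Grace']
```

Each operation returns a new `DataFrame` instead of mutating the receiver, which keeps behavior explicit and lets calls chain. The cost is visible in the code: `filter` rebuilds a row dict for every index, so it is linear in rows times columns.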

Test / QA

The checks are configured and described in the TOML file. The project must pass the following commands before any commit:

make lint
make type
make test

Quality is enforced through:

Static analysis (ruff)

Strict type checking (mypy)

Unit tests for all public methods

CI pipeline validating every push

Ruff configuration:

line-length = 88
target-version = "py312"
select = ["E", "F", "I", "N", "UP", "B", "SIM"]

Roadmap

Phase 1:

Minimal DataFrame structure

Column validation

Row selection

Basic filtering

Phase 2:

Vectorized operations

Grouping logic

Memory efficiency exploration

Phase 3:

Performance benchmarking

Internal optimization experiments

Version Log
2026-02-23: Project started. First commit with the base code and basic tests.
