BlackBox2C

Convert scikit-learn models to native embedded code — C, C++, Arduino, MicroPython

BlackBox2C converts any trained scikit-learn model into a minimal if-else decision tree in your target language. The generated code has zero runtime dependencies, runs on any microcontroller with a C compiler, and fits in a few hundred bytes of FLASH.


How It Works

  1. Surrogate extraction — A lightweight DecisionTree is trained to mimic any black-box model (Random Forest, SVM, MLP, etc.) by generating synthetic boundary samples and labeling them with the original model's predictions (see the sketch after this list).
  2. Rule optimization — Redundant branches are pruned and similar leaves are merged to minimize code size.
  3. Code generation — The optimized tree is serialized as a pure if-else function in the target language.
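
A minimal sketch of the surrogate idea (step 1 plus a fidelity check), written in plain scikit-learn. This is illustrative only; BlackBox2C's own sampling and pruning strategies are handled internally and are more involved.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
black_box = RandomForestClassifier(n_estimators=50, random_state=42).fit(X, y)

# 1. Generate synthetic samples spanning the observed feature ranges.
rng = np.random.default_rng(42)
X_synth = rng.uniform(X.min(axis=0), X.max(axis=0), size=(10_000, X.shape[1]))

# 2. Label them with the black-box model's predictions.
y_synth = black_box.predict(X_synth)

# 3. Fit a shallow surrogate tree on the synthetic labels.
surrogate = DecisionTreeClassifier(max_depth=5, random_state=42).fit(X_synth, y_synth)

# Fidelity: how closely the surrogate mimics the black box on the original inputs.
fidelity = accuracy_score(black_box.predict(X), surrogate.predict(X))
print(f"Surrogate fidelity: {fidelity:.3f}")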

Supported Models and Targets

Input models:
  • Any scikit-learn estimator with predict()
  • Decision Tree, Random Forest, SVM, MLP...
  • Classification and Regression tasks

Output formats:
  • Pure C (C99)
  • C++11 (class + namespace)
  • Arduino (.h with PROGMEM)
  • MicroPython (.py module)

Installation

pip install blackbox2c

Requirements: Python 3.8+, NumPy >= 1.21, scikit-learn >= 1.0.

Tip: Use a virtual environment to keep your project isolated:

python -m venv .venv && source .venv/bin/activate  # Linux/macOS
python -m venv .venv && .venv\Scripts\activate     # Windows
pip install blackbox2c

For development (from source):

git clone https://github.com/AxelSkrauba/BlackBox2C.git
cd BlackBox2C
pip install -e ".[dev]"

Quick Start

Classification

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from blackbox2c import convert

iris = load_iris()
model = RandomForestClassifier(n_estimators=50, random_state=42)
model.fit(iris.data, iris.target)

# Convert to C (default target)
c_code = convert(
    model,
    iris.data,
    feature_names=list(iris.feature_names),
    class_names=list(iris.target_names),
    max_depth=5,
)
print(c_code)

Generated output:

/*
 * Auto-generated C code by BlackBox2C
 *   - Input features: 4
 *   - Output classes: 3
 *   - Precision: 8-bit
 */
#include <stdint.h>

#define setosa 0
#define versicolor 1
#define virginica 2

uint8_t predict(float features[4]) {
    if (features[2] <= 2.449999f) {
        return 0;
    } else {
        if (features[3] <= 1.750000f) {
            return 1;
        } else {
            return 2;
        }
    }
}
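
Because the output is just nested comparisons, it is easy to sanity-check against the original model. A quick sketch in Python that hand-copies the thresholds printed above (the predict_py helper is purely illustrative):

def predict_py(f):
    # Mirror of the generated C function, thresholds copied verbatim.
    if f[2] <= 2.449999:
        return 0
    return 1 if f[3] <= 1.750000 else 2

matches = sum(predict_py(x) == p for x, p in zip(iris.data, model.predict(iris.data)))
print(f"Agreement with the random forest: {matches / len(iris.data):.2%}")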

Export to Other Formats

# Arduino header (.h with PROGMEM)
arduino_code = convert(model, iris.data, target='arduino')

# C++ class
cpp_code = convert(model, iris.data, target='cpp')

# MicroPython module
mp_code = convert(model, iris.data, target='micropython')
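
Each call returns the generated source as a string, just like the C example above, so saving it is an ordinary file write:

# Write the generated Arduino header to disk (file name mirrors the CLI example below).
with open("predict.h", "w") as f:
    f.write(arduino_code)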

Regression

from sklearn.ensemble import GradientBoostingRegressor
from sklearn.datasets import load_diabetes
from blackbox2c import convert

data = load_diabetes()
model = GradientBoostingRegressor(random_state=42)
model.fit(data.data, data.target)

c_code = convert(model, data.data, max_depth=5)
# Generates: float predict(float features[10]) { ... }

Feature Analysis

from blackbox2c.analysis import FeatureSensitivityAnalyzer

analyzer = FeatureSensitivityAnalyzer(n_repeats=10, random_state=42)
results = analyzer.analyze(model, X_train, y_train, feature_names=feature_names)
print(results.summary())

# Get top 3 most important features by index
top3 = results.get_top_features(3)

Configuration

from blackbox2c import Converter, ConversionConfig

config = ConversionConfig(
    max_depth=5,             # Surrogate tree depth (1-10, default 5)
    optimize_rules='medium', # 'low' | 'medium' | 'high'
    use_fixed_point=False,   # Use integer arithmetic instead of float
    precision=8,             # Bit width for fixed-point: 8 | 16 | 32
    function_name='predict', # Name of the generated function
    n_samples=10000,         # Synthetic samples for surrogate training
    feature_threshold=None,  # If set to N, keep only the N most important features
    memory_budget_kb=None,   # Auto-tune params to fit a KB budget
)

converter = Converter(config)
code = converter.convert(model, X_train, target='arduino')
metrics = converter.get_metrics()
# {'fidelity': 0.97, 'complexity': {...}, 'size_estimate': {...}}
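
If the reported fidelity is too low for your use case, one reasonable follow-up is to rerun the conversion with a deeper surrogate and more synthetic samples. A sketch using only the calls shown above (the 0.95 threshold is an arbitrary example, not a library default):

# Retry with a deeper tree and more synthetic samples when fidelity is low.
if metrics['fidelity'] < 0.95:
    retry_config = ConversionConfig(max_depth=8, n_samples=50_000)
    retry_converter = Converter(retry_config)
    code = retry_converter.convert(model, X_train, target='arduino')
    metrics = retry_converter.get_metrics()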

CLI

# Convert a pickled model to C
blackbox2c convert model.pkl X_train.npy -o output.c

# Export to Arduino
blackbox2c convert model.pkl X_train.npy -t arduino -o predict.h

# Analyze feature importance
blackbox2c analyze model.pkl X_train.npy --top-n 5

# Export a decision tree directly (no surrogate extraction)
blackbox2c export model.pkl -f cpp -o predictor.hpp

# Help
blackbox2c --help
blackbox2c convert --help
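
The convert and analyze commands expect a pickled estimator (model.pkl) and a NumPy array of training features (X_train.npy). One way to produce them, assuming standard pickle and numpy.save serialization:

# Create the model.pkl and X_train.npy inputs used in the CLI examples.
import pickle
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=50, random_state=42).fit(X, y)

with open("model.pkl", "wb") as f:
    pickle.dump(model, f)
np.save("X_train.npy", X)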

Benchmarks

python benchmarks/benchmark_classic_datasets.py --output results.md

Covers Iris, Wine, Diabetes, and California Housing with Decision Trees, Random Forests, SVMs, and Neural Networks. Metrics: fidelity, estimated FLASH size, tree depth, conversion time.

Note: Code size figures are estimates from BlackBox2C's built-in size estimator, not measurements on real hardware.


Project Structure

blackbox2c/
├── blackbox2c/
│   ├── __init__.py      # Public API: convert(), Converter, ConversionConfig
│   ├── converter.py     # Main orchestration pipeline
│   ├── config.py        # ConversionConfig dataclass
│   ├── surrogate.py     # Surrogate tree extraction
│   ├── codegen.py       # C code generation
│   ├── optimizer.py     # Rule pruning and merging
│   ├── exporters.py     # C++, Arduino, MicroPython exporters
│   ├── analysis.py      # Feature sensitivity analysis
│   └── cli.py           # Command-line interface
├── tests/               # 182 tests, >91% coverage
├── notebooks/           # Jupyter notebook examples (runnable on Colab)
├── benchmarks/          # Classic dataset benchmarks
├── examples/            # Script-based end-to-end examples
└── docs/                # MkDocs documentation source

Comparison with Alternatives

Compared with emlearn, MicroMLGen, and TFLite Micro:

  • Model coverage: emlearn and MicroMLGen handle tree-based models only, and TFLite Micro converts only TensorFlow models, while BlackBox2C accepts any scikit-learn estimator through its surrogate step.
  • Output targets: only BlackBox2C ships dedicated C++, Arduino, and MicroPython exporters; support in the alternatives is partial.
  • BlackBox2C additionally provides pure if-else output, built-in feature selection, and memory budget control, all with zero runtime dependencies.

Roadmap (v0.2)

  • Quine-McCluskey and BDD rule optimization
  • Hardware-validated benchmarks on real MCUs
  • Quantization-aware training integration

License

MIT — see LICENSE.

Contributing

Issues and PRs welcome at github.com/AxelSkrauba/BlackBox2C.
