# BlackBox2C

Convert scikit-learn models to native embedded code — C, C++, Arduino, MicroPython.

BlackBox2C converts any trained scikit-learn model into a minimal if-else decision tree in your target language. The generated code has zero runtime dependencies, runs on any microcontroller with a C compiler, and fits in a few hundred bytes of FLASH.
## How it works

- **Surrogate extraction** — a lightweight `DecisionTree` is trained to mimic any black-box model (Random Forest, SVM, MLP, etc.) by generating synthetic boundary samples and labeling them with the original model's predictions.
- **Rule optimization** — redundant branches are pruned and similar leaves are merged to minimize code size.
- **Code generation** — the optimized tree is serialized as a pure if-else function in the target language.
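The surrogate-extraction step can be sketched with plain scikit-learn. This is an illustration of the idea, not BlackBox2C's actual implementation; the sampling scheme (uniform over the training data's feature ranges) is an assumption:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

def extract_surrogate(black_box, X, n_samples=10_000, max_depth=5, seed=42):
    """Train a small tree to mimic black_box.predict on synthetic
    samples drawn uniformly over the training data's feature ranges."""
    rng = np.random.default_rng(seed)
    X_syn = rng.uniform(X.min(axis=0), X.max(axis=0),
                        size=(n_samples, X.shape[1]))
    y_syn = black_box.predict(X_syn)  # label with the black-box model
    surrogate = DecisionTreeClassifier(max_depth=max_depth, random_state=seed)
    surrogate.fit(X_syn, y_syn)
    # Fidelity: how often the surrogate agrees with the black box
    fidelity = (surrogate.predict(X_syn) == y_syn).mean()
    return surrogate, fidelity

iris = load_iris()
rf = RandomForestClassifier(n_estimators=50, random_state=42)
rf.fit(iris.data, iris.target)
tree, fidelity = extract_surrogate(rf, iris.data)
print(f"surrogate fidelity: {fidelity:.2f}")
```

The fidelity score (agreement between surrogate and black box on the synthetic samples) is what tells you whether the exported tree is a faithful stand-in.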
## Supported models and targets

| Input models | Output formats |
|---|---|
| Any scikit-learn estimator with `predict()` | Pure C (C99) |
| Decision Tree, Random Forest, SVM, MLP... | C++11 (class + namespace) |
| Classification and Regression tasks | Arduino (`.h` with PROGMEM) |
| | MicroPython (`.py` module) |
## Installation

```shell
pip install blackbox2c
```

Requirements: Python 3.8+, NumPy >= 1.21, scikit-learn >= 1.0.

Tip: use a virtual environment to keep your project isolated:

```shell
python -m venv .venv && source .venv/bin/activate  # Linux/macOS
python -m venv .venv && .venv\Scripts\activate     # Windows
pip install blackbox2c
```

For development (from source):

```shell
git clone https://github.com/AxelSkrauba/BlackBox2C.git
cd BlackBox2C
pip install -e ".[dev]"
```

## Quick start

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from blackbox2c import convert
iris = load_iris()
model = RandomForestClassifier(n_estimators=50, random_state=42)
model.fit(iris.data, iris.target)
# Convert to C (default target)
c_code = convert(
model,
iris.data,
feature_names=list(iris.feature_names),
class_names=list(iris.target_names),
max_depth=5,
)
print(c_code)
```

Generated output:

```c
/*
* Auto-generated C code by BlackBox2C
* - Input features: 4
* - Output classes: 3
* - Precision: 8-bit
*/
#include <stdint.h>
#define setosa 0
#define versicolor 1
#define virginica 2
uint8_t predict(float features[4]) {
if (features[2] <= 2.449999f) {
return 0;
} else {
if (features[3] <= 1.750000f) {
return 1;
} else {
return 2;
}
}
}
```
## Other targets

```python
# Arduino .ino file
arduino_code = convert(model, iris.data, target='arduino')
# C++ class
cpp_code = convert(model, iris.data, target='cpp')
# MicroPython module
mp_code = convert(model, iris.data, target='micropython')
```

## Regression

```python
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.datasets import load_diabetes
from blackbox2c import convert
data = load_diabetes()
model = GradientBoostingRegressor(random_state=42)
model.fit(data.data, data.target)
c_code = convert(model, data.data, max_depth=5)
# Generates: float predict(float features[10]) { ... }
```
## Feature sensitivity analysis

```python
from blackbox2c.analysis import FeatureSensitivityAnalyzer

analyzer = FeatureSensitivityAnalyzer(n_repeats=10, random_state=42)
results = analyzer.analyze(model, X_train, y_train, feature_names=feature_names)
print(results.summary())
# Get top 3 most important features by index
top3 = results.get_top_features(3)
```
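The underlying idea is comparable to permutation importance: shuffle one feature column at a time and see how much the model's score degrades. A plain scikit-learn sketch of that idea (this is not BlackBox2C's exact algorithm):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(random_state=42).fit(X, y)

# Shuffle each feature column n_repeats times; record the accuracy drop
r = permutation_importance(model, X, y, n_repeats=10, random_state=42)
ranked = r.importances_mean.argsort()[::-1]  # most important first
print("top features by index:", ranked[:3])
```

For iris this ranks the two petal features (indices 2 and 3) well above the sepal features, which is why a depth-2 surrogate tree on petal length and width already separates the classes.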
## Advanced configuration

```python
from blackbox2c import Converter, ConversionConfig

config = ConversionConfig(
max_depth=5, # Surrogate tree depth (1-10, default 5)
optimize_rules='medium', # 'low' | 'medium' | 'high'
use_fixed_point=False, # Use integer arithmetic instead of float
precision=8, # Bit width for fixed-point: 8 | 16 | 32
function_name='predict', # Name of the generated function
n_samples=10000, # Synthetic samples for surrogate training
feature_threshold=None, # Auto-select N most important features
memory_budget_kb=None, # Auto-tune params to fit a KB budget
)
converter = Converter(config)
code = converter.convert(model, X_train, target='arduino')
metrics = converter.get_metrics()
# {'fidelity': 0.97, 'complexity': {...}, 'size_estimate': {...}}
```
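With `use_fixed_point=True` and `precision=8`, thresholds are compared as scaled integers instead of floats, which avoids an FPU (or soft-float library) entirely. The exact scaling scheme is internal to BlackBox2C; the sketch below is a hypothetical 8-bit variant of the iris tree above, with an assumed scale factor of 25.5 (so 0–10 cm maps onto 0–255):

```c
#include <stdint.h>

/* Hypothetical 8-bit fixed-point variant of the iris tree.
 * Inputs are pre-scaled: q = (uint8_t)(value_cm * 25.5f), so the
 * float thresholds 2.45 and 1.75 become 62 and 44. */
uint8_t predict_q8(const uint8_t features[4]) {
    if (features[2] <= 62) {   /* petal length <= 2.45 cm */
        return 0;              /* setosa */
    }
    if (features[3] <= 44) {   /* petal width <= 1.75 cm */
        return 1;              /* versicolor */
    }
    return 2;                  /* virginica */
}
```

The trade-off is quantization error near the thresholds, which is why fidelity should be re-checked after enabling fixed-point mode.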
## Command-line interface

```shell
# Convert a pickled model to C
blackbox2c convert model.pkl X_train.npy -o output.c
# Export to Arduino
blackbox2c convert model.pkl X_train.npy -t arduino -o predict.h
# Analyze feature importance
blackbox2c analyze model.pkl X_train.npy --top-n 5
# Export a decision tree directly (no surrogate extraction)
blackbox2c export model.pkl -f cpp -o predictor.hpp
# Help
blackbox2c --help
blackbox2c convert --help
```

## Benchmarks

```shell
python benchmarks/benchmark_classic_datasets.py --output results.md
```

Covers Iris, Wine, Diabetes, and California Housing with Decision Trees, Random Forests, SVMs, and Neural Networks. Metrics: fidelity, estimated FLASH size, tree depth, conversion time.
Note: Code size figures are estimates from BlackBox2C's built-in size estimator, not measurements on real hardware.
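A back-of-envelope way to reason about such estimates: each internal node of the if-else tree compiles to roughly one compare-and-branch, each leaf to a return. The per-node byte counts below are illustrative assumptions for a 32-bit MCU, not the constants BlackBox2C actually uses:

```python
def estimate_flash_bytes(n_internal, n_leaves,
                         bytes_per_branch=12, bytes_per_return=4):
    """Rough FLASH footprint of a compiled if-else tree.
    Both per-node byte counts are illustrative assumptions."""
    return n_internal * bytes_per_branch + n_leaves * bytes_per_return

# A full binary tree of depth 5: 31 internal nodes, 32 leaves
print(estimate_flash_bytes(31, 32))  # 31*12 + 32*4 = 500 bytes
```

This is why `max_depth` dominates code size: node count grows exponentially with depth, so one extra level can roughly double the footprint.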
## Project structure

```text
blackbox2c/
├── blackbox2c/
│   ├── __init__.py    # Public API: convert(), Converter, ConversionConfig
│   ├── converter.py   # Main orchestration pipeline
│   ├── config.py      # ConversionConfig dataclass
│   ├── surrogate.py   # Surrogate tree extraction
│   ├── codegen.py     # C code generation
│   ├── optimizer.py   # Rule pruning and merging
│   ├── exporters.py   # C++, Arduino, MicroPython exporters
│   ├── analysis.py    # Feature sensitivity analysis
│   └── cli.py         # Command-line interface
├── tests/             # 182 tests, >91% coverage
├── notebooks/         # Jupyter notebook examples (runnable on Colab)
├── benchmarks/        # Classic dataset benchmarks
├── examples/          # Script-based end-to-end examples
└── docs/              # MkDocs documentation source
```
## Comparison

| Feature | BlackBox2C | emlearn | MicroMLGen | TFLite Micro |
|---|---|---|---|---|
| Any sklearn model | ✅ | | | ❌ TF only |
| Pure if-else output | ✅ | ✅ | ✅ | ❌ |
| C++ / Arduino / MicroPython | ✅ | | | ❌ |
| Feature selection built-in | ✅ | ❌ | ❌ | ❌ |
| Memory budget control | ✅ | ❌ | ❌ | |
| Zero runtime dependencies | ✅ | ✅ | ✅ | ❌ |
## Roadmap

- Quine-McCluskey and BDD rule optimization
- Hardware-validated benchmarks on real MCUs
- Quantization-aware training integration
## License

MIT — see LICENSE.
## Contributing

Issues and PRs welcome at github.com/AxelSkrauba/BlackBox2C.