MELITE is a pre-stable Python toolkit for tabular classification benchmarking, model selection, repeated stratified cross-validation, final model export, and artifact-based inference.
MELITE is tabular at the modeling level. The learning algorithms consume
numeric X and y arrays, so the feature matrix may come from PCA, UMAP,
fingerprints, descriptors, clinical variables, experimental measurements,
industrial features, or manually selected numeric features.
Project: MELITE
PyPI distribution: melite
Import package: melite
CLI: melite
Version: 0.2.2
License: LGPL-3.0-or-later
Status: alpha / pre-stable
The live documentation is published at:
https://nanobiostructuresrg.github.io/melite/
After PyPI publication:
python -m pip install melite

For local development:
git clone https://github.com/NanoBiostructuresRG/melite.git
cd melite
python -m pip install -e .

For development and documentation tools:
python -m pip install -e ".[dev]"
python -m pip install -e ".[docs]"

Run a fast smoke benchmark with the bundled synthetic example dataset:
melite run --smoke --config examples/example_config.toml

Export a selected model artifact:
melite export --row 0 --csv examples/output/results.csv --outdir examples/output/

Run artifact-based inference:
import numpy as np
from melite import predict
X_new = np.load("examples/sample_PCA70.npz")["X"]
result = predict("examples/output/Model_SVC_sample_pca70.pkl", X_new)
print(result["predictions"])
print(result["probabilities"])

| MELITE does | MELITE does not |
|---|---|
| Accept prepared X and y arrays. | Generate fingerprints. |
| Benchmark SVC, Random Forest, and XGBoost classifiers. | Process SMILES. |
| Select the best row by F1-macro. | Generate PCA or UMAP reductions from raw data. |
| Export a final retrained .pkl model. | Act as a general AutoML framework. |
| Run artifact-based inference through predict(). | Promise a stable 1.0 API yet. |
| Handle any numeric tabular matrix. | Generate or validate domain-specific descriptors. |
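
The table mentions selecting the best row by F1-macro. As a rough illustration of what that selection amounts to, the sketch below picks the highest-scoring row from a small results table by hand. The column names (`model`, `dataset`, `f1_macro`) and the sample values are hypothetical and are not taken from MELITE's actual results.csv layout.

```python
import csv
import io

# Hypothetical results table; the real results.csv columns may differ.
SAMPLE_RESULTS = """model,dataset,f1_macro
svc,PCA70,0.81
rf,PCA70,0.78
xgb,UMAP90,0.84
"""


def best_row_index(csv_text, metric="f1_macro"):
    """Return the 0-based index of the row with the highest metric value."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("results table is empty")
    return max(range(len(rows)), key=lambda i: float(rows[i][metric]))


best = best_row_index(SAMPLE_RESULTS)
print(best)  # index of the xgb / UMAP90 row in this made-up table
```

An index found this way could then be passed to `melite export --row`, assuming row numbering in the CSV matches what the CLI expects.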
Datasets are registered as concrete tabular matrix candidates under
[datasets.<dataset_id>]. The dataset_id is user-defined and is used in
results.csv, figures, and exported model filenames.
[datasets.morgan_r2_2048]
path = "data/morgan_r2_2048.npz"
label_path = "raw/labels.npy"
family = "fingerprints"
method = "Morgan"
variant = "r2_2048"
[datasets.rdkit_descriptors]
path = "data/rdkit_descriptors.npz"
label_path = "raw/labels.npy"
family = "descriptors"
method = "RDKit"
[datasets.pca85]
path = "data/PCA85.npz"
label_path = "raw/labels.npy"
family = "dimensionality"
method = "PCA"
level = 85

Each registered dataset must define path and label_path. Optional metadata
fields are family, method, variant, level, and description; they are
reported for traceability and do not drive special-case model execution.
Registered datasets are loaded strictly: missing files, missing X, non-2D or
non-numeric X, length mismatches, and embedded y mismatches fail the run.
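
The array-level conditions listed above can be approximated with a few NumPy checks. This is a sketch of comparable validation for use on your own arrays before registering them. The `check_dataset` function is hypothetical and does not reproduce MELITE's internal loader or its error messages.

```python
import numpy as np


def check_dataset(X, y, y_embedded=None):
    """Raise ValueError if X and y break any of the conditions listed above."""
    X = np.asarray(X)
    y = np.asarray(y)
    if X.ndim != 2:
        raise ValueError(f"X must be 2-D, got {X.ndim}-D")
    if not np.issubdtype(X.dtype, np.number):
        raise ValueError(f"X must be numeric, got dtype {X.dtype}")
    if y.ndim != 1:
        raise ValueError(f"y must be 1-D, got {y.ndim}-D")
    if X.shape[0] != y.shape[0]:
        raise ValueError("X and y have different numbers of samples")
    if y_embedded is not None and not np.array_equal(np.asarray(y_embedded), y):
        raise ValueError("embedded y does not match the label file")


X = np.zeros((4, 3))
y = np.array([0, 1, 0, 1])
check_dataset(X, y, y_embedded=y)  # passes silently for consistent arrays
```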
Legacy [benchmark].reduction_types and levels configs are still accepted
and are normalized into equivalent dataset entries such as PCA70 and UMAP90.
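
One way to picture the legacy normalization is as an expansion of the two lists into one entry per combination. The sketch below assumes a full cross product of reduction types and levels and an ID of the form `<method><level>`. Both are assumptions inferred from the PCA70 and UMAP90 examples, and `normalize_legacy` is a hypothetical helper, not MELITE's implementation. File paths are deliberately left out because the source does not state how they are derived.

```python
from itertools import product


def normalize_legacy(reduction_types, levels):
    """Expand legacy lists into one entry per (reduction type, level) pair."""
    entries = {}
    for method, level in product(reduction_types, levels):
        dataset_id = f"{method}{level}"  # assumed ID pattern, e.g. "PCA70"
        entries[dataset_id] = {
            "family": "dimensionality",
            "method": method,
            "level": level,
        }
    return entries


entries = normalize_legacy(["PCA", "UMAP"], [70, 90])
print(list(entries))
```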
Model families are controlled by [models].active:
[models]
active = ["svc", "rf", "xgb"]

Remove a key to skip that family during training. Valid keys are svc, rf,
and xgb.
SVC is trained and exported as a StandardScaler -> SVC sklearn pipeline.
Random Forest and XGBoost are trained as unscaled estimators.
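
For readers unfamiliar with the pattern, a StandardScaler -> SVC pipeline in scikit-learn looks roughly like the sketch below. The synthetic data and the default SVC hyperparameters are assumptions for illustration only. MELITE's actual hyperparameters and pipeline step names are not specified here.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Two well-separated synthetic classes, 20 samples each, 5 features.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 5)), rng.normal(3.0, 1.0, (20, 5))])
y = np.array([0] * 20 + [1] * 20)

# Scaling is fitted inside the pipeline, so the same transform is
# re-applied automatically at prediction time.
svc_pipeline = make_pipeline(StandardScaler(), SVC())
svc_pipeline.fit(X, y)

preds = svc_pipeline.predict(X[:3])
print(preds.shape)
```

Bundling the scaler with the estimator means an exported object carries its own preprocessing, which is why scaling-sensitive models such as SVC are commonly packaged this way, while tree-based models do not need it.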
melite --help
melite run --help
melite export --help
melite --version

Common commands:
melite run
melite run --smoke
melite run --config my_config.toml
melite export --row 0
melite export --config my_config.toml --row 0
melite export --row 0 --force

from melite import Config
from melite import load_datasets
from melite import plot_cv_distributions
from melite import predict
from melite import __version__

Modules not listed above are importable directly but are not part of the public contract and may change before 1.0.
raw/labels.npy <- target vector y, shape (n_samples,)
data/morgan_r2_2048.npz <- required key: X, optional key: y
data/rdkit_descriptors.npz
data/PCA85.npz
data/UMAP90.npz
Each .npz file must contain an X array. If an embedded y array is present,
MELITE validates it against the configured label_path.
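
Files in this layout can be written with plain NumPy. The sketch below saves a label vector and a feature matrix into a temporary directory and reads them back. The file name `my_features.npz` and the random data are placeholders, not files that MELITE ships.

```python
import tempfile
from pathlib import Path

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 4))  # feature matrix, shape (n_samples, n_features)
y = np.array([0, 1] * 5)      # target vector, shape (n_samples,)

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "raw").mkdir()
    (root / "data").mkdir()

    # Labels go in a .npy file; features go in a .npz archive under the key "X".
    np.save(root / "raw" / "labels.npy", y)
    # Embedding y is optional; it is stored here under the key "y".
    np.savez(root / "data" / "my_features.npz", X=X, y=y)

    with np.load(root / "data" / "my_features.npz") as archive:
        X_loaded = archive["X"]
        y_loaded = archive["y"]
    labels = np.load(root / "raw" / "labels.npy")

print(X_loaded.shape, labels.shape)
```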
output/
|-- results.txt
|-- results.csv
|-- Model_<model>_<dataset>.pkl
`-- figures/
    `-- <model>_<dataset>.png
Local inputs and generated artifacts such as raw/, data/, output/,
.pkl, and .joblib files are intentionally ignored by Git.
The current dev/v0.2.2 branch targets:
python -m pytest tests/ -v --basetemp=.review_pytest_tmp -o cache_dir=.review_pytest_cache
mkdocs build --strict
python -m build --no-isolation
python -m twine check dist/*
python scripts/smoke_install_wheel.py
melite --help
melite run --help
melite export --help
melite --version

If you use MELITE in your research, please cite it using the metadata in CITATION.cff.
Contreras-Torres, F. F., & Murrieta, A. C. (2026). MELITE: Multi-model Evaluation and Learning for Inference-ready Tabular Experiments. Zenodo. https://doi.org/10.5281/zenodo.20382752
Developed by Flavio F. Contreras-Torres. Tecnologico de Monterrey
Co-author: Ana C. Murrieta. Tecnologico de Monterrey
This project is licensed under the terms of the GNU Lesser General Public License v3.0 or later.
SPDX identifier: LGPL-3.0-or-later