# Batteries-included design mocks for AIoD workflows

## Workflow 1: Model Retrieval

Users of AIoD should be able to get models from popular machine learning packages. AIoD thus becomes a common interface to every ML library that it indexes.

### Workflow 1a: Retrieving classes

By using `aiod.get()`, users can directly import any class from any library that is indexed by AIoD. If the required soft dependencies (e.g. `scikit-learn`, `xgboost`, `sktime`, `mlxtend`, `pytorch-tabular`, etc.) are present in the environment, the class is imported; otherwise, an error is raised that tells the user which soft dependencies are missing.

In [None]:
import aiod

RandomForestClassifier = aiod.get("RandomForestClassifier")
XGBClassifier = aiod.get("XGBClassifier")
LGBMClassifier = aiod.get("LGBMClassifier")
NaiveForecaster = aiod.get("NaiveForecaster")
EnsembleVoteClassifier = aiod.get("EnsembleVoteClassifier")
SimpleImputer = aiod.get("SimpleImputer")
OneHotEncoder = aiod.get("OneHotEncoder")

In the example above, `RandomForestClassifier = aiod.get("RandomForestClassifier")` would be equivalent to `from sklearn.ensemble import RandomForestClassifier`. The returned object is the class itself, not an instance, so `isinstance(RandomForestClassifier, type)` would be `True`.

The same holds for the other examples. This will turn AIoD into an index of ML algorithms.
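One way such a lookup could work is a table from class name to module path, with the import deferred until the class is requested. The sketch below is an assumption for illustration, not the AIoD implementation: `_INDEX`, `get_class`, and the error wording are hypothetical, and `Fraction` from the standard library stands in as an entry that is always importable.

```python
import importlib

# Hypothetical index: class name -> module that defines it.
# A real index would be generated from the libraries AIoD indexes.
_INDEX = {
    "RandomForestClassifier": "sklearn.ensemble",
    "XGBClassifier": "xgboost",
    "Fraction": "fractions",  # stdlib stand-in, always importable
}


def get_class(name, index=_INDEX):
    """Return the class registered under `name`, importing its module lazily."""
    try:
        module_path = index[name]
    except KeyError:
        raise KeyError(f"{name!r} is not in the index") from None
    try:
        module = importlib.import_module(module_path)
    except ImportError as exc:
        # Name the missing module instead of surfacing a bare import error.
        raise ImportError(
            f"{name!r} requires the module {module_path!r}, "
            "which could not be imported in this environment."
        ) from exc
    return getattr(module, name)


Fraction = get_class("Fraction")
print(Fraction(1, 2) + Fraction(1, 3))  # 5/6
```

Because the import happens inside the function, an index like this can list classes from many libraries without requiring any of them to be installed up front.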

### Workflow 1b: Retrieving instances

Besides classes, users should also be able to retrieve live instances of the class. These instances can be

* an estimator instance without any hyperparams

* an estimator instance with hyperparams

* a preprocessing step

* a pipeline

etc.

This would be useful in getting the exact instance used in an experiment. More on this in Workflow 2 and Workflow 3.

In [None]:
from aiod import craft

rf_classifier = craft("RandomForestClassifier(n_estimators=100)")
pipeline = craft("Pipeline(steps=[('imputer', SimpleImputer(strategy='mean')), ('classifier', RandomForestClassifier(n_estimators=100))])")

`print(type(rf_classifier))` would then print `<class 'sklearn.ensemble._forest.RandomForestClassifier'>`.
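Turning a string specification into an object does not require `eval`. The sketch below shows one possible approach, under the assumption that specifications are plain constructor expressions: it parses the string with the standard `ast` module and only calls names found in a registry. `craft_sketch` and `_REGISTRY` are hypothetical names, and `Fraction` and `dict` stand in for estimator classes so that the example runs without third-party packages.

```python
import ast
from fractions import Fraction

# Hypothetical registry of constructible names; a real one would come from the index.
_REGISTRY = {"Fraction": Fraction, "dict": dict}


def craft_sketch(spec, registry=_REGISTRY):
    """Build an object from a constructor expression without calling eval()."""

    def build(node):
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -build(node.operand)  # negative literals such as -1
        if isinstance(node, ast.List):
            return [build(item) for item in node.elts]
        if isinstance(node, ast.Tuple):
            return tuple(build(item) for item in node.elts)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id not in registry:
                raise ValueError(f"unknown name: {node.func.id!r}")
            if any(kw.arg is None for kw in node.keywords):
                raise ValueError("'**' unpacking is not supported")
            args = [build(arg) for arg in node.args]
            kwargs = {kw.arg: build(kw.value) for kw in node.keywords}
            return registry[node.func.id](*args, **kwargs)
        raise ValueError(f"unsupported syntax: {type(node).__name__}")

    return build(ast.parse(spec, mode="eval").body)


# Nested calls inside a list of tuples, mirroring the shape of the Pipeline spec.
nested = craft_sketch("dict(steps=[('half', Fraction(1, 2)), ('third', Fraction(1, 3))])")
print(nested["steps"][0])  # ('half', Fraction(1, 2))
```

Anything outside the registry, such as `open('x')` or an attribute call, raises `ValueError` instead of being executed, which matters if specifications come from shared catalogues.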

A user can now fit the instantiated object directly. The example below uses the `pipeline` that `craft` built from a string specification in the previous example.

In [None]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

pipeline.fit(X_train, y_train)
print(pipeline.predict(X_test))

Note the difference between AIoD and Hugging Face in the examples above: AIoD deals with classes and instances, whereas Hugging Face deals with model weights.

## Workflow 2: Model Catalogues

A catalogue is a curated collection of machine learning components. These components can be estimators, datasets, and metrics. For now, we will limit our scope to model (estimator) catalogues. Catalogues can also be of mixed type, representing an entire benchmark setup; more on this in Workflow 3.

Let's say there is a popular benchmarking paper from NeurIPS that compares different tabular classification models. A catalogue then makes it possible to collect all the models used in the paper, with or without the hyperparams used in the paper, so that a user can get them all at once.

In [None]:
from aiod.catalogues import NeurIPS2026ClassificationCatalogue

catalogue = NeurIPS2026ClassificationCatalogue()

# returns a list of all estimators in the catalogue as strings
print(catalogue.get(object_type="all"))

# returns a list of all estimators in the catalogue as instantiated objects;
# passing `as_object=True` internally calls `craft` on each of the strings
# and instantiates them as estimator instances (see Workflow 1b above)
print(catalogue.get(object_type="all", as_object=True))
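The relationship between the stored strings and `as_object=True` can be sketched as follows. This is a minimal illustration under stated assumptions, not the AIoD catalogue class: `CatalogueSketch` and its constructor arguments are hypothetical, and the builder is injected so the example can run with `Fraction` (which parses strings such as `"3/4"`) standing in for `craft`.

```python
from fractions import Fraction


class CatalogueSketch:
    """Hypothetical catalogue: string specifications grouped by object type."""

    def __init__(self, specs, builder):
        self._specs = specs      # e.g. {"estimator": ["...", "..."]}
        self._builder = builder  # callable that turns a spec string into an object

    def get(self, object_type="all", as_object=False):
        if object_type == "all":
            specs = [spec for group in self._specs.values() for spec in group]
        else:
            specs = list(self._specs[object_type])
        # Strings are returned untouched unless the caller asks for objects.
        return [self._builder(spec) for spec in specs] if as_object else specs


# Stand-in builder for the demo: Fraction("3/4") parses the string.
catalogue = CatalogueSketch({"estimator": ["1/2", "3/4"], "metric": ["1/3"]}, Fraction)

print(catalogue.get(object_type="all"))
print(catalogue.get(object_type="estimator", as_object=True))
```

Storing strings rather than objects keeps the catalogue importable even when the libraries behind its entries are not installed; the dependency is only needed once `as_object=True` is requested.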

## Workflow 3: Model Benchmarking

We will now see how Workflow 1 and Workflow 2 enable us to carry out benchmarking experiments efficiently using AIoD.

### Workflow 3a: Basic Benchmarking

Users should be able to import models from different machine learning libraries, register them with the benchmark, define one or more tasks (including dataset loaders, resampling strategies, and evaluation metrics), and then execute the benchmark with a single, consistent interface. The system should handle fitting, prediction, scoring, and timing automatically across all specified configurations.

Upon execution, the benchmark should return a structured dataframe containing the aggregated results of the experiment. This dataframe should summarize predictive performance (e.g., metric values per fold, means, and standard deviations) as well as computational statistics such as fit time, prediction time, and total runtime.

In [None]:
import aiod
from aiod.benchmarking import ClassificationBenchmark
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.metrics import accuracy_score

RandomForestClassifier = aiod.get("RandomForestClassifier")
XGBClassifier = aiod.get("XGBClassifier")
LGBMClassifier = aiod.get("LGBMClassifier")

benchmark = ClassificationBenchmark()

benchmark.add(RandomForestClassifier(n_estimators=100))
benchmark.add(XGBClassifier(n_estimators=100))
benchmark.add(LGBMClassifier(n_estimators=100))

benchmark.add(load_iris(return_X_y=True))
benchmark.add(KFold(n_splits=5, shuffle=True, random_state=42))
benchmark.add(accuracy_score)

results = benchmark.run()
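What `run()` has to do per estimator and per fold can be sketched with the standard library alone. Everything below is an assumption for illustration: `run_sketch`, `MajorityClassifier`, and `accuracy` are hypothetical stand-ins, the folds are simple interleaved index splits, and the result is a list of dicts rather than a dataframe. A real implementation would more likely build on scikit-learn's `cross_validate` and return a pandas dataframe.

```python
import statistics
import time


class MajorityClassifier:
    """Toy estimator with a fit/predict interface; predicts the most common label."""

    def fit(self, X, y):
        self.label_ = statistics.mode(y)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def run_sketch(estimators, X, y, n_splits, metric):
    """Fit, predict, score, and time each estimator on each fold; aggregate per estimator."""
    rows = []
    for name, make_estimator in estimators.items():
        scores, fit_times, predict_times = [], [], []
        for k in range(n_splits):
            test = [i for i in range(len(X)) if i % n_splits == k]
            train = [i for i in range(len(X)) if i % n_splits != k]
            estimator = make_estimator()  # fresh, unfitted estimator per fold
            start = time.perf_counter()
            estimator.fit([X[i] for i in train], [y[i] for i in train])
            fit_times.append(time.perf_counter() - start)
            start = time.perf_counter()
            y_pred = estimator.predict([X[i] for i in test])
            predict_times.append(time.perf_counter() - start)
            scores.append(metric([y[i] for i in test], y_pred))
        rows.append({
            "estimator": name,
            "score_mean": statistics.mean(scores),
            "score_std": statistics.stdev(scores),
            "fit_time": sum(fit_times),
            "predict_time": sum(predict_times),
        })
    return rows


X = [[i] for i in range(10)]
y = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
results = run_sketch({"majority": MajorityClassifier}, X, y, n_splits=5, metric=accuracy)
for row in results:
    print(row)
```

Each row corresponds to one line of the results table described above: the aggregated metric plus fit and prediction time for one estimator on one task.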

### Workflow 3b: Reproducing and Extending Experiments

In the example above, we added several estimators and a task one by one and ran the benchmark. But a user should be able to add all the estimators from an existing experiment (e.g. a NeurIPS paper) at once, without writing the boilerplate code. The benchmark object should internally get the estimators from the catalogue and add them to itself for execution.

In [None]:
from aiod.benchmarking import ClassificationBenchmark
from aiod.catalogues import NeurIPS2026ClassificationCatalogue

from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.metrics import accuracy_score


benchmark = ClassificationBenchmark()
catalogue = NeurIPS2026ClassificationCatalogue()

# adds all the estimators from the catalogue (reproduce the experiment)
benchmark.add(catalogue)

# add another estimator (extend the experiment)
from sklearn.linear_model import LogisticRegression
benchmark.add(LogisticRegression())

benchmark.add(load_iris(return_X_y=True))
benchmark.add(KFold(n_splits=5, shuffle=True, random_state=42))
benchmark.add(accuracy_score)

benchmark.run()

In the example above, and in Workflow 2, the catalogue contained only estimators. But catalogues can also hold mixed object types: estimators, dataset loaders, metrics, and cv splitters. In that case we can add the catalogue directly to the benchmark. The benchmark resolves the catalogue internally, identifies the estimators and tasks, adds them to itself, and runs, as the example below demonstrates.

In [None]:
from aiod.benchmarking import ClassificationBenchmark
from aiod.catalogues import NeurIPS2026ClassificationCatalogueonSteroids

benchmark = ClassificationBenchmark()
catalogue = NeurIPS2026ClassificationCatalogueonSteroids()

# adds all the estimators and tasks from the catalogue (reproduce the experiment)
benchmark.add(catalogue)

# add another estimator (extend the experiment)
from sklearn.linear_model import LogisticRegression
benchmark.add(LogisticRegression())

benchmark.run()
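For a single `add` method to accept estimators, datasets, splitters, metrics, and whole catalogues, it has to identify what it was given. One possible approach is duck typing, sketched below. All names here (`BenchmarkSketch`, `ToyCatalogue`, `ToyEstimator`, `ToySplitter`, `toy_metric`) are hypothetical stand-ins, and the dispatch rules are an assumption rather than the AIoD design.

```python
class ToyCatalogue:
    """Stand-in for a mixed catalogue; `get` mirrors the call shown in Workflow 2."""

    def __init__(self, items):
        self._items = list(items)

    def get(self, object_type="all", as_object=False):
        return list(self._items)


class ToyEstimator:
    def fit(self, X, y):
        return self

    def predict(self, X):
        return [0 for _ in X]


class ToySplitter:
    def split(self, X, y=None):
        yield list(range(len(X))), []


def toy_metric(y_true, y_pred):
    return 0.0


class BenchmarkSketch:
    def __init__(self):
        self.components = {"estimator": [], "dataset": [], "splitter": [], "metric": []}

    def add(self, obj):
        """Route `obj` to the right bucket based on the interface it exposes."""
        if isinstance(obj, ToyCatalogue):
            # Resolve the catalogue and add each of its members recursively.
            for item in obj.get(object_type="all", as_object=True):
                self.add(item)
        elif hasattr(obj, "fit") and hasattr(obj, "predict"):
            self.components["estimator"].append(obj)
        elif hasattr(obj, "split"):
            self.components["splitter"].append(obj)
        elif isinstance(obj, tuple) and len(obj) == 2:
            self.components["dataset"].append(obj)  # (X, y) pair
        elif callable(obj):
            self.components["metric"].append(obj)
        else:
            raise TypeError(f"cannot add object of type {type(obj).__name__}")
        return self


mixed = ToyCatalogue([ToyEstimator(), ([[0], [1]], [0, 1]), ToySplitter(), toy_metric])
benchmark = BenchmarkSketch()
benchmark.add(mixed)           # reproduce the experiment
benchmark.add(ToyEstimator())  # extend the experiment
print({kind: len(items) for kind, items in benchmark.components.items()})
# {'estimator': 2, 'dataset': 1, 'splitter': 1, 'metric': 1}
```

The order of the checks matters: the catalogue and estimator tests come before the generic `callable` test, so that only plain functions such as metrics fall through to the last bucket.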

## Workflow 4: Getting Models from Scientific Papers/Projects

WIP