# Design Mockups for AIoD Workflows

## Workflow 1: Model Retrieval

Users of AIoD should be able to get models from popular machine learning packages. AIoD thus becomes a common interface to all the popular ML libraries that it indexes.

### Workflow 1a: Retrieving classes

With `aiod.get()`, users can directly import any class from any library that is indexed by AIoD. If the required soft dependencies (e.g. `scikit-learn`, `xgboost`, `sktime`, `mlxtend`, `pytorch-tabular`) are present in the environment, the class is imported; otherwise, an error is raised to let users know of the missing soft dependencies.

In [None]:
import aiod

RandomForestClassifier = aiod.get("RandomForestClassifier")
XGBClassifier = aiod.get("XGBClassifier")
LGBMClassifier = aiod.get("LGBMClassifier")
NaiveForecaster = aiod.get("NaiveForecaster")
EnsembleVoteClassifier = aiod.get("EnsembleVoteClassifier")
SimpleImputer = aiod.get("SimpleImputer")
OneHotEncoder = aiod.get("OneHotEncoder")

In the above example, `RandomForestClassifier = aiod.get("RandomForestClassifier")` is equivalent to `from sklearn.ensemble import RandomForestClassifier`. The returned object is the class itself rather than an instance, so `isinstance(RandomForestClassifier, type)` is `True`.

The same holds for the other examples. This turns AIoD into an index of ML algorithms.
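The lookup described in this workflow can be sketched as a name-to-module index combined with a lazy import. This is only a minimal illustration under stated assumptions: `SKETCH_INDEX` and `get_class` are hypothetical names, not part of AIoD, and standard-library classes stand in for real ML libraries so that the sketch runs without extra installs.

```python
import importlib

# Hypothetical index: class name -> (module path, package to install).
# Standard-library entries stand in for indexed ML libraries.
SKETCH_INDEX = {
    "OrderedDict": ("collections", "python-stdlib"),
    "Fraction": ("fractions", "python-stdlib"),
    "MissingEstimator": ("not_an_installed_package.models", "not-an-installed-package"),
}


def get_class(name):
    """Resolve an indexed class name to the class object itself."""
    if name not in SKETCH_INDEX:
        raise KeyError(f"{name!r} is not indexed")
    module_path, package = SKETCH_INDEX[name]
    try:
        # The import only happens on request, so the dependency stays "soft".
        module = importlib.import_module(module_path)
    except ModuleNotFoundError as exc:
        raise ModuleNotFoundError(
            f"{name!r} requires the soft dependency {package!r}, which is not installed"
        ) from exc
    return getattr(module, name)


OrderedDict = get_class("OrderedDict")
print(isinstance(OrderedDict, type))  # True: a class, not an instance
```

Because the import is deferred until a class is requested, a missing library only causes an error for the classes that actually need it.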

### Workflow 1b: Retrieving instances

Besides classes, users should also be able to retrieve live instances of a class. These instances can be

* an estimator instance without any hyperparams

* an estimator instance with hyperparams

* a preprocessing step

* a pipeline

etc.

This is useful for retrieving the exact instance used in an experiment. More on this in Workflow 2 and Workflow 3.

In [None]:
import aiod

rf_classifier = aiod.get("RandomForestClassifier(n_estimators=100)")
pipeline = aiod.get("Pipeline(steps=[('imputer', SimpleImputer(strategy='mean')), ('classifier', RandomForestClassifier(n_estimators=100))])")

`print(type(rf_classifier))` would then print `<class 'sklearn.ensemble._forest.RandomForestClassifier'>`.

A user can now fit the instantiated object directly. The example below fits the `pipeline` that was built above from a string specification using `craft`.

In [None]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

pipeline.fit(X_train, y_train)
print(pipeline.predict(X_test))

The above examples highlight a difference between AIoD and HuggingFace: AIoD deals with classes and instances, whereas HuggingFace deals with model weights.
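One way the string-to-instance step of this workflow could work is to parse the specification into a syntax tree and only allow calls to registered names. The sketch below is an assumption about a possible design, not the actual `craft` implementation: `REGISTRY`, `build`, and `craft_sketch` are hypothetical names, and standard-library classes stand in for indexed estimators.

```python
import ast
from collections import OrderedDict
from fractions import Fraction

# Hypothetical registry of constructible names (stand-ins for estimators).
REGISTRY = {"Fraction": Fraction, "OrderedDict": OrderedDict}


def build(node):
    """Recursively turn an AST node into a Python object."""
    if isinstance(node, ast.Call):
        if not isinstance(node.func, ast.Name) or node.func.id not in REGISTRY:
            raise ValueError("only calls to registered names are allowed")
        if any(kw.arg is None for kw in node.keywords):
            raise ValueError("'**' unpacking is not supported")
        args = [build(arg) for arg in node.args]
        kwargs = {kw.arg: build(kw.value) for kw in node.keywords}
        return REGISTRY[node.func.id](*args, **kwargs)
    if isinstance(node, ast.Name):
        if node.id not in REGISTRY:
            raise ValueError(f"unknown name {node.id!r}")
        return REGISTRY[node.id]  # a bare name yields the class itself
    if isinstance(node, (ast.List, ast.Tuple)):
        items = [build(elt) for elt in node.elts]
        return items if isinstance(node, ast.List) else tuple(items)
    # Everything else must be a plain literal (number, string, ...).
    return ast.literal_eval(node)


def craft_sketch(spec):
    """Build the object described by a single-expression spec string."""
    return build(ast.parse(spec.strip(), mode="eval").body)


print(craft_sketch("Fraction(3, 4)"))  # 3/4
```

Nested specifications such as the `Pipeline(steps=[...])` string above follow the same pattern: lists and tuples are walked recursively, so inner constructor calls are built before the outer one.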

### Workflow 1c: Executable specifications

Beyond simple class lookup and instance construction, users should be able to use `aiod.get()` to construct fully executable multi-line specifications. These specifications may define intermediate variables and must end with a return statement indicating the object to be constructed.

In [None]:
import aiod

spec = """
pipe = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="mean")),
    ("scaler", StandardScaler()),
    ("classifier", RandomForestClassifier(n_estimators=100))])
cv = KFold(n_splits=5, shuffle=True, random_state=42)

return GridSearchCV(
    estimator=pipe,
    param_grid=[{
        "classifier__max_depth": [5, 10],
        "classifier__min_samples_split": [2, 5],
    },
    ],
    cv=cv,
    )
"""

print(aiod.get(spec))

Output:

In [None]:
GridSearchCV(cv=KFold(n_splits=5, random_state=42, shuffle=True),
             estimator=Pipeline(steps=[('imputer', SimpleImputer()),
                                       ('scaler', StandardScaler()),
                                       ('classifier',
                                        RandomForestClassifier())]),
             param_grid=[{'classifier__max_depth': [5, 10],
                          'classifier__min_samples_split': [2, 5]}])
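A multi-line specification that ends with `return` can be executed by wrapping it in a function body. The following is a rough sketch of that idea, not the AIoD implementation: `run_spec` is a hypothetical helper, and `Fraction` stands in for the indexed classes that would be placed in the namespace.

```python
import textwrap
from fractions import Fraction


def run_spec(spec, namespace):
    """Execute a multi-line spec ending in `return` and hand back its result."""
    # Indent the spec so it becomes the body of a throwaway function.
    body = textwrap.indent(textwrap.dedent(spec).strip(), "    ")
    source = "def _spec():\n" + body + "\n"
    scope = dict(namespace)  # copy so the caller's namespace is not mutated
    exec(source, scope)
    return scope["_spec"]()


spec = """
half = Fraction(1, 2)
third = Fraction(1, 3)

return half + third
"""

result = run_spec(spec, {"Fraction": Fraction})
print(result)  # 5/6
```

Note that `exec` runs arbitrary code, so a real implementation would need to restrict what a specification may contain, for example by validating the syntax tree before execution.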

## Workflow 2: Model Catalogues

A catalogue is a curated collection of machine learning components, such as estimators, datasets, and metrics. For now, we limit our scope to model (estimator) catalogues. Catalogues can also be of mixed type, representing an entire benchmark setup; more on this in Workflow 3.

Let's say there is a popular benchmarking paper from NeurIPS that compares different tabular classification models. A catalogue then makes it possible to collect all the models used in the paper, with or without the hyperparameters used there, so that a user can get them all at once. For the example below, we assume that this catalogue contains three classifiers.

The call below returns a list of all estimators in the catalogue as strings:

In [None]:
catalogue = aiod.get("NeurIPS2026ClassificationCatalogue()")

print(catalogue.fetch(object_type="all"))

Output:

In [None]:
[
    "RandomForestClassifier(n_estimators=100)", 
    "XGBClassifier(n_estimators=100)", 
    "LGBMClassifier(n_estimators=100)",
]

The call below returns a list of all estimators in the catalogue as instantiated objects. Passing `as_object=True` internally calls `craft` on each of the strings and instantiates them as estimator instances (see Workflow 1b above):

In [None]:
print(catalogue.fetch(object_type="all", as_object=True))

Output:

In [None]:
[
    RandomForestClassifier(n_estimators=100), 
    XGBClassifier(n_estimators=100), 
    LGBMClassifier(n_estimators=100),
]
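The `fetch` behaviour shown above can be sketched as a small class that stores specification strings and builds them only on request. This is an illustrative assumption about the design: `SketchCatalogue`, `fake_craft`, and the `"estimator"` type key are hypothetical, and the stand-in for `craft` merely tags each string so that the sketch runs without any ML library installed.

```python
class SketchCatalogue:
    """Hypothetical catalogue: spec strings grouped by object type."""

    def __init__(self, items, craft):
        self._items = items  # e.g. {"estimator": ["...", "..."]}
        self._craft = craft  # callable turning a spec string into an object

    def fetch(self, object_type="all", as_object=False):
        if object_type == "all":
            specs = [spec for group in self._items.values() for spec in group]
        elif object_type in self._items:
            specs = list(self._items[object_type])
        else:
            raise KeyError(f"no objects of type {object_type!r} in this catalogue")
        # Strings stay strings unless the caller asks for live objects.
        return [self._craft(spec) for spec in specs] if as_object else specs


def fake_craft(spec):
    """Stand-in for `craft`: tags the string instead of building an estimator."""
    return ("instance of", spec)


catalogue = SketchCatalogue(
    {"estimator": ["RandomForestClassifier(n_estimators=100)",
                   "XGBClassifier(n_estimators=100)"]},
    craft=fake_craft,
)
print(catalogue.fetch(object_type="all"))
print(catalogue.fetch(object_type="estimator", as_object=True))
```

Storing strings rather than objects keeps the catalogue itself free of heavy imports; the soft dependencies are only needed once `as_object=True` is requested.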

## Workflow 3: Model Benchmarking

We will now see how Workflow 1 and Workflow 2 enable us to carry out efficient benchmarking experiments using AIoD.

### Workflow 3a: Basic Benchmarking

Users should be able to register estimator instances with the benchmark, define one or more tasks (including dataset loaders, resampling strategies, and evaluation metrics), and then execute the benchmark with a single, consistent interface. The system should handle fitting, prediction, scoring, and timing automatically across all specified configurations.

Upon execution, the benchmark should return a structured dataframe containing the aggregated results of the experiment as a leaderboard.

In [None]:
from aiod.benchmarking import ClassificationBenchmark

benchmark = ClassificationBenchmark()

benchmark.add("RandomForestClassifier(n_estimators=100)")
benchmark.add("XGBClassifier(n_estimators=100)")
benchmark.add("LGBMClassifier(n_estimators=100)")

benchmark.add("load_iris(return_X_y=True)")
benchmark.add("KFold(n_splits=2, shuffle=True, random_state=42)")
benchmark.add("accuracy_score")

results = benchmark.run()

Output:

| Model                  | Organization/Library | Accuracy | Accuracy Rank |
|------------------------|----------------------|----------|---------------|
| RandomForestClassifier | scikit-learn         | 0.9733   | 1             |
| XGBClassifier          | xgboost              | 0.9533   | 2             |
| LGBMClassifier         | lightgbm             | 0.9467   | 3             |
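The aggregation step that turns per-fold scores into such a leaderboard could look roughly like the sketch below. The function name `leaderboard` and the per-fold numbers are purely illustrative assumptions; they are not real benchmark results and not the AIoD implementation.

```python
from statistics import mean


def leaderboard(fold_scores):
    """Aggregate per-fold scores into rows sorted by mean score, best first."""
    means = {model: mean(scores) for model, scores in fold_scores.items()}
    ordered = sorted(means.items(), key=lambda item: item[1], reverse=True)
    return [
        {"Model": model, "Accuracy": round(score, 4), "Accuracy Rank": rank}
        for rank, (model, score) in enumerate(ordered, start=1)
    ]


# Illustrative per-fold accuracies for a 2-fold split, not real results.
scores = {
    "ModelA": [0.90, 0.94],
    "ModelB": [0.96, 0.98],
    "ModelC": [0.85, 0.91],
}
rows = leaderboard(scores)
for row in rows:
    print(row)
```

This sketch assigns consecutive ranks and does not treat ties specially; a real leaderboard would need to define how tied scores are ranked.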

### Workflow 3b: Reproducing and Extending Experiments

In the above example, we added several estimators and tasks one by one and then ran the benchmark. But a user should be able to add all the estimators from an existing experiment (e.g. a NeurIPS paper) at once, without writing the boilerplate code. The benchmark object should internally get the estimators from the catalogue and add them to itself for execution. Users should also be able to extend the benchmark experiment by adding estimators beyond those contained in a catalogue.

In [None]:
import aiod
from aiod.benchmarking import ClassificationBenchmark

benchmark = ClassificationBenchmark()
catalogue = aiod.get("NeurIPS2026ClassificationCatalogue()")

# adds all the estimators from the catalogue (reproduce the experiment)
benchmark.add(catalogue)

# add another estimator (extend the experiment)
benchmark.add("LogisticRegression()")

# add tasks
benchmark.add("load_iris(return_X_y=True)")
benchmark.add("KFold(n_splits=2, shuffle=True, random_state=42)")
benchmark.add("accuracy_score")

benchmark.run()

Output:

| Model                  | Organization/Library | Accuracy | Accuracy Rank |
|------------------------|----------------------|----------|---------------|
| RandomForestClassifier | scikit-learn         | 0.9733   | 1             |
| XGBClassifier          | xgboost              | 0.9533   | 2             |
| LGBMClassifier         | lightgbm             | 0.9467   | 3             |
| LogisticRegression     | scikit-learn         | 0.9343   | 4             |

Notice that the leaderboard now has a fourth row for `LogisticRegression`, which we added on top of the estimators from the catalogue. We can thus see how _our_ added estimator/algorithm performs compared to the algorithms in a given catalogue (e.g. those from a NeurIPS paper).

In the above example, and in Workflow 2, the catalogue contained only estimators. But there can also be catalogues of mixed object types, containing estimators, dataset loaders, metrics, and cv splitters. In that case, we can add the catalogue directly to the benchmark. The benchmark internally resolves the catalogue, identifies the estimators and tasks, and adds them to itself before running, as demonstrated in the example below. Let's assume that `NeurIPS2026ClassificationCatalogueonSteroids` is such a catalogue of mixed object types, containing estimators, dataset loaders, metrics, and cv splitters.

In [None]:
import aiod
from aiod.benchmarking import ClassificationBenchmark

benchmark = ClassificationBenchmark()

# adds all the estimators and tasks from the mixed catalogue (reproduce the experiment)
benchmark.add("NeurIPS2026ClassificationCatalogueonSteroids()")

# add another estimator (extend the experiment)
benchmark.add("LogisticRegression()")

benchmark.run()

Output:

| Model                  | Organization/Library | Accuracy | Accuracy Rank |
|------------------------|----------------------|----------|---------------|
| RandomForestClassifier | scikit-learn         | 0.9733   | 1             |
| XGBClassifier          | xgboost              | 0.9533   | 2             |
| LGBMClassifier         | lightgbm             | 0.9467   | 3             |
| LogisticRegression     | scikit-learn         | 0.9343   | 4             |

Since `NeurIPS2026ClassificationCatalogueonSteroids` also includes the tasks, we did not have to add them separately to the benchmark object in the above example. They were added automatically via the catalogue, which yields the same result dataframe as the previous example, where we added the tasks on top of the estimator catalogue.
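How a benchmark might expand a catalogue and sort its contents into estimators and tasks can be sketched as follows. Everything here is a hypothetical illustration rather than the AIoD design: `ROLE_BY_NAME`, `CATALOGUES`, `SketchMixedCatalogue`, and `SketchBenchmark` are invented names, and the routing works on the specification strings alone.

```python
# Hypothetical role lookup keyed by the callable name at the start of a spec.
ROLE_BY_NAME = {
    "RandomForestClassifier": "estimator",
    "LogisticRegression": "estimator",
    "load_iris": "dataset",
    "KFold": "splitter",
    "accuracy_score": "metric",
}


class SketchMixedCatalogue:
    """Stand-in for a mixed catalogue holding estimators and tasks."""

    def fetch(self, object_type="all"):
        return [
            "RandomForestClassifier(n_estimators=100)",
            "load_iris(return_X_y=True)",
            "KFold(n_splits=2, shuffle=True, random_state=42)",
            "accuracy_score",
        ]


# Hypothetical registry used to resolve a catalogue that is named by a string.
CATALOGUES = {"SketchMixedCatalogue": SketchMixedCatalogue}


class SketchBenchmark:
    def __init__(self):
        roles = ("estimator", "dataset", "splitter", "metric")
        self.registered = {role: [] for role in roles}

    def add(self, item):
        if isinstance(item, str):
            name = item.split("(", 1)[0].strip()
            if name in ROLE_BY_NAME:
                # Plain spec: file it under its role.
                self.registered[ROLE_BY_NAME[name]].append(item)
                return
            if name not in CATALOGUES:
                raise ValueError(f"cannot resolve {item!r}")
            item = CATALOGUES[name]()  # catalogue given by name
        # Catalogue object: expand it and route every spec it contains.
        for spec in item.fetch(object_type="all"):
            self.add(spec)


benchmark = SketchBenchmark()
benchmark.add("SketchMixedCatalogue()")  # reproduce
benchmark.add("LogisticRegression()")    # extend
print(benchmark.registered)
```

With this routing, an estimator-only catalogue and a mixed catalogue go through the same `add` call; the only difference is which role lists end up populated.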

## Workflow 4: Getting Models from Scientific Papers/Projects

WIP