# Logging Asynchronously

The asynchronous ``rubicon_ml`` client offers a way to read and write ``rubicon_ml`` objects using Python's
built-in ``asyncio`` module. ``rubicon_ml`` itself is computationally lightweight, but reading from and writing to S3
takes time over the network. We can use ``asyncio`` to communicate with S3 asynchronously while executing
other work.

There are two main differences between the standard and asynchronous ``rubicon_ml`` clients:

* the asynchronous client is for **S3 logging only**
* the asynchronous client's functions **return coroutines** rather than their standard return values
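
Here is what "returns coroutines" means in practice. This minimal sketch uses a hypothetical ``log_metric`` stand-in instead of the real client. Calling an ``async def`` function only creates a coroutine object, and nothing runs until that coroutine is awaited or handed to an event loop.

```python
import asyncio


async def log_metric(name, value):
    # Hypothetical stand-in for an asynchronous client method; the sleep
    # simulates a network round trip to S3.
    await asyncio.sleep(0.01)
    return (name, value)


async def main():
    pending = log_metric("accuracy", 0.95)  # a coroutine object; nothing has run yet
    print(asyncio.iscoroutine(pending))  # True
    result = await pending  # the simulated write happens here
    print(result)  # ('accuracy', 0.95)
    return result


if __name__ == "__main__":
    asyncio.run(main())
```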

In [1]:
from rubicon_ml.client.asynchronous import Rubicon


s3_bucket = "my-bucket"
root_dir = f"s3://{s3_bucket}/rubicon-root"

rubicon = Rubicon(persistence="filesystem", root_dir=root_dir)
project = await rubicon.get_or_create_project(
    "Asynchronous Experiments", description="training multiple models asynchronously"
)

project

<rubicon_ml.client.asynchronous.project.Project at 0x160303c40>
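
The cell above uses ``await`` at the top level. This works in Jupyter and IPython because they support top-level ``await``. In a plain Python script, we would wrap the calls in an ``async def`` function and pass it to ``asyncio.run``. The sketch below uses a hypothetical ``get_or_create_project`` stand-in so it runs without S3. With the real client, you would call ``rubicon.get_or_create_project`` instead.

```python
import asyncio


async def get_or_create_project(name, description=None):
    # Hypothetical stand-in for the async client's get_or_create_project;
    # it simulates S3 latency instead of touching the network.
    await asyncio.sleep(0.01)
    return {"name": name, "description": description}


async def main():
    return await get_or_create_project(
        "Asynchronous Experiments", description="training multiple models asynchronously"
    )


if __name__ == "__main__":
    # asyncio.run creates an event loop, runs main() to completion, then closes the loop
    project = asyncio.run(main())
    print(project)
```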

We'll take another look at the wine dataset in this example.

In [2]:
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split


wine = load_wine()
wine_feature_names = wine.feature_names
wine_datasets = train_test_split(
    wine["data"],
    wine["target"],
    test_size=0.25,
)
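
The wine dataset has 178 samples. For a fractional ``test_size``, scikit-learn rounds the test set size up, so ``test_size=0.25`` yields 45 test samples. Every accuracy reported later is therefore a multiple of 1/45. A quick arithmetic check:

```python
import math

n_samples = 178  # the wine dataset has 178 samples
test_size = 0.25

# scikit-learn rounds a fractional test size up and the train size down
n_test = math.ceil(n_samples * test_size)
n_train = n_samples - n_test

print(n_train, n_test)  # 133 45

# with 45 test samples, 43 correct predictions give the 0.955556
# accuracy that appears in the results dataframe
print(round(43 / n_test, 6))  # 0.955556
```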

We can define an asynchronous ``run_experiment`` function that logs a new **experiment** to
the provided **project**, then trains, evaluates, and logs a model of type ``classifier_cls`` using
the training and testing data in ``wine_datasets``.

In [3]:
import asyncio
from collections import namedtuple


SklearnTrainingMetadata = namedtuple("SklearnTrainingMetadata", "module_name method")

async def run_experiment(
    project, classifier_cls, wine_datasets, feature_names, **kwargs
):
    X_train, X_test, y_train, y_test = wine_datasets
    
    # await logging the experiment so we can log other things to it
    experiment = await project.log_experiment(
        training_metadata=[
            SklearnTrainingMetadata("sklearn.datasets", "load_wine"),
        ],
        model_name=classifier_cls.__name__,
        tags=[classifier_cls.__name__],
    )
    
    # gather a list of coroutines that will log objects to the experiment
    rubicon_logging_coroutines = []
    
    for key, value in kwargs.items():
        parameter_coroutine = experiment.log_parameter(key, value)
        rubicon_logging_coroutines.append(parameter_coroutine)
    
    for name in feature_names:
        feature_coroutine = experiment.log_feature(name)
        rubicon_logging_coroutines.append(feature_coroutine)
        
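    # fit() is synchronous, so it blocks the event loop while it trains
    classifier = classifier_cls(**kwargs)
    classifier.fit(X_train, y_train)

    # score() predicts on X_test internally and returns the mean accuracy
    accuracy = classifier.score(X_test, y_test)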
    
    metric_coroutine = experiment.log_metric("accuracy", accuracy)
    rubicon_logging_coroutines.append(metric_coroutine)

    if accuracy >= .94:
        tag_coroutine = experiment.add_tags(["success"])
    else:
        tag_coroutine = experiment.add_tags(["failure"])
    rubicon_logging_coroutines.append(tag_coroutine)
    
    # execute all logging coroutines asynchronously
    await asyncio.gather(*rubicon_logging_coroutines)
    
    return experiment
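
Gathering the logging coroutines, rather than awaiting each one in turn, lets their network waits overlap. The sketch below shows this with a hypothetical ``fake_log`` coroutine that sleeps for 0.1 seconds in place of an S3 write. Awaiting ten of them sequentially takes about a second. ``asyncio.gather`` finishes in roughly the time of one.

```python
import asyncio
import time


async def fake_log(name):
    # Hypothetical stand-in for a rubicon_ml logging call; the sleep
    # represents waiting on S3.
    await asyncio.sleep(0.1)
    return name


async def sequential(names):
    # each write starts only after the previous one finishes
    return [await fake_log(n) for n in names]


async def concurrent(names):
    # all writes are scheduled at once and their waits overlap
    return await asyncio.gather(*(fake_log(n) for n in names))


def timed(coro_fn, names):
    start = time.perf_counter()
    result = asyncio.run(coro_fn(names))
    return result, time.perf_counter() - start


if __name__ == "__main__":
    names = [f"feature_{i}" for i in range(10)]
    for fn in (sequential, concurrent):
        result, elapsed = timed(fn, names)
        print(f"{fn.__name__}: {len(result)} writes in {elapsed:.2f}s")
```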

This time we'll compare three classifiers (``RandomForestClassifier``, ``DecisionTreeClassifier``, and
``KNeighborsClassifier``) to see which performs best. Each classifier will be run across four sets of parameters
(provided as ``kwargs`` to ``run_experiment``), for a total of 12 experiments. Here, we'll build up a list of
coroutines, one per experiment. Calling ``run_experiment`` only creates each coroutine, and nothing runs until it is awaited.

In [4]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


coroutines = []

for n_estimators in [10, 20, 30, 40]:
    coroutines.append(run_experiment(
        project,
        RandomForestClassifier,
        wine_datasets,
        wine_feature_names,
        n_estimators=n_estimators,
    ))

for n_neighbors in [5, 10, 15, 20]:
    coroutines.append(run_experiment(
        project,
        KNeighborsClassifier,
        wine_datasets,
        wine_feature_names,
        n_neighbors=n_neighbors,
    ))

for criterion in ["gini", "entropy"]:
    for splitter in ["best", "random"]:
        coroutines.append(run_experiment(
            project,
            DecisionTreeClassifier,
            wine_datasets,
            wine_feature_names,
            criterion=criterion,
            splitter=splitter,
        ))
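
The three loops above enumerate 4 + 4 + 2 × 2 = 12 parameter sets. Another way to lay this out is to write the grid as data and expand it with ``itertools.product``. This makes the total easy to check before any coroutines are created. This is a design choice, not part of the rubicon_ml API, and the ``param_grid`` and ``expand`` names are hypothetical.

```python
from itertools import product

# Hypothetical grid mirroring the loops above: classifier name -> parameter options
param_grid = {
    "RandomForestClassifier": {"n_estimators": [10, 20, 30, 40]},
    "KNeighborsClassifier": {"n_neighbors": [5, 10, 15, 20]},
    "DecisionTreeClassifier": {
        "criterion": ["gini", "entropy"],
        "splitter": ["best", "random"],
    },
}


def expand(grid):
    """Yield (classifier_name, kwargs) pairs for every parameter combination."""
    for name, options in grid.items():
        keys = list(options)
        for values in product(*options.values()):
            yield name, dict(zip(keys, values))


configs = list(expand(param_grid))
print(len(configs))  # 12
```

To run these, we could map each name to its scikit-learn class and pass the ``kwargs`` dict to ``run_experiment`` with ``**``.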

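Let's run all of our experiments concurrently with ``asyncio.gather``! Model training itself is synchronous, so the time saved comes from overlapping the S3 reads and writes rather than from training models in parallel.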

In [5]:
experiments = await asyncio.gather(*coroutines)
experiments

[<rubicon_ml.client.asynchronous.experiment.Experiment at 0x16032a0a0>,
 <rubicon_ml.client.asynchronous.experiment.Experiment at 0x160303e80>,
 <rubicon_ml.client.asynchronous.experiment.Experiment at 0x162ddcee0>,
 <rubicon_ml.client.asynchronous.experiment.Experiment at 0x16032a6d0>,
 <rubicon_ml.client.asynchronous.experiment.Experiment at 0x162ddcc70>,
 <rubicon_ml.client.asynchronous.experiment.Experiment at 0x162df0e50>,
 <rubicon_ml.client.asynchronous.experiment.Experiment at 0x15fe86e50>,
 <rubicon_ml.client.asynchronous.experiment.Experiment at 0x163182400>,
 <rubicon_ml.client.asynchronous.experiment.Experiment at 0x162dffd00>,
 <rubicon_ml.client.asynchronous.experiment.Experiment at 0x162df0ca0>,
 <rubicon_ml.client.asynchronous.experiment.Experiment at 0x162df01c0>,
 <rubicon_ml.client.asynchronous.experiment.Experiment at 0x162df0820>]
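
``asyncio.gather`` returns results in the order the awaitables were passed in, not the order in which they finish. That is why ``experiments[0]`` is the first random forest run (``n_estimators=10``). The sketch below uses a hypothetical ``delayed`` coroutine in place of ``run_experiment``. The shortest sleep finishes first, but the result list still follows the argument order.

```python
import asyncio

finish_order = []


async def delayed(label, seconds):
    # Hypothetical stand-in for run_experiment; records when it completes
    await asyncio.sleep(seconds)
    finish_order.append(label)
    return label


async def main():
    return await asyncio.gather(
        delayed("slow", 0.09), delayed("medium", 0.06), delayed("fast", 0.03)
    )


if __name__ == "__main__":
    results = asyncio.run(main())
    print(results)       # ['slow', 'medium', 'fast']
    print(finish_order)  # ['fast', 'medium', 'slow']
```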

Now we can validate that we successfully logged all 12 experiments to our project in S3.

In [6]:
len(await project.experiments())

12

Let's see which experiments we tagged as successful and what type of model they used.

In [7]:
for e in await project.experiments(tags=["success"]):    
    print(f"experiment {e.id[:8]} was successful using a {e.model_name}")

experiment 1deffe63 was successful using a RandomForestClassifier
experiment 301d0404 was successful using a RandomForestClassifier
experiment 32b14108 was successful using a DecisionTreeClassifier
experiment a9629c2c was successful using a RandomForestClassifier
experiment c2ec1246 was successful using a RandomForestClassifier
experiment e4372a68 was successful using a DecisionTreeClassifier


We can also take a deeper look at any of our experiments.

In [8]:
first_experiment = experiments[0]

training_metadata = SklearnTrainingMetadata(*first_experiment.training_metadata)
tags = await first_experiment.tags

parameters = [(p.name, p.value) for p in await first_experiment.parameters()]
metrics = [(m.name, m.value) for m in await first_experiment.metrics()]
    
print(
    f"experiment {first_experiment.id}\n"
    f"training metadata: {training_metadata}\ntags: {tags}\n"
    f"parameters: {parameters}\nmetrics: {metrics}"
)

experiment c2ec1246-a664-43b4-b76c-c34b9bdcb723
training metadata: SklearnTrainingMetadata(module_name='sklearn.datasets', method='load_wine')
tags: ['RandomForestClassifier', 'success']
parameters: [('n_estimators', 10)]
metrics: [('accuracy', 0.9777777777777777)]
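
Rebuilding ``SklearnTrainingMetadata`` with ``*`` works because the metadata comes back as a plain sequence of values. This sketch assumes the metadata is persisted as JSON. JSON has no tuple type, so it stores the namedtuple as a list and drops the field names. Unpacking that list into the namedtuple restores them.

```python
import json
from collections import namedtuple

SklearnTrainingMetadata = namedtuple("SklearnTrainingMetadata", "module_name method")

original = SklearnTrainingMetadata("sklearn.datasets", "load_wine")

# Assumption: the metadata is serialized as JSON, which writes any
# tuple (including a namedtuple) as a plain array
stored = json.dumps(original)
loaded = json.loads(stored)
print(loaded)  # ['sklearn.datasets', 'load_wine']

# Unpacking the list re-attaches the field names
restored = SklearnTrainingMetadata(*loaded)
print(restored)  # SklearnTrainingMetadata(module_name='sklearn.datasets', method='load_wine')
```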


We can also retrieve all of the project's experiments as a Dask dataframe and call ``compute`` to load it into pandas.

In [9]:
ddf = await rubicon.get_project_as_dask_df("Asynchronous Experiments")
ddf.compute()

| | id | name | description | model_name | commit_hash | created_at | tags | n_estimators | accuracy | criterion | splitter | n_neighbors |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | e4372a68-6094-42de-a3d6-cf7bca2cdafe | | | DecisionTreeClassifier | | 2021-04-16 13:18:21.029869 | [DecisionTreeClassifier, success] | | 0.955556 | entropy | random | |
| 1 | 9b88c307-7c20-48bc-8dca-ed0ea04efba9 | | | DecisionTreeClassifier | | 2021-04-16 13:18:21.028594 | [failure, DecisionTreeClassifier] | | 0.888889 | entropy | best | |
| 2 | 32b14108-2fe1-4e75-9ba0-729193e8298f | | | DecisionTreeClassifier | | 2021-04-16 13:18:21.027189 | [DecisionTreeClassifier, success] | | 0.955556 | gini | random | |
| 3 | 974ed9b2-28dd-4a9f-b29c-4245daa70091 | | | DecisionTreeClassifier | | 2021-04-16 13:18:21.025501 | [failure, DecisionTreeClassifier] | | 0.911111 | gini | best | |
| 4 | 6f15b3a2-8586-4cbc-8ac1-2a06ce9df6e9 | | | KNeighborsClassifier | | 2021-04-16 13:18:21.023694 | [failure, KNeighborsClassifier] | | 0.711111 | | | 20.0 |
| 5 | 89aff6fd-d66b-405c-a5b1-663568da75b6 | | | KNeighborsClassifier | | 2021-04-16 13:18:21.021649 | [failure, KNeighborsClassifier] | | 0.711111 | | | 15.0 |
| 6 | 56753d93-5f79-43b2-87a4-fbc1bbd5c1c0 | | | KNeighborsClassifier | | 2021-04-16 13:18:21.020056 | [failure, KNeighborsClassifier] | | 0.733333 | | | 10.0 |
| 7 | 7177b63b-370c-4877-bbb3-5fd15078fbfb | | | KNeighborsClassifier | | 2021-04-16 13:18:21.018690 | [failure, KNeighborsClassifier] | | 0.755556 | | | 5.0 |
| 8 | 1deffe63-82ed-49cd-b75d-3aa67fc26911 | | | RandomForestClassifier | | 2021-04-16 13:18:21.017173 | [RandomForestClassifier, success] | 40.0 | 0.955556 | | | |
| 9 | a9629c2c-20b5-44e7-84ab-f859e9060c6b | | | RandomForestClassifier | | 2021-04-16 13:18:21.015431 | [RandomForestClassifier, success] | 30.0 | 0.977778 | | | |
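
After ``compute``, the result is an ordinary pandas DataFrame, so standard pandas operations apply. The sketch below rebuilds the ``model_name`` and ``accuracy`` columns from the ten rows shown above, which cover 10 of the 12 experiments. It then finds the best accuracy per model type. Against the real project, we would run the same ``groupby`` on ``ddf.compute()``.

```python
import pandas as pd

# model_name and accuracy columns copied from the rows shown above
df = pd.DataFrame(
    {
        "model_name": ["DecisionTreeClassifier"] * 4
        + ["KNeighborsClassifier"] * 4
        + ["RandomForestClassifier"] * 2,
        "accuracy": [
            0.955556, 0.888889, 0.955556, 0.911111,
            0.711111, 0.711111, 0.733333, 0.755556,
            0.955556, 0.977778,
        ],
    }
)

# best accuracy per model type, highest first
best = df.groupby("model_name")["accuracy"].max().sort_values(ascending=False)
print(best)
```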
