# Logging Concurrently

Once we've got the hang of using `rubicon`, we can expand on our project from the *Iris Classifier* example.
Let's see how a few other popular `scikit-learn` models perform on the Iris dataset. `rubicon` logging is
thread-safe, so we can test many model configurations at once.

In [None]:
from rubicon import Rubicon

root_dir = "./rubicon-root"

rubicon = Rubicon(persistence="filesystem", root_dir=root_dir)
project = rubicon.create_project("Concurrent Experiments", description="Training multiple models in parallel")

print(project)

For a recap of the contents of the Iris dataset, check out `iris.DESCR` and `iris.data`. We'll split the data
into training and testing subsets.

In [None]:
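from datetime import datetime, timezone
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# seed from the current UTC time; `datetime.utcnow()` is deprecated as of Python 3.12
random_state = int(datetime.now(timezone.utc).timestamp())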

iris = load_iris()
iris_datasets = train_test_split(iris['data'], iris['target'], random_state=random_state)
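
To make the role of `random_state` concrete, here is a minimal standard-library sketch of a seeded split. It is a simplification for illustration, not `scikit-learn`'s implementation: `seeded_split` is a hypothetical helper, and the 25% test fraction is assumed to match `train_test_split`'s documented default. The point is that the same seed always produces the same train/test partition, so every model sees identical data.

```python
import math
import random


def seeded_split(items, test_fraction=0.25, seed=0):
    """Shuffle a copy of `items` with a seeded RNG, then cut it into train/test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = math.ceil(len(shuffled) * test_fraction)  # round the test size up
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(150))  # stand-in for the 150 Iris samples
train_a, test_a = seeded_split(rows, seed=42)
train_b, test_b = seeded_split(rows, seed=42)

print(len(train_a), len(test_a))  # 112 38
print(train_a == train_b and test_a == test_b)  # True: same seed, same split
```

Because `iris_datasets` is computed once and passed to every experiment, accuracy differences between runs come from the models and their parameters rather than from the split.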

We'll use `run_experiment` to log a new experiment to the provided `project`, then train, run, and log a model of type
`classifier_cls` using the training and testing data in `iris_datasets`.

In [None]:
from collections import namedtuple

SklearnTrainingMetadata = namedtuple("SklearnTrainingMetadata", "module_name method")

def run_experiment(project, classifier_cls, iris_datasets, **kwargs):
    X_train, X_test, y_train, y_test = iris_datasets
    
    experiment = project.log_experiment(
        training_metadata=[SklearnTrainingMetadata("sklearn.datasets", "load_iris")],
        model_name=classifier_cls.__name__,
        tags=[classifier_cls.__name__],
    )
    
    for key, value in kwargs.items():
        experiment.log_parameter(key, value)
    
    n_features = len(iris.feature_names)
    experiment.log_parameter("n_features", n_features)
    
    for name in iris.feature_names:
        experiment.log_feature(name)
        
    classifier = classifier_cls(**kwargs)
    classifier.fit(X_train, y_train)
    classifier.predict(X_test)
    
    accuracy = classifier.score(X_test, y_test)
    
    experiment.log_metric("accuracy", accuracy)

    if accuracy >= 0.95:
        experiment.add_tags(["success"])
    else:
        experiment.add_tags(["failure"])

This time we'll take a look at two more classifiers, **decision tree** and **k-neighbors**, in addition to the **random forest** classifier we used in the last example. Each classifier will be run across four sets of parameters (provided as `kwargs` to `run_experiment`), for a total of 12 experiments. Here, we'll build up a list of processes, one per experiment, so that they can all run in parallel.

In [None]:
import multiprocessing
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

processes = []

for n_estimators in [10, 20, 30, 40]:
    processes.append(multiprocessing.Process(
        target=run_experiment, args=[project, RandomForestClassifier, iris_datasets],
        kwargs={"n_estimators": n_estimators, "random_state": random_state},
    ))
    
for criterion in ["gini", "entropy"]:
    for splitter in ["best", "random"]:
        processes.append(multiprocessing.Process(
            target=run_experiment, args=[project, DecisionTreeClassifier, iris_datasets],
            kwargs={"criterion": criterion, "splitter": splitter, "random_state": random_state},
        ))

for n_neighbors in [5, 10, 15, 20]:
    processes.append(multiprocessing.Process(
        target=run_experiment, args=[project, KNeighborsClassifier, iris_datasets],
        kwargs={"n_neighbors": n_neighbors},
    ))
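
As a quick sanity check on the total of 12, the configurations from the loops above can be enumerated without training anything. This is only a sketch: the class names are written as plain strings, and the shared `random_state` argument is left out because it is the same for every run.

```python
from itertools import product

# Parameter grids copied from the loops above; random_state is omitted because
# it is the same constant for every run.
configurations = (
    [("RandomForestClassifier", {"n_estimators": n}) for n in [10, 20, 30, 40]]
    + [
        ("DecisionTreeClassifier", {"criterion": c, "splitter": s})
        for c, s in product(["gini", "entropy"], ["best", "random"])
    ]
    + [("KNeighborsClassifier", {"n_neighbors": k}) for k in [5, 10, 15, 20]]
)

for model_name, params in configurations:
    print(model_name, params)

print(len(configurations))  # 12
```

The nested `criterion` and `splitter` loops are equivalent to a Cartesian product, which is why two values for each give four decision tree configurations.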

Let's run all our experiments in parallel!

In [None]:
for process in processes:
    process.start()
    
for process in processes:
    process.join()
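
The two separate loops matter: starting every worker before joining any of them is what lets the experiments overlap. The sketch below shows the same start-then-join pattern with `threading.Thread`, which exposes the same `start()` and `join()` methods as `multiprocessing.Process`. Threads are used here only to keep the example portable, and `fake_experiment` is a hypothetical stand-in for `run_experiment`.

```python
import threading
import time


def fake_experiment(results, index):
    """Hypothetical stand-in for run_experiment: wait briefly, then record a result."""
    time.sleep(0.2)
    results[index] = f"experiment {index} done"


results = {}
workers = [
    threading.Thread(target=fake_experiment, args=(results, i)) for i in range(4)
]

started = time.perf_counter()
for worker in workers:
    worker.start()  # returns immediately; the worker runs in the background
for worker in workers:
    worker.join()  # blocks until that worker has finished
elapsed = time.perf_counter() - started

print(len(results))  # 4
print(f"elapsed: {elapsed:.2f}s")  # roughly one sleep, not four back to back
```

Calling `join()` inside the first loop, right after `start()`, would instead run the workers one at a time.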

Now we can validate that we successfully logged all 12 experiments to our project.

In [None]:
len(project.experiments())

Let's see which experiments we tagged as successful and what type of model they used.

In [None]:
for e in project.experiments(tags=["success"]):
    print(f"experiment {e.id} was successful using a {e.model_name}")

We can also take a deeper look at any of our experiments.

In [None]:
experiment = project.experiments()[0]

print(f"training_metadata: {SklearnTrainingMetadata(*experiment.training_metadata)}")
print(f"tags: {experiment.tags}")
print("parameters:")
for parameter in experiment.parameters():
    print(f"\t{parameter.name}: {parameter.value}")
print("features:")
for feature in experiment.features():
    print(f"\t{feature.name}")
print("metrics:")
for metric in experiment.metrics():
    print(f"\t{metric.name}: {metric.value}")
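
The `SklearnTrainingMetadata(*experiment.training_metadata)` call above rebuilds the named tuple from the values that come back out of storage. A minimal sketch of why that unpacking is needed follows. It assumes the values are persisted in a JSON-like form, and JSON is used here purely as an illustration rather than as a statement about `rubicon`'s storage format.

```python
import json
from collections import namedtuple

SklearnTrainingMetadata = namedtuple("SklearnTrainingMetadata", "module_name method")

original = SklearnTrainingMetadata("sklearn.datasets", "load_iris")

# Assumption: the storage layer serializes to something JSON-like, which turns
# the named tuple into a plain list and drops the field names.
stored = json.loads(json.dumps(original))
print(stored)  # ['sklearn.datasets', 'load_iris']

# Unpacking the stored values positionally restores the named fields.
restored = SklearnTrainingMetadata(*stored)
print(restored.module_name, restored.method)  # sklearn.datasets load_iris
print(restored == original)  # True
```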

Model developers can take advantage of `rubicon`'s thread-safety to test many models at once and collect the results
in a standardized format, which makes it easier to analyze which ones performed best. `rubicon` can also be run in
more advanced distributed setups, such as on a *Dask* cluster.