# Sharing Experiments

In the [first part](https://capitalone.github.io/rubicon-ml/quick-look/logging-experiments.html)
of the quick look, we learned how to log ``rubicon_ml`` experiments in the context of a
simple classification problem. We also performed a small hyperparameter search to show how ``rubicon_ml``
can be used to compare the results of multiple model fits.

Inspecting our model fit results in the same session that we trained the model is certainly useful, but
sharing experiments can help us collaborate with teammates and compare new model training results to old
experiments.

First, we'll create a ``rubicon_ml`` entry point and get the project we logged in the first part of the
quick look.

```python
from rubicon_ml import Rubicon

rubicon = Rubicon(persistence="filesystem", root_dir="./rubicon-root")
project = rubicon.get_project(name="classifying penguins")
project
```

```
<rubicon_ml.client.project.Project at 0x1674333d0>
```

Let's say we want to share the results of our hyperparameter search with a teammate so they can evaluate them.
``rubicon_ml``'s ``publish`` function takes a list of experiments as input and uses ``intake`` to generate a catalog
containing the bare minimum of metadata needed to retrieve each experiment, like its ID and filepath. More on ``intake``
can be found [in their docs](https://intake.readthedocs.io/en/latest/).

Hyperparameter searches can span thousands of parameter combinations, so sharing every file ``rubicon_ml`` logs during
training would mean passing around a large amount of data. Instead, ``publish`` uses ``intake`` to write only the metadata
needed to locate each experiment to a single YAML file. Recipients can later use that YAML file to retrieve the experiments
it references.
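To make "only the metadata needed to locate each experiment" concrete, here is a minimal sketch that builds a Python dict with the same layout as the catalog ``publish`` writes in the cell below: a top-level ``sources`` mapping, one entry per experiment, each with ``args`` and a ``driver``. The helper ``build_catalog`` is hypothetical and is not ``rubicon_ml``'s implementation; the source-naming rule (``experiment_`` plus the ID with hyphens swapped for underscores) is inferred from the published output.

```python
from typing import Dict, Iterable, Tuple


def build_catalog(
    experiments: Iterable[Tuple[str, str]],
    urlpath: str,
) -> Dict[str, dict]:
    """Build an intake-style catalog dict from (project_name, experiment_id) pairs.

    Hypothetical illustration of the catalog layout; not rubicon_ml's code.
    """
    sources = {}
    for project_name, experiment_id in experiments:
        # Naming rule inferred from the published catalog: prefix "experiment_"
        # and replace the hyphens in the experiment ID with underscores.
        source_name = "experiment_" + experiment_id.replace("-", "_")
        sources[source_name] = {
            # Only the metadata needed to find the experiment again.
            "args": {
                "experiment_id": experiment_id,
                "project_name": project_name,
                "urlpath": urlpath,
            },
            "driver": "rubicon_ml_experiment",
        }
    return {"sources": sources}


catalog = build_catalog(
    [("classifying penguins", "02a89318-b8d9-49a5-9337-7e4368cc54da")],
    urlpath="./rubicon-root",
)
print(list(catalog["sources"]))
# ['experiment_02a89318_b8d9_49a5_9337_7e4368cc54da']
```

Each entry stays a handful of short strings regardless of how many parameters, metrics, or artifacts the experiment logged, which is why the catalog remains small even for large searches.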

**Note**: Sharing experiments relies on both the sharer and the recipient having access to the same underlying data source.
In this example we're using a local filesystem, so these experiments couldn't actually be shared with anyone other than
people on this same physical machine. To get the most out of sharing, log your experiments to an S3 bucket that all teammates
have access to.
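The note boils down to a simple check: a root directory is only useful to teammates on other machines if it points at shared remote storage rather than a local path. The sketch below expresses that check with the standard library. The helper ``is_shareable_root`` and the ``REMOTE_SCHEMES`` set are hypothetical and not part of ``rubicon_ml``; adjust the schemes to whatever storage your team actually uses.

```python
from urllib.parse import urlparse

# Hypothetical set of URL schemes treated as shared remote storage.
REMOTE_SCHEMES = {"s3", "gs", "gcs", "abfs", "az"}


def is_shareable_root(root_dir: str) -> bool:
    """Return True if root_dir points at remote storage rather than a local path."""
    # Local paths such as "./rubicon-root" have an empty scheme.
    scheme = urlparse(root_dir).scheme.lower()
    return scheme in REMOTE_SCHEMES


print(is_shareable_root("./rubicon-root"))  # False
print(is_shareable_root("s3://my-bucket/rubicon-root"))  # True
```

Here ``my-bucket`` is a placeholder bucket name. A check like this could run before calling ``publish`` to flag catalogs whose ``urlpath`` only resolves on the sharer's machine.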

```python
from rubicon_ml import publish

publish(
    project.experiments(tags=["parameter search"]),
    output_filepath="./penguin_catalog.yml",
)

!head -7 penguin_catalog.yml
```

```yaml
sources:
  experiment_02a89318_b8d9_49a5_9337_7e4368cc54da:
    args:
      experiment_id: 02a89318-b8d9-49a5-9337-7e4368cc54da
      project_name: classifying penguins
      urlpath: ./rubicon-root
    driver: rubicon_ml_experiment
```


Each catalog contains one "source" per ``rubicon_ml`` experiment. A source holds the minimum metadata needed
to retrieve its experiment: the ``experiment_id``, the ``project_name``, and the ``urlpath`` to the root of the
``rubicon_ml`` repository used as the entry point. The ``rubicon_ml_experiment`` driver can be found
[within our library](https://github.com/capitalone/rubicon-ml/blob/main/rubicon_ml/intake_rubicon/experiment.py)
and uses the metadata in the YAML catalog to return the associated experiment objects.
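As a rough illustration of how a source's three ``args`` are enough to find an experiment, the sketch below maps them onto the same ``rubicon_ml`` calls used at the top of this notebook (``Rubicon(...)`` and ``get_project``), plus ``Project.experiment(id=...)`` to fetch a single experiment. It is a hypothetical helper, ``load_shared_experiment``, not the code of the ``rubicon_ml_experiment`` driver linked above; the ``rubicon_factory`` parameter is also an assumption, added so the lookup can be exercised without a repository on disk.

```python
from typing import Any, Callable, Mapping, Optional


def load_shared_experiment(
    args: Mapping[str, str],
    rubicon_factory: Optional[Callable[[str], Any]] = None,
) -> Any:
    """Resolve one catalog source's ``args`` to a rubicon_ml experiment.

    Hypothetical helper for illustration; not the rubicon_ml_experiment driver.
    """
    if rubicon_factory is None:
        # Imported lazily so the helper can be exercised with a stand-in factory.
        from rubicon_ml import Rubicon

        def default_factory(root_dir: str) -> Any:
            # Same entry point as at the top of this notebook, rooted at ``urlpath``.
            return Rubicon(persistence="filesystem", root_dir=root_dir)

        rubicon_factory = default_factory

    rubicon = rubicon_factory(args["urlpath"])
    project = rubicon.get_project(name=args["project_name"])
    # Fetch the single experiment identified by the catalog entry.
    return project.experiment(id=args["experiment_id"])
```

Passing the ``args`` block of any source in ``penguin_catalog.yml`` (as a dict) would, under these assumptions, return that experiment. Injecting the factory keeps the lookup logic separate from where the repository lives.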

Provided the recipient of the shared YAML catalog has read access to the filesystem represented by ``urlpath``,
they can use ``intake`` directly to read the catalog and load the shared ``rubicon_ml`` experiments
for their own inspection.

```python
import intake

catalog = intake.open_catalog("./penguin_catalog.yml")

for source in catalog:
    catalog[source].discover()

shared_experiments = [catalog[source].read() for source in catalog]

print("shared experiments:")
for experiment in shared_experiments:
    print(
        f"\tid: {experiment.id}, "
        f"parameters: {[(p.name, p.value) for p in experiment.parameters()]}, "
        f"metrics: {[(m.name, m.value) for m in experiment.metrics()]}"
    )
```

shared experiments:
	id: 02a89318-b8d9-49a5-9337-7e4368cc54da, parameters: [('strategy', 'mean'), ('n_neighbors', 10)], metrics: [('accuracy', 0.75)]
	id: 093a9d02-89f7-4e48-82b1-f9ade435ef03, parameters: [('strategy', 'mean'), ('n_neighbors', 20)], metrics: [('accuracy', 0.7211538461538461)]
	id: 9d6ffe67-088d-483f-9d3f-8f0fb34c22e8, parameters: [('strategy', 'median'), ('n_neighbors', 15)], metrics: [('accuracy', 0.7596153846153846)]
	id: a75b1258-2276-4eb1-beb5-caf83e9aacf3, parameters: [('strategy', 'mean'), ('n_neighbors', 5)], metrics: [('accuracy', 0.7307692307692307)]
	id: b2cd8067-ad4c-4ed5-87f7-2cd4536b2c73, parameters: [('strategy', 'most_frequent'), ('n_neighbors', 5)], metrics: [('accuracy', 0.7211538461538461)]
	id: bc4d0503-32d1-4a11-8222-4151dae893cf, parameters: [('strategy', 'median'), ('n_neighbors', 5)], metrics: [('accuracy', 0.7211538461538461)]
	id: c1b6cb3a-0ad1-4932-914d-ba53a054891b, parameters: [('strategy', 'median'), ('n_neighbors', 10)], metrics: [('accura