# Query ``rubicon-ml`` experiments with ``RubiconJSON``

We can use the ``RubiconJSON`` class to query ``rubicon-ml`` logs in a JSONPath-like manner.
``RubiconJSON`` takes top-level ``Rubicon`` objects, ``Projects``, and/or ``Experiments`` and
generates a JSON representation of them. We can then use the ``search`` method for JSONPath-like
querying [based on ``jsonpath_ng``](https://github.com/h2non/jsonpath-ng).

### Run some experiments

In [1]:
import random

random.seed(21)

In [2]:
import pandas as pd
from rubicon_ml import Rubicon

# placeholder class standing in for a trained model
class MODEL:
    pass


NUM_EXPERIMENTS = 4

rubicon = Rubicon(persistence="memory")
project = rubicon.get_or_create_project(name="jsonpath")

for accuracy in [random.random() for _ in range(NUM_EXPERIMENTS)]:
    tags = [random.choice(["a", "b", "c"])]
    ex = project.log_experiment(tags=tags)

    for feature in ["var_001", "var_002", "var_003", "var_004"]:
        ex.log_feature(name=feature)

    for parameter in [("alpha", 1e-3), ("gamma", 1e-5), ("n_iter", 1000)]:
        name, value = parameter
        ex.log_parameter(name=name, value=value)

    ex.log_metric(name="accuracy", value=accuracy)
    ex.log_artifact(name="model", data_object=MODEL())
    ex.log_dataframe(pd.DataFrame([[1.0, 0.0], [0.0, 1.0]]))

project.log_artifact(name="best model", data_object=MODEL())
project.log_dataframe(pd.DataFrame([[1.0, 0.0], [0.0, 1.0]]))

project

<rubicon_ml.client.project.Project at 0x15d37c890>

### Load experiments into the ``RubiconJSON`` class

Once instantiated, a ``RubiconJSON`` object exposes a ``json`` property detailing each project
and experiment, including the artifacts, dataframes, features, and metrics logged to them.

In [3]:
from rubicon_ml import RubiconJSON

pr_json = RubiconJSON(projects=[project])
pr_json.json

{'project': [{'name': 'jsonpath',
   'id': '3e7a35da-e84a-410c-aa6a-f732298570af',
   'description': None,
   'github_url': None,
   'training_metadata': None,
   'created_at': datetime.datetime(2023, 3, 16, 21, 36, 8, 170946),
   'artifact': [{'name': 'best model',
     'id': '629ed53e-2e0a-436c-b0f1-72d3ada3ad85',
     'description': None,
     'created_at': datetime.datetime(2023, 3, 16, 21, 36, 8, 179957),
     'tags': [],
     'parent_id': '3e7a35da-e84a-410c-aa6a-f732298570af'}],
   'dataframe': [{'id': '8273b062-e2b2-4c3b-8aa8-45120f3a1245',
     'name': None,
     'description': None,
     'tags': [],
     'created_at': datetime.datetime(2023, 3, 16, 21, 36, 8, 180381),
     'parent_id': '3e7a35da-e84a-410c-aa6a-f732298570af'}],
   'experiment': [{'project_name': 'jsonpath',
     'id': 'cdd3ff65-f366-4e49-90ca-5d17d78908d4',
     'name': None,
     'description': None,
     'model_name': None,
     'branch_name': None,
     'commit_hash': None,
     'training_metadata': None,
 

### Query experiments with ``RubiconJSON.search``

We'll start by getting all the metrics from each experiment:

In [4]:
res = pr_json.search("$..experiment[*].metric")

print(f"{len(res)} experiments")
for match in res:
    print(f"{len(match.value)} metric")
    print(match.value)

4 experiments
1 metric
[{'name': 'accuracy', 'value': 0.16494947983319797, 'id': 'be4cef71-bd01-4357-9cad-3cb49c2e8f98', 'description': None, 'directionality': 'score', 'created_at': datetime.datetime(2023, 3, 16, 21, 36, 8, 172671), 'tags': []}]
1 metric
[{'name': 'accuracy', 'value': 0.6897669242175674, 'id': 'a9a72180-b47c-435e-ae9b-1f92ccdb7155', 'description': None, 'directionality': 'score', 'created_at': datetime.datetime(2023, 3, 16, 21, 36, 8, 174990), 'tags': []}]
1 metric
[{'name': 'accuracy', 'value': 0.6349999404047206, 'id': '712a7a11-3f44-4116-bd0f-e485ddc07bab', 'description': None, 'directionality': 'score', 'created_at': datetime.datetime(2023, 3, 16, 21, 36, 8, 176956), 'tags': []}]
1 metric
[{'name': 'accuracy', 'value': 0.4791004720574512, 'id': 'cf1bd8e9-e0a3-43d4-a1fc-6d7c6c19d2e1', 'description': None, 'directionality': 'score', 'created_at': datetime.datetime(2023, 3, 16, 21, 36, 8, 178914), 'tags': []}]
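
To make the ``$..experiment[*].metric`` path more concrete, here is a rough plain-Python sketch of what it selects. It is only an illustration: ``sample`` is a hand-written, heavily trimmed stand-in for ``pr_json.json`` (IDs, timestamps, and most fields are omitted), and ``find_key`` is a hypothetical helper, not part of ``rubicon-ml`` or ``jsonpath_ng``.

```python
def find_key(node, key):
    """Yield every value stored under `key` at any depth, similar to `$..key`."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            yield from find_key(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)


# Trimmed stand-in for `pr_json.json` (an assumption for illustration);
# the accuracy values are copied from the output above.
sample = {
    "project": [
        {
            "name": "jsonpath",
            "experiment": [
                {"metric": [{"name": "accuracy", "value": 0.16494947983319797}]},
                {"metric": [{"name": "accuracy", "value": 0.6897669242175674}]},
                {"metric": [{"name": "accuracy", "value": 0.6349999404047206}]},
                {"metric": [{"name": "accuracy", "value": 0.4791004720574512}]},
            ],
        }
    ]
}

# `$..experiment` finds the experiment lists at any depth, `[*]` walks each
# experiment in them, and `.metric` picks that experiment's list of metrics.
metric_lists = [
    experiment["metric"]
    for experiment_list in find_key(sample, "experiment")
    for experiment in experiment_list
    if "metric" in experiment
]

print(f"{len(metric_lists)} experiments")
```

As in the query above, each match is a whole list of metrics, one list per experiment, rather than an individual metric.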


Now let's get each experiment tagged "b".

In [5]:
res = pr_json.search("$..experiment[?(@.tags[*]=='b')]")

print(f"{len(res)} experiment")
for match in res:
    print(match.value)

1 experiment
{'project_name': 'jsonpath', 'id': '5d9a639c-ddd5-419a-81a6-90c3afe43cf1', 'name': None, 'description': None, 'model_name': None, 'branch_name': None, 'commit_hash': None, 'training_metadata': None, 'tags': ['b'], 'created_at': datetime.datetime(2023, 3, 16, 21, 36, 8, 173879), 'feature': [{'name': 'var_001', 'id': '5faeda4d-5770-4a9b-aaee-f84604d5e0f1', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 3, 16, 21, 36, 8, 174008)}, {'name': 'var_002', 'id': '49cfb6e6-1c6e-445a-9820-141484e74128', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 3, 16, 21, 36, 8, 174178)}, {'name': 'var_003', 'id': 'ec3be848-590d-44f7-9bc7-128cde25d1bf', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 3, 16, 21, 36, 8, 174311)}, {'name': 'var_004', 'id': '5dea22c7-cf36-4a77-825e-85d952711298', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime

Use the "?" operator to filter each experiment's metrics down to those named "accuracy" with a
value greater than or equal to 0.5:

In [6]:
res = pr_json.search("$..experiment[*].metric[?(@.name=='accuracy' & @.value>=0.5)]")

print(f"{len(res)} metrics")
for match in res:
    print(match.value)

2 metrics
{'name': 'accuracy', 'value': 0.6897669242175674, 'id': 'a9a72180-b47c-435e-ae9b-1f92ccdb7155', 'description': None, 'directionality': 'score', 'created_at': datetime.datetime(2023, 3, 16, 21, 36, 8, 174990), 'tags': []}
{'name': 'accuracy', 'value': 0.6349999404047206, 'id': '712a7a11-3f44-4116-bd0f-e485ddc07bab', 'description': None, 'directionality': 'score', 'created_at': datetime.datetime(2023, 3, 16, 21, 36, 8, 176956), 'tags': []}
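
For readers less familiar with JSONPath filters, the condition inside ``[?(...)]`` behaves roughly like the ``if`` clause of a list comprehension. The sketch below assumes a hand-written ``metrics_per_experiment`` list that mirrors the four metric lists from the earlier output, with everything except ``name`` and ``value`` omitted.

```python
# Hand-written stand-in for the metric lists found under each experiment
# (an assumption for illustration); values are copied from the outputs above.
metrics_per_experiment = [
    [{"name": "accuracy", "value": 0.16494947983319797}],
    [{"name": "accuracy", "value": 0.6897669242175674}],
    [{"name": "accuracy", "value": 0.6349999404047206}],
    [{"name": "accuracy", "value": 0.4791004720574512}],
]

# Rough equivalent of `[?(@.name=='accuracy' & @.value>=0.5)]`:
# `@` is the current metric and `&` combines both conditions.
matches = [
    metric
    for metrics in metrics_per_experiment
    for metric in metrics
    if metric["name"] == "accuracy" and metric["value"] >= 0.5
]

print(f"{len(matches)} metrics")
```

Unlike the first query, the matches here are individual metrics rather than per-experiment lists.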


Let's retrieve the IDs of the experiments those metrics belong to for further exploration:

In [7]:
res = pr_json.search("$..experiment[?(@.metric[?(@.name=='accuracy')].value>=0.5)].id")

print(f"{len(res)} experiment IDs")
for match in res:
    print(match.value)

2 experiment IDs
5d9a639c-ddd5-419a-81a6-90c3afe43cf1
895d866b-62c1-4c4a-aeea-098152361b96
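
The nested filter in this query selects a parent (the experiment) based on a condition on its children (the metrics), then projects out the ``id``. A minimal plain-Python sketch of that idea follows. The ``experiments`` list is a trimmed stand-in for the real data, and the short IDs such as ``"exp-1"`` are placeholders chosen for illustration, not the real UUIDs.

```python
# Trimmed stand-in for the experiments in `pr_json.json`. The short IDs are
# placeholders (assumptions); the accuracy values come from the outputs above.
experiments = [
    {"id": "exp-1", "metric": [{"name": "accuracy", "value": 0.16494947983319797}]},
    {"id": "exp-2", "metric": [{"name": "accuracy", "value": 0.6897669242175674}]},
    {"id": "exp-3", "metric": [{"name": "accuracy", "value": 0.6349999404047206}]},
    {"id": "exp-4", "metric": [{"name": "accuracy", "value": 0.4791004720574512}]},
]

# Keep an experiment if any of its metrics is an "accuracy" of at least 0.5,
# then take only its ID, mirroring the trailing `.id` in the query.
experiment_ids = [
    experiment["id"]
    for experiment in experiments
    if any(
        metric["name"] == "accuracy" and metric["value"] >= 0.5
        for metric in experiment["metric"]
    )
]

print(experiment_ids)
```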


We can use the IDs to retrieve ``rubicon-ml`` experiments and dig deeper into the metadata.

In [8]:
experiment = project.experiment(id="5d9a639c-ddd5-419a-81a6-90c3afe43cf1")
experiment.artifact(name="model").get_data(unpickle=True)

<__main__.MODEL at 0x15ddbbc90>