# Tagging

Tags can be used to group and identify specific rubicon-ml entities by shared characteristics.
Any rubicon-ml entity can be given any number of tags when it is logged. Later, those tags can be
used to filter rubicon-ml logs during retrieval.

In general, a tag is any arbitrary string. rubicon-ml provides additional functionality for tags that
follow a ``<key>:<value>`` format.

## Logging with tags

First, create a ``Rubicon`` entrypoint.

In [1]:
from rubicon_ml import Rubicon

rubicon = Rubicon(persistence="memory")
project = rubicon.create_project("tagging")

Now we'll log three experiments: one tagged "a", one tagged "b", and one tagged with both.

In [2]:
experiment_a = project.log_experiment(tags=["a"])
experiment_b = project.log_experiment(tags=["b"])
experiment_c = project.log_experiment(tags=["a", "b"])

print(f"`experiment_a` ID: {experiment_a.id}, tags: {experiment_a.tags}")
print(f"`experiment_b` ID: {experiment_b.id}, tags: {experiment_b.tags}")
print(f"`experiment_c` ID: {experiment_c.id}, tags: {experiment_c.tags}")

`experiment_a` ID: 09b2ff25-5152-4e82-9532-c1a09de65409, tags: ['a']
`experiment_b` ID: 932f21ff-e839-437f-a7e8-6c05a1186294, tags: ['b']
`experiment_c` ID: ef64c07f-a7ba-4248-bde2-a4a323a09428, tags: ['a', 'b']


Any other entity logged to an experiment can also be tagged.

In [3]:
import pandas as pd

artifact = experiment_a.log_artifact(
    data_bytes=b"artifact", name="artifact", tags=["c"]
)
dataframe = experiment_a.log_dataframe(
    df=pd.DataFrame([[0], [1]]), tags=["d"]
)
feature = experiment_a.log_feature(name="var_0", tags=["e"])
parameter = experiment_a.log_parameter(name="input", value=0, tags=["f"])
metric = experiment_a.log_metric(name="output", value=1, tags=["g"])

## Retrieving with tags

Each of the retrieval functions on a project or experiment (``experiments``, ``metrics``, etc.)
accepts the ``tags`` and ``qtype`` ("or" or "and", default "or") arguments to filter the results.

First, grab all the experiments with tag "a".

In [4]:
[e.id for e in project.experiments(tags=["a"])]

['09b2ff25-5152-4e82-9532-c1a09de65409',
 'ef64c07f-a7ba-4248-bde2-a4a323a09428']

Next, get each experiment with tag "b". Note that the second experiment returned here also appeared
in the previous output, since it has both tags "a" and "b".

In [5]:
[e.id for e in project.experiments(tags=["b"])]

['932f21ff-e839-437f-a7e8-6c05a1186294',
 'ef64c07f-a7ba-4248-bde2-a4a323a09428']

Querying with multiple tags uses a logical _or_ to return results by default.

In [6]:
[e.id for e in project.experiments(tags=["a", "b"])]

['09b2ff25-5152-4e82-9532-c1a09de65409',
 '932f21ff-e839-437f-a7e8-6c05a1186294',
 'ef64c07f-a7ba-4248-bde2-a4a323a09428']

This can be switched to a logical _and_ with the ``qtype`` argument.

In [7]:
[e.id for e in project.experiments(tags=["a", "b"], qtype="and")]

['ef64c07f-a7ba-4248-bde2-a4a323a09428']
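The difference between the two query types can be modeled in plain Python. The sketch below is only an illustration of the "or" and "and" semantics described above, not rubicon-ml's implementation; the `filter_by_tags` helper and the dictionary of experiment names are hypothetical.

```python
def filter_by_tags(entities, tags, qtype="or"):
    """Return the names of entities whose tags match `tags` under `qtype`.

    `entities` is a hypothetical mapping of entity name -> list of tags.
    """
    if qtype not in ("or", "and"):
        raise ValueError("qtype must be 'or' or 'and'")

    wanted = set(tags)
    matches = []
    for name, entity_tags in entities.items():
        present = set(entity_tags)
        if qtype == "or":
            # "or": at least one requested tag is present
            matched = bool(wanted & present)
        else:
            # "and": every requested tag is present
            matched = wanted <= present
        if matched:
            matches.append(name)
    return matches


# Stand-ins for the three experiments logged earlier.
experiments = {
    "experiment_a": ["a"],
    "experiment_b": ["b"],
    "experiment_c": ["a", "b"],
}

or_matches = filter_by_tags(experiments, ["a", "b"])
and_matches = filter_by_tags(experiments, ["a", "b"], qtype="and")

print(or_matches)   # all three experiments
print(and_matches)  # only experiment_c
```

With "or", an entity matches if its tags overlap the query at all; with "and", the query must be a subset of the entity's tags.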

## Updating tags

Tags can also be updated after logging.

In [8]:
experiment_c.tags

['a', 'b']

``add_tags`` adds any number of new tags to an existing entity. Every entity that supports
tagging has both the ``add_tags`` and ``remove_tags`` functions.

In [9]:
experiment_c.add_tags(["h", "i"])
experiment_c.tags

['i', 'h', 'a', 'b']

Removal works similarly.

In [10]:
experiment_c.remove_tags(["a", "b"])
experiment_c.tags

['i', 'h']

Now, the same query from above for an experiment with tags "a" and "b" returns no results.

In [11]:
[e.id for e in project.experiments(tags=["a", "b"], qtype="and")]

[]
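The `['i', 'h', 'a', 'b']` output above is not in insertion order, which is consistent with tags being treated as an unordered collection. The sketch below models adding and removing tags as set union and set difference. This is an assumption made for illustration, not a statement about rubicon-ml's internals, and the two helper functions are hypothetical.

```python
def add_tags(current, new_tags):
    """Return the current tags plus `new_tags`, with duplicates dropped."""
    return list(set(current) | set(new_tags))


def remove_tags(current, old_tags):
    """Return the current tags without any of `old_tags`."""
    return list(set(current) - set(old_tags))


tags = ["a", "b"]

tags = add_tags(tags, ["h", "i"])
after_add = sorted(tags)  # sorted because set ordering is arbitrary

tags = remove_tags(tags, ["a", "b"])
after_remove = sorted(tags)

print(after_add)
print(after_remove)
```

Under this model, code that compares tags is safer comparing them as sets than relying on list order.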

## Key-value tags

rubicon-ml provides extended support for tags that follow the ``<key>:<value>`` format.

In [12]:
experiment_d = project.log_experiment(tags=["j:k"])
experiment_e = project.log_experiment(tags=["l:m", "l:n"])

The list returned by the `tags` property of any entity can be indexed like a
regular list to retrieve the full tag strings.

In [13]:
experiment_d.tags[0]

'j:k'

But it also supports string indexing, like a dictionary. To retrieve the value of a
key-value tag, just index into the `tags` property with its key.

In [14]:
experiment_d.tags["j"]

'k'

If multiple tags share the same key, a list containing each value is returned.

In [15]:
experiment_e.tags["l"]

['m', 'n']
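One way to picture this dual indexing is a `list` subclass that handles string keys separately. The `TagList` class below is a hypothetical sketch, not rubicon-ml's actual class. Splitting on the first colon and raising `KeyError` for a missing key are assumptions of this sketch.

```python
class TagList(list):
    """A list of tags that can also be indexed by key for `<key>:<value>` tags."""

    def __getitem__(self, index):
        # Integers and slices behave exactly like a regular list.
        if not isinstance(index, str):
            return super().__getitem__(index)

        # String index: collect the value of every tag with this key.
        values = []
        for tag in self:
            key, separator, value = tag.partition(":")
            if separator and key == index:
                values.append(value)

        if not values:
            raise KeyError(index)
        # A single match returns the bare value; several return a list.
        return values[0] if len(values) == 1 else values


tags_d = TagList(["j:k"])
tags_e = TagList(["l:m", "l:n"])

print(tags_d[0])    # the full tag string
print(tags_d["j"])  # the value for key "j"
print(tags_e["l"])  # every value for key "l"
```

Subclassing `list` keeps plain-tag behavior unchanged, so code that treats tags as ordinary strings keeps working.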

### Managing experiment relationships

A common use for key-value tags is managing relationships between experiments. rubicon-ml
has built-in support for this.

In [16]:
experiment_a.add_child_experiment(experiment_d)
experiment_a.add_child_experiment(experiment_e)

experiment_a.tags

['a',
 'child:f080134a-b118-4ac4-b400-25fc097366a8',
 'child:9443daa7-0bee-4af8-8f31-79a85017bcd5']

The experiment IDs themselves can be retrieved by indexing into the tags with the "child" key.

In [17]:
experiment_a.tags["child"]

['f080134a-b118-4ac4-b400-25fc097366a8',
 '9443daa7-0bee-4af8-8f31-79a85017bcd5']

From there, we can use the IDs to retrieve the full experiments from the original project.

In [18]:
[project.experiment(id=exp_id) for exp_id in experiment_a.tags["child"]]

[<rubicon_ml.client.experiment.Experiment at 0x1744780d0>,
 <rubicon_ml.client.experiment.Experiment at 0x1744781f0>]
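The same pattern can be sketched without rubicon-ml to show how a `child:<id>` tag encodes a relationship. Everything here is hypothetical: the in-memory `experiments` dictionary, the made-up IDs, and the `add_child` and `children_of` helpers stand in for the library calls shown above.

```python
def add_child(experiments, parent_id, child_id):
    """Record the relationship by tagging the parent with `child:<child_id>`."""
    experiments[parent_id]["tags"].append(f"child:{child_id}")


def children_of(experiments, parent_id):
    """Look up each child entry from the parent's `child:<id>` tags."""
    prefix = "child:"
    ids = [
        tag[len(prefix):]
        for tag in experiments[parent_id]["tags"]
        if tag.startswith(prefix)
    ]
    return [experiments[child_id] for child_id in ids]


# Hypothetical in-memory store keyed by made-up experiment IDs.
experiments = {
    "id-a": {"name": "experiment_a", "tags": ["a"]},
    "id-d": {"name": "experiment_d", "tags": ["j:k"]},
    "id-e": {"name": "experiment_e", "tags": ["l:m", "l:n"]},
}

add_child(experiments, "id-a", "id-d")
add_child(experiments, "id-a", "id-e")

parent_tags = experiments["id-a"]["tags"]
child_names = [child["name"] for child in children_of(experiments, "id-a")]

print(parent_tags)
print(child_names)
```

Because the relationship lives entirely in the parent's tags, it can be queried and removed with the same tag functions as any other tag.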