# Metadata Use Cases

The user can add arbitrary metadata fields. Only a handful are required (owner, group, beamline_id). The metadata can be a number, a string, a list, or a dictionary with any amount of nesting. These custom fields are stored in the RunStart Document as siblings of the required fields, and the Data Broker can search on them.
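Here is a rough picture of that layout. This plain-Python sketch merges the required fields with a few hypothetical custom fields (`temperature`, `tags`, `sample` are made up for illustration) into one dictionary. It is not the real RunStart schema, which also carries fields such as a unique ID and a timestamp.

```python
# Hypothetical illustration: the required fields and user-supplied custom
# fields share one top level; nested values stay nested.
required = {'owner': 'demo', 'group': 'demo', 'beamline_id': 'demo'}
custom = {
    'color': 'red',                                       # a string
    'temperature': 295.0,                                 # a number
    'tags': ['calibration', 'gold'],                      # a list
    'sample': {'element': 'Au', 'holder': {'slot': 3}},   # nested dictionaries
}

# Custom fields become siblings of the required ones.
run_start = {**required, **custom}

print(sorted(run_start))
```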

In [0]:
from bluesky import RunEngine, Msg

## The metadata validator

The RunEngine uses a metadata validation function to check that all required metadata is present and valid. Individual beamlines and users decide what "valid" means.

### Let's trip the validator

In [0]:
plan = [Msg('open_run'), Msg('close_run')]  # a minimal plan: open a run, then close it

In [0]:
RE = RunEngine({})  # start with an empty metadata dictionary

In [0]:
RE(plan)

### What happens if we replace the default validator with one that does nothing?

The metadata validation function must accept a dictionary as its argument. Unless the function raises an exception, the RunEngine assumes all is well and proceeds with the plan.
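The contract can be sketched without bluesky. `run_with_validator`, `accept_all`, and `require_owner` are hypothetical names invented for this illustration and are not part of bluesky. The sketch shows the rule stated above: the validator receives the metadata dictionary, and only a raised exception stops the run.

```python
def run_with_validator(md, validator):
    """Toy stand-in for the RunEngine's check: validate, then proceed.

    Any exception raised by `validator` stops the run; a normal return,
    whatever the return value, lets it continue.
    """
    validator(md)  # an exception here propagates and the run never starts
    return 'run started'


def accept_all(md):
    pass  # never raises, so every dictionary is accepted


def require_owner(md):
    if 'owner' not in md:
        raise ValueError('owner is required')


print(run_with_validator({}, accept_all))  # run started
try:
    run_with_validator({}, require_owner)
except ValueError as err:
    print('rejected:', err)  # rejected: owner is required
```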

In [0]:
def permissive_validator(md):
    pass

In [0]:
RE.md_validator = permissive_validator

In [0]:
RE(plan)

We tripped the hard-coded Document validation. There is no getting around this. (If there were, the errors would just surface further down, in MongoDB!)

The moral: you can use a custom metadata validator to be *more* strict, but there is no point in trying to make it less strict.

In [0]:
RE.md = {'owner': 'demo', 'group': 'demo', 'beamline_id': 'demo'}  # supply the required fields

### Now use the validator to check for some beamline- or experiment-specific metadata.

In [0]:
def color_validator(md):
    if 'color' not in md:
        raise ValueError("You must specify a color.")
        
RE.md_validator = color_validator

This will trip the validator:

In [0]:
RE(plan)
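A stricter policy can be built by combining small validators. This is a sketch under assumptions: `ALLOWED_COLORS`, `require_keys`, `check_color`, and `all_of` are hypothetical names, and `require_keys` duplicates the RunEngine's built-in check only so the example runs on its own.

```python
REQUIRED_KEYS = ('owner', 'group', 'beamline_id')
ALLOWED_COLORS = {'red', 'green', 'blue', 'pink', 'purple', 'yellow'}  # hypothetical policy


def require_keys(md):
    missing = [key for key in REQUIRED_KEYS if key not in md]
    if missing:
        raise ValueError(f"Missing required metadata: {missing}")


def check_color(md):
    if 'color' not in md:
        raise ValueError("You must specify a color.")
    if md['color'] not in ALLOWED_COLORS:
        raise ValueError(f"Unknown color {md['color']!r}.")


def all_of(*validators):
    """Combine validators: the combined one raises if any of them raises."""
    def combined(md):
        for validator in validators:
            validator(md)  # the first failure propagates
    return combined


strict_validator = all_of(require_keys, check_color)
strict_validator({'owner': 'demo', 'group': 'demo',
                  'beamline_id': 'demo', 'color': 'red'})  # passes silently
```

The combined function has the same one-argument signature as `color_validator`, so it could be installed the same way, with `RE.md_validator = strict_validator`.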

## Three ways to set metadata

### We can set the metadata when we run a scan.

In [0]:
RE(plan, color='red')

It does not persist between runs.

In [0]:
RE(plan)

### We can add the metadata to the RunEngine's metadata dictionary, which is reused for subsequent runs.

In [0]:
RE.md['color'] = 'red'

In [0]:
RE(plan)  # this picks up color=red from RE.md

Now that `color` is in `RE.md`, passing a new value when we call the RunEngine also updates the stored value.

In [0]:
RE(plan, color='blue')  # this updates RE.md['color'] as a side effect

In [0]:
RE.md['color']

To stop persisting color between runs, delete it.

In [0]:
del RE.md['color']

Now this raises an error again:

In [0]:
RE(plan)
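The rules in this subsection can be modeled with plain dictionaries. `start_run` is a made-up function, not bluesky's implementation. It only encodes the behavior described above: a call-time value is used once, unless its key is already in `RE.md`, in which case it also overwrites the stored value.

```python
def start_run(persistent_md, **call_md):
    """Hypothetical model of how call-time metadata interacts with RE.md.

    Returns the metadata recorded for this run.
    """
    for key, value in call_md.items():
        if key in persistent_md:
            persistent_md[key] = value  # side effect: carries over to later runs
    return {**persistent_md, **call_md}  # call-time values win for this run


md = {'owner': 'demo'}
print(start_run(md, color='red'))  # {'owner': 'demo', 'color': 'red'}
print(md)                          # {'owner': 'demo'}: not stored
md['color'] = 'red'
start_run(md, color='blue')
print(md['color'])                 # blue: the stored value was updated
del md['color']
print('color' in start_run(md))    # False: nothing left to persist
```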

### It is also possible to associate metadata with the plan itself.

This is rarely necessary.

In [0]:
plan_with_md = [Msg('open_run', color='green'), Msg('close_run')]

In [0]:
RE(plan_with_md)

### Persistence between sessions

So far we have used an ordinary Python dictionary for `RE.md`, but any object that behaves like a dictionary will do. bluesky's standard configuration uses a custom object called `History`, which behaves like a dictionary but backs up its contents to a small file on disk, so they are restored in the next session.

```python
from history import History
RE.md = History('metadata.db')  # contents are saved to, and restored from, this file
```

The `History` object also adds one more useful feature that we won't cover here; see the [documentation](https://github.com/Nikea/history#examples) for details.
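To illustrate the idea of a disk-backed dictionary, here is a minimal sketch that assumes JSON is an acceptable storage format. `JSONBackedDict` is a hypothetical class written for this example. It is not how `History` is implemented: it simply rewrites a JSON file whenever a top-level key changes.

```python
import json
import os
import tempfile
from collections.abc import MutableMapping


class JSONBackedDict(MutableMapping):
    """Dictionary-like object that rewrites a JSON file on every top-level change."""

    def __init__(self, path):
        self._path = path
        self._data = {}
        if os.path.exists(path):  # restore contents from a previous session
            with open(path) as f:
                self._data = json.load(f)

    def _save(self):
        with open(self._path, 'w') as f:
            json.dump(self._data, f)

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value
        self._save()

    def __delitem__(self, key):
        del self._data[key]
        self._save()

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


# Usage: write in one "session", then read back in another.
path = os.path.join(tempfile.mkdtemp(), 'metadata.json')
md = JSONBackedDict(path)
md['owner'] = 'demo'
md['sample'] = {'element': 'Au', 'mass': 10}
restored = JSONBackedDict(path)  # simulates starting a new session
print(restored['sample'])  # {'element': 'Au', 'mass': 10}
```

In this sketch only assignments to top-level keys trigger a save. An in-place edit such as `md['sample']['mass'] = 20` would not reach the file until the next top-level assignment, so reassigning the whole nested value is safer. JSON also turns tuples into lists.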

## Searching on Custom Metadata

First we'll run some experiments and put the results in metadatastore.

In [0]:
from bluesky.register_mds import register_mds

register_mds(RE)  # hook up our RunEngine to metadatastore

In [0]:
RE(plan, user='dan', color='pink', mood='optimistic')
RE(plan, user='dan', color='pink', mood='optimistic')
RE(plan, user='dan', color='purple', mood='optimistic', doubts=None)
RE(plan, user='dan', color='purple', mood='skeptical', doubts='serious')

Let's retrieve the 'skeptical' run. There should be just one.

In [0]:
from databroker import DataBroker as db

headers = db(mood='skeptical')

In [0]:
len(headers)

In [0]:
headers[0]

We should find three 'optimistic' runs.

In [0]:
len(db(mood='optimistic'))

Use MongoDB query syntax to form complex queries. See the [MongoDB documentation](https://docs.mongodb.org/manual/tutorial/query-documents/) for more.

In [0]:
query = {'$and': [{'mood': 'optimistic'}, {'color': 'purple'}]}  # optimistic and purple
len(db(**query))

In [0]:
len(db(doubts={'$exists': True}))  # runs where doubts was specified (even as None), whatever its value
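For intuition about what these queries select, here is a tiny in-memory matcher for the three query forms used above. `matches` is a hypothetical helper written for this illustration and is not part of databroker or MongoDB. It supports only plain equality, `$exists`, and `$and`, and the four dictionaries mirror only the custom fields of the runs recorded above. Note that a field set to `None` still counts as present for `$exists`, as it does in MongoDB.

```python
_MISSING = object()  # sentinel distinguishing "absent" from a stored None


def matches(doc, query):
    """Return True if `doc` satisfies a small subset of MongoDB query syntax."""
    for key, condition in query.items():
        if key == '$and':
            if not all(matches(doc, sub) for sub in condition):
                return False
            continue
        value = doc.get(key, _MISSING)
        if isinstance(condition, dict) and '$exists' in condition:
            if (value is not _MISSING) != condition['$exists']:
                return False
        elif value is _MISSING or value != condition:
            return False
    return True


runs = [
    {'user': 'dan', 'color': 'pink', 'mood': 'optimistic'},
    {'user': 'dan', 'color': 'pink', 'mood': 'optimistic'},
    {'user': 'dan', 'color': 'purple', 'mood': 'optimistic', 'doubts': None},
    {'user': 'dan', 'color': 'purple', 'mood': 'skeptical', 'doubts': 'serious'},
]

print(sum(matches(r, {'mood': 'optimistic'}) for r in runs))  # 3
print(sum(matches(r, {'$and': [{'mood': 'optimistic'}, {'color': 'purple'}]}) for r in runs))  # 1
print(sum(matches(r, {'doubts': {'$exists': True}}) for r in runs))  # 2
```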

Let's generate more data, this time making use of persistent metadata. We'll set some sample information in `RE.md` and see that it applies to all future runs.

In [0]:
RE.md['sample'] = {'element': 'Au', 'mass': 10, 'dimensions': (1, 5, 10)}

RE(plan, color='yellow')
RE(plan, color='yellow')
RE(plan, color='yellow')

RE.md['sample']['mass'] = 20
RE.md['sample']['dimensions'] = (2, 5, 20)

RE(plan, color='yellow')
RE(plan, color='yellow')
RE(plan, color='yellow')
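Changing `RE.md['sample']['mass']` in place should only affect later runs, assuming (as the queries below rely on) that each run records its metadata as it stood when the run opened. The plain-Python sketch below shows why such a snapshot matters. A list that keeps references to the live dictionary sees the later edit, while deep copies do not. The variable names are illustrative only.

```python
import copy

md = {'sample': {'element': 'Au', 'mass': 10}}
by_reference = []
by_snapshot = []

for _ in range(3):
    by_reference.append(md)                # stores a reference to the live dict
    by_snapshot.append(copy.deepcopy(md))  # stores an independent copy

md['sample']['mass'] = 20  # in-place change to a nested value

print([run['sample']['mass'] for run in by_reference])  # [20, 20, 20]
print([run['sample']['mass'] for run in by_snapshot])   # [10, 10, 10]
```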

In [0]:
query = {'sample.element': 'Au'}  # dot notation reaches into the nested 'sample' dictionary
len(db(**query))

In [0]:
query = {'sample.mass': 10}  # only the yellow runs recorded before mass was changed to 20
len(db(**query))