Currently not functional

# Play with Mapping
This notebook lets us play with the mapping that we create to guide the MappedH5Generator class in producing a document stream (ingesting) from an hdf5 file.

1. Build a projection
A [projection](https://blueskyproject.io/event-model/data-model.html#projections) maps metadata and fields in a docstream to a data structure whose keys follow a known ontology. An ontology could be, for example, the set of fields needed to recreate a NeXus file, or the fields an application like Splash needs to display data.

2. Import projections file
For now, we have a projection stored in [projections.py](./projections.py). We're using Python instead of JSON because Python lets us add comments. As long as you use double quotes instead of single quotes for strings and keys, the two formats are almost the same. This projection will be added to the start doc when we ingest the file.

3. Build a file
First, let's create a sample hdf5 file. We can modify the fields here and watch them update down below when we generate the docstream.

4. Build a mapping
Let's work on the mapping. You create a file, which could be Python or JSON, and provide it to the `ingestor`. The exact mechanism for doing that has not been designed yet.

For now, we have a mapping stored in [mapping.py](./mapping.py). We're using Python instead of JSON because Python lets us add comments. As long as you use double quotes instead of single quotes for strings and keys, the two formats are almost the same.

5. Import the mapping
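
As a rough illustration of steps 1 and 2, here is a minimal sketch of what a projection might look like when written as a Python structure, and why it is nearly interchangeable with JSON. The layout (`name`, `version`, `configuration`, `projection`, and the `type`/`location`/`stream`/`field` entries) follows the event-model projections description as we understand it. The projected keys (`/entry/...`), the projection name, and the version are hypothetical, and the field names are borrowed from the start document and dask output shown in this notebook. The real contents of projections.py may differ.

```python
import json

# Hypothetical projection; the "/entry/..." keys are illustrative names,
# not a real ontology, and the name/version values are made up.
projections = [
    {
        "name": "example_app",
        "version": "0.0.1",
        "configuration": {},
        "projection": {
            # pull a metadata value out of the start document
            "/entry/experimenter": {
                "type": "linked",
                "location": "start",
                "field": ":measurement:sample:experimenter:name",
            },
            # pull image data out of events in the primary stream
            "/entry/data": {
                "type": "linked",
                "location": "event",
                "stream": "primary",
                "field": ":exchange:data",
            },
        },
    }
]

# Only double-quoted strings, dicts, and lists are used, so the structure
# survives a round trip through JSON unchanged. The comments are the one
# thing JSON cannot carry.
as_json = json.dumps(projections, indent=4)
round_tripped = json.loads(as_json)
print(round_tripped == projections)
```

The same round trip applies to a mapping dictionary, which is why either format could be handed to the ingestor once that mechanism exists.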

## Ingest!
Now we construct an instance of MappedH5Generator and ask it to generate a docstream, which reads the mapping and pulls the mapped fields from the file.
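
To make the idea of a docstream concrete, here is a minimal, self-contained sketch of the `(name, doc)` pattern that a generator like this yields. It does not use MappedH5Generator. `toy_docstream` and `summarize` are hypothetical helpers, and the documents are stripped-down stand-ins rather than schema-valid event-model documents.

```python
import time
import uuid


def toy_docstream(n_events=5):
    """Yield (name, doc) pairs in the order a run is usually emitted."""
    start_uid = str(uuid.uuid4())
    yield "start", {"uid": start_uid, "time": time.time()}

    descriptor_uid = str(uuid.uuid4())
    yield "descriptor", {
        "uid": descriptor_uid,
        "run_start": start_uid,  # links the stream back to its run
        "name": "primary",
        "time": time.time(),
    }

    for seq_num in range(1, n_events + 1):
        yield "event", {
            "uid": str(uuid.uuid4()),
            "descriptor": descriptor_uid,  # links the event to its stream
            "seq_num": seq_num,
            "time": time.time(),
        }

    yield "stop", {
        "uid": str(uuid.uuid4()),
        "run_start": start_uid,
        "time": time.time(),
        "exit_status": "success",
    }


def summarize(docstream, max_events=3):
    """Keep the start and stop documents and at most `max_events` events."""
    kept = []
    events_seen = 0
    for name, doc in docstream:
        if name in ("start", "stop"):
            kept.append((name, doc))
        elif name == "event" and events_seen < max_events:
            kept.append((name, doc))
            events_seen += 1
    return kept


kept = summarize(toy_docstream())
print([name for name, _ in kept])
```

Capping the number of printed events keeps the output readable when a file holds thousands of frames, while the start and stop documents are always shown.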

The 'test_root' variable sets the root directory, which helps us find the file relative to a configurable root dir. It will be written directly into the resource document.
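
As a sketch of how a root directory in a resource document can be used, the snippet below joins `root` and `resource_path` to locate a file, with an optional remapping of the root for machines where the data lives elsewhere. `resolve_path` is a hypothetical helper, and the `uid`, `spec`, `resource_path`, and `/mnt/data` values are made up. Only the `root` and `resource_path` keys are assumed to match the resource document layout.

```python
import posixpath

# Stripped-down stand-in for a resource document; values are hypothetical.
resource_doc = {
    "uid": "hypothetical-resource-uid",
    "spec": "HYPOTHETICAL_SPEC",
    "root": "/tmp",
    "resource_path": "als832/example.h5",
    "resource_kwargs": {},
}


def resolve_path(resource, root_map=None):
    """Return the full file path, optionally swapping the recorded root."""
    root = resource["root"]
    if root_map:
        # fall back to the recorded root when no replacement is configured
        root = root_map.get(root, root)
    return posixpath.join(root, resource["resource_path"])


print(resolve_path(resource_doc))
print(resolve_path(resource_doc, root_map={"/tmp": "/mnt/data"}))
```

Keeping the root separate from the relative path means the same documents can still be resolved after the files are moved, as long as the new root is supplied.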


In [33]:
import datetime
from importlib import reload
import json
import os
from pprint import pprint
import pytz
import sys
sys.path.append("../..")  # allow imports from two directories up
import tempfile
from IPython.utils.tempdir import TemporaryWorkingDirectory
from IPython.display import FileLink
import h5py
import numpy as np
from splash_ingest.docstream import MappedH5Generator, MappingNotFoundError
from splash_ingest.model import Mapping
from splash_ingest.scicat import NPArrayEncoder

  
    
def build_projections():
    # projections.py lives next to this notebook; reload it so that edits
    # are picked up without restarting the kernel
    import projections

    reload(projections)
    return projections.projections

def build_mapping():
    # mapping.py lives next to this notebook; reload it so that edits
    # are picked up without restarting the kernel
    import mapping
    reload(mapping)
    # construct a Mapping object from the dict to validate that we typed it correctly
    return Mapping(**mapping.mapping_dict)


detailed_output = True

# file_name = build_file()
file_name = '/home/dylan/data/beamlines/als832/20210511_163010_test1313.h5'
print(file_name)
with h5py.File(file_name, 'r') as my_file:

    mapping = build_mapping()
    projections = build_projections()

    ingestor = MappedH5Generator(mapping, my_file, "/tmp", single_event=False)

    start_doc = {}
    stop_doc = {}

    # fill up a dictionary to later run a projection from
    from databroker.core import BlueskyRun, SingleRunCache
    run_cache = SingleRunCache()
    try:
        counter = 0
        for name, doc in ingestor.generate_docstream():
            run_cache.callback(name, doc)
            if name == "start":
                start_doc = doc
            if name in ("start", "stop"):
                # always print the start and stop documents
                print(json.dumps(doc, indent=4, cls=NPArrayEncoder))
            elif name in ("datum", "event") and counter < 3:
                # print only the first three datum/event documents
                print("\n\n===============")
                print("Document:  " + name)
                print(json.dumps(doc, indent=4, cls=NPArrayEncoder))
                counter += 1

    except MappingNotFoundError as e:
        print('Indigestion! ' + repr(e))

run = run_cache.retrieve()
pprint(ingestor.issues)

/home/dylan/data/beamlines/als832/20210511_163010_test1313.h5
{
    "uid": "c8808f9d-706f-4319-a30d-43396fd21ccf",
    "time": 1621545248.8418987,
    ":measurement:sample:experimenter:name": "Dilworth, Y., Parkinson",
    ":measurement:sample:experiment:proposal": "BLS-00319",
    ":process:acquisition:start_date": [
        1620775822.43021
    ],
    "projections": null,
    "data_groups": [
        "BLS-00319",
        "BLS-00319",
        "BLS-00319",
        "BLS-00319",
        "BLS-00319",
        "BLS-00319",
        "BLS-00319",
        "BLS-00319",
        "BLS-00319",
        "BLS-00319",
        "BLS-00319",
        "BLS-00319",
        "BLS-00319",
        "BLS-00319",
        "BLS-00319",
        "BLS-00319",
        "BLS-00319",
        "BLS-00319",
        "BLS-00319",
        "BLS-00319",
        "BLS-00319"
    ]
}
{
    "uid": "a070a26b-ea2d-4ca8-9157-68dbd23c6034",
    "time": 1621545248.843331,
    "run_start": "c8808f9d-706f-4319-a30d-43396fd21ccf",
    "exit_sta

In [32]:
run.primary.to_dask()[':exchange:data']

|       | Array          | Chunk          |
|-------|----------------|----------------|
| Bytes | 0 B            | 0 B            |
| Shape | (0, 420, 2560) | (0, 420, 2560) |
| Count | 2 Tasks        | 1 Chunks       |
| Type  | float64        | numpy.ndarray  |


In [59]:

from databroker.projector import project_xarray



xarray = project_xarray(run)
xarray

|       | Array              | Chunk           |
|-------|--------------------|-----------------|
| Bytes | 3.46 GB            | 2.30 MB         |
| Shape | (1500, 1200, 1920) | (1, 1200, 1920) |
| Count | 51000 Tasks        | 1500 Chunks     |
| Type  | uint8              | numpy.ndarray   |
