# Intake for Bluesky

Intake has a concept of a `Catalog` whose entries may be other Catalogs or a `Datasource` that can be `read()` into a PyData/SciPy data structure, in whole or in chunks, or into its lazy dask-based counterpart.
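
Here is a minimal pure-Python sketch of the nesting described above. `ToyCatalog` and `ToySource` are hypothetical stand-ins, not intake's classes. Lists of dicts stand in for the DataFrames that real sources return, and the lazy dask path is omitted.

```python
from typing import Dict, Iterator, List, Union

Row = Dict[str, float]


class ToySource:
    """Hypothetical leaf entry: something that can be read into data."""

    def __init__(self, rows: List[Row]):
        self._rows = rows

    def read(self) -> List[Row]:
        # Read everything at once (a DataFrame in the real thing).
        return list(self._rows)

    def read_chunked(self, chunksize: int = 2) -> Iterator[List[Row]]:
        # Yield the same data in pieces of at most `chunksize` rows.
        for start in range(0, len(self._rows), chunksize):
            yield self._rows[start:start + chunksize]


class ToyCatalog:
    """Hypothetical container whose entries are catalogs or sources."""

    def __init__(self, entries: Dict[str, Union["ToyCatalog", ToySource]]):
        self._entries = entries

    def __iter__(self) -> Iterator[str]:
        # Iterating a catalog yields its entry names.
        return iter(self._entries)

    def __getitem__(self, name: str) -> Union["ToyCatalog", ToySource]:
        return self._entries[name]


# A two-level tree: catalog -> catalog -> source.
top = ToyCatalog({
    "run1": ToyCatalog({
        "primary": ToySource([{"det": 0.6}, {"det": 0.7}, {"det": 0.8}]),
    }),
})
```

Navigation is by name at every level, for example `top["run1"]["primary"].read()`. Only the leaves know how to produce data.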

Intake includes:
* authentication
* caching
* an intake server and client
* solutions for packaging so that Catalogs can be installable and accessible via import hooks (`from intake import csx_catalog`)

The demo below employs intake plugins for access and a simple callback using pymongo directly for insert. It does not import `databroker`.

## Acquire some sample data.

In [1]:
from bluesky import RunEngine
from intake_bluesky import MongoInsertCallback
from bluesky.plans import scan
from bluesky.preprocessors import SupplementalData
from ophyd.sim import det, motor

RE = RunEngine({})
sd = SupplementalData(baseline=[motor])
RE.preprocessors.append(sd)

# This is just a simple callback that does MongoDB insert_one. No databroker.
uri = 'mongodb://localhost:27017/test1'
insert = MongoInsertCallback(uri)
RE.subscribe(insert)


uid, = RE(scan([det], motor, -1, 1, 20))
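
RunEngine subscribers are called as `callback(name, doc)`. The `MongoInsertCallback` above is described as a simple `insert_one` callback, and the sketch below shows what one could look like. The class names and the document-to-collection mapping are hypothetical, not intake_bluesky's code. An in-memory fake replaces MongoDB so the sketch runs without a server.

```python
from collections import defaultdict
from typing import Any, Dict, List

# Assumed mapping from document name to collection name (hypothetical layout).
COLLECTIONS = {
    "start": "run_start",
    "descriptor": "event_descriptor",
    "event": "event",
    "stop": "run_stop",
}


class InsertCallback:
    """Route each (name, doc) pair to a single insert_one call."""

    def __init__(self, database):
        # `database` maps collection names to objects with insert_one(doc),
        # e.g. a pymongo Database; any stand-in with that shape works.
        self._db = database

    def __call__(self, name: str, doc: Dict[str, Any]) -> None:
        collection = COLLECTIONS.get(name)
        if collection is None:
            return  # ignore document types this sketch does not handle
        # Insert a copy: pymongo adds an '_id' key to the dict it inserts,
        # and the RunEngine's own document should stay untouched.
        self._db[collection].insert_one(dict(doc))


class FakeCollection:
    """In-memory stand-in for a pymongo Collection."""

    def __init__(self):
        self.docs: List[Dict[str, Any]] = []

    def insert_one(self, doc):
        self.docs.append(doc)


fake_db = defaultdict(FakeCollection)
cb = InsertCallback(fake_db)
cb("start", {"uid": "abc", "time": 0.0})
cb("event", {"seq_num": 1, "data": {"det": 0.6}})
cb("stop", {"run_start": "abc", "exit_status": "success"})
cb("resource", {"uid": "r1"})  # not in COLLECTIONS, so it is skipped
```

With a live server, pymongo's `MongoClient(uri).get_database()` returns the database named in the URI (`test1` here), and that object could be passed in place of `fake_db`.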

## Access data using intake.

Instantiate an intake Catalog aimed at our MongoDB. (This boilerplate code could be made more magical via config files and import hooks; this is the explicit way.)

In [2]:
from intake_bluesky import MongoMetadataStoreCatalog

mds = MongoMetadataStoreCatalog(uri)

In [3]:
mds

<Intake catalog: mongodb://localhost:27017/test1>

Access a Run by its `uid`. A Run is also a Catalog, with a special `__repr__` that summarizes its time range and streams.

In [4]:
run = mds[uid]
run

<Intake catalog: Run 67c7c04a...>
  2018-11-21 16:03:08.600 -- 2018-11-21 16:03:08.680
  Streams:
    * baseline
    * primary
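
Such a summary can be built from a run's start and stop times plus its stream names. This hypothetical helper only imitates the layout shown above and is not intake_bluesky's implementation. It formats times in UTC, whereas the real repr may use local time.

```python
from datetime import datetime, timezone
from typing import Iterable


def run_summary(uid: str, start_time: float, stop_time: float,
                streams: Iterable[str]) -> str:
    """Build a multi-line summary in the style of the Run repr above."""

    def fmt(ts: float) -> str:
        # Millisecond precision: drop the last three digits of %f.
        return datetime.fromtimestamp(ts, tz=timezone.utc).strftime(
            "%Y-%m-%d %H:%M:%S.%f")[:-3]

    lines = [
        f"<Intake catalog: Run {uid[:8]}...>",
        f"  {fmt(start_time)} -- {fmt(stop_time)}",
        "  Streams:",
    ]
    lines.extend(f"    * {name}" for name in sorted(streams))
    return "\n".join(lines)


print(run_summary("67c7c04a-0000", 0.6, 0.68, ["primary", "baseline"]))
```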

Read the data from all the streams in one structure, time-sorted. This is a convenient starting point for interpolation workflows.

In [5]:
run.read()

TypeError: list indices must be integers or slices, not str
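
One way to picture "time-sorted across streams" is to merge the streams and then interpolate a slowly sampled stream (such as `baseline`) onto the timestamps of another. This assumes each stream is a list of row dicts already sorted by a `time` key. `merge_streams` and `interp` are hypothetical helpers, not part of intake or intake_bluesky, and plain linear interpolation is just one illustrative choice.

```python
import heapq
from bisect import bisect_right
from typing import Any, Dict, List

Row = Dict[str, Any]


def merge_streams(streams: Dict[str, List[Row]]) -> List[Row]:
    """Merge per-stream rows (each sorted by 'time') into one time-sorted list."""
    # Tag every row with its stream name so the merged list stays traceable.
    tagged = ([dict(row, stream=name) for row in rows]
              for name, rows in streams.items())
    return list(heapq.merge(*tagged, key=lambda row: row["time"]))


def interp(t: float, times: List[float], values: List[float]) -> float:
    """Linearly interpolate at time t, clamping outside the sampled range."""
    i = bisect_right(times, t)
    if i == 0:
        return values[0]
    if i == len(times):
        return values[-1]
    # Here times[i - 1] <= t < times[i], so the denominator is positive.
    t0, t1 = times[i - 1], times[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


# Toy data: two baseline readings bracket the primary readings in time.
baseline = [{"time": 0.0, "motor": 0.0}, {"time": 10.0, "motor": 1.0}]
primary = [{"time": 2.0, "det": 0.6}, {"time": 5.0, "det": 0.7}]

merged = merge_streams({"baseline": baseline, "primary": primary})
bt = [row["time"] for row in baseline]
bm = [row["motor"] for row in baseline]
motor_at_primary = [interp(row["time"], bt, bm) for row in primary]
```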

In [7]:
%debug

> /Users/dallan/Repos/bnl/intake-bluesky/intake_bluesky.py(278)read_slice()
    276         data_keys = list(self._event_descriptor_docs[0]['data_keys'])
    277         if include:
--> 278             keys = list(set(keys) & set(include))
    279         elif exclude:
    280             keys = list(set(keys) - set(exclude

ipdb>  set(keys)
*** NameError: name 'keys' is not defined

ipdb>  u
> /Users/dallan/Repos/bnl/intake-bluesky/intake_bluesky.py(309)read()
    307             blank.
    308         """
--> 309         return self.read_slice(slice(None), include=include, exclude=exclude)
    310 
    311     def read_chunked(self, chunks=None, *, include=None, exclude=None):

ipdb>  d
> /Users/dallan/Repos/bnl/intake-bluesky/intake_bluesky.py(278)read_slice()
    276         data_keys = list(self._event_descriptor_docs[0]['data_keys'])
    277         if include:
--> 278             keys = list(set(keys) & set(include))
    279         elif exclude:
    280             keys = list(set(keys) - set(exclude

ipdb>  exit


In [6]:
run.read(include=['motor']).head()

UnboundLocalError: local variable 'keys' referenced before assignment

The `mds` catalog has a `search()` method. It returns... a Catalog! This Catalog holds a subset of the entries in `mds`, and it has its own `search()` method, which can refine the results further into yet another Catalog, and so on.

In [8]:
results = mds.search({'plan_name': 'count'})
len(list(results))

76

In [9]:
import time

# Keep only results older than 24 hours (60 * 60 * 24 seconds).
refined_results = results.search({'time': {'$lt': time.time() - 60 * 60 * 24}})
len(list(refined_results))

27
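
Chaining works because `search()` returns an object of the same type, built from the matching subset. In this hypothetical `SearchableCatalog`, queries are evaluated in Python against a small subset of MongoDB syntax: exact matches plus `$lt`, `$lte`, `$gt`, `$gte`. A MongoDB-backed catalog would presumably hand the query to the server instead.

```python
import operator
from typing import Any, Dict, Iterator

# The comparison operators this sketch understands.
OPS = {"$lt": operator.lt, "$lte": operator.le,
       "$gt": operator.gt, "$gte": operator.ge}


def _matches(meta: Dict[str, Any], query: Dict[str, Any]) -> bool:
    """Return True if one metadata dict satisfies every clause of the query."""
    for key, cond in query.items():
        if key not in meta:
            return False
        if isinstance(cond, dict):
            for op, arg in cond.items():
                if op not in OPS:
                    raise ValueError(f"unsupported operator {op!r}")
                if not OPS[op](meta[key], arg):
                    return False
        elif meta[key] != cond:
            return False
    return True


class SearchableCatalog:
    """Hypothetical catalog of uid -> run metadata with chainable search."""

    def __init__(self, entries: Dict[str, Dict[str, Any]]):
        self._entries = entries

    def __iter__(self) -> Iterator[str]:
        return iter(self._entries)

    def search(self, query: Dict[str, Any]) -> "SearchableCatalog":
        # A new catalog of the same type, so it can be searched again.
        return SearchableCatalog({uid: meta for uid, meta in self._entries.items()
                                  if _matches(meta, query)})


now = 1_000_000.0  # fixed "current time" so the example is deterministic
cat = SearchableCatalog({
    "a": {"plan_name": "count", "time": now - 2 * 86400},
    "b": {"plan_name": "count", "time": now},
    "c": {"plan_name": "scan", "time": now - 3 * 86400},
})
results = cat.search({"plan_name": "count"})
refined = results.search({"time": {"$lt": now - 60 * 60 * 24}})
```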

Whitelist fields with `include` or blacklist them with `exclude`. (You can't do both at once; passing both raises a `ValueError`.)

In [10]:
run.read(include=['motor']).head()

| time | motor |
| --- | --- |
| 1542750000.0 | 0.0 |
| 1542750000.0 | 1.0 |
| 1542750000.0 | -1.0 |
| 1542750000.0 | -0.894737 |
| 1542750000.0 | -0.789474 |


In [11]:
run.read(exclude=['motor_setpoint']).head()

| time | det | motor |
| --- | --- | --- |
| 1542750000.0 | NaN | 0.0 |
| 1542750000.0 | NaN | 1.0 |
| 1542750000.0 | 0.606531 | -1.0 |
| 1542750000.0 | 0.670134 | -0.894737 |
| 1542750000.0 | 0.732249 | -0.789474 |
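
In the debugger session above, `read_slice` assigns `data_keys` but then reads `keys`, which has not been assigned yet. That matches the `UnboundLocalError` reported for `include=['motor']`. A corrected version of the selection logic might look like the hypothetical `select_keys` below, which also enforces include/exclude exclusivity with a `ValueError`.

```python
from typing import Iterable, List, Optional


def select_keys(data_keys: Iterable[str],
                include: Optional[Iterable[str]] = None,
                exclude: Optional[Iterable[str]] = None) -> List[str]:
    """Whitelist (include) or blacklist (exclude) field names, preserving order."""
    if include is not None and exclude is not None:
        raise ValueError("Pass include or exclude, not both.")
    keys = list(data_keys)  # define `keys` before either branch uses it
    if include is not None:
        wanted = set(include)
        keys = [k for k in keys if k in wanted]
    elif exclude is not None:
        unwanted = set(exclude)
        keys = [k for k in keys if k not in unwanted]
    return keys


fields = ["det", "motor", "motor_setpoint"]
print(select_keys(fields, include=["motor"]))
print(select_keys(fields, exclude=["motor_setpoint"]))
```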


Remember that `run` is a `Catalog`. Its entries are the Streams. We can read them individually.

In [12]:
list(run)

['baseline', 'primary']

In [13]:
run['primary']

<Intake catalog: Stream 'primary' from Run 179bf0ba...>

As with pandas DataFrame columns, dot access also works, unless the stream name collides with an existing attribute. Tab completion works too.

In [14]:
run.primary

<Intake catalog: Stream 'primary' from Run 179bf0ba...>
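
Dot access like this is commonly implemented with `__getattr__`, which Python calls only when normal attribute lookup fails. That is why a stream named like an existing attribute or method is shadowed. The sketch below is hypothetical, not intake's implementation. It also extends `__dir__` so that `dir()`-based tab completion can list the entries.

```python
from typing import Any, Dict, List


class AttrCatalog:
    """Hypothetical catalog exposing entries as attributes as well as items."""

    def __init__(self, entries: Dict[str, Any]):
        self._entries = entries

    def __getitem__(self, name: str) -> Any:
        return self._entries[name]

    def __getattr__(self, name: str) -> Any:
        # Only reached when normal lookup fails, so real attributes and
        # methods (e.g. `read`) win over entries with the same name.
        entries = self.__dict__.get("_entries", {})
        try:
            return entries[name]
        except KeyError:
            raise AttributeError(name) from None

    def __dir__(self) -> List[str]:
        # Advertise entry names alongside the normal attributes.
        return sorted(set(super().__dir__()) | set(self._entries))

    def read(self) -> str:
        return "the catalog's own read method"


cat = AttrCatalog({"primary": "primary stream", "read": "stream named read"})
print(cat.primary)   # entry, via __getattr__
print(cat.read())    # method wins over the entry named "read"
print(cat["read"])   # the entry is still reachable by item access
```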

We can read the data all at once:

In [15]:
run.primary.read().head()

|   | time | det | motor | motor_setpoint |
| --- | --- | --- | --- | --- |
| 1 | 1542750000.0 | 0.606531 | -1.0 | -1.0 |
| 2 | 1542750000.0 | 0.670134 | -0.894737 | -0.894737 |
| 3 | 1542750000.0 | 0.732249 | -0.789474 | -0.789474 |
| 4 | 1542750000.0 | 0.791305 | -0.684211 | -0.684211 |
| 5 | 1542750000.0 | 0.8457 | -0.578947 | -0.578947 |


Or access a slice (along the Event axis, potentially along other axes in the future):

In [16]:
run.primary.read_slice(slice(7, 13))

|   | time | det | motor | motor_setpoint |
| --- | --- | --- | --- | --- |
| 7 | 1542750000.0 | 0.934385 | -0.368421 | -0.368421 |
| 8 | 1542750000.0 | 0.965967 | -0.263158 | -0.263158 |
| 9 | 1542750000.0 | 0.987612 | -0.157895 | -0.157895 |
| 10 | 1542750000.0 | 0.998616 | -0.052632 | -0.052632 |
| 11 | 1542750000.0 | 0.998616 | 0.052632 | 0.052632 |
| 12 | 1542750000.0 | 0.987612 | 0.157895 | 0.157895 |
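
The output above is consistent with slice bounds applied half-open to the 1-based row labels, which look like event sequence numbers: `slice(7, 13)` yields rows 7 through 12. The debugger listing also shows `read()` delegating to `read_slice(slice(None), ...)`, which is an unbounded slice. The sketch below follows that interpretation. It assumes rows carry a `seq_num` key, and `slice_by_seq_num` is a hypothetical helper.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def slice_by_seq_num(rows: List[Row], sl: slice) -> List[Row]:
    """Select rows whose seq_num lies in [sl.start, sl.stop)."""
    if sl.step not in (None, 1):
        raise ValueError("this sketch only supports a step of 1")
    start = 1 if sl.start is None else sl.start
    return [row for row in rows
            if row["seq_num"] >= start
            and (sl.stop is None or row["seq_num"] < sl.stop)]


# Twenty events numbered 1..20, like the 20-point scan above.
rows = [{"seq_num": n, "det": float(n)} for n in range(1, 21)]
print([row["seq_num"] for row in slice_by_seq_num(rows, slice(7, 13))])
```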


We can also read the data as a generator of chunks. By default the `Catalog` chooses the chunk size, but you can pass one explicitly, as in `read_chunked(3)`.

In [17]:
for chunk in run.primary.read_chunked():
    print(chunk)

            time       det     motor  motor_setpoint
1   1.542750e+09  0.606531 -1.000000       -1.000000
2   1.542750e+09  0.670134 -0.894737       -0.894737
3   1.542750e+09  0.732249 -0.789474       -0.789474
4   1.542750e+09  0.791305 -0.684211       -0.684211
5   1.542750e+09  0.845700 -0.578947       -0.578947
6   1.542750e+09  0.893876 -0.473684       -0.473684
7   1.542750e+09  0.934385 -0.368421       -0.368421
8   1.542750e+09  0.965967 -0.263158       -0.263158
9   1.542750e+09  0.987612 -0.157895       -0.157895
10  1.542750e+09  0.998616 -0.052632       -0.052632
            time       det     motor  motor_setpoint
11  1.542750e+09  0.998616  0.052632        0.052632
12  1.542750e+09  0.987612  0.157895        0.157895
13  1.542750e+09  0.965967  0.263158        0.263158
14  1.542750e+09  0.934385  0.368421        0.368421
15  1.542750e+09  0.893876  0.473684        0.473684
16  1.542750e+09  0.845700  0.578947        0.578947
17  1.542750e+09  0.791305  0.684211        0.

In [18]:
for chunk in run.primary.read_chunked(3):
    print(chunk)

           time       det     motor  motor_setpoint
1  1.542750e+09  0.606531 -1.000000       -1.000000
2  1.542750e+09  0.670134 -0.894737       -0.894737
3  1.542750e+09  0.732249 -0.789474       -0.789474
           time       det     motor  motor_setpoint
4  1.542750e+09  0.791305 -0.684211       -0.684211
5  1.542750e+09  0.845700 -0.578947       -0.578947
6  1.542750e+09  0.893876 -0.473684       -0.473684
           time       det     motor  motor_setpoint
7  1.542750e+09  0.934385 -0.368421       -0.368421
8  1.542750e+09  0.965967 -0.263158       -0.263158
9  1.542750e+09  0.987612 -0.157895       -0.157895
            time       det     motor  motor_setpoint
10  1.542750e+09  0.998616 -0.052632       -0.052632
11  1.542750e+09  0.998616  0.052632        0.052632
12  1.542750e+09  0.987612  0.157895        0.157895
            time       det     motor  motor_setpoint
13  1.542750e+09  0.965967  0.263158        0.263158
14  1.542750e+09  0.934385  0.368421        0.368421
15  1
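
Chunked reading amounts to a generator that repeatedly takes the next *n* rows. The first chunk printed above has 10 rows, so this sketch assumes a default of 10. The parameter name `chunks` follows the `read_chunked` signature in the debugger listing, while `iter_chunks` itself is hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")

DEFAULT_CHUNKSIZE = 10  # assumption: matches the 10-row first chunk above


def iter_chunks(rows: Iterable[T], chunks: Optional[int] = None) -> Iterator[List[T]]:
    """Yield successive lists of at most `chunks` rows (default 10)."""
    size = DEFAULT_CHUNKSIZE if chunks is None else chunks
    if size < 1:
        raise ValueError("chunk size must be a positive integer")
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))  # take up to `size` more rows
        if not chunk:
            return
        yield chunk


rows = list(range(1, 21))  # stand-ins for the 20 events of the scan
for chunk in iter_chunks(rows, 3):
    print(chunk)
```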

The stream is *also* a Catalog. Its entries are fields, a.k.a. data keys, a.k.a. columns.

In [19]:
list(run.primary)

['det', 'motor', 'motor_setpoint']

In [20]:
run.primary.det

<Intake datasource: Field 'det' of Stream 'primary' from Run 179bf0ba...>

The same methods (`read()`, `read_slice()`, `read_chunked()`) apply. Because a single field's data is homogeneous, they can typically return simpler data structures, such as the pandas Series below rather than a DataFrame.

In [21]:
run.primary.det.read().head()

1    0.606531
2    0.670134
3    0.732249
4    0.791305
5    0.845700
dtype: float64
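
Conceptually, a stream's fields are the columns of its table. The hypothetical helper below splits row records into one column per field, keyed by sequence number. It shows why each field holds values of a single type and maps naturally onto a 1-D structure such as a Series.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def to_columns(rows: List[Row], fields: List[str]) -> Dict[str, Dict[int, Any]]:
    """Split row records into {field: {seq_num: value}} columns."""
    return {field: {row["seq_num"]: row[field] for row in rows}
            for field in fields}


rows = [
    {"seq_num": 1, "det": 0.606531, "motor": -1.0},
    {"seq_num": 2, "det": 0.670134, "motor": -0.894737},
]
cols = to_columns(rows, ["det", "motor"])
print(cols["det"])
```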