# Finding Collections & Streams

In this notebook we will focus on finding our data streams.  We start with collections, which are how streams are organized, and then use those collections to find individual streams.

If you would like to learn more about any of the topics covered here, please see the btrdb library [documentation](https://btrdb.readthedocs.io/en/develop/index.html).


**NOTE**: To get access to the Sunshine dataset to run this notebook, please register for an API key at [ni4ai.org](https://ni4ai.org/).

## Imports

In [3]:
import btrdb
import yaml
from tabulate import tabulate

## Establish Server Connection

We always start by establishing a connection to the server using the `connect` function from the `btrdb` library.  The function takes two optional arguments, the address of the BTrDB cluster and an API key that identifies the user:

```python
conn = btrdb.connect("api.ni4ai.org:4411", apikey="mykey")
```

If either argument is not supplied, the function looks for the corresponding `BTRDB_ENDPOINTS` or `BTRDB_API_KEY` environment variable.
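The fallback order can be pictured with a small stand-alone sketch. `resolve_connection_settings` is a hypothetical helper written for illustration and is not part of `btrdb`; it assumes that an explicit argument takes precedence over its environment variable, and how the library combines the two settings when only one is passed is also an assumption here. The `localhost:4410` address is made up.

```python
import os

def resolve_connection_settings(endpoint=None, apikey=None, env=None):
    """Hypothetical helper (not part of btrdb): explicit arguments win,
    otherwise fall back to the documented environment variables."""
    env = os.environ if env is None else env
    return {
        "endpoint": endpoint or env.get("BTRDB_ENDPOINTS"),
        "apikey": apikey or env.get("BTRDB_API_KEY"),
    }

# Stand-in environment so the example does not depend on your shell.
fake_env = {"BTRDB_ENDPOINTS": "api.ni4ai.org:4411", "BTRDB_API_KEY": "mykey"}

from_env = resolve_connection_settings(env=fake_env)
explicit = resolve_connection_settings("localhost:4410", env=fake_env)  # made-up address
print(from_env)
print(explicit)
```

Passing a plain dictionary as `env` keeps the sketch independent of whatever is set in your own environment.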

We've included a configuration file, `config.yaml`, in the root directory of this repository where you can enter your own API key.

In [4]:
# Make sure you add your API key to the config file to connect!
with open('../config.yaml', 'r') as f:
    config = yaml.safe_load(f)
    
conn = btrdb.connect(config['connection']['api_url'], config['connection']['api_key'])
conn.info()

{'majorVersion': 5, 'build': '5.11.124', 'proxy': {'proxyEndpoints': []}}
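The connection cell expects the loaded configuration to contain a `connection` mapping with `api_url` and `api_key` entries. As a minimal sketch, the hypothetical helper below (not part of this repository) checks for those keys on an already-loaded dictionary, so that a missing API key produces a readable error instead of a bare `KeyError` on lookup. The sample values are placeholders.

```python
def read_connection_settings(config):
    """Return (api_url, api_key) from an already-loaded config mapping.

    Hypothetical helper; the key names mirror the lookups in the cell above.
    """
    connection = config.get("connection") or {}
    missing = [key for key in ("api_url", "api_key") if not connection.get(key)]
    if missing:
        raise KeyError(f"config.yaml is missing connection keys: {missing}")
    return connection["api_url"], connection["api_key"]

# Equivalent of what yaml.safe_load would return for a file shaped like:
#   connection:
#     api_url: "api.ni4ai.org:4411"
#     api_key: "mykey"
sample_config = {"connection": {"api_url": "api.ni4ai.org:4411", "api_key": "mykey"}}

api_url, api_key = read_connection_settings(sample_config)
print(api_url)
```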

# Finding Collections

Time series data in BTrDB is organized into collections, which can be thought of as hierarchical paths such as `CALIFORNIA/SanFrancisco/91405`.  Within a collection/path you can put as many time series streams as you like.  Listing all available collections is easy and can be done with the `list_collections` method of the primary database handle.

In [5]:
import pandas as pd

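collections = sorted(conn.list_collections())

previous = []
for c in collections:
    levels = c.split('/')
    for j, l in enumerate(levels):
        # Skip levels already printed as part of the previous collection's path.
        # Comparing path prefixes avoids false matches from substring checks.
        if levels[:j+1] == previous[:j+1]:
            continue
        print(j*' ', '->', l)
    previous = levels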


 -> Health
  -> EKG
   -> patient001
 -> POW
  -> EPFL
  -> GridSweep
  -> signatures
   -> event0001
   -> event0004
   -> event0005
   -> event0007
   -> event0021
   -> event0022
   -> event0062
   -> event0064
   -> event0065
   -> event0067
   -> event0068
   -> event0207
   -> event0243
   -> event0245
   -> event0247
   -> event0249
   -> event0251
   -> event0253
   -> event0281
   -> event0283
   -> event0284
   -> event0287
   -> event0289
   -> event0306
   -> event0366
   -> event0550
   -> event0713
   -> event0715
   -> event0716
   -> event0719
   -> event0722
   -> event0723
   -> event0775
   -> event0779
   -> event0781
   -> event0782
   -> event0785
   -> event0788
   -> event0790
   -> event0793
   -> event0796
   -> event0799
   -> event0801
   -> event0803
   -> event0804
   -> event0807
   -> event0809
   -> event0811
   -> event0852
   -> event0855
   -> event0857
   -> event0859
   -> event0868
   -> event0935
   -> event0940
   -> event0942
   -> event0947
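The listing above prints each path relative to the one before it. An alternative, sketched here under the assumption that collection names are plain `/`-separated strings, is to build the hierarchy explicitly as nested dictionaries and render it afterwards. The `sample` list is a small stand-in for the result of `conn.list_collections()`, and `build_tree` and `render` are illustrative names.

```python
def build_tree(paths):
    """Nest slash-separated collection paths into a dict of dicts."""
    tree = {}
    for path in sorted(paths):
        node = tree
        for level in path.split("/"):
            node = node.setdefault(level, {})
    return tree

def render(tree, depth=0):
    """Return one indented line per path level."""
    lines = []
    for name, children in tree.items():
        lines.append(" " * depth + "-> " + name)
        lines.extend(render(children, depth + 1))
    return lines

# Sample names standing in for conn.list_collections()
sample = [
    "POW/signatures/event0004",
    "Health/EKG/patient001",
    "POW/EPFL",
    "POW/signatures/event0001",
]
tree = build_tree(sample)
print("\n".join(render(tree)))
```

Keeping the tree as a data structure also makes it easy to answer questions such as "which top-level groups exist" with `list(tree)`.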
  

## Narrowing Our Search

Alternatively, you can use a targeted search to limit the results to a particular set of collections by providing the first part of the collection path.

In [6]:
conn.list_collections("sunshine/PMU")

['sunshine/PMU1',
 'sunshine/PMU2',
 'sunshine/PMU3',
 'sunshine/PMU4',
 'sunshine/PMU5',
 'sunshine/PMU6']
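The output above suggests the argument is treated as a plain string prefix: `"sunshine/PMU"` is not itself a collection, yet it matches `sunshine/PMU1` through `sunshine/PMU6`. A rough client-side equivalent, assuming simple prefix matching, looks like this. The `sample` list is illustrative and `sunshine/other` is a made-up entry for contrast.

```python
def filter_by_prefix(collections, prefix):
    """Client-side stand-in for a prefix search over collection names."""
    return [name for name in collections if name.startswith(prefix)]

# Sample names; "sunshine/other" is a made-up entry for contrast.
sample = ["sunshine/PMU1", "sunshine/PMU2", "sunshine/other", "POW/EPFL"]

matches = filter_by_prefix(sample, "sunshine/PMU")
print(matches)
```

Letting the server do the filtering is preferable when the database holds many collections, since only the matching names are returned.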

# Finding Streams

Streams in BTrDB are one of the most important objects you will be dealing with.  Each represents a particular time series within the database and contains both metadata and the underlying time/value pairs.

We will look at stream objects in more detail in a future exercise, but for now we will concentrate on just retrieving them.

## Search By Collection

The easiest way to find the particular streams you are looking for is to use the `streams_in_collection` method.  In the simplest use case, you can provide the collection that contains your streams.  

Note that this method returns a generator and so the examples below convert it to a list to retrieve the data.

In [7]:
streams = list(conn.streams_in_collection('sunshine/PMU1'))
streams

[<Stream collection=sunshine/PMU1 name=LSTATE>,
 <Stream collection=sunshine/PMU1 name=C1ANG>,
 <Stream collection=sunshine/PMU1 name=C3MAG>,
 <Stream collection=sunshine/PMU1 name=C2MAG>,
 <Stream collection=sunshine/PMU1 name=C1MAG>,
 <Stream collection=sunshine/PMU1 name=C3ANG>,
 <Stream collection=sunshine/PMU1 name=L3ANG>,
 <Stream collection=sunshine/PMU1 name=L2ANG>,
 <Stream collection=sunshine/PMU1 name=L3MAG>,
 <Stream collection=sunshine/PMU1 name=L1ANG>,
 <Stream collection=sunshine/PMU1 name=C2ANG>,
 <Stream collection=sunshine/PMU1 name=L1MAG>,
 <Stream collection=sunshine/PMU1 name=L2MAG>]
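Converting to a list matters because a generator can only be consumed once. The sketch below uses a stand-in generator, `fake_streams_in_collection`, which is hypothetical and yields names rather than real stream objects, to show what happens on a second pass.

```python
def fake_streams_in_collection(names):
    """Stand-in generator; yields names instead of real Stream objects."""
    for name in names:
        yield name

lazy = fake_streams_in_collection(["LSTATE", "C1ANG", "C3MAG"])
first_pass = list(lazy)   # consumes the generator
second_pass = list(lazy)  # nothing left to yield

print(first_pass)
print(second_pass)
```

Storing the result as a list, as the cell above does, lets you iterate over the streams as many times as you need.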

## Convenience Function for Displaying Metadata 

Each of these streams has its own metadata such as `collection`, `name`, `uuid` and so on.  Let's create a simple convenience function to display the stream metadata using the `tabulate` library.

In [8]:
def describe_streams(streams):
    table = [["Collection", "Name", "Units", "Version", "UUID"]]
    for stream in streams:
        tags = stream.tags()
        table.append([
            stream.collection, stream.name, tags["unit"], stream.version(), stream.uuid
        ])
    return tabulate(table, headers="firstrow")

print(describe_streams(streams))

Collection     Name    Units      Version  UUID
-------------  ------  -------  ---------  ------------------------------------
sunshine/PMU1  LSTATE  mask        243640  6ffb2e7e-273c-4963-9143-b416923980b0
sunshine/PMU1  C1ANG   deg         240607  d625793b-721f-46e2-8b8c-18f882366eeb
sunshine/PMU1  C3MAG   amps        240481  fb61e4d1-3e17-48ee-bdf3-43c54b03d7c8
sunshine/PMU1  C2MAG   amps        240718  d765f128-4c00-4226-bacf-0de8ebb090b5
sunshine/PMU1  C1MAG   amps        240380  1187af71-2d54-49d4-9027-bae5d23c4bda
sunshine/PMU1  C3ANG   deg         240781  0be8a8f4-3b45-4fe3-b77c-1cbdadb92039
sunshine/PMU1  L3ANG   deg         240862  e4efd9f6-9932-49b6-9799-90815507aed0
sunshine/PMU1  L2ANG   deg         240662  886203ca-d3e8-4fca-90cc-c88dfd0283d4
sunshine/PMU1  L3MAG   volts       229263  b2936212-253e-488a-87f6-a9927042031f
sunshine/PMU1  L1ANG   deg         229265  51840b07-297a-42e5-a73a-290c0a47bddb
sunshine/PMU1  C2ANG   deg         229263  97de3802-d38d-403c-96af-d23b8
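`describe_streams` indexes `tags["unit"]` directly, which raises a `KeyError` for any stream whose tags lack a `unit` entry (assuming `tags()` returns a plain dictionary). A defensive variant might use `dict.get` with a placeholder. The sketch below uses a hypothetical `FakeStream` class so it can run without a server connection, and it returns the rows rather than formatting them.

```python
class FakeStream:
    """Minimal stand-in exposing only what the sketch needs."""

    def __init__(self, collection, name, tags):
        self.collection = collection
        self.name = name
        self._tags = tags

    def tags(self):
        return self._tags

def metadata_rows(streams, missing="n/a"):
    """Build table rows, substituting a placeholder for absent unit tags."""
    return [
        [s.collection, s.name, s.tags().get("unit", missing)]
        for s in streams
    ]

streams_demo = [
    FakeStream("sunshine/PMU1", "C1MAG", {"unit": "amps"}),
    FakeStream("sunshine/PMU1", "UNLABELED", {}),  # hypothetical stream with no unit tag
]
rows = metadata_rows(streams_demo)
print(rows)
```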

## Narrowing Our Search

We can also include extra parameters to `streams_in_collection` when searching for streams.  Streams contain dictionaries for metadata called `tags` and `annotations`.  Tags are generally reserved for internal use while annotations are for custom metadata.

Let's do our search again but narrow our results to just the streams that have a unit of "amps".  Similarly, we can provide a dictionary of custom annotation data if that would help to narrow our search.

In [9]:
streams = conn.streams_in_collection('sunshine/PMU1', tags={"unit": "amps"})
print(describe_streams(streams))

Collection     Name    Units      Version  UUID
-------------  ------  -------  ---------  ------------------------------------
sunshine/PMU1  C3MAG   amps        240481  fb61e4d1-3e17-48ee-bdf3-43c54b03d7c8
sunshine/PMU1  C2MAG   amps        240718  d765f128-4c00-4226-bacf-0de8ebb090b5
sunshine/PMU1  C1MAG   amps        240380  1187af71-2d54-49d4-9027-bae5d23c4bda
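Conceptually, a tag filter keeps only the streams whose metadata contains every requested key/value pair. The sketch below imitates that on the client side using names and units copied from the PMU1 table shown earlier. It assumes exact, case-sensitive equality, which is an assumption about the matching semantics and not a statement of how the server implements them. `matches` is an illustrative helper.

```python
def matches(metadata, wanted):
    """True when every key/value pair in `wanted` appears in `metadata`."""
    return all(metadata.get(key) == value for key, value in wanted.items())

# (name, tags) pairs copied from the PMU1 table shown earlier
sample_streams = [
    ("LSTATE", {"unit": "mask"}),
    ("C1ANG", {"unit": "deg"}),
    ("C3MAG", {"unit": "amps"}),
    ("C2MAG", {"unit": "amps"}),
    ("C1MAG", {"unit": "amps"}),
    ("L3MAG", {"unit": "volts"}),
]

amps = [name for name, tags in sample_streams if matches(tags, {"unit": "amps"})]
print(amps)
```

Under this assumption a filter of `{"unit": "Amps"}` would match nothing, so it is worth checking the exact tag values in the metadata table before filtering.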
