# Finding Collections & Streams

In this notebook we focus on finding data streams. We start with collections, which are how streams are organized, and then use those collections to find individual streams.

If you would like to learn more about any of the topics covered here, please see the btrdb library [documentation](https://btrdb.readthedocs.io/en/develop/index.html).

## Imports

In [1]:
import btrdb
from tabulate import tabulate

## Establish Server Connection

We always start by establishing a connection to the server with the `connect` function from the `btrdb` library. The function takes two optional arguments, the address of the BTrDB cluster and an API key that identifies the user:

```python
conn = btrdb.connect("btrdb.endpoint:4411", apikey="mykey")
```

Both arguments are optional. If one is not supplied, the function looks for the corresponding `BTRDB_ENDPOINTS` or `BTRDB_API_KEY` environment variable.
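To make the fallback order concrete, here is a minimal sketch of the lookup described above. It is not the library's actual implementation: `resolve_connection_args` and `fake_env` are hypothetical names used only for illustration, and the sketch assumes that an explicitly passed argument always takes precedence over the environment.

```python
import os


def resolve_connection_args(endpoint=None, apikey=None, env=None):
    """Hypothetical helper: fill in missing arguments from environment variables."""
    env = os.environ if env is None else env
    # Assumption: explicit arguments win; otherwise use the documented variables.
    if endpoint is None:
        endpoint = env.get("BTRDB_ENDPOINTS")
    if apikey is None:
        apikey = env.get("BTRDB_API_KEY")
    return endpoint, apikey


# A plain dict stands in for the real environment in this demonstration.
fake_env = {"BTRDB_ENDPOINTS": "btrdb.endpoint:4411", "BTRDB_API_KEY": "mykey"}
from_env = resolve_connection_args(env=fake_env)
explicit = resolve_connection_args(apikey="otherkey", env=fake_env)
```

Here `from_env` takes both values from the environment, while `explicit` keeps the endpoint from the environment and uses the key that was passed in.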

In [2]:
# No endpoint is passed here, so it is read from the BTRDB_ENDPOINTS
# environment variable; the API key is supplied explicitly
conn = btrdb.connect(apikey="AE0C013A87C48930E37ED8D8")
conn.info()

{'majorVersion': 5, 'build': '5.1.10', 'proxy': {'proxyEndpoints': []}}

# Finding Collections

Time series data in BTrDB is organized into collections, which can be thought of as hierarchical paths such as `CALIFORNIA/SanFrancisco/91405`. Within a collection you can store as many time series streams as you like. Listing all available collections is easy and can be done with the `list_collections` method of the primary database handle.

In [3]:
conn.list_collections()

['relay/Suffolk_11-1L8',
 'relay/Mtsto500_11-2Pmu',
 'relay/Lexington_11-1T1',
 'relay/Chesa115_11-2Pmu',
 'relay/Cunningha_11-1L2',
 'relay/Lexington_11-1L2',
 'relay/Carson_11-1Pmu',
 'relay/Nanna500_11-2Pmu',
 'relay/Morrisvi_11-1Pmu',
 'relay/Chancello_11-1T1',
 'relay/Suffo230_11-2Pmu',
 'relay/Chancello_11-1L1',
 'relay/Morrisvil_11-1T2',
 'relay/Chancello_11-1T4',
 'relay/Mount Sto_11-1L4',
 'relay/Valley_11-1L3',
 'relay/Mtsto500_11-1Pmu',
 'relay/Carson_11-1L5',
 'relay/Suffo230_11-1Pmu',
 'relay/Suffolk_11-1L7',
 'relay/Carson_11-1L1',
 'relay/Chesa115_11-1Pmu',
 'relay/Lexin500_11-1Pmu',
 'relay/Nanna500_11-1Pmu',
 'relay/Possum Po_11-1L3',
 'relay/Clover_11-1L1',
 'relay/Morrisvil_11-1L3',
 'relay/Chancello_11-1L3',
 'relay/Morrisvil_11-1L1',
 'relay/Cunningha_11-1L1',
 'relay/Loudoun_11-1L1',
 'relay/Suffolk_11-1L9',
 'relay/Mount Sto_11-1L3',
 'relay/Valle500_11-1Pmu',
 'relay/Chancello_11-1L2',
 'relay/Morrisvi_11-2Pmu',
 'relay/Carson_11-1L2',
 'relay/Dooms500_11-1Pmu',

## Narrowing Our Search

Alternatively, you can use a targeted search to limit the results to a particular set of collections by providing the first part of the collection path.

In [4]:
conn.list_collections("relay/Mount")

['relay/Mount Sto_11-1L3', 'relay/Mount Sto_11-1L4']
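If you already hold the full list of collection paths, a similar narrowing can be done locally. The sketch below assumes plain string-prefix matching, which may not be exactly how the server matches; `filter_by_prefix` is a hypothetical helper and `sample` is a small subset of the paths listed earlier.

```python
def filter_by_prefix(collections, prefix):
    """Return the collection paths that begin with `prefix`, in their original order."""
    return [name for name in collections if name.startswith(prefix)]


# A few of the collection paths listed earlier in this notebook.
sample = [
    "relay/Suffolk_11-1L8",
    "relay/Mount Sto_11-1L4",
    "relay/Valley_11-1L3",
    "relay/Mount Sto_11-1L3",
]
mount = filter_by_prefix(sample, "relay/Mount")
```

Filtering locally avoids a second round trip to the server, at the cost of having fetched every collection name first.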

# Finding Streams

Streams in BTrDB are one of the most important objects you will be dealing with.  Each represents a particular time series within the database and contains both metadata as well as the underlying time/value pairs.

We will look at stream objects in more detail in a future exercise. For now we concentrate on retrieving them.

## Search By Collection

The easiest way to find the particular streams you are looking for is to use the `streams_in_collection` method.  In the simplest use case, you can provide the collection that contains your streams.  

Note that this method returns a generator, so the example below converts it to a list to retrieve the data.

In [5]:
streams = list(conn.streams_in_collection('relay/Possum Po_11-1L1'))
streams

[<btrdb.stream.Stream at 0x7f4eb3777208>,
 <btrdb.stream.Stream at 0x7f4eb3777400>,
 <btrdb.stream.Stream at 0x7f4eb37775f8>,
 <btrdb.stream.Stream at 0x7f4eb37777f0>,
 <btrdb.stream.Stream at 0x7f4eb37779e8>,
 <btrdb.stream.Stream at 0x7f4eb3777be0>,
 <btrdb.stream.Stream at 0x7f4eb3777dd8>,
 <btrdb.stream.Stream at 0x7f4eb3777fd0>,
 <btrdb.stream.Stream at 0x7f4eb12e4208>,
 <btrdb.stream.Stream at 0x7f4eb12e4400>,
 <btrdb.stream.Stream at 0x7f4eb12e45f8>,
 <btrdb.stream.Stream at 0x7f4eb12e47f0>,
 <btrdb.stream.Stream at 0x7f4eb12e49e8>,
 <btrdb.stream.Stream at 0x7f4eb12e4be0>,
 <btrdb.stream.Stream at 0x7f4eb12e4dd8>,
 <btrdb.stream.Stream at 0x7f4eb12e4fd0>,
 <btrdb.stream.Stream at 0x7f4eb12ed208>,
 <btrdb.stream.Stream at 0x7f4eb12ed400>,
 <btrdb.stream.Stream at 0x7f4eb12ed5f8>]
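The reason for calling `list(...)` is that a Python generator can be consumed only once. The sketch below shows this with a stand-in generator; `fake_stream_source` is hypothetical and yields placeholder names rather than real stream objects.

```python
def fake_stream_source():
    """Stand-in generator that yields placeholder names instead of Stream objects."""
    for name in ["LINE560VA-ANG", "LINE560V1-MAG", "LINE560IA-MAG"]:
        yield name


gen = fake_stream_source()
first_pass = list(gen)   # consumes the generator
second_pass = list(gen)  # the generator is exhausted, so this is empty

# Materializing once into a list lets the results be reused freely.
kept = list(fake_stream_source())
```

Keeping the results in a list, as the cell above does with `streams`, means later cells can iterate over them again without repeating the query.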

## Convenience Function for Displaying Metadata 

Each of these streams has its own metadata such as `collection`, `name`, `uuid` and so on.  Let's create a simple convenience function to display the stream metadata using the `tabulate` library.

In [6]:
def describe_streams(streams):
    """Return a text table summarizing the metadata of each stream."""
    # The first row holds the column headers that tabulate will use
    table = [["Collection", "Name", "Units", "Version", "UUID"]]
    for stream in streams:
        tags = stream.tags()
        table.append([
            stream.collection, stream.name, tags["unit"], stream.version(), stream.uuid
        ])
    return tabulate(table, headers="firstrow")

print(describe_streams(streams))

Collection              Name             Units          Version  UUID
----------------------  ---------------  -----------  ---------  ------------------------------------
relay/Possum Po_11-1L1  LINE560VA-ANG    Degrees            942  b1f13dda-0aac-45d1-9b19-9529bf805d6b
relay/Possum Po_11-1L1  LINE560V1-ANG    Degrees            942  03676c19-ffbd-44f9-af8b-90f4b253bb19
relay/Possum Po_11-1L1  LINE560I1-ANG    Degrees            942  5ef8b03d-dc5e-47f5-866a-1ccaba234ca3
relay/Possum Po_11-1L1  LINE560IC-ANG    Degrees            942  c08c8a79-e0d4-43f0-9ef6-fb6a59e385c3
relay/Possum Po_11-1L1  LINE560V1-MAG    Volts              942  1027914c-b84a-43e9-9f6e-f04f6e006dbc
relay/Possum Po_11-1L1  LINE560VA-MAG    Volts              942  171facf4-fdb3-4589-9404-37fe705f46d3
relay/Possum Po_11-1L1  LINE560VB-MAG    Volts              942  44c8db28-18b8-4f12-9a22-7b062d8e646c
relay/Possum Po_11-1L1  LINE560IA-MAG    Amps              1874  83a53a42-2f8d-4e19-bf4d-b787bb710ee9
relay/Possum

## Narrowing Our Search

We can also pass extra parameters to `streams_in_collection` when searching for streams. Each stream carries two metadata dictionaries, `tags` and `annotations`. Tags are generally reserved for internal use, while annotations hold custom metadata.

Let's do our search again but narrow our results to just include streams that have a unit of "Volts".  Similarly we can provide a dictionary for the custom annotation data if that would help to narrow our search.

In [7]:
streams = conn.streams_in_collection('relay/Possum Po_11-1L1', tags={"unit": "Volts"})
print(describe_streams(streams))

Collection              Name           Units      Version  UUID
----------------------  -------------  -------  ---------  ------------------------------------
relay/Possum Po_11-1L1  LINE560V1-MAG  Volts          942  1027914c-b84a-43e9-9f6e-f04f6e006dbc
relay/Possum Po_11-1L1  LINE560VA-MAG  Volts          942  171facf4-fdb3-4589-9404-37fe705f46d3
relay/Possum Po_11-1L1  LINE560VB-MAG  Volts          942  44c8db28-18b8-4f12-9a22-7b062d8e646c
relay/Possum Po_11-1L1  LINE560VC-MAG  Volts          942  af8cf764-6f04-4f9b-b25b-467778bc5320
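When you want an overview of every unit rather than a single one, the streams can also be grouped locally after retrieval. The following is a minimal sketch: `FakeStream` is a hypothetical stand-in that mimics only the `name` attribute and `tags()` method used earlier in this notebook, and `group_by_unit` is an illustrative helper, not part of the btrdb library.

```python
from collections import defaultdict


class FakeStream:
    """Hypothetical stand-in exposing only a `name` attribute and a `tags()` method."""

    def __init__(self, name, unit):
        self.name = name
        self._tags = {"unit": unit}

    def tags(self):
        return self._tags


def group_by_unit(streams):
    """Map each unit string to the names of the streams that report it."""
    groups = defaultdict(list)
    for stream in streams:
        # A missing "unit" tag is grouped under None instead of raising KeyError.
        groups[stream.tags().get("unit")].append(stream.name)
    return dict(groups)


# Names and units taken from the table shown earlier.
demo = [
    FakeStream("LINE560VA-ANG", "Degrees"),
    FakeStream("LINE560V1-MAG", "Volts"),
    FakeStream("LINE560VA-MAG", "Volts"),
    FakeStream("LINE560IA-MAG", "Amps"),
]
by_unit = group_by_unit(demo)
```

With real stream objects, the same function could be called on the list returned by `list(conn.streams_in_collection(...))`, assuming each stream has a `unit` tag.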
