# Working with Streams

In this notebook we focus on retrieving data from individual time series streams in BTrDB. Along the way we will look at the `Point` classes that represent data within a `Stream` instance. We start with basic metadata and then move on to the stored values, retrieving both raw points and aggregated windows of data.

If you would like to learn more about any of the topics covered here, please see the btrdb library [documentation](https://btrdb.readthedocs.io/en/latest/).

**NOTE**: To get access to the Sunshine dataset to run this notebook, please register for an API key at [ni4ai.org](https://ni4ai.org/).

## Imports

In [None]:
import btrdb
from tabulate import tabulate
from pprint import pprint


## Connect To Server

In [None]:
conn = btrdb.connect()
conn.info()


# Stream Basics

Streams contain actual points of data and you can query the time series values in a number of ways.  But first, let's look at the Stream metadata for a better understanding of what is available.

## Metadata Properties

As discussed in the previous notebook, a stream has its own metadata properties such as its `collection` or `uuid`.  Let's start by defining each read-only property.

**collection:** The hierarchical path where a stream lives. This is used purely for organizational purposes and has no impact on the underlying data.

**name:** A friendly name for the stream.

**uuid:** A unique identifier for a given stream.  This UUID cannot be changed and can be used to fetch a stream directly using the connection object.
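
Because the UUID never changes, it is a convenient handle to store and reuse later. The following is a minimal sketch, assuming the connection object exposes `stream_from_uuid` as in the btrdb Python bindings; `refetch_stream` is a hypothetical helper name used only for illustration.

```python
def refetch_stream(conn, stream):
    """Look the same stream up again using only its UUID.

    `conn` is expected to behave like the object returned by btrdb.connect().
    """
    # The collection and name are not needed: the UUID alone identifies the stream.
    return conn.stream_from_uuid(stream.uuid)
```

For example, `refetch_stream(conn, streams[0])` should return a stream object that refers to the same underlying data as `streams[0]`.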

In [None]:
streams = list(conn.streams_in_collection('sunshine/PMU6', tags={"unit": "volts"}))
for stream in streams:
    print("collection:{}, name:{}, uuid:{}".format(stream.collection, stream.name, stream.uuid))


## Metadata Methods

In addition to the metadata properties above, each stream has a set of metadata methods. They are exposed as instance methods rather than properties to signal that they typically require a round trip to the server and that their values may change often.

<dl>
    <dt>tags</dt>
    <dd>A dictionary containing metadata used internally by the BTrDB server. The name of a stream is also stored here; the name property is just a convenience. Tags are intended for internal use and generally should not be edited.</dd>
    <dt>annotations</dt>
    <dd>A dictionary of custom data attached to a stream. As an example, you could provide an annotation to indicate a point of contact for this data or the make/model of the sensor device. Whenever you call the annotations method, a separate property version number is also returned. Like the stream version, this is a monotonically increasing number, but it is updated only when the metadata changes. Annotations are well suited for user-defined metadata that may evolve over time.</dd>
</dl>
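
Since the property version only moves when metadata changes, it can serve as a cheap change check. The sketch below is illustrative rather than definitive: `annotations_if_changed` is a hypothetical helper, and it assumes that your btrdb version accepts a `refresh` keyword on `annotations` to force a round trip to the server.

```python
def annotations_if_changed(stream, known_property_version):
    """Return fresh annotations only when the property version has moved on.

    Assumes stream.annotations(refresh=True) contacts the server and returns
    an (annotations, property_version) tuple.
    """
    annotations, property_version = stream.annotations(refresh=True)
    if property_version == known_property_version:
        # Metadata is unchanged since the last check, so there is nothing new.
        return None, property_version
    return annotations, property_version
```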

Let's look at some sample tags and annotations below.

In [None]:
stream = streams[0]

print("tags")
pprint(stream.tags())

annotations, property_version = stream.annotations()
print("\nannotations")
pprint(annotations)

print("\nproperty_version")
print(property_version)


## Metadata Example

As with the collections, we can write a simple helper function to display information about a stream.

In [None]:
def stream_detail(stream):
    """
    Prints detailed information about a stream.
    """
    table = [["Attribute", "Value"]]    
    table.append(["collection", stream.collection])
    table.append(["version", stream.version()])

    for k, v in stream.tags().items():
        table.append(["tag/{}".format(k), v])

    for k, v in stream.annotations()[0].items():
        table.append(["annotation/{}".format(k), v])
    
    return tabulate(table, headers="firstrow")

print(stream_detail(stream))

## Stream Version

Remember that the version of a stream changes whenever its data is modified (though not when its metadata is modified). This is especially useful with live data, because you can pin all of your queries to a specific version. Otherwise, your statistics could keep changing as new data is appended or older data arrives out of order.

Think of the version as a snapshot of your data in time. Most of the calls we use on this page accept an optional `version` argument, which we have omitted. _When it is omitted, the default value of zero is used, which indicates the current (or latest) version_.
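
Pinning can be sketched as follows. This is a minimal illustration, assuming `stream.version()` returns the current version number and that `stream.values` accepts a `version` keyword; `values_at_version` is a hypothetical helper name.

```python
def values_at_version(stream, start, end, version=None):
    """Fetch raw points from a single, fixed version of the stream."""
    if version is None:
        # Capture the current version once so that later queries can reuse it.
        version = stream.version()
    points = [point for point, _ in stream.values(start, end, version=version)]
    return version, points
```

Passing the returned version back in on later calls keeps them reading the same snapshot, even if new data has arrived in the meantime.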

# Viewing Data

Most likely you are interested in examining the stored values in each time series stream.  This is fairly straightforward though it is important to note that the time series values are actually provided in a `RawPoint` class which has `time` and `value` attributes.

Because a stream may contain billions of values, data queries should provide a start and end time to bound the amount of data retrieved.  All times are stored internally in nanoseconds with the zero value being the [Unix epoch](https://en.wikipedia.org/wiki/Unix_time). Luckily, the btrdb library comes with many convenience functions to help with converting between nanoseconds and other datetime formats.
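
To make the nanosecond convention concrete, here is a minimal standard-library sketch of the two conversions. The helper names `ns_to_utc` and `utc_to_ns` are hypothetical and are not part of btrdb; in practice the library's own helpers (such as `ns_to_datetime`, used later in this notebook) are the better choice.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def ns_to_utc(ns):
    """Convert integer nanoseconds since the epoch to an aware UTC datetime.

    datetime only stores microseconds, so anything finer is dropped.
    """
    return EPOCH + timedelta(microseconds=ns // 1_000)

def utc_to_ns(dt):
    """Convert an aware datetime to integer nanoseconds since the epoch."""
    delta = dt - EPOCH
    whole_seconds = delta.days * 86_400 + delta.seconds
    # Integer arithmetic throughout, so no precision is lost to floats.
    return whole_seconds * 1_000_000_000 + delta.microseconds * 1_000

print(ns_to_utc(1536710401000000000))
```

Keeping timestamps as integers matters here: a float cannot represent every nanosecond at this magnitude.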

Let's start with a simple example to view data in our `LINE560V1-MAG` stream. We will call the stream's `values` method which returns a sequence.  Each item in the sequence is a tuple containing a `RawPoint` and the version of the stream at retrieval time.  However, before we call the `values` method we will need to determine the start time for our call.

In [None]:
# retrieve the first point of data and the current stream version
earliest, version = stream.earliest()
print(earliest, version)

We can see above that the first data point in the stream has a time of `1536710401000000000` nanoseconds.  Let's convert that to a datetime so we can understand it better.

In [None]:
from btrdb.utils.timez import ns_to_datetime
ns_to_datetime(earliest.time)

Now, let's call `values` using our start time and ending 200 milliseconds later.

In [None]:
start = earliest.time
end = earliest.time + 200_000_000   # 200 ms as integer nanoseconds (2e8 would be a float)

for point, version in stream.values(start, end):
    print(point)

We can also access the `time` and `value` attributes of each point as follows:

In [None]:
for point, version in stream.values(start, end):
    print(point.time, point.value)

# Windowed Data

Aside from viewing the raw values, we can also query windows of data.  Windowed queries return `StatPoint` values that contain aggregations of the data in each window rather than individual `RawPoint` values.  Each `StatPoint` contains attributes for the minimum, maximum, mean, and standard deviation of the values in each window. StatPoints also report the number of raw points (i.e., the count) and the start time of the window.  You can find more information about windowing data in the [docs](https://btrdb.readthedocs.io/en/latest/working/stream-view-data.html#view-windows-of-data) or view the [API](https://btrdb.readthedocs.io/en/latest/api/streams.html) reference.
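
To illustrate what each window summarizes, the sketch below groups plain `(time_ns, value)` pairs into fixed-width windows and computes the same six fields. It is a conceptual illustration only, not how BTrDB computes its results: `window_stats` is a hypothetical name, the population standard deviation is an assumption, and the server's exact boundary handling may differ.

```python
from statistics import fmean, pstdev

def window_stats(samples, start, end, width):
    """Aggregate (time_ns, value) pairs into fixed-width windows.

    Returns one dict per non-empty window, keyed like the StatPoint attributes.
    """
    buckets = {}
    for time_ns, value in samples:
        if start <= time_ns < end:                 # keep samples inside the range
            index = (time_ns - start) // width     # which window the sample falls in
            buckets.setdefault(index, []).append(value)

    rows = []
    for index in sorted(buckets):
        values = buckets[index]
        rows.append({
            "time": start + index * width,         # start time of the window
            "min": min(values),
            "max": max(values),
            "mean": fmean(values),
            "stddev": pstdev(values),              # population form (assumed)
            "count": len(values),
        })
    return rows
```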

This time we will use `ns_delta`, another convenience function from `btrdb.utils.timez`, which builds nanosecond offsets from familiar units. We use it to define the query range and to request aggregates of data at 100 millisecond intervals.


In [None]:
from btrdb.utils.timez import ns_delta

start = earliest.time
end = earliest.time + ns_delta(seconds=1)   # 1 second later
width = ns_delta(milliseconds=100)          # 100 milliseconds per window

for point, version in stream.windows(start, end, width):
    print(point)
    

Let's create a convenience function to view all of this data in a table using tabulate again.

In [None]:
def stat_table(points):
    attrs = ["time", "min", "max", "mean", "stddev", "count"]
    table = [attrs]
    for p in points:
        table.append([getattr(p, attr) for attr in attrs])
    return tabulate(table, headers="firstrow")

points = [point for point, _ in stream.windows(start, end, width)]
print(stat_table(points))

While we haven't used it in our examples, you can also optionally supply a `depth` argument to the `windows` query. The depth sets the precision of the data in each window, which can greatly speed up your queries if you don't need to be precise down to the nanosecond level. As an example, supplying `depth=30` would return roughly one-second precision ($2^{30}$ nanoseconds is about 1.07 seconds) and takes advantage of BTrDB's internal structure to speed up retrieval. Please see the docs for more information.
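
The arithmetic behind a depth value can be checked quickly. This sketch assumes that a window is accurate to $2^{depth}$ nanoseconds, which is how the library documentation describes it; `depth_precision_seconds` is a hypothetical helper name.

```python
def depth_precision_seconds(depth):
    """Return the window precision, in seconds, implied by a depth value.

    Assumes the precision is 2**depth nanoseconds, so depth=0 is exact
    to the nanosecond.
    """
    return (1 << depth) / 1_000_000_000

for depth in (0, 20, 30):
    print(depth, depth_precision_seconds(depth))
```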

### Aligned Windows

The aligned windows method is an alternative aggregate query that more directly exploits the underlying database structure for the best possible performance. This method, `aligned_windows`, takes a `pointwidth` argument that defines the window width as a power of two. As with the `windows` method, each point returned is a statistical aggregate of all the raw data within a window, but here each window is $2^{pointwidth}$ nanoseconds wide.

Note that when bounding the query with a time range, `start` is inclusive, but `end` is exclusive. That is, results will be returned for all windows that start in the interval $[start, end)$. If $end < start+2^{pointwidth}$ you will not get any results. If `start` and `end` are not multiples of $2^{pointwidth}$, their bottom `pointwidth` bits will be cleared, which rounds each down to a window boundary. Each point returned contains a statistical summary of its window. Windows with no points (`count=0`) will be omitted.
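
The bit-clearing rule can be sketched in a few lines. This is an illustration of the arithmetic only, with hypothetical helper names; the server's handling of edge cases may differ, and windows that contain no data would not appear in real results.

```python
def align_down(time_ns, pointwidth):
    """Clear the bottom `pointwidth` bits, rounding down to a window boundary."""
    return (time_ns >> pointwidth) << pointwidth

def aligned_window_starts(start, end, pointwidth):
    """List candidate window start times after aligning both bounds."""
    width = 1 << pointwidth
    first = align_down(start, pointwidth)
    last = align_down(end, pointwidth)     # exclusive upper bound
    return list(range(first, last, width))

# Tiny numbers make the effect easy to see: windows are 2**2 = 4 units wide.
print(aligned_window_starts(5, 17, 2))
```

Every returned start time is a multiple of the window width, regardless of where the requested range began.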

In [None]:
pointwidth = 27    # 2^27 nanoseconds wide
points = [point for point, _ in stream.aligned_windows(start, end, pointwidth)]
print(stat_table(points))

This can be really useful if you don't particularly care where the start/end of your windows are, or if you just want a high-level statistical view.
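
When you have a target window length in mind, a matching `pointwidth` can be derived from it. The helper below is a hypothetical sketch that picks the largest power-of-two window that does not exceed a given duration.

```python
def floor_pointwidth(duration_ns):
    """Largest pointwidth whose window (2**pointwidth ns) fits within duration_ns."""
    if duration_ns < 1:
        raise ValueError("duration_ns must be a positive integer")
    # bit_length() - 1 is the integer floor of log2 for positive integers.
    return duration_ns.bit_length() - 1

pointwidth = floor_pointwidth(100_000_000)   # target: 100 milliseconds
print(pointwidth, 1 << pointwidth)           # chosen pointwidth and its width in ns
```

Choosing the floor keeps each window no longer than the target; add one to the result if a slightly longer window is acceptable instead.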

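As an example, the code below uses a very wide window so you can easily view statistics over the entire dataset. With `pointwidth = 52`, each window spans $2^{52}$ nanoseconds (about 52 days), and because empty windows are omitted, data that falls inside one such window comes back as a single row. We will use the `currently_as_ns` function to get the current datetime as nanoseconds from epoch to use as our end time.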

In [None]:
from btrdb.utils.timez import currently_as_ns

pointwidth = 52
points = [point for point, _ in stream.aligned_windows(start, currently_as_ns(), pointwidth)]
print(stat_table(points))