<img style="float: right;" src="hyperstream.svg">

# HyperStream Tutorial 1: Introduction

## Requirements

To run this and the following tutorials, it is necessary to have access to a MongoDB server running on **localhost, port 27017**. The host and port of the MongoDB server can be changed by modifying the configuration file __hyperstream_config.json__, located in the same folder as this notebook.

We also require all the dependencies listed in the HyperStream requirements; the installation instructions can be found at https://github.com/IRC-SPHERE/HyperStream
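
Before creating a HyperStream instance, it can be useful to confirm that something is listening on the MongoDB host and port. The following is a minimal sketch using only the Python standard library. The helper name `port_is_open` is made up for this example and is not part of HyperStream, and a successful TCP connection only shows that the port accepts connections, not that the server behind it is MongoDB.

```python
import socket


def port_is_open(host="localhost", port=27017, timeout=1.0):
    """Return True if a TCP connection to (host, port) can be opened.

    Hypothetical helper for this tutorial, not a HyperStream function.
    """
    try:
        conn = socket.create_connection((host, port), timeout)
    except socket.error:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
    conn.close()
    return True


print(port_is_open("localhost", 27017))
```

If this prints `False`, check that the MongoDB server is running and that the host and port match the ones in the configuration file.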

In [1]:
%load_ext watermark

import sys
sys.path.append("../") # Add parent dir in the Path

from hyperstream import HyperStream
from hyperstream import StreamId
from hyperstream import TimeInterval

from pytz import UTC
from datetime import datetime, timedelta

%watermark -v -m -p hyperstream -g

CPython 2.7.6
IPython 5.3.0

hyperstream 0.3.0-beta

compiler   : GCC 4.8.4
system     : Linux
release    : 3.19.0-80-generic
machine    : x86_64
processor  : x86_64
CPU cores  : 4
interpreter: 64bit
Git hash   : 33621000d8f335505eac32a5a274194d2588a4c9


## Starting a HyperStream instance

First of all, we will create a HyperStream instance. This instance will connect to the MongoDB server that is specified in the configuration file.

In [2]:
hs = HyperStream(loglevel=0)
print hs

HyperStream version 0.3.0-beta, connected to mongodb://localhost:27017/hyperstream


## Selecting a tool

HyperStream comes with a set of predefined tools in hyperstream.tools. These tools can be used to define the nodes of a factor graph, which produce values or compute functions of the specified input nodes. For this tutorial, we will focus on the **clock** tool, which produces time ticks from the specified start and stride times.

In [3]:
T = hs.channel_manager.tools

clock = StreamId(name="clock")

clock_tool = T[clock].window().last().value(stride=2.0)
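
To get a feeling for what a clock with a stride of 2 seconds produces, here is a plain-Python approximation. It is a sketch, not HyperStream's implementation: the function name `clock_ticks`, the default reference time of 1970-01-01, and the convention that the start of the interval is excluded while the end is included are all assumptions made for this example.

```python
from datetime import datetime, timedelta


def clock_ticks(start, end, stride=1.0, first=None):
    """Return the ticks t with start < t <= end, spaced `stride` seconds
    apart and aligned to the reference time `first`.

    Illustrative only; `first` defaults to 1970-01-01 (an assumption).
    """
    if first is None:
        first = datetime(1970, 1, 1, tzinfo=start.tzinfo)
    # Last aligned tick at or before `start`.
    n_strides = int((start - first).total_seconds() // stride)
    t = first + timedelta(seconds=n_strides * stride)
    step = timedelta(seconds=stride)
    ticks = []
    while t <= end:
        if t > start:
            ticks.append(t)
        t += step
    return ticks


# A 10-second interval (naive datetimes for brevity; tz-aware ones work too).
start = datetime(2017, 7, 6, 10, 48, 4, 437483)
end = start + timedelta(seconds=10)
for tick in clock_ticks(start, end, stride=2.0):
    print(tick)
```

With these inputs the ticks fall on the even seconds from 10:48:06 to 10:48:14. The ticks are aligned to the reference time rather than to the start of the interval.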

## Specifying the memory channel

We need to specify where to store the resulting stream of data. It is possible to store it in a MongoDB database instead of in memory by selecting **hs.channel_manager.mongo**. In this tutorial we use the memory channel: we get a reference to it and then create the stream on it.

In [4]:
M = hs.channel_manager.memory

ticker = M.get_or_create_stream(stream_id=StreamId(name="ticker"))

## Executing the tool

Now we only need to specify the time interval that we want to query. We do this by specifying its beginning and end.

In [5]:
now = datetime.utcnow().replace(tzinfo=UTC)
before = (now - timedelta(seconds=10)).replace(tzinfo=UTC)

ti = TimeInterval(before, now)

Now that we have defined the tool to use, where to store the results, and the time interval, we can execute the tool. All the computed results will then be available in the specified __sink__.

In [6]:
clock_tool.execute(sources=[], sink=ticker, interval=ti, alignment_stream=None)

In [7]:
ticker.calculated_intervals

TimeIntervals([TimeInterval(start=datetime.datetime(2017, 7, 6, 10, 48, 4, 437483, tzinfo=<UTC>), end=datetime.datetime(2017, 7, 6, 10, 48, 14, 437483, tzinfo=<UTC>))])

## Printing the results

The resulting stream is stored in the ticker. We can now get a list of tuples, each containing a timestamp and its corresponding clock value.

In [8]:
for timestamp, value in ticker.window().items():
    print '[%s]: %s' % (timestamp, value)

[2017-07-06 10:48:06+00:00]: 2017-07-06 10:48:06+00:00
[2017-07-06 10:48:08+00:00]: 2017-07-06 10:48:08+00:00
[2017-07-06 10:48:10+00:00]: 2017-07-06 10:48:10+00:00
[2017-07-06 10:48:12+00:00]: 2017-07-06 10:48:12+00:00
[2017-07-06 10:48:14+00:00]: 2017-07-06 10:48:14+00:00


## Executing a new interval

It is possible to execute the tool again with an interval that has not been fully computed previously. In this case, only the missing part of the interval will be computed.

In [9]:
before = (now - timedelta(seconds=40)).replace(tzinfo=UTC)

ti = TimeInterval(before, now)

clock_tool.execute(sources=[], sink=ticker, interval=ti, alignment_stream=None)
ticker.calculated_intervals

TimeIntervals([TimeInterval(start=datetime.datetime(2017, 7, 6, 10, 47, 34, 437483, tzinfo=<UTC>), end=datetime.datetime(2017, 7, 6, 10, 48, 14, 437483, tzinfo=<UTC>))])
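
The bookkeeping behind this step can be pictured as interval subtraction: the requested interval minus the intervals that were already calculated. The sketch below illustrates the idea with plain `(start, end)` tuples. The function `missing_intervals` is hypothetical and is not HyperStream's code, which keeps track of this through `calculated_intervals`.

```python
from datetime import datetime, timedelta


def missing_intervals(requested, calculated):
    """Return the parts of `requested` not covered by `calculated`.

    `requested` is a (start, end) tuple and `calculated` is a list of such
    tuples. Hypothetical helper for illustration only.
    """
    start, end = requested
    gaps = []
    cursor = start  # everything before the cursor is already accounted for
    for c_start, c_end in sorted(calculated):
        if c_end <= cursor:
            continue  # entirely before the part still to be covered
        if c_start >= end:
            break  # entirely after the requested interval
        if c_start > cursor:
            gaps.append((cursor, c_start))
        cursor = max(cursor, c_end)
        if cursor >= end:
            break
    if cursor < end:
        gaps.append((cursor, end))
    return gaps


now = datetime(2017, 7, 6, 10, 48, 14, 437483)
calculated = [(now - timedelta(seconds=10), now)]
requested = (now - timedelta(seconds=40), now)
gaps = missing_intervals(requested, calculated)
print(gaps)
```

Here the last 10 seconds were already calculated, so the only gap is the 30 seconds before them.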

## Query

It is possible to query only a certain window by passing the interval of interest to the stream.

In [10]:
time1 = (now - timedelta(seconds=30)).replace(tzinfo=UTC)
time2 = (now - timedelta(seconds=20)).replace(tzinfo=UTC)

ti = TimeInterval(time1, time2)

for timestamp, value in ticker.window(ti).items():
    print '[%s]: %s' % (timestamp, value)

[2017-07-06 10:47:46+00:00]: 2017-07-06 10:47:46+00:00
[2017-07-06 10:47:48+00:00]: 2017-07-06 10:47:48+00:00
[2017-07-06 10:47:50+00:00]: 2017-07-06 10:47:50+00:00
[2017-07-06 10:47:52+00:00]: 2017-07-06 10:47:52+00:00
[2017-07-06 10:47:54+00:00]: 2017-07-06 10:47:54+00:00
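
Conceptually, the windowed query above keeps only the stream items whose timestamps fall inside the interval of interest. The following sketch reproduces that filtering on a plain list of `(timestamp, value)` pairs. The function `window_items` is a made-up name, and treating the start of the window as excluded and the end as included is an assumption of this example rather than a documented property of the library.

```python
from datetime import datetime, timedelta


def window_items(items, start, end):
    """Keep the (timestamp, value) pairs with start < timestamp <= end."""
    return [(t, v) for t, v in items if start < t <= end]


# Ticks every 2 seconds, similar to the ones stored in the ticker stream.
first_tick = datetime(2017, 7, 6, 10, 47, 36)
ticks = [first_tick + timedelta(seconds=2 * i) for i in range(20)]
items = [(t, t) for t in ticks]

now = datetime(2017, 7, 6, 10, 48, 14, 437483)
time1 = now - timedelta(seconds=30)
time2 = now - timedelta(seconds=20)

selected = window_items(items, time1, time2)
for timestamp, value in selected:
    print('[%s]: %s' % (timestamp, value))
```

This selects the five ticks from 10:47:46 to 10:47:54, the same timestamps as in the query output above.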
