# IBM Streams Event Store sample application

This sample demonstrates how to create a Streams Python application that ingests data into a Db2 Event Store table and how to view the metrics of the insert operation.

In this notebook, you'll see examples of how to:
 1. [Set up your data connections](#setup)
 2. [Create the application](#create)
 3. [Submit the application](#launch)
 4. [Connect to the running instance to view metrics](#view)
 5. [Stop the application](#cancel)

# Overview

**About the sample**

This application simulates data tuples that are inserted as rows in a Db2 Event Store table.

**How it works**

The Python application created in this notebook is submitted to the IBM Streams service for execution. Once the application is running in the service, you can connect to the service instance from the notebook to retrieve the metrics.

<img src="https://developer.ibm.com/streamsdev/wp-content/uploads/sites/15/2019/04/how-it-works.jpg" alt="How it works">


### Documentation

- [Streams Python development guide](https://ibmstreams.github.io/streamsx.documentation/docs/latest/python/)
- [Streams Python API](https://streamsxtopology.readthedocs.io/)
- [Db2 Event Store install and set up](https://www.ibm.com/support/knowledgecenter/en/SSGNPV_2.0.0/local/installsetup.html)


<a name="setup"></a>
# 1. Setup
### 1.1 Add credentials for the IBM Streams service

In order to submit a Streams application you need to provide the name of the Streams instance.

1. From the navigation menu, click **My instances**.
2. Click the **Provisioned Instances** tab.
3. Update the value of `streams_instance_name` in the cell below according to your Streams instance name.

In [None]:
from icpd_core import icpd_util
streams_instance_name = "my-instance"  # Change this to the name of your Streams instance
cfg = icpd_util.get_service_instance_details(name=streams_instance_name)

### 1.2 Optional: Upgrade the `streamsx.eventstore` Python package

Uncomment and run the cell below if you want to upgrade to the latest version of the streamsx.eventstore package.

In [None]:
#import sys
#!{sys.executable} -m pip install --user --upgrade streamsx.eventstore

In [None]:
import streamsx.eventstore as es
import streamsx.topology.context
print("INFO: streamsx package version: " + streamsx.topology.context.__version__)
print("INFO: streamsx.eventstore package version: " + es.__version__)

### 1.3 Optional: Update the streamsx.eventstore toolkit to the latest release
Get the **latest** streamsx.eventstore toolkit release from GitHub.

The Streams toolkit is downloaded in the cell below and added to the topology in step 2.0.1.

In this case, the latest released IBM Streams Event Store **toolkit** is used to build the Streams application, not the toolkit that is located on the IBM Streams build service.

In [None]:
import streamsx.eventstore as es

# If url is None, then the latest toolkit release will be downloaded.
url=None

# download event store toolkit from GitHub
eventstore_toolkit = es.download_toolkit(url)

### 1.4 Configure the connection to Db2 Event Store

Update the name of the Event Store service instance in the first cell below.
Then run the cells to configure the connection for the IBM Streams application.

The connection details and credentials are stored in an application configuration for the IBM Streams application. The application configuration is named after the Event Store instance (`eventstore_instance`) and contains the following information:
* Name of the database (`es_db`)
* Scala connection string (`es_connection`)
* Event Store user and password (`es_user` and `es_password`)
* Passwords for the SSL certificates (`es_truststore_password` and `es_keystore_password`)

The "EventStoreWriter" in the application is configured with the name of the application configuration, which is stored in the `app_cfg` variable.
In addition, the "EventStoreWriter" requires the location of the truststore and keystore for the SSL connection (`es_truststore` and `es_keystore`).

In [None]:
# Change the name according to your Event Store instance
eventstore_instance='EventStore-1'

In [None]:
from streamsx.rest import Instance
import streamsx.topology.context
import streamsx.eventstore as es

cfg[streamsx.topology.context.ConfigParams.SSL_VERIFY] = False
instance = Instance.of_service(cfg)

# check if application configuration exists
eventstore_app_config = instance.get_application_configurations(name=eventstore_instance)
if eventstore_app_config:
    # retrieve Event Store service details
    eventstore_cfg=icpd_util.get_service_instance_details(name=eventstore_instance)
    # get location of clientkeystore file and set values for es_truststore, es_keystore, app_cfg for later use
    es_truststore, es_keystore = es.get_certificate(eventstore_cfg, name=eventstore_instance)
    app_cfg = eventstore_instance
else:
    # retrieve Event Store service details
    eventstore_cfg=icpd_util.get_service_instance_details(name=eventstore_instance)
    es_db, es_connection, es_user, es_password, es_truststore, es_truststore_password, es_keystore, es_keystore_password = es.get_service_details(eventstore_cfg, name=eventstore_instance)
    # create application configuration
    app_cfg = es.configure_connection(instance, name=eventstore_instance, database=es_db, connection=es_connection, user=es_user, password=es_password, keystore_password=es_keystore_password, truststore_password=es_truststore_password)

print(app_cfg)

<a id="create"></a>
# 2. Create the application
This application is going to ingest simulated tuples.

These simulated tuples are inserted as rows into a table using the Db2 Event Store Scala API. This functionality is provided by `streamsx.eventstore.insert()` as "EventStoreWriter".

In this example we create a table with two columns:
**| ID:Long  |  NAME:String |**

In the application we define the type `tuple<int64 ID, rstring NAME>` as the IBM Streams schema.

Important: The names, types, and positions of the tuple fields in the IBM Streams schema must exactly match the columns in your IBM Db2 Event Store table schema.

Supported types: [Mapping of Event Store types to SPL types](https://ibmstreams.github.io/streamsx.eventstore/doc/spldoc/html/tk$com.ibm.streamsx.eventstore/op$com.ibm.streamsx.eventstore$EventStoreSink$1.html)
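
The matching rule above can be checked before submitting the job. The following is a minimal sketch, not part of the `streamsx.eventstore` package: the helper names `parse_spl_tuple` and `schema_mismatches` are hypothetical, the type map contains only the two mappings used in this sample, and the parser handles flat tuple types only (no nested or collection types).

```python
import re

# Only the two mappings used in this sample (assumption: extend this
# dictionary from the linked mapping table for other types).
SPL_TO_EVENTSTORE = {"int64": "Long", "rstring": "String"}

def parse_spl_tuple(schema):
    """Parse 'tuple<int64 ID, rstring NAME>' into [('ID', 'int64'), ('NAME', 'rstring')]."""
    match = re.fullmatch(r"\s*tuple<(.*)>\s*", schema)
    if match is None:
        raise ValueError("not an SPL tuple type: " + schema)
    fields = []
    for part in match.group(1).split(","):
        spl_type, name = part.split()
        fields.append((name, spl_type))
    return fields

def schema_mismatches(schema, table_columns):
    """Compare the SPL schema with (name, type) table columns, position by position."""
    fields = parse_spl_tuple(schema)
    problems = []
    if len(fields) != len(table_columns):
        problems.append("column count differs: %d vs %d" % (len(fields), len(table_columns)))
    for pos, ((name, spl_type), (col, col_type)) in enumerate(zip(fields, table_columns)):
        if name != col:
            problems.append("position %d: name %s != %s" % (pos, name, col))
        if SPL_TO_EVENTSTORE.get(spl_type) != col_type:
            problems.append("position %d: type %s does not map to %s" % (pos, spl_type, col_type))
    return problems

# Table layout of this sample: ID:Long, NAME:String
table_columns = [("ID", "Long"), ("NAME", "String")]
print(schema_mismatches("tuple<int64 ID, rstring NAME>", table_columns))
```

An empty list means that the schema and the table columns agree in name, type, and position.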


In this example, the sharding key and the primary key are defined on the same column `ID`.

Databases in IBM Db2 Event Store are partitioned into shards. Any given IBM Db2 Event Store node (in a multi-node IBM Db2 Event Store cluster) contains 0, 1, or N shards of the defined database. In addition to the mandatory shard key, you can optionally provide a primary key. When you define a primary key, IBM Db2 Event Store ensures that only a single version of each primary key exists in the database.
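
To make the roles of the two keys more concrete, here is a toy model in plain Python. It is only a conceptual sketch and does not reflect how Db2 Event Store is implemented: the class `ToyShardedTable` is hypothetical, the shard is chosen by a simple modulo on the key, and it assumes that a later row with the same primary key replaces the earlier one.

```python
class ToyShardedTable:
    """Toy model only: modulo sharding and one row per primary key."""

    def __init__(self, num_shards):
        # One dict per shard, mapping primary key -> row
        self.shards = [{} for _ in range(num_shards)]

    def insert(self, row, key="ID"):
        # The same column serves as sharding key and primary key, as in this sample
        shard = self.shards[row[key] % len(self.shards)]
        # Assumption: a later row with the same key replaces the earlier one
        shard[row[key]] = row

    def rows(self):
        return [row for shard in self.shards for row in shard.values()]

table_model = ToyShardedTable(num_shards=3)
for row in ({"ID": 1, "NAME": "id_4"}, {"ID": 2, "NAME": "id_7"}, {"ID": 1, "NAME": "id_9"}):
    table_model.insert(row)

print(len(table_model.rows()))               # number of distinct primary keys
print([len(s) for s in table_model.shards])  # rows per shard
```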

**If the table does not exist in the database, then the table is created by the "EventStoreWriter".**

All Streams applications start with a `Topology` object, so start by creating one:


In [None]:
from streamsx.topology.topology import Topology

topo = Topology(name="EventStoreInsertSample")

### 2.0.1 (OPTIONAL) Add downloaded streamsx.eventstore toolkit to the topology

In this case, the toolkit downloaded in step 1.3 is used to build the Streams application, not the toolkit that is located on the IBM Streams build service.

In [None]:
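import streamsx.spl.toolkit
# Step 1.3 is optional, so eventstore_toolkit may be undefined or None
if globals().get('eventstore_toolkit') is not None:
    # add the downloaded Event Store toolkit to the topology
    streamsx.spl.toolkit.add_toolkit(topo, eventstore_toolkit)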

## 2.1 Define the application


The first step is to define a data source that produces the data to be processed.
For this, create a `Stream` called `pulse` that contains the simulated data with two attributes.
A `Stream` is a potentially infinite sequence of tuples containing the data to be analyzed.

Next, use the data source `Stream` as input for the "EventStoreWriter".

**Define the table schema**

* The table name is defined in the `table` variable and passed as the `table` parameter to the `streamsx.eventstore.insert()` function.
* The table schema is specified by the stream type `tuple<int64 ID, rstring NAME>`.
* The primary key of the table is passed as the `primary_key` parameter, and the sharding key as the `partitioning_key` parameter, to the `streamsx.eventstore.insert()` function.

In [None]:
import streamsx.spl.op as op
import streamsx.eventstore as es
from streamsx.topology.schema import StreamSchema
import random
import time


# This application creates a table with the name below
table = 'StreamsEventStoreSampleTable'

def generate_data():
    counter = 0
    while True:
        # yield a sequential ID and a random name
        yield {"NAME": "id_" + str(random.randint(0, 10)), "ID": counter}
        counter = counter + 1
        time.sleep(0.01)

# SPL schema of the stream that is passed to the EventStoreWriter
tuple_schema = StreamSchema("tuple<int64 ID, rstring NAME>")
# Generates data for a stream of two attributes. Each attribute maps to the column of the same name in the Db2 Event Store table.
pulse = topo.source(generate_data, name="GeneratedData").map(lambda tpl: (tpl["ID"], tpl["NAME"]), schema=tuple_schema)


# configure the number of tuples that are inserted as batch
batch_size=100

# insert tuple data into table as rows
sink=es.insert(pulse, config=app_cfg, table=table, batch_size=batch_size, primary_key='ID', partitioning_key='ID', truststore=es_truststore, keystore=es_keystore, name='ES_Inserter')
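
Before submitting, it can help to look at a few of the tuples that the source would produce. The following sketch runs locally and needs no Streams instance. `generate_data_preview` is a hypothetical copy of `generate_data()` without the sleep, and `to_row` repeats the conversion used in the `map()` step above.

```python
import itertools
import random

def generate_data_preview():
    # Same shape as generate_data(), without the sleep between tuples
    counter = 0
    while True:
        yield {"NAME": "id_" + str(random.randint(0, 10)), "ID": counter}
        counter = counter + 1

# The same conversion the map() step applies: dict -> (ID, NAME) tuple
to_row = lambda tpl: (tpl["ID"], tpl["NAME"])

# Take only the first five tuples of the infinite generator
preview = [to_row(t) for t in itertools.islice(generate_data_preview(), 5)]
for row in preview:
    print(row)
```

Each printed row has the `ID` in the first position and the `NAME` in the second, which is the order required by `tuple<int64 ID, rstring NAME>`.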

<a name="launch"></a>

# 3. Submit the application
A running Streams application is called a *job*. This next cell submits the application for execution and prints the resulting job id.

In [None]:
from streamsx.topology import context

# Disable SSL certificate verification if necessary
cfg[context.ConfigParams.SSL_VERIFY] = False
# submit the topology 'topo'
submission_result = context.submit("DISTRIBUTED", topo, config=cfg)

# The submission_result object contains information about the running application, or job
if submission_result.job:
    streams_job = submission_result.job
    print("JobId: ", streams_job.id, "\nJob name: ", streams_job.name)

<a name="view"></a>

# 4. Collect metrics from the job
Now that the job is started, connect to the instance and view the metrics of the "Event Store Writer".

The following metrics are collected: 
* nWriteSuccesses - number of successful inserts
* nWriteFailures - number of failed inserts
* insertTimeMin - minimum time of insert operation in ms
* insertTimeAvg - average time of insert operation in ms
* insertTimeMax - maximum time of insert operation in ms
* nActiveInserts - indicates whether inserts are active (1: active, 0: inactive)

The last cell in this section fetches the metrics every 5 seconds for 2 minutes and prints one line per fetch.
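
The number of fetches follows from the duration and the interval: 120 seconds / 5 seconds = 24. The sketch below shows the same polling pattern as a small helper that can be tried without a running job. The function `poll_metrics` and the stubbed values are hypothetical. In the notebook, the `fetch` argument would wrap the REST call against the operator.

```python
import time

def poll_metrics(fetch, names, interval, count, sleep=time.sleep):
    """Call fetch(name) for each metric name, `count` times, `interval` seconds apart."""
    lines = []
    for i in range(count):
        values = ["%s: %s" % (name, fetch(name)) for name in names]
        lines.append(" ".join(values))
        if i < count - 1:
            sleep(interval)  # no need to wait after the last fetch
    return lines

duration = 120  # seconds (2 minutes)
interval = 5    # seconds between fetches
count = duration // interval

# Stub fetch function standing in for the REST call against the running job
fake_values = {"nWriteSuccesses": 100, "nWriteFailures": 0}
lines = poll_metrics(fake_values.get, list(fake_values), interval, count, sleep=lambda s: None)
print(count, lines[0])
```

Passing the sleep function as a parameter keeps the helper testable, because a no-op can replace the real delay.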

**Collecting operator metrics**

Each kind of operator provides different metrics with specific names. The following example code retrieves the operator metrics from all jobs in a Streams instance:

---------
```
from streamsx.rest import Instance
from streamsx.topology import context

# Disable SSL certificate verification if necessary
cfg[context.ConfigParams.SSL_VERIFY] = False
# Streams instance object
instance = Instance.of_service(cfg)
# Get list of all running jobs in the Streams instance
job_list = instance.get_jobs()
for j in job_list:
    job_id = j.id
    job_name = j.name
    print("JobId: "+job_id + " Name: "+ job_name)
    # Loop over each operator of the job
    op_list = j.get_operators()
    for op in op_list:
        print("Operator name:" + op.name + " kind: " + op.operatorKind)
        # List all metrics of the operator
        for m in op.get_metrics():
            print("Metric " + m.name + ": " + str(m.value))
```
---------
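
Once metrics are retrieved, it is often convenient to work with them as a dictionary. The following sketch assumes only that each metric object exposes `name` and `value`, as used in the example above. The helpers `metrics_to_dict` and `failure_ratio` are hypothetical, and `SimpleNamespace` objects stand in for the objects returned by `op.get_metrics()`.

```python
from types import SimpleNamespace

def metrics_to_dict(metrics):
    """Map metric name -> value for objects exposing .name and .value."""
    return {m.name: m.value for m in metrics}

def failure_ratio(values):
    """Share of failed inserts, or None if nothing was written yet."""
    failures = values.get("nWriteFailures", 0)
    total = values.get("nWriteSuccesses", 0) + failures
    return failures / total if total else None

# Stand-ins for the objects returned by op.get_metrics()
sample = [SimpleNamespace(name="nWriteSuccesses", value=95),
          SimpleNamespace(name="nWriteFailures", value=5)]
values = metrics_to_dict(sample)
print(values, failure_ratio(values))
```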

In [None]:
from streamsx.rest import Instance
import streamsx.topology.context
from time import sleep
from datetime import datetime

# retrieve metrics from submitted job
j = submission_result.job

interval = 5  # seconds between metric fetches
n = 24  # number of fetches (24 x 5 s = 2 minutes); -1 runs until interrupted
i = 0
metrics_info = ""
sleep(20) # initial delay since job startup takes some time
while True:
    i = i + 1
    op = j.get_operators('ES_Inserter')[0]
    if op.operatorKind == 'com.ibm.streamsx.eventstore::EventStoreSink':
        time_info = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
        for metric in ("nWriteSuccesses","nWriteFailures","insertTimeMin","insertTimeAvg","insertTimeMax","nActiveInserts"):
            m = op.get_metrics(name=metric)
            if len(m) > 0:
                metrics_info = metrics_info + " " + m[0].name + ": " + str(m[0].value)

        print(time_info + " " + metrics_info)
    if i == n:
        break
    sleep(interval)
    metrics_info = ""


## 4.1 See job status 

You can view job status and logs by going to **My Instances** > **Jobs**. Find your job based on the id printed above.
Retrieve job logs using the "Download logs" action from the job's context menu.

To view other information about the job, such as detailed metrics, access the Streams Console. Go to **My Instances** > **Provisioned Instances**. Select the Streams instance and open the URL listed under *externalConsoleEndpoint* or *serviceConsoleEndpoint*.

<a name="cancel"></a>

# 5. Cancel the job

This cell generates a widget you can use to cancel the job.

In [None]:
# cancel the job in the IBM Streams service
submission_result.cancel_job_button()

You can also interact with the job through the [Job](https://streamsxtopology.readthedocs.io/en/stable/streamsx.rest_primitives.html#streamsx.rest_primitives.Job) object returned from `submission_result.job`.

For example, use `job.cancel()` to cancel the running job directly.

# Summary

We started with a `Stream` called `pulse`, which contained the data we wanted to insert. Next, we used the `pulse` stream as input for the "Event Store Writer" to insert rows in the specified table.  

After submitting the application to the Streams service, we connected to the instance via REST and displayed the "Event Store Writer" metrics to see the progress within the notebook.