# IBM Streams Event Streams sample application

This sample demonstrates how to create a Streams Python application that ingests data into the [IBM Event Streams](https://cloud.ibm.com/catalog?search=Event%20Streams) service and then consumes that data from Event Streams. IBM Event Streams is a fully managed Kafka service in IBM Cloud.

In this notebook, you'll see examples of how to:
 1. [Set up your data connections](#setup)
 2. [Create the application](#create)
 3. [Submit the application](#launch)
 4. [Connect to the running application to view data](#view)
 5. [Stop the application](#cancel)

# Overview

**About the sample**

This application generates artificial sensor data and writes it to a topic in the IBM Event Streams instance. It also subscribes to the same topic and filters the stream so that only the data of one sensor remains.

**How it works**

The Python application created in this notebook is submitted to the IBM Streams service for execution. Once the application is running in the service, you can connect to it from the notebook to retrieve the results.

<img src="https://developer.ibm.com/streamsdev/wp-content/uploads/sites/15/2019/04/how-it-works.jpg" alt="How it works">


### Documentation

- [Streams Python development guide](https://ibmstreams.github.io/streamsx.documentation/docs/latest/python/)
- [Streams Python API](https://streamsxtopology.readthedocs.io/)



<a name="setup"></a>
# 1. Setup
### 1.1 Add credentials for the IBM Streams service

In order to submit a Streams application you need to provide the name of the Streams instance.

1. From the navigation menu, click **My instances**.
2. Click the **Provisioned Instances** tab.
3. Update the value of `streams_instance_name` in the cell below according to your Streams instance name.

In [1]:
from icpd_core import icpd_util
streams_instance_name = "cp4d-streams-instance"  # Change this to the name of your Streams instance
cfg = icpd_util.get_service_instance_details(name=streams_instance_name)

### 1.2 Optional: Upgrade the `streamsx.eventstreams` Python package

Uncomment and run the cell below if you want to upgrade to the latest version of the `streamsx.eventstreams` package.


In [None]:
#!pip install --user --upgrade streamsx.eventstreams

The Python packages are installed in the user path.<br/>
If the latest version of a package is still not picked up, you can manually put the user path first in the Python module search path.<br/>
You can find the user path with this code:<br/>
`
import sys
for e in sys.path:
    print(e)
`

In [None]:
#import sys
#sys.path.insert(0, '/home/wsuser/.local/lib/python3.6/site-packages')

In [2]:
import streamsx.eventstreams as eventstreams
import streamsx.topology.context
print("INFO: streamsx package version: " + streamsx.topology.context.__version__)
print("INFO: streamsx.eventstreams package version: " + eventstreams.__version__)

INFO: streamsx package version: 1.13.14
INFO: streamsx.eventstreams package version: 1.3.1


### 1.3 Configure the connection to the IBM Event Streams service

To connect with the Event Streams cloud service, we need service credentials, and at least one topic within the service instance.

To create the credentials and a topic, do the following steps:

1. Create an Event Streams service instance on IBM Cloud.

   You need to have an IBM account to be able to do this.
   
   https://cloud.ibm.com/catalog?search=Event%20Streams
   <br>
   
1. Under *Topics*, create one topic. You can use the default values for all settings. The topic name will be used later in the notebook.
1. Under *Service credentials*, create new credentials. You can leave all settings at their defaults.
1. View the created credentials and copy them to the clipboard.
1. Paste the credentials into the `Your Event Streams credentials:` prompt in the next cell.


In [3]:
import getpass
eventstreams_credentials_json = getpass.getpass('Your Event Streams credentials:')

Your Event Streams credentials:········
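
Before storing the credentials, it can help to confirm that the pasted text is well-formed JSON. The sketch below is a hypothetical helper: `missing_credential_keys` is not part of `streamsx.eventstreams`, and the key names it checks (`api_key` and `kafka_brokers_sasl`) are assumptions about the usual layout of Event Streams service credentials that may not match your service plan.

```python
import json

# Assumed key names; adjust them to match the credentials your service shows.
EXPECTED_KEYS = ("api_key", "kafka_brokers_sasl")

def missing_credential_keys(credentials_json, expected=EXPECTED_KEYS):
    """Return the expected keys that are absent from the pasted credentials.

    Raises ValueError if the text is not a JSON object.
    """
    try:
        parsed = json.loads(credentials_json)
    except json.JSONDecodeError as err:
        raise ValueError("credentials are not valid JSON: " + str(err)) from err
    if not isinstance(parsed, dict):
        raise ValueError("credentials must be a JSON object")
    return [key for key in expected if key not in parsed]

# Example with made-up values (not real credentials)
sample = '{"api_key": "xxxx", "kafka_brokers_sasl": ["broker-0:9093"]}'
print(missing_credential_keys(sample))
```

In the notebook you could call `missing_credential_keys(eventstreams_credentials_json)` and continue only if the returned list is empty.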


Create an application configuration in the IBM Streams service to hold the Event Streams service credentials.
Storing the credentials in an application configuration keeps them out of the application code.

In [4]:
# create an application configuration
from streamsx.rest import Instance

cfg[streamsx.topology.context.ConfigParams.SSL_VERIFY] = False
instance = Instance.of_service(cfg)
app_config_name = eventstreams.configure_connection(instance,
                                                    name='eventstreams',
                                                    credentials=eventstreams_credentials_json)
print("INFO: Name of your application configuration: " + app_config_name)



create application configuration: eventstreams
INFO: Name of your application configuration: eventstreams


The next cell sets the name of the *topic* to which the data is published. Replace the value of `topic` with the name of the topic you created in the Event Streams service.


In [6]:
topic = 'cp4dtest0310'  # Enter the topic name here


<a id="create"></a>
# 2. Create the application

This application ingests readings from simulated sensors into a topic in the Event Streams service. Another part of the application subscribes to the topic and keeps only the readings of one sensor of interest.

All Streams applications start with a `Topology` object, so start by creating one:


In [7]:
from streamsx.topology.topology import Topology

topo = Topology(name="EventStreamsSample")

## 2.1 Define sources
Your application needs some data to analyze, so the first step is to define a data source that produces the data being processed. 

Next, use the data source to create a `Stream` object. A `Stream` is a potentially infinite sequence of tuples containing the data to be analyzed.

In this example, we use JSON objects, which are Python dicts. Other supported formats include strings, structured tuples, and more. [See the documentation for all supported formats](http://ibmstreams.github.io/streamsx.topology/doc/pythondoc/streamsx.topology.topology.html#stream).

### 2.1.1 Define a source class

Define a *callable* class that will produce the data to be analyzed.

This example class produces readings from sensors.

In [8]:
import random 
import time
from datetime import datetime

# define a callable source
class SensorReadingsSource(object):
    def __call__(self):
        # This example generates random data.
        # A real source could instead connect to a database,
        # read from a data set, or open a file.
        while True:
            time.sleep(0.005)
            sensor_id = random.randint(1, 100)
            reading = {}
            reading["sensor_id"] = "sensor_" + str(sensor_id)
            reading["value"] = random.random() * 3000
            # timestamp in whole seconds since the epoch
            reading["ts"] = int(datetime.now().timestamp())
            yield reading
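
Because the source is an ordinary Python callable that returns a generator, you can preview a few tuples locally before submitting anything. The sketch below re-declares a simplified copy of the class under the hypothetical name `SensorReadingsPreview` (without the `time.sleep` delay) so that it runs on its own. `itertools.islice` stops the otherwise endless loop after five readings.

```python
import itertools
import random
from datetime import datetime

class SensorReadingsPreview(object):
    """Simplified local copy of SensorReadingsSource, without the sleep."""
    def __call__(self):
        while True:
            yield {
                "sensor_id": "sensor_" + str(random.randint(1, 100)),
                "value": random.random() * 3000,
                "ts": int(datetime.now().timestamp()),
            }

# Take only the first five readings from the infinite generator
preview = list(itertools.islice(SensorReadingsPreview()(), 5))
for reading in preview:
    print(reading)
```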

### 2.1.2 Create the `Stream`

Create a `Stream` called `Readings` with the `CommonSchema.Json` schema. It contains the simulated data that `SensorReadingsSource` produces:

In [9]:
# create a stream from the data using Topology.source
readings = topo.source(SensorReadingsSource(), name="Readings").as_json()

## 2.2 Publish the tuples in the Event Streams service

Now publish the data of the `readings` stream to the topic you have configured in the `topic` variable.


In [10]:
eventstreams.publish(readings,
                     topic,
                     credentials=app_config_name,
                     name="EventStrPublish")

<streamsx.topology.topology.Sink at 0x7fc994618ac8>

**Summary:**

By now, you have defined a streaming application that generates simulated data and publishes the data to a topic within an Event Streams service. You could submit the application now, so that any other application could consume the data from the Event Streams service.

In the next steps, you extend the `topo` topology with a consumer that consumes and analyzes the data.

## 2.3 Subscribe to the Event Streams topic and consume the data

When you subscribe to the topic, you create a new data source that connects to the Event Streams service. The resulting stream of data has the `Json` schema.

In [11]:
from streamsx.topology.schema import CommonSchema
# create a new Json stream in the topology
sensordata = eventstreams.subscribe(topo,
                                    topic,
                                    CommonSchema.Json,
                                    credentials=app_config_name,
                                    name="EventStrSubscribe")
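
Conceptually, a tuple on a `Json` stream travels through the topic as JSON text and is parsed back into a Python dict on the consuming side. The sketch below imitates that round trip with the standard `json` module only. It is an illustration under that assumption, not the code the `streamsx.eventstreams` package runs.

```python
import json

reading = {"sensor_id": "sensor_3", "value": 1234.5, "ts": 1570000000}

# Producer side: the dict becomes a JSON text message
message = json.dumps(reading)

# Consumer side: the message is parsed back into a dict
received = json.loads(message)
print(received == reading)
```

One practical consequence is that every value in a tuple has to be JSON-serializable, which is why the source stores the timestamp as an integer rather than as a `datetime` object.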

## 2.4 Analyze the data

Use a variety of methods in the `Stream` class to analyze your in-flight data, including applying machine learning models.

See the [common operations section](https://ibmstreams.github.io/streamsx.documentation/docs/python/1.6/python-appapi-devguide-4/) of the developer guide and the [documentation on the Stream class](https://ibmstreams.github.io/streamsx.topology/doc/pythondoc/streamsx.topology.topology.html#streamsx.topology.topology.Stream) for more details.



### 2.4.1 Filter data from the `Stream`

Use `Stream.filter()` to pass through only data that match a certain condition.

In [12]:
# in this example, pass through only sensor data from sensor with ID "sensor_3"

sensordata_id3 = sensordata.filter(lambda x: x["sensor_id"] == "sensor_3",
                                   name="SensorsId3")

# you could create another stream of the other sensors:
#sensordata_other = sensordata.filter(lambda x: x["sensor_id"] != "sensor_3", name="OtherSensors")
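
The filter condition is a plain Python function of one tuple, so you can check it against a few hand-written dicts before submitting the job. The sketch below uses made-up sample readings and a hypothetical named function, `is_sensor_3`, that expresses the same condition as the lambda.

```python
def is_sensor_3(reading):
    """Same condition as the lambda passed to Stream.filter."""
    return reading["sensor_id"] == "sensor_3"

# Made-up sample tuples
sample_readings = [
    {"sensor_id": "sensor_3", "value": 12.5, "ts": 1},
    {"sensor_id": "sensor_30", "value": 99.0, "ts": 2},
    {"sensor_id": "sensor_7", "value": 1500.0, "ts": 3},
    {"sensor_id": "sensor_3", "value": 42.0, "ts": 4},
]

# Keep only the tuples for which the condition is true
kept = [r for r in sample_readings if is_sensor_3(r)]
print(kept)
```

Because the comparison is an exact string match, `sensor_30` does not pass the filter.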


## 2.5 Create a `View` to preview the tuples on the `Stream`


A `View` is a connection to a `Stream` that becomes activated when the application is running. We examine the data from within the notebook in section 4, below.


In [13]:
sensor3_view = sensordata_id3.view(name="Sensor3",
                                   description="Sample of sensor with ID sensor_3")

## 2.6 Define output

The `sensordata_id3` stream is our final result. We will use `Stream.publish()` to make this stream available to other Streams applications. 

If you want to send the stream to another database or system, you would use a sink function (similar to the source function) and invoke it using `Stream.for_each`.

You can also use the functions of other Python packages to send the stream to other systems, for example, the Event Store.

In [14]:
import json
# publish results as JSON
sensordata_id3.publish(topic="SensorData",
                       schema=json,
                       name="PublishSensors")

# other options include:
# invoke another sink function:
#sensordata_id3.for_each(func=send_to_db)


<streamsx.topology.topology.Sink at 0x7fc99462c278>
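
The `send_to_db` name in the commented-out line is not defined in this notebook. As a minimal sketch of what such a sink could look like, the hypothetical class below appends each tuple to a file as one JSON line. A real sink would write to your database or other target system instead, and in a distributed job the file would be created on the host that runs the operator.

```python
import json
import os
import tempfile

class JsonLinesSink(object):
    """Hypothetical sink: appends each tuple to a file as one JSON line."""
    def __init__(self, path):
        self.path = path

    def __call__(self, tuple_):
        # Open the file per call so the instance holds only a string
        with open(self.path, "a") as out:
            out.write(json.dumps(tuple_) + "\n")

# Local check, outside of any Streams job
sink_path = os.path.join(tempfile.mkdtemp(), "sensor3.jsonl")
send_to_db = JsonLinesSink(sink_path)
send_to_db({"sensor_id": "sensor_3", "value": 1.0, "ts": 1})
send_to_db({"sensor_id": "sensor_3", "value": 2.0, "ts": 2})

with open(sink_path) as written:
    lines = [json.loads(line) for line in written]
print(len(lines))
```

An instance such as `send_to_db` is what you would pass as `func` to `Stream.for_each`.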

<a name="launch"></a>

# 3. Submit the application
A running Streams application is called a *job*. This next cell submits the application for execution and prints the resulting job id.

In [15]:
from streamsx.topology import context

# disable SSL certificate verification if necessary
cfg[context.ConfigParams.SSL_VERIFY] = False
# submit the topology 'topo'
submission_result = context.submit("DISTRIBUTED", topo, config=cfg)

# the submission_result object contains information about the running application, or job
if submission_result.job:
    streams_job = submission_result.job
    print("JobId: ", streams_job.id , "\nJob name: ", streams_job.name)

IntProgress(value=0, bar_style='info', description='Initializing', max=10, style=ProgressStyle(description_wid…

Insecure host connections enabled.
Insecure host connections enabled.
Insecure host connections enabled.


JobId:  58 
Job name:  StreamsTutorialandTestbed::EventStreamsSample_58


<a name="view"></a>

# 4. Use a `View` to access data from the job
Now that the job is started, use the `View` object you created in section 2 (`sensor3_view`) to start retrieving data from the `Stream`.

In [None]:
# connect to the view and display the data
queue = sensor3_view.start_data_fetch()
try:
    for _ in range(10):
        print(queue.get())
finally:
    sensor3_view.stop_data_fetch()
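
A plain `queue.get()` blocks until a tuple arrives, and on average only one reading in a hundred comes from `sensor_3`. Assuming that `start_data_fetch()` returns a standard `queue.Queue`, you can pass a timeout so that the cell does not hang if no data arrives. The sketch below shows the pattern with a local stand-in queue. The helper name `take` is hypothetical.

```python
import queue

def take(q, count, timeout=5.0):
    """Collect up to `count` items; stop early if none arrives within `timeout` seconds."""
    items = []
    for _ in range(count):
        try:
            items.append(q.get(timeout=timeout))
        except queue.Empty:
            break
    return items

# Stand-in for the queue returned by the view
demo_queue = queue.Queue()
for i in range(3):
    demo_queue.put({"sensor_id": "sensor_3", "value": float(i), "ts": i})

fetched = take(demo_queue, 10, timeout=0.1)
print(len(fetched))
```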

## 4.1 Display the results in real time
Calling `View.display()` from the notebook displays the results of the view in a table that is updated in real time.

In [None]:
# display the results for 30 seconds
sensor3_view.display(duration=30)


## 4.2 See job status 

You can view the job status and logs by going to **My Instances** > **Jobs**. Find your job based on the ID printed above.
Retrieve the job logs using the **Download logs** action from the job's context menu.

To view other information about the job, such as detailed metrics, access the job graph: go to **My Instances** > **Jobs** and select the **View graph** action for the running job.


<a name="cancel"></a>

# 5. Cancel the job

This cell generates a widget you can use to cancel the job.

In [None]:
# cancel the job in the IBM Streams service
submission_result.cancel_job_button()

You can also interact with the job through the [Job](https://streamsxtopology.readthedocs.io/en/stable/streamsx.rest_primitives.html#streamsx.rest_primitives.Job) object returned from `submission_result.job`.

For example, use `streams_job.cancel()` to cancel the running job directly.

# Summary

We started with a `Stream` called `readings`, which contained the data that we published to the Event Streams service. Next, we created a new `Stream`, `sensordata`, by subscribing to the topic in the Event Streams service, kept only the readings of one sensor of interest, and published the filtered stream so that other applications running within our Streams instance can access it.

After submitting the application to the IBM Streams service, we connected to the `sensor3_view` view to see the data of sensor 3 within the notebook.

You may have noticed that the application consists of two independent parts: One part generates the data and publishes it to the Event Streams cloud service. The other part consumes from Event Streams, filters, and publishes the stream within the IBM Streams instance. These two parts can also be declared by using different topologies, and can be submitted as separate jobs.