# IBM Streams database sample application

This sample demonstrates how to create a Streams Python application that connects to a Db2 Warehouse database, runs some SQL statements, and lets you view the results.

In this notebook, you'll see examples of how to:
 1. [Set up your data connections](#setup)
 2. [Create the application](#create)
     - [Drop and create the table](#drop)
     - [Insert streaming data](#insert)
     - [Retrieve data from the table](#select)
 3. [Submit the application](#launch)
 4. [View the results](#view)
 5. [Job status](#status)
 6. [Stop the application](#cancel)


# Overview
**About the sample**

This application demonstrates how to drop a table, create a table, insert rows into it, and retrieve (SELECT) rows from a Db2 Warehouse database.

**How it works**
   
The Python application created in this notebook is submitted to the IBM Streams service for execution. Once the application is running in the service, you can connect to it from the notebook to retrieve the results.

<img src="https://developer.ibm.com/streamsdev/wp-content/uploads/sites/15/2019/04/how-it-works.jpg" alt="How it works">


### Documentation
- [Streams Python development guide](https://ibmstreams.github.io/streamsx.documentation/docs/latest/python/)
- [Streams Python API](https://streamsxtopology.readthedocs.io/)



## <a name="setup"> </a> 1. Setup

### 1.1 Add credentials for the IBM Streams service

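With the cell below selected, click the "Connect to instance" button in the toolbar to insert the credentials for the service. The inserted code defines the `cfg` variable that is used when the application is submitted in step 3.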

<a target="_blank" href="https://developer.ibm.com/streamsdev/wp-content/uploads/sites/15/2019/02/connect_icp4d.gif">See an example</a>.

### 1.2 Install or upgrade `streamsx.database` package

Run the cell below to install the `streamsx.database` package or to upgrade it to the latest version.

In [None]:
import sys
!{sys.executable} -m pip install --user --upgrade streamsx.database

# To install a specific version of the package instead, run this line (replace somever with the version number):
#!{sys.executable} -m pip install --user streamsx.database==somever

In [None]:
import streamsx.database as db
import streamsx.topology.context
print("INFO: streamsx package version: " + streamsx.topology.context.__version__)
print("INFO: streamsx.database package version: " + db.__version__)


### 1.3 Configure the connection to Db2 Warehouse

To connect to a Db2 database, you need the database credentials as a JSON string.
This JSON string contains the database credentials **username**, **password** and **jdbcurl**.

To create Db2 credentials, perform the following steps:

1. Create a Db2 Warehouse service on IBM Cloud. You need an IBM Cloud account to create a Db2 service:

   https://console.bluemix.net/catalog/?search=db2

2. Create a service credential for the Db2 service on IBM Cloud.
3. Copy the credentials to the clipboard.
4. Paste the credentials into the Db2 Warehouse credentials prompt below.

To use a different Db2 database, create a JSON string with the following attributes:

    {
      "username": "your-db-user-name",
      "password": "your-db-password",
      "jdbcurl": "jdbc:db2://your-db2-hostname:50000/your-database-name"
    }
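A malformed paste is easy to miss until the job fails. The following is a minimal sketch, using only the Python standard library, that checks a credentials string for the three attributes above and splits `jdbcurl` into host, port, and database name. The helper `parse_db2_credentials` is hypothetical and is not part of `streamsx.database`; it assumes the `jdbc:db2://host:port/database` form shown above and drops any `:property=value;` suffix after the database name.

```python
import json
from urllib.parse import urlparse

REQUIRED_KEYS = ("username", "password", "jdbcurl")

def parse_db2_credentials(text):
    """Hypothetical helper: parse a Db2 credentials JSON string and split its jdbcurl.

    Raises ValueError if a required key is missing or the URL is not a Db2 JDBC URL.
    """
    creds = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in creds]
    if missing:
        raise ValueError("missing credential keys: " + ", ".join(missing))
    jdbcurl = creds["jdbcurl"]
    if not jdbcurl.startswith("jdbc:db2://"):
        raise ValueError("jdbcurl must start with 'jdbc:db2://'")
    # Strip the 'jdbc:' prefix so urlparse sees an ordinary db2:// URL
    parts = urlparse(jdbcurl[len("jdbc:"):])
    # Keep only the database name; drop optional ':property=value;' settings
    database = parts.path.lstrip("/").split(":", 1)[0]
    return {
        "username": creds["username"],
        "host": parts.hostname,
        "port": parts.port,
        "database": database,
    }

example = ('{"username": "your-db-user-name", "password": "your-db-password", '
           '"jdbcurl": "jdbc:db2://your-db2-hostname:50000/your-database-name"}')
print(parse_db2_credentials(example))
```

Calling it on `db2_service_credentials` after the prompt below catches a missing key before the application is submitted.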

In [None]:
import getpass
db2_service_credentials=getpass.getpass('Db2 Warehouse credentials:')

## <a name="create"> </a> 2. Create the application
All Streams applications start with a Topology object, so start by creating one:


In [None]:
# Imports
from streamsx.topology.topology import Topology
from streamsx.topology.schema import StreamSchema
import streamsx.database as db
import json

# Parse the database credentials JSON string into a dictionary
db2credentials = json.loads(db2_service_credentials)

# create a Topology object
topo = Topology(name="database")



### How to use the streamsx.database package
To interact with the database from Streams, you pass SQL statements to the `streamsx.database.run_statement` function, either as the tuples of its input stream or through its `sql` parameter.

For example, this application executes SQL statements that:
- drop the Db2 table, if it exists.
- create a new table in the Db2 database.
- insert some rows into the table.
- select all rows from the table.


### Define the SQL statements and table name

In [None]:
table_name = 'RUN_SAMPLE_DEMO'

# SQL statements
sql_drop   = 'DROP TABLE ' + table_name
sql_create = 'CREATE TABLE ' + table_name + ' (ID INT, NAME CHAR(30), AGE INT)'
sql_insert = 'INSERT INTO ' + table_name + ' (ID, NAME, AGE) VALUES (? , ?, ?)'
sql_select = 'SELECT * FROM ' + table_name

## <a name="drop"> </a> 2.1. Drop and create the table
**run_statement** is the main function of the **streamsx.database** package, which is the Python wrapper for the [streamsx.jdbc](https://ibmstreams.github.io/streamsx.jdbc/doc/spldoc/html) toolkit.

It requires at least two parameters: the input stream, and the database credentials (here, the dictionary parsed from the JSON string).

In the following cell, `topo.source` creates a stream that emits the two SQL statements, and `db.run_statement` uses this stream as its input to drop the table and then create a new one.

In [None]:
# streamToCreateTable is a source stream that emits two strings: sql_drop and sql_create
streamToCreateTable = topo.source([sql_drop, sql_create]).as_string()
# Drop the table (if it exists) and create a new table in the database
createTableResults = db.run_statement(name="CREATE_TABLE", stream=streamToCreateTable, credentials=db2credentials)
createTableResults.print()
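To see what happens to this input stream without a Db2 instance, here is a local sketch that executes the same two statements, one per tuple, against an in-memory SQLite database from the Python standard library. SQLite is swapped in for Db2 purely for illustration, `run_statements` is a hypothetical helper, and its log-and-continue handling of a failing statement is an assumption of the sketch, not documented behavior of the JDBCRun operator. On a fresh database the DROP fails because the table does not exist yet, while the CREATE succeeds.

```python
import sqlite3

table_name = 'RUN_SAMPLE_DEMO'
sql_drop = 'DROP TABLE ' + table_name
sql_create = 'CREATE TABLE ' + table_name + ' (ID INT, NAME CHAR(30), AGE INT)'

def run_statements(conn, statements):
    """Hypothetical helper: execute each SQL string in order, as if each
    string tuple of the input stream carried one statement.
    Returns a list of (statement, status) pairs."""
    results = []
    for sql in statements:
        try:
            conn.execute(sql)
            conn.commit()
            results.append((sql, "ok"))
        except sqlite3.Error as err:
            # Assumption for this sketch: record the failure and keep going
            results.append((sql, "error: " + str(err)))
    return results

conn = sqlite3.connect(":memory:")
for sql, status in run_statements(conn, [sql_drop, sql_create]):
    print(sql, "->", status)
```

Running the same list a second time on the same connection makes both statements succeed, which matches the intent of dropping and re-creating the table on every run.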


## <a name="insert"> </a> 2.2. Insert streaming data into the table

Generate a stream of data and insert it into the table we created.

**run_statement** is based on the JDBCRun operator of the [streamsx.jdbc](https://ibmstreams.github.io/streamsx.jdbc/doc/spldoc/html) toolkit.

The function **generate_data()** generates records as Python dictionaries with the keys ID, NAME, and AGE (an integer, a string, and an integer).

We have to convert the output of generate_data() to a Streams tuple schema, because the **JDBCRun** operator in the **streamsx.jdbc** toolkit accepts only SPL data types.

For a list of the SPL primitive data types, see [IBM Streams Data types](https://www.ibm.com/support/knowledgecenter/en/SSCRJU_4.3.0/com.ibm.streams.ref.doc/doc/primitivetypes.html).
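The `.map(...)` call in the next code cell turns each dictionary into a tuple whose values follow the attribute order of `tuple<int32 ID, rstring NAME, int32 AGE>`. The plain-Python sketch below shows that idea and also checks that each value fits its SPL type (`int32` is a 32-bit signed integer, whereas a Python `int` is unbounded). The helpers `schema_attributes` and `to_spl_tuple` are hypothetical and handle only flat schemas of the two types used here; in the application, the conversion is done by the Streams runtime.

```python
import re

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def schema_attributes(schema):
    """Hypothetical helper: return (type, name) pairs from a flat SPL schema
    string such as 'tuple<int32 ID, rstring NAME, int32 AGE>'.
    Nested or parameterized types are not supported."""
    match = re.fullmatch(r"\s*tuple<(.*)>\s*", schema)
    if match is None:
        raise ValueError("not a tuple schema: " + schema)
    pairs = []
    for attr in match.group(1).split(","):
        spl_type, name = attr.split()
        pairs.append((spl_type, name))
    return pairs

def to_spl_tuple(schema, record):
    """Hypothetical helper: order a dict's values by the schema's attribute
    order, checking int32 range and rstring type."""
    values = []
    for spl_type, name in schema_attributes(schema):
        value = record[name]
        if spl_type == "int32":
            if not isinstance(value, int) or not INT32_MIN <= value <= INT32_MAX:
                raise ValueError(name + " does not fit in int32")
        elif spl_type == "rstring":
            if not isinstance(value, str):
                raise ValueError(name + " must be a str for rstring")
        values.append(value)
    return tuple(values)

print(to_spl_tuple("tuple<int32 ID, rstring NAME, int32 AGE>",
                   {"NAME": "Name_42", "ID": 0, "AGE": 33}))
```

The dictionary keys can arrive in any order; only the schema decides the position of each value.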


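**genData** is also a topology source stream. It takes the records from **generate_data()** and maps each one to a tuple with three attributes, which supply the values for the INSERT statement.

In the previous step, **run_statement** used **streamToCreateTable** as its input stream, and each tuple was a complete SQL statement. In this step it uses **genData** as its input stream: the statement is given by the `sql` parameter, and `sql_params` lists, in order, the attributes whose values fill the `?` parameter markers.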


In [None]:
import random
import time

# Generate records with the keys ID, NAME and AGE
def generate_data():
    for counter in range(0, 5000):
        # yield a sequential ID with a random name and age
        yield {"NAME": "Name_" + str(random.randint(0, 500)), "ID": counter, "AGE": random.randint(10, 99)}
        # pace the source at no more than about 10 records per second
        time.sleep(0.10)

# SPL schema of the tuples passed to run_statement
tuple_schema = StreamSchema("tuple<int32 ID, rstring NAME, int32 AGE>")
# Map each record to a tuple with three attributes; each attribute name matches a column of the Db2 table
genData = topo.source(generate_data, name="GeneratedData").map(lambda tpl: (tpl["ID"], tpl["NAME"], tpl["AGE"]), schema=tuple_schema)

genData.print()

# Insert the generated rows into the table.
# sql and sql_params correspond to the statement and statementParamAttrs parameters of the streamsx.jdbc JDBCRun operator
insertResults = db.run_statement(name="INSERT", stream=genData, sql=sql_insert, sql_params="ID, NAME, AGE", credentials=db2credentials)
insertResults.print()
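`sql_params="ID, NAME, AGE"` names the input attributes whose values fill the `?` parameter markers of `sql_insert`, in order. The sketch below reproduces that binding locally with SQLite standing in for Db2: a hypothetical `bind_params` helper picks the named values out of each record (modeled as a dictionary) and passes them to `executemany`. The parameter-marker syntax is the same in both databases, but this is an illustration of the idea, not the operator's implementation.

```python
import sqlite3

table_name = 'RUN_SAMPLE_DEMO'
sql_create = 'CREATE TABLE ' + table_name + ' (ID INT, NAME CHAR(30), AGE INT)'
sql_insert = 'INSERT INTO ' + table_name + ' (ID, NAME, AGE) VALUES (? , ?, ?)'
sql_select = 'SELECT * FROM ' + table_name

def bind_params(sql_params, record):
    """Hypothetical helper: return the values for the ? markers,
    in the order the attribute names are listed in sql_params."""
    names = [name.strip() for name in sql_params.split(",")]
    return tuple(record[name] for name in names)

rows = [
    {"ID": 0, "NAME": "Name_17", "AGE": 25},
    {"ID": 1, "NAME": "Name_301", "AGE": 64},
]

conn = sqlite3.connect(":memory:")
conn.execute(sql_create)
conn.executemany(sql_insert, (bind_params("ID, NAME, AGE", r) for r in rows))
conn.commit()
print(conn.execute(sql_select).fetchall())
```

One difference to keep in mind: SQLite does not pad `CHAR(30)` values, whereas Db2 stores `CHAR(30)` as a fixed-length column padded with blanks, so names read back from Db2 can carry trailing spaces.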



## <a name="select"> </a> 2.3. Retrieve data from the table
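In this step, **run_statement** runs the SQL statement `SELECT * FROM RUN_SAMPLE_DEMO` and returns the results with the tuple schema `tuple<int32 ID, rstring NAME, int32 AGE>`. Because **genData** is the input stream, the SELECT statement runs once for each generated tuple.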

In [None]:
# Select all rows from the table; the query runs once per genData tuple
selectResults = db.run_statement(name="SELECT", schema='tuple<int32 ID, rstring NAME, int32 AGE>', stream=genData, sql=sql_select, credentials=db2credentials)
selectResults.print()

# Create a view to inspect the rows retrieved from the table
selectView = selectResults.view(name="selectRecords", description="Sample of selected records")
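Since the SELECT runs once per `genData` tuple, and each execution emits one output tuple per row returned, the selected stream grows much faster than the input. The sketch below models this with SQLite in a single thread, assuming each SELECT runs right after the INSERT of the same tuple. In the real job the INSERT and SELECT operators run independently, so actual counts will differ. `simulate` is a hypothetical helper, and SQLite stands in for Db2.

```python
import sqlite3

table_name = 'RUN_SAMPLE_DEMO'
sql_create = 'CREATE TABLE ' + table_name + ' (ID INT, NAME CHAR(30), AGE INT)'
sql_insert = 'INSERT INTO ' + table_name + ' (ID, NAME, AGE) VALUES (?, ?, ?)'
sql_select = 'SELECT * FROM ' + table_name

def simulate(n_tuples):
    """Hypothetical helper: insert n_tuples rows, running the SELECT after
    each insert, and return the total number of rows the SELECTs emit."""
    conn = sqlite3.connect(":memory:")
    conn.execute(sql_create)
    emitted = 0
    for i in range(n_tuples):
        conn.execute(sql_insert, (i, "Name_" + str(i), 30))
        # One output tuple per row currently in the table
        emitted += len(conn.execute(sql_select).fetchall())
    conn.close()
    return emitted

for n in (5, 50, 500):
    print(n, "input tuples ->", simulate(n), "selected rows")
```

Under this model the total is n(n+1)/2 rows, so the 5000 tuples produced by `generate_data()` would yield 12,502,500 selected rows. The view therefore samples a stream that keeps growing for as long as the job runs.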


## <a name="launch"> </a> 3. Submit the application

A running Streams application is called a *job*. This next cell submits the application for execution and prints the resulting job id.

In [None]:
from streamsx.topology import context

# cfg holds the service instance details inserted by "Connect to instance" in step 1.1
# Disable SSL certificate verification if necessary
cfg[context.ConfigParams.SSL_VERIFY] = False
# Submit the topology 'topo'
submission_result = context.submit("DISTRIBUTED", topo, config=cfg)

# The submission_result object contains information about the running application, or job
if submission_result.job:
    streams_job = submission_result.job
    print("JobId: ", streams_job.id, "\nJob name: ", streams_job.name)

## <a name="view"> </a> 4. Use the View to access data from the job

Now that the job is running, use the View object created earlier to retrieve the rows selected from the database table.

In [None]:
# Connect to the view and display the selected data
queue = selectView.start_data_fetch()
try:
    for val in range(20):
        print(queue.get())    
finally:
    selectView.stop_data_fetch()
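`get()` without a timeout blocks until a tuple arrives, so the loop above can hang if the job produces fewer than 20 tuples. A variant with a timeout is sketched below. It assumes the object returned by `start_data_fetch()` behaves like Python's `queue.Queue` (supports `get(timeout=...)` and raises `queue.Empty`). The helper `fetch_up_to` is hypothetical, and the demo feeds it a local `queue.Queue` instead of a live view.

```python
import queue

def fetch_up_to(q, count, timeout_seconds=10.0):
    """Hypothetical helper: return up to `count` items from q, stopping
    early if no item arrives within timeout_seconds."""
    items = []
    for _ in range(count):
        try:
            items.append(q.get(timeout=timeout_seconds))
        except queue.Empty:
            break
    return items

# Local demo: a queue holding only 3 items, fetched with a short timeout
demo = queue.Queue()
for record in ({"ID": 0}, {"ID": 1}, {"ID": 2}):
    demo.put(record)
print(fetch_up_to(demo, 20, timeout_seconds=0.1))
```

With a live view, pass it the object returned by `selectView.start_data_fetch()` and keep the `try`/`finally` that calls `selectView.stop_data_fetch()`. If you import the `queue` module in the notebook, store the view's queue under another name so that the module is not shadowed.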

## <a name="status"> </a> 5. See job status

You can view job status and logs by going to My Instances > Jobs. Find your job based on the id printed above. Retrieve job logs using the "Download jobs" action from the job's context menu.

To view other information about the job such as detailed metrics, access the Streams Console. Go to My Instances > Provisioned Instances. Select the Streams instance and open the URL listed under externalConsoleEndpoint or serviceConsoleEndpoint.



## <a name="cancel"></a> 6. Cancel the job

This cell generates a widget you can use to cancel the job.


In [None]:
#cancel the job in the IBM Streams service
submission_result.cancel_job_button()

You can also interact with the job through the Job object returned by `submission_result.job`.

For example, call `streams_job.cancel()` to cancel the running job directly.

## Summary

We started a Streams job that connected to a Db2 database, dropped and re-created a table, inserted rows into it, and read the rows back.

After submitting the application to the Streams service, we used a view to retrieve the selected rows and checked the job status and logs to follow its progress.

You can also check the contents of the test table from the Db2 command line with the following command:

      db2 "SELECT * FROM RUN_SAMPLE_DEMO"
