# Introduction to IBM Db2 Event Store API

IBM Db2 Event Store is a hybrid transactional/analytical processing (HTAP) system. It extends the Spark SQL interface to support transactions and accelerate analytics queries. This notebook includes examples of using the Scala client interface to create a database and a table. It also shows how to insert and query data in IBM Db2 Event Store by using Spark SQL.

> When you finish this demo, you will know how to manage and query data using IBM Db2 Event Store.    

This notebook runs on Scala 2.10 (and later versions) with Spark.


## Table of contents
1. [Define a database](#define-database)<br>
   1.1 [Open an existing database](#open-existing-db)<br>
2. [Create your table](#create-table)<br>
   2.1 [Define a schema for the table](#define-schema)<br>
   2.2 [Create the table](#create-table-two)<br>
   2.3 [Get a schema reference for the resolved table](#schema-reference)<br>
3. [Generate and insert data rows](#generate-insert-data)<br>
4. [Query the table](#query-table)<br>
   4.1 [Create sqlContext using EventSession](#create-sqlContext)<br>
   4.2 [Prepare a DataFrame for the query](#prepare-DataFrame)<br>
   4.3 [Run the SQL query](#run-query)<br>
5. [Drop the table](#drop-table)<br>
    


In [None]:
import com.ibm.event.common.ConfigurationReader

<a id="connect-to-es"></a>
### 1. Set up the connection to IBM Db2 Event Store

To establish a connection to IBM Db2 Event Store, you need connection endpoints. The configuration reader provides a set of APIs for IBM Db2 Event Store connection and configuration.

Specify the JDBC endpoint together with the connection endpoints by providing a connection string in the following format:

`# ConfigurationReader.setConnectionEndpoints("<ProxyIP>:<JDBCPort>;<HostIP1>:<ConnectionPort1>,<HostIP2>:<ConnectionPort2>,<HostIP3>:<ConnectionPort3>")` 

ICP4Data users can find the connection string in the UI.
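As a rough illustration of the format above, the following plain Scala sketch assembles such a connection string from its parts. The `Endpoint` case class, the `buildConnectionString` helper, and all addresses and ports are hypothetical placeholders introduced only for this example; they are not part of the IBM Db2 Event Store API.

```scala
// Hypothetical helper type for this sketch; not part of the Event Store API.
final case class Endpoint(host: String, port: Int) {
  def render: String = s"$host:$port"
}

// Builds "<ProxyIP>:<JDBCPort>;<HostIP1>:<ConnectionPort1>,<HostIP2>:<ConnectionPort2>,..."
def buildConnectionString(jdbc: Endpoint, engines: Seq[Endpoint]): String = {
  require(engines.nonEmpty, "at least one connection endpoint is required")
  jdbc.render + ";" + engines.map(_.render).mkString(",")
}

// Placeholder addresses and ports; substitute the values for your own cluster.
val connectionString = buildConnectionString(
  Endpoint("192.0.2.10", 18730),
  Seq(Endpoint("192.0.2.11", 1101), Endpoint("192.0.2.12", 1101), Endpoint("192.0.2.13", 1101))
)
println(connectionString)

// In this notebook, the resulting string would then be passed to:
// ConfigurationReader.setConnectionEndpoints(connectionString)
```

Building the string from typed parts makes it harder to mix up the `;` that separates the JDBC endpoint from the `,`-separated connection endpoints.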

You also need to set the user ID and password that will be used to connect to the IBM Db2 Event Store instance.


In [None]:
// Using the configuration reader API, set up the userID and password that will be used to connect to IBM Db2 Event Store.

ConfigurationReader.setEventUser("<userid>")

ConfigurationReader.setEventPassword("<password>")

<a id="define-database"></a>
## 1. Define a database  
Only one database can be active at a time in IBM Db2 Event Store. If you already have a database, you don't need to create one.


<a id="open-existing-db"></a>
###  1.1 Open an existing database
To use an existing database, use the following call:

In [None]:
import com.ibm.event.oltp.EventContext
val eContext = EventContext.getEventContext("EVENTDB")

<a id="create-table"></a>
## 2. Create your table

<a id="define-schema"></a>
### 2.1 Define a schema for the table
To create a new table, you must first specify a schema for the table.
Specify the columns, sharding key, and primary key, as required.

In [None]:
import org.apache.spark.sql.types._
import com.ibm.event.catalog.TableSchema
val reviewSchema = TableSchema("ReviewTable", 
       StructType(Array(
          StructField("userId", LongType, nullable = false),
          StructField("categoryId", IntegerType, nullable = false),
          StructField("productName", StringType, nullable = false),
          StructField("boolfield", BooleanType, nullable = false),
          StructField("boolfield2", BooleanType, nullable = true),
          StructField("duration", IntegerType, nullable = false ),
          StructField("review", StringType, nullable = false))),
        shardingColumns = Seq("userId"), pkColumns = Seq("userId"))

<b>Tip:</b> Databases in IBM Db2 Event Store are partitioned into shards. Any IBM Db2 Event Store node of a multi-node IBM Db2 Event Store cluster contains 0, 1 or N shards of the defined database. In addition to the mandatory shard key, there is also the option to provide a primary key. When this key is defined, IBM Db2 Event Store ensures that only a single version of each primary key exists in the database.

In the above example, both the sharding key and the primary key are defined on the `userId` column.
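The two ideas in the tip, rows distributed across shards by a sharding key and a single version kept per primary key, can be pictured with a small conceptual model in plain Scala. This is only a toy under stated assumptions: the modulo-based shard assignment, the shard count, the `Review` case class, and the "last row wins" rule are all made up for illustration and do not describe how IBM Db2 Event Store is actually implemented.

```scala
// Conceptual model only: this is NOT how IBM Db2 Event Store is implemented.
final case class Review(userId: Long, review: String)

val numShards = 3 // hypothetical shard count

// Assign a row to a shard from its sharding key (userId) with a simple modulo (an assumption).
def shardFor(userId: Long): Int =
  java.lang.Math.floorMod(userId, numShards.toLong).toInt

val incoming = Seq(
  Review(1L, "first version"),
  Review(2L, "good"),
  Review(1L, "second version"), // same primary key as the first row
  Review(5L, "fine")
)

// Primary-key semantics in this toy: keep one version per userId (here, the last one seen).
val latestByKey: Map[Long, Review] =
  incoming.foldLeft(Map.empty[Long, Review])((acc, r) => acc.updated(r.userId, r))

// Group the surviving rows by the shard they would land on.
val rowsByShard: Map[Int, Seq[Review]] =
  latestByKey.values.toSeq.groupBy(r => shardFor(r.userId))

rowsByShard.toSeq.sortBy(_._1).foreach { case (shard, rows) =>
  println(s"shard $shard holds userIds ${rows.map(_.userId).sorted.mkString(", ")}")
}
```

Because the sharding key and the primary key are the same column in the example schema, all versions of a given `userId` map to the same shard in this model.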

<a id="create-table-two"></a>
### 2.2 Create the table
Create the IBM Db2 Event Store table based on the above, unresolved schema.

In [None]:
eContext.createTable(reviewSchema)

<a id="schema-reference"></a>
### 2.3 Get a schema reference for the resolved table
To perform insert operations, a reference to the resolved table is needed. 

A resolved table contains additional metadata that is maintained and used by the IBM Db2 Event Store engine.

In [None]:
val reviewTable = eContext.getTable("ReviewTable")

<a id="generate-insert-data"></a>
## 3. Generate and insert data rows 
You can insert single rows of data or perform batch inserts to insert multiple rows of data.
A single-row insert can be synchronous or asynchronous. Batch inserts are always performed asynchronously.

In the example below, random data is generated using a data generator. The data is then sent to the IBM Db2 Event Store engine in a batch, asynchronously. 

In [None]:
import sys.process._
import scala.concurrent.{Await, Future}
import scala.concurrent.duration.Duration
import com.ibm.event.example.DataGenerator
import com.ibm.event.oltp.InsertResult

/** Insert generated rows asynchronously in batches */
val numRowsPerBatch = 1000
val numBatches = 1000
var failCount = 0
val startTime = System.currentTimeMillis()
for (i <- 1 to numBatches) {
    val batch = DataGenerator.generateRows(reviewSchema.schema, numRowsPerBatch, 0, false).toIndexedSeq
    val future: Future[InsertResult] = eContext.batchInsertAsync(reviewTable, batch)
    // Block until this batch has completed before submitting the next one.
    val result: InsertResult = Await.result(future, Duration.Inf)

    if (result.failed) {
        println(s"batch insert incomplete: $result")
        // Count every row of a failed batch as failed.
        failCount += numRowsPerBatch
    }
    else if (i % 100 == 0) {
        println(s"First $i batches successfully inserted")
    }
}
// Report only the rows from batches that did not fail as ingested.
val numRowsAttempted = numBatches * numRowsPerBatch
val numRowsInserted = numRowsAttempted - failCount
println(s"Ingested $numRowsInserted of $numRowsAttempted rows")
val timeInserting = (System.currentTimeMillis() - startTime) / 1000.0
println(s"Ingest took $timeInserting seconds - ${numRowsInserted / timeInserting} inserts per second. $failCount inserts failed")


The asynchronous `batchInsertAsync` API is provided on the `EventContext` instance.
The rows are supplied as an `IndexedSeq[Row]`, where `Row` is a Spark SQL row object that matches the `StructType` of the resolved table schema. The caller can immediately submit new inserts or wait for the operation to complete.
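The choice between submitting new inserts immediately and waiting for completion can be sketched with plain Scala futures. The example below does not call IBM Db2 Event Store: `FakeInsertResult` and `fakeBatchInsertAsync` are hypothetical stand-ins for `InsertResult` and the asynchronous batch insert call, and the rule that empty batches fail is invented for the demonstration.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object AsyncInsertSketch {
  // Hypothetical stand-in for the result of an asynchronous batch insert.
  final case class FakeInsertResult(batchId: Int, failed: Boolean)

  // Hypothetical stand-in for an asynchronous batch insert call:
  // empty batches "fail", all others "succeed".
  def fakeBatchInsertAsync(batchId: Int, rows: IndexedSeq[Int]): Future[FakeInsertResult] =
    Future(FakeInsertResult(batchId, failed = rows.isEmpty))

  // Submit every batch up front, then block once until all of them have completed.
  def submitAll(batches: Seq[IndexedSeq[Int]]): Seq[FakeInsertResult] = {
    val pending = batches.zipWithIndex.map { case (rows, i) => fakeBatchInsertAsync(i + 1, rows) }
    Await.result(Future.sequence(pending), 30.seconds)
  }
}

val batches = Seq((1 to 10).toIndexedSeq, IndexedSeq.empty[Int], (11 to 20).toIndexedSeq)
val results = AsyncInsertSketch.submitAll(batches)
val failedBatches = results.filter(_.failed).map(_.batchId)
println(s"${results.size} batches completed, failed batch ids: $failedBatches")
```

Compared with awaiting each future inside the loop, this pattern keeps several batches in flight at once. Whether that improves ingest throughput against a real cluster is something to measure rather than assume.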

<a id="query-table"></a>
## 4. Query the table 

<a id="create-sqlContext"></a>
### 4.1 Create sqlContext using EventSession

To run a Spark SQL query, you need to establish an IBM Db2 Event Store Spark session (`EventSession`), which is referred to here as `sqlContext`.

In [None]:
import java.io.File
import com.ibm.event.oltp.EventContext
import org.apache.log4j.{Level, LogManager, Logger}
import org.apache.spark._
import org.apache.spark.sql.ibm.event.EventSession

val sqlContext = new EventSession(spark.sparkContext, "EVENTDB")

<a id="prepare-DataFrame"></a>
### 4.2 Prepare a DataFrame for the query 
The following calls load the IBM Db2 Event Store table as a DataFrame, register it under a temporary name, and prepare a DataFrame that will hold the query results.

In [None]:
val table = sqlContext.loadEventTable("ReviewTable")
table.registerTempTable("ReviewTable")
val resultSet = sqlContext.sql("select count(*) as totalRows from ReviewTable")

<a id="run-query"></a>
### 4.3 Run the SQL query
Now you can materialize the DataFrame associated with the SQL query by using either `show()` or the `%%dataframe` pretty-print magic.

In [None]:
resultSet.show()

In [None]:
%%dataframe resultSet

<a id="drop-table"></a>
## 5. Drop the table 

In [None]:
// Use the same name that the table was created with ("ReviewTable").
eContext.dropTable("ReviewTable")

<a id="summary"></a>
## Summary
This demo introduced you to the IBM Db2 Event Store API for managing and querying data.

## References
* [IBM Db2 Event Store documentation](https://www.ibm.com/support/knowledgecenter/SSGNPV)

<hr>
Copyright &copy; IBM Corp. 2017. Released as licensed Sample Materials.