# IBM Db2 Event Store - Table creation
IBM Db2 Event Store is a hybrid transactional/analytical processing (HTAP) system designed for IoT workloads. It is powered by the Db2 Common SQL Engine, a SQL-based analytics query engine, which lets IBM Db2 Event Store handle complex queries quickly and efficiently.

In this lab, you will explore best practices for IBM Db2 Event Store. You will learn:
- Database creation with IBM Db2 Event Store 2.0
- Best practices for table definition
- Best practices for indexing a table 

In [1]:
# import event store's Python client interface libraries
from eventstore.oltp import EventContext
from eventstore.sql import EventSession
from pyspark.sql import SparkSession
from eventstore.common import ConfigurationReader

<a id="connect-to-es"></a>
### 1. Set up connection to IBM Db2 Event Store

**In this demo, we assume your IBM Db2 Event Store is installed with Watson Studio Local (WSL).**

You will need to set the Watson Studio Local `userID` and `password` that will be used to connect to the IBM Db2 Event Store instance.

By default, the connection is established to the IBM Db2 Event Store instance on the current Watson Studio Local cluster.

For more details on setting up IBM Db2 Event Store connection in Jupyter Notebook, please read the official documentation:
https://www.ibm.com/support/knowledgecenter/en/SSGNPV_2.0.0/dsx/jupyter_prereq.html

In [2]:
# Using the configuration reader API, set up the userID and password that 
# will be used to connect to IBM Db2 Event Store.

ConfigurationReader.setEventUser("admin")
ConfigurationReader.setEventPassword("password")
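
Hard-coded credentials are convenient for a demo but easy to leak when a notebook is shared. The following is a minimal sketch of reading them from environment variables instead. The variable names `ES_USER` and `ES_PASSWORD` and the helper `load_credentials` are assumptions made for illustration; they are not part of the Event Store client.

```python
import os


def load_credentials(env=None):
    """Return (user, password) read from an environment-like mapping.

    ES_USER and ES_PASSWORD are hypothetical variable names chosen for
    this sketch.
    """
    env = os.environ if env is None else env
    try:
        return env["ES_USER"], env["ES_PASSWORD"]
    except KeyError as missing:
        raise RuntimeError(
            "missing environment variable: {}".format(missing.args[0])
        )


# In the notebook, the result would be passed to the configuration reader
# (left commented because it needs the Event Store client):
# user, password = load_credentials()
# ConfigurationReader.setEventUser(user)
# ConfigurationReader.setEventPassword(password)
```

Passing a plain dictionary as `env` makes the helper easy to try without touching the real environment.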

### 2. Connect to the database
**An IBM Db2 Event Store 2.0 instance has a database named `EVENTDB` created by default, and this default database should not be deleted. Each IBM Db2 Event Store 2.0 instance supports exactly ONE database.**

In [3]:
dbName = "EVENTDB"

To run Spark SQL queries, you must set up a Db2 Event Store Spark session. SparkSession is the entry point to programming Spark with the Dataset and DataFrame API. The EventSession class extends the optimizer of the SparkSession class.

In [4]:
sparkSession = SparkSession.builder.appName("EventStore SQL in Python").getOrCreate()
eventSession = EventSession(sparkSession.sparkContext, dbName)

Now you can execute the command to connect to the database in the event session you created:

In [5]:
eventSession.open_database()

### 3. Exploring the database by retrieving all tables

The following code section retrieves the names of all tables that exist in the database.

In [6]:
with EventContext.get_event_context(dbName) as ctx:
   print("Event context successfully retrieved.")
   # Query the catalog while the context is still open
   table_names = ctx.get_names_of_tables()
   for name in table_names:
      print(name)

Event context successfully retrieved.
ADMIN.DHT_TABLE
ADMIN.SDS_TABLE
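
The names returned above are qualified with a schema (`ADMIN`). As a small sketch of how such a list could be checked for a particular table before creating it, the hypothetical helpers `split_qualified` and `table_exists` below work on plain strings, using the two names from the output above. Upper-casing the requested name assumes the catalog stores unquoted names in uppercase.

```python
def split_qualified(name):
    """Split 'SCHEMA.TABLE' into (schema, table); schema is None if absent."""
    schema, sep, table = name.rpartition(".")
    return (schema if sep else None), table


def table_exists(table_names, table):
    """Return True if any qualified name refers to the given table.

    The requested name is upper-cased on the assumption that the catalog
    stores unquoted names in uppercase.
    """
    wanted = table.upper()
    return any(split_qualified(n)[1] == wanted for n in table_names)


listed = ["ADMIN.DHT_TABLE", "ADMIN.SDS_TABLE"]
print(table_exists(listed, "IOT_TEMP"))   # False
print(table_exists(listed, "dht_table"))  # True
```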


### 4. Creating a table with an index

As the output above shows, the database currently contains only the `ADMIN.DHT_TABLE` and `ADMIN.SDS_TABLE` tables. We will list the tables again after creating a new one to confirm that it was created. The next cell defines the name of the table we want to create:

In [7]:
tabName = "IOT_TEMP"

In [8]:
from eventstore.catalog import TableSchema
from pyspark.sql.types import *

tabSchema = TableSchema(tabName, StructType([
    StructField("deviceID", IntegerType(), nullable = False),
    StructField("sensorID", IntegerType(), nullable = False),
    StructField("ts", LongType(), nullable = False),
    StructField("ambient_temp", DoubleType(), nullable = False),
    StructField("power", DoubleType(), nullable = False),
    StructField("temperature", DoubleType(), nullable = False)
    ]),
    sharding_columns = ["deviceID", "sensorID"],
    pk_columns = ["deviceID", "sensorID", "ts"]
                       )
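
A misspelled column in `sharding_columns` or `pk_columns` is easy to miss. The sketch below shows one way to sanity-check the key definitions against the field names before calling the client. The helper `check_key_columns` is hypothetical and works on plain lists of names; it does not reproduce any validation that Event Store itself performs.

```python
def check_key_columns(field_names, sharding_columns, pk_columns):
    """Return a list of problems in the key definitions (empty if none)."""
    problems = []
    known = set(field_names)
    for label, cols in (("sharding", sharding_columns),
                        ("primary key", pk_columns)):
        for col in cols:
            if col not in known:
                problems.append(
                    "{} column {!r} is not a field of the table".format(label, col)
                )
    return problems


# Field names taken from the table schema defined above
fields = ["deviceID", "sensorID", "ts", "ambient_temp", "power", "temperature"]

print(check_key_columns(fields,
                        ["deviceID", "sensorID"],
                        ["deviceID", "sensorID", "ts"]))  # []
```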

The following cell defines the index schema. It uses two equality columns (`deviceID` and `sensorID`), sorts the entries by timestamp in descending order, and includes the `temperature` column to speed up queries that retrieve temperature readings:

In [9]:
from eventstore.catalog import IndexSpecification, SortSpecification, ColumnOrder

indexSchema = IndexSpecification(
          index_name=tabName + "Index",
          table_schema=tabSchema,
          equal_columns = ["deviceID", "sensorID"],
          sort_columns = [
            SortSpecification("ts", ColumnOrder.DESCENDING_NULLS_LAST)],
          include_columns = ["temperature"]
        )
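
To make the roles of the three column groups concrete, here is a conceptual sketch in plain Python. It is not how Event Store implements its index; it only mimics the layout the specification describes: entries are located by the equality columns, ordered by `ts` descending, and carry the included `temperature` value so that a "latest reading" lookup never touches the full row. The sample rows are invented for illustration.

```python
from collections import defaultdict


def build_index(rows):
    """Group rows by (deviceID, sensorID), newest ts first, keeping only temperature."""
    index = defaultdict(list)
    for row in rows:
        key = (row["deviceID"], row["sensorID"])            # equality columns
        index[key].append((row["ts"], row["temperature"]))  # sort + include columns
    for entries in index.values():
        entries.sort(key=lambda entry: entry[0], reverse=True)  # ts descending
    return index


def latest_temperature(index, device_id, sensor_id):
    """Return the most recent temperature for a device/sensor pair, or None."""
    entries = index.get((device_id, sensor_id))
    return entries[0][1] if entries else None


# Invented sample rows with the same columns as the table schema
rows = [
    {"deviceID": 1, "sensorID": 1, "ts": 100, "ambient_temp": 20.0, "power": 5.0, "temperature": 40.0},
    {"deviceID": 1, "sensorID": 1, "ts": 200, "ambient_temp": 21.0, "power": 5.5, "temperature": 42.5},
    {"deviceID": 2, "sensorID": 1, "ts": 150, "ambient_temp": 19.0, "power": 4.0, "temperature": 38.0},
]

index = build_index(rows)
print(latest_temperature(index, 1, 1))  # 42.5
```

Because the entries for each key are already sorted newest first, the latest reading is simply the first entry, which is the kind of access pattern the descending sort column is meant to serve.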

Finally, the following cell creates the table with the index using the `create_table_with_index` method, passing both the table schema and the index schema defined above:

In [10]:
with EventContext.get_event_context(dbName) as ctx:
   res = ctx.create_table_with_index(tabSchema,indexSchema)

To drop a table, use the `drop_table` method. The cell below is commented out and provided only as a reference:

In [11]:
# with EventContext.get_event_context("EVENTDB") as ctx:
#     ctx.drop_table(tabName)

### 5. Loading the tables and inspecting the table schemas


To manipulate or retrieve data from tables, you need to load the corresponding tables and get data frame references that your queries can use. The following code lists all tables in the database again, which confirms that the new table was created.

In [12]:
with EventContext.get_event_context(dbName) as ctx:
    print("tables: ")
    table_names = ctx.get_names_of_tables()
    for idx, name in enumerate(table_names):
        print("\t{}: {}".format(idx, name))

tables: 
	0: ADMIN.DHT_TABLE
	1: ADMIN.IOT_TEMP
	2: ADMIN.SDS_TABLE


Then the following cell can be used to show the schema of the table created. 

**In IBM Db2 Event Store 2.0, the following objects have case-insensitive names by default: schema, column and index.**  
All user-provided names for these objects are converted to uppercase in the catalog. Users can explicitly enable case sensitivity by wrapping the object name in single quotes, for example, ` '<Name>' `

In [13]:
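with EventContext.get_event_context(dbName) as ctx:
    try:
        resolved_table_schema = ctx.get_table(tabName)
        print(resolved_table_schema)
    except Exception as err:
        # Report the underlying error rather than discarding it
        print("Table not found: {}".format(err))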

ResolvedTableSchema(tableName=ADMINIOT_TEMP, schema=StructType(List(StructField(DEVICEID,IntegerType,false),StructField(SENSORID,IntegerType,false),StructField(TS,LongType,false),StructField(AMBIENT_TEMP,DoubleType,false),StructField(POWER,DoubleType,false),StructField(TEMPERATURE,DoubleType,false))), sharding_columns=[u'DEVICEID', u'SENSORID'], pk_columns=[u'DEVICEID', u'SENSORID', u'TS'], partition_columns=None, schema_name=Some(ADMIN))
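
The upper-cased column names in the output above follow from the case rule described earlier. The sketch below restates that rule as a small function: unquoted names are upper-cased and single-quoted names keep their case. The helper `normalize_name` is hypothetical, and stripping the quotes from the stored name is an assumption; it is not the catalog's actual code.

```python
def normalize_name(name):
    """Upper-case an unquoted name; keep the case of a single-quoted one.

    Removing the surrounding quotes is an assumption of this sketch.
    """
    if len(name) >= 2 and name[0] == "'" and name[-1] == "'":
        return name[1:-1]
    return name.upper()


for raw in ["deviceID", "ambient_temp", "'deviceID'"]:
    print(raw, "->", normalize_name(raw))
```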


## Summary
In this notebook, you learned:
- how to connect to an IBM Db2 Event Store instance
- how to open a database
- how to define a table schema and index schema
- how to create a database table with an index
- how to list the tables in a database and their schemas

### Next Step: Load data into the table
With the table created, you need to insert some data into it before starting analysis.
To load data into the table, you will have to:
- Copy the `NFS_setup.sh` and `load.sh` scripts located under `db2eventstore-IoT-Analytics/data/` in this repo to the installation node on your cluster.
- Run the `NFS_setup.sh` script to set up an NFS server that will be used as the external table source for your Event Store instance.  
  The script will also download the sample csv data file located at `db2eventstore-IoT-Analytics/data/sample_IOT_table.csv` to the NFS server.  
  Example: `./NFS_setup.sh --namespace dsx`
- Run `load.sh` to load the sample data into the `ADMIN.IOT_TEMP` table.  
  Example: `./load.sh --namespace dsx`