# Database and table creation

In this lab, you will explore best practices for IBM Db2 Event Store. You will learn:
- Database creation with IBM Db2 Event Store
- Best practices for table definition
- Best practices for indexing a table

## Importing the modules used by this notebook


In [1]:
from eventstore.oltp import EventContext
from eventstore.sql import EventSession
from pyspark.sql import SparkSession

## Setting the IP address to connect to your IBM Db2 Event Store cluster

For this, you will need to find out the connection string to your IBM Db2 Event Store cluster.

Perform the following steps:

- In the code cell below, replace the IP address with the public IP address of your IBM Db2 Event Store one-node host.
- Then run the cell. It overrides the configuration of the notebook environment so that the client connects to the IBM Db2 Event Store cluster at the given endpoint.

In [2]:
from eventstore.common import ConfigurationReader

ip = "9.30.167.102"

endpoint = ip + ':1101'

print("Endpoint: "+ endpoint)

ConfigurationReader.setConnectionEndpoints(endpoint)

Endpoint: 9.30.167.102:1101
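
The endpoint string is the host IP address and the port joined by a colon. As a minimal sketch, the helper below checks both parts before formatting them, which can catch a mistyped address before any connection attempt. `build_endpoint` is a hypothetical name used for illustration and is not part of the Event Store client; the default port 1101 is taken from the cell above, and the check assumes an IPv4 address.

```python
import ipaddress


def build_endpoint(ip, port=1101):
    """Return "ip:port" after checking that both parts are well formed.

    Hypothetical helper for illustration; not part of the eventstore package.
    """
    # IPv4Address raises a ValueError subclass for malformed addresses.
    ipaddress.IPv4Address(ip)
    if not 0 < port < 65536:
        raise ValueError("port out of range: {}".format(port))
    return "{}:{}".format(ip, port)


endpoint = build_endpoint("9.30.167.102")
print("Endpoint: " + endpoint)
```

The resulting string could then be passed to `ConfigurationReader.setConnectionEndpoints` exactly as in the cell above.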


## Opening a database


The following cells create a database, open it, and add a table to it.
Run the command in the next program cell to define the database name.

In [3]:
dbName = "TESTDB"

To run Spark SQL queries, you must set up a Db2 Event Store Spark session. The EventSession class extends the optimizer of the SparkSession class.

In [4]:
sparkSession = SparkSession.builder.appName("EventStore SQL in Python").getOrCreate()
eventSession = EventSession(sparkSession.sparkContext, dbName)

Run the following cell to create the database. If creation fails, typically because a database with the same name already exists, the cell drops that database and creates a new one.

In [5]:
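try:
    EventContext.create_database(dbName)
except Exception:
    # Creation failed, most likely because the database already exists:
    # drop it and create it again.
    EventContext.drop_database(dbName)
    EventContext.create_database(dbName)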
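
The create, drop, and create-again pattern in this cell can be factored into a small reusable function. The sketch below is illustrative only: `create_fresh`, `fake_create`, and `fake_drop` are hypothetical names, and an in-memory set stands in for the database catalog so that the example runs without a cluster.

```python
def create_fresh(name, create, drop):
    """Create `name`; if creation fails, drop it and create it again."""
    try:
        create(name)
    except Exception:
        drop(name)
        create(name)


# In-memory stand-in for the database catalog, used only for this illustration.
existing = {"TESTDB"}


def fake_create(name):
    if name in existing:
        raise RuntimeError("database already exists: " + name)
    existing.add(name)


def fake_drop(name):
    existing.discard(name)


# "TESTDB" already exists, so it is dropped and then created again.
create_fresh("TESTDB", fake_create, fake_drop)
print(sorted(existing))
```

With the real client, the equivalent call would presumably be `create_fresh(dbName, EventContext.create_database, EventContext.drop_database)`. Because the function drops the database on any creation failure, it is suited to lab environments where losing the existing database is acceptable.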

Now you can execute the command to open the database in the event session you created:

In [6]:
eventSession.open_database()

## Exploring the database by retrieving all tables

The following code section retrieves the names of all tables that exist in the database.

In [7]:
with EventContext.get_event_context(dbName) as ctx:
   print("Event context successfully retrieved.")

table_names = ctx.get_names_of_tables()
for name in table_names:
    print(name)

Event context successfully retrieved.


## Creating a table with an index

As the output above shows, the database does not contain any tables yet. After creating a table, you can rerun those cells to confirm that it was created. The next cell defines the name of the table to create:

In [8]:
tabName = "IOT_TEMP"

In [9]:
from eventstore.catalog import TableSchema
from pyspark.sql.types import StructType, StructField, IntegerType, LongType, DoubleType

tabSchema = TableSchema(tabName, StructType([
    StructField("deviceID", IntegerType(), nullable = False),
    StructField("sensorID", IntegerType(), nullable = False),
    StructField("ts", LongType(), nullable = False),
    StructField("ambient_temp", DoubleType(), nullable = False),
    StructField("power", DoubleType(), nullable = False),
    StructField("temperature", DoubleType(), nullable = False)
    ]),
    sharding_columns=["deviceID", "sensorID"],
    pk_columns=["deviceID", "sensorID", "ts"]
)

The following cell defines the index schema. The index has two equality columns (deviceID and sensorID), sorts its entries by timestamp (ts) in descending order, and includes the temperature column to speed up queries that retrieve temperature readings:

In [10]:
from eventstore.catalog import IndexSpecification, SortSpecification, ColumnOrder

indexSchema = IndexSpecification(
          index_name=tabName + "Index",
          table_schema=tabSchema,
          equal_columns = ["deviceID", "sensorID"],
          sort_columns = [
            SortSpecification("ts", ColumnOrder.DESCENDING_NULLS_LAST)],
          include_columns = ["temperature"]
        )
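
To see why this index layout helps, consider a conceptual model in plain Python. It is not how Event Store stores its index, only a sketch of the idea: the equality columns form a lookup key, the entries under each key are kept newest first by `ts`, and the included `temperature` value is stored in the entry itself, so a "latest readings for one sensor" query never has to touch the full rows. The names `index_row` and `latest_temperatures` are hypothetical.

```python
from bisect import insort
from collections import defaultdict

# Conceptual model only: (deviceID, sensorID) -> entries sorted newest first.
index = defaultdict(list)


def index_row(device_id, sensor_id, ts, temperature):
    # Store -ts so that the ascending order kept by insort is descending in ts.
    insort(index[(device_id, sensor_id)], (-ts, temperature))


def latest_temperatures(device_id, sensor_id, n):
    """Return the n most recent (ts, temperature) pairs for one sensor."""
    entries = index.get((device_id, sensor_id), [])
    return [(-neg_ts, temp) for neg_ts, temp in entries[:n]]


rows = [
    (1, 10, 1000, 21.5),
    (1, 10, 3000, 22.1),
    (1, 10, 2000, 21.9),
    (2, 10, 1500, 30.0),
]
for row in rows:
    index_row(*row)

print(latest_temperatures(1, 10, 2))  # [(3000, 22.1), (2000, 21.9)]
```

A query that filters on both equality columns and asks for the most recent temperatures maps directly onto one key lookup followed by reading the first few entries.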

Finally, the following cell is used to create the table with the index using the create_table_with_index method, passing both the table schema and index schema defined above:

In [11]:
with EventContext.get_event_context(dbName) as ctx:
    res = ctx.create_table_with_index(tabSchema, indexSchema)

To drop a table, use the drop_table method. The cell below is commented out and is provided only as a reference:

In [12]:
# with EventContext.get_event_context(dbName) as ctx:
#     ctx.drop_table(tabName)

## Loading the tables and inspecting the table schemas


Before you can query or manipulate a table, you must load it and obtain a DataFrame reference to it. The following code loads every table in the database and creates a temporary view for each one.

In [13]:
table_names = ctx.get_names_of_tables()
for tab_name in table_names:
    tab = eventSession.load_event_table(tab_name)
    tab.createOrReplaceTempView(tab_name)
    print("Table "+tab_name+" successfully loaded and temp view created.")

Table IOT_TEMP successfully loaded and temp view created.


Then the following cell can be used to show the schema of the table created:

In [14]:
try:
    resolved_table_schema = ctx.get_table(tabName)
    print(resolved_table_schema)
except Exception as err:
    # Report the underlying error instead of discarding it.
    print("Table not found: {}".format(err))

ResolvedTableSchema(tableName=IOT_TEMP, schema=StructType(List(StructField(deviceID,IntegerType,false),StructField(sensorID,IntegerType,false),StructField(ts,LongType,false),StructField(ambient_temp,DoubleType,false),StructField(power,DoubleType,false),StructField(temperature,DoubleType,false))), sharding_columns=[u'deviceID', u'sensorID'], pk_columns=[u'deviceID', u'sensorID', u'ts'], partition_columns=None)


## Summary
In this notebook, you learned:
- how to connect to an IBM Db2 Event Store cluster by setting its endpoint
- how to create a new database
- how to open a database
- how to define a table schema and index schema
- how to create a database table with an index
- how to list the tables in a database and their schemas

### Next Step: Load data into the table
With the database and table created, you need to insert some data into the table before you can start analysis.
To load data into the table:
- Prepare sample data in a CSV file named "sample_IOT_table.csv". Either use the sample CSV file provided in the `/data` directory, or generate one with the "generator.py" script located in the same directory.
- Run "load.sh" to load the data from the CSV file into the table.  
(Note: If your Event Store is installed on a Fyre cluster, the load process should also be executed on a Fyre VM, preferably on the one-node cluster where the Event Store is installed.)
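
If neither the provided file nor "generator.py" is at hand, a CSV file with the same column order as the IOT_TEMP table can be produced with the Python standard library. The sketch below is not the lab's generator script. The function name `write_sample_csv`, the value ranges, the epoch-millisecond timestamps, and the absence of a header row are all assumptions, so check what "load.sh" expects before using such a file.

```python
import csv
import os
import random
import tempfile

# Column order of the IOT_TEMP table defined in this notebook.
COLUMNS = ["deviceID", "sensorID", "ts", "ambient_temp", "power", "temperature"]


def write_sample_csv(path, n_rows, seed=0, header=False):
    """Write n_rows of random readings in the IOT_TEMP column order."""
    rng = random.Random(seed)
    ts = 1541019342393  # arbitrary start time, assumed to be epoch milliseconds
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        if header:
            writer.writerow(COLUMNS)
        for _ in range(n_rows):
            # Strictly increasing ts keeps (deviceID, sensorID, ts) unique.
            ts += rng.randint(1, 1000)
            writer.writerow([
                rng.randint(1, 5),              # deviceID
                rng.randint(1, 50),             # sensorID
                ts,                             # ts
                round(rng.uniform(15, 30), 2),  # ambient_temp
                round(rng.uniform(5, 15), 2),   # power
                round(rng.uniform(30, 50), 2),  # temperature
            ])


# Written to the system temp directory here; adjust the location as needed.
path = os.path.join(tempfile.gettempdir(), "sample_IOT_table.csv")
write_sample_csv(path, n_rows=100)
print("Wrote", path)
```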