# Database and table creation

In this notebook, you will explore best practices for IBM Db2 Event Store. You will learn how to:
- Create a database with IBM Db2 Event Store
- Define a table following best practices
- Index a table following best practices
- Insert data from a CSV file

## Connect to IBM Db2 Event Store

### Determine the IP address of your host

Obtain the IP address of the host that you want to connect to by running the appropriate command for your operating system:

* On Mac, run: `ifconfig`
* On Windows, run: `ipconfig`
* On Linux, run: `hostname -i`
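
If the notebook runs on the same machine as Event Store, the address can also be looked up from Python. The following is a minimal sketch under that assumption; `guess_local_ip` is a hypothetical helper, not part of the Event Store client, and the target address is a reserved documentation address used only to select a local interface.

```python
import ipaddress
import socket


def guess_local_ip(fallback="127.0.0.1"):
    """Return the address of the interface used for outbound traffic, or the fallback."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Connecting a UDP socket sends no packets; it only selects a local interface.
        # 192.0.2.1 belongs to TEST-NET-1, a range reserved for documentation.
        sock.connect(("192.0.2.1", 80))
        return sock.getsockname()[0]
    except OSError:
        # No usable route (for example, an offline machine): fall back to loopback.
        return fallback
    finally:
        sock.close()


local_ip = guess_local_ip()
ipaddress.ip_address(local_ip)  # raises ValueError if the result is not a valid address
print("Candidate HOST value:", local_ip)
```

On a machine with several network interfaces the result may not be the address that Event Store listens on, so compare it with the output of the commands above.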

Edit the `HOST = "XXX.XXX.XXX.XXX"` value in the next cell to provide the IP address.

In [6]:
# Set your host IP address
HOST = "XXX.XXX.XXX.XXX"

# Port will be 1100 for version 1.1.2 or later (5555 for version 1.1.1)
PORT = "1100"

# Database name
DB_NAME = "TESTDB"

# Table name
TABLE_NAME = "IOT_TEMPERATURE"

## Import Python modules


In [7]:
from eventstore.common import ConfigurationReader
from eventstore.oltp import EventContext
from eventstore.sql import EventSession
from pyspark.sql import SparkSession

## Connect to Event Store

In [8]:
endpoint = HOST + ":" + PORT
print("Event Store connection endpoint:", endpoint)
ConfigurationReader.setConnectionEndpoints(endpoint)

Event Store connection endpoint: 192.168.0.106:1100
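
A forgotten placeholder in `HOST` only shows up later as a connection failure. The sketch below checks the two parts before they are joined. It assumes that `HOST` is a literal IP address, as in this notebook; `build_endpoint` is a hypothetical helper and not part of the Event Store client.

```python
import ipaddress


def build_endpoint(host, port):
    """Return "host:port" after checking that both parts are plausible."""
    # Raises ValueError for placeholders such as "XXX.XXX.XXX.XXX".
    ipaddress.ip_address(host)
    port_number = int(port)
    if not 0 < port_number < 65536:
        raise ValueError("port out of range: %r" % (port,))
    return "%s:%d" % (host, port_number)


print(build_endpoint("192.168.0.106", "1100"))
```

The returned string has the same `host:port` shape as the endpoint passed to `ConfigurationReader.setConnectionEndpoints` above.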


To run Spark SQL queries, you must set up a Db2 Event Store Spark session. The EventSession class extends the optimizer of the SparkSession class.

In [9]:
sparkSession = SparkSession.builder.appName("EventStore SQL in Python").getOrCreate()
eventSession = EventSession(sparkSession.sparkContext, DB_NAME)

## Create the database
Run the following cell to create the database.

> Only one database can be active in Event Store Developer Edition. If you already have a database, you don't need to create one. To create a database in Event Store, use the `create_database` function. To replace an existing database with a new one, call the `drop_database` function first.

In [10]:
# Run this cell if you need to (DROP and/or) CREATE the database.

# EventContext.drop_database(DB_NAME)   # Uncomment this if you want to drop an existing database
EventContext.create_database(DB_NAME)   # Comment this out (or skip this cell) to re-use an existing database

<eventstore.oltp.context.EventContext at 0x7fc33c045e10>
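
The drop-then-create order described above can be wrapped in a small helper. This is only a sketch: `FakeEventContext` is a hypothetical stand-in that mimics the two calls used in this notebook so the flow can run without a cluster, and its single-database rule and error type are assumptions made for illustration.

```python
class FakeEventContext:
    """Hypothetical stand-in for EventContext; it only tracks database names."""

    databases = set()

    @classmethod
    def create_database(cls, name):
        # Model the "only one active database" rule of the Developer Edition.
        if cls.databases:
            raise RuntimeError("a database already exists: %s" % sorted(cls.databases))
        cls.databases.add(name)

    @classmethod
    def drop_database(cls, name):
        cls.databases.discard(name)


def recreate_database(context_cls, name, drop_first=False):
    """Optionally drop the database, then create it."""
    if drop_first:
        context_cls.drop_database(name)
    context_cls.create_database(name)


recreate_database(FakeEventContext, "TESTDB")                   # first creation
recreate_database(FakeEventContext, "TESTDB", drop_first=True)  # replace the existing database
print(sorted(FakeEventContext.databases))
```

With the real client, `EventContext` would take the place of `FakeEventContext`, since the helper only relies on the `drop_database` and `create_database` calls shown in the cell above.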

Now open the database in the event session you created:

In [11]:
eventSession.open_database()

## Explore the database by retrieving all table names

The following cell retrieves and prints the names of all tables in the database.
Run it now. You can come back and run it again after you create a table.

In [12]:
with EventContext.get_event_context(DB_NAME) as ctx:
    print("Event context successfully retrieved.")

    # List the tables while the context is open
    print("Table names:")
    table_names = ctx.get_names_of_tables()
    for name in table_names:
        print(name)

Event context successfully retrieved.
Table names:


## Create a table with an index

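The next cell defines the schema for the table we want to create. Rows are sharded by `deviceID` and `sensorID`, and the primary key is the combination of `deviceID`, `sensorID`, and `ts`.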

In [14]:
from eventstore.catalog import TableSchema
from pyspark.sql.types import StructType, StructField, IntegerType, LongType, DoubleType

# Table schema: every column is declared non-nullable
tabSchema = TableSchema(
    TABLE_NAME,
    StructType([
        StructField("deviceID", IntegerType(), nullable=False),
        StructField("sensorID", IntegerType(), nullable=False),
        StructField("ts", LongType(), nullable=False),
        StructField("ambient_temp", DoubleType(), nullable=False),
        StructField("power", DoubleType(), nullable=False),
        StructField("temperature", DoubleType(), nullable=False)
    ]),
    sharding_columns=["deviceID", "sensorID"],
    pk_columns=["deviceID", "sensorID", "ts"]
)

The following cell defines the index schema, which includes two equality columns: *deviceID* and *sensorID*. The entries are sorted by the *ts* (timestamp) column in descending order. The *temperature* column is included to speed up queries that retrieve the temperature:

In [15]:
from eventstore.catalog import IndexSpecification, SortSpecification, ColumnOrder

indexSchema = IndexSpecification(
          index_name=TABLE_NAME + "Index",
          table_schema=tabSchema,
          equal_columns = ["deviceID", "sensorID"],
          sort_columns = [
            SortSpecification("ts", ColumnOrder.DESCENDING_NULLS_LAST)],
          include_columns = ["temperature"]
        )
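
To see why this layout helps, consider a toy in-memory model of the index. It is only an illustration of the idea of equality columns, a sort column, and an included column; it is not how Event Store implements its index, and the names `toy_index` and `latest_temperature` are hypothetical. The rows are a few records from the sample data set.

```python
from collections import defaultdict

# Rows in schema order: deviceID, sensorID, ts, ambient_temp, power, temperature.
rows = [
    (1, 24, 1541019343497, 22.545444, 9.834895, 39.065559),
    (1, 24, 1541019347200, 24.960868, 11.773728, 42.161830),
    (2, 39, 1541019344356, 24.324654, 14.100638, 44.398837),
]

# Equality columns form the lookup key; each entry keeps the sort column and the included column.
toy_index = defaultdict(list)
for device_id, sensor_id, ts, _ambient, _power, temperature in rows:
    toy_index[(device_id, sensor_id)].append((ts, temperature))

# Sort each entry list by ts, newest first, mirroring the descending sort column.
for entries in toy_index.values():
    entries.sort(key=lambda entry: entry[0], reverse=True)


def latest_temperature(device_id, sensor_id):
    """Answer the query from the toy index alone, without reading the full rows."""
    entries = toy_index.get((device_id, sensor_id))
    return entries[0][1] if entries else None


print(latest_temperature(1, 24))
```

A query for the most recent temperature of one device and sensor needs only the key lookup and the first entry, which is the access pattern the equality, sort, and include columns are chosen for.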

Finally, the following cell creates the table and its index by passing both the table schema and the index schema defined above to the `create_table_with_index()` method:

In [20]:
with EventContext.get_event_context(DB_NAME) as ctx:
   res = ctx.create_table_with_index(tabSchema, indexSchema)

To drop a table, use the `drop_table` method. The cell below is commented out and is provided only as a reference.

In [21]:
# with EventContext.get_event_context(DB_NAME) as ctx:
#     ctx.drop_table(TABLE_NAME)

## Load the table and inspect its schema

To query the table, you need a DataFrame reference to it. The following code loads the table and creates a temporary view for it.

In [22]:
tab = eventSession.load_event_table(TABLE_NAME)
tab.createOrReplaceTempView(TABLE_NAME)
print("Table " + TABLE_NAME + " successfully loaded and temporary view created.")

Table IOT_TEMPERATURE successfully loaded and temporary view created.


The following cell shows the schema of the table you created:

In [23]:
try:
    resolved_table_schema = ctx.get_table(TABLE_NAME)
    print(resolved_table_schema)
except Exception as err:
    # Report the underlying error instead of discarding it
    print("Table " + TABLE_NAME + " not found:", err)

ResolvedTableSchema(tableName=IOT_TEMPERATURE, schema=StructType(List(StructField(deviceID,IntegerType,false),StructField(sensorID,IntegerType,false),StructField(ts,LongType,false),StructField(ambient_temp,DoubleType,false),StructField(power,DoubleType,false),StructField(temperature,DoubleType,false))), sharding_columns=['deviceID', 'sensorID'], pk_columns=['deviceID', 'sensorID', 'ts'], partition_columns=None)


## Load the sample data

Let's insert some sample data into the table before starting analysis.
To load data into the table, you will have to:

- Use the Event Store UI to add the "sample_IOT_table.csv" file as a project data asset.
- Run the following cells to load the data from **assets**.

### Read the CSV into a pandas DataFrame

In [24]:
import pandas as pd
df = pd.read_csv("assets/sample_IOT_table.csv")
df

| | 1 | 48 | 1541019342393 | 25.983183481618322 | 14.65874116573845 | 48.908846094198 |
|---|---|---|---|---|---|---|
| 0 | 1 | 24 | 1541019343497 | 22.545444 | 9.834895 | 39.065559 |
| 1 | 2 | 39 | 1541019344356 | 24.324654 | 14.100638 | 44.398837 |
| 2 | 2 | 1 | 1541019345216 | 25.658281 | 14.243132 | 45.291255 |
| 3 | 2 | 20 | 1541019346515 | 26.836546 | 12.841558 | 48.700130 |
| 4 | 1 | 24 | 1541019347200 | 24.960868 | 11.773728 | 42.161830 |
| 5 | 1 | 35 | 1541019347966 | 23.702428 | 7.518410 | 40.792013 |
| 6 | 1 | 32 | 1541019348864 | 24.041499 | 10.201932 | 41.664663 |
| 7 | 2 | 47 | 1541019349485 | 27.085396 | 7.805625 | 45.607395 |
| 8 | 1 | 12 | 1541019349819 | 20.633590 | 10.344878 | 37.514075 |
| 9 | 1 | 4 | 1541019350783 | 23.012288 | 2.447689 | 34.796619 |
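
The column labels in the output above (`1`, `48`, `1541019342393`, ...) look like data values, which suggests that the CSV file has no header row and that pandas used its first record as the header. The following is a minimal sketch, assuming that is the case and that the columns appear in the same order as in the table schema. It reads an in-memory stand-in for the file with `header=None` and explicit column names so that the first record is kept as data; the three records are copied from the output above.

```python
import io

import pandas as pd

# Column names taken from the table schema defined earlier in this notebook.
COLUMNS = ["deviceID", "sensorID", "ts", "ambient_temp", "power", "temperature"]

# Hypothetical stand-in for a header-less CSV file.
sample_csv = io.StringIO(
    "1,48,1541019342393,25.983183481618322,14.65874116573845,48.908846094198\n"
    "1,24,1541019343497,22.545444,9.834895,39.065559\n"
    "2,39,1541019344356,24.324654,14.100638,44.398837\n"
)

# header=None keeps the first record as data; names= supplies the labels.
df_named = pd.read_csv(sample_csv, header=None, names=COLUMNS)

# Convert to a list of plain tuples, the same shape that is passed to batch_insert later.
rows = df_named.to_records(index=False).tolist()
print(len(rows), "rows, first device/sensor:", rows[0][:2])
```

For the real file, the same two arguments could be added to the `pd.read_csv("assets/sample_IOT_table.csv")` call, provided the file really lacks a header row.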


### Examine the data types

In [25]:
df.dtypes

1                       int64
48                      int64
1541019342393           int64
25.983183481618322    float64
14.65874116573845     float64
48.908846094198       float64
dtype: object
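
Every column in the table schema is declared with `nullable = False`, and pandas represents an empty CSV field as `NaN`. A quick check for missing values before the insert can therefore save a failed load. The following is a minimal sketch; `rows_with_missing_values` is a hypothetical helper, and how Event Store itself reacts to missing values is not covered here.

```python
import math


def rows_with_missing_values(rows):
    """Return the positions of rows that contain None or NaN."""
    def is_missing(value):
        return value is None or (isinstance(value, float) and math.isnan(value))

    return [i for i, row in enumerate(rows) if any(is_missing(v) for v in row)]


# Hypothetical rows: the second one has a missing temperature reading.
candidate_rows = [
    (1, 24, 1541019343497, 22.545444, 9.834895, 39.065559),
    (2, 39, 1541019344356, 24.324654, 14.100638, float("nan")),
]
bad = rows_with_missing_values(candidate_rows)
print("Rows to fix before inserting:", bad)
```

The helper works on the list of tuples produced by `df.to_records(index=False).tolist()`, so it can be run on the same data that is about to be inserted.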

### Batch insert the data from the DataFrame into the table

In [26]:
ctx.batch_insert(resolved_table_schema, df.to_records(index=False).tolist())

## Summary
In this notebook, you learned how to:
- Connect to IBM Db2 Event Store
- Create a new database
- Open a database
- Define a table schema and an index schema
- Create a database table with an index
- List the tables in a database and their schemas
- Insert data into a table from a CSV file

<p><font size=-1 color=gray>
&copy; Copyright 2019 IBM Corp. All Rights Reserved.
<p>
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file
except in compliance with the License. You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the
License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
express or implied. See the License for the specific language governing permissions and
limitations under the License.
</font></p>