# BasicTick V3: Create Everything
This notebook uses the AWS Python boto3 APIs to create the resources needed for a basic tick application, which simulates a market data capture system.

## Architecture
<img src="images/Deepdive Diagrams-BasicTick V3.drawio.png"  width="80%">

## Abbreviations
- CEP: Complex Event Processor    
- FH: Feedhandler    
- HDB: Historical Database    
- RDB: Realtime Database    
- TP: Tickerplant    

## AWS Resources Created
- Database   
- Changeset (adds data to database)   
- Scaling Group in which all clusters are run   
- Shared Volume used by database view and clusters  
- Dataview of database on the shared volume
  - option: view can be auto-updating or static
- Clusters: TP, CEP, HDB, Gateway, and RDB   
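
These resources must be created in dependency order. The dependencies below are inferred from the waits this notebook performs: the changeset follows the database, the dataview waits for the volume and a changeset, the HDB waits for the dataview, the RDB waits for the TP, and every cluster needs the scaling group. The sketch uses Python's standard `graphlib` module (3.9+) to derive one valid creation order. The dependency map is an assumption reconstructed from the notebook. It is not something the service exposes.

```python
from graphlib import TopologicalSorter

# Creation-time dependencies inferred from the waits in this notebook
# (illustrative assumption; not read from the FinSpace API).
# Each key maps to the set of resources that must exist before it.
DEPENDENCIES = {
    "database": set(),
    "changeset": {"database"},
    "scaling_group": set(),
    "volume": set(),
    "dataview": {"changeset", "volume"},
    "TP": {"scaling_group", "volume"},
    "HDB": {"scaling_group", "dataview"},
    "GATEWAY": {"scaling_group"},
    "RDB": {"scaling_group", "volume", "database", "TP"},
    "CEP": {"scaling_group", "TP"},
}


def creation_order(deps):
    """Return one order in which every resource comes after all of its dependencies."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for step, name in enumerate(creation_order(DEPENDENCIES), start=1):
        print(step, name)
```

The notebook's cell order is one such valid ordering. Independent resources, such as the scaling group and the volume, can be created in any order relative to each other.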

### Non-AWS
For this demonstration application, the FH runs locally and publishes data to the TP.    

# References

In [1]:
import os
import subprocess
import boto3
import json
import datetime
import pandas as pd

import pykx as kx

from env import *
from managed_kx import *

# Cluster names and database
from basictick_setup import *

# ----------------------------------------------------------------

# Source data directory
SOURCE_DATA_DIR="hdb"

# Code directory
CODEBASE="basictick"

# S3 Destinations
S3_CODE_PATH="code"
S3_DATA_PATH="data"

#NODE_TYPE="kx.sg.4xlarge"
NODE_TYPE="kx.sg.2xlarge"

CODE_CONFIG={ 's3Bucket': S3_BUCKET, 's3Key': f'{S3_CODE_PATH}/{CODEBASE}.zip' }

NAS1_CONFIG= {
        'type': 'SSD_250',
        'size': 1200
}

# Realtime Database (RDB) Configs
RDB_INIT_SCRIPT='rdbmkdb.q'
RDB_CMD_ARGS=[
    { 'key': 's', 'value': '2' }, 
    { 'key': 'g', 'value': '1' }, 
    { 'key': 'tp', 'value': TP_CLUSTER_NAME }, 
    { 'key': 'procName', 'value': RDB_CLUSTER_NAME }, 
    { 'key': 'volumeName', 'value': VOLUME_NAME }, 
    { 'key': 'hdbProc', 'value': HDB_CLUSTER_NAME }, 
    { 'key': 'dbView', 'value': DBVIEW_NAME }, 
    { 'key': 'AWS_ZIP_DEFAULT', 'value': '17,2,6' },
]

# CEP Configs
CEP_INIT_SCRIPT='cepmkdb.q'
CEP_CMD_ARGS = [
    { 'key': 's', 'value': '2' }, 
    { 'key': 'g', 'value': '1' }, 
    { 'key': 'tp', 'value': TP_CLUSTER_NAME }, 
    { 'key': 'AWS_ZIP_DEFAULT', 'value': '17,2,6' },
]

# Tickerplant (TP) Configs
TP_INIT_SCRIPT='tick.q'
TP_CMD_ARGS=[
    { 'key': 'procName', 'value': TP_CLUSTER_NAME }, 
    { 'key': 'volumeName', 'value': VOLUME_NAME }, 
    { 'key': 'AWS_ZIP_DEFAULT', 'value': '17,2,6' },
    { 'key': 'g', 'value': '1' }, 
]

# Historical Database (HDB) Configs
HDB_INIT_SCRIPT='hdbmkdb.q'
HDB_CMD_ARGS=[
    { 'key': 's', 'value': '2' }, 
    { 'key': 'g', 'value': '1' }, 
]

# Gateway Configs
GW_INIT_SCRIPT='gwmkdb.q'
GW_CMD_ARGS=[
    { 'key': 's', 'value': '2' }, 
    { 'key': 'g', 'value': '1' }, 
    { 'key': 'rdb_name', 'value': RDB_CLUSTER_NAME}, 
    { 'key': 'hdb_name', 'value': HDB_CLUSTER_NAME}, 
]

# Feedhandler configs
FEED_TIMER=10000
FH_PORT=5030 


In [2]:
# triggers credential get
session=None

if AWS_ACCESS_KEY_ID is None:
    print("Using Defaults ...")
    # create AWS session using the default credential chain
    session = boto3.Session()
else:
    print("Using variables ...")
    session = boto3.Session(
        aws_access_key_id=AWS_ACCESS_KEY_ID,
        aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
        aws_session_token=AWS_SESSION_TOKEN
    )

# create finspace client
client = session.client(service_name='finspace', endpoint_url=ENDPOINT_URL)

Using Defaults ...


# Create a Sample Database
Create a synthetic database using kxtaqdb.q (this takes 1-2 minutes).

In [3]:
os.system(f"rm -rf {SOURCE_DATA_DIR}")

# call local q (using pykx) to create the database
kx.q(r"\l basictick/kxtaqdb.q")

# Database size
print( os.system(f"du -sh {SOURCE_DATA_DIR}") )


"Generated trade|quote records: 872530 4356637"
"Generated trade|quote records: 899400 4495478"
"Generated trade|quote records: 879672 4401306"
"Generated trade|quote records: 894169 4471510"
"Generated trade|quote records: 941313 4711942"
"Generated trade|quote records: 924403 4619618"
"Generated trade|quote records: 907938 4544274"
"Generated trade|quote records: 867065 4333345"
438M	hdb
0


## Stage Database Files to S3
Using the AWS CLI, copy the local hdb directory to the staging S3 bucket.

In [4]:
S3_DEST=f"s3://{S3_BUCKET}/{S3_DATA_PATH}/{SOURCE_DATA_DIR}/"

if AWS_ACCESS_KEY_ID is not None:
    cp = f"""
export AWS_ACCESS_KEY_ID={AWS_ACCESS_KEY_ID}
export AWS_SECRET_ACCESS_KEY={AWS_SECRET_ACCESS_KEY}
export AWS_SESSION_TOKEN={AWS_SESSION_TOKEN}

aws s3 rm --recursive {S3_DEST} --quiet
aws s3 sync --exclude .DS_Store {SOURCE_DATA_DIR} {S3_DEST} --quiet
"""
else:
    cp = f"""
aws s3 rm --recursive {S3_DEST} --quiet
aws s3 sync --exclude .DS_Store {SOURCE_DATA_DIR} {S3_DEST} --quiet
"""
    
# execute the S3 copy
os.system(cp)

print( f"Destination: {S3_DEST}" )
print( os.system(f"aws s3 ls {S3_DEST}") )

Destination: s3://kdb-demo-829845998889-kms/data/hdb/
                           PRE 2024.07.29/
                           PRE 2024.07.30/
                           PRE 2024.07.31/
                           PRE 2024.08.01/
                           PRE 2024.08.02/
                           PRE 2024.08.05/
                           PRE 2024.08.06/
                           PRE 2024.08.07/
2024-08-08 18:44:17         75 sym
0
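
The cell above interpolates credentials into a shell script string, which places them in the command text. A minimal alternative sketch passes credentials through the child process environment with `subprocess.run` and uses argument lists instead of a shell string. It assumes the same `aws s3 rm` and `aws s3 sync` commands. The function names `aws_env`, `s3_stage_commands`, and `stage_to_s3` are hypothetical.

```python
import os
import subprocess


def aws_env(access_key=None, secret_key=None, session_token=None):
    """Copy of the current environment with any explicitly given credentials added.

    Values left as None are not set, so the AWS CLI falls back to its
    default credential chain for them.
    """
    env = dict(os.environ)
    creds = {
        "AWS_ACCESS_KEY_ID": access_key,
        "AWS_SECRET_ACCESS_KEY": secret_key,
        "AWS_SESSION_TOKEN": session_token,
    }
    env.update({k: v for k, v in creds.items() if v is not None})
    return env


def s3_stage_commands(source_dir, s3_dest):
    """Argument lists equivalent to the shell commands in the cell above."""
    return [
        ["aws", "s3", "rm", "--recursive", s3_dest, "--quiet"],
        ["aws", "s3", "sync", "--exclude", ".DS_Store", source_dir, s3_dest, "--quiet"],
    ]


def stage_to_s3(source_dir, s3_dest, env):
    for cmd in s3_stage_commands(source_dir, s3_dest):
        # check=True raises CalledProcessError if the aws CLI exits non-zero
        subprocess.run(cmd, env=env, check=True)
```

With the notebook's variables, this would be called as `stage_to_s3(SOURCE_DATA_DIR, S3_DEST, aws_env(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN))`.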


## Create a Managed Database
Using the AWS APIs, create a managed database in Managed kdb Insights. The database is initially empty and is populated using changesets.

### Reference
[Managed kdb Insights databases](https://docs.aws.amazon.com/finspace/latest/userguide/finspace-managed-kdb-db.html)

### APIs used
[get_kx_database](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/finspace/client/get_kx_database.html)  
[create_kx_database](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/finspace/client/create_kx_database.html)  
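
The database cell checks for the database and creates it only if the lookup fails. The scaling group, volume, and cluster cells follow a similar check-then-create pattern; there, the `managed_kx` helpers return `None` when the resource is missing. Below is a minimal generic sketch of the pattern, assuming the lookup signals absence by raising an exception. `get_or_create` is a hypothetical helper and is not part of `managed_kx` or boto3.

```python
def get_or_create(get, create, not_found=(LookupError,)):
    """Return (resource, created), where created is True if create() was called.

    get       -- zero-argument callable that returns the resource or raises
    create    -- zero-argument callable that creates and returns the resource
    not_found -- exception types raised by get() that mean "does not exist"
    """
    try:
        return get(), False
    except not_found:
        return create(), True


# Possible use with the FinSpace client from this notebook (not executed here):
# resp, created = get_or_create(
#     lambda: client.get_kx_database(environmentId=ENV_ID, databaseName=DB_NAME),
#     lambda: client.create_kx_database(environmentId=ENV_ID, databaseName=DB_NAME,
#                                       description="Basictick kdb database"),
#     not_found=(client.exceptions.ResourceNotFoundException,),
# )
```

Passing the specific "not found" exception avoids the problem with a bare `except:`. A bare handler also hides throttling, permission, and validation errors.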


In [5]:
# assume it exists
create_db=False

try:
    resp = client.get_kx_database(environmentId=ENV_ID, databaseName=DB_NAME)
    resp.pop('ResponseMetadata', None)
except client.exceptions.ResourceNotFoundException:
    # does not exist, will create
    create_db=True

if create_db:
    print(f"CREATING Database: {DB_NAME}")
    resp = client.create_kx_database(environmentId=ENV_ID, databaseName=DB_NAME, description="Basictick kdb database")
    resp.pop('ResponseMetadata', None)

    print(f"CREATED Database: {DB_NAME}")

print(json.dumps(resp,sort_keys=True,indent=4,default=str))

CREATING Database: basictickdb
CREATED Database: basictickdb
{
    "createdTimestamp": "2024-08-08 18:44:19.065000+00:00",
    "databaseArn": "arn:aws:finspace:us-east-1:829845998889:kxEnvironment/jlcenjvtkgzrdek2qqv7ic/kxDatabase/basictickdb",
    "databaseName": "basictickdb",
    "description": "Basictick kdb database",
    "environmentId": "jlcenjvtkgzrdek2qqv7ic",
    "lastModifiedTimestamp": "2024-08-08 18:44:19.065000+00:00"
}


## Add Data to Database
Add the database files staged to S3 earlier to the managed database using create_kx_changeset.

### APIs used
[create_kx_changeset](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/finspace/client/create_kx_changeset.html)  
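
The changeset cell maps each entry of the local `hdb` directory to a change request. Date-partition directories go to a matching `dbPath`, and top-level files such as `sym` go to `/`. Factoring that mapping into a pure function makes it checkable without calling AWS. `build_change_requests` is a hypothetical name. Unlike the notebook's loop, this sketch sorts entries for a stable order and skips `.DS_Store`, matching the `--exclude` used when syncing to S3.

```python
import os


def build_change_requests(source_dir, s3_dest):
    """Build create_kx_changeset changeRequests for every entry in source_dir.

    s3_dest is expected to end with '/', e.g. 's3://bucket/data/hdb/'.
    """
    changes = []
    for name in sorted(os.listdir(source_dir)):
        if name == ".DS_Store":
            continue  # not uploaded to S3, so do not reference it
        if os.path.isdir(os.path.join(source_dir, name)):
            # partition directory, e.g. 2024.07.29 -> /2024.07.29/
            changes.append({"changeType": "PUT",
                            "s3Path": f"{s3_dest}{name}/",
                            "dbPath": f"/{name}/"})
        else:
            # top-level file such as the sym file -> database root
            changes.append({"changeType": "PUT",
                            "s3Path": f"{s3_dest}{name}",
                            "dbPath": "/"})
    return changes
```

The result can then be passed as `changeRequests=build_change_requests(SOURCE_DATA_DIR, S3_DEST)`.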


In [6]:
c_set_list = list_kx_changesets(client, environmentId=ENV_ID, databaseName=DB_NAME)

if len(c_set_list) == 0:
    print("Adding Changeset to Empty database")
    changes=[]

    for f in os.listdir(f"{SOURCE_DATA_DIR}"):
        if os.path.isdir(f"{SOURCE_DATA_DIR}/{f}"):
            changes.append( { 'changeType': 'PUT', 's3Path': f"{S3_DEST}{f}/", 'dbPath': f"/{f}/" } )
        else:
            changes.append( { 'changeType': 'PUT', 's3Path': f"{S3_DEST}{f}", 'dbPath': f"/" } )

    resp = client.create_kx_changeset(environmentId=ENV_ID, databaseName=DB_NAME, 
        changeRequests=changes)

    resp.pop('ResponseMetadata', None)
    changeset_id = resp['changesetId']

    print("Changeset...")
    print(json.dumps(resp,sort_keys=True,indent=4,default=str))
else:
    changeset_id = c_set_list[0]['changesetId']    

Adding Changeset to Empty database
Changeset...
{
    "changeRequests": [
        {
            "changeType": "PUT",
            "dbPath": "/",
            "s3Path": "s3://kdb-demo-829845998889-kms/data/hdb/sym"
        },
        {
            "changeType": "PUT",
            "dbPath": "/2024.07.29/",
            "s3Path": "s3://kdb-demo-829845998889-kms/data/hdb/2024.07.29/"
        },
        {
            "changeType": "PUT",
            "dbPath": "/2024.07.30/",
            "s3Path": "s3://kdb-demo-829845998889-kms/data/hdb/2024.07.30/"
        },
        {
            "changeType": "PUT",
            "dbPath": "/2024.07.31/",
            "s3Path": "s3://kdb-demo-829845998889-kms/data/hdb/2024.07.31/"
        },
        {
            "changeType": "PUT",
            "dbPath": "/2024.08.01/",
            "s3Path": "s3://kdb-demo-829845998889-kms/data/hdb/2024.08.01/"
        },
        {
            "changeType": "PUT",
            "dbPath": "/2024.08.02/",
            "s3Path": "s

In [7]:
# Wait for the changeset to be added to the database
wait_for_changeset_status(client, environmentId=ENV_ID, databaseName=DB_NAME, changesetId=changeset_id, show_wait=True)

print("**Done**")

Status is IN_PROGRESS, total wait 0:00:00, waiting 10 sec ...
Status is IN_PROGRESS, total wait 0:00:10, waiting 10 sec ...
**Done**
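
`wait_for_changeset_status` and the other `wait_for_*_status` helpers come from `managed_kx`, whose source is not shown here. Judging from their printed output, they poll a status and sleep between checks. The sketch below shows such a loop. The `get_status` callable, the terminal state names, and the timeout are assumptions. `sleep` is injectable so the loop can be tested without waiting.

```python
import time
from datetime import timedelta


def wait_for_status(get_status, done=("COMPLETED", "ACTIVE", "RUNNING"),
                    failed=("FAILED",), interval=10, timeout=3600,
                    sleep=time.sleep, label="Resource"):
    """Poll get_status() until it returns a done state.

    Raises RuntimeError on a failed state and TimeoutError once
    `timeout` seconds of waiting have elapsed.
    """
    waited = 0
    while True:
        status = get_status()
        if status in done:
            print(f"{label} status is now {status}, total wait {timedelta(seconds=waited)}")
            return status
        if status in failed:
            raise RuntimeError(f"{label} entered status {status}")
        if waited >= timeout:
            raise TimeoutError(f"{label} still {status} after {timedelta(seconds=waited)}")
        print(f"{label} status is {status}, total wait {timedelta(seconds=waited)}, "
              f"waiting {interval} sec ...")
        sleep(interval)
        waited += interval


# Hypothetical use, assuming the get_kx_changeset response carries a 'status' field:
# wait_for_status(lambda: client.get_kx_changeset(environmentId=ENV_ID,
#                 databaseName=DB_NAME, changesetId=changeset_id)["status"],
#                 label="Changeset")
```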


### Contents of the Managed Database
Display the changesets of the managed database.

### APIs used
[list_kx_changesets](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/finspace/client/list_kx_changesets.html)   
[get_kx_changeset](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/finspace/client/get_kx_changeset.html)  


In [8]:
note_str = ""

c_set_list = list_kx_changesets(client, environmentId=ENV_ID, databaseName=DB_NAME)

if len(c_set_list) == 0:
    note_str = "<<Could not get changesets>>"
    
print(100*"=")
print(f"Database: {DB_NAME}, Changesets: {len(c_set_list)} {note_str}")
print(100*"=")

# sort by create time
c_set_list = sorted(c_set_list, key=lambda d: d['createdTimestamp']) 

for c in c_set_list:
    c_set_id = c['changesetId']
    print(f"  Changeset: {c_set_id}: Created: {c['createdTimestamp']} ({c['status']})")
    c_rqs = client.get_kx_changeset(environmentId=ENV_ID, databaseName=DB_NAME, changesetId=c_set_id)['changeRequests']

    chs_pdf = pd.DataFrame.from_dict(c_rqs).style.hide(axis='index')
    display(chs_pdf)

Database: basictickdb, Changesets: 1 
  Changeset: 4MiZpzH05M5DPUBfmFz59w: Created: 2024-08-08 18:44:21.099000+00:00 (COMPLETED)


| changeType | s3Path | dbPath |
|---|---|---|
| PUT | s3://kdb-demo-829845998889-kms/data/hdb/sym | / |
| PUT | s3://kdb-demo-829845998889-kms/data/hdb/2024.07.29/ | /2024.07.29/ |
| PUT | s3://kdb-demo-829845998889-kms/data/hdb/2024.07.30/ | /2024.07.30/ |
| PUT | s3://kdb-demo-829845998889-kms/data/hdb/2024.07.31/ | /2024.07.31/ |
| PUT | s3://kdb-demo-829845998889-kms/data/hdb/2024.08.01/ | /2024.08.01/ |
| PUT | s3://kdb-demo-829845998889-kms/data/hdb/2024.08.02/ | /2024.08.02/ |
| PUT | s3://kdb-demo-829845998889-kms/data/hdb/2024.08.05/ | /2024.08.05/ |
| PUT | s3://kdb-demo-829845998889-kms/data/hdb/2024.08.06/ | /2024.08.06/ |
| PUT | s3://kdb-demo-829845998889-kms/data/hdb/2024.08.07/ | /2024.08.07/ |


# Create Scaling Group
The scaling group represents the total compute available to the application. All clusters are placed in the scaling group and share its compute and memory.

## Reference
[Managed kdb scaling groups](https://docs.aws.amazon.com/finspace/latest/userguide/finspace-managed-kdb-scaling-groups.html)

## APIs used
[create_kx_scaling_group](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/finspace/client/create_kx_scaling_group.html)  
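
Every cluster draws on the same scaling group, so it can help to total the memory reserved by the cluster cells later in this notebook. This sketch assumes `memoryReservation` is reserved once per node, so it is multiplied by `nodeCount`. The unit is whatever the `create_kx_cluster` API defines. Compare the total with the memory of the chosen `NODE_TYPE` host.

```python
# scalingGroupConfiguration values used by the cluster cells in this notebook
CLUSTER_SCALING = {
    "TP": {"memoryReservation": 6, "nodeCount": 1},
    "HDB": {"memoryReservation": 6, "nodeCount": 2},
    "GATEWAY": {"memoryReservation": 6, "nodeCount": 1},
    "RDB": {"memoryReservation": 6, "nodeCount": 1},
    "CEP": {"memoryReservation": 6, "nodeCount": 1},
}


def total_reservation(configs):
    """Sum memoryReservation * nodeCount (assumes the reservation applies per node)."""
    return sum(c["memoryReservation"] * c["nodeCount"] for c in configs.values())


print(total_reservation(CLUSTER_SCALING))  # 36
```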

In [9]:
# Check if the scaling group exists; only create it if it does not
resp = get_kx_scaling_group(client=client, environmentId=ENV_ID, scalingGroupName=SCALING_GROUP_NAME)

if resp is None:
    resp = client.create_kx_scaling_group(
        environmentId = ENV_ID, 
        scalingGroupName = SCALING_GROUP_NAME,
        hostType=NODE_TYPE,
        availabilityZoneId = AZ_ID
    )

#    display(resp)
else:
    print(f"Scaling Group {SCALING_GROUP_NAME} exists")    

# Create Shared Volume
The shared volume is a common storage device for the application. Every cluster using the shared volume has a writable directory named after the cluster and can read the directories of the other clusters using the volume. Every shared volume also has a common directory that all clusters using the volume can read and write.

## Directory Structure
Shared volumes appear under the /opt/kx/app/shared directory of each cluster that uses them, at a path named for the volume (/opt/kx/app/shared/VOLUME_NAME). Each cluster using the volume has a directory named for the cluster (/opt/kx/app/shared/VOLUME_NAME/CLUSTER_NAME). Only that cluster can write to it, and the other clusters using the volume can read from it. Finally, each shared volume has a directory that all clusters using the volume can read and write (/opt/kx/app/shared/VOLUME_NAME/common).

**Root:** /opt/kx/app/shared   
**Each Volume:** /opt/kx/app/shared/VOLUME_NAME   
**Write per cluster (read otherwise):** /opt/kx/app/shared/VOLUME_NAME/CLUSTER_NAME   
**common read/write:** /opt/kx/app/shared/VOLUME_NAME/common   
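
The layout above can be expressed as a small path helper. `shared_volume_paths` is a hypothetical illustration built only from the directory rules described here. It computes paths and does not query the service.

```python
from pathlib import PurePosixPath

SHARED_ROOT = PurePosixPath("/opt/kx/app/shared")


def shared_volume_paths(volume_name, cluster_name, other_clusters=()):
    """Paths a cluster sees on a shared volume, grouped by access level."""
    volume = SHARED_ROOT / volume_name
    return {
        "volume": str(volume),
        # only this cluster can write here; others on the volume can read
        "own_read_write": str(volume / cluster_name),
        # every cluster on the volume can read and write here
        "common_read_write": str(volume / "common"),
        # directories owned by the other clusters: readable, not writable
        "read_only": [str(volume / c) for c in other_clusters],
    }


if __name__ == "__main__":
    print(shared_volume_paths("RDB_TP_SHARED", "RDB_basictickdb", ["TP_basictickdb"]))
```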

## Reference
[FinSpace Managed kdb Volumes](https://docs.aws.amazon.com/finspace/latest/userguide/finspace-managed-kdb-volumes.html)

## APIs used
[create_kx_volume](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/finspace/client/create_kx_volume.html) 


In [10]:
# Check if volume already exists before trying to create one
resp = get_kx_volume(client=client, environmentId=ENV_ID, volumeName=VOLUME_NAME)

if resp is None:
    resp = client.create_kx_volume(
        environmentId = ENV_ID, 
        volumeType = 'NAS_1',
        volumeName = VOLUME_NAME,
        description = 'Shared volume between TP and RDB',
        nas1Configuration = NAS1_CONFIG,
        azMode='SINGLE',
        availabilityZoneIds=[ AZ_ID ]    
    )

#    display(resp)
else:
    print(f"Volume {VOLUME_NAME} exists")        

# Create Dataview
Create a dataview of the database and have all of its data presented (cached) on the shared volume. Customers can also choose to cache only a portion of the database, or to tier storage across different volumes.

### Reference
[Dataviews for querying data](https://docs.aws.amazon.com/finspace/latest/userguide/finspace-managed-kdb-dataviews.html)

### APIs used
[create_kx_dataview](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/finspace/client/create_kx_dataview.html) 
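
After the volume is ready, the dataview cell chooses one of three actions: create the dataview, update it to the newest changeset, or leave it alone. It skips the dataview entirely when the database has no changesets. Below, the same decision is written as a pure function. `plan_dataview` is hypothetical. It takes the `list_kx_changesets` items and the existing dataview description, or `None` if there is no dataview.

```python
def plan_dataview(changesets, existing_view):
    """Return (action, changeset_id), where action is 'skip', 'create', 'update', or 'none'.

    changesets    -- list of dicts with 'changesetId' and 'createdTimestamp'
    existing_view -- dict with 'changesetId', or None if the dataview does not exist
    """
    if not changesets:
        return "skip", None  # no data in the database yet, so no dataview
    latest = max(changesets, key=lambda c: c["createdTimestamp"])["changesetId"]
    if existing_view is None:
        return "create", latest
    if existing_view.get("changesetId") != latest:
        return "update", latest
    return "none", latest
```

Keeping the decision separate from the `create_kx_dataview` and `update_kx_dataview` calls makes each branch easy to test without touching AWS.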


In [11]:
# before creating the dataview, be sure the volume is created and ready
wait_for_volume_status(client=client, environmentId=ENV_ID, volumeName=VOLUME_NAME, show_wait=True)

print("** VOLUME is READY **")

Volume: RDB_TP_SHARED status is CREATING, total wait 0:00:00, waiting 30 sec ...
Volume: RDB_TP_SHARED status is CREATING, total wait 0:00:30, waiting 30 sec ...
Volume: RDB_TP_SHARED status is CREATING, total wait 0:01:00, waiting 30 sec ...
Volume: RDB_TP_SHARED status is CREATING, total wait 0:01:30, waiting 30 sec ...
Volume: RDB_TP_SHARED status is CREATING, total wait 0:02:00, waiting 30 sec ...
Volume: RDB_TP_SHARED status is CREATING, total wait 0:02:30, waiting 30 sec ...
Volume: RDB_TP_SHARED status is CREATING, total wait 0:03:00, waiting 30 sec ...
Volume: RDB_TP_SHARED status is CREATING, total wait 0:03:30, waiting 30 sec ...
Volume: RDB_TP_SHARED status is CREATING, total wait 0:04:00, waiting 30 sec ...
Volume: RDB_TP_SHARED status is CREATING, total wait 0:04:30, waiting 30 sec ...
Volume: RDB_TP_SHARED status is CREATING, total wait 0:05:00, waiting 30 sec ...
Volume: RDB_TP_SHARED status is CREATING, total wait 0:05:30, waiting 30 sec ...
Volume: RDB_TP_SHARED status

In [12]:
# do changesets exist?
c_set_list = list_kx_changesets(client, environmentId=ENV_ID, databaseName=DB_NAME)

if len(c_set_list) != 0:
    # sort by create time
    c_set_list = sorted(c_set_list, key=lambda d: d['createdTimestamp']) 
    latest_changeset = c_set_list[-1]['changesetId']

    # Check if dataview already exists and is set to the requested changeset_id
    resp = get_kx_dataview(client=client, environmentId=ENV_ID, databaseName=DB_NAME, dataviewName=DBVIEW_NAME)

    if resp is None:
        resp = client.create_kx_dataview(
            environmentId = ENV_ID, 
            databaseName=DB_NAME, 
            dataviewName=DBVIEW_NAME,
            azMode='SINGLE',
            availabilityZoneId=AZ_ID,
            segmentConfigurations=[
                { 
                    'volumeName': VOLUME_NAME,
                    'dbPaths': ['/*'],  # cache all of database
                }
            ],
            autoUpdate=False,
            changesetId=latest_changeset, # latest changeset_id for static view
#            autoUpdate=True,
            description = f'Dataview of database'
        )
    elif resp['changesetId'] != latest_changeset:
        print(f"Dataview {DBVIEW_NAME} exists but needs updating...")
        resp = client.update_kx_dataview(environmentId=ENV_ID, 
            databaseName=DB_NAME, 
            dataviewName=DBVIEW_NAME, 
            changesetId=latest_changeset, 
            segmentConfigurations=[
                {'dbPaths': ['/*'], 'volumeName': VOLUME_NAME}
            ]
        )
    else:
        print(f"Dataview {DBVIEW_NAME} exists with current changeset: {latest_changeset}")
else:
    # no changesets, do NOT create view
    print(f"No changeset in database: {DB_NAME}, Dataview {DBVIEW_NAME} not created")        


# Create Clusters
Create the needed clusters for the application. 

Code used by this application must be staged to an S3 bucket that the service can read from. The code is then deployed to each cluster as part of the cluster creation process.

## Reference
[Managed kdb Insights clusters](https://docs.aws.amazon.com/finspace/latest/userguide/finspace-managed-kdb-clusters.html)   
[Cluster types](https://docs.aws.amazon.com/finspace/latest/userguide/kdb-cluster-types.html)
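
The next cell shells out to `zip` and the AWS CLI to package and upload the code. A pure-Python sketch of the packaging step follows. It uses the standard `zipfile` module and skips `.ipynb_checkpoints`, as the `zip -x` pattern does. `zip_codebase` is a hypothetical name. One behavioural difference: mode `"w"` rewrites the archive from scratch. `zip -r` on an existing archive only updates it, so entries for deleted files can linger.

```python
import os
import zipfile


def zip_codebase(src_dir, zip_path, skip_dirs=(".ipynb_checkpoints",)):
    """Zip the contents of src_dir, storing paths relative to src_dir and skipping skip_dirs.

    zip_path should lie outside src_dir, otherwise the archive would include itself.
    """
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for root, dirs, files in os.walk(src_dir):
            # prune skipped directories in place so os.walk does not descend into them
            dirs[:] = [d for d in dirs if d not in skip_dirs]
            for name in files:
                full = os.path.join(root, name)
                zf.write(full, arcname=os.path.relpath(full, src_dir))
    return zip_path


# The upload could then use boto3 instead of the CLI, e.g.:
# session.client("s3").upload_file(f"{CODEBASE}.zip", S3_BUCKET, f"{S3_CODE_PATH}/{CODEBASE}.zip")
```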

In [13]:
# code that will be deployed
os.system(f"ls -lrtha {CODEBASE}")

# create zipfile of the code
os.system(f"cd {CODEBASE}; zip -r -X ../{CODEBASE}.zip . -x '*.ipynb_checkpoints*';")

# Copy command with credentials
if AWS_ACCESS_KEY_ID is not None:
    cp = f"""
export AWS_ACCESS_KEY_ID={AWS_ACCESS_KEY_ID}
export AWS_SECRET_ACCESS_KEY={AWS_SECRET_ACCESS_KEY}
export AWS_SESSION_TOKEN={AWS_SESSION_TOKEN}

aws s3 cp  --exclude .DS_Store {CODEBASE}.zip s3://{S3_BUCKET}/{S3_CODE_PATH}/{CODEBASE}.zip
"""
else:
    cp = f"""
aws s3 cp  --exclude .DS_Store {CODEBASE}.zip s3://{S3_BUCKET}/{S3_CODE_PATH}/{CODEBASE}.zip
"""
    
# Copy the code
os.system(cp)

# Code on S3
os.system(f"aws s3 ls s3://{S3_BUCKET}/{S3_CODE_PATH}/{CODEBASE}.zip")

total 60K
-rw-rw-r-- 1 ec2-user ec2-user  274 Jul 30 21:05 funcDownHandle.q
-rw-rw-r-- 1 ec2-user ec2-user 3.1K Jul 30 21:05 connectmkdb.q
-rw-rw-r-- 1 ec2-user ec2-user 3.0K Jul 30 21:05 gwmkdb.q
-rw-rw-r-- 1 ec2-user ec2-user 4.5K Jul 30 21:05 kxtaqfeed.q
-rw-rw-r-- 1 ec2-user ec2-user 2.7K Jul 30 21:05 kxtaqdb.q
-rw-rw-r-- 1 ec2-user ec2-user  695 Jul 30 21:05 query.q
-rw-rw-r-- 1 ec2-user ec2-user  212 Jul 30 21:05 taq.schema.q
drwxrwxr-x 3 ec2-user ec2-user   43 Aug  5 20:15 tick
-rw-rw-r-- 1 ec2-user ec2-user  752 Aug  5 20:32 kxtaqsubscriber.q
-rw-rw-r-- 1 ec2-user ec2-user  655 Aug  6 14:59 hdbmkdb.q
-rw-rw-r-- 1 ec2-user ec2-user 3.2K Aug  6 14:59 cepmkdb.q
-rw-rw-r-- 1 ec2-user ec2-user 2.8K Aug  7 12:40 tick.q
-rw-rw-r-- 1 ec2-user ec2-user 4.8K Aug  7 17:59 rdbmkdb.q
drwxrwxr-x 2 ec2-user ec2-user  213 Aug  8 18:37 .ipynb_checkpoints
drwxrwxr-x 4 ec2-user ec2-user  266 Aug  8 18:37 .
drwxrwxr-x 8 ec2-user ec2-user 4.0K Aug  8 18:53 ..
updating: connectmkdb.q (deflated 63%)


0

## Wait for Scaling Group to be Ready
Before creating clusters in a scaling group, be sure the scaling group is ready.

In [14]:
# wait for the scaling group to create
wait_for_scaling_group_status(client=client, environmentId=ENV_ID, scalingGroupName=SCALING_GROUP_NAME, show_wait=True)

print("** DONE **")

Scaling Group: SCALING_GROUP_basictickdb status is now ACTIVE, total wait 0:00:00
** DONE **


## Create Tickerplant (TP) Cluster
The tickerplant delivers data from the feedhandler to subscribing clusters (the RDB and the CEP).

### APIs used
[create_kx_cluster](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/finspace/client/create_kx_cluster.html) 

#### Notes
- No database used by TP, databases argument is not used   
- Use tickerplantLogConfiguration **not** savedownStorageConfiguration   
  - tickerplantLogVolumes uses the same shared volume as other clusters
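
Every `create_kx_cluster` call in this notebook repeats the same VPC, code, scaling-group, and availability-zone arguments. The calls differ only in name, type, script, arguments, node count, and a few type-specific options. The sketch below assembles the shared keyword arguments. The helper name `cluster_request` and its parameter names are hypothetical; the dictionary keys mirror the calls in this notebook.

```python
def cluster_request(env_id, name, cluster_type, *, init_script, cmd_args,
                    code, execution_role, scaling_group, vpc_id,
                    security_groups, subnet_ids, az_mode, az_id,
                    node_count=1, memory_reservation=6, **extra):
    """Keyword arguments for client.create_kx_cluster, as used in this notebook.

    extra carries type-specific options such as databases,
    tickerplantLogConfiguration, or savedownStorageConfiguration.
    """
    request = {
        "environmentId": env_id,
        "clusterName": name,
        "clusterType": cluster_type,
        "releaseLabel": "1.0",
        "executionRole": execution_role,
        "scalingGroupConfiguration": {
            "memoryReservation": memory_reservation,
            "nodeCount": node_count,
            "scalingGroupName": scaling_group,
        },
        "clusterDescription": "Created with create_all notebook",
        "code": code,
        "initializationScript": init_script,
        "commandLineArguments": cmd_args,
        "azMode": az_mode,
        "availabilityZoneId": az_id,
        "vpcConfiguration": {
            "vpcId": vpc_id,
            "securityGroupIds": security_groups,
            "subnetIds": subnet_ids,
            "ipAddressType": "IP_V4",
        },
    }
    request.update(extra)  # add the type-specific arguments
    return request


# e.g. the TP cell could then call (not executed here):
# client.create_kx_cluster(**cluster_request(
#     ENV_ID, TP_CLUSTER_NAME, "TICKERPLANT", init_script=TP_INIT_SCRIPT,
#     cmd_args=TP_CMD_ARGS, code=CODE_CONFIG, execution_role=EXECUTION_ROLE,
#     scaling_group=SCALING_GROUP_NAME, vpc_id=VPC_ID,
#     security_groups=SECURITY_GROUPS, subnet_ids=SUBNET_IDS,
#     az_mode=AZ_MODE, az_id=AZ_ID,
#     tickerplantLogConfiguration={"tickerplantLogVolumes": [VOLUME_NAME]}))
```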

In [15]:
# does cluster already exist?
resp = get_kx_cluster(client, environmentId=ENV_ID, clusterName=TP_CLUSTER_NAME)

if resp is not None:
    print(f"Cluster: {TP_CLUSTER_NAME} already exists")
else:
    print(f"Creating: {TP_CLUSTER_NAME}")

    resp = client.create_kx_cluster(
        environmentId=ENV_ID, 
        clusterName=TP_CLUSTER_NAME,
        clusterType='TICKERPLANT',
        releaseLabel = '1.0',
        executionRole=EXECUTION_ROLE,
        scalingGroupConfiguration={
#            'memoryLimit': 1*1024,
            'memoryReservation': 6,
            'nodeCount': 1,
            'scalingGroupName': SCALING_GROUP_NAME,
        },
        tickerplantLogConfiguration ={ 'tickerplantLogVolumes': [ VOLUME_NAME ] },
        clusterDescription="Created with create_all notebook",
        code=CODE_CONFIG,
        initializationScript=TP_INIT_SCRIPT,
        commandLineArguments=TP_CMD_ARGS,
        azMode=AZ_MODE,
        availabilityZoneId=AZ_ID,
        vpcConfiguration={ 
            'vpcId': VPC_ID,
            'securityGroupIds': SECURITY_GROUPS,
            'subnetIds': SUBNET_IDS,
            'ipAddressType': 'IP_V4' 
        }
    )

Creating: TP_basictickdb


## Create Historical Database (HDB) Cluster
A multi-node HDB cluster serves queries for historical data (T-1 and older).

### APIs used
[create_kx_cluster](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/finspace/client/create_kx_cluster.html) 

#### Notes
- **databases**: defines which database and view to use
  - View used by the HDB cluster must be up and running   
- Not a tickerplant, no tickerplantLogConfiguration argument   
- No savedown needed, no savedownStorageConfiguration argument  

In [16]:
# Dataview must be ready before creating the HDB Cluster
wait_for_dataview_status(client=client, environmentId=ENV_ID, databaseName=DB_NAME, dataviewName=DBVIEW_NAME, show_wait=True)

print("** Dataview is READY **")

Dataview: basictickdb_DBVIEW status is CREATING, total wait 0:00:00, waiting 30 sec ...
Dataview: basictickdb_DBVIEW status is CREATING, total wait 0:00:30, waiting 30 sec ...
Dataview: basictickdb_DBVIEW status is CREATING, total wait 0:01:00, waiting 30 sec ...
Dataview: basictickdb_DBVIEW status is CREATING, total wait 0:01:30, waiting 30 sec ...
Dataview: basictickdb_DBVIEW status is CREATING, total wait 0:02:00, waiting 30 sec ...
Dataview: basictickdb_DBVIEW status is CREATING, total wait 0:02:30, waiting 30 sec ...
Dataview: basictickdb_DBVIEW status is CREATING, total wait 0:03:00, waiting 30 sec ...
Dataview: basictickdb_DBVIEW status is CREATING, total wait 0:03:30, waiting 30 sec ...
Dataview: basictickdb_DBVIEW status is CREATING, total wait 0:04:00, waiting 30 sec ...
Dataview: basictickdb_DBVIEW status is CREATING, total wait 0:04:30, waiting 30 sec ...
Dataview: basictickdb_DBVIEW status is CREATING, total wait 0:05:00, waiting 30 sec ...
Dataview: basictickdb_DBVIEW sta

In [17]:
# does cluster already exist?
resp = get_kx_cluster(client, environmentId=ENV_ID, clusterName=HDB_CLUSTER_NAME)

if resp is not None:
    print(f"Cluster: {HDB_CLUSTER_NAME} already exists")
else:
    print(f"Creating: {HDB_CLUSTER_NAME}")

    resp = client.create_kx_cluster(
        environmentId=ENV_ID, 
        clusterName=HDB_CLUSTER_NAME,
        clusterType='HDB',
        releaseLabel = '1.0',
        executionRole=EXECUTION_ROLE,
        databases=[{ 'databaseName': DB_NAME, 'dataviewName': DBVIEW_NAME }],
        scalingGroupConfiguration={
#            'memoryLimit': 1*1024,
            'memoryReservation': 6,
            'nodeCount': 2,
            'scalingGroupName': SCALING_GROUP_NAME,
        },
        clusterDescription="Created with create_all notebook",
        code=CODE_CONFIG,
        initializationScript=HDB_INIT_SCRIPT,
        commandLineArguments=HDB_CMD_ARGS,
        azMode=AZ_MODE,
        availabilityZoneId=AZ_ID,
        vpcConfiguration={ 
            'vpcId': VPC_ID,
            'securityGroupIds': SECURITY_GROUPS,
            'subnetIds': SUBNET_IDS,
            'ipAddressType': 'IP_V4' 
        }
    )

Creating: HDB_basictickdb


## Create Gateway (GW) Cluster
The gateway handles client queries for data in the RDB and HDB. It acts as a single API access point for data queries: it queries both the RDB and the HDB and aggregates the results for the requestor.

### APIs used
[create_kx_cluster](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/finspace/client/create_kx_cluster.html) 

#### Notes
- Gateways connect to other clusters and aggregate results   
  - No databases, tickerplantLogConfiguration, or savedownStorageConfiguration arguments
- execution role required, role is used when connecting to other clusters  
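
The gateway logic itself lives in `gwmkdb.q`, which is not shown. As a generic illustration of how a gateway can split a request between the RDB and HDB, this sketch routes a date range to each cluster. It assumes the RDB holds only the current day and the HDB holds earlier dates. The routing rule and the name `route_query` are assumptions and are not taken from `gwmkdb.q`.

```python
from datetime import date, timedelta


def route_query(start, end, today):
    """Split an inclusive [start, end] date range into per-cluster sub-ranges.

    Assumes the RDB serves only `today` and the HDB serves dates before it;
    dates after `today` are not routed anywhere.
    """
    if start > end:
        raise ValueError("start must not be after end")
    routes = {}
    if start < today:
        # historical part of the range, capped at the previous day
        routes["HDB"] = (start, min(end, today - timedelta(days=1)))
    if start <= today <= end:
        # the range includes today, so the RDB is needed too
        routes["RDB"] = (today, today)
    return routes
```

A gateway following this pattern would query each routed cluster and concatenate the partial results before returning them.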


In [18]:
# does cluster already exist?
resp = get_kx_cluster(client, environmentId=ENV_ID, clusterName=GW_CLUSTER_NAME)

if resp is not None:
    print(f"Cluster: {GW_CLUSTER_NAME} already exists")
else:
    print(f"Creating: {GW_CLUSTER_NAME}")

    resp = client.create_kx_cluster(
        environmentId=ENV_ID, 
        clusterName=GW_CLUSTER_NAME,
        clusterType='GATEWAY',
        releaseLabel = '1.0',
        scalingGroupConfiguration={
#            'memoryLimit': 1*1024,
            'memoryReservation': 6,
            'nodeCount': 1,
            'scalingGroupName': SCALING_GROUP_NAME,
        },
        clusterDescription="Created with create_all notebook",
        executionRole=EXECUTION_ROLE,
        code=CODE_CONFIG,
        initializationScript=GW_INIT_SCRIPT,
        commandLineArguments=GW_CMD_ARGS,
        azMode=AZ_MODE,
        availabilityZoneId=AZ_ID,
        vpcConfiguration={ 
            'vpcId': VPC_ID,
            'securityGroupIds': SECURITY_GROUPS,
            'subnetIds': SUBNET_IDS,
            'ipAddressType': 'IP_V4' 
        }
    )

Creating: GATEWAY_basictickdb


## Create Realtime Database (RDB)
The RDB subscribes to the tickerplant and captures the real-time data that the tickerplant receives from the feedhandler and publishes.

Because the RDB cluster depends on the TP cluster, the notebook checks that the TP is running before creating the RDB.

### APIs used
[create_kx_cluster](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/finspace/client/create_kx_cluster.html) 

####  Notes
- **databases:** includes the database; in this notebook the dataview name is passed to the RDB through the `dbView` command-line argument   
  - RDB will update the dataview of the database as part of end of day processing
- **savedownStorageConfiguration:** defines storage used   
  - End of day data is first saved to this location before updating the database 

In [19]:
# TP must be running before creating the RDBs
wait_for_cluster_status(client, environmentId=ENV_ID, clusterName=TP_CLUSTER_NAME, show_wait=True)

print("TP is running")

Cluster: TP_basictickdb status is CREATING, total wait 0:00:00, waiting 30 sec ...
Cluster: TP_basictickdb status is CREATING, total wait 0:00:30, waiting 30 sec ...
Cluster: TP_basictickdb status is CREATING, total wait 0:01:00, waiting 30 sec ...
Cluster: TP_basictickdb status is CREATING, total wait 0:01:30, waiting 30 sec ...
Cluster: TP_basictickdb status is CREATING, total wait 0:02:00, waiting 30 sec ...
Cluster: TP_basictickdb status is CREATING, total wait 0:02:30, waiting 30 sec ...
Cluster: TP_basictickdb status is CREATING, total wait 0:03:00, waiting 30 sec ...
Cluster: TP_basictickdb status is CREATING, total wait 0:03:30, waiting 30 sec ...
Cluster: TP_basictickdb status is CREATING, total wait 0:04:00, waiting 30 sec ...
Cluster: TP_basictickdb status is CREATING, total wait 0:04:30, waiting 30 sec ...
Cluster: TP_basictickdb status is CREATING, total wait 0:05:00, waiting 30 sec ...
Cluster: TP_basictickdb status is CREATING, total wait 0:05:30, waiting 30 sec ...
Clus

In [20]:
# does cluster already exist?
resp = get_kx_cluster(client, environmentId=ENV_ID, clusterName=RDB_CLUSTER_NAME)

if resp is not None:
    print(f"Cluster: {RDB_CLUSTER_NAME} already exists")
else:
    print(f"Creating: {RDB_CLUSTER_NAME}")

    resp = client.create_kx_cluster(
        environmentId=ENV_ID, 
        clusterName=RDB_CLUSTER_NAME,
        clusterType='RDB',
        releaseLabel = '1.0',
        executionRole=EXECUTION_ROLE,
        databases=[{ 'databaseName': DB_NAME }], #, 'dataviewName': DBVIEW_NAME }],
        scalingGroupConfiguration={
#            'memoryLimit': 1*1024,
            'memoryReservation': 6,
            'nodeCount': 1,
            'scalingGroupName': SCALING_GROUP_NAME,
        },
        savedownStorageConfiguration ={ 'volumeName': VOLUME_NAME },
        clusterDescription="Created with create_all notebook",
        code=CODE_CONFIG,
        initializationScript=RDB_INIT_SCRIPT,
        commandLineArguments=RDB_CMD_ARGS,
        azMode=AZ_MODE,
        availabilityZoneId=AZ_ID,
        vpcConfiguration={ 
            'vpcId': VPC_ID,
            'securityGroupIds': SECURITY_GROUPS,
            'subnetIds': SUBNET_IDS,
            'ipAddressType': 'IP_V4' 
        }
    )

Creating: RDB_basictickdb


## Create Complex Event Processor (CEP)
The CEP is similar to the RDB: it subscribes to the tickerplant to capture real-time data, but it also produces and publishes derived data that other processes can subscribe to.

### APIs used
[create_kx_cluster](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/finspace/client/create_kx_cluster.html) 

#### Notes
- Connects to the TP cluster, subscribes for data, and publishes its calculations   
  - No databases, tickerplantLogConfiguration, or savedownStorageConfiguration needed
- execution role required, role is used when connecting to TP cluster   


In [21]:
# does cluster already exist?
resp = get_kx_cluster(client, environmentId=ENV_ID, clusterName=CEP_CLUSTER_NAME)

if resp is not None:
    print(f"Cluster: {CEP_CLUSTER_NAME} already exists")
else:
    print(f"Creating: {CEP_CLUSTER_NAME}")
    
    resp = client.create_kx_cluster(
        environmentId=ENV_ID, 
        clusterName=CEP_CLUSTER_NAME,
        clusterType='RDB',
        releaseLabel = '1.0',
        executionRole=EXECUTION_ROLE,
        scalingGroupConfiguration={
#            'memoryLimit': 1*1024,
            'memoryReservation': 6,
            'nodeCount': 1,
            'scalingGroupName': SCALING_GROUP_NAME,
        },
        clusterDescription="Created with create_all notebook",
        code=CODE_CONFIG,
        initializationScript=CEP_INIT_SCRIPT,
        commandLineArguments=CEP_CMD_ARGS,
        azMode=AZ_MODE,
        availabilityZoneId=AZ_ID,
        vpcConfiguration={ 
            'vpcId': VPC_ID,
            'securityGroupIds': SECURITY_GROUPS,
            'subnetIds': SUBNET_IDS,
            'ipAddressType': 'IP_V4' 
        }
    )

#display(resp)

Creating: CEP_basictickdb


# List All Clusters
Wait until all clusters are in the RUNNING state, then list them.

In [22]:
# Wait for all clusters to be in the RUNNING state
for c in all_clusters.values():
    wait_for_cluster_status(client, environmentId=ENV_ID, clusterName=c, show_wait=True)

print("** ALL CLUSTERS DONE **")

Cluster: TP_basictickdb status is now RUNNING, total wait 0:00:00
Cluster: RDB_basictickdb status is PENDING, total wait 0:00:00, waiting 30 sec ...
Cluster: RDB_basictickdb status is CREATING, total wait 0:00:30, waiting 30 sec ...
Cluster: RDB_basictickdb status is CREATING, total wait 0:01:00, waiting 30 sec ...
Cluster: RDB_basictickdb status is CREATING, total wait 0:01:30, waiting 30 sec ...
Cluster: RDB_basictickdb status is CREATING, total wait 0:02:00, waiting 30 sec ...
Cluster: RDB_basictickdb status is CREATING, total wait 0:02:30, waiting 30 sec ...
Cluster: RDB_basictickdb status is CREATING, total wait 0:03:00, waiting 30 sec ...
Cluster: RDB_basictickdb status is CREATING, total wait 0:03:30, waiting 30 sec ...
Cluster: RDB_basictickdb status is CREATING, total wait 0:04:00, waiting 30 sec ...
Cluster: RDB_basictickdb status is CREATING, total wait 0:04:30, waiting 30 sec ...
Cluster: RDB_basictickdb status is CREATING, total wait 0:05:00, waiting 30 sec ...
Cluster: RD

In [23]:
cdf = get_clusters(client, environmentId=ENV_ID)

if cdf is not None:
    cdf = cdf[cdf['clusterName'].isin(all_clusters.values())]

display(cdf)

| | clusterName | status | clusterType | capacityConfiguration | commandLineArguments | clusterDescription | lastModifiedTimestamp | createdTimestamp | databaseName | cacheConfigurations |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | CEP_basictickdb | RUNNING | RDB | | [{'key': 's', 'value': '2'}, {'key': 'g', 'value': '1'}, {'key': 'tp', 'value': 'TP_basictickdb'}, {'key': 'AWS_ZIP_DEFAULT', 'value': '17,2,6'}] | Created with create_all notebook | 2024-08-08 19:30:09.273000+00:00 | 2024-08-08 19:14:36.519000+00:00 | | |
| 1 | GATEWAY_basictickdb | RUNNING | GATEWAY | | [{'key': 's', 'value': '2'}, {'key': 'g', 'value': '1'}, {'key': 'rdb_name', 'value': 'RDB_basictickdb'}, {'key': 'hdb_name', 'value': 'HDB_basictickdb'}] | Created with create_all notebook | 2024-08-08 19:13:43.443000+00:00 | 2024-08-08 19:01:48.261000+00:00 | | |
| 2 | HDB_basictickdb | RUNNING | HDB | | [{'key': 's', 'value': '2'}, {'key': 'g', 'value': '1'}] | Created with create_all notebook | 2024-08-08 19:13:43.587000+00:00 | 2024-08-08 19:01:45.488000+00:00 | basictickdb | |
| 3 | RDB_basictickdb | RUNNING | RDB | | [{'key': 's', 'value': '2'}, {'key': 'g', 'value': '1'}, {'key': 'tp', 'value': 'TP_basictickdb'}, {'key': 'procName', 'value': 'RDB_basictickdb'}, {'key': 'volumeName', 'value': 'RDB_TP_SHARED'}, {'key': 'hdbProc', 'value': 'HDB_basictickdb'}, {'key': 'dbView', 'value': 'basictickdb_DBVIEW'}, {'key': 'AWS_ZIP_DEFAULT', 'value': '17,2,6'}] | Created with create_all notebook | 2024-08-08 19:30:09.801000+00:00 | 2024-08-08 19:14:33.629000+00:00 | basictickdb | [] |
| 4 | TP_basictickdb | RUNNING | TICKERPLANT | | [{'key': 'procName', 'value': 'TP_basictickdb'}, {'key': 'volumeName', 'value': 'RDB_TP_SHARED'}, {'key': 'AWS_ZIP_DEFAULT', 'value': '17,2,6'}, {'key': 'g', 'value': '1'}] | Created with create_all notebook | 2024-08-08 19:13:59.200000+00:00 | 2024-08-08 18:55:06.046000+00:00 | | |
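`get_clusters` also comes from the notebook's helper modules. The sketch below is an equivalent that follows every page and applies the same name filter. It assumes boto3's FinSpace `list_kx_clusters` returns summaries under `kxClusterSummaries` plus an optional `nextToken`. `list_cluster_summaries` is hypothetical and returns plain dicts. Wrapping the result in `pandas.DataFrame` gives a table like the one above.

```python
from typing import Any, Dict, Iterable, List, Optional


def list_cluster_summaries(
    client: Any, environment_id: str, names: Optional[Iterable[str]] = None
) -> List[Dict[str, Any]]:
    """Collect every page of list_kx_clusters, optionally keeping only the given names."""
    summaries: List[Dict[str, Any]] = []
    kwargs: Dict[str, Any] = {"environmentId": environment_id}
    while True:
        resp = client.list_kx_clusters(**kwargs)
        summaries.extend(resp.get("kxClusterSummaries", []))
        token = resp.get("nextToken")
        if not token:
            break
        kwargs["nextToken"] = token  # request the next page
    if names is not None:
        wanted = set(names)
        summaries = [s for s in summaries if s.get("clusterName") in wanted]
    return summaries
```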


# Start FeedHandler
With all clusters running, start a feedhandler to send data to the tickerplant (TP).

### Environment/Configuration Details
The environment variable QHOME is set to the local q installation directory used by this notebook.   
The environment variable SSL_VERIFY_SERVER is set to 0 so that the TP server's TLS certificate is not verified.

### Console Example
```
$ TP_CONN="<connection string to cluster>"
$ cd basictick
$ q kxtaqfeed.q -g 1 -p 5030 -tp $TP_CONN
```

Here we use Python to get the connection string, set environment variables, and run the feedhandler.
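The notebook cell launches the feedhandler through a shell command string. The sketch below is an alternative that builds the argument list directly and passes an explicit environment. This avoids shell quoting of the long connection string, which contains `&` and `%` characters. It also sets `SSL_VERIFY_SERVER=0` as described above. `build_fh_launch` is hypothetical. The `l64/q` binary path and the `-g`, `-p`, `-tp`, `-debug`, and `-t` flags are copied from the notebook's own command.

```python
import os
from typing import Dict, List, Mapping, Tuple


def build_fh_launch(
    qhome: str,
    conn_str: str,
    fh_port: int,
    feed_timer: int,
    debug: int = 0,
    base_env: Mapping[str, str] = os.environ,
) -> Tuple[List[str], Dict[str, str]]:
    """Hypothetical helper: argv and env for launching kxtaqfeed.q without a shell."""
    argv = [
        os.path.join(qhome, "l64", "q"), "kxtaqfeed.q",
        "-g", "1",
        "-p", str(fh_port),
        "-tp", conn_str,        # one argv element, so no shell quoting is needed
        "-debug", str(debug),
        "-t", str(feed_timer),
    ]
    env = dict(base_env)
    env["QHOME"] = qhome
    env["SSL_VERIFY_SERVER"] = "0"  # do not verify the TP server certificate
    return argv, env
```

The pair could then be started with `subprocess.Popen(argv, cwd=CODEBASE, env=env)`.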

In [24]:
import time

# get the connection string to the TP cluster
conn_str = get_kx_connection_string(client, environmentId=ENV_ID, clusterName=TP_CLUSTER_NAME, userName=KDB_USERNAME, boto_session=session)

# export the connection string so the child shell can expand $CONN_STR
os.environ["CONN_STR"] = conn_str

feed_debug = 0

# start the kxtaqfeed q process, connecting to the TP at $CONN_STR
# ($CONN_STR is quoted so the shell passes it as a single argument; pid holds the Popen handle)
if os.getenv('QHOME') is not None:
    pid = subprocess.Popen(f'cd {CODEBASE}; nohup $QHOME/l64/q kxtaqfeed.q -g 1 -p {FH_PORT} -tp "$CONN_STR" -debug {feed_debug} -t {FEED_TIMER}', shell=True)
else:
    print("Environment variable QHOME is not set, please set it to where kdb+ is installed")

# give the feedhandler a moment to start and connect to the TP
time.sleep(2)

"connected to tp"
()


In [25]:
# Check feedhandler connections; should show connected (1b, i.e. True)
fh=kx.QConnection(port=FH_PORT)

display( fh ("select process,connected,handle,address from .conn.procs") )

| | process | connected | handle | address |
|---|---|---|---|---|
| 0 | tp | 1b | 5i | :tcps://vpce-096c514436f930ded-zvyfok7h.vpce-svc-023944f523c6e1c21.us-east-1.vpce.amazonaws.com:443:bob:Host=vpce-096c514436f930ded-zvyfok7h.vpce-svc-023944f523c6e1c21.us-east-1.vpce.amazonaws.com&Port=443&User=bob&Action=finspace%3AConnectKxCluster&X-Amz-Security-Token=IQoJb3JpZ2luX2VjEDwaCXVzLWVhc3QtMSJHMEUCIQDJTbcWftvFGCT%2BJf%2B86PKTaz4uBcmqPdKfy0eSTIRxwwIgbJUw6WQ9o%2BaCHoKWXwUNM6HND0ARA0aC9Xpn7gO%2BQAkq9gIINRAAGgw4Mjk4NDU5OTg4ODkiDN8t0pyJpUaWRThr7yrTAoGvP00rSHjLVoc%2BBEd3E7LHVVamuUngALX0duNQOUIpl8L0%2Fu3eeqQaywe%2Bs0jYH02g03vNcso%2BsQbrmqEz8v4L%2Fx3lPfMiRxR80eAscALWP%2F0aPSgMJqlSOPzOjpRfeDKpvgUf%2FmpGFvWi8tzo2WXUWKKndJ6xyKqq0s1SXtZs%2Be982xBsyvbRv9RG6qsB1MNBlFCLXm0LdaODd1vFYuiss%2FFCN2hCN65lkFKw%2Fu1xvWEPfWYwfHppuip1CqgNwW%2BVNCOPQBvuaFRYe%2BO90gzHx7QaZTtB%2BP39GoXNOLIWoWjOQa9q%2Fz%2F5NMyL7p9rWgWza4wtzSo7MQymmLb2gefgC0LyOKplMdhAv%2Fmr0%2BYE51d7mkOCGzt8UZsF6I%2BCnH0p58DYGrdITfXVWIqYeUPl6iiS51XPtMlSDoqqQZzIb3HHx7wAhEYfXHy3GMLaC5D1izDPudS1Bjq%2FAQ0kf9%2BH0i5yanZ520dyO%2Fm9gOGM87if7HnUWjRbwTABNFaMUSQLSpRfff1%2FV1q0dJZ7fKHDQyj8vF57q8ct%2FCsUKBXRV8RC3zmvAQ%2FCHkQYfU1MFi%2BXjd2VGKlbuby6xOUc5RivFbgsZ08yajs2PQPhn9Ym6sOojkYFz0ti0BSkcfbSaPpmbPBD6S7Gk8uRIJ6aUdw7F2JhFcGRWqbeOvfHiY1xmuBWoUzN75XyTB24pRCNe4ruJdmB19cJKlk6&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20240808T193023Z&X-Amz-SignedHeaders=host&X-Amz-Expires=900&X-Amz-Credential=ASIA4CNVNBUU6N6MHID6%2F20240808%2Fus-east-1%2Ffinspace-apricot%2Faws4_request&X-Amz-Signature=2e2ac20ab1dae7b444e79909fe3c04ca39a1205190f032935fa9f5e4a3fa9641 |
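The fixed `time.sleep(2)` used when starting the feedhandler usually suffices, but it can race a slow TLS handshake to the TP. The sketch below is a hypothetical alternative: a generic polling helper that retries a check until it succeeds or a deadline passes.

```python
import time
from typing import Callable


def wait_for(check: Callable[[], bool], timeout_sec: float = 30.0, interval_sec: float = 0.5) -> bool:
    """Return True as soon as check() is truthy, or False once timeout_sec has elapsed."""
    deadline = time.monotonic() + timeout_sec
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_sec)


# Hypothetical usage (an untested assumption about the .conn.procs schema):
# connected = wait_for(lambda: bool(fh("exec all connected from .conn.procs where process=`tp").py()))
```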


# All Processes Running
All completed.

In [26]:
print( f"Last Run: {datetime.datetime.now()}" )

Last Run: 2024-08-08 19:30:25.559770
