# YCQL Intermediate Development Lab

In this lab, you will learn how to use indexes to optimize the data model, validate data, and improve database performance. You will first learn how to measure the performance of queries, then how to create indexes that make those queries more efficient at scale.

## Setup steps
Here are the steps to set up this lab:
- Import the notebook variables
- Create the `db_ybu` keyspace
- Import the DDL and DML scripts

### Install missing dependencies and restart the notebook
Run the following cell to ensure that the notebook dependencies are available to the notebook. 

### Create the notebook variables 

> IMPORTANT!
> 
> Do NOT skip running this cell. 
> 

The following Python cell creates and stores variables that all the notebooks in this lab will use. You can view these variables in the Jupyter tab.

- To run the script, select Execute Cell (Play Arrow) in the left gutter of the cell.
- Verify the accuracy of the output values

In [1]:
%store MY_DB_NAME
%store MY_YB_PATH
%store MY_GITPOD_WORKSPACE_URL
%store MY_HOST_IPv4_01
%store MY_HOST_IPv4_02
%store MY_HOST_IPv4_03
%store MY_NOTEBOOK_DIR
%store MY_TSERVER_WEBSERVER_PORT
%store MY_NOTEBOOK_DATA_FOLDER
%store MY_DATA_DDL_FILE
%store MY_DATA_DML_FILE

Stored 'MY_DB_NAME' (str)
Stored 'MY_YB_PATH' (str)
Stored 'MY_GITPOD_WORKSPACE_URL' (NoneType)
Stored 'MY_HOST_IPv4_01' (str)
Stored 'MY_HOST_IPv4_02' (str)
Stored 'MY_HOST_IPv4_03' (str)
Stored 'MY_NOTEBOOK_DIR' (str)
Stored 'MY_TSERVER_WEBSERVER_PORT' (str)
Stored 'MY_NOTEBOOK_DATA_FOLDER' (str)


UsageError: Unknown variable 'MY_NOTEBOOK_UTILS_FOLDER'


#### Create Keyspace: db_ybu
Now that the environment has been properly configured and the cluster has been created, you can begin by creating a keyspace.

In [None]:
%%bash -s "$MY_YB_PATH" "$MY_DB_NAME"
# Import the path and keyspace name
YB_PATH=${1}
DB_NAME=${2}

cd $YB_PATH

# Drop and re-create the keyspace
./bin/ycqlsh --execute "DROP KEYSPACE IF EXISTS ${DB_NAME};"

./bin/ycqlsh --execute "CREATE KEYSPACE ${DB_NAME}
  WITH REPLICATION =
  { 'class': 'SimpleStrategy', 'replication_factor': 1 };"

# Connect to the keyspace (applies to this ycqlsh session only)
./bin/ycqlsh --execute "USE ${DB_NAME};"

# List keyspaces to validate keyspace creation
./bin/ycqlsh --execute "DESCRIBE KEYSPACES"

In the cell above, the `ycqlsh` binary starts the YCQL shell. Once connected, you can use YCQL DDL, DML, and shell commands to create keyspaces and tables. The USE keyword sets the active keyspace. The DESCRIBE keyword displays schema details such as attribute names, data types, and other relations.
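
Because every `ycqlsh --execute` call opens its own session, a `USE` statement in one call does not carry over to the next. The following is a minimal sketch of two ways to target the keyspace per call: the `-k` flag (which the load cell in this lab also uses) and a statement that names the keyspace explicitly. The `YCQLSH` and `DB_NAME` defaults are assumptions for illustration. The sketch only runs the statements if the binary is found.

```shell
# Sketch: select the keyspace on every ycqlsh invocation.
# Assumes the current directory is $MY_YB_PATH and the keyspace is db_ybu.
YCQLSH="${YCQLSH:-./bin/ycqlsh}"
DB_NAME="${DB_NAME:-db_ybu}"

# Option 1: rely on -k to set the keyspace for this invocation.
TABLES_STMT="DESCRIBE TABLES"
# Option 2: name the keyspace inside the statement itself.
KEYSPACE_STMT="DESCRIBE KEYSPACE ${DB_NAME}"

if [ -x "$YCQLSH" ]; then
  "$YCQLSH" -k "$DB_NAME" --execute "$TABLES_STMT"
  "$YCQLSH" --execute "$KEYSPACE_STMT"
else
  echo "ycqlsh not found at $YCQLSH; statements were not run"
fi
```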

## Load DDL and Data from File
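Load the schema (DDL) and data (DML) statements from script files. The following cell runs the DDL file first, then the DML file, against the keyspace.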

In [None]:
%%bash -s "$MY_YB_PATH" "$MY_DB_NAME" "$MY_NOTEBOOK_DATA_FOLDER" "$MY_DATA_DDL_FILE" "$MY_DATA_DML_FILE"   
# Wishlist
YB_PATH=${1}
DB_NAME=${2}
DATA_FOLDER=${3}
DATA_DDL_FILE=${4}
DATA_DML_FILE=${5}

WISHLIST_DDL_PATH=${DATA_FOLDER}/${DATA_DDL_FILE}
WISHLIST_DML_PATH=${DATA_FOLDER}/${DATA_DML_FILE}
echo $WISHLIST_DDL_PATH
echo $WISHLIST_DML_PATH
cd $YB_PATH

# DDL file
./bin/ycqlsh -k "${DB_NAME}" -f "${WISHLIST_DDL_PATH}"
sleep 1

# DML file
./bin/ycqlsh -k "${DB_NAME}" -f "${WISHLIST_DML_PATH}"
sleep 1

# Describe relations
./bin/ycqlsh --execute "DESCRIBE TABLES"

Validate the data was loaded properly from the SQL scripts.
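
One way to validate the load is to count the rows and look at a small sample. This is a sketch only: it assumes the DDL script created `db_ybu.tbl_products_by_category` (the table the later cells query) and that the current directory is `$MY_YB_PATH`. The `YCQLSH` variable is introduced here for illustration.

```shell
# Sketch: spot-check that the DML script loaded rows.
YCQLSH="${YCQLSH:-./bin/ycqlsh}"
TABLE="db_ybu.tbl_products_by_category"   # assumed to exist after the DDL load

COUNT_STMT="SELECT COUNT(*) FROM ${TABLE};"
SAMPLE_STMT="SELECT * FROM ${TABLE} LIMIT 5;"

if [ -x "$YCQLSH" ]; then
  "$YCQLSH" --execute "$COUNT_STMT"    # a count of zero suggests the load failed
  "$YCQLSH" --execute "$SAMPLE_STMT"   # inspect a few rows by eye
else
  echo "ycqlsh not found at $YCQLSH; statements were not run"
fi
```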

### Query Plans

Use EXPLAIN to view the query plan for a statement. The scan type in the plan indicates how efficiently the database can execute the operation.

In [None]:
%%bash -s "$MY_YB_PATH"
# Sequential Scan
YB_PATH=${1}
cd $YB_PATH

# Query plan for a full table scan
./bin/ycqlsh --execute "EXPLAIN SELECT * FROM db_ybu.tbl_products_by_category;"
# ./bin/ycqlsh --execute "EXPLAIN SELECT * FROM db_ybu.tbl_products_by_category WHERE product_name=?;"  
# ./bin/ycqlsh --execute "EXPLAIN SELECT * FROM db_ybu.tbl_products_by_category WHERE product_name=?"  
#
# ./bin/ycqlsh --execute "EXPLAIN SELECT SUM(quantity) as subtotal FROM db_ybu.tbl_products_by_wishlist where wishlist_id = ?;"  
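
When you compare several plans, it can help to reduce each one to its scan type. The sketch below defines a hypothetical helper, `scan_type`, that reads EXPLAIN output and prints the first scan label it recognizes. The list of labels is an assumption about what YCQL EXPLAIN prints and may not be exhaustive, so check it against the output you actually see.

```shell
# Hypothetical helper: reads EXPLAIN output on stdin and prints the first
# scan label it recognizes. The label list is an assumption, not exhaustive.
scan_type() {
  grep -o -E 'Index Only Scan|Index Scan|Primary Key Lookup|Range Scan|Seq Scan' | head -n 1
}

YCQLSH="${YCQLSH:-./bin/ycqlsh}"   # assumes the current directory is $MY_YB_PATH
PLAN_STMT="EXPLAIN SELECT * FROM db_ybu.tbl_products_by_category;"

if [ -x "$YCQLSH" ]; then
  "$YCQLSH" --execute "$PLAN_STMT" | scan_type
else
  echo "ycqlsh not found at $YCQLSH; statement was not run"
fi
```

Running the same helper before and after creating an index gives a quick way to confirm that the plan changed.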

### Secondary Indexes 

In this demo, you will create an index and compare the difference between query plans.

In [None]:
%%bash -s "$MY_YB_PATH"
YB_PATH=${1}
cd $YB_PATH

# Create a secondary index on product_name that also stores price and description
./bin/ycqlsh --execute "CREATE INDEX idx_products_by_name ON db_ybu.tbl_products_by_category (product_name) INCLUDE (price, description);"

# ./bin/ycqlsh --execute "DESC db_ybu.tbl_products_by_category"

In [None]:
%%bash -s "$MY_YB_PATH"  
YB_PATH=${1}
cd $YB_PATH

# Query Plan
./bin/ycqlsh --execute "EXPLAIN SELECT * FROM db_ybu.tbl_products_by_category WHERE product_name=?;"  
# ./bin/ycqlsh --execute "EXPLAIN SELECT product_name, price, description FROM db_ybu.tbl_products_by_category WHERE product_name=?;"
# ./bin/ycqlsh --execute "DESC db_ybu.tbl_products_by_category"  

### Unique Indexes
Enforce a unique constraint on an attribute's values by creating a unique secondary index.

In [None]:
%%bash -s "$MY_YB_PATH"  
YB_PATH=${1}
cd $YB_PATH

# Create a unique index
./bin/ycqlsh --execute "CREATE UNIQUE INDEX idx_unique_product_by_id 
  ON db_ybu.tbl_products_by_category(product_id)
  INCLUDE(description);"  
  
# ./bin/ycqlsh --execute "DESC db_ybu.tbl_products_by_category"  

# ./bin/ycqlsh --execute "EXPLAIN SELECT product_name as name, description, price, category FROM db_ybu.tbl_products_by_category WHERE price > 30 AND category = 'Office Supplies';"

#### Partial Indexes

Use a partial index to reduce the amount of data that requires scanning. A partial index covers only the rows that match its WHERE predicate.
In this example, the index covers only products priced over $30 in the Office Supplies category.

In [None]:
%%bash -s "$MY_YB_PATH"  
YB_PATH=${1}
cd $YB_PATH

# Create a partial index
./bin/ycqlsh --execute "CREATE INDEX ON db_ybu.tbl_products_by_category(price) INCLUDE (description, product_name) WHERE price > 30 and category = 'Office Supplies';"  
# ./bin/ycqlsh --execute "DESC db_ybu.tbl_products_by_category"  
# ./bin/ycqlsh --execute "EXPLAIN SELECT product_name as name, description, price, category FROM db_ybu.tbl_products_by_category WHERE price > 30 AND category = 'Office Supplies';"

#### Collections
Collection types such as LIST, SET, and MAP let a single column store multiple values, which makes the data model more flexible. This is especially useful when you are trying to reduce the number of tables that need to be queried.

In [None]:
%%bash -s "$MY_YB_PATH"  
YB_PATH=${1}
cd $YB_PATH

# Add collection columns
./bin/ycqlsh --execute "ALTER TABLE db_ybu.tbl_products_by_category ADD warehouse_ids LIST<TEXT>;"
# ./bin/ycqlsh --execute "ALTER TABLE db_ybu.tbl_products_by_category ADD tags SET<TEXT>;"
# ./bin/ycqlsh --execute "DESC db_ybu.tbl_products_by_category"
# ./bin/ycqlsh --execute "ALTER TABLE db_ybu.tbl_products_by_category ADD store_locations FROZEN<LIST<TEXT>>;"
# ./bin/ycqlsh --execute "DESC db_ybu.tbl_products_by_category"

#### JSONB Index

JSONB is well suited to complex data structures because, in YCQL, attributes inside a JSONB column are searchable and can be indexed.

In [None]:
%%bash -s "$MY_YB_PATH"  
YB_PATH=${1}
cd $YB_PATH

# Create an index on a JSONB attribute
./bin/ycqlsh --execute "CREATE INDEX idx_sku_details_name ON db_ybu.tbl_products_by_category((sku_details->>'Name'));"
# ./bin/ycqlsh --execute "DESC  db_ybu.tbl_products_by_category"
# ./bin/ycqlsh --execute "EXPLAIN SELECT category, price, product_id FROM db_ybu.tbl_products_by_category WHERE (sku_details->>'Name') = ?;"

### Time to Live

YCQL supports data expiration through a time to live (TTL) setting. In the context of data modelling, removing "old" or deprecated data can reduce both operational costs and storage costs.

In [None]:
%%bash -s "$MY_YB_PATH"  
YB_PATH=${1}
cd $YB_PATH

# Create a table, then insert a row that expires
./bin/ycqlsh --execute "CREATE TABLE db_ybu.tbl_todolists_by_user (
user_id BIGINT, todolist_name TEXT, todolist_id UUID, is_public BOOLEAN, 
PRIMARY KEY((user_id), todolist_name)) 
WITH CLUSTERING ORDER BY (todolist_name DESC);"  
# ./bin/ycqlsh --execute "INSERT INTO db_ybu.tbl_todolists_by_user(user_id, todolist_name, todolist_id, is_public) VALUES (9490243, 'Grocery', 2a70494e-6b68-4739-b3e0-ff06aa0a2d67, true) USING TTL 5;"  
# ./bin/ycqlsh --execute "SELECT * FROM db_ybu.tbl_todolists_by_user;"  
# ./bin/ycqlsh --execute "INSERT INTO db_ybu.tbl_wishlists_by_user(user_id, wishlist_id, name, is_public) VALUES ('Mark', 'Grocery', 2a70494e-6b68-4739-b3e0-ff06aa0a2d67, true) USING TTL 5;"  
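
To see expiration in action, you can insert a row with a short TTL, read it back, wait, and read again. The sketch below reuses the table and the sample values from the cell above. The `YCQLSH` and `TTL_SECONDS` variables are introduced here for illustration, and the sketch assumes the table has already been created.

```shell
# Sketch: watch a row expire. Assumes db_ybu.tbl_todolists_by_user exists
# and that the current directory is $MY_YB_PATH.
YCQLSH="${YCQLSH:-./bin/ycqlsh}"
TTL_SECONDS=5

INSERT_STMT="INSERT INTO db_ybu.tbl_todolists_by_user (user_id, todolist_name, todolist_id, is_public) VALUES (9490243, 'Grocery', 2a70494e-6b68-4739-b3e0-ff06aa0a2d67, true) USING TTL ${TTL_SECONDS};"
SELECT_STMT="SELECT * FROM db_ybu.tbl_todolists_by_user WHERE user_id = 9490243;"

if [ -x "$YCQLSH" ]; then
  "$YCQLSH" --execute "$INSERT_STMT"
  "$YCQLSH" --execute "$SELECT_STMT"   # the row should be visible here
  sleep $((TTL_SECONDS + 1))           # wait until the TTL has elapsed
  "$YCQLSH" --execute "$SELECT_STMT"   # the row should no longer be returned
else
  echo "ycqlsh not found at $YCQLSH; statements were not run"
fi
```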

---
# All done!
In this lab, you completed the following:

- Setup
  - Created the `db_ybu` keyspace with `ycqlsh`
  - Created tables and loaded data using DDL and DML scripts