<div style="width:100%; background-color: #000041"><a target="_blank" href="http://university.yugabyte.com"><img src="assets/YBU_Logo.webp" /></a></div>

> Free YugabyteDB YCQL Development course at Yugabyte University
>
> You can sign up at [YugabyteDB YCQL Development](https://university.yugabyte.com/courses/yugabytedb-ycql-development).
>


# Query-driven data model
In this notebook, you will learn how to use indexes to optimize the data model, validate data, and improve query performance. First, you will learn how to measure query performance. Then, you will create indexes to make queries more efficient at scale.

### Import the notebook variables 

> Requirements:
>
> You must first create the variables in the `01_Setup.ipynb` notebook.
>

The following Python cell reads the stored variables created in the `01_Setup.ipynb` notebook. 

- To run the script, select Execute Cell (Play Arrow) in the left gutter of the cell.

In [None]:
%store -r MY_DB_NAME
%store -r MY_YB_PATH
%store -r MY_YB_PATH_DATA
%store -r MY_GITPOD_WORKSPACE_URL
%store -r MY_HOST_IPv4_01
%store -r MY_HOST_IPv4_02
%store -r MY_HOST_IPv4_03
%store -r MY_NOTEBOOK_DIR
%store -r MY_TSERVER_WEBSERVER_PORT
%store -r MY_NOTEBOOK_DATA_FOLDER
%store -r MY_YB_MASTER_HOST_GITPOD_URL
%store -r MY_YB_TSERVER_HOST_GITPOD_URL
%store -r MY_DATA_DDL_FILE
%store -r MY_DATA_DML_FILE

### Create the `ks_ybu` keyspace
Run the following cells to connect to the YugabyteDB cluster using `ycqlsh`. Then, complete the following tasks:
- Create the `ks_ybu` keyspace if it does not exist


Create the keyspace, `ks_ybu`.

In [None]:
%%bash -s "$MY_YB_PATH" "$MY_DB_NAME" # Create the keyspace, ks_ybu. DB_NAME=ks_ybu.
YB_PATH=${1}
DB_NAME=${2}
cd $YB_PATH/bin

# shell variable substitution, DB_NAME=ks_ybu
./ycqlsh -r -e "
  create keyspace if not exists $DB_NAME;
  "

Confirm the keyspace creation.

In [None]:
%%bash -s "$MY_YB_PATH"  "$MY_DB_NAME" # Describe the keyspace, ks_ybu. DB_NAME=ks_ybu.
YB_PATH=${1}
DB_NAME=${2}
cd $YB_PATH/bin

# shell variable substitution, DB_NAME=ks_ybu
./ycqlsh -r -e "
  describe keyspace $DB_NAME;
  "

Drop the existing tables in the keyspace.


In [None]:
%%bash -s "$MY_YB_PATH"  "$MY_DB_NAME" # Drop the table
YB_PATH=${1}
DB_NAME=${2}
cd $YB_PATH/bin

#  DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  drop table if exists tbl_employees;
  "

### Run the DDL and DML scripts
Run the following cell to execute both the DDL and DML files.

In [None]:
%%bash -s "$MY_YB_PATH" "$MY_DB_NAME" "$MY_NOTEBOOK_DATA_FOLDER" "$MY_DATA_DDL_FILE"  "$MY_DATA_DML_FILE"  
# Wishlist
YB_PATH=${1}
DB_NAME=${2}
DATA_FOLDER=${3}
DATA_DDL_FILE=${4}
DATA_DML_FILE=${5}

WISHLIST_DDL_PATH=${DATA_FOLDER}/${DATA_DDL_FILE}
WISHLIST_DML_PATH=${DATA_FOLDER}/${DATA_DML_FILE}

cd $YB_PATH/bin

# DDL file
./ycqlsh -k ${DB_NAME} -f ${WISHLIST_DDL_PATH} 
sleep 1;

# DML file
./ycqlsh -k ${DB_NAME} -f ${WISHLIST_DML_PATH} 
sleep 1;

Describe the tables.

In [None]:
%%bash -s "$MY_YB_PATH"  "$MY_DB_NAME" # describe the tables from the DDL script
YB_PATH=${1}
DB_NAME=${2}
cd $YB_PATH/bin

# shell variable substitution, DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  describe tables;
  "

Validate that the data was loaded properly from the DML script.

In [None]:
%%bash -s "$MY_YB_PATH" "$MY_DB_NAME" "$MY_NOTEBOOK_DATA_FOLDER" "$MY_DATA_DML_FILE"   
# Wishlist
YB_PATH=${1}
DB_NAME=${2}
cd $YB_PATH/bin

# shell variable substitution, DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  select * from tbl_products_by_brand;
  "

In [None]:
%%bash -s "$MY_YB_PATH" "$MY_DB_NAME" "$MY_NOTEBOOK_DATA_FOLDER" "$MY_DATA_DML_FILE"   
# Wishlist
YB_PATH=${1}
DB_NAME=${2}
cd $YB_PATH/bin

# shell variable substitution, DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  select * from tbl_products_by_category;
  "

In [None]:
%%bash -s "$MY_YB_PATH" "$MY_DB_NAME" "$MY_NOTEBOOK_DATA_FOLDER" "$MY_DATA_DML_FILE"   
# Wishlist
YB_PATH=${1}
DB_NAME=${2}
cd $YB_PATH/bin

# shell variable substitution, DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  select * from tbl_products_by_wishlist;
  "

In [None]:
%%bash -s "$MY_YB_PATH" "$MY_DB_NAME" "$MY_NOTEBOOK_DATA_FOLDER" "$MY_DATA_DML_FILE"   
# Wishlist
YB_PATH=${1}
DB_NAME=${2}
cd $YB_PATH/bin

# shell variable substitution, DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  select * from tbl_wishlists_by_user;
  "

---
## Execution plan

In YCQL, an execution plan, known as a query plan, is similar in style to a YSQL query plan. A query plan consists of execution nodes. A plan node represents an action and may include one or more sub-actions. An action refers to a specific, internal operation. Nodes can be nested. Nested nodes are executed from the inside out. This means that the innermost node is executed before an outer node. This can be best thought of as a nested function call where the inner node returns its results to the outer node, often in a loop.

To see an example of a query plan, run the following cell:

In [None]:
%%bash -s "$MY_YB_PATH" "$MY_DB_NAME"   # # Query plan: Sequential scan  with filter
YB_PATH=${1}
DB_NAME=${2}  
cd $YB_PATH/bin

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  explain select sum(quantity) as total_items
  from tbl_products_by_wishlist
  where quantity > 1;
"    

In the above output, the query plan shows that this is an aggregate query. An aggregate query exhibits a nested node. The innermost node is the sequential scan operation. The sequential scan applies a filter sub-action for a conditional expression. The sequential scan operation returns the subtotal value to the outermost operation, the aggregation operation.
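
The inside-out execution described above can be sketched in plain Python. This is only a toy model, not how YugabyteDB executes a plan: the `rows` list is made-up sample data, and `seq_scan` and `aggregate_sum` are hypothetical helper names chosen for illustration.

```python
# Toy model of a nested query plan: Aggregate -> Seq Scan (with Filter).
# The rows below are made-up sample data, not the lab's data set.
rows = [
    {"product_id": 1, "quantity": 1},
    {"product_id": 2, "quantity": 3},
    {"product_id": 3, "quantity": 2},
]


def seq_scan(table, predicate):
    """Inner node: read every row and apply the filter sub-action."""
    for row in table:
        if predicate(row):
            yield row


def aggregate_sum(child, column):
    """Outer node: consume the rows returned by the inner node and sum one column."""
    return sum(row[column] for row in child)


# The inner node (scan + filter) feeds its results to the outer node (aggregate).
total_items = aggregate_sum(seq_scan(rows, lambda r: r["quantity"] > 1), "quantity")
print(total_items)
```

The scan still visits every row, which is why a filter on its own does not make a sequential scan cheaper.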


## Query plan: Sequential scan
A query that results in a sequential scan of a very large table can be very costly in YugabyteDB. For a sequential scan operation, each tablet must perform a seek operation. A seek operation requires CPU, memory, and disk operations. A seek operation consists of reading data from an SST file. The SST file contains the DocKey and document value. The SST file format consists of data and meta blocks. A sequential scan reads all the data blocks within an SST file. 

Coordinating the tablet operations and gathering tablet results require additional network, CPU, and memory consumption. The topology of a cluster can increase the network latency related to coordinating tablet operations and gathering tablet results. For these reasons, a sequential scan consumes numerous computing resources and often results in very poor query performance, especially for very large tables with billions of rows.

In [None]:
%%bash -s "$MY_YB_PATH" "$MY_DB_NAME"   # Query plan: Sequential scan  
YB_PATH=${1}
DB_NAME=${2}  
cd $YB_PATH/bin

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  explain select * from tbl_products_by_category;
"  

### Query plan: Primary key lookup
In the following example, the where expression contains three conditional expressions. Each conditional expression contains a primary key column and an equality operator. Because all three parts of the primary key are in the where expression and include equality operators, the query is a primary key lookup query. A primary key lookup is guaranteed to return zero or one row.
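
The rule in the preceding paragraph can be expressed as a small check. The sketch below is a simplified illustration, not the YCQL planner: `is_primary_key_lookup` is a hypothetical helper, and the primary key columns are taken from the description of `tbl_products_by_category` in this notebook.

```python
# Primary key columns of tbl_products_by_category, as described in this notebook.
PRIMARY_KEY = ("category", "product_name", "product_id")


def is_primary_key_lookup(conditions, primary_key=PRIMARY_KEY):
    """Return True when every primary key column has an equality condition.

    conditions is a list of (column, operator) pairs taken from the WHERE clause.
    """
    equality_columns = {column for column, operator in conditions if operator == "="}
    return all(column in equality_columns for column in primary_key)


full_key = [("category", "="), ("product_name", "="), ("product_id", "=")]
missing_column = [("category", "="), ("product_name", "=")]
range_condition = [("category", "="), ("product_name", "="), ("product_id", ">")]

print(is_primary_key_lookup(full_key))
print(is_primary_key_lookup(missing_column))
print(is_primary_key_lookup(range_condition))
```

Only the first set of conditions qualifies, because the other two either omit a key column or use a non-equality operator.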


In [None]:
%%bash -s "$MY_YB_PATH" "$MY_DB_NAME"   # Query plan: Primary key lookup
YB_PATH=${1}
DB_NAME=${2}  
cd $YB_PATH/bin

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  explain select category, product_name, product_id, brand, price, discount, description, gtin
  from tbl_products_by_category 
  where category = 'H20'
    and product_name = 'Talc 5'
    and product_id = 62362;
"    

### Primary key lookup: Tablet
A primary key lookup operation uses the DocKey to determine the location of the row. The partition key hash encoded value of the DocKey informs the operation. The query reads from the tablet that contains the partition key hash. The clustering keys order the data on disk for the given encoded hash. The primary key lookup performs a single seek in the SST file of the related tablet in RocksDB. This is known as a single-key query. The Key-Conditions and Filter sub-actions are semantic artifacts of the query planner. There are no related operations for these sub-actions. 
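
To illustrate how a partition key hash points to one tablet, the following sketch maps a 16-bit hash value to a tablet index. It assumes the hash space is divided into equal ranges, one per tablet; the actual tablet boundaries on a cluster may differ, and `tablet_index_for_hash` is a hypothetical helper, not a YugabyteDB API.

```python
HASH_SPACE = 0x10000  # YCQL partition hashes are 16-bit values: 0x0000 to 0xFFFF


def tablet_index_for_hash(partition_hash, num_tablets):
    """Return the index of the tablet whose hash range contains partition_hash.

    Assumption: the hash space is split into num_tablets equal ranges.
    """
    if not 0 <= partition_hash < HASH_SPACE:
        raise ValueError("partition hash must fit in 16 bits")
    return partition_hash * num_tablets // HASH_SPACE


# Example with a hash value of 0xA09C and an assumed count of three tablets.
print(tablet_index_for_hash(0xA09C, 3))
```

Because the hash alone identifies the tablet, the lookup does not need to contact the other tablets.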

#### Select a YB-TServer host
Set the host variable for one of the nodes. All three nodes in the cluster are running a Tablet Server (YB-TServer). You can comment/uncomment lines 7-9 as needed.


In [13]:
%%bash -s "$MY_HOST_IPv4_01" "$MY_HOST_IPv4_02" "$MY_HOST_IPv4_03" --out MY_HOST_IPv4
HOST_IPv4_01=$( echo "${1}" | tr -d " ")
HOST_IPv4_02=$( echo "${2}" | tr -d " ")
HOST_IPv4_03=$( echo "${3}" | tr -d " ")

# change the hosts for different tablet leaders
#MY_HOST_IPv4=$HOST_IPv4_01
MY_HOST_IPv4=$HOST_IPv4_02
#MY_HOST_IPv4=$HOST_IPv4_03

echo ${MY_HOST_IPv4}

Store the selected host variable.

In [None]:
%store MY_HOST_IPv4
print(MY_HOST_IPv4)

Save the table name as a variable.

In [None]:
MY_OBJECT_NAME="tbl_products_by_category"
%store MY_OBJECT_NAME
print(MY_OBJECT_NAME)

Get the table_id for the table using `curl` and `jq`.

> Note:
> If you are running locally, this cell requires `jq`. 
> To install for your local OS, try the following:
> - Ubuntu: 
>   - `sudo apt-get install jq`
> - OS X:
>   - `brew install jq`
> 

In [None]:
%%bash -s "$MY_OBJECT_NAME" "$MY_HOST_IPv4"  "$MY_DB_NAME"  "$MY_TSERVER_WEBSERVER_PORT"  --out MY_TABLE_ID
OBJECT_NAME=$( echo "${1}" | tr -d " ")
HOST_IPv4=$( echo "${2}" | tr -d " ")
DB_NAME=$( echo "${3}" | tr -d " ")
TSERVER_WEBSERVER_PORT=$( echo "${4}" | tr -d " ")


MY_URL="http://${HOST_IPv4}:${TSERVER_WEBSERVER_PORT}/metrics"

MY_TABLE_ID=`curl -s --compressed ${MY_URL} | jq -r 'limit(1;  .[] | select(.attributes.namespace_name=="'${DB_NAME}'" and .type=="tablet" and .attributes.table_name=="'${OBJECT_NAME}'") |  .attributes.table_id) '`

echo ${MY_TABLE_ID}

Store the table_id for the table.

In [None]:
%store MY_TABLE_ID
print(MY_TABLE_ID)

Get the tablet_id for the tablet leader on the selected node host.

In [None]:
%%bash -s "$MY_OBJECT_NAME" "$MY_HOST_IPv4" --out MY_TABLET_ID
OBJECT_NAME=$( echo "${1}" | tr -d " ")
HOST_IPv4=$( echo "${2}" | tr -d " ")

MY_URL="http://${HOST_IPv4}:8200/metrics"

TABLET_ID=`curl -s --compressed ${MY_URL} | jq --raw-output ' .[] | select(.attributes.namespace_name=="ks_ybu" and .type=="tablet" and .attributes.table_name=="'$OBJECT_NAME'") | {tablet_id: .id, metrics: .metrics[] | select(.name == ("is_raft_leader") ) | select(.value == 1) } | select(.tablet_id) | {tablet_id} | .tablet_id '`

echo ${TABLET_ID}

Store the tablet_id for the tablet leader.

In [None]:
%store MY_TABLET_ID
print(MY_TABLET_ID)

Flush the table's in-memory data to an SST file for the given table_id.

In [None]:
%%bash -s "$MY_YB_PATH" "$MY_HOST_IPv4" "$MY_TABLE_ID"  # Import file path of Yugabyte and DB name
YB_PATH=$( echo "${1}" | tr -d " ")
HOST_IPv4=$( echo "${2}" | tr -d " ")
TABLE_ID=$( echo "${3}" | tr -d " ")
cd $YB_PATH/bin

./yb-admin -init_master_addrs ${HOST_IPv4}:7100 flush_table_by_id ${TABLE_ID} 600

Dump and decode the SST file in human-readable form.

> Note:
>
> If the following cell does not dump the SST file, it is most likely that no rows were written to this tablet. To resolve this issue, select a different Tablet Server host. Return to [Select a YB-TServer host] and select a different node host.

In [18]:
%%bash -s "$MY_YB_PATH" "$MY_YB_PATH_DATA" "$MY_TABLE_ID" "$MY_TABLET_ID" # Import file path of Yugabyte and DB name
YB_PATH=$( echo "${1}" | tr -d " ")
YB_PATH_DATA=$( echo "${2}" | tr -d " ")
TABLE_ID=$( echo "${3}" | tr -d " ")
TABLET_ID=$( echo "${4}" | tr -d " ")

cd $YB_PATH/bin/

TABLE_ID_PATH=${YB_PATH_DATA}/node-1/disk-1/yb-data/tserver/data/rocksdb/table-${TABLE_ID}/tablet-${TABLET_ID}
#ls -l  ${TABLE_ID_PATH}

./sst_dump --command=scan --file=${TABLE_ID_PATH} --output_format=decoded_regulardb | grep 62362

SubDocKey(DocKey(0xa09c, ["H20"], ["Talc 5", 62362]), [SystemColumnId(0); HT{ physical: 1672260278581154 }]) -> null
SubDocKey(DocKey(0xa09c, ["H20"], ["Talc 5", 62362]), [ColumnId(3); HT{ physical: 1672260278581154 w: 1 }]) -> "Yeah"
SubDocKey(DocKey(0xa09c, ["H20"], ["Talc 5", 62362]), [ColumnId(4); HT{ physical: 1672260278581154 w: 2 }]) -> 9.99
SubDocKey(DocKey(0xa09c, ["H20"], ["Talc 5", 62362]), [ColumnId(5); HT{ physical: 1672260278581154 w: 3 }]) -> 7
SubDocKey(DocKey(0xa09c, ["H20"], ["Talc 5", 62362]), [ColumnId(6); HT{ physical: 1672260278581154 w: 4 }]) -> "1 liter"
SubDocKey(DocKey(0xa09c, ["H20"], ["Talc 5", 62362]), [ColumnId(7); HT{ physical: 1672260278581154 w: 5 }]) -> "006236226326"


> Question:
>
> The query predicate is:
> 
> ```where category = 'H20' and product_name = 'Talc 5' and product_id = 62362```
> 
> How can you find the partition key hash in the SST dump for the query predicate?
> 
> Hint:
>
> Modify the "Select a YB-TServer host" cell to use a different YB-TServer. Then, rerun the cells.
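
One way to inspect a decoded SST line programmatically is to pull the DocKey apart with a regular expression. The sketch below is an assumption-laden illustration: the pattern is written against the sample output shown above, `parse_doc_key` is a hypothetical helper, and other `sst_dump` output formats may not match.

```python
import re

# One line copied from the sst_dump output shown above.
SAMPLE = (
    'SubDocKey(DocKey(0xa09c, ["H20"], ["Talc 5", 62362]), '
    "[ColumnId(4); HT{ physical: 1672260278581154 w: 2 }]) -> 9.99"
)

# DocKey(<hash>, [<hash columns>], [<range columns>])
DOCKEY_PATTERN = re.compile(r"DocKey\((0x[0-9a-fA-F]+), \[(.*?)\], \[(.*?)\]\)")


def parse_doc_key(line):
    """Extract the hash and key components from a decoded SubDocKey line."""
    match = DOCKEY_PATTERN.search(line)
    if match is None:
        return None
    hash_hex, hash_columns, range_columns = match.groups()
    return {
        "partition_hash": int(hash_hex, 16),
        "hash_columns": hash_columns,
        "range_columns": range_columns,
    }


print(parse_doc_key(SAMPLE))
```

The first component of the DocKey is the encoded hash, followed by the partition key value and then the clustering key values.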


In [None]:
%%bash -s "$MY_YB_PATH"  "$MY_DB_NAME"  # Query plan: aggregate function
YB_PATH=${1}
DB_NAME=${2}  
cd $YB_PATH/bin

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  select * from tbl_products_by_wishlist;
  
"  

### Secondary Indexes 
In this section, you will create an index and determine how this affects the efficiency of a query by comparing the query plans.

Run the following cell to create an index, and then verify it by describing the `tbl_products_by_category` table.

In [None]:
%%bash -s "$MY_YB_PATH" 
YB_PATH=${1}
cd $YB_PATH

# Create a secondary index, product_name
./bin/ycqlsh --execute "
create index idx_products_by_name 
  on ks_ybu.tbl_products_by_category (product_name) 
  include (description);
"   

# Verify the index has been created for the tbl_products_by_category
./bin/ycqlsh --execute "
  DESC ks_ybu.tbl_products_by_category
"

In [None]:
%%bash -s "$MY_YB_PATH" 
YB_PATH=${1}
cd $YB_PATH


# Verify the index has been created for the tbl_products_by_category
./bin/ycqlsh --execute "
  describe ks_ybu.tbl_products_by_category
"

In [None]:
%%bash -s "$MY_YB_PATH" 
YB_PATH=${1}
cd $YB_PATH

# Show the partition key hash for each category in tbl_products_by_category
./bin/ycqlsh --execute "
  select partition_hash(category), category from ks_ybu.tbl_products_by_category;
"

In [None]:
%%bash -s "$MY_YB_PATH" 
YB_PATH=${1}
cd $YB_PATH

# Show the partition key hash by product name
./bin/ycqlsh --execute "
  select partition_hash(name), category from ks_ybu.tbl_products_by_name;
"

#### Verify the query plan for a secondary index
In the previous cell, a secondary index was created so that the product name can be efficiently queried on the tbl_products_by_category table. The DESC keyword is used to verify if the index was created correctly. 

> **Important:** The attributes in the primary key remain a part of the secondary index and can be retrieved from the index without an extra trip to the primary table, improving operational efficiency. 

The description attribute was added using the `INCLUDE` keyword to the secondary index to create a covering index.

Run the following cell to verify if adding a secondary index for the `product_name` can make the product_name query on the `tbl_products_by_category` more efficient.

In [None]:
%%bash -s "$MY_YB_PATH"   # Query plan: secondary index
YB_PATH=${1}
cd $YB_PATH

# Expose the query plan for a covering index
./bin/ycqlsh --execute "
  EXPLAIN SELECT product_name, category, price, description, product_id 
  FROM ks_ybu.tbl_products_by_category where product_name=?;
"  

#### Index Only Scan
After running the previous cell, you can see that adding a secondary index on the condition attribute, `product_name`, changes the query plan from a sequential scan to an index only scan. For large-scale workloads, this can improve the query's performance by orders of magnitude.

The `INCLUDE` clause plays an essential role in making queries more efficient and indexes more useful. In the index creation cell, the `description` column was added to the index by using the `INCLUDE` clause. The secondary index therefore contains the values of the `description` attribute, which creates a covering index and avoids a trip to the primary table (the equivalent of the heap in PostgreSQL).

For example, if you had not added the `description` attribute in the `INCLUDE` clause, that would have made the query plan an index scan. This means that in our example, although the index was used to locate product_name, the primary table was still accessed to retrieve the data for the `description` column.
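
The difference between an index only scan and an index scan comes down to whether the index holds every column the query needs. The following sketch models that check in a simplified way; it is not the YCQL planner, `plan_for_index` is a hypothetical helper, and the column names are the ones used in this notebook.

```python
def plan_for_index(selected_columns, index_columns, included_columns, primary_key):
    """Return the scan type based on whether the index covers the selected columns."""
    # The index stores its own columns, the INCLUDE columns, and the primary key columns.
    available = set(index_columns) | set(included_columns) | set(primary_key)
    missing = set(selected_columns) - available
    return "Index Only Scan" if not missing else "Index Scan"


INDEX_COLUMNS = ["product_name"]
INCLUDED_COLUMNS = ["description"]
PRIMARY_KEY = ["category", "product_name", "product_id"]

covered = plan_for_index(
    ["product_name", "category", "description", "product_id"],
    INDEX_COLUMNS, INCLUDED_COLUMNS, PRIMARY_KEY,
)
not_covered = plan_for_index(
    ["product_name", "brand"],
    INDEX_COLUMNS, INCLUDED_COLUMNS, PRIMARY_KEY,
)
print(covered)
print(not_covered)
```

In this model, selecting a column that is not stored in the index, such as `brand`, forces a read of the primary table.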

### Unique Indexes
You can disallow a column from having duplicate values by using a unique constraint as shown in the following cell.

In [None]:
%%bash -s "$MY_YB_PATH"  
YB_PATH=${1}
cd $YB_PATH

# Create a unique index on product_id
./bin/ycqlsh --execute "
  CREATE UNIQUE INDEX idx_unique_product_by_id 
  ON ks_ybu.tbl_products_by_category(product_id)
  INCLUDE(description);
"  
  
./bin/ycqlsh --execute "
  DESC ks_ybu.tbl_products_by_category
"  
# This statement will cause an error since duplicate values for the 
# product_id are not allowed, 87c7624a-4af5-4347-922d-ab43ab32476b.
./bin/ycqlsh --execute "
  INSERT INTO ks_ybu.tbl_products_by_category (
    product_name, 
    description, 
    price, 
    category,
    product_id 
  ) VALUES (
    'Guard dogs',
    'Doberman Pinchers',
    643.99,
    'Security',
    87c7624a-4af5-4347-922d-ab43ab32476b
  );
"

>**Important:** Notice the error in the result set: `Execution Error. Duplicate value disallowed by unique index idx_unique_product_by_id`. This verifies that the unique index that was created for the `product_id` is preventing a duplicate value for this attribute. Currently, the same `product_id` identifies a backpack and cannot be used again for a different product.
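
The behavior of a unique index can be pictured as a lookup table that rejects a second owner for the same value. The sketch below is a conceptual model only, not YugabyteDB's implementation; the `UniqueIndex` class and the sample product names are hypothetical.

```python
class UniqueIndex:
    """Toy unique index: maps an indexed value to the primary key of its row."""

    def __init__(self, name):
        self.name = name
        self._entries = {}

    def insert(self, value, primary_key):
        owner = self._entries.get(value)
        # Re-writing the same row is allowed; a different row with the same value is not.
        if owner is not None and owner != primary_key:
            raise ValueError(f"Duplicate value disallowed by unique index {self.name}")
        self._entries[value] = primary_key


index = UniqueIndex("idx_unique_product_by_id")
index.insert("87c7624a-4af5-4347-922d-ab43ab32476b", ("Luggage", "Backpack"))

try:
    index.insert("87c7624a-4af5-4347-922d-ab43ab32476b", ("Security", "Guard dogs"))
    error_message = None
except ValueError as error:
    error_message = str(error)

print(error_message)
```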

### Partial Indexes

Use a partial index to reduce the amount of data that requires scanning.
In this example, the index includes only products priced over $30 in the Office Supplies category.
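
Conceptually, a partial index stores entries only for rows that satisfy its predicate. The sketch below illustrates the idea with made-up rows; `index_predicate` and `partial_index` are hypothetical names, and the model ignores how YugabyteDB stores index entries.

```python
# Made-up sample rows, not the lab's data set.
products = [
    {"product_id": 1, "category": "Office Supplies", "price": 45.00},
    {"product_id": 2, "category": "Office Supplies", "price": 12.50},
    {"product_id": 3, "category": "Grocery", "price": 99.00},
]


def index_predicate(row):
    """Mirror of the index condition: price > 30 and category = 'Office Supplies'."""
    return row["price"] > 30 and row["category"] == "Office Supplies"


# Only rows that satisfy the predicate get an index entry, so the index stays small.
partial_index = sorted(
    (row["price"], row["product_id"]) for row in products if index_predicate(row)
)
print(partial_index)
```

Rows outside the predicate never appear in the index, so a query on those rows cannot be answered from it.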

In [None]:
%%bash -s "$MY_YB_PATH"  
YB_PATH=${1}
cd $YB_PATH

# Create a partial index
./bin/ycqlsh --execute "
  CREATE INDEX ON ks_ybu.tbl_products_by_category(price) 
  INCLUDE (description, product_name) 
  WHERE price > 30 and category = 'Office Supplies';
"

./bin/ycqlsh --execute "
  DESC ks_ybu.tbl_products_by_category
"  

### Specific Indexes
An index can be created that specifies a particular value of an attribute. For instance, suppose that a popular item in our ecommerce site is searched for constantly. It may be beneficial to create an index that satisfies this query quickly and efficiently. In the following cell, there is an example of an index for a popular product, batteries.

In [None]:
%%bash -s "$MY_YB_PATH"  
YB_PATH=${1}
cd $YB_PATH

# Specific Index
./bin/ycqlsh --execute "
  CREATE INDEX idx_batteries_desc 
  ON ks_ybu.tbl_products_by_category (description) 
  INCLUDE (product_name) 
  WHERE description='Batteries';
"
#  Index Only Scan
./bin/ycqlsh --execute "
  EXPLAIN SELECT 
    description, 
    product_name, 
    product_id, 
    category, 
    price 
  FROM ks_ybu.tbl_products_by_category 
  WHERE description='Batteries';
"  

In [None]:
%%bash -s "$MY_YB_PATH"  
YB_PATH=${1}
cd $YB_PATH

#  Seq Scan
./bin/ycqlsh --execute "
  EXPLAIN SELECT description, product_name, product_id, category, price FROM ks_ybu.tbl_products_by_category 
  WHERE description='Hotdogs';
"  

>**Important:** In the previous cells, an index only scan occurs only if the WHERE clause predicate matches the expression stated in the index creation statement.

Run the following statement to verify that a query to a different expression will not use the index to satisfy the query.

In [None]:
%%bash -s "$MY_YB_PATH"  
YB_PATH=${1}
cd $YB_PATH

#  Seq Scan
./bin/ycqlsh --execute "
  EXPLAIN SELECT description, product_name, product_id, category, price FROM ks_ybu.tbl_products_by_category 
  WHERE description=?;
"  

#### Collections
More complex data types, such as collections, give the YCQL data model more flexibility. This is especially important when you are trying to reduce the number of tables that a query must read. Run the following cell to add a new column to the `tbl_products_by_category` table whose data type is an ordered list of text elements.

In [None]:
%%bash -s "$MY_YB_PATH"  
YB_PATH=${1}
cd $YB_PATH

# Collection: Create a column with a list containing text elements
./bin/ycqlsh --execute "
  ALTER TABLE ks_ybu.tbl_products_by_category 
  ADD warehouse_ids LIST<TEXT>;
"

./bin/ycqlsh --execute "
  DESC ks_ybu.tbl_products_by_category
"

#### Collection: SET
Run the following cell to create a new column on the category table. This new column will be a SET with text elements. 

In [None]:
%%bash -s "$MY_YB_PATH"  
YB_PATH=${1}
cd $YB_PATH

# Collection: Create a column for a set containing text elements
./bin/ycqlsh --execute "
  ALTER TABLE ks_ybu.tbl_products_by_category 
  ADD tags SET<TEXT>;
"

./bin/ycqlsh --execute "
  DESC ks_ybu.tbl_products_by_category
"

#### Frozen Collections
Run the following cell to create a frozen collection. In this example, a list is frozen. While the value in this column can be replaced or dropped as a whole, its individual elements cannot be modified.

In [None]:
%%bash -s "$MY_YB_PATH"  
YB_PATH=${1}
cd $YB_PATH

# Frozen Collection
./bin/ycqlsh --execute "
  ALTER TABLE ks_ybu.tbl_products_by_category 
  ADD store_locations FROZEN<LIST<TEXT>>;
"

./bin/ycqlsh --execute "
  DESC ks_ybu.tbl_products_by_category
"

#### JSON

JSONB is considered the best way to store complex data structures in YCQL because JSONB attributes are searchable. This is not true of the other collection types in YCQL. Also note that a JSON document can itself contain collections. Run the following cell and note the pattern required to write a JSON object to a table in YCQL.

In [None]:
%%bash -s "$MY_YB_PATH"  
YB_PATH=${1}
cd $YB_PATH

# Insert a row with a JSONB value
./bin/ycqlsh --execute "
  INSERT INTO ks_ybu.tbl_products_by_category (
    category, 
    price, 
    product_id, 
    sku_details
  ) VALUES (
    'Grocery',
    9.99,
    6eb8d774-8b03-4457-a8e9-710339ca7165,
    '{
      \"product_id\": \"6eb8d774-8b03-4457-a8e9-710339ca7165\",
      \"warehouse_sku\": \"8jk39d03-8b03-4457-a8e9-710339ca7165\"
    }'
  );
"

./bin/ycqlsh --execute "
  SELECT * FROM ks_ybu.tbl_products_by_category;
"

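In the preceding cell, note the syntax required to insert a JSON object into the table. The JSON object is wrapped in single quotes, and the keys and string values inside the curly brackets use double quotes. Because the statement itself sits inside a double-quoted bash string, the inner double quotes are escaped with a backslash.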
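
When the JSON value comes from application code, building the literal by hand is error-prone. The following sketch shows one possible way to produce the single-quoted literal in Python with the standard `json` module; `to_cql_string_literal` is a hypothetical helper, and in a real application a driver with bound parameters would usually be preferable to string building.

```python
import json

# Same keys and values as the INSERT statement in the preceding cell.
sku_details = {
    "product_id": "6eb8d774-8b03-4457-a8e9-710339ca7165",
    "warehouse_sku": "8jk39d03-8b03-4457-a8e9-710339ca7165",
}


def to_cql_string_literal(value):
    """Serialize value to JSON and wrap it in single quotes.

    An embedded single quote is escaped by doubling it, as CQL string literals require.
    """
    return "'" + json.dumps(value).replace("'", "''") + "'"


literal = to_cql_string_literal(sku_details)
print(literal)
```

The double quotes come from the JSON serializer, and the single quotes mark the value as a CQL string.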

#### JSONB Index

Because JSONB attributes are searchable in YCQL, you can create a secondary index on an attribute of a JSONB column.

In [None]:
%%bash -s "$MY_YB_PATH"  
YB_PATH=${1}
cd $YB_PATH

# Create an index on a JSONB attribute and view the query plan
./bin/ycqlsh --execute "
  CREATE INDEX idx_sku_details_name 
  ON ks_ybu.tbl_products_by_category((sku_details->>'Name'));
"

./bin/ycqlsh --execute "
  DESC  ks_ybu.tbl_products_by_category
"

./bin/ycqlsh --execute "
  EXPLAIN SELECT category, price, product_id FROM ks_ybu.tbl_products_by_category 
  WHERE (sku_details->>'Name')=?;
"

In the preceding cell, note the syntax that is required to create an index using a JSON key. Also note the syntax used to search by the key in a JSON object.

### Time to Live

YCQL supports data expiration with a time to live (TTL). In the context of data modeling, removing old or deprecated data can reduce operational costs as well as storage costs.

In [None]:
%%bash -s "$MY_YB_PATH"  
YB_PATH=${1}
cd $YB_PATH

# Create a table
./bin/ycqlsh --execute "
  CREATE TABLE ks_ybu.tbl_todolists_by_user (
    user_id BIGINT, 
    todolist_name TEXT, 
    todolist_id UUID, 
    is_public BOOLEAN, 
    PRIMARY KEY((user_id), todolist_name)
  ) 
  WITH CLUSTERING ORDER BY (todolist_name DESC);
"

In the following cell, a row is added to the table with a time-to-live of 5 seconds. Once the time has expired, notice that the row is no longer in the table. This is a TTL scoped to the data row.

In [None]:
%%bash -s "$MY_YB_PATH"  
YB_PATH=${1}
cd $YB_PATH

./bin/ycqlsh --execute "
  INSERT INTO ks_ybu.tbl_wishlists_by_user(
      user_id,
      name,  
      wishlist_id,
      is_public
    ) VALUES (
      'Mark', 
      'Grocery', 
      2a70494e-6b68-4739-b3e0-ff06aa0a2d67, 
      true
    ) 
    USING TTL 5;
  "  

In [None]:
%%bash -s "$MY_YB_PATH"  
YB_PATH=${1}
cd $YB_PATH

./bin/ycqlsh --execute "
  INSERT INTO ks_ybu.tbl_wishlists_by_user(
      user_id, 
      name, 
      wishlist_id,
      is_public
    ) VALUES (
      'Mark', 
      'Grocery', 
      2a70494e-6b68-4739-b3e0-ff06aa0a2d67, 
      true
    ) 
    USING TTL 5;
  "  

  # Read from the same table that received the row with the TTL
  ./bin/ycqlsh --execute "
    SELECT * FROM ks_ybu.tbl_wishlists_by_user;
  "  

Wait five seconds before running the following cell. This verifies that the row-level data expiration worked as expected.

In [None]:
%%bash -s "$MY_YB_PATH"  
YB_PATH=${1}
cd $YB_PATH
   
# Read from the same table that received the row with the TTL
./bin/ycqlsh --execute "
  SELECT * FROM ks_ybu.tbl_wishlists_by_user;
" 

If the row is still visible, wait a few more seconds and rerun the preceding cell to verify that the row has expired as expected.
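
The row-level expiration seen above can be modeled with an expiry timestamp stored next to each row. This sketch is a conceptual illustration only: `TtlStore` is a hypothetical class, time is passed in explicitly as a number of seconds, and it does not model how YugabyteDB physically removes expired data.

```python
class TtlStore:
    """Toy key-value store in which each row may carry an expiry time."""

    def __init__(self):
        self._rows = {}  # key -> (value, expires_at or None)

    def insert(self, key, value, now, ttl_seconds=None):
        expires_at = None if ttl_seconds is None else now + ttl_seconds
        self._rows[key] = (value, expires_at)

    def select_all(self, now):
        # A row is visible only while the current time is before its expiry time.
        return {
            key: value
            for key, (value, expires_at) in self._rows.items()
            if expires_at is None or now < expires_at
        }


store = TtlStore()
store.insert(("Mark", "Grocery"), {"is_public": True}, now=0, ttl_seconds=5)

print(store.select_all(now=4))
print(store.select_all(now=5))
```

The row is returned at second 4 and is no longer returned once five seconds have passed.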

---
# All done!
In this lab, you completed the following:

- Setup
  - Created the `ks_ybu` keyspace with `ycqlsh`
  - Created tables and loaded data using DDL and DML scripts