<div style="width:100%; background-color: #121017;"><a target="_blank" href="http://university.yugabyte.com?utm_source=gitpod&utm_medium=notebook"><img src="assets/YBU_Logo.png" /></a></div><br>

> **YugabyteDB YCQL Development**
>
> Enroll for free at [Yugabyte University](https://university.yugabyte.com/courses/yugabytedb-ycql-development?utm_source=gitpod&utm_medium=notebook).
>
<br>
This notebook file is:

`03_QDDM_query_plans.ipynb`

# Query plans
In this notebook, you will learn about an essential aspect of creating a query-driven data model in YCQL: query plans.

## 🛠️ Requirements
Here are the requirements for this notebook:
- ✅ Create the notebook variables in `01_Introduction.ipynb`, which you previously did
- ✅ Create the `ks_ybu` keyspace in `02_Language_fundamentals.ipynb`, which you previously did
- ☑️ Select the **Python 3.11.8** kernel for the notebook, *which you need to do right now!*
- ☑️ Import the notebook variables, *which you must do next*
- ☑️ Confirm the existence of the `ks_ybu` keyspace, *which you must do next*
- ☑️ Run the DDL and DML scripts, *which you must do next*


### Select your notebook kernel
- In the Notebook toolbar, click **Select Kernel**.
<br>
<img width=50% src="assets/01_01_Select_Kernel_Toolbar.png" />

- Next, in the dropdown, select **Python 3.11.8**.
<br>
<img width=50% src="assets/01_02_Select_Kernel_Dropdown.png" />

> 👉 **IMPORTANT!** 👈
> 
> You must select **Python 3.11.8**.
> 
>  Do **NOT** select _Python 3.12_ or higher!!! 
>


That's it!

## ⛑️ Getting help
The best way to get help from the Yugabyte University team is to post your question on YugabyteDB Community Slack in the #training channels. To sign up, visit [YugabyteDB Community Slack](https://join.slack.com/t/yugabyte-db/shared_invite/zt-xbd652e9-3tN0N7UG0eLpsace4t1d2A?utm_source=gitpod&utm_medium=notebook).

## 👣 Setup steps
Here are the steps to set up this lab:
- Create the notebook styles
- Import the notebook variables
- Confirm the existence of the  `ks_ybu` keyspace

### 👇 Create the notebook styles

In [None]:
from IPython.core.display import HTML
def css_styling():
    styles = open("./styles/custom.css", "r").read()
    return HTML(styles)
css_styling()

### 👇 Import the notebook variables

> 👉 **IMPORTANT!** 👈
> 
> Do **NOT** skip running this cell. 
> 

The following Python cell reads the variables that you stored in `01_Introduction.ipynb`. All the notebooks in this lab use these variables. You can view them in the Jupyter tab.

- To run the script, select Execute Cell (Play Arrow) in the left gutter of the cell.
- Verify the accuracy of the output values

👇 👇 👇 

In [None]:
# Use %store -r to read 01_Lab_Setup variables
%store -r

### Confirm the existence of the  `ks_ybu` keyspace
You created this in the `02_Language_fundamentals.ipynb` notebook. Confirm that it exists.

In [None]:
%%bash -s "$NB_YB_PATH_BIN"  "$NB_DB_NAME"  # describe the keyspace
YB_PATH=${1}
DB_NAME=${2}

cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -e "
  describe keyspace $DB_NAME;
"

> 🤔 Question:
>  
> Does the keyspace exist?
> 
> If not, go back to the `02_Language_fundamentals.ipynb` notebook and create the `ks_ybu` keyspace!
>

Drop the existing `tbl_employees` table in the keyspace, if it exists.


In [None]:
%%bash -s "$NB_YB_PATH_BIN"  "$NB_DB_NAME" # Drop the table
YB_PATH=${1}
DB_NAME=${2}

cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  drop table if exists tbl_employees;
"

> About `ycqlsh` flags
>
> `-e` or `--execute` allows you to execute a given statement and then exit. This is useful for running YCQL commands non-interactively from scripts or the command line.
>
> `-k` or `--keyspace` sets the keyspace to use for the session, so statements can reference tables without a keyspace prefix. When authentication is required, combine it with the `-u` (`--username`) flag.
>
>
> `-r` or `--refresh_on_describe` allows you to force a refresh of the schema metadata when using the DESCRIBE command in ycqlsh. This ensures you get the most up-to-date schema information when describing keyspaces, tables, or other database objects.
>


### Run the DDL and DML scripts
The DDL script creates tables for the query-driven data model (QDDM). The DML script populates the tables.

Run the following cells to create the tables and load the data.

In [None]:
%%bash -s "$NB_YB_PATH_BIN" "$NB_DB_NAME" "$NB_NOTEBOOK_DATA_FOLDER" "$NB_DATA_DDL_FILE"  "$NB_DATA_DML_FILE"  
# data directory
YB_PATH=${1}
DB_NAME=${2}
DATA_FOLDER=${3}
DATA_DDL_FILE=${4}
DATA_DML_FILE=${5}

WISHLIST_DDL_PATH=${DATA_FOLDER}/${DATA_DDL_FILE}
WISHLIST_DML_PATH=${DATA_FOLDER}/${DATA_DML_FILE}

cd $YB_PATH

# DDL file
./ycqlsh -k ${DB_NAME} -f ${WISHLIST_DDL_PATH} 

sleep 1;

# DML file
./ycqlsh -k ${DB_NAME} -f ${WISHLIST_DML_PATH} 

sleep 1;

Show the tables for the keyspace.

In [None]:
%%bash -s "$NB_YB_PATH_BIN"  "$NB_DB_NAME" # describe the tables from the DDL script
YB_PATH=${1}
DB_NAME=${2}
cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  describe tables;
"

Validate that the data loaded properly from the DML script by running the following cells. The expected row count appears as a comment in each cell.

In [None]:
%%bash -s "$NB_YB_PATH_BIN" "$NB_DB_NAME" "$NB_NOTEBOOK_DATA_FOLDER" "$NB_DATA_DML_FILE"   
# query
YB_PATH=${1}
DB_NAME=${2}

cd $YB_PATH

# DB_NAME=ks_ybu , count = 15
./ycqlsh -r -k $DB_NAME -e "
  select count(*) from tbl_products_by_brand;
"

In [None]:
%%bash -s "$NB_YB_PATH_BIN" "$NB_DB_NAME" "$NB_NOTEBOOK_DATA_FOLDER" "$NB_DATA_DML_FILE"   
# query
YB_PATH=${1}
DB_NAME=${2}

cd $YB_PATH

# DB_NAME=ks_ybu , count = 15
./ycqlsh -r -k $DB_NAME -e "
  select count(*) from tbl_products_by_category;
"

In [None]:
%%bash -s "$NB_YB_PATH_BIN" "$NB_DB_NAME" "$NB_NOTEBOOK_DATA_FOLDER" "$NB_DATA_DML_FILE"   

YB_PATH=${1}
DB_NAME=${2}
cd $YB_PATH

# DB_NAME=ks_ybu , count = 9
./ycqlsh -r -k $DB_NAME -e "
  select count(*) from tbl_products_by_wishlist;
"

In [None]:
%%bash -s "$NB_YB_PATH_BIN" "$NB_DB_NAME" "$NB_NOTEBOOK_DATA_FOLDER" "$NB_DATA_DML_FILE"   

YB_PATH=${1}
DB_NAME=${2}
cd $YB_PATH

# DB_NAME=ks_ybu , count = 7
./ycqlsh -r -k $DB_NAME -e "
  select count(*) from tbl_wishlists_by_user;
"

---
## Execution plans

In YCQL, an execution plan, also known as a query plan, is similar in style to a YSQL query plan. A query plan consists of execution nodes. Each node represents an action, which is a specific internal operation, and may include one or more sub-actions. Nodes can be nested, and nested nodes are executed from the inside out: the innermost node is executed before an outer node. Think of this as a nested function call in which the inner node returns its results to the outer node, often in a loop.

To see an example of a query plan, run the following cell:

In [None]:
%%bash -s "$NB_YB_PATH_BIN" "$NB_DB_NAME"   # Query plan: Sequential scan with filter
YB_PATH=${1}
DB_NAME=${2}  

cd $YB_PATH

./ycqlsh -r -k $DB_NAME -e "
  select sum(quantity) as total_items
  from tbl_products_by_wishlist
  where quantity > 1;
"

./ycqlsh -r -k $DB_NAME -e "
  explain select sum(quantity) as total_items
  from tbl_products_by_wishlist
  where quantity > 1;
"    

In the above output, the query plan shows that this is an aggregate query, which illustrates a nested node. The innermost node is the sequential scan operation. The sequential scan applies a filter sub-action for the conditional expression and returns the subtotal value to the outermost operation, the aggregation operation.
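
The inside-out evaluation described above can be sketched in Python. This is only an illustration of the nesting idea, not how YugabyteDB implements plan nodes: the class names `SeqScan` and `Aggregate` and the sample rows are hypothetical, and the inner node here hands filtered rows to the outer node rather than a precomputed subtotal.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass
class SeqScan:
    """Inner node: reads every row and applies a filter sub-action."""
    rows: Iterable[dict]
    row_filter: Callable[[dict], bool]

    def execute(self) -> Iterator[dict]:
        for row in self.rows:
            if self.row_filter(row):
                yield row


@dataclass
class Aggregate:
    """Outer node: consumes whatever its child node produces."""
    child: SeqScan
    column: str

    def execute(self) -> int:
        # The child runs first; its rows feed the sum in a loop.
        return sum(row[self.column] for row in self.child.execute())


# Hypothetical rows, not the lab data set.
rows = [{"quantity": 1}, {"quantity": 2}, {"quantity": 3}]

plan = Aggregate(
    child=SeqScan(rows, row_filter=lambda row: row["quantity"] > 1),
    column="quantity",
)
total_items = plan.execute()
print(total_items)
```

Calling `plan.execute()` on the outer node drives the inner node, which mirrors how the nested plan is read from the innermost node outward.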


### Query plan: Sequential scan
A query that results in a sequential scan of a very large table can be very costly in YugabyteDB. For a sequential scan operation, each tablet must perform a seek operation. A seek operation requires CPU, memory, and disk operations. A seek operation consists of reading data from an SST file. The SST file contains the DocKey and document value. The SST file format consists of data and meta blocks. A sequential scan reads all the data blocks within an SST file.

Coordinating the tablet operations and gathering tablet results require additional network, CPU, and memory consumption. The topology of a cluster can increase the network latency related to coordinating tablet operations and gathering tablet results. For these reasons, a sequential scan consumes numerous computing resources and often results in very poor query performance, especially for very large tables with billions of rows.
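
To make the fan-out cost more concrete, here is a toy model in Python. It assumes a hypothetical layout of three tablets held in a dictionary and simply counts the requests and rows involved; it does not model SST blocks, network latency, or anything else specific to YugabyteDB.

```python
# Hypothetical cluster layout: tablet name -> rows stored on that tablet.
tablets = {
    "tablet-1": [{"category": "A10"}, {"category": "A10"}],
    "tablet-2": [{"category": "H20"}, {"category": "H20"}, {"category": "K30"}],
    "tablet-3": [{"category": "Z90"}],
}


def sequential_scan(tablets, predicate):
    """Fan out to every tablet, read every row, then gather the matches."""
    requests, rows_read, matches = 0, 0, []
    for rows in tablets.values():
        requests += 1              # one request per tablet
        for row in rows:
            rows_read += 1         # every row is read, matching or not
            if predicate(row):
                matches.append(row)
    return requests, rows_read, matches


requests, rows_read, matches = sequential_scan(
    tablets, lambda row: row["category"] == "H20"
)
print(requests, rows_read, len(matches))
```

In this sketch the work grows with the total number of tablets and rows, even though only two rows match the predicate.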

In [None]:
%%bash -s "$NB_YB_PATH_BIN" "$NB_DB_NAME"   # Query plan: Sequential scan  
YB_PATH=${1}
DB_NAME=${2}  

cd $YB_PATH

./ycqlsh -r -k $DB_NAME -e "
  select * from tbl_products_by_category;
"  

./ycqlsh -r -k $DB_NAME -e "
  explain select * from tbl_products_by_category;
"  

### Query plan: Range scan
A range scan operation often returns multiple rows; it is not guaranteed to return at most one row. In the following query, the equality operator on `category` indicates that the operation is a single-partition query: all products that belong to a specific category reside on a single tablet.

A seek operation consists of reading data from the SST files of a tablet, and each seek operation requires CPU, memory, and disk usage. The number of seeks for a range scan is often far lower than that of a sequential scan. In this case, the number of seeks may vary with the number of products related to the category. In most cases, a range scan query is efficient and scalable.

However, for a popular product category with millions of related products, this query may be problematic. One tablet may be accessed significantly more often than the other tablets. This is known as a hot tablet or hot shard query. Depending on query frequency and the volume of data, a hot tablet query can quickly turn into a hot node problem. A hot node is a node that consistently utilizes a high percentage of computing resources.
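
The hot tablet effect can be illustrated with a small Python sketch. The request log and the mapping from category to tablet are hypothetical; in a real cluster the placement follows from the partition key hash, and you would observe load through metrics rather than a list.

```python
from collections import Counter

# Hypothetical request log: the category each single-partition query asked for.
requested_categories = ["H20"] * 8 + ["A10", "Z90"]

# Hypothetical placement of each category's partition on a tablet.
category_to_tablet = {"A10": "tablet-1", "H20": "tablet-2", "Z90": "tablet-3"}

# Count how many queries each tablet has to serve.
load = Counter(category_to_tablet[category] for category in requested_categories)

hottest_tablet, hits = load.most_common(1)[0]
share = hits / len(requested_categories)
print(hottest_tablet, f"{share:.0%}")
```

Each query in this sketch is individually cheap, yet one tablet serves most of the traffic, which is the pattern the term hot tablet describes.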


In [None]:
%%bash -s "$NB_YB_PATH_BIN"  "$NB_DB_NAME"  # Query plan: Range scan
YB_PATH=${1}
DB_NAME=${2}  

cd $YB_PATH

./ycqlsh -r -k $DB_NAME -e "
  select category, product_name, product_id, brand, price, discount, description, gtin
  from tbl_products_by_category 
  where category = 'H20';
" 

./ycqlsh -r -k $DB_NAME -e "
  explain select category, product_name, product_id, brand, price, discount, description, gtin
  from tbl_products_by_category 
  where category = 'H20';
"  

### Query plan: Primary key lookup
In the following example, the where expression contains three conditional expressions. Each conditional expression contains a primary key column and an equality operator. Because all three parts of the primary key are in the where expression and include equality operators, the query is a primary key lookup query. A primary key lookup is guaranteed to return zero or one row.
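
The difference between the three access paths in this notebook can be summarized with a small Python classifier. This is a simplified sketch: it considers only equality predicates, ignores secondary indexes, and its labels are descriptive names rather than the exact text of `EXPLAIN` output. The key layout is an assumption, because the DDL is not shown in this notebook: `category` as the partition key, with `product_name` and `product_id` as clustering keys.

```python
def classify_access_path(partition_keys, clustering_keys, equality_columns):
    """Return a rough label for how a query would locate its rows."""
    constrained = set(equality_columns)
    if not set(partition_keys) <= constrained:
        return "sequential scan"      # partition key not fully specified
    if set(clustering_keys) <= constrained:
        return "primary key lookup"   # every primary key column specified
    return "range scan"               # one partition, possibly many rows


# Assumed key layout for tbl_products_by_category (not confirmed by this notebook).
partition_keys = ["category"]
clustering_keys = ["product_name", "product_id"]

for columns in ([], ["category"], ["category", "product_name", "product_id"]):
    print(columns, "->", classify_access_path(partition_keys, clustering_keys, columns))
```

Under these assumptions, only the query that constrains all three columns with equality operators is classified as a primary key lookup.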


In [None]:
%%bash -s "$NB_YB_PATH_BIN" "$NB_DB_NAME"   # Query plan: PK lookup
YB_PATH=${1}

DB_NAME=${2}  
cd $YB_PATH

./ycqlsh -r -k $DB_NAME -e "
  select category, product_name, product_id, brand, price, discount, description, gtin
  from tbl_products_by_category 
  where category = 'H20'
    and product_name = 'Talc 5'
    and product_id = 62362;
"  

./ycqlsh -r -k $DB_NAME -e "
  explain select category, product_name, product_id, brand, price, discount, description, gtin
  from tbl_products_by_category 
  where category = 'H20'
    and product_name = 'Talc 5'
    and product_id = 62362;
"    

### Primary key lookup: Tablet
A primary key lookup operation uses the DocKey to determine the location of the row. The DocKey includes the encoded hash of the partition key, and the query reads from the tablet that contains that hash value. The clustering keys order the data on disk for the given encoded hash. The primary key lookup performs a single seek in the SST file of the related tablet in RocksDB. This is known as a single-key query. The Key-Conditions and Filter sub-actions are semantic artifacts of the query planner. There are no related operations for these sub-actions.
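
The routing from a partition key hash to a tablet can be sketched in Python. The sketch assumes a 16-bit hash space (`0x0000` through `0xFFFF`) divided into contiguous, roughly equal ranges, one per tablet. The function `stand_in_hash` uses CRC32 purely as a placeholder and is **not** the hash function that YugabyteDB uses, so the values it produces will not match the hashes in an SST dump.

```python
import zlib

HASH_SPACE = 0x10000  # 16-bit hash space: 0x0000 through 0xFFFF


def tablet_ranges(num_tablets):
    """Split the hash space into contiguous [start, end) ranges, one per tablet."""
    bounds = [i * HASH_SPACE // num_tablets for i in range(num_tablets + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def stand_in_hash(partition_key):
    """Placeholder 16-bit hash. NOT the hash function YugabyteDB uses."""
    return zlib.crc32(partition_key.encode("utf-8")) & 0xFFFF


def tablet_for(hash_code, ranges):
    """Return the index of the tablet whose range contains the hash code."""
    for index, (start, end) in enumerate(ranges):
        if start <= hash_code < end:
            return index
    raise ValueError("hash code outside the hash space")


ranges = tablet_ranges(3)
code = stand_in_hash("H20")
print(f"0x{code:04X} -> tablet index {tablet_for(code, ranges)}")
```

Because every row with the same partition key produces the same hash, all of those rows map to the same range and therefore to the same tablet.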

#### Select a YB-TServer host
<a id="select-a-yb-tserver-host-1"> </a>
Set the host variable for one of the nodes. All three nodes in the cluster are running a Tablet Server (YB-TServer). You can comment/uncomment lines 7-9 as needed.

In [None]:
%%bash -s "$NB_HOST_IPv4_01" "$NB_HOST_IPv4_02" "$NB_HOST_IPv4_03" --out NB_HOST_IPv4
HOST_IPv4_01=$( echo "${1}" | tr -d " ")
HOST_IPv4_02=$( echo "${2}" | tr -d " ")
HOST_IPv4_03=$( echo "${3}" | tr -d " ")

# change the hosts for different tablet leaders by commenting out a line and removing a comment for a line
NB_HOST_IPv4=$HOST_IPv4_01
#NB_HOST_IPv4=$HOST_IPv4_02
#NB_HOST_IPv4=$HOST_IPv4_03

echo ${NB_HOST_IPv4}

Store the selected host variable.

In [None]:
%store NB_HOST_IPv4
print(NB_HOST_IPv4)

Save the **OBJECT_NAME** as a variable.

In [None]:
NB_OBJECT_NAME="tbl_products_by_category"
%store NB_OBJECT_NAME
print(NB_OBJECT_NAME)

Get the **TABLE_ID** for the table using `curl` and `jq`.

In [None]:
%%bash -s "$NB_OBJECT_NAME" "$NB_HOST_IPv4"  "$NB_DB_NAME"  "$NB_TSERVER_WEBSERVER_PORT"  --out NB_TABLE_ID
OBJECT_NAME=$( echo "${1}" | tr -d " ")
HOST_IPv4=$( echo "${2}" | tr -d " ")
DB_NAME=$( echo "${3}" | tr -d " ")
TSERVER_WEBSERVER_PORT=$( echo "${4}" | tr -d " ")


MY_URL="http://${HOST_IPv4}:${TSERVER_WEBSERVER_PORT}/metrics"

TABLE_ID=`curl -s --compressed ${MY_URL} | jq -r 'limit(1;  .[] | select(.attributes.namespace_name=="'${DB_NAME}'" and .type=="tablet" and .attributes.table_name=="'${OBJECT_NAME}'") |  .attributes.table_id) '`

echo ${TABLE_ID}

Store the **TABLE_ID** for the table.

In [None]:
%store NB_TABLE_ID
print(NB_TABLE_ID)

Get the **TABLET_ID** for the tablet leader on the selected node host.

In [None]:
%%bash -s "$NB_OBJECT_NAME" "$NB_HOST_IPv4" --out NB_TABLET_ID
OBJECT_NAME=$( echo "${1}" | tr -d " ")
HOST_IPv4=$( echo "${2}" | tr -d " ")

MY_URL="http://${HOST_IPv4}:8200/metrics"

TABLET_ID=`curl -s --compressed ${MY_URL} | jq --raw-output ' .[] | select(.attributes.namespace_name=="ks_ybu" and .type=="tablet" and .attributes.table_name=="'$OBJECT_NAME'") | {tablet_id: .id, metrics: .metrics[] | select(.name == ("is_raft_leader") ) | select(.value == 1) } | select(.tablet_id) | {tablet_id} | .tablet_id '`

echo ${TABLET_ID}

Store the **TABLET_ID** for the tablet leader.

In [None]:
%store NB_TABLET_ID
print(NB_TABLET_ID)
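
If the `jq` filters in the cells above are hard to follow, the same selection logic can be written in Python. The payload below is a hypothetical, heavily trimmed sample that only mimics the fields the `jq` filters read (`type`, `id`, `attributes`, and `metrics`); the identifiers in it are made up, and the function names are illustrative.

```python
import json

# Hypothetical, trimmed sample shaped like the fields the jq filters read.
sample = json.loads("""
[
  {"type": "server", "id": "yb.tabletserver", "attributes": {}, "metrics": []},
  {"type": "tablet", "id": "tablet-aaa",
   "attributes": {"namespace_name": "ks_ybu",
                  "table_name": "tbl_products_by_category",
                  "table_id": "table-123"},
   "metrics": [{"name": "is_raft_leader", "value": 0}]},
  {"type": "tablet", "id": "tablet-bbb",
   "attributes": {"namespace_name": "ks_ybu",
                  "table_name": "tbl_products_by_category",
                  "table_id": "table-123"},
   "metrics": [{"name": "is_raft_leader", "value": 1}]}
]
""")


def tablets_for(entries, keyspace, table):
    """Keep only tablet entries that belong to the given keyspace and table."""
    return [
        entry for entry in entries
        if entry.get("type") == "tablet"
        and entry.get("attributes", {}).get("namespace_name") == keyspace
        and entry.get("attributes", {}).get("table_name") == table
    ]


def find_table_id(entries, keyspace, table):
    """Return the table_id of the first matching tablet, like jq's limit(1; ...)."""
    matches = tablets_for(entries, keyspace, table)
    return matches[0]["attributes"]["table_id"] if matches else None


def leader_tablet_ids(entries, keyspace, table):
    """Return the ids of matching tablets whose is_raft_leader metric equals 1."""
    return [
        entry["id"] for entry in tablets_for(entries, keyspace, table)
        if any(m.get("name") == "is_raft_leader" and m.get("value") == 1
               for m in entry.get("metrics", []))
    ]


print(find_table_id(sample, "ks_ybu", "tbl_products_by_category"))
print(leader_tablet_ids(sample, "ks_ybu", "tbl_products_by_category"))
```

An empty leader list for a host corresponds to the case where that node does not lead any tablet of the table, which is one reason to try a different host.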

Flush the table's in-memory writes (the MemTable) to an SST file on disk for the given table_id.

In [None]:
%%bash -s "$NB_YB_PATH_BIN" "$NB_HOST_IPv4" "$NB_TABLE_ID"  # Import file path of Yugabyte and DB name
YB_PATH=$( echo "${1}" | tr -d " ")
HOST_IPv4=$( echo "${2}" | tr -d " ")
TABLE_ID=$( echo "${3}" | tr -d " ")
cd $YB_PATH

./yb-admin -init_master_addrs ${HOST_IPv4}:7100 flush_table_by_id ${TABLE_ID} 600

Dump and decode the SST file in human-readable form.

> 📝 Note 
>
> If the following does **NOT** dump the SST file, it is most likely that there are not any rows written to this tablet. To resolve this issue, you need to select a different Tablet Server host. 
> 
> Return to [Select a YB-TServer host](#select-a-yb-tserver-host-1) and select a different node host: comment out line 7 (add a `#` sign) and uncomment line 8 or line 9 (remove the `#` sign).

In [None]:
%%bash -s "$NB_YB_PATH_BIN" "$NB_YB_PATH_DATA" "$NB_TABLE_ID" "$NB_TABLET_ID"  # Import file path of Yugabyte and DB name
YB_PATH=$( echo "${1}" | tr -d " ")
YB_PATH_DATA=$( echo "${2}" | tr -d " ")
TABLE_ID=$( echo "${3}" | tr -d " ")
TABLET_ID=$( echo "${4}" | tr -d " ")

cd $YB_PATH/

TABLE_ID_PATH=${YB_PATH_DATA}/node1/data/yb-data/tserver/data/rocksdb/table-${TABLE_ID}/tablet-${TABLET_ID}
ls -l  ${TABLE_ID_PATH}

./sst_dump --command=scan --file=${TABLE_ID_PATH} --output_format=decoded_regulardb

> 🤔 Question:
>
> The query predicate is:
> 
> ```where category = 'H20' and product_name = 'Talc 5' and product_id = 62362```
> 
> How can you find the partition key hash in the SST dump for the query predicate?
> 
> 🧩 Hint:
>
> Return to [Select a YB-TServer host](#select-a-yb-tserver-host-1) and select a different node host: comment out line 7 (add a `#` sign) and uncomment line 8 or line 9 (remove the `#` sign).
> 
> Then, rerun the cells to dump and decode the SST file for a different tablet leader.


---
# 🌟🌟🌟 Well done! 
In this notebook, you completed the following:
- Query plans
  - Requirements
  - Execution plans


## 😊 Next up!
Continue your learning by opening the next notebook, `04_QDDM_secondary_indexes.ipynb`.

You can either open the file from the Explorer or simply run the following cell:

In [None]:
%%bash
gp open '04_QDDM_secondary_indexes.ipynb'