<div style="width:100%; background-color: #121017;"><a target="_blank" href="http://university.yugabyte.com?utm_source=gitpod&utm_medium=notebook"><img src="assets/YBU_Logo.png" /></a></div><br>

> **YugabyteDB YCQL Development**
>
> Enroll for free at [Yugabyte University](https://university.yugabyte.com/courses/yugabytedb-ycql-development?utm_source=gitpod&utm_medium=notebook).
>
<br>
This notebook file is:

`05_JSONB.ipynb`


# JSONB
The Yugabyte Cloud Query Language API supports the JSONB data type. 

The JSONB data type stores valid JavaScript Object Notation (JSON) data in a serialized, binary format. 

A table column of type JSONB cannot be part of the primary key, including the partition key or clustering key.
For a JSONB column value, YugabyteDB automatically sorts the JSON keys. A column of the type JSONB exists in DocDB as a subdocument just like any other non-primary key, regular column. A value of the type JSONB is not implicitly convertible to another data type. However, you can convert a text value that is in a valid JSON format to JSONB. It is also possible to compare a valid JSON text value to a JSONB value. 
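As a rough illustration of the key sorting and the text-to-JSONB conversion described above, the following Python sketch validates a JSON text value and serializes it again with sorted keys. It only approximates the behavior: `normalize_json_text` is a hypothetical helper, and Python sorts keys by Unicode code point, which may differ from YugabyteDB's ordering in edge cases.

```python
import json

def normalize_json_text(text):
    """Validate JSON text and re-serialize it with sorted keys."""
    parsed = json.loads(text)  # raises json.JSONDecodeError if the text is not valid JSON
    return json.dumps(parsed, sort_keys=True)

raw = '{"sku": "YH_62373", "country": "UK", "kid_friendly": true}'
print(normalize_json_text(raw))
```

Text that is not valid JSON fails at the `json.loads` step, which mirrors the requirement that only valid JSON text can be converted to JSONB.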

In this lab, you will learn how to perform common DML operations with JSONB and how to create a secondary index for JSONB data.

## 🛠️ Requirements
Here are the requirements for this notebook:
- ✅ Create the notebook variables in `01_Introduction.ipynb`, which you did previously
- ✅ Create the `ks_ybu` keyspace in `02_Language_fundamentals.ipynb`, which you did previously
- ✅ Complete `03_QDDM_query_plans.ipynb`, which you did previously
- ✅ Complete `04_QDDM_secondary_indexes.ipynb`, which you did previously
- ☑️ Select the **Python 3.11.8** kernel for the notebook, *which you need to select right now!!!*
- ☑️ Import the notebook variables, *which you must do next*
- ☑️ Confirm the existence of the `ks_ybu` keyspace and the child tables, *which you must do next*


### Select your notebook kernel
- In the Notebook toolbar, click **Select Kernel**.
<br>
<img width=50% src="assets/01_01_Select_Kernel_Toolbar.png" />

- Next, in the dropdown, select **Python 3.11.8**.
<br>
<img width=50% src="assets/01_02_Select_Kernel_Dropdown.png" />

> 👉 **IMPORTANT!** 👈
> 
> You must select **Python 3.11.8**. 
> 
> Do **NOT** select _Python 3.12_ or higher!!! 
>


That's it!

## ⛑️ Getting help
The best way to get help from the Yugabyte University team is to post your question on YugabyteDB Community Slack in the #training channels. To sign up, visit [YugabyteDB Community Slack](https://join.slack.com/t/yugabyte-db/shared_invite/zt-xbd652e9-3tN0N7UG0eLpsace4t1d2A?utm_source=gitpod&utm_medium=notebook).

## 👣 Setup steps
Here are the steps to set up this lab:
- Create the notebook styles
- Import the notebook variables
- Confirm the existence of the `ks_ybu` keyspace

### 👇 Create the notebook styles

In [None]:
from IPython.core.display import HTML
def css_styling():
    styles = open("./styles/custom.css", "r").read()
    return HTML(styles)
css_styling()

### 👇 Import the notebook variables

> 👉 **IMPORTANT!** 👈
> 
> Do **NOT** skip running this cell. 
> 

The following Python cell reads the variables that you stored in the earlier notebooks. All the notebooks in this lab use these variables. You can view them in the Jupyter tab.

- To run the script, select Execute Cell (Play Arrow) in the left gutter of the cell.
- Verify the accuracy of the output values

👇 👇 👇 

In [None]:
# Use %store -r to read the variables stored in 01_Introduction.ipynb
%store -r

### Confirm the existence of the `ks_ybu` keyspace and the child tables
You created...
- the keyspace in the `02_Language_fundamentals.ipynb` notebook
- tables in the  `03_QDDM_query_plans.ipynb` notebook

Run the following cell to describe the keyspace.

In [None]:
%%bash -s "$NB_YB_PATH_BIN"  "$NB_DB_NAME" # describe the keyspace
YB_PATH=${1}
DB_NAME=${2}

cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -e "
  describe keyspace $DB_NAME;
"

> 🤔 Question:
>  
> Does the `ks_ybu` keyspace exist?
> 
> If not, go back to the  `02_Language_fundamentals.ipynb` notebook and create the `ks_ybu` keyspace!
>

> 🤔 Question:
>  
> Does the `ks_ybu` keyspace have tables for products and wishlists?
> 
> If not, go back to the `03_QDDM_query_plans.ipynb` notebook and create the tables!
 

---
## JSONB key path operators

Using JSONB path operators, you can select JSONB key-values, insert new values, and modify existing values. This makes it easy to change the JSON schema.

There are two types of key path operators available to YCQL. The key name in a key path is case sensitive. The `->` operator returns the JSON for the key. The `->>` operator returns the value for the key. There are two ways you can use key path operators in a select statement: the select clause and the where clause.
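To make the difference between the two operators concrete, here is a minimal Python sketch that emulates them on an in-memory document. The helper names `arrow` and `arrow_text` are hypothetical stand-ins for `->` and `->>`, and the sample document copies the shape of the `sku_details` value used later in this lab. It is an illustration of the semantics, not of how YugabyteDB evaluates the operators.

```python
import json

def arrow(value, key):
    """Emulate `->`: step into an object key or array index and keep the JSON value."""
    return value[key]

def arrow_text(value, key):
    """Emulate `->>`: step into a key or index and return the result as text."""
    result = value[key]
    if isinstance(result, str):
        return result
    return json.dumps(result)  # for example, True becomes 'true'

sku_details = {
    "sku": "YH_62373",
    "colors": ["blue", "green", "orange", "red"],
    "kid_friendly": True,
}

print(arrow_text(arrow(sku_details, "colors"), 0))
print(arrow_text(sku_details, "kid_friendly"))
```

Because `->>` yields text, a predicate on a boolean key compares against the string `'true'`, as the where clause in the next cell does.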

Review the following cell to see how the query uses the JSONB operators in the select clause and the where clause. Then, run the cell to view the query results.

In [None]:
%%bash -s "$NB_YB_PATH_BIN" "$NB_DB_NAME"   # Query using key path operators  
YB_PATH=${1}
DB_NAME=${2}  

cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  select category, product_name, product_id, price, 
    (sku_details->>'sku') as sku, 
    (sku_details->'colors'->>0) as color0,
    (sku_details->'colors'->>1) as color1,
    (sku_details->>'kid_friendly') as kid_friendly
  from tbl_products_by_category
  where  (sku_details->'tags'->>0) = 'baby'
  and (sku_details->>'kid_friendly')= 'true'
  ;
"

## DML: `insert` a row with a JSONB value
You can use an insert statement to add JSON data to a table. You need to provide a table name, a list of columns, and a list of values. The column list must contain the primary key columns for the table and the JSONB column. This includes one or more partition keys and any clustering keys. 

The JSONB column must be a non-primary key, regular column. The value list must match the column list in terms of both the number of values and the data type. The value of the JSON must be in a valid JSON data format. 

By default, an insert statement exhibits upsert behavior. An upsert does one of two things. If the row does not exist, it adds a new row to the table. If the row already exists, it updates the existing row. A primary key collision determines that the row already exists. 
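The upsert behavior can be pictured with a small Python model in which a dictionary keyed by the primary key stands in for the table. This is only a sketch of the semantics: `upsert` is a hypothetical helper, and the column values are sample data.

```python
def upsert(table, primary_key, **columns):
    """Add the row if the key is new; otherwise overwrite only the given columns."""
    row = table.setdefault(primary_key, {})
    row.update(columns)
    return row

table = {}
pk = ("H20", "Talc 10", 62373)  # category, product_name, product_id

upsert(table, pk, brand="Yeah", price=19.99)  # no row yet: adds a new row
upsert(table, pk, price=17.99)                # primary key collision: updates the row

print(len(table), table[pk])
```

The second call does not create a second row, and it leaves the `brand` column untouched.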

Depending on the application and related drivers, the JSON data may need to include backslash escaping for double quotes and other special characters. 
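As one possible way to handle the escaping, the Python sketch below serializes a dictionary to JSON, wraps it as a CQL string literal, and then escapes it for use inside a double-quoted bash argument such as the `-e "..."` argument in the cells of this notebook. Both helper names are hypothetical, and a driver with bound parameters would usually make this manual escaping unnecessary.

```python
import json

def to_cql_string_literal(document):
    """Serialize a dict to JSON and wrap it as a single-quoted CQL string literal."""
    text = json.dumps(document)
    return "'" + text.replace("'", "''") + "'"  # CQL doubles embedded single quotes

def escape_for_double_quoted_shell(text):
    """Escape the characters that bash interprets inside double quotes."""
    for ch in ("\\", '"', "$", "`"):  # the backslash must be escaped first
        text = text.replace(ch, "\\" + ch)
    return text

literal = to_cql_string_literal({"sku": "YH_62373"})
print(literal)
print(escape_for_double_quoted_shell(literal))
```

The second printed line has the same `\"` escaping that appears in the insert statement of the next cell.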

Run the following cell to insert a row into the table.

In [None]:
%%bash -s "$NB_YB_PATH_BIN" "$NB_DB_NAME"   # insert
YB_PATH=${1}
DB_NAME=${2}  
cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  insert into tbl_products_by_category (category, product_name, product_id, brand, price, discount, description, gtin, sku_details) 
  values ('H20','Talc 10',62373,'Yeah',19.99,7,'2 liter ','006237337326','{\"sku\":\"YH_62373\",\"country\":\"UK\",\"tags\":[\"water\",\"bottle\",\"everyday\",\"workout\"],\"colors\":[\"blue\",\"green\",\"orange\",\"red\"],\"dimensions\":{\"dm_unit\":\"in\",\"dm_length\":\"19\",\"dm_width\":\"19\",\"dm_height\":\"19\"}, \"kid_friendly\":true}');
 "

Confirm the insertion of the new row.

In [None]:
%%bash -s "$NB_YB_PATH_BIN" "$NB_DB_NAME"   # Query using key path operators
YB_PATH=${1}
DB_NAME=${2}  
cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  select category, product_name, product_id, brand, sku_details
  from tbl_products_by_category
  where (sku_details->>'sku')='YH_62373' 
  ;
"

> 🤔 Question:
>
> What is the order of the keys in the `sku_details` column?
>
> 🙋 Answer:
>
> The keys are lexicographically ordered.

## DML: `update` JSONB with a new key-value pair
With an update statement, you can modify a JSONB value for a single row in a table. The update statement requires a table name, a set clause, and a where clause. The set clause specifies the JSONB column to update with a given value. The where clause contains a conditional expression that must include all the primary key columns of the table. 

Using an update statement, the set clause can specify the entire JSON value. You can also use the key path operators in the set clause. With the key path operators, you can either add a new key-value pair to the JSON object or modify an existing value of a key.
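The effect of a key path in the set clause can be sketched in Python as an assignment into a nested dictionary. The helper `set_path` is hypothetical and only models the outcome: the same operation adds the key when it is missing and replaces the value when it exists. How YugabyteDB treats missing intermediate keys is not modeled here.

```python
import json

def set_path(document, path, json_text):
    """Parse the JSON text and assign it at the given key path."""
    target = document
    for key in path[:-1]:
        target = target[key]  # walk down to the parent of the final key
    target[path[-1]] = json.loads(json_text)
    return document

sku_details = {"sku": "YH_62373", "dimensions": {"dm_unit": "in"}}

set_path(sku_details, ["rating"], "3")  # adds a new key-value pair
set_path(sku_details, ["rating"], "5")  # modifies the existing value
set_path(sku_details, ["dimensions", "dm_unit"], '"cm"')

print(sku_details)
```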

Run the following to add a new key-value pair.

In [None]:
%%bash -s "$NB_YB_PATH_BIN" "$NB_DB_NAME"   # Update and add new key-value pair
YB_PATH=${1}
DB_NAME=${2}  
cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
 update tbl_products_by_category
   set sku_details->'rating' = '3'
   where category = 'H20'
     and product_name = 'Talc 10'
     and product_id = 62373 
    ;
 "

Confirm the new key-value pair.

In [None]:
%%bash -s "$NB_YB_PATH_BIN" "$NB_DB_NAME"   # Query using key path operators 
YB_PATH=${1}
DB_NAME=${2}  
cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  select category, product_name, product_id, brand, sku_details->'rating' as rating
  from tbl_products_by_category
  where (sku_details->>'sku')='YH_62373' 
  ;
"

## DML: `update` a value for a key in JSONB
Using an update statement, the set clause can specify the entire JSON value. You can also use the key path operators in the set clause. With the key path operators, you can either add a new key-value pair to the JSON object or modify an existing value of a key.

Run the following to modify a value for a key.

In [None]:
%%bash -s "$NB_YB_PATH_BIN" "$NB_DB_NAME"   # Update a value for a key 
YB_PATH=${1}
DB_NAME=${2}  
cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
 update tbl_products_by_category
   set sku_details->'rating' = '5'
   where category = 'H20'
     and product_name = 'Talc 10'
     and product_id = 62373 
    ;
 "

Confirm the update.

In [None]:
%%bash -s "$NB_YB_PATH_BIN" "$NB_DB_NAME"   # Query with key path operators
YB_PATH=${1}
DB_NAME=${2}  
cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  select category, product_name, product_id, brand, sku_details->'rating' as rating
  from tbl_products_by_category
  where (sku_details->>'sku')='YH_62373' 
  ;
"

## DML: `update` the entire JSONB value
Using an update statement, the set clause can specify the entire JSON value. 

Run the following to modify the entire column value.

In [None]:
%%bash -s "$NB_YB_PATH_BIN" "$NB_DB_NAME"   # Update entire row JSONB with escape for double quotes
YB_PATH=${1}
DB_NAME=${2}  
cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "

  update tbl_products_by_category
   set sku_details = '{\"sku\":\"YH_62373\",\"country\":\"UK\",\"tags\":[\"water\",\"bottle\",\"everyday\",\"workout\"],\"colors\":[\"purple\",\"green\",\"orange\",\"red\"],\"dimensions\":{\"dm_unit\":\"in\",\"dm_length\":\"18\",\"dm_width\":\"18\",\"dm_height\":\"18\"}, \"kid_friendly\":false,  \"rating\":4}'
   where category = 'H20'
     and product_name = 'Talc 10'
     and product_id = 62373  
  ;
 "

Confirm the update. 

In [None]:
%%bash -s "$NB_YB_PATH_BIN" "$NB_DB_NAME"   # Query with key path operators
YB_PATH=${1}
DB_NAME=${2}  
cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  select category, product_name, product_id, brand, sku_details
  from tbl_products_by_category
  where (sku_details->>'sku')='YH_62373' 
  ;
"

---
## Secondary index for JSONB

To see how a secondary index for JSONB can improve query performance, begin by creating a query plan for a query that uses a JSONB key path operator in the where clause.

In [None]:
%%bash -s "$NB_YB_PATH_BIN" "$NB_DB_NAME"   # Query plan: Sequential scan  
YB_PATH=${1}
DB_NAME=${2}  
cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  explain select category, 
     product_name, 
     product_id,
     brand,
     description,
     price, 
     gtin, 
     sku_details->> 'sku' as sku
  from tbl_products_by_category
  where
    sku_details ->> 'sku' = '?'
  ;
"

A query that results in a sequential scan of a very large table can be very costly in YugabyteDB. For a sequential scan operation, each tablet must perform a seek operation. A seek operation requires CPU, memory, and disk operations. A seek operation consists of reading data from an SST file. The SST file contains the DocKey and document value. 

Coordinating the tablet operations and gathering tablet results require additional network, CPU, and memory consumption. The topology of a cluster can increase the network latency related to coordinating tablet operations and gathering tablet results. For these reasons, a sequential scan consumes substantial computing resources and often results in very poor query performance, especially for very large tables with billions of rows.

> 📝 Note
>
> Unlike a where expression with a regular data type column, the query plan does not show a *Filter* operation.

### JSONB index with key path operators
For a table with the transactions property enabled, you can create a secondary index for a JSONB column using key path operators. The index key, including the partition key and any clustering keys, can reference the JSONB column only through key path operators. The operators must return a value (`->>`), not a JSON object (`->`). It is not possible to declare a composite partition key. It is also not possible to reference a JSONB column in the include clause or the where expression of the index, even when using key path operators to return a value.

Using key path operators, create an index for the required keys in `sku_details`.

First, drop the index if it exists.

In [None]:
%%bash -s "$NB_YB_PATH_BIN" "$NB_DB_NAME"   # Drop the index 
YB_PATH=${1}
DB_NAME=${2}  
cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  drop index if exists idx_products_by_catgegory_jsonb
  ;
"

Next, create the index.

In [None]:
%%bash -s "$NB_YB_PATH_BIN" "$NB_DB_NAME"   # create the index for jsonb
YB_PATH=${1}
DB_NAME=${2}  
cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  create index if not exists idx_products_by_catgegory_jsonb
  on tbl_products_by_category ( (
     sku_details->>'sku'
      ) )
  include (
    brand,
    price,
    description,
    gtin
    )
  ;
"

The above index is a covering index for the initial query plan. The term *covering index* describes a secondary index that contains every column a query needs, so the query reads only the index and not the table. 
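One way to reason about coverage is to compare the columns a query needs with the columns the index stores. The Python sketch below is a simplified model under that assumption: `is_covered` is a hypothetical helper, and the column lists are copied from the query and the `create index` statement in this notebook.

```python
def is_covered(query_columns, index_key, include_columns, table_primary_key):
    """Return True when the index stores every column the query needs."""
    available = set(index_key) | set(include_columns) | set(table_primary_key)
    return set(query_columns) <= available

index_key = ["sku_details->>'sku'"]
include_columns = ["brand", "price", "description", "gtin"]
table_primary_key = ["category", "product_name", "product_id"]

query_columns = [
    "category", "product_name", "product_id",
    "brand", "description", "price", "gtin", "sku_details->>'sku'",
]

print(is_covered(query_columns, index_key, include_columns, table_primary_key))
print(is_covered(query_columns + ["discount"], index_key, include_columns, table_primary_key))
```

Adding a column that the index does not store, such as `discount`, means the index alone can no longer answer the query.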

View the query plan again to validate the use of the index.

In [None]:
%%bash -s "$NB_YB_PATH_BIN" "$NB_DB_NAME"   # Query with key path operators uses index
YB_PATH=${1}
DB_NAME=${2}  
cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  explain select category, 
     product_name, 
     product_id,
     brand,
     description,
     price, 
     gtin, 
     sku_details->> 'sku' as sku
  from tbl_products_by_category
  where
    sku_details ->> 'sku' = '?'
  ;
"

This query plan contains one node. The action is an index only scan. Key Conditions is a sub-action that specifies the use of the partition key for the index. The equality operator indicates that the internal operation is to locate a specific partition key on a single tablet. The index contains the partition key for the index, the primary key columns of the index table, and any included columns from the index table.

A query plan with an index-only-scan operation accesses a single tablet for the index. The seek operation reads data from the SST file for the index tablet. Depending on the number of products that share a given `sku` value, there may be multiple seeks. However, because the `sku` value is the partition key for this index, the query locates the tablet with the partition key hash almost immediately. The partition key hash for the index is the DocKey hash. The DocKey consists of the index partition key hash, the index partition key, and clustering keys. The clustering keys are the primary key columns of the index table. The DocKey maps to subdocuments. The subdocuments in this example are for the included columns: brand, price, description, and gtin. Each subdocument contains a column value. The covering index efficiently processes the query.
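The difference between a keyed index lookup and a scan can be illustrated with a toy model in Python. Everything here is an assumption made for illustration: the tablets are plain dictionaries, the helper names are hypothetical, and the MD5-based hash is a stand-in rather than the hash function YugabyteDB uses. The partition key is the `sku` value from the `create index` statement above.

```python
import hashlib

def tablet_for(partition_key, num_tablets):
    """Map a partition key to a tablet with a stand-in hash (not YugabyteDB's)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:2], "big") % num_tablets

def build_index(skus, num_tablets=3):
    """Distribute index entries across tablets by the hash of the sku."""
    tablets = [{} for _ in range(num_tablets)]
    for sku in skus:
        tablets[tablet_for(sku, num_tablets)][sku] = {"sku": sku}
    return tablets

def index_lookup(tablets, sku):
    """Hash the key, then read only the tablet that owns it."""
    owner = tablets[tablet_for(sku, len(tablets))]
    return owner.get(sku), 1  # entry found, number of tablets visited

def full_scan(tablets, sku):
    """Visit every tablet, as a query without a usable key must."""
    found = None
    for tablet in tablets:
        found = tablet.get(sku, found)
    return found, len(tablets)

tablets = build_index([f"YH_{n}" for n in range(62370, 62380)])
print(index_lookup(tablets, "YH_62373"))
print(full_scan(tablets, "YH_62373"))
```

Both functions return the same entry, but the lookup visits one tablet while the scan visits all of them.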

To learn why the secondary index for JSONB is efficient for this query, you can perform an SST dump for the index itself. 

#### Select a YB-TServer host
<a id="select-a-yb-tserver-host-1"> </a>
Set the host variable for one of the nodes. All three nodes in the cluster are running a Tablet Server (YB-TServer). You can comment/uncomment lines 7-9 as needed.

In [None]:
%%bash -s "$NB_HOST_IPv4_01" "$NB_HOST_IPv4_02" "$NB_HOST_IPv4_03" --out NB_HOST_IPv4
HOST_IPv4_01=$( echo "${1}" | tr -d " ")
HOST_IPv4_02=$( echo "${2}" | tr -d " ")
HOST_IPv4_03=$( echo "${3}" | tr -d " ")

# change the hosts for different tablet leaders by commenting out a line and removing a comment for a line
MY_HOST_IPv4=$HOST_IPv4_01
#MY_HOST_IPv4=$HOST_IPv4_02
#MY_HOST_IPv4=$HOST_IPv4_03

echo ${MY_HOST_IPv4}

Store the selected host variable.

In [None]:
%store NB_HOST_IPv4
print(NB_HOST_IPv4)

Save the `OBJECT_NAME` as a variable.

In [None]:
NB_OBJECT_NAME="idx_products_by_catgegory_jsonb"
%store NB_OBJECT_NAME
print(NB_OBJECT_NAME)

Get the `INDEX_ID` for the index using `curl` and `jq`.

In [None]:
%%bash -s "$NB_OBJECT_NAME" "$NB_HOST_IPv4"  "$NB_DB_NAME"  "$NB_TSERVER_WEBSERVER_PORT"  --out NB_INDEX_ID
OBJECT_NAME=$( echo "${1}" | tr -d " ")
HOST_IPv4=$( echo "${2}" | tr -d " ")
DB_NAME=$( echo "${3}" | tr -d " ")
TSERVER_WEBSERVER_PORT=$( echo "${4}" | tr -d " ")


MY_URL="http://${HOST_IPv4}:${TSERVER_WEBSERVER_PORT}/metrics"

MY_INDEX_ID=`curl -s --compressed ${MY_URL} | jq -r 'limit(1;  .[] | select(.attributes.namespace_name=="'${DB_NAME}'" and .type=="tablet" and .attributes.table_name=="'${OBJECT_NAME}'") |  .attributes.table_id) '`

echo ${MY_INDEX_ID}

Store the `INDEX_ID` for the index.

In [None]:
%store NB_INDEX_ID
print(NB_INDEX_ID)

Get the `TABLET_ID` for the tablet leader on the selected node host.

In [None]:
%%bash -s "$NB_OBJECT_NAME" "$NB_HOST_IPv4" --out NB_INDEX_TABLET_ID
OBJECT_NAME=$( echo "${1}" | tr -d " ")
HOST_IPv4=$( echo "${2}" | tr -d " ")

MY_URL="http://${HOST_IPv4}:8200/metrics"

MY_INDEX_TABLET_ID=`curl -s --compressed ${MY_URL} | jq --raw-output ' .[] | select(.attributes.namespace_name=="ks_ybu" and .type=="tablet" and .attributes.table_name=="'$OBJECT_NAME'") | {tablet_id: .id, metrics: .metrics[] | select(.name == ("is_raft_leader") ) | select(.value == 1) } | select(.tablet_id) | {tablet_id} | .tablet_id '`

echo ${MY_INDEX_TABLET_ID}

Store the `TABLET_ID` for the tablet leader.

In [None]:
%store NB_INDEX_TABLET_ID
print(NB_INDEX_TABLET_ID)
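For readers who find the `jq` filters above dense, the following Python sketch applies the same selection logic to an already parsed metrics payload. The function names and the sample payload are hypothetical. The field names (`type`, `id`, `attributes`, `metrics`, `is_raft_leader`) are the ones the `jq` filters reference, and the sketch assumes the payload is a list of such entities.

```python
def find_table_id(entities, namespace, table_name):
    """Return the table_id of the first tablet entity that matches, or None."""
    for entity in entities:
        attrs = entity.get("attributes", {})
        if (entity.get("type") == "tablet"
                and attrs.get("namespace_name") == namespace
                and attrs.get("table_name") == table_name):
            return attrs.get("table_id")
    return None

def find_leader_tablet_ids(entities, namespace, table_name):
    """Return the ids of matching tablets whose is_raft_leader metric equals 1."""
    leaders = []
    for entity in entities:
        attrs = entity.get("attributes", {})
        if (entity.get("type") != "tablet"
                or attrs.get("namespace_name") != namespace
                or attrs.get("table_name") != table_name):
            continue
        if any(m.get("name") == "is_raft_leader" and m.get("value") == 1
               for m in entity.get("metrics", [])):
            leaders.append(entity["id"])
    return leaders

# Hypothetical sample payload with made-up ids, shaped like the fields used above.
sample = [
    {"type": "tablet", "id": "tablet-a",
     "attributes": {"namespace_name": "ks_ybu", "table_name": "my_index", "table_id": "idx-1"},
     "metrics": [{"name": "is_raft_leader", "value": 0}]},
    {"type": "tablet", "id": "tablet-b",
     "attributes": {"namespace_name": "ks_ybu", "table_name": "my_index", "table_id": "idx-1"},
     "metrics": [{"name": "is_raft_leader", "value": 1}]},
    {"type": "server", "id": "yb.tabletserver", "attributes": {}, "metrics": []},
]

print(find_table_id(sample, "ks_ybu", "my_index"))
print(find_leader_tablet_ids(sample, "ks_ybu", "my_index"))
```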

Flush the WAL file to an SST file for the given `INDEX_ID`.

In [None]:
%%bash -s "$NB_YB_PATH_BIN" "$NB_HOST_IPv4" "$NB_INDEX_ID"  # Import file path of Yugabyte and DB name
YB_PATH=$( echo "${1}" | tr -d " ")
HOST_IPv4=$( echo "${2}" | tr -d " ")
INDEX_ID=$( echo "${3}" | tr -d " ")
cd $YB_PATH

./yb-admin -init_master_addrs ${HOST_IPv4}:7100 flush_table_by_id ${INDEX_ID} 600

Dump and decode the SST file in human-readable form.

> 📝 Note
>
> If the following does **NOT** dump the SST file, the most likely cause is that no rows were written to this tablet. To resolve this issue, select a different Tablet Server host. 
> 
> Return to [Select a YB-TServer host](#select-a-yb-tserver-host-1) and select a different node host: comment out line 7 (add a `#` sign) and uncomment line 8 or line 9 (remove the `#` sign).

In [None]:
%%bash -s "$NB_YB_PATH_BIN" "$NB_YB_PATH_DATA" "$NB_INDEX_ID" "$NB_INDEX_TABLET_ID" # Import file path of Yugabyte and DB name
YB_PATH=$( echo "${1}" | tr -d " ")
YB_PATH_DATA=$( echo "${2}" | tr -d " ")
INDEX_ID=$( echo "${3}" | tr -d " ")
INDEX_TABLET_ID=$( echo "${4}" | tr -d " ")

cd $YB_PATH

INDEX_ID_PATH=${YB_PATH_DATA}/node1/data/yb-data/tserver/data/rocksdb/table-${INDEX_ID}/tablet-${INDEX_TABLET_ID}

# ls -l  ${INDEX_ID_PATH}

./sst_dump --command=scan --file=${INDEX_ID_PATH} --output_format=decoded_regulardb 

> 🤔 Question:
>
> What is in the DocKey for this index tablet?
>
> 🙋 Answer:
>
> The DocKey contains the partition key hash, the partition key, and the clustering keys. The partition key is the `sku` value for the index key path of the JSONB column, `sku_details`. The partition key hash is the hash of this value. The clustering keys are the primary key columns of the index table.
>
>
> 🤔 Question:
>
> Why does the query plan show an Index Only Scan?
>
> 🙋 Answer:
>
> The query plan shows an Index Only Scan because the secondary index covers all the columns in the query. In addition to the index partition key and index table primary key columns, the SST dump of the index shows the included columns. Together, these columns cover the query.

---
# 🌟🌟🌟🌟🌟 All done!

You completed this notebook:

- JSONB
  - Requirements
  - JSONB key path operators
  - DML: `insert` a row with a JSONB value
  - DML: `update` JSONB with a new key-value pair
  - DML: `update` a value for a key in JSONB
  - DML: `update` the entire JSONB value
  - Secondary index for JSONB

And, you completed all the notebooks in this YCQL lab. 

You are now ready for the YCQL Development Certification exam!


## 😊 What's Next ???

That's it! 

You're done, but make sure to give yourself credit. 

Return to the course player, and mark this lab as complete!
