<div style="width:100%; background-color: #121017;"><a target="_blank" href="http://university.yugabyte.com?utm_source=gitpod&utm_medium=notebook"><img src="assets/YBU_Logo.png" /></a></div><br>

> **YugabyteDB YCQL Development**
>
> Enroll for free at [Yugabyte University](https://university.yugabyte.com/courses/yugabytedb-ycql-development?utm_source=gitpod&utm_medium=notebook).
>

<br>
This notebook file is:

`02_Language_fundamentals.ipynb`

# Language fundamentals
This notebook showcases various Data Definition Language (DDL) and Data Manipulation Language (DML) statements for Yugabyte Cloud Query Language (YCQL).

YCQL is inspired by Apache Cassandra. YCQL is one of two Yugabyte Query Language APIs for YugabyteDB, the world's #1, open source, distributed SQL database.

In addition to learning about YCQL, a learning goal for this notebook is to also demystify how YugabyteDB stores data in its distributed document store, DocDB.

## 🛠️ Requirements
Here are the requirements for this notebook:
- ✅ Create the notebook variables in `01_Lab_Setup.ipynb`, which you previously did
- ☑️ Select the **Python 3.11.8** kernel for the notebook, *which you need to do right now!!!*
- ☑️ Import the notebook variables, *which you must do next*
- ☑️ Create the `ks_ybu` keyspace, *which you must do next*

### Select your notebook kernel
- In the Notebook toolbar, click **Select Kernel**.
<br>
<img width=50% src="assets/01_01_Select_Kernel_Toolbar.png" />

- Next, in the dropdown, select **Python 3.11.8**.
<br>
<img width=50% src="assets/01_02_Select_Kernel_Dropdown.png" />

> 👉 **IMPORTANT!** 👈
> 
> You must select **Python 3.11.8**. 
> 
> Do **NOT** select _Python 3.12_ or higher!!! 
>


## ⛑️ Getting help
The best way to get help from the Yugabyte University team is to post your question on YugabyteDB Community Slack in the #training channels. To sign up, visit [YugabyteDB Community Slack](https://join.slack.com/t/yugabyte-db/shared_invite/zt-xbd652e9-3tN0N7UG0eLpsace4t1d2A?utm_source=gitpod&utm_medium=notebook).

## 👣 Setup steps
Here are the steps to set up this lab:
- Create the notebook styles
- Import the notebook variables

### 👇 Create the notebook styles

In [None]:
from IPython.display import HTML

def css_styling():
    # Load the lab's custom CSS and return it as renderable HTML
    with open("./styles/custom.css", "r") as css_file:
        styles = css_file.read()
    return HTML(styles)

css_styling()

### 👇 Import the notebook variables

> 👉 **IMPORTANT!** 👈
> 
> Do **NOT** skip running this cell. 
> 

The following Python cell reads the variables that `01_Lab_Setup.ipynb` created and stored, so that this notebook can use them. You can view these variables in the Jupyter tab.

- To run the script, select Execute Cell (Play Arrow) in the left gutter of the cell.
- Verify the accuracy of the output values

👇 👇 👇 

In [None]:
# Use %store -r to read 01_Lab_Setup variables
%store -r

---
## DDL: Commands for YCQL
Data definition language commands in YCQL allow you to create, modify, and delete objects. The parent object for YCQL is a keyspace. You create tables to store data.

### Create the `ks_ybu` keyspace
Run the following cells to connect to the YugabyteDB cluster using `ycqlsh`. Then, complete the following tasks:
- Create the `ks_ybu` keyspace 
- Create the `tbl_employees` table
- Describe the  `tbl_employees` table

Drop `ks_ybu` if it exists. If the keyspace has a table in it, the following will throw an error:

In [None]:
%%bash -s "$NB_YB_PATH_BIN" "$NB_DB_NAME"   # Drop the keyspace, ks_ybu
YB_PATH=${1}
DB_NAME=${2}
cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -e "
  drop keyspace if exists $DB_NAME;
  "

Create the keyspace, `ks_ybu`.

In [None]:
%%bash -s "$NB_YB_PATH_BIN" "$NB_DB_NAME"  # Create the keyspace, ks_ybu. 
YB_PATH=${1}
DB_NAME=${2}
cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -e "
  create keyspace if not exists $DB_NAME;
  "

Confirm the keyspace creation.

In [None]:
%%bash -s "$NB_YB_PATH_BIN"  "$NB_DB_NAME" # describe the keyspace
YB_PATH=${1}
DB_NAME=${2}
cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -e "
  describe keyspace $DB_NAME;
"

### Create the `tbl_employees` table

If the table already exists, the command will drop the table.

In [None]:
%%bash -s "$NB_YB_PATH_BIN"  "$NB_DB_NAME"  # Drop the table if it exists
YB_PATH=${1}
DB_NAME=${2}  
cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  drop table if exists tbl_employees;
  "

Create the table, `tbl_employees`. 

> Note: A primary key is required for all user tables in a keyspace.

In [None]:
%%bash -s "$NB_YB_PATH_BIN"  "$NB_DB_NAME"  # Create the table
YB_PATH=${1}
DB_NAME=${2}  
cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME  -e "
  create table if not exists tbl_employees ( 
    id int, 
    full_name text, 
    email text,
    year int,
    primary key (id, full_name) 
    );
  "

Describe `tbl_employees`. 

>🤔  Question:
>   
> What is different about the table description and the create table statement?

In [None]:
%%bash -s "$NB_YB_PATH_BIN" "$NB_DB_NAME"  # Describe the table
YB_PATH=${1}
DB_NAME=${2}  

cd $YB_PATH
# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  desc tbl_employees;
"

> 🙋 Answer:
>   
> The description of the table reveals that the primary key consists of two key columns. The first key column is `id`. The second key column is `full_name`. 
> 
> `id` is the partition key for the table. The clustering key column is `full_name`. 
> 
> Ascending is the order of the clustering key column.

---
## DML: Write with `insert`

Insert rows into the `tbl_employees` table.

In [None]:
%%bash -s "$NB_YB_PATH_BIN"  "$NB_DB_NAME"   # Populate the employees table
YB_PATH=${1}
DB_NAME=${2}  

cd $YB_PATH
# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME  -e "
  insert into tbl_employees (id, full_name, email, year) values (1, 'Bruce Wayne', 'batman@yb.com', 2020);
  insert into tbl_employees (id, full_name, email, year) values (2, 'Dick Grayson', 'robin@yb.com', 2020);
  insert into tbl_employees (id, full_name, email, year) values (4, 'Clark Kent', 'superman@yb.com', 2021);
  insert into tbl_employees (id, full_name, email, year) values (5, 'Kara Zor-El' ,'supergirl@yb.com', 2023); 
  insert into tbl_employees (id, full_name, email, year) values (6, 'Natalie Reed','ladyblackhawk@yb.com',2020);
  insert into tbl_employees (id, full_name, email, year) values (7, 'Peter Parker', 'spiderman@yb.com', 2021);
  insert into tbl_employees (id, full_name, email, year) values (8, 'Diana Prince', 'wonderwoman@yb.com', 2021);
  insert into tbl_employees (id, full_name, email, year) values (9, 'Harold Jordan','greenlantern@yb.com', 2022);
  insert into tbl_employees (id, full_name, email, year) values (10, 'Carter Hal', 'hawkman@yb.com', 2022);
  insert into tbl_employees (id, full_name, email, year) values (11, 'Michael Holt', 'mrterrific@yb.com',2022);
"

Verify that the preceding cell inserted the rows into the table.

In [None]:
%%bash -s "$NB_YB_PATH_BIN"  "$NB_DB_NAME"   # Query the table
YB_PATH=${1}
DB_NAME=${2}  

cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME  -e "
  select * from tbl_employees;
"

> 🤔 Question:
>
> In terms of the result order, what's interesting about the results above? How does it differ from the insert statement order?
>
> 🙋 Answer:
>
> The result order differs from the insertion statement.
>

---
## Partition key, partition key hash, and the `partition_hash()` built-in function
For every row in a table, the distributed document store of YugabyteDB saves a partition key, partition key hash, and any clustering keys as a `DocKey`. In just a few more notebook cells, you will learn more about a `DocKey` and `DocDB`.

The saved partition key hash is a hexadecimal value. The `partition_hash()` function is a special function that only works with partition hash columns. The function converts the hexadecimal value for a partition key hash into an integer value. The range of possible integer values for a partition key hash is from `0` to `65,535`.
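
The conversion from a hexadecimal hash to an integer can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not how `partition_hash()` is implemented; the helper name `hash_hex_to_int` and the sample value `0x0a73` are assumptions made for this example.

```python
# Illustrative only: 0x0a73 is a sample value, not one read from your cluster.
MAX_PARTITION_HASH = 0xFFFF  # 65,535, the top of the 16-bit hash space


def hash_hex_to_int(hash_hex: str) -> int:
    """Convert a hexadecimal partition key hash such as '0x0a73' to an integer."""
    value = int(hash_hex, 16)
    if not 0 <= value <= MAX_PARTITION_HASH:
        raise ValueError(f"{hash_hex} is outside the 16-bit hash space")
    return value


print(hash_hex_to_int("0x0a73"))  # 2675
print(MAX_PARTITION_HASH)         # 65535
```

Any value that `partition_hash()` returns should fall between `0` and the `MAX_PARTITION_HASH` value shown here.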

In this exercise, using the built-in function, `partition_hash()`, you will examine the partition key and partition key hash for `tbl_employees`. 

To begin, review the DDL for `tbl_employees`:

In [None]:
%%bash -s "$NB_YB_PATH_BIN" "$NB_DB_NAME"  # Describe the table
YB_PATH=${1}
DB_NAME=${2}  

cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  desc tbl_employees;
"

The primary key for the table has two columns. The first column is the partition key column. The second column is the clustering key column. Using the  `partition_hash()` built-in function, you can view the integer value of the partition key hash.

In [None]:
%%bash -s "$NB_YB_PATH_BIN" "$NB_DB_NAME"  # query using partition_hash() 
YB_PATH=${1}
DB_NAME=${2}  

cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  select partition_hash(id) as partition_key_hash, id, full_name from tbl_employees;
"

> 🤔 Question:
>
> In terms of the result order, what's interesting about the results above? How does it differ from the insert statement order?
>
> 🙋 Answer:
>
> The rows are sorted by the partition key hash! 
>

The `partition_hash()` function reveals, in part, how YugabyteDB stores data. As mentioned previously, for every row in a table, the distributed document store of YugabyteDB saves a partition key, partition key hash, and any clustering keys as a `DocKey`. In just a few more notebook cells, you will learn more about a `DocKey` and `DocDB`.
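
To picture why a full scan returns rows in hash order rather than insert order, here is a minimal Python sketch. It assumes a stand-in hash (`zlib.crc32` truncated to 16 bits), which is **not** the hash function that YugabyteDB uses, so the printed values will not match your cluster. `stand_in_hash` is a hypothetical helper.

```python
import zlib


def stand_in_hash(partition_key: int) -> int:
    """Hypothetical 16-bit hash; NOT the function YugabyteDB uses."""
    return zlib.crc32(str(partition_key).encode()) & 0xFFFF


# The id values in the order that the insert cell wrote them
insert_order = [1, 2, 4, 5, 6, 7, 8, 9, 10, 11]

# A scan that walks the hash space from low to high sees the ids in this order
scan_order = sorted(insert_order, key=stand_in_hash)

for key in scan_order:
    print(f"{stand_in_hash(key):>5}  id={key}")
```

The point of the sketch is only the ordering principle: the sort key is the hash of `id`, not `id` itself.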

---
## DML: Upsert with `insert`

An `insert` statement exhibits upsert behavior for a row with an existing primary key. 

A row already exists in `tbl_employees` for the primary key of `id=2` and `full_name='Dick Grayson'`. The following statement will update the `email` and `year` columns for the row:

```
insert into tbl_employees (id, full_name, email, year) values (2, 'Dick Grayson', 'nightwingc@yb.com', 2022);
```
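
One way to think about this upsert behavior is as a dictionary keyed by the primary key: writing to an existing key overwrites the values instead of adding a second entry. The following Python sketch is an analogy only, and the `insert` helper is hypothetical.

```python
# Rows keyed by the primary key (id, full_name)
rows = {}


def insert(table, emp_id, full_name, email, year):
    """Write a row; an existing primary key is overwritten, not duplicated."""
    table[(emp_id, full_name)] = {"email": email, "year": year}


insert(rows, 2, "Dick Grayson", "robin@yb.com", 2020)
insert(rows, 2, "Dick Grayson", "nightwingc@yb.com", 2022)  # same primary key

print(len(rows))                   # 1
print(rows[(2, "Dick Grayson")])   # {'email': 'nightwingc@yb.com', 'year': 2022}
```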

To begin, run the following `select` statement:

In [None]:
%%bash -s "$NB_YB_PATH_BIN"  "$NB_DB_NAME"   # Query by id
YB_PATH=${1}
DB_NAME=${2}  

cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME  -e "
  select * from tbl_employees where id = 2;
"

Next, run the following `insert` statement:

In [None]:
%%bash -s "$NB_YB_PATH_BIN"  "$NB_DB_NAME"   # insert into the table 
YB_PATH=${1}
DB_NAME=${2}  

cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME  -e "
  insert into tbl_employees (id, full_name, email, year) values (2, 'Dick Grayson', 'nightwingc@yb.com', 2022);
"

To view the results of the upsert, run the previous `select` statement.

In [None]:
%%bash -s "$NB_YB_PATH_BIN"  "$NB_DB_NAME"   # Query by id 
YB_PATH=${1}
DB_NAME=${2}  

cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME  -e "
  select * from tbl_employees where id = 2;
"

---
## Distributed Document Store (DocDB)
For every row in a table, the distributed document store of YugabyteDB saves a partition key, partition key hash, and any clustering keys as a `DocKey`. The partition key for the table determines how the consistent hash sharding algorithm of YugabyteDB distributes rows into tablets. A tablet is a customized version of RocksDB, a persistent key-value store.
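
As a rough mental model of hash sharding, you can picture the 16-bit hash space divided into contiguous ranges, one per tablet. The sketch below assumes an even split and uses a hypothetical helper, `tablet_for_hash`; the exact range boundaries on a real cluster may differ.

```python
HASH_SPACE = 0x10000  # 65,536 possible hash values, 0 through 65,535


def tablet_for_hash(partition_hash: int, num_tablets: int) -> int:
    """Return the index of the tablet whose range contains the hash (even split assumed)."""
    if not 0 <= partition_hash < HASH_SPACE:
        raise ValueError("partition hash must be between 0 and 65,535")
    return partition_hash * num_tablets // HASH_SPACE


print(tablet_for_hash(0x0A73, 3))  # 0
print(tablet_for_hash(0x8000, 3))  # 1
print(tablet_for_hash(0xFFFF, 3))  # 2
```

Rows with the same partition key always hash to the same value, so they always land in the same tablet.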

A table in YCQL has at least one tablet leader per YB-TServer node in the cluster. The replication factor for the cluster determines the number of followers per tablet leader. 

Your YugabyteDB cluster in this Gitpod instance is a three-node cluster. Each node runs a YB-Master service and a YB-TServer service. 

The cluster is also running with a replication factor of 3. This means that for each tablet leader, there are two tablet followers.

Run the following cell to confirm this configuration:

In [None]:
%%bash -s "$NB_YB_PATH_BIN"  # Status of cluster with yb-ctl.
YB_PATH=${1}
cd $YB_PATH

./yb-ctl status | grep -A 1 -B 1  "Replication Factor"

Your cluster is also running with a global flag that automatically sets the number of tablets per node per table to `1`.

In [None]:
%%bash -s "$NB_HOST_IPv4_01" 
HOST_IPv4=$( echo "${1}" | tr -d " ")
MY_URL="http://${HOST_IPv4}:7000/varz"

curl -s  ${MY_URL} | html2text | grep  yb_num_shards_per_tserver 

This means that the table, `tbl_employees`, automatically has 3 tablet leaders (one for each cluster node) and 6 tablet followers (2 for each leader). 
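
The arithmetic behind these counts can be sketched as follows. `tablet_counts` is a hypothetical helper written for this notebook, not a YugabyteDB API.

```python
def tablet_counts(num_nodes: int, shards_per_tserver: int, replication_factor: int):
    """Return (leaders, followers) for one table under the given settings."""
    leaders = num_nodes * shards_per_tserver
    followers = leaders * (replication_factor - 1)
    return leaders, followers


# Three nodes, one tablet per node per table, replication factor of 3
print(tablet_counts(3, 1, 3))  # (3, 6)
```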

### DocDB, DocKey, and SubDocKey
A tablet leader persists data to disk in two forms: as a Write Ahead Log (WAL) file and then as a Sorted String Table (SST) file. The WAL file for a tablet leader replicates to the tablet followers in its tablet-peer Raft consensus group. At a configurable size, the WAL file flushes to disk and persists as an SST file. As an SST file grows, it undergoes compaction using a universal compaction strategy. By default, tablet data grows to about 100 GB in size before a tablet splits.

The SST file stores row data in a particular schema for RocksDB. This schema for YugabyteDB is known as DocDB. It consists of a DocKey and document values. A DocKey is made up of a partition key hash, the partition key values, and any clustering key values. Document values are sub-keys of the DocKey. A SubDocKey contains a column id, a hybrid logical clock time, and a write order, and it maps to the actual value. Here is an example:

```
SubDocKey(DocKey(0x0a73, [5], ["Kara Zor-El"]), [SystemColumnId(0); HT{ physical: 1672165185893549 }]) -> null
SubDocKey(DocKey(0x0a73, [5], ["Kara Zor-El"]), [ColumnId(2); HT{ physical: 1672165185893549 w: 1 }]) -> "supergirl@yb.com"
SubDocKey(DocKey(0x0a73, [5], ["Kara Zor-El"]), [ColumnId(3); HT{ physical: 1672165185893549 w: 2 }]) -> 2023
```
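
To make the structure of these entries easier to see, here is a minimal Python model of the example above. The class and function names (`DocKey`, `SubDocKey`, `row_to_entries`) are illustrative assumptions and do not reflect how DocDB is implemented internally.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class DocKey:
    partition_hash: int                 # 16-bit partition key hash
    partition_values: Tuple[Any, ...]   # partition key column values
    clustering_values: Tuple[Any, ...]  # clustering key column values


@dataclass(frozen=True)
class SubDocKey:
    doc_key: DocKey
    column_id: int      # 0 stands in for SystemColumnId(0)
    physical_time: int  # physical part of the hybrid time
    write_id: int       # order of the write within the statement


def row_to_entries(doc_key: DocKey, physical_time: int,
                   columns: Dict[int, Any]) -> List[Tuple[SubDocKey, Optional[Any]]]:
    """Expand one row into a system-column entry plus one entry per column."""
    entries = [(SubDocKey(doc_key, 0, physical_time, 0), None)]
    for write_id, (column_id, value) in enumerate(columns.items(), start=1):
        entries.append((SubDocKey(doc_key, column_id, physical_time, write_id), value))
    return entries


kara = DocKey(0x0A73, (5,), ("Kara Zor-El",))
entries = row_to_entries(kara, 1672165185893549, {2: "supergirl@yb.com", 3: 2023})
for key, value in entries:
    print(key.column_id, key.write_id, value)
```

Every entry repeats the same `DocKey`, which is why all the values for one row sit next to each other in the sorted SST file.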

In this exercise, you will decode an SST file for a tablet leader for `tbl_employees`.

#### Select a YB-TServer host
<a id="select-a-yb-tserver-host-1"> </a>
Set the host variable for one of the nodes. All three nodes in the cluster are running a Tablet Server (YB-TServer). You can comment/uncomment lines 7-9 as needed.

In [None]:
%%bash -s "$NB_HOST_IPv4_01" "$NB_HOST_IPv4_02" "$NB_HOST_IPv4_03" --out NB_HOST_IPv4
HOST_IPv4_01=$( echo "${1}" | tr -d " ")
HOST_IPv4_02=$( echo "${2}" | tr -d " ")
HOST_IPv4_03=$( echo "${3}" | tr -d " ")

# Change the host for a different tablet leader: comment out the active line and uncomment another line
HOST_IPv4=$HOST_IPv4_01
#HOST_IPv4=$HOST_IPv4_02
#HOST_IPv4=$HOST_IPv4_03

echo ${HOST_IPv4}

Store the selected host variable.

In [None]:
%store NB_HOST_IPv4
print(NB_HOST_IPv4)

Save the table name as a variable.

In [None]:
NB_OBJECT_NAME="tbl_employees"
%store NB_OBJECT_NAME
print(NB_OBJECT_NAME)

Get the table_id for the table using `curl` and `jq`.

In [None]:
%%bash -s "$NB_OBJECT_NAME" "$NB_HOST_IPv4"  "$NB_DB_NAME"  "$NB_TSERVER_WEBSERVER_PORT"  --out NB_TABLE_ID
OBJECT_NAME=$( echo "${1}" | tr -d " ")
HOST_IPv4=$( echo "${2}" | tr -d " ")
DB_NAME=$( echo "${3}" | tr -d " ")
TSERVER_WEBSERVER_PORT=$( echo "${4}" | tr -d " ")

MY_URL="http://${HOST_IPv4}:${TSERVER_WEBSERVER_PORT}/metrics"

TABLE_ID=`curl -s --compressed ${MY_URL} | jq -r 'limit(1;  .[] | select(.attributes.namespace_name=="'${DB_NAME}'" and .type=="tablet" and .attributes.table_name=="'${OBJECT_NAME}'") |  .attributes.table_id) '`

echo ${TABLE_ID}

Store the table_id for the table.

In [None]:
%store NB_TABLE_ID
print(NB_TABLE_ID)

Get the tablet_id for the tablet leader for the selected node host.

In [None]:
%%bash -s "$NB_OBJECT_NAME" "$NB_HOST_IPv4" --out NB_TABLET_ID
OBJECT_NAME=$( echo "${1}" | tr -d " ")
HOST_IPv4=$( echo "${2}" | tr -d " ")

MY_URL="http://${HOST_IPv4}:8200/metrics"

TABLET_ID=`curl -s --compressed ${MY_URL} | jq --raw-output ' .[] | select(.attributes.namespace_name=="ks_ybu" and .type=="tablet" and .attributes.table_name=="'$OBJECT_NAME'") | {tablet_id: .id, metrics: .metrics[] | select(.name == ("is_raft_leader") ) | select(.value == 1) } | select(.tablet_id) | {tablet_id} | .tablet_id '`

echo ${TABLET_ID}

Store the tablet_id for the tablet leader.

In [None]:
%store NB_TABLET_ID
print(NB_TABLET_ID)
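
The `jq` filter in the preceding tablet_id cell can be hard to read. The following Python sketch mirrors its logic against a small, made-up sample. The JSON shape is inferred from the `jq` expression itself, and the ids in `sample_metrics` are hypothetical, not real tablet ids.

```python
def leader_tablet_ids(metrics, namespace, table):
    """Return ids of tablets for the table whose is_raft_leader metric equals 1."""
    ids = []
    for entity in metrics:
        attrs = entity.get("attributes", {})
        if entity.get("type") != "tablet":
            continue
        if attrs.get("namespace_name") != namespace or attrs.get("table_name") != table:
            continue
        for metric in entity.get("metrics", []):
            if metric.get("name") == "is_raft_leader" and metric.get("value") == 1:
                ids.append(entity["id"])
    return ids


# Hypothetical sample that imitates the structure the jq filter expects
sample_metrics = [
    {"type": "tablet", "id": "tablet-a",
     "attributes": {"namespace_name": "ks_ybu", "table_name": "tbl_employees"},
     "metrics": [{"name": "is_raft_leader", "value": 1}]},
    {"type": "tablet", "id": "tablet-b",
     "attributes": {"namespace_name": "ks_ybu", "table_name": "tbl_employees"},
     "metrics": [{"name": "is_raft_leader", "value": 0}]},
    {"type": "server", "id": "server-1", "attributes": {}, "metrics": []},
]

print(leader_tablet_ids(sample_metrics, "ks_ybu", "tbl_employees"))  # ['tablet-a']
```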

Flush the WAL file to an SST file for the given table_id.

In [None]:
%%bash -s "$NB_YB_PATH_BIN" "$NB_HOST_IPv4" "$NB_TABLE_ID"  # Import file path of Yugabyte and DB name
YB_PATH=$( echo "${1}" | tr -d " ")
HOST_IPv4=$( echo "${2}" | tr -d " ")
TABLE_ID=$( echo "${3}" | tr -d " ")
cd $YB_PATH

./yb-admin -init_master_addrs ${HOST_IPv4}:7100 flush_table_by_id ${TABLE_ID} 600

Dump and decode the SST file in human-readable form.

> Note:
>
> If the following does not dump the SST file, it is most likely that no rows were written to this tablet. To resolve this issue, you need to select a different Tablet Server host. 
> 
> Return to [Select a YB-TServer host](#select-a-yb-tserver-host-1) and select a different node host by commenting out line 7 (add a `#` sign) and uncommenting line 8 or line 9 (remove the `#` sign).

In [None]:
%%bash -s "$NB_YB_PATH_BIN" "$NB_YB_PATH_DATA" "$NB_TABLE_ID" "$NB_TABLET_ID"  # Import file path of Yugabyte and DB name
YB_PATH=$( echo "${1}" | tr -d " ")
YB_PATH_DATA=$( echo "${2}" | tr -d " ")
TABLE_ID=$( echo "${3}" | tr -d " ")
TABLET_ID=$( echo "${4}" | tr -d " ")

cd $YB_PATH/

TABLE_ID_PATH=${YB_PATH_DATA}/node-1/disk-1/yb-data/tserver/data/rocksdb/table-${TABLE_ID}/tablet-${TABLET_ID}
#ls -l  ${TABLE_ID_PATH}

./sst_dump --command=scan --file=${TABLE_ID_PATH} --output_format=decoded_regulardb

---

## DML: Query with `select` 
Run the following cells to observe the differences between:
- query all rows of the table 
- query the table with a where-expression that contains equality operators for all the columns in the primary key
- query the table with a where-expression that is for a range for a regular, non-primary key column

Query all the rows from `tbl_employees` using a `select` statement and a wildcard.

In [None]:
%%bash -s "$NB_YB_PATH_BIN"  "$NB_DB_NAME"  # Query wildcard all rows
YB_PATH=${1}
DB_NAME=${2}
cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  select * from tbl_employees;
"

Query the table with a where-expression that contains equality operators for all the columns in the primary key.

In [None]:
%%bash -s "$NB_YB_PATH_BIN"  "$NB_DB_NAME"  # query by PK
YB_PATH=${1}
DB_NAME=${2}
cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  select * from tbl_employees
  where id = 2 and full_name='Dick Grayson';
"

Query the table with a where-expression that is for a range for a regular, non-primary key column.

In [None]:
%%bash -s "$NB_YB_PATH_BIN"  "$NB_DB_NAME"  # Query by range
YB_PATH=${1}
DB_NAME=${2}
cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  select * from tbl_employees 
  where year > 2020 and year < 2023;
"

---
## DML: Modify with  `update`
You can easily update a column value for a primary key value with an `update` statement. To begin, first confirm the content of the row with a `select` query.

In [None]:
%%bash -s "$NB_YB_PATH_BIN"  "$NB_DB_NAME"  # Query by  id
YB_PATH=${1}
DB_NAME=${2}
cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  select * from tbl_employees where id = 2;
  "

Execute the `update` statement.

In [None]:
%%bash -s "$NB_YB_PATH_BIN"  "$NB_DB_NAME"  # Update a column value for a primary key
YB_PATH=${1}
DB_NAME=${2}
cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  update tbl_employees
   set email='robin@yb.com'
     , year =2020
  where id = 2
    and full_name = 'Dick Grayson';
"

Confirm the change.

In [None]:
%%bash -s "$NB_YB_PATH_BIN"  "$NB_DB_NAME"  # Query by id
YB_PATH=${1}
DB_NAME=${2}
cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  select * from tbl_employees where id = 2;
"

---
## DML: Upsert with `update`
The following `update` statement exhibits upsert behavior when there is no row to update based on the where clause predicate. As a result, the `update`  statement  will insert a new row into the table.

In [None]:
%%bash -s "$NB_YB_PATH_BIN"  "$NB_DB_NAME"  # Upsert with update
YB_PATH=${1}
DB_NAME=${2}
cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  update tbl_employees
    set email='captainamerica@yb.com'
      , year = 2022
  where id = 12
    and full_name = 'Steven Rogers';
"

Confirm the new row.

In [None]:
%%bash -s "$NB_YB_PATH_BIN"  "$NB_DB_NAME"  # Query by id
YB_PATH=${1}
DB_NAME=${2}
cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  select * from tbl_employees 
  where id = 12;
"

---
## DML: Deletions with `delete`
There are various actions you can take using `delete` statements such as:
- single-row deletion
- multi-row deletion
- column value deletion

### Single-row deletion
You can delete a single row with a `delete` statement. You must specify equality operators for the columns of the primary key in the where clause.

In [None]:
%%bash -s "$NB_YB_PATH_BIN"  "$NB_DB_NAME"  # Delete by pk
YB_PATH=${1}
DB_NAME=${2}
cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  delete from tbl_employees 
  where id = 12
    and full_name = 'Steven Rogers';
"

Confirm the row deletion.

In [None]:
%%bash -s "$NB_YB_PATH_BIN"  "$NB_DB_NAME"  # Query for id
YB_PATH=${1}
DB_NAME=${2}
cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  select * from tbl_employees 
  where id = 12;
"

### Multi-row deletion
You can delete multiple rows that have the same partition key. To begin, first insert another row so that there are two rows with the same partition key value, `id=2`.

In [None]:
%%bash -s "$NB_YB_PATH_BIN"  "$NB_DB_NAME"  # insert if not exists
YB_PATH=${1}
DB_NAME=${2}
cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  insert into tbl_employees (id, full_name, email, year) values (2, 'Richard Grayson', 'nightwingc@yb.com', 2022) if not exists;
"

Next, verify that there are two rows.

In [None]:
%%bash -s "$NB_YB_PATH_BIN"  "$NB_DB_NAME"  # Query for id
YB_PATH=${1}
DB_NAME=${2}
cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  select * from tbl_employees 
  where id = 2;
"

Now, delete the two rows.

In [None]:
%%bash -s "$NB_YB_PATH_BIN"  "$NB_DB_NAME"  # delete multiple rows using the partition key
YB_PATH=${1}
DB_NAME=${2}
cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  delete from tbl_employees 
  where id = 2;
"

Verify the row deletion.

In [None]:
%%bash -s "$NB_YB_PATH_BIN"  "$NB_DB_NAME"  # Query for id
YB_PATH=${1}
DB_NAME=${2}
cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  select * from tbl_employees 
  where id = 2;
"
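
The multi-row deletion above can be pictured with a dictionary keyed by the primary key: a delete that names only the partition key removes every row that shares it. This Python sketch is an analogy, and `delete_partition` is a hypothetical helper.

```python
# Rows keyed by the primary key (id, full_name)
rows = {
    (2, "Dick Grayson"): {"email": "robin@yb.com", "year": 2020},
    (2, "Richard Grayson"): {"email": "nightwingc@yb.com", "year": 2022},
    (4, "Clark Kent"): {"email": "superman@yb.com", "year": 2021},
}


def delete_partition(table, partition_key):
    """Remove all rows whose partition key matches; return how many were removed."""
    doomed = [pk for pk in table if pk[0] == partition_key]
    for pk in doomed:
        del table[pk]
    return len(doomed)


removed = delete_partition(rows, 2)
print(removed)     # 2
print(list(rows))  # [(4, 'Clark Kent')]
```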

### Column value deletion
A `delete` statement must give the entire primary key if specifying non-static columns. In the following cell, `year` is a regular column.

In [None]:
%%bash -s "$NB_YB_PATH_BIN"  "$NB_DB_NAME"  # delete a column value for a primary key
YB_PATH=${1}
DB_NAME=${2}
cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  delete year from tbl_employees
  where id = 11
    and full_name = 'Michael Holt';
"

Confirm the deletion of the `year` value from the row.

In [None]:
%%bash -s "$NB_YB_PATH_BIN"  "$NB_DB_NAME"  # Query for id
YB_PATH=${1}
DB_NAME=${2}
cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  select * from tbl_employees 
  where id = 11;
"

## DDL: Advanced data types
As a semi-relational database, YCQL stores relational data using advanced data types. YCQL offers a variety of advanced data types such as collections, user-defined data types, and JSONB.

### Collections
Collections are advanced data types that describe relational data for the table entity. There are three types of collections in YCQL: 
- list
- map
- set

A list is similar to an array data structure. All elements in a list must be of the same primitive type. 

A map is a sorted collection of key-value pairs where the key and value elements each have a data type, and the values of the key element determine the sort order. 

A set is a sorted collection of elements.

In general, collections are for storing small sets of values that are not expected to grow to arbitrary size. A good example of a collection is the set of phone numbers or mail addresses for a contact. A poor example of a collection is the set of message posts for individual users in a discussion forum. A large collection may have a significant, negative impact on performance for related queries. For example, some list operations, such as inserting an element into a list at a particular index, require a read-before-write access pattern.
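
If you are more familiar with Python than with YCQL, the ordering rules for the three collection types can be approximated as follows. This is an analogy only: Python sets and dictionaries are not sorted by themselves, so the sketch sorts them explicitly to imitate the behavior described above. The sample values are illustrative.

```python
offices = ["Sunnyvale", "Boston", "Singapore", "Tokyo"]

# list: keeps the order in which the elements were written and supports an index
as_list = list(offices)

# set: unique elements, returned in sorted order
as_set = sorted(set(offices))

# map: key-value pairs, returned in the sort order of the keys
roles = {"role_security": "all", "role_admin": "revoke_only"}
as_map = dict(sorted(roles.items()))

print(as_list[0])    # Sunnyvale
print(as_set)        # ['Boston', 'Singapore', 'Sunnyvale', 'Tokyo']
print(list(as_map))  # ['role_admin', 'role_security']
```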

> 📝 Note
>
> Collections have usage restrictions. A collection cannot be part of a primary key unless frozen. A collection cannot be part of a secondary index. A collection cannot be referenced in a where expression.  Empty collections are treated as null values and collections cannot be nested.

#### list
A list is similar to an array data structure. All elements in a list must be of the same primitive type. 

Run the following cell to alter the employees table so as to add a list of authorized offices.

In [None]:
%%bash -s "$NB_YB_PATH_BIN"  "$NB_DB_NAME"  # alter the table, first drop , then add
YB_PATH=${1}
DB_NAME=${2}
cd $YB_PATH

# DB_NAME=ks_ybu, drop
./ycqlsh -r -k $DB_NAME -e "
  alter table tbl_employees 
  drop authorized_offices;
"

sleep 1;

# DB_NAME=ks_ybu, add
./ycqlsh -r -k $DB_NAME -e "
  alter table tbl_employees 
  add authorized_offices LIST<TEXT>
  ;
"

> You can ignore the initial error from the preceding cell. It is related to the drop column.

Next, update the table with the array list.

In [None]:
%%bash -s "$NB_YB_PATH_BIN"  "$NB_DB_NAME"  # update
YB_PATH=${1}
DB_NAME=${2}
cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  update tbl_employees
  set authorized_offices = ['Sunnyvale', 'Boston', 'Singapore', 'Tokyo', 'Banglore', 'Gothom']
  where id = 1
    and full_name = 'Bruce Wayne';
  ;
"

Confirm the update.

In [None]:
%%bash -s "$NB_YB_PATH_BIN"  "$NB_DB_NAME"  # Query for list and index ordinal
YB_PATH=${1}
DB_NAME=${2}
cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  select id,
    full_name, 
    authorized_offices,
    authorized_offices[0] as primary_office
  from tbl_employees
  where id = 1
    and full_name = 'Bruce Wayne';
  ;
"

> 🤔 Question:
>
> Is the array of authorized offices sorted?
>
> 🙋 Answer:
>
> No. The elements of the list remain in their original order.
>
> 🤔 Question:
>
> Does a list support an array index?
>
> 🙋 Answer:
>
> Yes, you can use an array index to query a specific element.

#### set
A set is a sorted collection of elements. 

Run the following cell to alter the employees table so as to add a sorted collection of authorized offices.

In [None]:
%%bash -s "$NB_YB_PATH_BIN"  "$NB_DB_NAME"  # alter the table, first drop , then add
YB_PATH=${1}
DB_NAME=${2}
cd $YB_PATH

# DB_NAME=ks_ybu, drop
./ycqlsh -r -k $DB_NAME -e "
  alter table tbl_employees 
  drop authorized_offices
  ;
"

sleep 1;
# DB_NAME=ks_ybu, add
./ycqlsh -r -k $DB_NAME -e "
  alter table tbl_employees 
  add authorized_offices SET<TEXT>
  ;
"

> You can ignore the initial error from the preceding cell. It is related to the drop column.

Next, update the table with the set of values.

In [None]:
%%bash -s "$NB_YB_PATH_BIN"  "$NB_DB_NAME"  # update
YB_PATH=${1}
DB_NAME=${2}
cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  update tbl_employees
  set authorized_offices = {'Sunnyvale', 'Boston', 'Singapore', 'Tokyo', 'Banglore', 'Gothom'}
  where id = 1
    and full_name = 'Bruce Wayne';
  ;
"

Confirm the update.

In [None]:
%%bash -s "$NB_YB_PATH_BIN"  "$NB_DB_NAME"  # Query for id
YB_PATH=${1}
DB_NAME=${2}
cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  select id,
   full_name, 
   authorized_offices
 -- , authorized_offices[0] as first_office
  from tbl_employees
  where id = 1
    and full_name = 'Bruce Wayne';
  ;
"

> 🤔 Question:
>
> Is the set of authorized offices sorted?
>
> 🙋 Answer:
>
> Yes. The elements of the set are now sorted.
>
> 🤔 Question:
>
> Does a set support an array index?
>
> 🙋 Answer:
>
> No. A set collection does not support an index to query a specific element.

#### map
A map is a sorted collection of key-value pairs where the key and value elements each have a data type, and the values of the key element determine the sort order. 


Run the following cell to alter the employees table so as to add a map of roles.

In [None]:
%%bash -s "$NB_YB_PATH_BIN"  "$NB_DB_NAME"  # alter the table, add a map column
YB_PATH=${1}
DB_NAME=${2}
cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  alter table tbl_employees 
  add roles MAP<TEXT,TEXT>
  ;
"

Next, update the table with the map of key-values.

In [None]:
%%bash -s "$NB_YB_PATH_BIN"  "$NB_DB_NAME"  # update the map
YB_PATH=${1}
DB_NAME=${2}
cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
update tbl_employees
  set roles = {'role_security' : 'all', 'role_admin' : 'revoke_only'}
  where id = 1
    and full_name = 'Bruce Wayne';
  ;
"

Confirm the update and query for a specific key value.

In [None]:
%%bash -s "$NB_YB_PATH_BIN"  "$NB_DB_NAME"  # Query for id
YB_PATH=${1}
DB_NAME=${2}
cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  select 
    id, 
    full_name, 
    roles,
    roles['role_admin'] as key_value
  from tbl_employees
  where id = 1
    and full_name = 'Bruce Wayne';
  ;
"

#### Frozen collection
A frozen collection cannot have its elements added, updated, or removed. Only by overwriting the collection itself, can a frozen collection be changed. A column whose type is frozen can only have its value replaced as a whole.
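
A loose analogy in Python is the difference between a list and a tuple: you cannot assign to one element of a tuple, but you can replace the whole value. The sketch below illustrates only that idea and is not how YCQL implements frozen types.

```python
# A tuple plays the role of a frozen list in this analogy
previous_employers = ("Airforce", "Army", "Marines", "Navy")

try:
    previous_employers[0] = "Justice League"  # element-level change
except TypeError as error:
    outcome = f"rejected: {error}"
else:
    outcome = "accepted"

# Replacing the value as a whole is allowed
previous_employers = ("Justice League",) + previous_employers[1:]

print(outcome)
print(previous_employers)
```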

 Data types that can be frozen include collections such as maps, sets, or lists, and user defined types. A user defined type is a data type that allows you to extend available data types into customized data types. 
 
 >📝 Note
 >
 > Columns of type frozen can be part of the primary key.


To begin, run the following cell to create a list collection for employees named `previous_employers`.

In [None]:
%%bash -s "$NB_YB_PATH_BIN"  "$NB_DB_NAME"  # frozen collection alter
YB_PATH=${1}
DB_NAME=${2}
cd $YB_PATH


# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  alter table tbl_employees 
  drop previous_employers;
"

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  alter table tbl_employees 
  add previous_employers list<TEXT>;
"

> You can ignore the initial error from the preceding cell. It is related to the drop column.

Insert a row into the table.

In [None]:
%%bash -s "$NB_YB_PATH_BIN"  "$NB_DB_NAME"  # frozen collection insert
YB_PATH=${1}
DB_NAME=${2}
cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  insert into tbl_employees (id, full_name, email, year, previous_employers) values (100, 'Barry Allen', 'theflash@yb.com', 2023, ['Airforce','Army','Marines','Navy']);
"

Update the first element in the list.

In [None]:
%%bash -s "$NB_YB_PATH_BIN"  "$NB_DB_NAME"  # frozen collection update
YB_PATH=${1}
DB_NAME=${2}
cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  update tbl_employees
  set previous_employers[0] = 'Justice League'
  where id = 100
    and full_name = 'Barry Allen';
"

Query to see the change from `Airforce` to `Justice League`.

In [None]:
%%bash -s "$NB_YB_PATH_BIN"  "$NB_DB_NAME"  # frozen collection update
YB_PATH=${1}
DB_NAME=${2}
cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  select id, full_name, email, year,  previous_employers
  from tbl_employees
  where id = 100
    and full_name = 'Barry Allen';
"

The update to the list collection was successful. 

To change this behavior, you can alter the DDL for the table to use a frozen collection.

In [None]:
%%bash -s "$NB_YB_PATH_BIN"  "$NB_DB_NAME"  # frozen collection alter
YB_PATH=${1}
DB_NAME=${2}
cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  alter table tbl_employees 
  drop previous_employers;
  "
  
sleep 1;

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  alter table tbl_employees 
  add previous_employers frozen<list<TEXT>>;
"

Insert a row into the table.

In [None]:
%%bash -s "$NB_YB_PATH_BIN"  "$NB_DB_NAME"  # frozen collection insert
YB_PATH=${1}
DB_NAME=${2}
cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  insert into tbl_employees (id, full_name, email, year, previous_employers) values (101, 'Barry Allen', 'theflash@yb.com', 2023, ['Airforce','Army','Marines','Navy']);
"

Attempt to modify the first element in the frozen list. This results in an exception in the output cell.

In [None]:
%%bash -s "$NB_YB_PATH_BIN"  "$NB_DB_NAME"  # frozen collection update, throw exception
YB_PATH=${1}
DB_NAME=${2}
cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  update tbl_employees
  set previous_employers[0] = 'Justice League'
  where id = 101
    and full_name = 'Barry Allen';
"

```
<stdin>:1:SyntaxException: Invalid CQL Statement. Columns with elementary types cannot take arguments
update tbl_employees
  set previous_employers[0] = 'Justice League'
   ^^^^^^^^^^^^^^^^^^^
  where id = 101
    and full_name = 'Barry Allen';
 (ql error -12)
```

This is the exception thrown by the attempted update of a single value in a frozen collection. YCQL treats a frozen collection as one value, so you cannot update an individual element.
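If you still need to change an element of a frozen list, one option is to overwrite the whole list in a single `update`. The following is a minimal sketch under that assumption. The `build_replace_list_cql` helper is hypothetical (it is not part of `ycqlsh`), it does not escape single quotes inside values, and it only runs the statement when `./ycqlsh` exists in the current directory.

```shell
# Sketch: replace a frozen list as a whole instead of updating one element.
# build_replace_list_cql is a hypothetical helper, not part of ycqlsh.
build_replace_list_cql() {
  local id="$1" full_name="$2"
  shift 2
  local items="" item
  for item in "$@"; do
    # Quote each element as a CQL text literal and join the elements with commas.
    items="${items:+$items,}'${item}'"
  done
  printf "update tbl_employees set previous_employers = [%s] where id = %s and full_name = '%s';" \
    "$items" "$id" "$full_name"
}

stmt="$(build_replace_list_cql 101 'Barry Allen' 'Justice League' 'Army' 'Marines' 'Navy')"
echo "$stmt"

# Run the statement only when ycqlsh is in the current directory, as in the cells above.
if [ -x ./ycqlsh ]; then
  ./ycqlsh -r -k "${DB_NAME:-ks_ybu}" -e "$stmt"
fi
```

The statement assigns a complete list literal to `previous_employers`, so it never addresses the list by index.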

---
## Time to Live (TTL)
Time to live, or TTL, is a data expiration property used to remove data that is time sensitive, private, or deprecated. An example use case for TTL is an authorization application. After a new user is created, the application can query the user data for only 1 hour. After 3,600 seconds, the data is no longer available for the application to read.

You can set the TTL property as a:
- table property
- column property
- row property
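
As a quick orientation, the three levels map to the three statement forms used later in this section. The sketch below only prints the CQL text. The helper function names are hypothetical, and the table, columns, and values are the ones this notebook already uses.

```shell
# Sketch only: these helper functions are hypothetical and just print CQL text.

# Table level: default TTL, in seconds, for rows in the table.
ttl_table_cql() {
  printf 'alter table tbl_employees with default_time_to_live = %d;\n' "$1"
}

# Column level: TTL for the column value written by one update.
ttl_column_cql() {
  printf "update tbl_employees using ttl %d set email = '%s' where id = %d and full_name = '%s';\n" \
    "$1" "$2" "$3" "$4"
}

# Row level: TTL for one inserted row.
ttl_row_cql() {
  printf "insert into tbl_employees (id, full_name, email, year) values (%d, '%s', '%s', %d) using ttl %d;\n" \
    "$2" "$3" "$4" "$5" "$1"
}

ttl_table_cql 15
ttl_column_cql 5 'realflash@yb.com' 100 'Jay Garrick'
ttl_row_cql 15 100 'Wally West' 'theflash@yb.com' 2023
```

In each helper, the first argument is the TTL in seconds.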


> 📝 Note
> 
> A table with transactions enabled does not support TTL.

### DDL: TTL as a table property
You can define a default time to live for all rows in a table with the `default_time_to_live` table property. The value is in seconds, and rows expire automatically once that time has passed. The default value is `0`, which means rows do not expire.

First, review the DDL of the employees table.

In [None]:
%%bash -s "$NB_YB_PATH_BIN"  "$NB_DB_NAME"  # describe table
YB_PATH=${1}
DB_NAME=${2}
cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  desc tbl_employees;
"

Alter the table to modify the value of the `default_time_to_live` property to `15` seconds.

> ⚠️ Warning 
>
> This DDL change affects all existing rows in the table. After execution of the following statement, all rows in the employees table will expire in 15 seconds.

In [None]:
%%bash -s "$NB_YB_PATH_BIN"  "$NB_DB_NAME"  # alter default_time_to_live table property
YB_PATH=${1}
DB_NAME=${2}
cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  alter table tbl_employees 
  with default_time_to_live = 15;
"

Confirm the DDL change.

In [None]:
%%bash -s "$NB_YB_PATH_BIN"  "$NB_DB_NAME"  # describe table
YB_PATH=${1}
DB_NAME=${2}
cd $YB_PATH

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  desc tbl_employees;
"

Now that you've confirmed the property update, run the following to insert a new row, sleep, query for the new row value, sleep, and query for the row value again.

In [None]:
%%bash -s "$NB_YB_PATH_BIN"  "$NB_DB_NAME"   # ttl, insert, select
YB_PATH=${1}
DB_NAME=${2}  

cd $YB_PATH

# DB_NAME=ks_ybu, insert
./ycqlsh -r -k $DB_NAME  -e "
  insert into tbl_employees (id, full_name, email, year) values (100, 'Barry Allen', 'theflash@yb.com', 2023);
  "

sleep 1;

# DB_NAME=ks_ybu, select
./ycqlsh -r -k $DB_NAME  -e "
  select id, full_name, email, year 
  from tbl_employees 
  where id = 100 
    and full_name = 'Barry Allen';
  "

sleep 14;

# DB_NAME=ks_ybu, select
./ycqlsh -r -k $DB_NAME  -e "
  select id, full_name, email, year 
  from tbl_employees 
  where id = 100 
    and full_name = 'Barry Allen';
  "


The above output cell shows `1 rows` from the first `select` statement, and `0 rows` from the second `select` statement. 
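
The timing in the cell above can be reasoned about with simple arithmetic. The sketch below is a simplified model, not how YugabyteDB implements expiration: it assumes data expires once the elapsed time since the write reaches the TTL, that a TTL of `0` means no expiration, and it ignores the time `ycqlsh` takes to start. The `is_expired` function is a hypothetical helper.

```shell
# Simplified model (assumption): data is expired when elapsed >= ttl.
# Usage: is_expired WRITTEN_AT TTL NOW   (all values in seconds)
# Exit status 0 means expired, non-zero means still readable.
is_expired() {
  local written_at="$1" ttl="$2" now="$3"
  # A TTL of 0 is treated as "never expires".
  [ "$ttl" -gt 0 ] && [ $(( now - written_at )) -ge "$ttl" ]
}

# The row is written at t=0 with a table-level TTL of 15 seconds.
# The two select statements run roughly 1 and 15 seconds later.
for t in 1 15; do
  if is_expired 0 15 "$t"; then
    echo "t=${t}s: 0 rows"
  else
    echo "t=${t}s: 1 rows"
  fi
done
```

Under these assumptions, the model agrees with the output cell: the row is readable after 1 second and gone after 15 seconds.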

### DML: `update` using TTL for a column property
You can set the value of a non-primary key column in a row to expire after a set amount of time. To do so, add the `using TTL` clause before the `set` clause of an `update` statement, and specify the TTL in seconds as an integer value. Once the time expires, the column value for the row becomes `null`.

The following example sets the life of the `email` column value to 5 seconds.


In [None]:
%%bash -s "$NB_YB_PATH_BIN"  "$NB_DB_NAME"   # insert, sleep, query by pk
YB_PATH=${1}
DB_NAME=${2}  

cd $YB_PATH

# DB_NAME=ks_ybu, insert
./ycqlsh -r -k $DB_NAME  -e "
  insert into tbl_employees (id, full_name, email, year) values (100, 'Jay Garrick', 'theflash@yb.com', 2023);
"

sleep 1;

# update
./ycqlsh -r -k $DB_NAME  -e "
  update tbl_employees
  using ttl 5 
    set email = 'realflash@yb.com'
  where id = 100
    and full_name = 'Jay Garrick';
"

sleep 1;

# select
./ycqlsh -r -k $DB_NAME  -e "
  select id, full_name, email, year 
  from tbl_employees 
  where id = 100
    and full_name = 'Jay Garrick';
"

sleep 5;

# select
./ycqlsh -r -k $DB_NAME  -e "
  select id, full_name, email, year 
  from tbl_employees 
  where id = 100
    and full_name = 'Jay Garrick';
"

sleep 8;

# select

./ycqlsh -r -k $DB_NAME  -e "
  select id, full_name, email, year 
  from tbl_employees 
  where id = 100
    and full_name = 'Jay Garrick';
"

The above output cell shows `1 rows` from the first `select` statement, with the updated `email` value. The second `select` statement also shows `1 rows`, but the value for the `email` column is now `null`.

As expected, since the TTL property for the table is set to `15` seconds, the last `select` statement shows `0 rows` for the query.
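
To watch a column value count down instead of inferring it from `sleep` timings, you can query the remaining time to live. The sketch below assumes the YCQL `ttl()` built-in function is available in your YugabyteDB version. The `build_ttl_query` helper is hypothetical, and the statement only runs when `./ycqlsh` exists in the current directory.

```shell
# Sketch: build a query that returns a column value and its remaining TTL.
# Assumption: the YCQL ttl() built-in function is available.
# build_ttl_query is a hypothetical helper, not part of ycqlsh.
build_ttl_query() {
  local column="$1" id="$2" full_name="$3"
  printf "select %s, ttl(%s) from tbl_employees where id = %s and full_name = '%s';" \
    "$column" "$column" "$id" "$full_name"
}

ttl_stmt="$(build_ttl_query email 100 'Jay Garrick')"
echo "$ttl_stmt"

# Run the query only when ycqlsh is in the current directory, as in the cells above.
if [ -x ./ycqlsh ]; then
  ./ycqlsh -r -k "${DB_NAME:-ks_ybu}" -e "$ttl_stmt"
fi
```

Running the query right after the `update ... using ttl 5` statement should show how many seconds remain before `email` expires.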

### DML: `insert` using TTL for a row property
You can insert a row that exists in the table for only a given amount of time. The `using TTL` clause of an `insert` statement sets the number of seconds that the row exists in the table. When the time expires, the row is deleted from the table.

The following example sets the life of the row to 15 seconds.

In [None]:
%%bash -s "$NB_YB_PATH_BIN"  "$NB_DB_NAME"   # insert ttl, sleep, query by pk
YB_PATH=${1}
DB_NAME=${2}  

cd $YB_PATH

./ycqlsh -r -k $DB_NAME  -e "
  insert into tbl_employees (id, full_name, email, year) values (100, 'Wally West', 'theflash@yb.com', 2023) using TTL 15;
"

sleep 1;

./ycqlsh -r -k $DB_NAME  -e "
  select id, full_name, email, year 
  from tbl_employees 
  where id = 100 
    and full_name = 'Wally West';
"

sleep 9;

./ycqlsh -r -k $DB_NAME  -e "
  select id, full_name, email, year 
  from tbl_employees 
  where id = 100 
    and full_name = 'Wally West';
"

sleep 5;

./ycqlsh -r -k $DB_NAME  -e "
  select id, full_name, email, year 
  from tbl_employees 
  where id = 100 
    and full_name = 'Wally West';
"


The above output cell shows `1 rows` from the first and second `select` statements, and `0 rows` from the last `select` statement.

---
# 🌟🌟 Good job! 

In this notebook, you completed the following:

- Language fundamentals
  - Requirements
  - DDL commands for YCQL
  - DML: Write with `insert`
  - Partition key, partition key hash, and the `partition_hash()` built-in function
  - DML: Upsert with `insert`
  - Distributed Document Store (DocDB)
  - DML: Query with `select` 
  - DML: Upsert with `insert`
  - DML: Modify with `update`
  - DML: Upsert with `update`
  - DML: Deletions with `delete`
  - DDL: Advanced data types
  - Time to Live (TTL)


## 😊 Next up!
Continue your learning by opening the next notebook, `03_QDDM_query_plans.ipynb`. 

You can either open the file from the Explorer or simply run the following cell:

In [None]:
%%bash
gp open '03_QDDM_query_plans.ipynb'