<div style="width:100%; background-color: #000041"><a target="_blank" href="http://university.yugabyte.com"><img src="assets/YBU_Logo.webp" /></a></div>

# Lab Requirements and Setup

This lab consists of several Jupyter notebooks that run in [Gitpod](https://www.gitpod.io/), which provides runtime environments in which you can deploy a YugabyteDB database cluster. Read the following instructions for the requirements and setup of the Gitpod environment.

## About Jupyter notebooks
In this lab, you use Jupyter notebooks to set environment variables and to run YCQL statements. YCQL (Yugabyte Cloud Query Language) is YugabyteDB's Cassandra-compatible API, based on the Cassandra Query Language (CQL).

There are two types of cells: markdown and code. This is a markdown cell.

To run a code cell, select the play icon in the cell's left gutter. You can also edit a code cell before running it. Certain labs contain challenges or experiments that require you to do just that: modify a code cell and re-run it!

### Requirements
Here are the requirements for this lab:
- Launch the lab in a Gitpod workspace
- Run a three-node YugabyteDB cluster using `yb-ctl`

> Note
>  
> Although the three-node cluster is up and running, Gitpod does not support browsing web UIs on the other nodes' loopback addresses, even when they are exposed on a different port.
> Only the web user interfaces on 127.0.0.1 are reachable. To see all available ports in Gitpod, run `gp ports list` in the terminal.

#### Notebook keyboard shortcuts
The Jupyter extension for Gitpod supports the following keyboard shortcuts:

| Keystroke | Description |
|--|--|
| ESC | Change the cell mode |
| A | Add a cell above |
| B | Add a cell below |
| J or down arrow key | Select the cell below |
| K or up arrow key | Select the cell above |
| Ctrl+Enter | Run the currently selected cell |
| Shift+Enter | Run the currently selected cell and insert a new cell immediately below (focus moves to the new cell) |
| Alt+Enter | Run the currently selected cell and insert a new cell immediately below (focus remains on the current cell) |
| dd | Delete the selected cell |
| z | Undo the last change |
| M | Switch the cell type to Markdown |
| Y | Switch the cell type to code |
| L | Enable or disable line numbers |


## Setup steps
Here are the steps to set up this lab:
- Install missing dependencies and restart the notebook
- Create the notebook variables
- Create the `ks_ybu` keyspace

### Install missing dependencies and restart the notebook
Run the following cell to ensure that the notebook dependencies are available to the notebook. 

### Create the notebook variables 

> IMPORTANT!
> 
> Do NOT skip running this cell. 
> 

The following Python cell creates and stores variables that all the notebooks in this lab will use. You can view these variables in the Jupyter tab.

- To run the script, select Execute Cell (Play Arrow) in the left gutter of the cell.
- Verify the accuracy of the output values

In [None]:
# Env variables for Notebook
import os

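# Read KEY=VALUE pairs from env_vars.env into the process environment
env_vars = !cat env_vars.env
for var in env_vars:
    var = var.strip()
    # Skip blank lines and comments
    if not var or var.startswith('#'):
        continue
    # Split on the first '=' only, so values may themselves contain '='
    key, value = var.split('=', 1)
    os.environ[key] = value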

# Comment out Local 
#MY_YB_PATH=os.environ.get('MY_YB_PATH_LOCAL')
#MY_YB_PATH_DATA=os.environ.get('MY_YB_PATH_DATA_LOCAL')
#MY_GITPOD_WORKSPACE_URL=os.environ.get('MY_GITPOD_WORKSPACE_URL_LOCAL')


# Gitpod specific
MY_YB_PATH=os.environ.get('MY_YB_PATH')
MY_YB_PATH_DATA=os.environ.get('MY_YB_PATH_DATA')
MY_GITPOD_WORKSPACE_URL=os.environ.get('GITPOD_WORKSPACE_URL')


# env_vars defines the following
MY_DB_NAME=os.environ.get('MY_DB_NAME')
MY_HOST_IPv4_01=os.environ.get('MY_HOST_IPv4_01')
MY_HOST_IPv4_02=os.environ.get('MY_HOST_IPv4_02')
MY_HOST_IPv4_03=os.environ.get('MY_HOST_IPv4_03')
MY_TSERVER_WEBSERVER_PORT=os.environ.get('MY_TSERVER_WEBSERVER_PORT')

# Gitpod URLS
MY_YB_MASTER_HOST_GITPOD_URL = MY_GITPOD_WORKSPACE_URL.replace('https://','https://7000-')
MY_YB_TSERVER_HOST_GITPOD_URL = MY_GITPOD_WORKSPACE_URL.replace('https://','https://'+MY_TSERVER_WEBSERVER_PORT+'-')

# Current directory of project and related child folders
MY_NOTEBOOK_DIR=os.getcwd()
MY_NOTEBOOK_DATA_FOLDER=MY_NOTEBOOK_DIR +'/data'

# Data files
MY_DATA_DDL_FILE=os.environ.get("MY_DATA_DDL_FILE")
MY_DATA_DML_FILE=os.environ.get("MY_DATA_DML_FILE")

# Store the notebook values for other notebooks to use

%store MY_DB_NAME
%store MY_YB_PATH
%store MY_YB_PATH_DATA
%store MY_GITPOD_WORKSPACE_URL
%store MY_HOST_IPv4_01
%store MY_HOST_IPv4_02
%store MY_HOST_IPv4_03
%store MY_NOTEBOOK_DIR
%store MY_TSERVER_WEBSERVER_PORT
%store MY_NOTEBOOK_DATA_FOLDER
%store MY_YB_MASTER_HOST_GITPOD_URL
%store MY_YB_TSERVER_HOST_GITPOD_URL

%store MY_DATA_DDL_FILE
%store MY_DATA_DML_FILE
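
The cell above splits each line of `env_vars.env` on `=` and builds the Gitpod URLs by prefixing the workspace host with a port number. The following is a minimal, standalone sketch of both steps in plain Python, assuming the file contains simple `KEY=VALUE` lines. `parse_env_lines`, `gitpod_port_url`, the sample lines, and the example workspace URL are hypothetical and only illustrate the logic.

```python
def parse_env_lines(lines):
    """Parse KEY=VALUE lines into a dict, skipping blank lines and # comments."""
    env = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # partition() splits on the first '=' only, so values may contain '='
        key, sep, value = line.partition("=")
        if sep:
            env[key.strip()] = value.strip()
    return env


def gitpod_port_url(workspace_url, port):
    """Return the Gitpod URL for a port by prefixing the host with '<port>-'."""
    return workspace_url.replace("https://", f"https://{port}-", 1)


# Hypothetical sample input; the real values come from env_vars.env
sample_lines = ["# lab settings", "MY_DB_NAME=ks_ybu", "", "MY_TSERVER_WEBSERVER_PORT=9000"]
env = parse_env_lines(sample_lines)
print(env)
print(gitpod_port_url("https://example.ws-us.gitpod.io", env["MY_TSERVER_WEBSERVER_PORT"]))
```

Using `partition("=")` rather than `split("=")` avoids a `ValueError` when a value itself contains `=`.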



# YCQL Shell Commands
The following cells run YCQL shell commands. With `ycqlsh`, the YCQL shell, you can execute YCQL statements against any of the cluster's keyspaces.

By default, YugabyteDB and the YCQL API have a consistency level of QUORUM. 
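
QUORUM means that a majority of a tablet's replicas must acknowledge an operation. The following sketch computes that majority for a given replication factor; the function names are illustrative and not part of any YugabyteDB API.

```python
def quorum_size(replication_factor):
    """Number of replicas that form a majority: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor):
    """Replicas that can be unavailable while a majority still remains."""
    return replication_factor - quorum_size(replication_factor)


for rf in (1, 3, 5):
    print(f"RF={rf}: quorum={quorum_size(rf)}, tolerates {tolerated_failures(rf)} failure(s)")
```

With the replication factor of 3 used in this lab, 2 of 3 replicas must agree, so each tablet stays available if one node is lost.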

In [None]:
%%bash -s "$MY_YB_PATH"   # Shell Commands
YB_PATH=${1}
cd $YB_PATH

./bin/ycqlsh --execute "CONSISTENCY"

YCQL is compatible with CQL 3.4.2.

In [None]:
%%bash -s "$MY_YB_PATH"   # Shell Commands
YB_PATH=${1}
cd $YB_PATH

./bin/ycqlsh --execute "show version"  

In [None]:
%%bash -s "$MY_YB_PATH"   # Shell Commands
YB_PATH=${1}
cd $YB_PATH

./bin/ycqlsh --execute "select cql_version from system.local;"  

`show host` displays the host and port that `ycqlsh` is connected to.

In [None]:
%%bash -s "$MY_YB_PATH"   # Shell Commands
YB_PATH=${1}
cd $YB_PATH

./bin/ycqlsh --execute "show host"  

`describe keyspaces` returns the names of all keyspaces.

In [None]:
%%bash -s "$MY_YB_PATH"   # Shell Commands
YB_PATH=${1}
cd $YB_PATH

./bin/ycqlsh --execute "describe keyspaces"
# Optional: list the other DESCRIBE variants
# ./bin/ycqlsh --execute "help describe"


`describe tables` returns the names of all tables in the current keyspace or, if no keyspace is selected, in all keyspaces.

In [None]:
%%bash -s "$MY_YB_PATH"   # Shell Commands
YB_PATH=${1}
cd $YB_PATH

./bin/ycqlsh --execute "desc tables"

The following cell queries the `system_auth.roles` table. `cassandra` is the default superuser role.

In [None]:
%%bash -s "$MY_YB_PATH"   # Shell Commands
YB_PATH=${1}
cd $YB_PATH

./bin/ycqlsh --execute "select * from system_auth.roles;"  

# DDL commands for YCQL

## Create the `ks_ybu` keyspace
Run the following cells to connect to the YugabyteDB cluster using `ycqlsh`. Then, complete the following tasks:
- Create the `ks_ybu` keyspace 
- Create the `tbl_employees` table
- Describe the `tbl_employees` table

In [None]:
%%bash -s "$MY_YB_PATH"   # Create the keyspace, ks_ybu, and table, employees
YB_PATH=${1}
cd $YB_PATH


./bin/ycqlsh --execute "
  create keyspace if not exists ks_ybu;
  "

Confirm that the keyspace was created.

In [None]:
%%bash -s "$MY_YB_PATH"   # Describe the keyspace, ks_ybu
YB_PATH=${1}
cd $YB_PATH


./bin/ycqlsh --execute "describe keyspace ks_ybu;"

## Create the `tbl_employees` table

The following cell drops the table if it already exists, so that you start with an empty table.

In [None]:
%%bash -s "$MY_YB_PATH"   # Drop the table, tbl_employees, if it exists
YB_PATH=${1}
cd $YB_PATH


./bin/ycqlsh --execute "
  drop table if exists ks_ybu.tbl_employees;
  "

Create the table, `tbl_employees`. 

> Note: A primary key is required for every user table in a keyspace.

In [None]:
%%bash -s "$MY_YB_PATH"   # Create the keyspace, ks_ybu, and table, employees
YB_PATH=${1}
cd $YB_PATH

./bin/ycqlsh --execute "
  create table ks_ybu.tbl_employees ( 
    id int, 
    full_name text, 
    email text,
    year int static,
    primary key (id, full_name) 
    );
  "

Describe `tbl_employees`. 

> Question:
>   
> What is different about the table description and the create table statement?

In [None]:
%%bash -s "$MY_YB_PATH"   # Describe the table, tbl_employees
YB_PATH=${1}
cd $YB_PATH


./bin/ycqlsh --execute "desc ks_ybu.tbl_employees "


> Answer:
>   
> The description makes the key structure explicit: the primary key consists of two key columns, `id` and `full_name`. `id` is the partition key for the table, and `full_name` is the clustering key column. The clustering key column is sorted in ascending order.
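
To make the answer concrete, the following toy model takes the same four rows that this lab inserts next (in a different order), groups them by the partition key `id`, orders each partition by the clustering column `full_name`, and keeps a single value per partition for the static column `year`. It is a plain-Python sketch of CQL row semantics under these assumptions, not how YugabyteDB stores data, and the function names are hypothetical. Ordering relies on Python string comparison, which matches byte order for ASCII names.

```python
from collections import defaultdict


def build_partitions(rows):
    """Group (id, full_name, email, year) rows by the partition key `id`.

    Regular values are keyed by the clustering column `full_name`.
    The static column `year` holds one value per partition, so the
    last write to a partition wins for every row in it.
    """
    partitions = defaultdict(lambda: {"year": None, "emails": {}})
    for id_, full_name, email, year in rows:
        partition = partitions[id_]
        partition["emails"][full_name] = email
        partition["year"] = year
    return partitions


def read_partition(partitions, id_):
    """Return a partition's rows in ascending clustering order, repeating the static value."""
    partition = partitions[id_]
    return [(id_, name, partition["emails"][name], partition["year"])
            for name in sorted(partition["emails"])]


rows = [
    (1, "Dick Grayson", "robin@yb.com", 2020),
    (1, "Bruce Wayne", "batman@yb.com", 2020),
    (2, "Clark Kent", "superman@yb.com", 2021),
    (3, "Peter Parker", "spiderman@yb.com", 2022),
]
partitions = build_partitions(rows)
print(read_partition(partitions, 1))
```

Although "Dick Grayson" is written first, the partition with `id = 1` reads back in ascending `full_name` order, and both rows report the partition's single `year` value.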

#### Write to `tbl_employees` with `insert`

Insert rows into the `tbl_employees` table.

In [None]:
%%bash -s "$MY_YB_PATH"   # Populate the employees table
YB_PATH=${1}
cd $YB_PATH

./bin/ycqlsh --execute "
  insert into ks_ybu.tbl_employees 
    (id, full_name, email, year) 
    values 
    (1, 'Bruce Wayne', 'batman@yb.com', 2020);
 "

./bin/ycqlsh --execute "
  insert into ks_ybu.tbl_employees 
    (id, full_name, email, year) 
    values 
    (1, 'Dick Grayson', 'robin@yb.com', 2020);
 "

./bin/ycqlsh --execute "
  insert into ks_ybu.tbl_employees 
     (id, full_name, email, year) 
    values 
    (2, 'Clark Kent', 'superman@yb.com', 2021);
"

./bin/ycqlsh --execute "
  insert into ks_ybu.tbl_employees 
     (id, full_name, email, year) 
    values 
    (3, 'Peter Parker', 'spiderman@yb.com', 2022);
"

In [None]:
%%bash -s "$MY_YB_PATH"   # Populate the employees table
YB_PATH=${1}
cd $YB_PATH

./bin/ycqlsh --execute "select * from ks_ybu.tbl_employees"


---

# Distributed Document Storage for YugabyteDB

Now that you have created a table in YCQL, it's a good time to look at how YugabyteDB stores it in DocDB, its distributed document store.


The partition key of a table determines how YugabyteDB's consistent hash sharding distributes rows into tablets. The replication factor (RF) of the cluster determines how many copies of each tablet exist: one tablet leader and RF - 1 tablet followers. Your YugabyteDB cluster runs with a replication factor of 3. It also runs with a global flag that sets the number of tablets per table per node to 1. As a result, the table `tbl_employees` has 3 tablets, each with 1 tablet leader and 2 tablet followers: 3 tablet leaders (one on each cluster node) and 6 tablet followers in total. Each tablet stores its data in its own instance of a customized RocksDB.
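
The following sketch shows how a 16-bit partition hash selects a tablet and how many leaders and followers a table gets. It assumes the hash space 0x0000 through 0xFFFF is divided into equal, contiguous ranges, one per tablet; treat the exact boundaries as an illustration rather than the values YugabyteDB assigns. The hash `0xc0c4` comes from the DocDB example in this section, and the function names are hypothetical.

```python
HASH_SPACE = 0x10000  # partition hashes are 16-bit values, 0x0000 through 0xFFFF


def tablet_ranges(num_tablets):
    """Split the hash space into num_tablets contiguous [start, end) ranges."""
    step = HASH_SPACE // num_tablets
    bounds = [i * step for i in range(num_tablets)] + [HASH_SPACE]
    return list(zip(bounds[:-1], bounds[1:]))


def tablet_for_hash(partition_hash, num_tablets):
    """Return the index of the tablet whose range contains partition_hash."""
    for index, (start, end) in enumerate(tablet_ranges(num_tablets)):
        if start <= partition_hash < end:
            return index
    raise ValueError(f"hash {partition_hash:#x} is outside the 16-bit hash space")


def replica_counts(num_tablets, replication_factor):
    """Return (tablet leaders, tablet followers) for a table."""
    return num_tablets, num_tablets * (replication_factor - 1)


for start, end in tablet_ranges(3):
    print(f"[{start:#06x}, {end:#06x})")
print(tablet_for_hash(0xC0C4, 3))  # hash taken from the DocDB example in this section
print(replica_counts(3, 3))
```

Under this assumption, the row with `id = 2` (hash `0xc0c4`) falls in the third tablet, and 3 tablets at replication factor 3 yield 3 leaders and 6 followers.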

A tablet persists data to disk in two forms: first in a Write-Ahead Log (WAL) and then in Sorted String Table (SST) files. The tablet leader replicates WAL entries to the tablet followers in its Raft consensus group. Committed writes are applied to an in-memory MemTable, and when the MemTable reaches a configurable size, it flushes to disk as an SST file. As SST files accumulate, they are merged by compaction, which uses a universal compaction strategy. By default, tablet data grows to about 100 GB in size before a tablet splits.
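
The following toy model sketches that write path: every write is appended to a log, applied to a MemTable, and a full MemTable is flushed as an immutable sorted run (the role an SST file plays). `flush()` corresponds loosely to the `flush_table_by_id` command used later in this section. This is a simplified illustration with hypothetical names; real RocksDB adds block indexes, bloom filters, and a universal compaction policy that chooses which files to merge, whereas this sketch simply merges everything.

```python
class ToyTablet:
    """Toy write path: log each write, apply it to a MemTable, flush full MemTables to sorted runs."""

    def __init__(self, memtable_limit):
        self.memtable_limit = memtable_limit
        self.wal = []          # append-only log of every write, in arrival order
        self.memtable = {}     # latest value per key, kept in memory
        self.sst_files = []    # immutable sorted runs, oldest first

    def write(self, key, value):
        self.wal.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Persist the MemTable as a new sorted run and start an empty one."""
        if self.memtable:
            self.sst_files.append(sorted(self.memtable.items()))
            self.memtable = {}

    def compact(self):
        """Merge all runs into one; values from newer runs replace older ones."""
        merged = {}
        for sst in self.sst_files:
            merged.update(sst)
        self.sst_files = [sorted(merged.items())] if merged else []


tablet = ToyTablet(memtable_limit=2)
for key, value in [("k3", "c"), ("k1", "a"), ("k2", "b"), ("k1", "a2")]:
    tablet.write(key, value)
print(len(tablet.wal), len(tablet.sst_files))
tablet.compact()
print(tablet.sst_files)
```

After four writes with a limit of two entries, there are two sorted runs; compaction merges them into one run in which the newer value for `k1` wins.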

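The SST file stores row data in a YugabyteDB-specific encoding on top of RocksDB, known as DocDB. Each row is identified by a DocKey, which is made up of the partition key hash, the partition key values, and any clustering key values. Each column value is stored under a SubDocKey, which extends the DocKey with a column ID and a hybrid time (HT). The hybrid time holds a physical timestamp in microseconds and, for values written in the same batch, a write ID (`w`) that orders them. Each SubDocKey maps to the actual column value. In the following example, `SystemColumnId(0)` is the row's liveness marker, and the static column `year` (`ColumnId(3)`) is stored under a DocKey with an empty clustering key because a static value belongs to the whole partition: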

```
SubDocKey(DocKey(0xc0c4, [2], ["Clark Kent"]), [SystemColumnId(0); HT{ physical: 1671751515613266 }]) -> null
SubDocKey(DocKey(0xc0c4, [2], ["Clark Kent"]), [ColumnId(2); HT{ physical: 1671751515613266 w: 1 }]) -> "superman@yb.com"
SubDocKey(DocKey(0xc0c4, [2], []), [ColumnId(3); HT{ physical: 1671751515613266 w: 2 }]) -> 2021
```
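
To read decoded `sst_dump` output like the example above programmatically, the following sketch parses each line into its DocKey and SubDocKey parts and converts the hybrid time's physical component, which counts microseconds since the Unix epoch, to a UTC timestamp. The regular expression is an assumption derived only from these three sample lines; other value types or key layouts may need a broader pattern. `DocDBEntry` and `parse_entry` are hypothetical names.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Pattern inferred from the sample lines only (an assumption, not a format specification).
SUBDOCKEY_RE = re.compile(
    r"SubDocKey\(DocKey\((0x[0-9a-f]+), \[(.*?)\], \[(.*?)\]\), "
    r"\[(\w+)\((\d+)\); HT\{ physical: (\d+)(?: w: (\d+))? \}\]\) -> (.*)$"
)


@dataclass
class DocDBEntry:
    partition_hash: int
    hash_key: str       # partition key value(s)
    range_key: str      # clustering key value(s); empty for static columns
    column_kind: str    # "ColumnId" or "SystemColumnId"
    column_id: int
    physical_us: int    # hybrid time physical component, microseconds since the Unix epoch
    write_id: int       # treated as 0 when the "w:" field is absent (an assumption)
    value: str

    @property
    def physical_time(self):
        return datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(microseconds=self.physical_us)


def parse_entry(line):
    """Parse one decoded sst_dump line, or return None if it does not match."""
    match = SUBDOCKEY_RE.match(line.strip())
    if match is None:
        return None
    hash_hex, hash_key, range_key, kind, col_id, physical, write_id, value = match.groups()
    return DocDBEntry(int(hash_hex, 16), hash_key, range_key, kind, int(col_id),
                      int(physical), int(write_id or 0), value)


EXAMPLE = """\
SubDocKey(DocKey(0xc0c4, [2], ["Clark Kent"]), [SystemColumnId(0); HT{ physical: 1671751515613266 }]) -> null
SubDocKey(DocKey(0xc0c4, [2], ["Clark Kent"]), [ColumnId(2); HT{ physical: 1671751515613266 w: 1 }]) -> "superman@yb.com"
SubDocKey(DocKey(0xc0c4, [2], []), [ColumnId(3); HT{ physical: 1671751515613266 w: 2 }]) -> 2021"""

entries = [parse_entry(line) for line in EXAMPLE.splitlines()]
for entry in entries:
    print(entry.column_kind, entry.column_id, entry.write_id, entry.physical_time.isoformat(), entry.value)
```

For the sample, all three entries share the physical time 2022-12-22 23:25:15.613266 UTC and differ only in their write IDs.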

In this exercise, you will decode an SST file of a tablet leader for `tbl_employees`.

### Select a YB-TServer host
Set the host variable to one of the nodes. All three nodes in the cluster run a Tablet Server (YB-TServer). To select a different node, comment or uncomment lines 8-10 of the following cell.


In [None]:
%%bash -s "$MY_HOST_IPv4_01" "$MY_HOST_IPv4_02" "$MY_HOST_IPv4_03" --out MY_HOST_IPv4

HOST_IPv4_01=${1}
HOST_IPv4_02=${2}
HOST_IPv4_03=${3}

# change the hosts for different tablet leaders
MY_HOST_IPv4=$HOST_IPv4_01
#MY_HOST_IPv4=$HOST_IPv4_02
#MY_HOST_IPv4=$HOST_IPv4_03

echo ${MY_HOST_IPv4}

Store the selected host variable.

In [None]:
%store MY_HOST_IPv4
print(MY_HOST_IPv4)


Get the table_id for the table.

In [None]:
MY_OBJECT_NAME="tbl_employees"
%store MY_OBJECT_NAME
print(MY_OBJECT_NAME)

In [None]:
%%bash -s "$MY_OBJECT_NAME" "$MY_HOST_IPv4" --out MY_TABLE_ID

OBJECT_NAME=$( echo "${1}" | tr -d " ")
HOST_IPv4=$( echo "${2}" | tr -d " ")


MY_URL="http://${HOST_IPv4}:${MY_TSERVER_WEBSERVER_PORT}/metrics"

MY_TABLE_ID=`curl -s --compressed ${MY_URL} | jq -r 'limit(1;  .[] | select(.attributes.namespace_name=="ks_ybu" and .type=="tablet" and .attributes.table_name=="'$OBJECT_NAME'") |  .attributes.table_id) '`

echo ${MY_TABLE_ID} 


Store the table_id for the table.

In [None]:
%store MY_TABLE_ID
print(MY_TABLE_ID)

Get the tablet_id of the tablet leader on the selected node.

In [None]:
%%bash -s "$MY_OBJECT_NAME" "$MY_HOST_IPv4" --out MY_TABLET_ID
OBJECT_NAME=$( echo "${1}" | tr -d " ")
HOST_IPv4=$( echo "${2}" | tr -d " ")

MY_URL="http://${HOST_IPv4}:${MY_TSERVER_WEBSERVER_PORT}/metrics"


TABLET_ID=`curl -s --compressed ${MY_URL} | jq --raw-output ' .[] | select(.attributes.namespace_name=="ks_ybu" and .type=="tablet" and .attributes.table_name=="'$OBJECT_NAME'") | {tablet_id: .id, metrics: .metrics[] | select(.name == ("is_raft_leader") ) | select(.value == 1) } | select(.tablet_id) | {tablet_id} | .tablet_id '`

echo ${TABLET_ID}
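
The two `jq` filters above select tablet entities for `tbl_employees` in `ks_ybu` from the YB-TServer `/metrics` JSON, then read either `attributes.table_id` or the `id` of tablets whose `is_raft_leader` metric is 1. The following Python sketch applies the same selection to an already-parsed response. It relies only on the field names used in those filters; the sample payload, IDs, and function names are hypothetical.

```python
import json


def _matches(entity, namespace, table):
    attrs = entity.get("attributes", {})
    return (entity.get("type") == "tablet"
            and attrs.get("namespace_name") == namespace
            and attrs.get("table_name") == table)


def find_table_id(entities, namespace, table):
    """Return the table_id of the first tablet entity for the table, like jq's limit(1; ...)."""
    for entity in entities:
        if _matches(entity, namespace, table):
            return entity["attributes"].get("table_id")
    return None


def leader_tablet_ids(entities, namespace, table):
    """Return the IDs of tablets on this server whose is_raft_leader metric equals 1."""
    return [entity["id"] for entity in entities
            if _matches(entity, namespace, table)
            and any(m.get("name") == "is_raft_leader" and m.get("value") == 1
                    for m in entity.get("metrics", []))]


# Hypothetical, trimmed-down /metrics payload; real responses contain many more entities and metrics.
SAMPLE = json.loads("""[
  {"type": "tablet", "id": "tablet-aaa",
   "attributes": {"namespace_name": "ks_ybu", "table_name": "tbl_employees", "table_id": "table-123"},
   "metrics": [{"name": "is_raft_leader", "value": 1}]},
  {"type": "tablet", "id": "tablet-bbb",
   "attributes": {"namespace_name": "ks_ybu", "table_name": "tbl_employees", "table_id": "table-123"},
   "metrics": [{"name": "is_raft_leader", "value": 0}]},
  {"type": "server", "id": "yb.tabletserver", "attributes": {}, "metrics": []}
]""")

print(find_table_id(SAMPLE, "ks_ybu", "tbl_employees"))
print(leader_tablet_ids(SAMPLE, "ks_ybu", "tbl_employees"))
```

In the notebook, you could obtain the entities with `json.loads(urllib.request.urlopen(url).read())` for the same URL that the `curl` command fetches.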

Store the tablet_id for the tablet leader.

In [None]:
%store MY_TABLET_ID
print(MY_TABLET_ID)

Flush the table's in-memory data (MemTable) to SST files for the given table_id.

In [None]:
%%bash -s "$MY_YB_PATH" "$MY_HOST_IPv4_01" "$MY_TABLE_ID"  # Pass the YugabyteDB path, master host, and table_id
YB_PATH=${1}
HOST_IPv4=${2}

TABLE_ID=$( echo "${3}" | tr -d " ")

cd $YB_PATH

./bin/yb-admin -init_master_addrs ${HOST_IPv4}:7100 flush_table_by_id ${TABLE_ID} 600


Dump and decode the SST file in human-readable form.

> Note:
>
> If the following cell does not dump any rows, most likely no rows were written to this tablet. To resolve this issue, return to the **Select a YB-TServer host** step, select a different node host, and re-run the subsequent cells.

In [None]:
%%bash -s "$MY_YB_PATH" "$MY_YB_PATH_DATA" "$MY_HOST_IPv4_01" "$MY_TABLE_ID" "$MY_TABLET_ID" # Pass the YugabyteDB paths, host, table_id, and tablet_id
YB_PATH=${1}
YB_PATH_DATA=${2}
HOST_IPv4=${3}
TABLE_ID=$( echo "${4}" | tr -d " ")
TABLET_ID=$( echo "${5}" | tr -d " ")

cd $YB_PATH/bin/
pwd
TABLE_ID_PATH=${YB_PATH_DATA}/node-1/disk-1/yb-data/tserver/data/rocksdb/table-${TABLE_ID}/tablet-${TABLET_ID}

ls -l ${TABLE_ID_PATH}

./sst_dump --command=scan --file=${TABLE_ID_PATH} --output_format=decoded_regulardb


---

### `select` statement
Run the following cells to observe the differences between:
- querying all rows of the table
- querying the table with a where-expression predicate that contains the primary key and equality operators
- querying the table with a where-expression predicate that specifies a range for a non-key column

Query all the rows of `tbl_employees` using a `select` statement and a wildcard.

In [None]:
%%bash -s "$MY_YB_PATH"   # Data search
YB_PATH=${1}
cd $YB_PATH

# Read all columns from the employees table
./bin/ycqlsh --execute "
    select * from ks_ybu.tbl_employees;
    "

Query the table using a where-expression predicate that contains the full primary key and equality operators.

In [None]:
%%bash -s "$MY_YB_PATH"   
YB_PATH=${1}
cd $YB_PATH

# Find the employee with id = 1 and full_name = 'Dick Grayson'
./bin/ycqlsh --execute "
  select *
  from ks_ybu.tbl_employees
  where id = 1 and full_name='Dick Grayson';
"

Query the table using a where-expression predicate that specifies a range for the non-key column `year`.

In [None]:
%%bash -s "$MY_YB_PATH"   
YB_PATH=${1}
cd $YB_PATH

./bin/ycqlsh --execute "
  select * from ks_ybu.tbl_employees 
  where  year > 2020 and year < 2022;
"

### `update` statement and upsert behavior
The following `update` statement exhibits upsert behavior: no row matches the primary key in the where-clause predicate, so YCQL inserts a new row.
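
The difference between this upsert behavior and a SQL-style `UPDATE`, which changes zero rows when no row matches, can be sketched with two toy functions over a dictionary keyed by primary key. This models only the visible outcome and ignores details such as `year` being a static column; the function names are hypothetical.

```python
def cql_update(table, primary_key, values):
    """Toy CQL-style UPDATE: write the columns for the key, creating the row if it is missing."""
    table.setdefault(primary_key, {}).update(values)


def sql_update(table, primary_key, values):
    """Toy SQL-style UPDATE for contrast: change the row only if it already exists."""
    if primary_key in table:
        table[primary_key].update(values)


employees = {(2, "Clark Kent"): {"email": "superman@yb.com", "year": 2021}}
# Copy the inner dicts too, so the two tables do not share row objects
cql_table = {key: dict(row) for key, row in employees.items()}
sql_table = {key: dict(row) for key, row in employees.items()}
new_values = {"email": "captainamerica@yb.com", "year": 2022}

cql_update(cql_table, (4, "Steven Rogers"), new_values)
sql_update(sql_table, (4, "Steven Rogers"), new_values)
print(sorted(cql_table))
print(sorted(sql_table))
```

Only the CQL-style table gains the `(4, "Steven Rogers")` row, which is what the `select` after the `update` confirms.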

In [None]:
%%bash -s "$MY_YB_PATH"   
YB_PATH=${1}
cd $YB_PATH

# Update a row that does not exist yet (upsert)
./bin/ycqlsh --execute "
 update ks_ybu.tbl_employees
 set email='captainamerica@yb.com'
   , year = 2022
 where id = 4 
   and full_name = 'Steven Rogers'
  "

Confirm the new row.

In [None]:
%%bash -s "$MY_YB_PATH"   
YB_PATH=${1}
cd $YB_PATH

# Read the partition with id = 4
./bin/ycqlsh --execute "
 select * from ks_ybu.tbl_employees where id = 4;
  "

## `delete` behavior

Single-row deletion

In [None]:
%%bash -s "$MY_YB_PATH"   
YB_PATH=${1}
cd $YB_PATH

# Delete all rows in the partition with id = 6
./bin/ycqlsh --execute "
  delete from ks_ybu.tbl_employees where id = 6;
"

In [None]:
%%bash -s "$MY_YB_PATH"   
YB_PATH=${1}
cd $YB_PATH

# Read all rows
./bin/ycqlsh --execute "
 select * from ks_ybu.tbl_employees;
  "

In [None]:
%%bash -s "$MY_YB_PATH"   
YB_PATH=${1}
cd $YB_PATH

# Create a table with a partition key (col_pk) and a clustering key (col_ck)
./bin/ycqlsh --execute "
  create table ks_ybu.tbl1 (col_pk int, col_ck text, col_reg int, primary key (col_pk, col_ck));
  "

In [None]:
%%bash -s "$MY_YB_PATH"   
YB_PATH=${1}
cd $YB_PATH

# Insert two rows into each of three partitions
./bin/ycqlsh --execute "
  insert into ks_ybu.tbl1 (col_pk , col_ck , col_reg ) values ( 1, 'green', 2022);
  insert into ks_ybu.tbl1 (col_pk , col_ck , col_reg ) values ( 1, 'red', 2022);
  insert into ks_ybu.tbl1 (col_pk , col_ck , col_reg ) values ( 2, 'red', 2022);
  insert into ks_ybu.tbl1 (col_pk , col_ck , col_reg ) values ( 2, 'green', 2022);
  insert into ks_ybu.tbl1 (col_pk , col_ck , col_reg ) values ( 3, 'blue', 2022);
  insert into ks_ybu.tbl1 (col_pk , col_ck , col_reg ) values ( 3, 'red', 2022);
  "

### Delete a single row by its primary key

In [None]:
%%bash -s "$MY_YB_PATH"   
YB_PATH=${1}
cd $YB_PATH

# Delete one row by specifying the full primary key
./bin/ycqlsh --execute "
  delete from ks_ybu.tbl1 where col_pk = 1 and col_ck = 'green';
  "

In [None]:
%%bash -s "$MY_YB_PATH"   
YB_PATH=${1}
cd $YB_PATH

# Read all rows
./bin/ycqlsh --execute "
  select * from ks_ybu.tbl1;
  "

### Delete multiple rows with a clustering key range
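
The range delete in the next cell differs from the single-row delete above only in its clustering-key predicate. The following toy sketch replays both deletes over the six rows inserted into `tbl1` to show which rows remain. It compares text with Python string ordering, which for these lowercase ASCII values should match the byte order YCQL uses for `text`; the helper name is hypothetical.

```python
def delete_rows(rows, col_pk, ck_predicate):
    """Return rows without those in partition col_pk whose clustering key satisfies ck_predicate."""
    return {key: value for key, value in rows.items()
            if not (key[0] == col_pk and ck_predicate(key[1]))}


# (col_pk, col_ck) -> col_reg, as inserted into ks_ybu.tbl1 above
rows = {
    (1, "green"): 2022, (1, "red"): 2022,
    (2, "red"): 2022, (2, "green"): 2022,
    (3, "blue"): 2022, (3, "red"): 2022,
}
rows = delete_rows(rows, 1, lambda ck: ck == "green")  # single row: col_pk = 1 and col_ck = 'green'
rows = delete_rows(rows, 1, lambda ck: ck > "green")   # range: col_pk = 1 and col_ck > 'green'
print(sorted(rows))
```

Because `'red' > 'green'`, the range delete removes the last remaining row of partition 1, while partitions 2 and 3 are untouched.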

In [None]:
%%bash -s "$MY_YB_PATH"   
YB_PATH=${1}
cd $YB_PATH

# Delete the rows in partition 1 whose clustering key is greater than 'green'
./bin/ycqlsh --execute "
  delete from ks_ybu.tbl1 where col_pk=1 and col_ck > 'green'
  "

In [None]:
%%bash -s "$MY_YB_PATH"   
YB_PATH=${1}
cd $YB_PATH

# Read all rows
./bin/ycqlsh --execute "
  select * from ks_ybu.tbl1;
  "

### Delete a column value


In [None]:
%%bash -s "$MY_YB_PATH"   
YB_PATH=${1}
cd $YB_PATH

# Delete only the col_reg value of one row
./bin/ycqlsh --execute "
  delete col_reg from ks_ybu.tbl1 where col_pk = 3 and col_ck = 'blue';
  "

#### Conditional delete with `if`

In [None]:
%%bash -s "$MY_YB_PATH"   
YB_PATH=${1}
cd $YB_PATH

# Delete the row only if its col_reg value is greater than 2021
./bin/ycqlsh --execute "
  delete from ks_ybu.tbl1 where col_pk = 2 and col_ck = 'green' if col_reg > 2021;
  "

## Part 5 | Built-in functions

Date and time functions
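
The `now()` function used in the following cells returns a `timeuuid`, a version-1 UUID that embeds its creation time, and `totimestamp()` extracts that time. The following Python sketch performs the same extraction with the standard `uuid` module to show what the conversion does. It illustrates the UUID version-1 timestamp layout and is not YugabyteDB's implementation; the function and constant names are hypothetical.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version-1 UUID timestamps count 100-nanosecond intervals since 1582-10-15 00:00:00 UTC.
# This constant is the number of such intervals before the Unix epoch, 1970-01-01.
UUID_TO_UNIX_OFFSET = 0x01B21DD213814000
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def timeuuid_to_datetime(value):
    """Extract the UTC timestamp embedded in a version-1 (time-based) UUID."""
    if value.version != 1:
        raise ValueError("expected a version-1 (time-based) UUID")
    ticks = value.time - UUID_TO_UNIX_OFFSET   # 100 ns ticks since the Unix epoch
    return UNIX_EPOCH + timedelta(microseconds=ticks // 10)


current = uuid.uuid1()
print(current, timeuuid_to_datetime(current).isoformat())
```

The conversion truncates to microseconds, which is finer than the millisecond precision of a CQL `timestamp`.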

In [None]:
%%bash -s "$MY_YB_PATH" "$MY_YB_PATH_DATA" "$MY_HOST_IPv4_01" "$MY_INDEX_TABLE_ID" "$MY_INDEX_TABLET_ID" # Dump the SST files of an index tablet
YB_PATH=${1}
YB_PATH_DATA=${2}
HOST_IPv4=${3}
INDEX_TABLE_ID=$( echo "${4}" | tr -d " ")
INDEX_TABLET_ID=$( echo "${5}" | tr -d " ")

cd $YB_PATH
cd bin/

TABLE_ID_PATH=${YB_PATH_DATA}/node-1/disk-1/yb-data/tserver/data/rocksdb/table-${INDEX_TABLE_ID}/tablet-${INDEX_TABLET_ID}
ls -l  ${TABLE_ID_PATH}

./sst_dump --command=scan --file=${TABLE_ID_PATH} --output_format=decoded_regulardb

# ./sst_dump --command=scan --file=${TABLE_ID_PATH} --show_properties
# -command=scan --read_num=5

In [None]:
%%bash -s "$MY_YB_PATH"   
YB_PATH=${1}
cd $YB_PATH

# Convert the current timeuuid, now(), to timestamp and date values
./bin/ycqlsh --execute "
  select col_pk, totimestamp(now()), todate(now()), dateof(now()) from ks_ybu.tbl1;
"

In [None]:
%%bash -s "$MY_YB_PATH"   
YB_PATH=${1}
cd $YB_PATH

# Compare the current date, time, and timestamp functions
./bin/ycqlsh --execute "
  select col_pk, col_ck, col_reg, currentdate(), currenttime(), currenttimestamp(), todate(now()) from ks_ybu.tbl1;
"

In [None]:
%%bash -s "$MY_YB_PATH"   
YB_PATH=${1}
cd $YB_PATH

# Compute partition_hash() of the artist and song columns of ks_ybu.tbl_music
./bin/ycqlsh --execute "
  select partition_hash(artist, song) from ks_ybu.tbl_music;
"

In [None]:
%%bash -s "$MY_YB_PATH"   
YB_PATH=${1}
cd $YB_PATH

# Combine partition_hash() with the date and time functions
./bin/ycqlsh --execute "
  select partition_hash(col_pk),col_pk, partition_hash('AAAA'),col_ck, col_reg, currentdate(), currenttime(), currenttimestamp(), todate(now()) from ks_ybu.tbl1 
"