<div style="width:100%; background-color: #000041"><a target="_blank" href="http://university.yugabyte.com"><img src="assets/YBU_Logo.webp" /></a></div><br>

> **YugabyteDB YCQL Development**
>
> Enroll for free at [Yugabyte University](https://university.yugabyte.com/courses/yugabytedb-ycql-development).
>

# Secondary indexes
A secondary index is a data structure that contains an index key and some of the columns of the indexed table, and that supports one or more data access patterns. In this notebook, you will learn how to create secondary indexes that not only improve query performance, but also remove unnecessary tables from the data model.

### Import the notebook variables 

> Requirements:
>
> You must first create the variables in the `01_Setup.ipynb` notebook.
>

The following Python cell reads the stored variables created in the `01_Setup.ipynb` notebook. 

- To run the script, select Execute Cell (Play Arrow) in the left gutter of the cell.

In [None]:
%store -r MY_DB_NAME
%store -r MY_YB_PATH
%store -r MY_YB_PATH_DATA
%store -r MY_GITPOD_WORKSPACE_URL
%store -r MY_HOST_IPv4_01
%store -r MY_HOST_IPv4_02
%store -r MY_HOST_IPv4_03
%store -r MY_NOTEBOOK_DIR
%store -r MY_TSERVER_WEBSERVER_PORT
%store -r MY_NOTEBOOK_DATA_FOLDER
%store -r MY_YB_MASTER_HOST_GITPOD_URL
%store -r MY_YB_TSERVER_HOST_GITPOD_URL
%store -r MY_DATA_DDL_FILE
%store -r MY_DATA_DML_FILE

---
## Secondary index: `Index Scan` query plan
A secondary index is a data structure that contains an index key and, often, some of the columns of the indexed table, and that supports one or more data access patterns. You can often create a secondary index so that the query plan for a given query uses it.

An `index scan` query plan uses a secondary index. However, after accessing the index, the query still accesses the indexed table. This type of query plan is often better than a sequential scan (`Seq Scan`) query plan.

To begin, examine the query plan without the implementation of a secondary index.

In [None]:
%%bash -s "$MY_YB_PATH" "$MY_DB_NAME"   # new query plan
YB_PATH=${1}
DB_NAME=${2}  
cd $YB_PATH/bin

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  explain select  category, brand, product_name, description, product_id 
  from tbl_products_by_category 
  where brand=?;
"  

The query plan reveals that the query uses a sequential scan (`Seq Scan`).

### Create the secondary index

> **Important**
>
> You can only create a secondary index for a table with the `transactions` property enabled.
> 

To begin, drop the secondary index if it exists.

In [None]:
%%bash -s "$MY_YB_PATH" "$MY_DB_NAME"   # Secondary index
YB_PATH=${1}
DB_NAME=${2}  
cd $YB_PATH/bin

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
drop index if exists idx_products_by_category_brand;"

sleep 1;

Run the following cell to create an index on the `brand` column of the `tbl_products_by_category` table.

In [None]:
%%bash -s "$MY_YB_PATH" "$MY_DB_NAME"   # Secondary index
YB_PATH=${1}
DB_NAME=${2}  
cd $YB_PATH/bin

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
create index if not exists idx_products_by_category_brand 
  on tbl_products_by_category (brand) 
  ;"

Use the `describe` keyword to verify that the index was created for the given table.


In [None]:
%%bash -s "$MY_YB_PATH" "$MY_DB_NAME"   # describe table to view index
YB_PATH=${1}
DB_NAME=${2}  
cd $YB_PATH/bin

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  describe tbl_products_by_category;
  "

The index key consists of a partition key and zero or more clustering keys. In the previous output, the partition key of the index is `brand`. The index also inherits the primary key columns of the indexed table as clustering keys: `category`, `product_name`, and `product_id`.
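
The way the index key combines the indexed column with the inherited primary key can be sketched in a few lines of Python. This is only an illustrative model, not YugabyteDB code: the function name `index_key_columns` is hypothetical, and it assumes the first index column is the partition key and that the table primary key is `category`, `product_name`, `product_id`, as described above.

```python
def index_key_columns(index_columns, table_primary_key):
    """Return (partition_key, clustering_keys) for a simplified index model.

    Assumption: the first index column is the partition key; any remaining
    index columns and the inherited table primary key become clustering keys.
    """
    partition_key = list(index_columns[:1])
    clustering_keys = list(index_columns[1:])
    for column in table_primary_key:
        # Inherit each primary key column once, skipping duplicates.
        if column not in partition_key and column not in clustering_keys:
            clustering_keys.append(column)
    return partition_key, clustering_keys


partition_key, clustering_keys = index_key_columns(
    ["brand"], ["category", "product_name", "product_id"]
)
print(partition_key)    # ['brand']
print(clustering_keys)  # ['category', 'product_name', 'product_id']
```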

### Index backfill
By default, YugabyteDB automatically backfills an index. You can check the status of the index backfill operation using the YB-Master web UI at `http://yb-master-url:7000/tasks`. You can also grep the HTML output of the web UI.

In [None]:
%%bash -s "$MY_HOST_IPv4_01" 
HOST_IPv4=$( echo "${1}" | tr -d " ")
MY_URL="http://${HOST_IPv4}:7000/tasks"

curl -s  ${MY_URL} | html2text | grep  idx_products_by_category_brand 

There are multiple tasks associated with the creation of the index. For an index on a very large table, you will want to check for these task names associated with the given index:
- `Backfill Index Table`
- `Mark backfill done.`

> **Important**
>
> In certain cases, the backfill may fail. The state will show `kFailed` instead of `kComplete`.
> 
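
As a rough illustration of checking for a failed backfill, the following Python sketch scans the text of the tasks page for lines that mention the index and reports the states it finds. The function names and the sample text are hypothetical, and the sketch assumes that each task ends up on a single line containing both the index name and a state such as `kComplete` or `kFailed`; the real page layout may differ.

```python
def backfill_states(tasks_text, index_name):
    """Collect the task states found on lines that mention the index."""
    states = []
    for line in tasks_text.splitlines():
        if index_name not in line:
            continue
        for state in ("kComplete", "kFailed"):
            if state in line:
                states.append(state)
    return states


def backfill_failed(tasks_text, index_name):
    """Return True if any task for the index reports kFailed."""
    return "kFailed" in backfill_states(tasks_text, index_name)


# Hypothetical sample text; the line layout is an assumption for illustration.
sample = (
    "Backfill Index Table idx_products_by_category_brand kComplete\n"
    "Mark backfill done. idx_products_by_category_brand kComplete\n"
)
print(backfill_states(sample, "idx_products_by_category_brand"))
print(backfill_failed(sample, "idx_products_by_category_brand"))
```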

### View the query plan for the secondary index

> **Important:** 
> The primary key of the index table is part of the secondary index data structure. 

Run the following cell to verify whether the secondary index on the `brand` column makes the query on the `tbl_products_by_category` table more efficient.

In [None]:
%%bash -s "$MY_YB_PATH" "$MY_DB_NAME"   # new query plan
YB_PATH=${1}
DB_NAME=${2}  
cd $YB_PATH/bin

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  explain select  category, brand, product_name, description, product_id 
  from tbl_products_by_category 
  where brand=?;
"  

#### `Index Scan`
With the introduction of a secondary index for the table, there is a new query plan. This query plan contains one node, and the action is an index scan. Key Conditions is a sub-action that specifies the use of the partition key for the index.

The index contains the partition key for the index and the primary key columns of the indexed table, but not the `description` column. After accessing the index, the query therefore accesses the table to read `description`. This access is similar to a primary key lookup: because the index contains the primary key columns, the query can look up the rows in the table by their primary key values.
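
The two steps of an index scan (read the index, then look up the table by primary key) can be modeled with plain Python dictionaries. This is a simplified sketch and not how DocDB stores data; the rows, brand names, and function names are made up for illustration.

```python
from collections import defaultdict

# Hypothetical rows keyed by the table primary key
# (category, product_name, product_id).
table = {
    ("cat_a", "product_1", 1): {"brand": "brand_x", "description": "first"},
    ("cat_a", "product_2", 2): {"brand": "brand_y", "description": "second"},
    ("cat_b", "product_3", 3): {"brand": "brand_x", "description": "third"},
}


def build_index(rows, column):
    """Map each value of the indexed column to the primary keys that hold it."""
    index = defaultdict(list)
    for primary_key, row in rows.items():
        index[row[column]].append(primary_key)
    return index


def index_scan(index, rows, value):
    """Step 1: read primary keys from the index. Step 2: look up each row."""
    result = []
    for primary_key in index.get(value, []):
        row = rows[primary_key]  # primary key lookup in the table
        result.append((primary_key, row["description"]))
    return result


brand_index = build_index(table, "brand")
result = index_scan(brand_index, table, "brand_x")
print(result)
```

The second step is needed only because `description` is not stored in the index itself.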

### Index tablet and `index scan` query
YugabyteDB stores a secondary index in the same way as it stores a table. A secondary index exists as one or more tablets, and its data is stored in DocDB in a similar structure.

#### Select a YB-TServer host
Set the host variable for one of the nodes. All three nodes in the cluster are running a Tablet Server (YB-TServer). You can comment/uncomment lines 7-9 as needed.


In [None]:
%%bash -s "$MY_HOST_IPv4_01" "$MY_HOST_IPv4_02" "$MY_HOST_IPv4_03" --out MY_HOST_IPv4
HOST_IPv4_01=$( echo "${1}" | tr -d " ")
HOST_IPv4_02=$( echo "${2}" | tr -d " ")
HOST_IPv4_03=$( echo "${3}" | tr -d " ")

# change the hosts for different tablet leaders
MY_HOST_IPv4=$HOST_IPv4_01
#MY_HOST_IPv4=$HOST_IPv4_02
#MY_HOST_IPv4=$HOST_IPv4_03

echo ${MY_HOST_IPv4}

Store the selected host variable.

In [None]:
%store MY_HOST_IPv4
print(MY_HOST_IPv4)

Save the index name as a variable.

In [None]:
MY_OBJECT_NAME="idx_products_by_category_brand"
%store MY_OBJECT_NAME
print(MY_OBJECT_NAME)

Get the `index_id` for the index using `curl` and `jq`.

In [None]:
%%bash -s "$MY_OBJECT_NAME" "$MY_HOST_IPv4"  "$MY_DB_NAME"  "$MY_TSERVER_WEBSERVER_PORT"  --out MY_INDEX_ID
OBJECT_NAME=$( echo "${1}" | tr -d " ")
HOST_IPv4=$( echo "${2}" | tr -d " ")
DB_NAME=$( echo "${3}" | tr -d " ")
TSERVER_WEBSERVER_PORT=$( echo "${4}" | tr -d " ")


MY_URL="http://${HOST_IPv4}:${TSERVER_WEBSERVER_PORT}/metrics"

MY_INDEX_ID=`curl -s --compressed ${MY_URL} | jq -r 'limit(1;  .[] | select(.attributes.namespace_name=="'${DB_NAME}'" and .type=="tablet" and .attributes.table_name=="'${OBJECT_NAME}'") |  .attributes.table_id) '`

echo ${MY_INDEX_ID}

Store the index_id for the index.

In [None]:
%store MY_INDEX_ID
print(MY_INDEX_ID)

Get the `tablet_id` of the tablet leader on the selected node host.

In [None]:
%%bash -s "$MY_OBJECT_NAME" "$MY_HOST_IPv4" "$MY_DB_NAME" "$MY_TSERVER_WEBSERVER_PORT" --out MY_INDEX_TABLET_ID
OBJECT_NAME=$( echo "${1}" | tr -d " ")
HOST_IPv4=$( echo "${2}" | tr -d " ")
DB_NAME=$( echo "${3}" | tr -d " ")
TSERVER_WEBSERVER_PORT=$( echo "${4}" | tr -d " ")

# Use the stored keyspace name and web server port instead of hardcoded values
MY_URL="http://${HOST_IPv4}:${TSERVER_WEBSERVER_PORT}/metrics"

MY_INDEX_TABLET_ID=`curl -s --compressed ${MY_URL} | jq --raw-output ' .[] | select(.attributes.namespace_name=="'${DB_NAME}'" and .type=="tablet" and .attributes.table_name=="'$OBJECT_NAME'") | {tablet_id: .id, metrics: .metrics[] | select(.name == ("is_raft_leader") ) | select(.value == 1) } | select(.tablet_id) | {tablet_id} | .tablet_id '`

echo ${MY_INDEX_TABLET_ID}

Store the tablet_id for the tablet leader.

In [None]:
%store MY_INDEX_TABLET_ID
print(MY_INDEX_TABLET_ID)

Flush the in-memory data (memtable) for the given `index_id` to an SST file.

In [None]:
%%bash -s "$MY_YB_PATH" "$MY_HOST_IPv4" "$MY_INDEX_ID"  # Import file path of Yugabyte and DB name
YB_PATH=$( echo "${1}" | tr -d " ")
HOST_IPv4=$( echo "${2}" | tr -d " ")
INDEX_ID=$( echo "${3}" | tr -d " ")
cd $YB_PATH/bin

./yb-admin -init_master_addrs ${HOST_IPv4}:7100 flush_table_by_id ${INDEX_ID} 600

Dump and decode the SST file in human-readable form.

> Note:
>
> If the following cell does **not** dump the SST file, most likely no rows have been written to this tablet. To resolve this issue, return to **Select a YB-TServer host** and select a different node host.

In [None]:
%%bash -s "$MY_YB_PATH" "$MY_YB_PATH_DATA" "$MY_INDEX_ID" "$MY_INDEX_TABLET_ID" # Import file path of Yugabyte and DB name
YB_PATH=$( echo "${1}" | tr -d " ")
YB_PATH_DATA=$( echo "${2}" | tr -d " ")
INDEX_ID=$( echo "${3}" | tr -d " ")
INDEX_TABLET_ID=$( echo "${4}" | tr -d " ")

cd $YB_PATH/bin/

INDEX_ID_PATH=${YB_PATH_DATA}/node-1/disk-1/yb-data/tserver/data/rocksdb/table-${INDEX_ID}/tablet-${INDEX_TABLET_ID}

# ls -l  ${INDEX_ID_PATH}

./sst_dump --command=scan --file=${INDEX_ID_PATH} --output_format=decoded_regulardb 

The DocKey consists of the partition key hash, the partition key, and the clustering keys. 

The index scan query begins by accessing a single tablet for the index. The seek operation uses the partition key hash to read data from the SST file for the index tablet. Because the `brand` column contains non-unique values, there may be multiple seeks in the related SST file. The seek operation gathers the DocKeys, which contain the primary key values for the indexed table.

Using this list of primary keys, a second operation accesses the tablets of the indexed table. When there is more than one table tablet, this is a batch operation. Often, the index tablets and the table tablets reside on different nodes, so the batch operation requires one or more remote procedure calls to one or more nodes in the cluster. The goal of the batch operation is to reduce the number of seek operations for a given table tablet by using a list of primary keys that fall within the tablet hash value range.

Although this query is not as costly as a sequential scan query, it does require accessing one index tablet and at least one tablet for the indexed table.
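
The batching idea, grouping primary keys by the tablet whose hash range contains them, can be sketched as follows. Everything here is an assumption made for illustration: `hash16` is a stand-in 16-bit hash based on CRC32 and is not the hash function YugabyteDB uses, the hash space is split into equal ranges, and the first primary key column is treated as the table partition key.

```python
import zlib


def hash16(partition_value):
    """Stand-in 16-bit hash (NOT YugabyteDB's hash function)."""
    return zlib.crc32(str(partition_value).encode("utf-8")) & 0xFFFF


def tablet_for(hash_value, tablet_count):
    """Map a hash in 0x0000..0xFFFF to one of tablet_count equal ranges."""
    range_size = 0x10000 // tablet_count
    return min(hash_value // range_size, tablet_count - 1)


def batch_by_tablet(primary_keys, tablet_count):
    """Group primary keys so that each tablet receives one batch."""
    batches = {}
    for primary_key in primary_keys:
        # Assumption: the first primary key column is the partition key.
        tablet = tablet_for(hash16(primary_key[0]), tablet_count)
        batches.setdefault(tablet, []).append(primary_key)
    return batches


# Hypothetical primary keys gathered from the index tablet.
keys = [("cat_a", "product_1", 1), ("cat_b", "product_3", 3), ("cat_a", "product_4", 4)]
batches = batch_by_tablet(keys, tablet_count=3)
for tablet, batch in sorted(batches.items()):
    print(tablet, batch)
```

Keys that share a partition key value always land in the same batch, which is what lets one request serve several lookups on the same tablet.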

## Secondary index: `Index Only Scan` query plan

The previous index required the query to access the tablets for both the index and the table. The term covering index describes a secondary index that contains every column a query needs, so the query accesses only the index and not the indexed table.
 
You can define one or more `include` columns to cover a query with the index alone.  There are some restrictions for defining an `include` column in a secondary index. The `include` column needs to be a column with a basic data type.

### Create the secondary index

To begin, drop the secondary index if it exists.

In [None]:
%%bash -s "$MY_YB_PATH" "$MY_DB_NAME"   # Secondary index
YB_PATH=${1}
DB_NAME=${2}  
cd $YB_PATH/bin

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
drop index if exists idx_products_by_category_brand_inc;"

sleep 1;

Run the following cell to create the index that uses the `include` clause.

In [None]:
%%bash -s "$MY_YB_PATH" "$MY_DB_NAME"   # Secondary index
YB_PATH=${1}
DB_NAME=${2}  
cd $YB_PATH/bin

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
create index if not exists idx_products_by_category_brand_inc 
  on tbl_products_by_category (brand) 
  include (description)
  ;"

Use the `describe` keyword to verify that the index was created for the given table.


In [None]:
%%bash -s "$MY_YB_PATH" "$MY_DB_NAME"   # describe table to view index
YB_PATH=${1}
DB_NAME=${2}  
cd $YB_PATH/bin

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  describe tbl_products_by_category;
  "

There are now two indexes for the table. The index key for the `idx_products_by_category_brand_inc` index consists of a partition key and zero or more clustering keys. In the previous output, the partition key of the index is `brand`. The index inherits the primary key columns of the indexed table as clustering keys: `category`, `product_name`, and `product_id`. In addition, the index stores the `description` column as an `include` column.

### View the query plan that uses the covering index
Run the following to view the plan.

In [None]:
%%bash -s "$MY_YB_PATH" "$MY_DB_NAME"   # new query plan
YB_PATH=${1}
DB_NAME=${2}  
cd $YB_PATH/bin

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  explain select  category, brand, product_name, description, product_id 
  from tbl_products_by_category 
  where brand = ?;
"  

#### `Index Only Scan`
The introduction of a new covering index results in a new query plan. This query plan contains one node. The action is an `index only scan`. Key Conditions is a sub-action that specifies the use of the partition key for the index. The equality operator indicates that the internal operation is to locate a specific partition key on a single tablet. The index contains the partition key for the index, the primary key columns of the indexed table, and any `include` columns from the indexed table. In this example, the `include` column is the `description` column, so the index alone can return every column in the query.
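
Whether an index covers a query comes down to a column check, which the following Python sketch models. It is a simplification for illustration only: the function names are hypothetical, and the real planner considers more than column membership.

```python
def index_columns(index_key, table_primary_key, include=()):
    """List the columns stored in the index: key, inherited primary key, include."""
    columns = list(index_key)
    for column in list(table_primary_key) + list(include):
        if column not in columns:
            columns.append(column)
    return columns


def scan_type(select_columns, columns_in_index):
    """Return the expected scan type and the columns the index is missing."""
    missing = [c for c in select_columns if c not in columns_in_index]
    if missing:
        return "Index Scan", missing
    return "Index Only Scan", missing


primary_key = ["category", "product_name", "product_id"]
select = ["category", "brand", "product_name", "description", "product_id"]

plain = index_columns(["brand"], primary_key)
covering = index_columns(["brand"], primary_key, include=["description"])

print(scan_type(select, plain))     # ('Index Scan', ['description'])
print(scan_type(select, covering))  # ('Index Only Scan', [])
```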

### Index tablet and `index only scan` query
To learn more about why this is an  `index only scan` query, take a look at the SST files for an index tablet.

#### Select a YB-TServer host
Set the host variable for one of the nodes. All three nodes in the cluster are running a Tablet Server (YB-TServer). You can comment/uncomment lines 7-9 as needed.


In [None]:
%%bash -s "$MY_HOST_IPv4_01" "$MY_HOST_IPv4_02" "$MY_HOST_IPv4_03" --out MY_HOST_IPv4
HOST_IPv4_01=$( echo "${1}" | tr -d " ")
HOST_IPv4_02=$( echo "${2}" | tr -d " ")
HOST_IPv4_03=$( echo "${3}" | tr -d " ")

# change the hosts for different tablet leaders
MY_HOST_IPv4=$HOST_IPv4_01
#MY_HOST_IPv4=$HOST_IPv4_02
#MY_HOST_IPv4=$HOST_IPv4_03

echo ${MY_HOST_IPv4}

Store the selected host variable.

In [None]:
%store MY_HOST_IPv4
print(MY_HOST_IPv4)

Save the index name as a variable.

In [None]:
MY_OBJECT_NAME="idx_products_by_category_brand_inc"
%store MY_OBJECT_NAME
print(MY_OBJECT_NAME)

Get the `index_id` for the index using `curl` and `jq`.

In [None]:
%%bash -s "$MY_OBJECT_NAME" "$MY_HOST_IPv4"  "$MY_DB_NAME"  "$MY_TSERVER_WEBSERVER_PORT"  --out MY_INDEX_ID
OBJECT_NAME=$( echo "${1}" | tr -d " ")
HOST_IPv4=$( echo "${2}" | tr -d " ")
DB_NAME=$( echo "${3}" | tr -d " ")
TSERVER_WEBSERVER_PORT=$( echo "${4}" | tr -d " ")


MY_URL="http://${HOST_IPv4}:${TSERVER_WEBSERVER_PORT}/metrics"

MY_INDEX_ID=`curl -s --compressed ${MY_URL} | jq -r 'limit(1;  .[] | select(.attributes.namespace_name=="'${DB_NAME}'" and .type=="tablet" and .attributes.table_name=="'${OBJECT_NAME}'") |  .attributes.table_id) '`

echo ${MY_INDEX_ID}

Store the index_id for the index.

In [None]:
%store MY_INDEX_ID
print(MY_INDEX_ID)

Get the `tablet_id` of the tablet leader on the selected node host.

In [None]:
%%bash -s "$MY_OBJECT_NAME" "$MY_HOST_IPv4" "$MY_DB_NAME" "$MY_TSERVER_WEBSERVER_PORT" --out MY_INDEX_TABLET_ID
OBJECT_NAME=$( echo "${1}" | tr -d " ")
HOST_IPv4=$( echo "${2}" | tr -d " ")
DB_NAME=$( echo "${3}" | tr -d " ")
TSERVER_WEBSERVER_PORT=$( echo "${4}" | tr -d " ")

# Use the stored keyspace name and web server port instead of hardcoded values
MY_URL="http://${HOST_IPv4}:${TSERVER_WEBSERVER_PORT}/metrics"

MY_INDEX_TABLET_ID=`curl -s --compressed ${MY_URL} | jq --raw-output ' .[] | select(.attributes.namespace_name=="'${DB_NAME}'" and .type=="tablet" and .attributes.table_name=="'$OBJECT_NAME'") | {tablet_id: .id, metrics: .metrics[] | select(.name == ("is_raft_leader") ) | select(.value == 1) } | select(.tablet_id) | {tablet_id} | .tablet_id '`

echo ${MY_INDEX_TABLET_ID}

Store the tablet_id for the tablet leader.

In [None]:
%store MY_INDEX_TABLET_ID
print(MY_INDEX_TABLET_ID)

Flush the in-memory data (memtable) for the given `index_id` to an SST file.

In [None]:
%%bash -s "$MY_YB_PATH" "$MY_HOST_IPv4" "$MY_INDEX_ID"  # Import file path of Yugabyte and DB name
YB_PATH=$( echo "${1}" | tr -d " ")
HOST_IPv4=$( echo "${2}" | tr -d " ")
INDEX_ID=$( echo "${3}" | tr -d " ")
cd $YB_PATH/bin

./yb-admin -init_master_addrs ${HOST_IPv4}:7100 flush_table_by_id ${INDEX_ID} 600

Dump and decode the SST file in human-readable form.

> Note:
>
> If the following cell does **not** dump the SST file, most likely no rows have been written to this tablet. To resolve this issue, return to **Select a YB-TServer host** and select a different node host.

In [None]:
%%bash -s "$MY_YB_PATH" "$MY_YB_PATH_DATA" "$MY_INDEX_ID" "$MY_INDEX_TABLET_ID" # Import file path of Yugabyte and DB name
YB_PATH=$( echo "${1}" | tr -d " ")
YB_PATH_DATA=$( echo "${2}" | tr -d " ")
INDEX_ID=$( echo "${3}" | tr -d " ")
INDEX_TABLET_ID=$( echo "${4}" | tr -d " ")

cd $YB_PATH/bin/

INDEX_ID_PATH=${YB_PATH_DATA}/node-1/disk-1/yb-data/tserver/data/rocksdb/table-${INDEX_ID}/tablet-${INDEX_TABLET_ID}

# ls -l  ${INDEX_ID_PATH}

./sst_dump --command=scan --file=${INDEX_ID_PATH} --output_format=decoded_regulardb 

A query plan with an `index only scan` operation accesses a single tablet for the index. The seek operation reads data from the SST file for the index tablet. Depending on the number of products for a given brand, there may be multiple seeks. However, because `brand` is the partition key for this index, the query uses the partition key hash to locate the tablet almost immediately. The partition key hash for the index is the DocKey hash.

The DocKey consists of the index partition key hash, the index partition key, and the clustering keys. The clustering keys are the primary key columns of the indexed table. The DocKey maps to subdocuments, and each subdocument contains a column value. In this example, the subdocuments are the `include` columns, which here is only `description`.

The covering index efficiently processes the query without needing to access the data for the indexed table.

## Unique index
A unique index creates a unique constraint for the indexed column in the table. This is especially useful for maintaining data integrity for rows that must have unique values, such as identifiers, phone numbers, or email addresses. In this example, the unique index prevents the insertion of a row with a duplicate Global Trade Item Number (GTIN).

You can disallow a column from having duplicate values by using a unique constraint as shown in the following cell.

In [None]:
%%bash -s "$MY_YB_PATH" "$MY_DB_NAME"   # if exists, drop index
YB_PATH=${1}
DB_NAME=${2}  
cd $YB_PATH/bin

# DB_NAME=ks_ybu

./ycqlsh -r -k $DB_NAME -e "
  drop index if exists idx_products_by_category_unq;
"

In [None]:
%%bash -s "$MY_YB_PATH" "$MY_DB_NAME"   # if not exists, create index
YB_PATH=${1}
DB_NAME=${2}  
cd $YB_PATH/bin

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  create unique index if not exists idx_products_by_category_unq
  on tbl_products_by_category (gtin);
"

In [None]:
%%bash -s "$MY_YB_PATH" "$MY_DB_NAME"   # describe table to view index
YB_PATH=${1}
DB_NAME=${2}  
cd $YB_PATH/bin

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  desc tbl_products_by_category;
"  

The following `insert` statement generates an error because a row with `gtin=006236226326` already exists:

```Duplicate value disallowed by unique index idx_products_by_category_unq```

In [None]:
%%bash -s "$MY_YB_PATH" "$MY_DB_NAME"   # insert statement that throws error
YB_PATH=${1}
DB_NAME=${2}  
cd $YB_PATH/bin

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  insert into tbl_products_by_category (category, product_name, product_id, brand, price, discount, description, gtin) values ('H20','Talc 9',62569,'Yeah',29.99,8,'3 liter','006236226326');
"

### Index tablet for unique index
YugabyteDB stores a secondary unique index in the same way as it does for a table. The data structure is also similar. It is DocDB. However, the implementation is a bit different for unique indexes.

#### Select a YB-TServer host
Set the host variable for one of the nodes. All three nodes in the cluster are running a Tablet Server (YB-TServer). You can comment/uncomment lines 7-9 as needed.


In [None]:
%%bash -s "$MY_HOST_IPv4_01" "$MY_HOST_IPv4_02" "$MY_HOST_IPv4_03" --out MY_HOST_IPv4
HOST_IPv4_01=$( echo "${1}" | tr -d " ")
HOST_IPv4_02=$( echo "${2}" | tr -d " ")
HOST_IPv4_03=$( echo "${3}" | tr -d " ")

# change the hosts for different tablet leaders
MY_HOST_IPv4=$HOST_IPv4_01
#MY_HOST_IPv4=$HOST_IPv4_02
#MY_HOST_IPv4=$HOST_IPv4_03

echo ${MY_HOST_IPv4}

Store the selected host variable.

In [None]:
%store MY_HOST_IPv4
print(MY_HOST_IPv4)

Save the index name as a variable.

In [None]:
MY_OBJECT_NAME="idx_products_by_category_unq"
%store MY_OBJECT_NAME
print(MY_OBJECT_NAME)

Get the `index_id` for the index using `curl` and `jq`.

In [None]:
%%bash -s "$MY_OBJECT_NAME" "$MY_HOST_IPv4"  "$MY_DB_NAME"  "$MY_TSERVER_WEBSERVER_PORT"  --out MY_INDEX_ID
OBJECT_NAME=$( echo "${1}" | tr -d " ")
HOST_IPv4=$( echo "${2}" | tr -d " ")
DB_NAME=$( echo "${3}" | tr -d " ")
TSERVER_WEBSERVER_PORT=$( echo "${4}" | tr -d " ")


MY_URL="http://${HOST_IPv4}:${TSERVER_WEBSERVER_PORT}/metrics"

MY_INDEX_ID=`curl -s --compressed ${MY_URL} | jq -r 'limit(1;  .[] | select(.attributes.namespace_name=="'${DB_NAME}'" and .type=="tablet" and .attributes.table_name=="'${OBJECT_NAME}'") |  .attributes.table_id) '`

echo ${MY_INDEX_ID}

Store the index_id for the index.

In [None]:
%store MY_INDEX_ID
print(MY_INDEX_ID)

Get the `tablet_id` of the tablet leader on the selected node host.

In [None]:
%%bash -s "$MY_OBJECT_NAME" "$MY_HOST_IPv4" "$MY_DB_NAME" "$MY_TSERVER_WEBSERVER_PORT" --out MY_INDEX_TABLET_ID
OBJECT_NAME=$( echo "${1}" | tr -d " ")
HOST_IPv4=$( echo "${2}" | tr -d " ")
DB_NAME=$( echo "${3}" | tr -d " ")
TSERVER_WEBSERVER_PORT=$( echo "${4}" | tr -d " ")

# Use the stored keyspace name and web server port instead of hardcoded values
MY_URL="http://${HOST_IPv4}:${TSERVER_WEBSERVER_PORT}/metrics"

MY_INDEX_TABLET_ID=`curl -s --compressed ${MY_URL} | jq --raw-output ' .[] | select(.attributes.namespace_name=="'${DB_NAME}'" and .type=="tablet" and .attributes.table_name=="'$OBJECT_NAME'") | {tablet_id: .id, metrics: .metrics[] | select(.name == ("is_raft_leader") ) | select(.value == 1) } | select(.tablet_id) | {tablet_id} | .tablet_id '`

echo ${MY_INDEX_TABLET_ID}

Store the tablet_id for the tablet leader.

In [None]:
%store MY_INDEX_TABLET_ID
print(MY_INDEX_TABLET_ID)

Flush the in-memory data (memtable) for the given `index_id` to an SST file.

In [None]:
%%bash -s "$MY_YB_PATH" "$MY_HOST_IPv4" "$MY_INDEX_ID"  # Import file path of Yugabyte and DB name
YB_PATH=$( echo "${1}" | tr -d " ")
HOST_IPv4=$( echo "${2}" | tr -d " ")
INDEX_ID=$( echo "${3}" | tr -d " ")
cd $YB_PATH/bin

./yb-admin -init_master_addrs ${HOST_IPv4}:7100 flush_table_by_id ${INDEX_ID} 600

Dump and decode the SST file in human-readable form.

> Note:
>
> If the following cell does **not** dump the SST file, most likely no rows have been written to this tablet. To resolve this issue, return to **Select a YB-TServer host** and select a different node host.

In [None]:
%%bash -s "$MY_YB_PATH" "$MY_YB_PATH_DATA" "$MY_INDEX_ID" "$MY_INDEX_TABLET_ID" # Import file path of Yugabyte and DB name
YB_PATH=$( echo "${1}" | tr -d " ")
YB_PATH_DATA=$( echo "${2}" | tr -d " ")
INDEX_ID=$( echo "${3}" | tr -d " ")
INDEX_TABLET_ID=$( echo "${4}" | tr -d " ")

cd $YB_PATH/bin/

INDEX_ID_PATH=${YB_PATH_DATA}/node-1/disk-1/yb-data/tserver/data/rocksdb/table-${INDEX_ID}/tablet-${INDEX_TABLET_ID}

# ls -l  ${INDEX_ID_PATH}

./sst_dump --command=scan --file=${INDEX_ID_PATH} --output_format=decoded_regulardb 

The DocKey for the unique index consists solely of the hash-encoded partition key and the indexed value itself. The subdocuments of the DocKey contain the other document values.
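
Because the indexed value alone forms the key, a second row with the same value collides with the existing entry. The following Python class is a toy model of that behavior, not YugabyteDB's implementation; the class name and the primary key of the existing row are hypothetical, and the error text mirrors the message quoted earlier in this section.

```python
class UniqueIndex:
    """Toy unique index: the indexed value is the key, the primary key is the value."""

    def __init__(self, name):
        self.name = name
        self._entries = {}

    def insert(self, value, primary_key):
        existing = self._entries.get(value)
        if existing is not None and existing != primary_key:
            # A different row already owns this value.
            raise ValueError(
                f"Duplicate value disallowed by unique index {self.name}"
            )
        self._entries[value] = primary_key


index = UniqueIndex("idx_products_by_category_unq")
# Hypothetical existing row that already uses this gtin.
index.insert("006236226326", ("cat_a", "product_1", 1))

try:
    index.insert("006236226326", ("H20", "Talc 9", 62569))
except ValueError as error:
    message = str(error)
    print(message)
```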

### View the query plan for the unique index
Run the following to view the plan.
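
As a sketch of what such a check might look like, the following Python helper assembles a `ycqlsh` command with the same `-r`, `-k`, and `-e` options used by the other cells in this notebook. The helper name, the placeholder installation path, and the choice of selected columns are assumptions for illustration; the command is only built and printed here, not executed.

```python
import shlex


def build_explain_command(yb_path, db_name, statement):
    """Assemble a ycqlsh argument list that explains the given statement."""
    return [
        f"{yb_path}/bin/ycqlsh",
        "-r",
        "-k", db_name,
        "-e", f"explain {statement}",
    ]


# Hypothetical query that filters on the column covered by the unique index.
statement = (
    "select category, product_name, product_id, gtin "
    "from tbl_products_by_category where gtin = ?;"
)
# "/path/to/yugabyte" is a placeholder for the value stored in MY_YB_PATH.
command = build_explain_command("/path/to/yugabyte", "ks_ybu", statement)
print(shlex.join(command))
```

To run it against a cluster, the argument list could be passed to `subprocess.run(command, check=True)`.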

## Partial Index

A partial index contains only the rows that satisfy the where expression in the index predicate. The predicate column in each sub-expression can be an integer type, a boolean, or text. The supported operators are equal (`=`), not equal (`!=`), greater than (`>`), less than (`<`), greater than or equal to (`>=`), and less than or equal to (`<=`).

A select query can use a partial index only when the where expression of the query logically implies the index predicate. The logical implication holds if all sub-expressions of the index predicate are present in the where expression of the select query.
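
The rule about sub-expressions can be approximated with a set comparison. The sketch below is a deliberately naive, text-based model: the function names are hypothetical, it assumes sub-expressions are joined with `and`, it compares them as text after collapsing whitespace, and it does not handle the word `and` inside a string literal.

```python
import re


def sub_expressions(where_expression):
    """Split a where expression on AND and normalize whitespace in each part."""
    parts = re.split(r"\s+and\s+", where_expression.strip(), flags=re.IGNORECASE)
    return {" ".join(part.split()) for part in parts if part.strip()}


def implies(query_where, index_predicate):
    """True if every sub-expression of the index predicate appears in the query."""
    return sub_expressions(index_predicate) <= sub_expressions(query_where)


print(implies("brand = 'x' AND discount > 9", "discount > 9"))  # True
print(implies("discount > 20", "discount > 9"))                 # False
```

Under this textual reading, `discount > 20` does not match the predicate `discount > 9`, even though every such row would satisfy it.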

In [None]:
%%bash -s "$MY_YB_PATH" "$MY_DB_NAME"   # drop index
YB_PATH=${1}
DB_NAME=${2}  
cd $YB_PATH/bin

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  drop index if exists idx_products_by_category_high_discount;
"

In [None]:
%%bash -s "$MY_YB_PATH" "$MY_DB_NAME"   # create index
YB_PATH=${1}
DB_NAME=${2}  
cd $YB_PATH/bin

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  create index idx_products_by_category_high_discount
  on tbl_products_by_category
  (discount)
  include (brand, description, price) 
  where discount > 9;
"

In [None]:
%%bash -s "$MY_YB_PATH" "$MY_DB_NAME"   # describe table
YB_PATH=${1}
DB_NAME=${2}  
cd $YB_PATH/bin

# DB_NAME=ks_ybu
./ycqlsh -r -k $DB_NAME -e "
  desc tbl_products_by_category;
"


### Specific Indexes
You can create an index that covers only a particular value of a column. For instance, suppose that customers constantly search for a popular item on an ecommerce site. It may be beneficial to create an index that satisfies this query quickly and efficiently. The following cell contains an example of an index for a popular product, batteries.

In [None]:
%%bash -s "$MY_YB_PATH"  
YB_PATH=${1}
cd $YB_PATH

# Specific Index
./bin/ycqlsh --execute "
  CREATE INDEX IF NOT EXISTS idx_batteries_desc 
  ON ks_ybu.tbl_products_by_category (description) 
  INCLUDE (product_name) 
  WHERE description='Batteries';
"
# Query whose WHERE clause matches the index predicate
./bin/ycqlsh --execute "
  EXPLAIN SELECT 
    description, 
    product_name, 
    product_id, 
    category, 
    price 
  FROM ks_ybu.tbl_products_by_category 
  WHERE description='Batteries';
"  

In [None]:
%%bash -s "$MY_YB_PATH"  
YB_PATH=${1}
cd $YB_PATH

#  Seq Scan
./bin/ycqlsh --execute "
  EXPLAIN SELECT description, product_name, product_id, category, price FROM ks_ybu.tbl_products_by_category 
  WHERE description='Hotdogs';
"  

> **Important:** A query uses this index only if its WHERE clause contains the same expression as the predicate in the index creation statement. A query for a different value, such as `'Hotdogs'`, falls back to a sequential scan.

Run the following statement to verify that a query with a different expression, here a bind marker instead of the literal value, does not use the index.

In [None]:
%%bash -s "$MY_YB_PATH"  
YB_PATH=${1}
cd $YB_PATH

#  Seq Scan
./bin/ycqlsh --execute "
  EXPLAIN SELECT description, product_name, product_id, category, price FROM ks_ybu.tbl_products_by_category 
  WHERE description=?;
"  


---
# All done!
In this lab, you completed the following:

- Setup
  - Created the `ks_ybu` database with `ycqlsh`
  - Created tables and loaded data using DDL and DML scripts