# Lab Requirements and Setup

This lab consists of several Jupyter notebooks, each containing one or more cells. The cells have specific requirements, so complete the requirements and setup steps below before running them.

### Requirements
Here are the requirements for this lab:
- Run the notebooks in a Gitpod workspace.
- A three-node cluster, started with `yb-ctl`, is up and running. Gitpod does not support all loopback addresses, so only 127.0.0.1 exposes the web user interfaces. To list the available ports, run `gp ports list` in the terminal.
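
To check which web interfaces are reachable from the workspace, a small probe such as the following may help. It is a minimal sketch: the helper name `is_port_open` is hypothetical, and the ports in the usage section (7000 for YB-Master, 9000 for YB-TServer) are assumed defaults, so adjust them to match the output of `gp ports list`.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumed default web UI ports: 7000 (YB-Master) and 9000 (YB-TServer).
    for port in (7000, 9000):
        state = "reachable" if is_port_open("127.0.0.1", port) else "not reachable"
        print(f"127.0.0.1:{port} is {state}")
```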

#### Notebook keyboard shortcuts
| Keystroke | Description |
|--|--|
| ESC | Change the cell mode |
| A | Add a cell above |
| B | Add a cell below |
| J or down arrow key | Move the selection to the cell below |
| K or up arrow key | Move the selection to the cell above |
| Ctrl+Enter | Run the currently selected cell |
| Shift+Enter | Run the currently selected cell and insert a new cell immediately below (focus moves to new cell) |
| Alt+Enter | Run the currently selected cell and insert a new cell immediately below (focus remains on current cell) |
| dd | Delete a selected cell |
| z | Undo the last change | 
| M | Switch the cell type to Markdown |
| Y | Switch the cell type to code |
| L | Enable/Disable line numbers |


## Setup steps
Here are the steps to set up this lab:
- Create the notebook variables
- Create the `db_ybu` database

### Create the notebook variables 

> IMPORTANT!
> 
> Do NOT skip editing and running this cell. 
> 


The following Python cell creates and stores variables that all the notebooks will use. You can view these variables in the Jupyter tab.

- To run the script, select Execute Cell (Play Arrow in the cell side bar).
- Verify the accuracy of the output values

In [None]:
# Env variables for Notebook
import os

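# Read env_vars.env (the IPython shell capture returns a list of lines)
env_vars = !cat env_vars.env
for var in env_vars:
    var = var.strip()
    if not var or var.startswith('#'):
        continue  # skip blank lines and comments
    # Split on the first '=' only, so values may contain '='
    key, value = var.split('=', 1)
    os.environ[key] = value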
 
# env_vars defines the following
MY_DB_NAME=os.environ.get('MY_DB_NAME')
MY_YB_PATH=os.environ.get('MY_YB_PATH')
MY_HOST_IPv4_01=os.environ.get('MY_HOST_IPv4_01')
MY_HOST_IPv4_02=os.environ.get('MY_HOST_IPv4_02')
MY_HOST_IPv4_03=os.environ.get('MY_HOST_IPv4_03')
MY_TSERVER_WEBSERVER_PORT=os.environ.get('MY_TSERVER_WEBSERVER_PORT')
MY_DATA_DDL_FILE=os.environ.get("MY_DATA_DDL_FILE")
MY_DATA_DML_FILE=os.environ.get("MY_DATA_DML_FILE")
MY_UTIL_FUNCTIONS_FILE=os.environ.get("MY_UTIL_FUNCTIONS_FILE")
MY_UTIL_YBTSERVER_METRICS_FILE=os.environ.get("MY_UTIL_YBTSERVER_METRICS_FILE")

# Current directory of project and related child folders
MY_NOTEBOOK_DIR=os.getcwd()
MY_NOTEBOOK_DATA_FOLDER=MY_NOTEBOOK_DIR +'/data'
MY_NOTEBOOK_UTILS_FOLDER=MY_NOTEBOOK_DIR + '/utils'

# Store the notebook values for other notebooks to use
%store MY_DB_NAME
%store MY_YB_PATH
%store MY_HOST_IPv4_01
%store MY_HOST_IPv4_02
%store MY_HOST_IPv4_03
%store MY_NOTEBOOK_DIR
%store MY_TSERVER_WEBSERVER_PORT
%store MY_NOTEBOOK_DATA_FOLDER
%store MY_NOTEBOOK_UTILS_FOLDER
%store MY_DATA_DDL_FILE
%store MY_DATA_DML_FILE
%store MY_UTIL_FUNCTIONS_FILE
%store MY_UTIL_YBTSERVER_METRICS_FILE
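
The `KEY=VALUE` parsing in the cell above can also be written as a plain Python function, which makes it easier to test outside the notebook. This is a minimal sketch: `parse_env_lines` is a hypothetical helper, not part of the lab files, and it assumes `env_vars.env` holds one `KEY=VALUE` pair per line with optional blank lines and `#` comments.

```python
def parse_env_lines(lines):
    """Parse KEY=VALUE lines into a dict, skipping blanks and # comments."""
    result = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"Expected KEY=VALUE, got: {raw!r}")
        result[key.strip()] = value.strip()
    return result


# Hypothetical sample content; the real values come from env_vars.env.
sample = ["# comment", "", "MY_DB_NAME=db_ybu", "MY_OPTS=a=b"]
print(parse_env_lines(sample))
```

Raising on a line without `=` surfaces a malformed file immediately, rather than as a missing variable in a later notebook.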

### Create the `db_ybu` database with `ysqlsh`
- With `ysqlsh`, connect to the local host
- Create the `db_ybu` database
- List the databases

In [None]:
%%bash -s "$MY_YB_PATH" "$MY_DB_NAME"  # create database
YB_PATH=${1}
DB_NAME=${2}

cd $YB_PATH

# drop and create
./bin/ysqlsh -d yugabyte -c "drop database if exists ${DB_NAME};"
./bin/ysqlsh -d yugabyte -c "create database ${DB_NAME};"

# list dbs
./bin/ysqlsh -d yugabyte -c "\l"
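
Because the database name is spliced into a SQL statement, it can help to quote it as an identifier before passing it to `ysqlsh`. The following is a minimal sketch, not the lab's method: `quote_ident` and `build_ysqlsh_args` are hypothetical helpers, the install path is a placeholder, and only the `-d` and `-c` options already used in the cell above are assumed.

```python
import os


def quote_ident(name: str) -> str:
    """Quote a SQL identifier by doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_ysqlsh_args(yb_path: str, sql: str, database: str = "yugabyte"):
    """Build an argument list for ysqlsh, suitable for subprocess.run."""
    return [os.path.join(yb_path, "bin", "ysqlsh"), "-d", database, "-c", sql]


# Placeholder install path, for illustration only.
db_name = "db_ybu"
args = build_ysqlsh_args("/path/to/yugabyte", f"create database {quote_ident(db_name)};")
print(args)
# To execute for real: subprocess.run(args, check=True)
```

Quoted identifiers are case-sensitive, so this matches the unquoted form only for lowercase names such as `db_ybu`.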

### Create Utils
For illustration purposes, this lab requires specific user-defined functions and objects for the following:
- Functions for converting hash range strings to integers
- YB-TServer metrics
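
As an illustration of what converting a hash range string to integers involves, here is a minimal Python sketch. It is not the lab's SQL function, which lives in the utils file. The input format `hash_split: [0x0000, 0x5555)` is an assumption about how tablet partition ranges are displayed, and `parse_hash_range` is a hypothetical name.

```python
import re

# Matches hexadecimal bounds such as 0x0000 and 0x5555.
_HEX_BOUND = re.compile(r"0x([0-9A-Fa-f]+)")


def parse_hash_range(text: str):
    """Return (start, end) as integers from a string with two hex bounds."""
    bounds = _HEX_BOUND.findall(text)
    if len(bounds) != 2:
        raise ValueError(f"Expected two hex bounds, got: {text!r}")
    start, end = (int(b, 16) for b in bounds)
    return start, end


print(parse_hash_range("hash_split: [0x0000, 0x5555)"))  # (0, 21845)
```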

#### Util Functions

In [None]:
%%bash -s "$MY_YB_PATH" "$MY_DB_NAME" "$MY_NOTEBOOK_UTILS_FOLDER" "$MY_UTIL_FUNCTIONS_FILE"  # util functions
YB_PATH=${1}
DB_NAME=${2}
UTILS_FOLDER=${3}
UTIL_FUNCTIONS_FILE=${4}


#ls $UTILS_FOLDER
UTIL_FUNCTIONS_FILE_PATH=${UTILS_FOLDER}/${UTIL_FUNCTIONS_FILE}

cd $YB_PATH

# Functions file
./bin/ysqlsh -d ${DB_NAME} -f ${UTIL_FUNCTIONS_FILE_PATH} 
sleep 1;

# Describe functions
./bin/ysqlsh -d ${DB_NAME} -c "\df"

#### Util Metrics

In [3]:
%%bash -s "$MY_YB_PATH" "$MY_DB_NAME" "$MY_NOTEBOOK_UTILS_FOLDER" "$MY_UTIL_YBTSERVER_METRICS_FILE"  # util metrics
YB_PATH=${1}
DB_NAME=${2}
UTILS_FOLDER=${3}
UTIL_YBTSERVER_METRICS_FILE=${4}

#ls $UTILS_FOLDER

UTIL_YBTSERVER_METRICS_FILE_PATH=${UTILS_FOLDER}/${UTIL_YBTSERVER_METRICS_FILE}

cd $YB_PATH

# Metrics file
./bin/ysqlsh -d ${DB_NAME} -f ${UTIL_YBTSERVER_METRICS_FILE_PATH}
sleep 1;

# Describe relations
./bin/ysqlsh -d ${DB_NAME} -c "\d"

CREATE EXTENSION
DROP TABLE
CREATE TABLE
DROP FUNCTION
CREATE FUNCTION
CREATE VIEW
CREATE VIEW
CREATE VIEW
CREATE VIEW
CREATE VIEW
CREATE VIEW
DROP FUNCTION
CREATE FUNCTION
                               List of relations
 Schema |                        Name                        | Type  |  Owner   
--------+----------------------------------------------------+-------+----------
 public | tbl_cities                                         | table | yugabyte
 public | tbl_countries                                      | table | yugabyte
 public | tbl_no_pk                                          | table | yugabyte
 public | tbl_states                                         | table | yugabyte
 public | tbl_yb_tserver_metrics_snapshots                   | table | yugabyte
 public | vw_yb_tserver_metrics_last                         | view  | yugabyte
 public | vw_yb_tserver_metrics_report                       | view  | yugabyte
 public | vw_yb_tserver_metrics_snap_and_show_tablet_loa

ysqlsh:/Users/seth/Documents/GitHub/ybu/101_CodeSnippets/Module_Anatomy_of_an_index/utils/util_ybtserver_metrics.sql:3: NOTICE:  extension "tablefunc" already exists, skipping
ysqlsh:/Users/seth/Documents/GitHub/ybu/101_CodeSnippets/Module_Anatomy_of_an_index/utils/util_ybtserver_metrics.sql:6: NOTICE:  drop cascades to 5 other objects
DETAIL:  drop cascades to view vw_yb_tserver_metrics_last
drop cascades to view vw_yb_tserver_metrics_report
drop cascades to view vw_yb_tserver_metrics_snap_and_show_tablet_load
drop cascades to view vw_yb_tserver_metrics_snapshot_tablets
drop cascades to view vw_yb_tserver_metrics_snapshot_tablets_metrics


### Create tables and load data using DDL and DML scripts
- Create tables with a DDL script
- Load data with a DML script
- Verify the creation of tables and data
- View the DDL for tbl_countries

##### Create tables, load data, and review relations

In [None]:
%%bash -s "$MY_YB_PATH" "$MY_DB_NAME" "$MY_NOTEBOOK_DATA_FOLDER" "$MY_DATA_DDL_FILE" "$MY_DATA_DML_FILE"   # World Cities
YB_PATH=${1}
DB_NAME=${2}
DATA_FOLDER=${3}
DATA_DDL_FILE=${4}
DATA_DML_FILE=${5}

#ls $DATA_FOLDER

WORLD_DDL_PATH=${DATA_FOLDER}/${DATA_DDL_FILE}
WORLD_DML_PATH=${DATA_FOLDER}/${DATA_DML_FILE}

cd $YB_PATH

# DDL file
./bin/ysqlsh -d ${DB_NAME} -f ${WORLD_DDL_PATH} 
sleep 1;

# DML file
./bin/ysqlsh -d ${DB_NAME} -f ${WORLD_DML_PATH} 
sleep 1;

# Describe relations
./bin/ysqlsh -d ${DB_NAME} -c "\d"
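
Before running the DDL and DML scripts, it can be useful to confirm that both files exist, since a wrong folder or file name otherwise surfaces only as a `ysqlsh` error. The following is a minimal sketch; `resolve_scripts` is a hypothetical helper and not part of the lab files.

```python
from pathlib import Path


def resolve_scripts(folder, file_names):
    """Return full paths for file_names in folder, in order; fail if any is missing."""
    paths = [Path(folder) / name for name in file_names]
    missing = [str(p) for p in paths if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"Missing script(s): {', '.join(missing)}")
    return paths
```

In the notebook, this could be called as `resolve_scripts(MY_NOTEBOOK_DATA_FOLDER, [MY_DATA_DDL_FILE, MY_DATA_DML_FILE])`, keeping the DDL file first so that tables exist before the data load.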

##### View DDL for tbl_countries

- This step uses a bash cell because `\d` is a `ysqlsh` meta-command that the SQL magic cell does not run.
- The cell runs `\d tbl_countries` to describe the table.

In [None]:
%%bash -s "$MY_YB_PATH" "$MY_DB_NAME"  # \d tbl_countries

YB_PATH=${1}
DB_NAME=${2}

cd $YB_PATH


./bin/ysqlsh -d ${DB_NAME} -c "\d tbl_countries"
#./bin/ysqlsh -d ${DB_NAME} -c "\d tbl_states"


## Connect to YugabyteDB using the PostgreSQL Driver for Python
The following cell requires:
- Python 3.7.9, psycopg2, and SQLAlchemy


In [None]:
# Connect to db_ybu
# Inspiration from https://medium.com/analytics-vidhya/postgresql-integration-with-jupyter-notebook-deb97579a38d
import psycopg2  # driver used by the postgresql+psycopg2 dialect
from sqlalchemy import create_engine

# Host and database name come from env_vars.env
db_host=MY_HOST_IPv4_01
db_name=MY_DB_NAME

connection_str='postgresql+psycopg2://yugabyte@'+db_host+':5433/'+db_name

engine = create_engine(connection_str)
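
String concatenation works for the simple values used here, but credentials containing characters such as `@` need percent-encoding in a connection URL. The following is a minimal sketch using only the standard library. `build_connection_url` is a hypothetical helper, the optional `password` parameter is an assumption (the lab connects without one), and the defaults mirror the cell above.

```python
from urllib.parse import quote


def build_connection_url(host, database, user="yugabyte", password=None, port=5433):
    """Build a postgresql+psycopg2 URL, percent-encoding the credentials."""
    credentials = quote(user, safe="")
    if password is not None:
        credentials += ":" + quote(password, safe="")
    return f"postgresql+psycopg2://{credentials}@{host}:{port}/{database}"


print(build_connection_url("127.0.0.1", "db_ybu"))
# postgresql+psycopg2://yugabyte@127.0.0.1:5433/db_ybu
```

The returned string can be passed to `create_engine` in the same way as `connection_str`.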

#### Load SQL magic extension
>IMPORTANT!
>
> To use SQL magic, you must run the following cell that loads the notebook extension.

In [None]:
%reload_ext sql
# Connect SQL magic to the database using the connection string
%sql {connection_str}

#### Show table row counts

In [None]:
%%sql /* row counts */

select '' _
    , v1.name
    , v1.counts
from (
    select 'tbl_cities' as name, count(*) as counts from tbl_cities
    union 
    select 'tbl_cities_name_alt_null' as name, count(*) 
    from tbl_cities
    where 1=1
    and city_name_alt IS NULL
    union
    select 'tbl_states' as name,count(*) from tbl_states
    union 
    select 'tbl_countries' as name, count(*) from tbl_countries
    union
    select 'tbl_country_name_alt_null' as name, count(*) 
    from tbl_countries 
    where 1=1
    and country_name_alt IS NULL
    ) v1
order by v1.counts desc;
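
The row-count query follows a simple pattern: one `select` per table, combined and sorted. The sketch below generates that pattern from a list of table names and runs it against an in-memory SQLite database from the Python standard library, which is a stand-in for illustration and not YugabyteDB. The helper `build_row_count_query` and the two sample tables are hypothetical, and the sketch uses `union all` rather than `union` because the labels already make each row distinct.

```python
import sqlite3


def build_row_count_query(table_names):
    """Build one query returning (name, counts) per table, largest first.

    Table names are interpolated directly, so pass trusted names only.
    """
    parts = [
        f"select '{name}' as name, count(*) as counts from {name}"
        for name in table_names
    ]
    return "\nunion all\n".join(parts) + "\norder by counts desc"


# Stand-in tables so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute("create table tbl_a (id integer)")
conn.execute("create table tbl_b (id integer)")
conn.executemany("insert into tbl_a values (?)", [(1,), (2,), (3,)])
conn.execute("insert into tbl_b values (1)")

rows = conn.execute(build_row_count_query(["tbl_a", "tbl_b"])).fetchall()
print(rows)  # [('tbl_a', 3), ('tbl_b', 1)]
```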

---
# That's It!
In this lab, you completed the following:

- Setup
  - Created the `db_ybu` database with `ysqlsh`
  - Created utils
  - Created tables and loaded data using DDL and DML scripts
  - Connected to the database using a PostgreSQL driver for Python
