# Lab Requirements and Setup

This lab consists of several Jupyter notebooks and runs in Gitpod using VS Code.  Follow the instructions for requirements and setup.

## About Jupyter notebooks
A notebook consists of one or more cells. In VS Code, notebook cells are editable.

There are two types of cells: markdown and code. This is a markdown cell.

You run a code cell by selecting the play icon in the cell's left gutter. You can also modify the code in a code cell before running it. Certain labs contain challenges or experiments that require you to do just that: modify a code cell and re-run it!

### Requirements
Here are the requirements for this lab:
- Launch using a Gitpod workspace
- Run a three-node YugabyteDB cluster using `yb-ctl`

> Note
>  
> Although a three-node cluster is up and running, Gitpod does not support visiting the other loopback addresses through a web UI, even if they are exposed on a different port.
> Only the web user interfaces on 127.0.0.1 are available. To see all available ports in Gitpod, run `gp ports list` in the terminal.

#### Notebook keyboard shortcuts
The Jupyter extension for Gitpod supports the following keyboard shortcuts:
| Keystroke | Description |
|--|--|
| ESC | Change the cell mode |
| A | Add a cell above |
| B | Add a cell below |
| J or down arrow key | Select the cell below |
| K or up arrow key | Select the cell above |
| Ctrl+Enter | Run the currently selected cell |
| Shift+Enter | Run the currently selected cell and insert a new cell immediately below (focus moves to new cell) |
| Alt+Enter | Run the currently selected cell and insert a new cell immediately below (focus remains on current cell) |
| dd | Delete a selected cell |
| z | Undo the last change | 
| M | Switch the cell type to Markdown |
| Y | Switch the cell type to code |
| L | Enable/Disable line numbers |


## Setup steps
Here are the steps to set up this lab:
- Install missing dependencies and restart the notebook
- Create the notebook variables
- Create the `db_ybu` database

### Install missing dependencies and restart the notebook
Run the following cell to ensure that the notebook dependencies are available to the notebook. 

In [6]:
! pip install cassandra-driver



> Important!
> 
> Restart the Notebook.
> 
> Do NOT skip this step.
> 
> After restarting the notebook, you can continue running notebook cells below, at **Create the notebook variables**.
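
After the restart, it can help to confirm that the driver is importable before moving on. The following is a minimal sketch and not part of the original lab: the helper name `is_installed` is hypothetical, and it relies only on the Python standard library.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the import system can locate the top-level module."""
    return importlib.util.find_spec(module_name) is not None


# "cassandra" is the top-level package that cassandra-driver installs.
print("cassandra-driver available:", is_installed("cassandra"))
```

If the check prints `False`, re-run the install cell and restart the notebook again.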


### Create the notebook variables 

> IMPORTANT!
> 
> Do NOT skip running this cell. 
> 

The following Python cell creates and stores variables that all the notebooks in this lab will use. You can view these variables in the Jupyter tab.

- To run the script, select Execute Cell (Play Arrow) in the left gutter of the cell.
- Verify the accuracy of the output values
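
The `%%bash` cells later in this notebook receive these variables as arguments, so a variable that was never created can surface as an error such as `cd: $MY_YB_PATH: No such file or directory`. As a hedged sketch of one way to check for this, the helper below reports which names are still undefined. The variable names are the ones referenced by cells in this notebook; the helper `missing_variables` itself is hypothetical and not part of the lab.

```python
# Names referenced by the cells in this notebook.
REQUIRED_VARIABLES = [
    "MY_YB_PATH",
    "MY_DB_NAME",
    "MY_HOST_IPv4_01",
    "MY_NOTEBOOK_UTILS_FOLDER",
    "MY_UTIL_FUNCTIONS_FILE",
    "MY_UTIL_YBTSERVER_METRICS_FILE",
    "MY_NOTEBOOK_DATA_FOLDER",
    "MY_DATA_DDL_FILE",
    "MY_DATA_DML_FILE",
]


def missing_variables(namespace, required=REQUIRED_VARIABLES):
    """Return the required names that are absent or empty in `namespace`."""
    return [name for name in required if not namespace.get(name)]


# In a notebook cell, globals() holds the variables defined so far.
missing = missing_variables(globals())
print("Missing variables:", missing if missing else "none")
```

An empty result suggests the variables cell ran; it does not confirm that the stored paths are correct, so still review the output values.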

### Create the `db_ybu` database with `ycqlsh`
Run the following cells to connect to the local host on the YCQL port, create a keyspace and a table with the Python driver, and then create the `db_ybu` database and list the databases.

In [7]:
import cassandra

In [8]:
from cassandra.cluster import Cluster
try:
  cluster = Cluster(['127.0.0.1'], port=9042)
  session = cluster.connect()
except Exception as e:
  print(e)

In [9]:
try:
  session.execute("""
  CREATE KEYSPACE IF NOT EXISTS ycql_demos_fun
  WITH REPLICATION = 
  { 'class': 'SimpleStrategy', 'replication_factor': 1 }"""
  )

except Exception as e:
  print(e)

In [10]:
## Adding the keyspace
try: 
  session.set_keyspace('ycql_demos_fun')

except Exception as e:
  print(e)

In [55]:
try:
  session.execute("""
  USE ycql_demos_fun;
  """)

except Exception as e:
  print(e)

In [11]:
# Create Table

query = "CREATE TABLE IF NOT EXISTS tbl_wishlists_by_user "
query = query + "(user_id INT, wishlist_id UUID, name TEXT, is_public BOOLEAN, PRIMARY KEY ((user_id), name));"
try:
  session.execute(query)
except Exception as e:
  print(e)

In [4]:
query = "DROP TABLE users "
try:
  session.execute(query)
  
except Exception as e:
  print(e)

name 'session' is not defined


In [None]:
%%bash -s "$MY_YB_PATH" "$MY_DB_NAME"  # create keyspace
YB_PATH=${1}
DB_NAME=${2}

cd "$YB_PATH"

# drop and create (YCQL uses keyspaces; ycqlsh runs a statement with -e)
./bin/ycqlsh -e "DROP KEYSPACE IF EXISTS ${DB_NAME};"
./bin/ycqlsh -e "CREATE KEYSPACE IF NOT EXISTS ${DB_NAME};"

# list keyspaces
./bin/ycqlsh -e "DESCRIBE KEYSPACES;"

In [13]:
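# DESCRIBE is a ycqlsh shell command rather than a YCQL statement,
# so read the table definition from the driver's schema metadata instead.
try:
  table = cluster.metadata.keyspaces['ycql_demos_fun'].tables['tbl_wishlists_by_user']
  print(table.export_as_string())
except Exception as e:
  print(e)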

In [15]:
%%bash -s "$MY_YB_PATH" "$MY_DB_NAME"  # \d tbl_countries

YB_PATH=${1}
DB_NAME=${2}

cd $YB_PATH


# ./bin/ysqlsh -d ${DB_NAME} -c "\d tbl_countries"
#./bin/ysqlsh -d ${DBNAME} -c "\d tbl_countries"
#./bin/ysqlsh -d ${DBNAME} -c "\d tbl_states"


bash: line 5: cd: $MY_YB_PATH: No such file or directory


CalledProcessError: Command 'b'\nYB_PATH=${1}\nDB_NAME=${2}\n\ncd $YB_PATH\n\n\n# ./bin/ysqlsh -d ${DB_NAME} -c "\\d tbl_countries"\n#./bin/ysqlsh -d ${DBNAME} -c "\\d tbl_countries"\n#./bin/ysqlsh -d ${DBNAME} -c "\\d tbl_states"\n'' returned non-zero exit status 1.

In [14]:
# Add Row

query = "INSERT INTO tbl_wishlists_by_user(user_id, wishlist_id, name, is_public) VALUES (23487, 2a70494e-6b68-4739-b3e0-ff06aa0a2d67, 'Grocery', true) USING TTL 5;"

try:
  session.execute(query)
except Exception as e:
  print(e)

Error from server: code=2200 [Invalid query] message="Datatype Mismatch
INSERT INTO tbl_wishlists_by_user(user_id, wishlist_id, name, is_public) VALUES (23487, 2a70494e-6b68-4739-b3e0-ff06aa0a2d67, 'Grocery', true) USING TTL 5;
                                                                                 ^^^^^
 (ql error -201)"


In [52]:
# ALTER Column Data Type

query = "ALTER TABLE tbl_wishlists_by_user ALTER name TYPE varchar;"

try:
  session.execute(query)
except Exception as e:
  print(e)

<Error from server: code=2000 [Syntax error in CQL query] message="Feature Not Supported
ALTER TABLE tbl_wishlists_by_user ALTER name TYPE varchar;
                                                  ^^^^^^^
 (ql error -14)">


In [None]:
%%bash -s "$MY_YB_PATH" "$MY_DB_NAME"  # create database
YB_PATH=${1}
DB_NAME=${2}

cd $YB_PATH

# drop and create
./bin/ysqlsh -d yugabyte -c "drop database if exists "${DB_NAME}";"  
./bin/ysqlsh -d yugabyte -c "create database "${DB_NAME}";" 

# list dbs
./bin/ysqlsh -d yugabyte -c "\l"

### Create Utils
For illustration purposes, this lab requires specific user defined functions and objects for the following:
- Functions for converting hash range strings to integers
- YB-TServer metrics
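
To give a sense of what such a conversion involves, here is a minimal Python sketch. It is not the lab's utility code, which is SQL loaded from a file with `ysqlsh`. It assumes that hash ranges are written as a pair of hexadecimal bounds such as `[0x5555, 0xAAAA)`, and the function name `hash_range_to_ints` is hypothetical.

```python
import re

# Matches hexadecimal literals such as 0x5555 or 0xAAAA.
_HEX = re.compile(r"0x[0-9A-Fa-f]+")


def hash_range_to_ints(text):
    """Convert a range string like '[0x5555, 0xAAAA)' to a (start, end) tuple."""
    bounds = _HEX.findall(text)
    if len(bounds) != 2:
        raise ValueError(f"expected two hex bounds, got {len(bounds)}: {text!r}")
    start, end = (int(b, 16) for b in bounds)
    return start, end


print(hash_range_to_ints("[0x5555, 0xAAAA)"))  # (21845, 43690)
```

Integer bounds are easier to compare and sort than the hexadecimal strings.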

#### Util Functions
Run the following cell to create the utility functions.

In [None]:
%%bash -s "$MY_YB_PATH" "$MY_DB_NAME" "$MY_NOTEBOOK_UTILS_FOLDER" "$MY_UTIL_FUNCTIONS_FILE"  # util functions
YB_PATH=${1}
DB_NAME=${2}
UTILS_FOLDER=${3}
UTIL_FUNCTIONS_FILE=${4}


#ls $UTIL_FOLDER
UTIL_FUNCTIONS_FILE_PATH=${UTILS_FOLDER}/${UTIL_FUNCTIONS_FILE}

cd $YB_PATH

# Functions file
./bin/ysqlsh -q -d ${DB_NAME} -f ${UTIL_FUNCTIONS_FILE_PATH} 
sleep 1;

# Describe functions
./bin/ysqlsh -d ${DB_NAME} -c "\df"

#### Util Metrics
Run the following cell to create the utility functions for the metrics report.

In [None]:
%%bash -s "$MY_YB_PATH" "$MY_DB_NAME" "$MY_NOTEBOOK_UTILS_FOLDER" "$MY_UTIL_YBTSERVER_METRICS_FILE"  # utl metrics
YB_PATH=${1}
DB_NAME=${2}
UTILS_FOLDER=${3}
UTIL_YBTSERVER_METRICS_FILE=${4}

#ls $UTIL_FOLDER

UTIL_YBTSERVER_METRICS_FILE=${UTILS_FOLDER}/${UTIL_YBTSERVER_METRICS_FILE}

cd $YB_PATH

# Metrics file
./bin/ysqlsh -d ${DB_NAME} -f ${UTIL_YBTSERVER_METRICS_FILE}
# >&/dev/null

# Describe functions
./bin/ysqlsh -d ${DB_NAME} -c "\df"

### Create tables and load data using DDL and DML scripts
In this section of the notebook, you will:
- Create tables with a DDL script
- Load data with a DML script
- Verify the creation of tables and data
- View the DDL for tbl_countries

##### Create tables, load data, and review relations
Run the following cell to execute the DDL and DML scripts using `ysqlsh`.

In [None]:
%%bash -s "$MY_YB_PATH" "$MY_DB_NAME" "$MY_NOTEBOOK_DATA_FOLDER" "$MY_DATA_DDL_FILE" "$MY_DATA_DML_FILE"   # World Cities
YB_PATH=${1}
DB_NAME=${2}
DATA_FOLDER=${3}
DATA_DDL_FILE=${4}
DATA_DML_FILE=${5}

#ls $DATA_FOLDER

WORLD_DDL_PATH=${DATA_FOLDER}/${DATA_DDL_FILE}
WORLD_DML_PATH=${DATA_FOLDER}/${DATA_DML_FILE}

cd $YB_PATH

# DDL file
./bin/ysqlsh -d ${DB_NAME} -f ${WORLD_DDL_PATH} >&/dev/null
sleep 1;

# DML file
./bin/ysqlsh -d ${DB_NAME} -f ${WORLD_DML_PATH} >&/dev/null
sleep 1;

# Describe relations
./bin/ysqlsh -d ${DB_NAME} -c "\d"

##### View DDL for tbl_countries
Run the following cell using `ysqlsh` to view a table definition.

> Note
> 
> SQL magic does not support PostgreSQL `psql` commands. In order to execute `psql` commands, the notebook uses bash and `ysqlsh`.



In [None]:
%%bash -s "$MY_YB_PATH" "$MY_DB_NAME"  # \d tbl_countries

YB_PATH=${1}
DB_NAME=${2}

cd $YB_PATH


./bin/ysqlsh -d ${DB_NAME} -c "\d tbl_countries"
#./bin/ysqlsh -d ${DB_NAME} -c "\d tbl_states"


## Connect to YugabyteDB using the PostgreSQL Driver for Python
The following cells require:
- Python 3.8+ and psycopg2

In [None]:
# Connect to db_ybu
# Inspiration from https://medium.com/analytics-vidhya/postgresql-integration-with-jupyter-notebook-deb97579a38d
import psycopg2
import sqlalchemy as alc
from sqlalchemy import create_engine

# env_var.env
db_host=MY_HOST_IPv4_01
db_name=MY_DB_NAME

connection_str='postgresql+psycopg2://yugabyte@'+db_host+':5433/'+db_name

# engine = create_engine(connection_str)
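
The string concatenation in the cell above can also be wrapped in a small helper that percent-encodes the user and database name. This is a sketch under assumptions: the function name `build_connection_str` and the sample host and database values are illustrative only, while the `yugabyte` user and port 5433 are taken from the cell above.

```python
from urllib.parse import quote


def build_connection_str(host, db_name, user="yugabyte", port=5433):
    """Build a SQLAlchemy-style URL for the psycopg2 driver."""
    # Percent-encode the user and database name so special characters are safe.
    return (
        f"postgresql+psycopg2://{quote(user, safe='')}"
        f"@{host}:{port}/{quote(db_name, safe='')}"
    )


# Sample values for illustration only.
print(build_connection_str("127.0.0.1", "db_ybu"))
# postgresql+psycopg2://yugabyte@127.0.0.1:5433/db_ybu
```

In the notebook, the stored `MY_HOST_IPv4_01` and `MY_DB_NAME` variables would take the place of the sample values.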

#### Load SQL magic extension
>IMPORTANT!
>
> To use SQL magic, you must run the following cell that loads the notebook extension.

In [None]:
%reload_ext sql
# creates connection for sql magic
%sql {connection_str}

#### Show table row counts
Run the cell below to view the row counts for the tables.

In [None]:
%%sql /* row counts */

select '' _
    , v1.name
    , v1.counts
from (
    select 'tbl_cities' as name, count(*) as counts from tbl_cities
    union 
    select 'tbl_cities_name_alt_null' as name, count(*) 
    from tbl_cities
    where 1=1
    and city_name_alt IS NULL
    union
    select 'tbl_states' as name,count(*) from tbl_states
    union 
    select 'tbl_countries' as name, count(*) from tbl_countries
    union
    select 'tbl_country_name_alt_null' as name, count(*) 
    from tbl_countries 
    where 1=1
    and country_name_alt IS NULL
    ) v1
order by v1.counts desc;

---
# All done!
In this lab, you completed the following:

- Setup
  - Created the `db_ybu` database with `ysqlsh`
  - Created utils
  - Created tables and loaded data using DDL and DML scripts
  - Connected to the database using a PostgreSQL driver for Python

Next, run the following cell to open `02_Demystifying_table_sharding_tablets_and_data_distribution.ipynb`.

In [None]:
%%bash
gp open 02_Demystifying_table_sharding_tablets_and_data_distribution.ipynb

In [None]:
from cassandra.cluster import Cluster
from cassandra.policies import WhiteListRoundRobinPolicy

lbp = WhiteListRoundRobinPolicy(['70.191.42.158'])
cluster = Cluster( contact_points=['70.191.42.158'], load_balancing_policy=lbp )
session = cluster.connect()

  cluster = Cluster( contact_points=['70.191.42.158'], load_balancing_policy=lbp )


NoHostAvailable: ('Unable to connect to any servers', {'70.191.42.158:9042': OSError(None, "Tried connecting to [('70.191.42.158', 9042)]. Last error: timed out")})