## Setup steps
Here are the steps to set up this lab:
- Install missing dependencies and restart the notebook
- Create the notebook variables
- Create loopback IP addresses
- Spin up the cluster locally
- Create the `db_ybu` database

### Install missing dependencies and restart the notebook
Run the following cell to ensure that the required dependencies are available to the notebook.

In [None]:
!pip install ipython-sql
!pip3 install psycopg2-binary==2.8.6
!pip install sqlalchemy

### Create the notebook variables 

> IMPORTANT!
> 
> Do NOT skip running this cell. 
> 

The following Python cell creates and stores variables that all the notebooks in this lab will use. You can view these variables in the Jupyter tab.

- To run the script, select Execute Cell (Play Arrow) in the left gutter of the cell.
- Verify the accuracy of the output values

In [1]:
# Connect using Python 3.7.9+
import psycopg2
import sqlalchemy as alc
from sqlalchemy import create_engine

# Inspiration from https://medium.com/analytics-vidhya/postgresql-integration-with-jupyter-notebook-deb97579a38d
# Use %store -r to read 01_Lab_Requirements_Setup variables

%store -r MY_DB_NAME
%store -r MY_YB_PATH
%store -r MY_HOST_IPv4_01
%store -r MY_HOST_IPv4_02
%store -r MY_HOST_IPv4_03
%store -r MY_GITPOD_WORKSPACE_URL

%store -r MY_NOTEBOOK_DATA_FOLDER
%store -r MY_NOTEBOOK_UTILS_FOLDER

%store -r MY_DATA_DDL_FILE_1
%store -r MY_DATA_DML_FILE_1
%store -r MY_DATA_DDL_FILE_2
%store -r MY_DATA_DML_FILE_2
%store -r MY_DATA_DDL_FILE_3
%store -r MY_DATA_DML_FILE_3

db_host=MY_HOST_IPv4_01
db_name=MY_DB_NAME


connection_str='postgresql+psycopg2://yugabyte@'+db_host+':5433/'+db_name
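
The cell above assembles the connection string by concatenation. As a minimal sketch, assuming the same `yugabyte` user, port `5433`, and no password, a small helper can build the same URL and percent-encode the user and database name in case either contains special characters. The helper name `build_connection_str` is hypothetical and is not part of the lab.

```python
from urllib.parse import quote


def build_connection_str(host, db_name, user="yugabyte", port=5433):
    """Return a SQLAlchemy-style URL for the psycopg2 driver (hypothetical helper)."""
    # Percent-encode the user and database name so characters such as '@' or '/'
    # cannot change the structure of the URL.
    return (
        "postgresql+psycopg2://"
        f"{quote(user, safe='')}@{host}:{port}/{quote(db_name, safe='')}"
    )


# Example values taken from the notebook output (127.0.0.1 and db_ybu).
print(build_connection_str("127.0.0.1", "db_ybu"))
```

With the lab's stored variables, the call would be `build_connection_str(MY_HOST_IPv4_01, MY_DB_NAME)`.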

#### Connect to YugabyteDB using the PostgreSQL Driver for Python
The following cells require:
- Python 3.8+ and psycopg2

##### Create tables and load data using DDL and DML scripts
In this section of the notebook, you will:
- Create tables with a DDL script
- Load data with a DML script
- Verify the creation of tables and data
- View the DDL for `order_changes`

##### Create tables, load data, and review relations
Run the following cell to execute the DDL and DML scripts using `ysqlsh`.

In [2]:
%%bash -s "$MY_YB_PATH" "$MY_DB_NAME" "$MY_NOTEBOOK_DATA_FOLDER" "$MY_DATA_DDL_FILE_3" "$MY_DATA_DML_FILE_3"   # order_changes
YB_PATH=${1}
DB_NAME=${2}
DATA_FOLDER=${3}
DATA_DDL_FILE=${4}
DATA_DML_FILE=${5}

ORDER_DDL_PATH=${DATA_FOLDER}/${DATA_DDL_FILE}
ORDER_DML_PATH=${DATA_FOLDER}/${DATA_DML_FILE}
echo $ORDER_DDL_PATH
echo $ORDER_DML_PATH

cd $YB_PATH

# DDL file
./bin/ysqlsh -d ${DB_NAME} -f ${ORDER_DDL_PATH} >&/dev/null
sleep 1;

# DML file
./bin/ysqlsh -d ${DB_NAME} -f ${ORDER_DML_PATH} >&/dev/null
sleep 1;

# Describe relations
./bin/ysqlsh -d ${DB_NAME} -c "\d"

/Users/seth/Documents/GitHub/YugabyteDB-University/YSQL-LP/data/orders_ddl.sql
/Users/seth/Documents/GitHub/YugabyteDB-University/YSQL-LP/data/orders_dml.sql
                                    List of relations
 Schema |                      Name                       |       Type        |  Owner   
--------+-------------------------------------------------+-------------------+----------
 public | mvw_report_sal_per_dept                         | materialized view | yugabyte
 public | order_changes                                   | table             | yugabyte
 public | order_changes_2022_02                           | table             | yugabyte
 public | order_changes_2022_03                           | table             | yugabyte
 public | order_changes_default                           | table             | yugabyte
 public | tbl_cities                                      | table             | yugabyte
 public | tbl_countries                                   | table         

##### View DDL for Table partitions
Run the following cell using `ysqlsh` to view a table definition.

> Note
> 
> SQL magic does not support `psql` meta-commands such as `\d`. To run these commands, the notebook uses bash and `ysqlsh`.



In [4]:
%%bash -s "$MY_YB_PATH" "$MY_DB_NAME"  

YB_PATH=${1}
DB_NAME=${2}

cd $YB_PATH

# ./bin/ysqlsh -d ${DB_NAME} -c "\dt"
# ./bin/ysqlsh -d ${DB_NAME} -c "\d order_changes"
./bin/ysqlsh -d ${DB_NAME} -c "\d order_changes_2022_02"
# ./bin/ysqlsh -d ${DB_NAME} -c "\d order_changes_2022_03"
# ./bin/ysqlsh -d ${DB_NAME} -c "\d order_changes_default"

          Table "public.order_changes_2022_02"
   Column    |  Type   | Collation | Nullable | Default 
-------------+---------+-----------+----------+---------
 user_id     | integer |           | not null | 
 account_id  | integer |           | not null | 
 change_date | date    |           | not null | 
 description | text    |           |          | 
Partition of: order_changes FOR VALUES FROM ('2022-02-01') TO ('2022-03-01')
Indexes:
    "order_changes_2022_02_pkey" PRIMARY KEY, lsm (user_id HASH, account_id ASC, change_date ASC)


##### Set Autocommit

Set autocommit to true so that each statement runs outside an explicit transaction block. Otherwise, the `CREATE TABLESPACE` statements later in this lab fail with a transaction block error.

In [5]:
%config SqlMagic.autocommit=True

In [6]:
# Connect to db_ybu
# Inspiration from https://medium.com/analytics-vidhya/postgresql-integration-with-jupyter-notebook-deb97579a38d
import psycopg2
import sqlalchemy as alc
from sqlalchemy import create_engine

# env_var.env
db_host=MY_HOST_IPv4_01
db_name=MY_DB_NAME

connection_str='postgresql+psycopg2://yugabyte@'+db_host+':5433/'+db_name

# engine = create_engine(connection_str)

#### Load SQL magic extension
> IMPORTANT!
>
> To use SQL magic, you must run the following cell that loads the notebook extension.

In [7]:
%reload_ext sql
# creates connection for sql magic
%sql {connection_str}

#### Show partition rows
Run the cell below to view the rows stored in the `order_changes_2022_03` partition.

To query the parent table or a different partition, uncomment the corresponding `SELECT` statement and comment out the others.

In [9]:
%%sql

-- SELECT * FROM order_changes
SELECT * FROM order_changes_2022_03 
-- SELECT * FROM order_changes_2022_02 
-- SELECT * FROM order_changes_default

 * postgresql+psycopg2://yugabyte@127.0.0.1:5433/db_ybu
1 rows affected.


user_id,account_id,change_date,description
2,2002,2022-03-05,replace shower head
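
The row above has a `change_date` of 2022-03-05, which is why it is stored in `order_changes_2022_03`. The following sketch models that range routing in plain Python. The bounds for `order_changes_2022_02` come from the `\d` output earlier in this notebook. The bounds for `order_changes_2022_03` and the role of `order_changes_default` as the default partition are assumptions based on the table names, and the function name is hypothetical. In PostgreSQL-style range partitioning, the `FROM` bound is inclusive and the `TO` bound is exclusive.

```python
from datetime import date

# (partition name, inclusive lower bound, exclusive upper bound)
RANGE_PARTITIONS = [
    ("order_changes_2022_02", date(2022, 2, 1), date(2022, 3, 1)),
    # Assumed bounds: the notebook does not show the DDL for this partition.
    ("order_changes_2022_03", date(2022, 3, 1), date(2022, 4, 1)),
]
# Assumed to be declared as the DEFAULT partition of order_changes.
DEFAULT_PARTITION = "order_changes_default"


def route_change_date(change_date):
    """Return the partition that would store a row with this change_date."""
    for name, lower, upper in RANGE_PARTITIONS:
        if lower <= change_date < upper:
            return name
    # Dates outside every range fall through to the default partition.
    return DEFAULT_PARTITION


print(route_change_date(date(2022, 3, 5)))    # order_changes_2022_03
print(route_change_date(date(2022, 3, 1)))    # order_changes_2022_03 (lower bound is inclusive)
print(route_change_date(date(2021, 12, 31)))  # order_changes_default
```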


#### Indexed Relations

By creating an index on the partitioned or parent table, a matching index is also created on any partitions that exist now or in the future. An index or unique constraint declared on a partitioned table is “virtual” in the same way that the partitioned table is: the actual data is in child indexes on the individual partition tables.

In [10]:
%%sql

CREATE INDEX ON order_changes (change_date)

 * postgresql+psycopg2://yugabyte@127.0.0.1:5433/db_ybu
Done.


[]

#### Partition Maintenance

It is common for a partitioned table to have a dynamic set of partitions. Partitions are frequently dropped to dispose of old data and created to hold new data.

Partitions simplify the removal of old data with the following command:

In [None]:
%%sql

DROP TABLE order_changes_2022_02;

Alternatively, a partition can be removed from the partitioned table while its data remains accessible, in case a report or aggregation of that data is still needed.

This is done by detaching the partition from the partitioned table, for example with `ALTER TABLE order_changes DETACH PARTITION order_changes_2022_02;`.
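
As a minimal sketch, the detach statement for this lab's tables can be generated with a small helper. The helper names `quote_ident` and `detach_partition_sql` are hypothetical. The statement itself uses the standard PostgreSQL `ALTER TABLE ... DETACH PARTITION` form, and running it against `order_changes_2022_02` assumes that the partition was not already dropped by the previous cell.

```python
def quote_ident(name):
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def detach_partition_sql(parent, partition):
    """Build an ALTER TABLE ... DETACH PARTITION statement (hypothetical helper)."""
    return (
        f"ALTER TABLE {quote_ident(parent)} "
        f"DETACH PARTITION {quote_ident(partition)};"
    )


statement = detach_partition_sql("order_changes", "order_changes_2022_02")
print(statement)
# ALTER TABLE "order_changes" DETACH PARTITION "order_changes_2022_02";
```

The printed statement can be pasted into a `%%sql` cell. After the detach, `order_changes_2022_02` is a standalone table that can still be queried directly.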

## Tablespaces and Geo Row Partitioning

In [11]:
%%sql

CREATE TABLE transactions (
  user_id       INT NOT NULL,
  account_id    INT NOT NULL,
  geo_partition TEXT,
  account_type  TEXT NOT NULL,
  amount        NUMERIC NOT NULL,
  created_at    TIMESTAMP DEFAULT NOW()
) PARTITION BY LIST (geo_partition)


 * postgresql+psycopg2://yugabyte@127.0.0.1:5433/db_ybu
Done.


[]

## Tablespaces

A tablespace defines where the data of the tables assigned to it is placed. In this example, each tablespace places its data on a node in a particular cloud, region, and zone.

In [12]:
%%sql
CREATE TABLESPACE tblspace_us WITH (replica_placement='{"num_replicas": 1, "placement_blocks": [{"cloud": "cloud1", "region": "region1", "zone": "zone1", "min_num_replicas": 1}]}'
)

 * postgresql+psycopg2://yugabyte@127.0.0.1:5433/db_ybu
Done.


[]

In [13]:
%%sql

CREATE TABLESPACE tblspace_eu WITH (replica_placement='{"num_replicas": 1, "placement_blocks": [{"cloud": "cloud2", "region": "region2", "zone": "zone2", "min_num_replicas": 1}]}'
)

 * postgresql+psycopg2://yugabyte@127.0.0.1:5433/db_ybu
Done.


[]

In [14]:
%%sql

CREATE TABLESPACE tblspace_ap WITH (replica_placement='{"num_replicas": 1, "placement_blocks": [{"cloud": "cloud3", "region": "region3", "zone": "zone3", "min_num_replicas": 1}]}'
)

 * postgresql+psycopg2://yugabyte@127.0.0.1:5433/db_ybu
Done.


[]
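
The three `CREATE TABLESPACE` statements above differ only in the tablespace name and the placement values. As a hedged sketch, the statements could be generated from those values. The helper name `create_tablespace_sql` is hypothetical, and the sketch assumes a single placement block per tablespace and values that contain no single quotes.

```python
import json


def create_tablespace_sql(name, cloud, region, zone, num_replicas=1):
    """Build a CREATE TABLESPACE statement with a replica_placement option."""
    placement = {
        "num_replicas": num_replicas,
        "placement_blocks": [
            {
                "cloud": cloud,
                "region": region,
                "zone": zone,
                "min_num_replicas": num_replicas,
            }
        ],
    }
    # json.dumps emits double-quoted keys and values, so the JSON can sit inside
    # a single-quoted SQL literal (assuming no value contains a single quote).
    return (
        f"CREATE TABLESPACE {name} "
        f"WITH (replica_placement='{json.dumps(placement)}');"
    )


for name, n in [("tblspace_us", 1), ("tblspace_eu", 2), ("tblspace_ap", 3)]:
    print(create_tablespace_sql(name, f"cloud{n}", f"region{n}", f"zone{n}"))
```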

In [15]:
%%bash -s "$MY_YB_PATH" "$MY_DB_NAME"  

YB_PATH=${1}
DB_NAME=${2}

cd $YB_PATH

./bin/ysqlsh -d ${DB_NAME} -c "\db+"

                                                                                                                  List of tablespaces
    Name     |  Owner   | Location | Access privileges |                                                                               Options                                                                               |   Size    | Description 
-------------+----------+----------+-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+-------------
 pg_default  | postgres |          |                   |                                                                                                                                                                     | 608 bytes | 
 pg_global   | postgres |          |                   |                                                                              

#### Create Table Partitions

Each partition determines which rows it holds based on the value of `geo_partition`. Because the parent table is partitioned by LIST, not RANGE, a row is assigned to a partition only if its `geo_partition` value appears in that partition's list of values.

In [16]:
%%sql /* Table Reads */

CREATE TABLE transactions_us PARTITION OF transactions
    (user_id, account_id, geo_partition, account_type, amount, created_at,
    PRIMARY KEY (user_id HASH, account_id, geo_partition))
  FOR VALUES IN ('US') TABLESPACE tblspace_us

 * postgresql+psycopg2://yugabyte@127.0.0.1:5433/db_ybu
Done.


[]

In [17]:
%%sql /* Table Reads */

CREATE TABLE transactions_eu PARTITION OF transactions
    (user_id, account_id, geo_partition, account_type, amount, created_at,
    PRIMARY KEY (user_id HASH, account_id, geo_partition))
  FOR VALUES IN ('EU') TABLESPACE tblspace_eu

 * postgresql+psycopg2://yugabyte@127.0.0.1:5433/db_ybu
Done.


[]

In [18]:
%%sql /* Table Reads */

CREATE TABLE transactions_ap PARTITION OF transactions
    (user_id, account_id, geo_partition, account_type, amount, created_at,
    PRIMARY KEY (user_id HASH, account_id, geo_partition))
  FOR VALUES IN ('India') TABLESPACE tblspace_ap

 * postgresql+psycopg2://yugabyte@127.0.0.1:5433/db_ybu
Done.


[]
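
To make the LIST routing concrete, here is a minimal Python model of the three partitions created above. The dictionary and function names are hypothetical. The sketch assumes that no `DEFAULT` partition exists for `transactions`, in which case a row with an unlisted value is rejected instead of being stored.

```python
# Values accepted by each partition, as declared in the FOR VALUES IN clauses.
LIST_PARTITIONS = {
    "US": "transactions_us",
    "EU": "transactions_eu",
    "India": "transactions_ap",
}


def route_geo_partition(geo_partition):
    """Return the partition for a geo_partition value, or raise if none matches."""
    try:
        return LIST_PARTITIONS[geo_partition]
    except KeyError:
        # With no DEFAULT partition, an unlisted value has nowhere to go.
        raise ValueError(
            f"no partition of transactions accepts geo_partition={geo_partition!r}"
        ) from None


print(route_geo_partition("India"))  # transactions_ap
```

The lookup is case-sensitive, as list values are in SQL: `'US'` matches `transactions_us`, but `'us'` matches no partition.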

#### Add records to the transactions table

Note that each new record is routed by its `geo_partition` value to a specific table partition. Because each partition's tablespace is assigned to a geographic location, this provides data residency that complies with the regulatory requirements of the country or region in which the data resides. For example, consider the user data that TikTok stores and the regulatory standards that prevent that data from being stored outside a country's boundaries.

Data localization also plays a role in performance. Keeping the data closer to the client reduces network latency, which improves the response time of the database. Knowing which data is needed where, coupled with the ability to place data in a particular location, is an important tool in distributed SQL systems.

In [19]:
%%sql

INSERT INTO transactions  VALUES (1, 100, 'US', 'customer', 100, now());
INSERT INTO transactions  VALUES (2, 200, 'EU', 'customer', 200, now());
INSERT INTO transactions  VALUES (3, 300, 'India', 'customer', 300, now());

 * postgresql+psycopg2://yugabyte@127.0.0.1:5433/db_ybu
1 rows affected.
1 rows affected.
1 rows affected.


[]

## Validate SQL operations

Review the rows to see how the `geo_partition` attribute determines the table partition. In this case, we set the location of each tablespace, then assigned each tablespace a partition that accepts a specific `geo_partition` value. That value determines which rows are stored in the tablespace. This is geo row partitioning.

In [20]:
%%sql

SELECT * FROM transactions;


 * postgresql+psycopg2://yugabyte@127.0.0.1:5433/db_ybu
3 rows affected.


user_id,account_id,geo_partition,account_type,amount,created_at
2,200,EU,customer,200,2022-09-14 11:54:33.818854
3,300,India,customer,300,2022-09-14 11:54:33.829403
1,100,US,customer,100,2022-09-14 11:54:33.798656


#### Object Identifiers

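In this example, we use object identifiers (OIDs) to find the partition that stores each row. The cells below first query each partition directly, then select `tableoid::regclass` to show the name of the partition that stores each row. The `tableoid` system column holds the OID of the table that contains the row.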

In [21]:
%%sql
SELECT * FROM transactions_us;

 * postgresql+psycopg2://yugabyte@127.0.0.1:5433/db_ybu
1 rows affected.


user_id,account_id,geo_partition,account_type,amount,created_at
1,100,US,customer,100,2022-09-14 11:54:33.798656


In [22]:
%%sql
SELECT * FROM transactions_eu;

 * postgresql+psycopg2://yugabyte@127.0.0.1:5433/db_ybu
1 rows affected.


user_id,account_id,geo_partition,account_type,amount,created_at
2,200,EU,customer,200,2022-09-14 11:54:33.818854


In [23]:
%%sql
SELECT * FROM transactions_ap;


 * postgresql+psycopg2://yugabyte@127.0.0.1:5433/db_ybu
1 rows affected.


user_id,account_id,geo_partition,account_type,amount,created_at
3,300,India,customer,300,2022-09-14 11:54:33.829403


In [24]:
%%sql
SELECT tableoid::regclass, user_id, account_id, geo_partition FROM transactions;


 * postgresql+psycopg2://yugabyte@127.0.0.1:5433/db_ybu
3 rows affected.


tableoid,user_id,account_id,geo_partition
transactions_eu,2,200,EU
transactions_ap,3,300,India
transactions_us,1,100,US
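
The `tableoid::regclass` output can also be checked programmatically. The following sketch copies the three rows from the output above and compares each reported partition with the partition expected from the `FOR VALUES IN` clauses. The names `rows`, `expected`, and `misplaced_rows` are hypothetical.

```python
from collections import Counter

# Rows copied from the query output above:
# (tableoid, user_id, account_id, geo_partition)
rows = [
    ("transactions_eu", 2, 200, "EU"),
    ("transactions_ap", 3, 300, "India"),
    ("transactions_us", 1, 100, "US"),
]

# Expected partition for each geo_partition value.
expected = {
    "US": "transactions_us",
    "EU": "transactions_eu",
    "India": "transactions_ap",
}


def misplaced_rows(rows, expected):
    """Return the rows whose reported partition differs from the expected one."""
    return [row for row in rows if expected.get(row[3]) != row[0]]


print(misplaced_rows(rows, expected))   # [] when every row is in its partition
print(Counter(row[0] for row in rows))  # number of rows stored in each partition
```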


In [25]:
%%sql
SELECT tableoid::regclass, user_id, account_id, geo_partition  FROM transactions_us;


 * postgresql+psycopg2://yugabyte@127.0.0.1:5433/db_ybu
1 rows affected.


tableoid,user_id,account_id,geo_partition
transactions_us,1,100,US


In [26]:
%%sql
SELECT tableoid::regclass, user_id, account_id, geo_partition FROM transactions_eu;


 * postgresql+psycopg2://yugabyte@127.0.0.1:5433/db_ybu
1 rows affected.


tableoid,user_id,account_id,geo_partition
transactions_eu,2,200,EU


In [27]:
%%sql
SELECT tableoid::regclass, user_id, account_id, geo_partition  FROM transactions_ap;

 * postgresql+psycopg2://yugabyte@127.0.0.1:5433/db_ybu
1 rows affected.


tableoid,user_id,account_id,geo_partition
transactions_ap,3,300,India


---
# All done!
In this lab, you completed the following:

- Setup
  - Created the `db_ybu` database with `ysqlsh`
  - Created tables and loaded data using DDL and DML scripts
  - Connected to the database using a PostgreSQL driver for Python

