<a href="https://colab.research.google.com/github/ainfanzon/Cockroach_IAM_Workshop/blob/main/GCP_Colab_notebooks/Exercise_02.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


<img src="https://drive.google.com/uc?id=1XYr9Tyrz31a5kZdo601xD1QWz_YM8-H3">

### CockroachDB is a distributed SQL database that is __*highly scalable*__, __*resilient*__, and __*easy to use*__.

# Identity and Access Management Workshop.
---
## Scalability and Resiliency.

In this section we are going to scale the cluster by adding three more nodes.<br>

<html>
<head>
<style>
table, th, td {
  border: 1px solid black;
  border-collapse: collapse;
}
</style>
</head>
<body>

<table style="width:100%">
  <tr>
       <td align="right">
          <img src="https://drive.google.com/uc?id=11EL90ujxRqr-aVUMmlgh9OnizmDo6ev-" width="850"
          height="300">
      </td>
  </tr>
</table>

</body>
</html>

You will:

1. Add three more nodes to the cluster.
1. Create and populate a new database.
1. Simulate a two-node failure.
1. Restart the nodes.

---

## 1. Add three more nodes to the cluster.

CockroachDB is designed to be low touch and highly automated for operators, while remaining easy to reason about for developers.

**Benefits**

CockroachDB's horizontal scalability allows users to start small and scale out as needed. It also maintains ACID guarantees, so users don't have to risk their data to improve performance.

Execute the next two cells to review the current status of the cluster (alternatively, use the DB Console).

In [2]:
import psycopg2
import pandas as pd

from IPython.display import IFrame, display, HTML, Markdown

pd.set_option('display.max_colwidth', None)

conn = psycopg2.connect(
        database = 'defaultdb'
      , user = 'root'
      , host = '35.89.64.89'                        # Use the GCP Compute Engine external IP address
      , port = '26257'
      , sslmode = 'disable'
)
cursor = conn.cursor()

In [None]:
cursor.execute("""
SELECT gn.node_id AS "Node ID"
     , gn.advertise_sql_address AS "Advertised Address"
     , gn.build_tag AS "Version"
     , current_timestamp() AT TIME ZONE 'UTC' - gn.started_at AS "Up Time"
     , "ranges" AS "Ranges"
     , leases AS "Leaders"
     , CASE WHEN is_live THEN 'LIVE' ELSE 'DEAD' END AS "status"
     , gl.membership
FROM crdb_internal.gossip_nodes AS gn join crdb_internal.gossip_liveness AS gl USING(node_id)
""")
result_set = cursor.fetchall()
df_result_set = pd.DataFrame(result_set, columns=[desc[0] for desc in cursor.description])
df_result_set.set_index('Node ID', inplace=True)
df_result_set

| Node ID | Advertised Address | Version | Up Time | Ranges | Leaders | status | membership |
|---|---|---|---|---|---|---|---|
| 1 | 10.14.0.181:26257 | v24.2.3 | 0 days 13:03:53.151964 | 73 | 18 | LIVE | active |
| 2 | 10.14.0.181:26259 | v24.2.3 | 0 days 13:03:52.594917 | 72 | 17 | LIVE | active |
| 3 | 10.14.0.181:26258 | v24.2.3 | 0 days 13:03:52.419338 | 73 | 20 | LIVE | active |
| 4 | 10.14.0.181:26260 | v24.2.3 | 0 days 00:14:52.992078 | 74 | 17 | LIVE | active |
| 5 | 10.14.0.181:26261 | v24.2.3 | 0 days 00:14:52.149110 | 75 | 21 | LIVE | active |
| 6 | 10.14.0.181:26262 | v24.2.3 | 0 days 00:14:51.294552 | 75 | 19 | LIVE | active |
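
The fetch-and-label pattern used in the cell above appears again later in this exercise. As a minimal sketch, it could be factored into a small helper that relies only on the DB-API `description` and `fetchall()` members of the cursor. The helper name `cursor_to_records` and the `FakeCursor` stand-in are hypothetical and exist only so the example runs without a cluster.

```python
from typing import Any, Dict, List


def cursor_to_records(cursor) -> List[Dict[str, Any]]:
    """Return the pending result set of a DB-API cursor as a list of dicts.

    The column name is the first item of each `cursor.description` entry.
    """
    columns = [desc[0] for desc in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


class FakeCursor:
    """Hypothetical stand-in for a psycopg2 cursor (no database needed)."""

    description = [("Node ID",), ("status",), ("membership",)]

    def fetchall(self):
        return [(1, "LIVE", "active"), (2, "LIVE", "active")]


records = cursor_to_records(FakeCursor())
for record in records:
    print(record)
```

With a real cursor, the resulting list of dicts can be passed to `pd.DataFrame(...)` and indexed with `set_index('Node ID')`, as in the cell above.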


To scale the cluster, follow the steps below:

- Copy and paste the following commands into the terminal that is **NOT** running the loading server.

> <code>
cockroach start --insecure --listen-addr=10.0.1.2:26260 --join=10.0.1.2:26257,10.0.1.2:26258,10.0.1.2:26259 --http-addr=10.0.1.2:8083 --locality=region=us-west,zone=us-west-1c --store=/home/cockroach/data/cr_data_4 --background<br>
cockroach start --insecure --listen-addr=10.0.1.2:26261 --join=10.0.1.2:26257,10.0.1.2:26258,10.0.1.2:26259 --http-addr=10.0.1.2:8084 --locality=region=us-west,zone=us-west-1b --store=/home/cockroach/data/cr_data_5 --background<br>
cockroach start --insecure --listen-addr=10.0.1.2:26262 --join=10.0.1.2:26257,10.0.1.2:26258,10.0.1.2:26259 --http-addr=10.0.1.2:8085 --locality=region=us-west,zone=us-west-1a --store=/home/cockroach/data/cr_data_6 --background<br>
</code>
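
The three commands above differ only in the SQL port, the HTTP port, the zone, and the store directory. The sketch below rebuilds them from an instance number to make that pattern explicit. The function name `start_command` and the port arithmetic are assumptions inferred from the commands shown here, not part of the `cockroach` CLI.

```python
HOST = "10.0.1.2"                    # internal IP used in the commands above
JOIN_PORTS = (26257, 26258, 26259)   # the three original nodes


def start_command(n: int, zone: str, host: str = HOST) -> str:
    """Build the `cockroach start` command for instance `n`.

    `n` matches the cr_data_N store directory; it is not necessarily the
    node ID that the cluster assigns.
    """
    sql_port = 26256 + n    # instance 4 -> 26260
    http_port = 8079 + n    # instance 4 -> 8083
    join = ",".join(f"{host}:{port}" for port in JOIN_PORTS)
    return (
        "cockroach start --insecure"
        f" --listen-addr={host}:{sql_port}"
        f" --join={join}"
        f" --http-addr={host}:{http_port}"
        f" --locality=region=us-west,zone={zone}"
        f" --store=/home/cockroach/data/cr_data_{n}"
        " --background"
    )


# Zones for instances 4 to 6, as used in the commands above.
for n, zone in [(4, "us-west-1c"), (5, "us-west-1b"), (6, "us-west-1a")]:
    print(start_command(n, zone))
```

Every new node joins through the same three original addresses, so the `--join` list stays constant while the other values change per instance.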

Verify there are six instances of the `cockroach` process running on different ports.

- List all active `cockroach` processes.

> `pgrep -a cockroach | awk '{ print $5}'`

&emsp;&emsp;Each process will be listening on the same IP but on a different port.

> <code>
--listen-addr=10.0.1.2:26257<br>
--listen-addr=10.0.1.2:26258<br>
--listen-addr=10.0.1.2:26259<br>
--listen-addr=10.0.1.2:26260<br>
--listen-addr=10.0.1.2:26261<br>
--listen-addr=10.0.1.2:26262<br>
</code>
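
The same check can be scripted. The sketch below extracts the `--listen-addr` ports from `pgrep -a` style output and confirms that there are six distinct ports. The sample text and its process IDs are made up for illustration; on the VM the input could come from `subprocess.run(["pgrep", "-a", "cockroach"], capture_output=True, text=True).stdout`.

```python
import re
from typing import List

# Hypothetical `pgrep -a cockroach` output (PIDs are invented, arguments shortened).
SAMPLE_PGREP_OUTPUT = "\n".join(
    f"{1200 + i} cockroach start --insecure --listen-addr=10.0.1.2:{26257 + i}"
    for i in range(6)
)


def listen_ports(pgrep_output: str) -> List[int]:
    """Return the sorted port numbers found in --listen-addr=HOST:PORT flags."""
    found = re.findall(r"--listen-addr=[^\s:]+:(\d+)", pgrep_output)
    return sorted(int(port) for port in found)


ports = listen_ports(SAMPLE_PGREP_OUTPUT)
print(ports)
print("six distinct ports:", len(ports) == 6 and len(set(ports)) == 6)
```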

---

## 2. Create and populate the Northwind database.
<html>
<head>
<style>
table, th, td {
  border: 1px solid black;
  border-collapse: collapse;
}
</style>
</head>
<body>

<table style="width:100%">
  <tr>
      <td align="right">
          <img src="https://drive.google.com/uc?id=1eM0otn7ieCvBMXVQ0WmXgRKUf41GaS9h" width="850"
          height="650">
      </td>
  </tr>
</table>

</body>
</html>

Execute the following steps to create and load the data to the database:

- On the second terminal (not running the loading server) execute the **northwind.sql** script.

> <code>
cockroach sql --host 10.0.1.2 -u root -d defaultdb -f /home/cockroach/sql/northwind.sql --insecure
</code>

- Open the CockroachDB **DB Console** and review the cluster overview, replacing `IP Address` with the IP address of your server:

> <code>
http://IP Address:8080/#/overview/list
</code>


In [5]:
conn = psycopg2.connect(
        database = 'northwind'
      , user = 'root'
      , host = '35.89.64.89'                        # Use the GCP Compute Engine external IP address
      , port = '26257'
      , sslmode = 'disable'
)
cursor = conn.cursor()
cursor.execute("""
SELECT gn.node_id AS "Node ID"
     , gn.advertise_sql_address AS "Advertised Address"
     , gn.build_tag AS "Version"
     , current_timestamp() AT TIME ZONE 'UTC' - gn.started_at AS "Up Time"
     , "ranges" AS "Ranges"
     , leases AS "Leaders"
     , CASE WHEN is_live THEN 'LIVE' ELSE 'DEAD' END AS "status"
     , gl.membership
FROM crdb_internal.gossip_nodes AS gn join crdb_internal.gossip_liveness AS gl USING(node_id)
""")
result_set = cursor.fetchall()
df_result_set = pd.DataFrame(result_set, columns=[desc[0] for desc in cursor.description])
df_result_set.set_index('Node ID', inplace=True)
df_result_set

| Node ID | Advertised Address | Version | Up Time | Ranges | Leaders | status | membership |
|---|---|---|---|---|---|---|---|
| 1 | 10.14.0.148:26257 | v24.2.3 | 0 days 02:21:43.718629 | 99 | 21 | LIVE | active |
| 2 | 10.14.0.148:26259 | v24.2.3 | 0 days 02:21:43.231673 | 103 | 19 | LIVE | active |
| 3 | 10.14.0.148:26258 | v24.2.3 | 0 days 02:21:43.078820 | 101 | 20 | LIVE | active |
| 4 | 10.14.0.148:26260 | v24.2.3 | 0 days 00:05:31.424622 | 61 | 23 | LIVE | active |
| 5 | 10.14.0.148:26261 | v24.2.3 | 0 days 00:05:30.520548 | 59 | 19 | LIVE | active |
| 6 | 10.14.0.148:26262 | v24.2.3 | 0 days 00:05:29.631028 | 58 | 23 | LIVE | active |


Compare this output with the earlier result and with the DB Console, and explain any differences:

- How many nodes are there in the cluster?
- Is the number of ranges higher or lower than before?
- **In the DB Console, is the number of replicas different from the SQL output? If so, why?**
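
To make the comparison concrete, the sketch below totals the `Ranges` and `Leaders` columns from the two saved outputs shown in this notebook and prints the per-node change. The numbers are copied from those outputs. The advertised addresses differ between the two, so they appear to come from different sessions; your own figures will vary.

```python
# node ID -> (ranges, leases), copied from the first saved output.
before = {1: (73, 18), 2: (72, 17), 3: (73, 20),
          4: (74, 17), 5: (75, 21), 6: (75, 19)}
# node ID -> (ranges, leases), copied from the output after loading Northwind.
after = {1: (99, 21), 2: (103, 19), 3: (101, 20),
         4: (61, 23), 5: (59, 19), 6: (58, 23)}


def totals(snapshot):
    """Sum the per-node range and lease counts of one snapshot."""
    ranges = sum(r for r, _ in snapshot.values())
    leases = sum(l for _, l in snapshot.values())
    return ranges, leases


ranges_before, leases_before = totals(before)
ranges_after, leases_after = totals(after)
print(f"ranges: {ranges_before} -> {ranges_after}")
print(f"leases: {leases_before} -> {leases_after}")

# Per-node change in the Ranges column.
range_delta = {node: after[node][0] - before[node][0] for node in before}
for node, delta in range_delta.items():
    print(f"node {node}: {delta:+d} ranges")
```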

---

## 3. Simulate an availability zone (two-node) failure.

<html>
<head>
<style>
table, th, td {
  border: 1px solid black;
  border-collapse: collapse;
}
</style>
</head>
<body>

<table style="width:100%">
  <tr>
      <td align="right">
          <img src="https://drive.google.com/uc?id=1MCqaWMMCNDr2dYaP_JMDZykiQxpu_8Gd" width="850" height="375">
      </td>
  </tr>
</table>

</body>
</html>

- Set `server.time_until_store_dead` to reduce the time the cluster waits before it considers a node dead. The default is 5 minutes and the minimum allowed value is 1 minute 15 seconds. Execute the command below on the second terminal.

> <code>
cockroach sql --insecure --host=10.0.1.2:26257 -d defaultdb --execute="SET CLUSTER SETTING server.time_until_store_dead = '1m15s';"
</code>
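
As a small illustration of the limits described above, the sketch below parses a duration such as `1m15s`, rejects anything under the stated minimum of 1 minute 15 seconds, and builds the `SET CLUSTER SETTING` statement. The helper names are hypothetical and the parser only handles hours, minutes, and seconds; it is not CockroachDB's own validation.

```python
import re

MIN_SECONDS = 75  # 1m15s, the minimum stated above


def duration_to_seconds(text: str) -> int:
    """Parse durations such as '1m15s', '5m' or '90s' into seconds."""
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?", text)
    if not text or match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def store_dead_statement(duration: str) -> str:
    """Return the SET CLUSTER SETTING statement, enforcing the minimum."""
    if duration_to_seconds(duration) < MIN_SECONDS:
        raise ValueError(f"{duration!r} is below the 1m15s minimum")
    return f"SET CLUSTER SETTING server.time_until_store_dead = '{duration}';"


print(duration_to_seconds("5m"))       # the default mentioned above, in seconds
print(store_dead_statement("1m15s"))
```

The returned string is the same statement that the command above passes to `cockroach sql --execute`.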

- Kill the nodes in the `us-west-1c` zone.

> ```kill -9 $(pgrep -a cockroach | grep west-1c | awk '{print $1}')```

Check the DB Console after a couple of minutes:

- How many nodes are reported dead?
- In the replication status, how many under-replicated ranges are there?

- Check that the Northwind database is still operational after a whole availability zone went down.

In [None]:
cursor.execute("""
SELECT p.product_name AS "Product Name"
     , SUM(od.unit_price * CAST(od.quantity AS FLOAT) * (1.0 - od.discount)) AS Sales
FROM products AS p INNER JOIN order_details AS od ON od.product_id = p.product_id
GROUP BY p.product_name
ORDER BY Sales DESC LIMIT 5
""")
result_set = cursor.fetchall()
df_result_set = pd.DataFrame(result_set, columns=[desc[0] for desc in cursor.description])
df_result_set.set_index('Product Name', inplace=True)
df_result_set

| Product Name | sales |
|---|---|
| Côte de Blaye | 141396.735619 |
| Thüringer Rostbratwurst | 80368.672484 |
| Raclette Courdavault | 71155.699911 |
| Tarte au sucre | 47234.969946 |
| Camembert Pierrot | 46825.480313 |
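
The sales figure in the query above is `unit_price * quantity * (1 - discount)` summed per product. The sketch below applies the same arithmetic to a few invented order lines so the calculation can be followed by hand. The product names and values are hypothetical and are not taken from the Northwind data.

```python
from collections import defaultdict

# Hypothetical order lines: (product_name, unit_price, quantity, discount).
order_lines = [
    ("Widget", 10.0, 3, 0.0),   # 10 * 3 * 1.00 = 30.0
    ("Widget", 10.0, 2, 0.5),   # 10 * 2 * 0.50 = 10.0
    ("Gadget", 4.0, 5, 0.25),   #  4 * 5 * 0.75 = 15.0
]

# Equivalent of SUM(...) ... GROUP BY product_name.
sales = defaultdict(float)
for name, unit_price, quantity, discount in order_lines:
    sales[name] += unit_price * float(quantity) * (1.0 - discount)

# Equivalent of ORDER BY Sales DESC LIMIT 5.
top_products = sorted(sales.items(), key=lambda item: item[1], reverse=True)[:5]
for name, total in top_products:
    print(name, total)
```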


---

## 4. Restart nodes.

Execute the commands below to restart the two nodes that were killed:

<code>
cockroach start --insecure --listen-addr=10.0.1.2:26259 --join=10.0.1.2:26257,10.0.1.2:26258,10.0.1.2:26259 --http-addr=10.0.1.2:8082 --locality=region=us-west,zone=us-west-1c --store=/home/cockroach/data/cr_data_3 --background<br>
cockroach start --insecure --listen-addr=10.0.1.2:26260 --join=10.0.1.2:26257,10.0.1.2:26258,10.0.1.2:26259 --http-addr=10.0.1.2:8083 --locality=region=us-west,zone=us-west-1c --store=/home/cockroach/data/cr_data_4 --background
</code>

- What is happening with the under-replicated ranges?
- Are all the nodes operational again?

Kill ALL CRDB processes and remove all CRDB data files.

> <code>
kill -9 $(pgrep cockroach)<br>
sudo rm -fR /home/cockroach/data/*
</code>



---

# Appendix

Workshop CRDB user ID and password

> <p>uid = roachie<br>
pwd = roachfan
</p>

List CRDB process id and process name.

> <code>pgrep -l cockroach</code>

List the listening address of each `cockroach` process.

> <code>pgrep -a cockroach | awk '{ print $5}'</code>

Kill ALL CRDB processes

> <code>kill -9  $(pgrep cockroach)</code>

Remove all CRDB files

> <code>sudo rm -fR /home/cockroach/data/*</code>

Update the IP in the script

> <code>sed -E -i s/HOST_IP/$(hostname -I | awk '{print $1}')/ northwind.sql</code>
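
For reference, the `sed` command above replaces the first `HOST_IP` placeholder on each line of `northwind.sql` with the first address reported by `hostname -I`. A rough Python equivalent of that substitution is sketched below. The function name `fill_host_ip` and the sample script text are hypothetical.

```python
def fill_host_ip(script_text: str, host_ip: str) -> str:
    """Replace the first HOST_IP placeholder on every line, like sed without `g`."""
    return "".join(
        line.replace("HOST_IP", host_ip, 1)
        for line in script_text.splitlines(keepends=True)
    )


# Hypothetical script fragment containing the placeholder.
sample = "-- connect to HOST_IP:26257\nSELECT 1;\n"
print(fill_host_ip(sample, "10.0.1.2"))
```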