<a href="https://colab.research.google.com/github/ainfanzon/Cockroach_IAM_Workshop/blob/main/GCP_Colab_notebooks/Exercise_02.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


<img src="https://drive.google.com/uc?id=1XYr9Tyrz31a5kZdo601xD1QWz_YM8-H3">

### CockroachDB is a distributed SQL database that is __*highly scalable*__, __*resilient*__, and __*easy to use*__.

# Identity and Access Management Workshop.
---
## Scalability and Resiliency.

This section walks through scaling out the cluster. You will:

1. Add three more nodes to the cluster.
1. Configure the HAProxy Load Balancer.
1. Create and populate the Northwind database.
1. Simulate the failure of two nodes.
1. Restart the nodes (optional).
<br>

<html>
<head>
<style>
table, th, td {
  border: 1px solid black;
  border-collapse: collapse;
}
</style>
</head>
<body>

<table style="width:100%">
  <tr>
       <td align="right">
          <img src="https://drive.google.com/uc?id=1VS1jCK6UAUeqdNrKot3BbOZrbMVo2m5l" width="850"
          height="350">
      </td>
  </tr>
</table>

</body>
</html>

---

## 1. Add three more nodes to the cluster.

CockroachDB is designed to be low touch and highly automated for operators, while remaining easy to reason about for developers.

**Benefits**

CockroachDB's horizontal scalability allows users to start small and scale out as needed. It also maintains ACID guarantees, so users don't have to risk their data to improve performance.

Execute the next two cells to review the current status of the cluster (alternatively, use the DB Console). In the first cell, replace the host value with your public IP address.

In [None]:
import psycopg2
import pandas as pd

from IPython.display import IFrame, display, HTML, Markdown

pd.set_option('display.max_colwidth', None)

try:
    conn = psycopg2.connect(
        database = 'defaultdb'
      , user = 'root'
      , host = '<Your PUBLIC IP>'      # Use the PUBLIC IP
      , port = '26257'
      , sslmode = 'disable'
    )
    display(Markdown("## Connection successful!"))
    cursor = conn.cursor()
except psycopg2.OperationalError as e:
    print(f"Error connecting to database: {e}")

Display the information of the cluster before adding more data.

In [None]:
try:
    cursor = conn.cursor()
    cursor.execute("""
        SELECT gn.node_id AS "Node ID"
             , gn.advertise_sql_address AS "Advertised Address"
             , gn.build_tag AS "Version"
             , current_timestamp() AT TIME ZONE 'UTC' - gn.started_at AS "Up Time"
             , "ranges" AS "Ranges"
             , leases AS "Leaders"
             , CASE WHEN is_live THEN 'LIVE' ELSE 'DEAD' END AS "status"
             , gl.membership
        FROM crdb_internal.gossip_nodes AS gn join crdb_internal.gossip_liveness AS gl USING(node_id)
        """)
    result_set = cursor.fetchall()
    df_result_set = pd.DataFrame(result_set, columns=[desc[0] for desc in cursor.description])
    df_result_set.set_index('Node ID', inplace=True)
    display(df_result_set)
    cursor.close()
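except psycopg2.OperationalError as e:
    print(f"Query failed: {e}")    # report the error instead of hiding it
    cursor.close()
    if not conn.closed:
        conn.rollback()            # end the failed transaction so the connection stays usable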

To scale the cluster follow the steps below:

- Use the __**cockroach start**__ command to provision additional nodes to the cluster.

> <p>cockroach start<br>
&emsp;&emsp;--insecure<br>
&emsp;&emsp;--listen-addr=&lt;ip address&gt;:&lt;sql listening port&gt;<br>
&emsp;&emsp;--join=&lt;ip address&gt;:&lt;sql listening port&gt;, ... ,&lt;ip address&gt;:&lt;sql listening port&gt;<br>
&emsp;&emsp;--http-addr=&lt;ip address&gt;:&lt;http listening port&gt;<br>
&emsp;&emsp;--locality=region=us-west,zone=us-west-1a<br>
&emsp;&emsp;--store=/home/cockroach/data/cr_data_1<br>
&emsp;&emsp;--background<br>
</p>
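
As a rough illustration of how the template above is filled in for several nodes on one host, the sketch below builds one `cockroach start` argument list per node without executing anything. The function name `build_start_command`, the SQL and HTTP port numbers, and the store directory names are assumptions made for this example; the values used by the workshop scripts may differ.

```python
def build_start_command(ip, sql_port, http_port, join_ports, region, zone, store_dir):
    """Return the argument list for one `cockroach start` invocation (not executed)."""
    join = ",".join(f"{ip}:{p}" for p in join_ports)
    return [
        "cockroach", "start",
        "--insecure",
        f"--listen-addr={ip}:{sql_port}",
        f"--join={join}",
        f"--http-addr={ip}:{http_port}",
        f"--locality=region={region},zone={zone}",
        f"--store={store_dir}",
        "--background",
    ]

# Hypothetical values: three extra nodes on the same host, joining the first three.
ip = "10.0.1.2"
join_ports = [26257, 26258, 26259]
commands = [
    build_start_command(ip, 26260 + i, 8083 + i, join_ports,
                        "us-west", "us-west-1a",
                        f"/home/cockroach/data/cr_data_{4 + i}")
    for i in range(3)
]
for cmd in commands:
    print(" ".join(cmd))
```

Each node gets its own SQL port, HTTP port, and store directory, while the `--join` list stays the same for every node.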

Execute the __**add_nodes.sh**__ shell script in the scripts directory.
<br>
> <code>
/home/cockroach/scripts/add_nodes.sh us-west
</code>

Verify there are six instances of the `cockroach` process running on different ports.

- List all active `cockroach` processes.

> `pgrep -a cockroach | awk '{ print $5}'`

&emsp;&emsp;Each process listens on the same IP address but on a different port.

> <code>
--listen-addr=10.0.1.2:26257<br>
--listen-addr=10.0.1.2:26258<br>
--listen-addr=10.0.1.2:26259<br>
--listen-addr=10.0.1.2:26260<br>
--listen-addr=10.0.1.2:26261<br>
--listen-addr=10.0.1.2:26262<br>
</code>
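
The same check can be done from Python by parsing the process list. The sketch below is one possible approach: it pulls the port out of every `--listen-addr` flag in text shaped like `pgrep -a cockroach` output. The helper name `listen_ports` and the sample PIDs are made up for illustration.

```python
import re

def listen_ports(pgrep_output):
    """Extract the port of every --listen-addr flag found in `pgrep -a` output."""
    found = re.findall(r"--listen-addr=[^\s:]+:(\d+)", pgrep_output)
    return sorted(int(port) for port in found)

# Sample text shaped like `pgrep -a cockroach` output (PIDs are invented).
sample = """\
1201 cockroach start --insecure --listen-addr=10.0.1.2:26257 --join=10.0.1.2:26257
1342 cockroach start --insecure --listen-addr=10.0.1.2:26258 --join=10.0.1.2:26257
"""
ports = listen_ports(sample)
print(ports)
print("all ports distinct:", len(ports) == len(set(ports)))
```

On the cluster host, the input could come from `subprocess.run(["pgrep", "-a", "cockroach"], capture_output=True, text=True).stdout`; after the scale-out, six distinct ports would be expected.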

### A few points to note.

Use the DB Console Overview page to answer the following:

- Expand the nodes. How are they grouped?
- Which databases have been created in the cluster?
- Are there any hot ranges in the IAM database?
<br>

---

## 2. Configure the HAProxy Load Balancer.

HAProxy is an open-source TCP load balancer. CockroachDB includes a built-in command for generating a configuration file that is preset to work with your running cluster.

- Change to the /home/cockroach/haproxy directory

> ```cd /home/cockroach/haproxy```

- Run the cockroach gen haproxy command, specifying the address of any CockroachDB node:

> ```echo $(hostname -I) | xargs -I {} cockroach gen haproxy --insecure --host={} --port=26257```

- Edit the haproxy.cfg file and change the binding port number to 26276

> ```sed -E -i 's/bind :26257/bind :26276/' haproxy.cfg```

- Run haproxy in the background

> ```haproxy -f haproxy.cfg &```
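
The `sed` step above changes only the port HAProxy listens on; the backend `server` lines keep pointing at the CockroachDB nodes. A minimal Python sketch of the same substitution follows. The helper name `rebind` is hypothetical, and the sample configuration is an abbreviated, illustrative stand-in for the generated `haproxy.cfg`, not its exact contents.

```python
import re

def rebind(cfg_text, old_port, new_port):
    """Change only the `bind :<port>` line, leaving backend server ports untouched."""
    pattern = rf"^([ \t]*bind[ \t]+:){old_port}\b"
    return re.sub(pattern, rf"\g<1>{new_port}", cfg_text, flags=re.M)

# Abbreviated, illustrative configuration (not the full generated file).
sample_cfg = """\
listen psql
    bind :26257
    mode tcp
    balance roundrobin
    server cockroach1 10.0.1.2:26257 check port 8080
"""
new_cfg = rebind(sample_cfg, 26257, 26276)
print(new_cfg)
```

Clients then connect to port 26276, and HAProxy forwards each connection to one of the nodes listed in the `server` lines.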

## 3. Create and populate the Northwind database.
<html>
<head>
<style>
table, th, td {
  border: 1px solid black;
  border-collapse: collapse;
}
</style>
</head>
<body>

<table style="width:100%">
  <tr>
      <td align="right">
          <img src="https://drive.google.com/uc?id=1eM0otn7ieCvBMXVQ0WmXgRKUf41GaS9h" width="850"
          height="650">
      </td>
  </tr>
</table>

</body>
</html>

Execute the following steps to create the database and load the data:

- Run the sed command below to update the script IP address:

> ```sed -E -i s/HOST_IP/$(hostname -I | awk '{print $1}')/ /home/cockroach/sql/northwind.sql```

&emsp;&emsp;NOTE: Make sure the HTTP server is running on port 3000. ( `python -m http.server 3000` )


- Run the script to create and populate the database:

> ```cockroach sql --host $(hostname -I) -u root -d defaultdb -f /home/cockroach/sql/northwind.sql --insecure```

- On the cockroach **DB Console** go to the Overview page:

> <code>
http://&lt;Public IP Address&gt;:8080/#/overview/list
</code>

- Run the cell below to connect to the Northwind database through HAProxy
 (NOTE: replace the host value with your public IP):

In [None]:
try:
    conn = psycopg2.connect(
        database = 'northwind'
      , user = 'root'
      , host = '<Your PUBLIC IP>'     # Use the PUBLIC IP
      , port = '26276'            # This is the HAProxy listening port
      , sslmode = 'disable'
    )
    display(Markdown("### Connection successful!"))
    cursor = conn.cursor()
except psycopg2.OperationalError as e:
    print(f"Error connecting to database: {e}")

try:
    cursor = conn.cursor()
    cursor.execute("""
        SELECT gn.node_id AS "Node ID"
             , gn.advertise_sql_address AS "Advertised Address"
             , gn.build_tag AS "Version"
             , current_timestamp() AT TIME ZONE 'UTC' - gn.started_at AS "Up Time"
             , "ranges" AS "Ranges"
             , leases AS "Leaders"
             , CASE WHEN is_live THEN 'LIVE' ELSE 'DEAD' END AS "status"
             , gl.membership
        FROM crdb_internal.gossip_nodes AS gn JOIN crdb_internal.gossip_liveness AS gl USING(node_id)
        """)
    result_set = cursor.fetchall()
    df_result_set = pd.DataFrame(result_set, columns=[desc[0] for desc in cursor.description])
    df_result_set.set_index('Node ID', inplace=True)
    display(df_result_set)
    cursor.close()
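except psycopg2.OperationalError as e:
    print(f"Query failed: {e}")    # report the error instead of hiding it
    cursor.close()
    if not conn.closed:
        conn.rollback()            # end the failed transaction so the connection stays usable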

#### A few points to note

Compare the output with the earlier result.

- How many nodes are there in the cluster?
- Is the number of ranges higher or lower than before?
- **In the DB Console, are the numbers of replicas and nodes different from the SQL output? If so, why?**

---

## 4. Simulate failure of two nodes in different regions.

<html>
<head>
<style>
table, th, td {
  border: 1px solid black;
  border-collapse: collapse;
}
</style>
</head>
<body>

<table style="width:100%">
  <tr>
      <td align="right">
          <img src="https://drive.google.com/uc?id=1MCqaWMMCNDr2dYaP_JMDZykiQxpu_8Gd" width="850" height="375">
      </td>
  </tr>
</table>

</body>
</html>

- Set the __**server.time_until_store_dead**__ cluster setting to reduce the amount of time the cluster waits before considering a node dead. The default is five minutes and the minimum allowed is 1 minute and 15 seconds. Execute the command below in the second terminal.

> <code>
echo $(hostname -I) | xargs -I {} cockroach sql --insecure --host={}:26257 -d defaultdb --execute="SET CLUSTER SETTING server.time_until_store_dead = '1m15s';"
</code>

- To simulate an availability zone failure that brings down two nodes in the cluster, execute the `az_failure.sh` script.

> ```/home/cockroach/scripts/az_failure.sh```
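
Returning to the `server.time_until_store_dead` setting from the first step: a candidate value can be checked against the stated minimum of 1 minute and 15 seconds before the statement is sent. The sketch below parses duration strings in the style used in the command (`1m15s`). The helper names and the restriction to `h`, `m`, and `s` units are assumptions of this example; the cluster remains the final authority on which values it accepts.

```python
import re

MIN_SECONDS = 75  # 1m15s, the minimum stated in the text above

def duration_seconds(text):
    """Convert a duration such as '1m15s' or '5m' to seconds (h, m, s units only)."""
    parts = re.findall(r"(\d+)([hms])", text)
    # Reject anything that is not made up entirely of <number><unit> pairs.
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"unsupported duration: {text!r}")
    factor = {"h": 3600, "m": 60, "s": 1}
    return sum(int(n) * factor[u] for n, u in parts)

def is_allowed(text):
    """True when the duration is at least the documented minimum."""
    return duration_seconds(text) >= MIN_SECONDS

for candidate in ("5m", "1m15s", "30s"):
    print(candidate, duration_seconds(candidate), is_allowed(candidate))
```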

#### A few points to note

- Check the DB Console after a couple of minutes:

    - Did your DB Console connection die? Was the DB Console connected to a node that was killed?
    - How many nodes are reported dead?
    - Under the **Replication Status** heading, how many under-replicated ranges are there?
    - Are there any warnings from the HAProxy load balancer?

- Check if the Northwind database is still operational after a whole availability zone went down.

In [None]:
try:
    cursor = conn.cursor()
    cursor.execute("""
        SELECT p.product_name AS "Product Name"
             , SUM(od.unit_price * CAST(od.quantity AS FLOAT) * (1.0 - od.discount)) AS Sales
        FROM products AS p INNER JOIN order_details AS od ON od.product_id = p.product_id
        GROUP BY p.product_name
        ORDER BY Sales DESC LIMIT 5
        """)
    result_set = cursor.fetchall()
    df_result_set = pd.DataFrame(result_set, columns=[desc[0] for desc in cursor.description])
    df_result_set.set_index('Product Name', inplace=True)
    display(df_result_set)
    cursor.close()
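except psycopg2.OperationalError as e:
    print(f"Query failed: {e}")    # report the error instead of hiding it
    cursor.close()
    if not conn.closed:
        conn.rollback()            # end the failed transaction so the connection stays usable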

__**You might need to reconnect to the database**__

In [None]:
try:
    conn = psycopg2.connect(
        database = 'northwind'
      , user = 'root'
      , host = '<Your PUBLIC IP>'     # Use the PUBLIC IP
      , port = '26276'            # This is the HAProxy listening port
      , sslmode = 'disable'
    )
    display(Markdown("### Connection successful!"))
    cursor = conn.cursor()
except psycopg2.OperationalError as e:
    print(f"Error connecting to database: {e}")
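
When a node behind the load balancer is killed, an open connection may break and the next query raises an error. One way to handle this, sketched below, is to open a fresh connection for each attempt and retry a few times. The helper `run_with_reconnect` and its parameters are hypothetical and not part of the workshop code; the connection factory and the exception type to retry on are passed in by the caller.

```python
import time

def run_with_reconnect(connect, sql, retry_on, attempts=3, delay=0.5):
    """Run `sql` on a fresh connection, retrying when `retry_on` is raised."""
    for attempt in range(1, attempts + 1):
        try:
            conn = connect()              # new connection, e.g. through the load balancer
            try:
                cur = conn.cursor()
                cur.execute(sql)
                return cur.fetchall()
            finally:
                conn.close()
        except retry_on as exc:
            if attempt == attempts:
                raise                     # give up after the last attempt
            print(f"attempt {attempt} failed ({exc}); retrying")
            time.sleep(delay)

# With psycopg2 this could be called as follows (connection values are placeholders):
# rows = run_with_reconnect(
#     lambda: psycopg2.connect(database='northwind', user='root',
#                              host='<Your PUBLIC IP>', port='26276',
#                              sslmode='disable'),
#     "SELECT count(*) FROM products",
#     psycopg2.OperationalError,
# )
```

Because every attempt goes through HAProxy on port 26276, a retry can be routed to a node that is still alive.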

---

## 5. Restart nodes (Optional).

Execute the `restore_nodes.sh` script that was generated in the previous step.

> ```/tmp/restore_nodes.sh```

- What is happening with the under-replicated ranges?
- Are all the nodes operational again?
- What is reported in the HAProxy log?

If there are still dead nodes, you can decommission them using the following command:

__**NOTE:**__ Make sure to replace &lt;node ids&gt; with the IDs of the nodes you want to decommission.

> <code>echo $(hostname -I) | xargs -I {} cockroach node decommission &lt;node ids&gt; --insecure --host={}</code>

For example, if nodes 1 and 4 are dead, the command should be:

> <code> echo $(hostname -I) | xargs -I {} cockroach node decommission 1 4 --insecure --host={}</code>
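
The list of node IDs can also be derived from query results rather than typed by hand. The sketch below takes `(node_id, is_live)` pairs, such as those a status query like the earlier ones could return, and builds the matching `cockroach node decommission` argument list. The function name `decommission_command` and the sample data are illustrative assumptions; the command is only printed, not executed.

```python
def decommission_command(nodes, host):
    """Build the `cockroach node decommission` argument list for nodes that are not live."""
    dead = sorted(node_id for node_id, is_live in nodes if not is_live)
    if not dead:
        return None  # nothing to decommission
    return ["cockroach", "node", "decommission",
            *map(str, dead), "--insecure", f"--host={host}"]

# Illustrative (node_id, is_live) pairs: nodes 1 and 4 are dead.
sample = [(1, False), (2, True), (3, True), (4, False), (5, True), (6, True)]
cmd = decommission_command(sample, "10.0.1.2")
print(" ".join(cmd))
# prints: cockroach node decommission 1 4 --insecure --host=10.0.1.2
```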


---

# Appendix

Workshop CRDB user ID and password

> <p>uid = roachie<br>
pwd = roachfan
</p>

List CRDB process id and process name.

> <code>pgrep -l cockroach</code>

List the listening address of each `cockroach` process.

> <code>pgrep -a cockroach | awk '{ print $5}'</code>

Kill ALL CRDB processes

> <code>kill -9  $(pgrep cockroach)</code>

Remove all CRDB files

> <code>sudo rm -fR /home/cockroach/data/*</code>

Update the IP in the script

> <code>sed -E -i s/HOST_IP/$(hostname -I | awk '{print $1}')/ northwind.sql</code>

List all listening ports

> <code>netstat -ntlp</code>