# Getting Started with CipherStash and Jupyter Notebook

This notebook describes how to get started with CipherStash using Python 3, Jupyter Notebook, and psycopg2.

## Prerequisites

You must have:
* [Python 3](https://www.python.org/)
* [Jupyter Notebook](https://jupyter.org/install)
* [Docker](https://docs.docker.com/get-started/get-docker/)
* [Docker Compose](https://docs.docker.com/compose/install/)
* [CipherStash account](https://cipherstash.com/signup)
* [CipherStash CLI](https://github.com/cipherstash/cli-releases/releases/latest)

## Start CipherStash Proxy and PostgreSQL

To start CipherStash Proxy and PostgreSQL on your machine, use the included `docker-compose.yml`.
This file requires you to set up a few environment variables:

* `CS_WORKSPACE_ID`
* `CS_CLIENT_ACCESS_KEY`
* `CS_ENCRYPTION__CLIENT_ID`
* `CS_ENCRYPTION__CLIENT_KEY`

There are other variables but default values are set for them in `docker-compose.yml`.
Change them if necessary to suit your setup.
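
Before starting the containers, it can help to confirm that the four variables listed above are actually set in the notebook's environment. The following is a minimal sketch; the helper name `missing_env_vars` is made up for illustration and is not part of CipherStash.

```python
import os

# Variables this notebook says docker-compose.yml needs from the environment.
REQUIRED_VARS = (
    "CS_WORKSPACE_ID",
    "CS_CLIENT_ACCESS_KEY",
    "CS_ENCRYPTION__CLIENT_ID",
    "CS_ENCRYPTION__CLIENT_KEY",
)


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Still unset:", ", ".join(missing))
else:
    print("All required variables are set")
```

Running this again after the `%env` cells later in the notebook should report that nothing is missing.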

If you already have the values for these variables, you can skip to the **"Upload dataset config"** section.
Otherwise, sign up to [CipherStash](https://cipherstash.com/signup), install [CipherStash CLI](https://github.com/cipherstash/cli-releases/releases/latest), and do the following steps:

### Log into the workspace

Check that the `stash` command is available in your PATH, then run the command below and follow the instructions.
You will either be logged into your workspace automatically or be prompted to log into one of your workspaces.
Note the **Workspace ID** shown here.

In [None]:
! stash login

### Create an access key

> **NOTE**: If you already have an access key and prefer to use that rather than create a new one, you can use it instead. However, it is recommended that you create one here to use with this notebook.

Now you need to create an access key for the workspace.
Run the following command and securely store the value of **CS_CLIENT_ACCESS_KEY**, as you will not be able to recover it if you lose it.
The `CS_WORKSPACE_ID` should be the same value as the Workspace ID shown at the step above.

In [None]:
! stash access-keys create cipherstash_getting_started_access_key

### Create a dataset

> **NOTE**: If you already have a dataset and prefer to use that rather than create a new one, you can use it instead. However, it is recommended that you create one here to use with this notebook.

After logging into your workspace, run the following command to create a new dataset, and note the **dataset ID**.

In [None]:
! stash datasets create cipherstash_getting_started

### Create a client

> **NOTE**: If you already have a client and prefer to use that rather than create a new one, you can use it instead. However, it is recommended that you create one here to use with this notebook.

Set the `CS_DATASET_ID` to the dataset ID value from the command above.
After that, run the command to create a client.
Note the **Client ID** and **Client Key** in the output.

In [None]:
%env CS_DATASET_ID=<dataset_id>

In [None]:
! stash clients create --dataset-id $CS_DATASET_ID cipherstash_getting_started_client

## Upload dataset config

There is a dataset configuration file provided as `dataset.yml` for the example table we will create.
You have to upload it to ZeroKMS so Proxy will know what to do with each column.
Replace the `<client_id>` and `<client_key>` values in the following `%env` to set the environment variables.
After that, run the command.
It should upload the configuration.

In [None]:
%env CS_CLIENT_ID=<client_id>

In [None]:
%env CS_CLIENT_KEY=<client_key>

In [None]:
! yes | head -n 1 | stash datasets config upload --file dataset.yml --client-id $CS_CLIENT_ID --client-key $CS_CLIENT_KEY

## Run docker compose

Now that you have the values for `CS_WORKSPACE_ID`, `CS_CLIENT_ACCESS_KEY`, `CS_ENCRYPTION__CLIENT_ID`, and `CS_ENCRYPTION__CLIENT_KEY`, and `dataset.yml` is uploaded, it's time to start PostgreSQL and CipherStash Proxy.

Replace `<workspace_id>`, `<client_access_key>`, `<client_id>` and `<client_key>` with the values from the steps above and set those environment variables.
After setting those variables, run the `docker compose` command below. Docker Compose should start the database and Proxy.

In [None]:
%env CS_WORKSPACE_ID=<workspace_id>

In [None]:
%env CS_CLIENT_ACCESS_KEY=<client_access_key>

In [None]:
%env CS_ENCRYPTION__CLIENT_ID=<client_id>

In [None]:
%env CS_ENCRYPTION__CLIENT_KEY=<client_key>

In [None]:
! docker compose up -d
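
Containers can take a few seconds to accept connections after `docker compose up -d` returns. As a rough check, the sketch below tries a TCP connection to each port used in this notebook (5432 for PostgreSQL, 6432 for CipherStash Proxy). The helper name `port_is_open` is made up for illustration, and an open port only shows that something is listening, not that the service is fully ready.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# 5432 is PostgreSQL and 6432 is CipherStash Proxy in this walkthrough.
for name, port in (("PostgreSQL", 5432), ("CipherStash Proxy", 6432)):
    state = "accepting connections" if port_is_open("localhost", port) else "not reachable yet"
    print(f"{name} on port {port}: {state}")
```

If a port stays unreachable, `docker compose ps` and `docker compose logs` are the usual places to look next.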

## Installing required components and table creation

Once the containers are up, there are a few things to be installed.
A table must also be created to store encrypted data.
Do the following steps to install them and create a table.

### Install database extensions

In [None]:
! PGPASSWORD=postgres psql -h localhost -p 5432 -U postgres cipherstash_getting_started < install.sql # should output messages like `CREATE *`

### Install EQL

In [None]:
! PGPASSWORD=postgres psql -h localhost -p 5432 -U postgres cipherstash_getting_started < cipherstash_encrypt_eql.sql # should output messages like `CREATE *`

### Install application specific database types

In [None]:
! PGPASSWORD=postgres psql -h localhost -p 5432 -U postgres cipherstash_getting_started < application_types.sql # should output messages like `CREATE DOMAIN`

### Create a table for testing encryption

In [None]:
! PGPASSWORD=postgres psql -h localhost -p 5432 -U postgres cipherstash_getting_started < create_examples_table.sql

### Classes that convert between the database format and Python format

The classes prefixed with `Cs`, defined in `cs_types.py`, handle conversion between the format CipherStash Proxy requires and native Python types.

To encrypt and store plaintext values, CipherStash Proxy requires values for encrypted columns to be in a JSONB format like:
```
{
  "k": "pt",
  "p": "hello, world",
  "i": {
    "t": "examples",
    "c": "encrypted_utf8_str"
  },
  "v": 1
}
```

In Python, this conversion can be done by creating a `CsText` object:
```
txt = CsText("hello, world", "examples", "encrypted_utf8_str")
txt.to_db_format()
```

The constructor for `CsText` takes the string value, the table name (`"examples"`) and the column name (`"encrypted_utf8_str"`).
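
To make the mapping between the constructor arguments and the JSON fields concrete, here is a minimal sketch that builds the same payload by hand. The function `plaintext_payload` is a hypothetical stand-in written for this explanation; the real `CsText.to_db_format()` in `cs_types.py` may differ in detail.

```python
import json


def plaintext_payload(value, table, column):
    """Build a plaintext payload in the shape shown in the JSON example.

    Hypothetical stand-in for CsText(...).to_db_format(), not the real class.
    """
    return json.dumps(
        {
            "k": "pt",  # marks the payload as plaintext
            "p": value,  # the plaintext value itself
            "i": {"t": table, "c": column},  # table and column the value is for
            "v": 1,  # copied as-is from the example payload
        }
    )


print(plaintext_payload("hello, world", "examples", "encrypted_utf8_str"))
```

Keeping the table and column names inside the payload is what lets Proxy look up how that particular column should be encrypted.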

### Import class definitions

See [cs_types.py](cs_types.py) if you are interested in the implementation details of these classes.

In [None]:
import psycopg2  # imported explicitly because psycopg2.connect is called below
from cs_types import *
from psycopg2.extras import RealDictCursor

## Insert and query encrypted data

With the database extensions, EQL, and application specific data types installed together with the type definitions for Python, your setup is now ready to encrypt and decrypt data.

To check what the JSONB format looks like, run the following:

In [None]:
CsText("hello, python", "examples", "encrypted_utf8_str").to_db_format()

Insert an example row:

In [None]:
from pprint import pprint
from datetime import datetime

conn = psycopg2.connect("host=localhost dbname=cipherstash_getting_started user=postgres password=postgres port=6432")

cur = conn.cursor(cursor_factory=RealDictCursor)

cur.execute("delete from examples") # Clear the table in case there are records from previous runs

cur.execute("INSERT INTO examples (encrypted_int, encrypted_boolean, encrypted_date, encrypted_float, encrypted_utf8_str) VALUES (%s, %s, %s, %s, %s)",
    (
        CsInt(-51, "examples", "encrypted_int").to_db_format(),
        CsBool(False, "examples", "encrypted_boolean").to_db_format(),
        CsDate(datetime.now().date(), "examples", "encrypted_date").to_db_format(),
        CsFloat(-0.5, "examples", "encrypted_float").to_db_format(),
        CsText("hello, world", "examples", "encrypted_utf8_str").to_db_format()
    )
)

conn.commit()

print("example row created in examples table")

Check what the row looks like from both regular PostgreSQL (port 5432) and CipherStash Proxy (port 6432):

In [None]:
# From CipherStash Proxy; you should see plaintext JSONB
!printf '\\x \n select * from examples limit 1;' | PGPASSWORD=postgres psql -h localhost -p 6432 -U postgres cipherstash_getting_started

In [None]:
# From PostgreSQL; you should see JSONB with encrypted values
!printf '\\x \n select * from examples limit 1;' | PGPASSWORD=postgres psql -h localhost -p 5432 -U postgres cipherstash_getting_started

In the PostgreSQL output above, not every field is populated, but each populated field should contain a JSONB value holding the encrypted data, with `"k"` set to `"ct"` to indicate ciphertext.
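
The `"k"` field is therefore a quick way to tell the two views apart. The sketch below classifies a payload by that field; `payload_kind` is a made-up helper name, and the only values it knows are the two that appear in this notebook (`"pt"` and `"ct"`).

```python
import json


def payload_kind(payload):
    """Classify a payload (JSON string or already-parsed dict) by its "k" field."""
    doc = json.loads(payload) if isinstance(payload, str) else payload
    kinds = {"pt": "plaintext", "ct": "ciphertext"}
    return kinds.get(doc.get("k"), "unknown")


# Abbreviated payloads for illustration; real ones carry more fields.
print(payload_kind('{"k": "pt", "p": "hello, world"}'))
print(payload_kind({"k": "ct"}))
```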

### Converting to Python types

When you query through Proxy, you get back the decrypted JSONB values shown in the Proxy example above, not the encrypted values from the PostgreSQL example.
These values can then be converted to native Python types using the class methods for each type:

In [None]:
cur.execute("select * from examples")

records = cur.fetchall()

record0 = records[0]

# `from_parsed_json` methods convert the values into the corresponding Python types
print(f"int: {CsInt.from_parsed_json(record0['encrypted_int'])}")
print(f"boolean: {CsBool.from_parsed_json(record0['encrypted_boolean'])}")
print(f"date: {CsDate.from_parsed_json(record0['encrypted_date'])}")
print(f"float: {CsFloat.from_parsed_json(record0['encrypted_float'])}")
print(f"text: {CsText.from_parsed_json(record0['encrypted_utf8_str'])}")
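
For intuition, the sketch below shows one way such a conversion could work. It rests on an assumption that is not confirmed by this notebook: that every plaintext arrives as a string in the `"p"` field (for example `"-51"`, `"false"`, or an ISO date). The names `CONVERTERS` and `plaintext_to_python` are hypothetical, and the real `from_parsed_json` methods in `cs_types.py` may behave differently.

```python
from datetime import date

# Assumed encodings: each plaintext is a string stored under "p".
CONVERTERS = {
    "int": int,
    "float": float,
    "bool": lambda text: text.lower() == "true",
    "date": date.fromisoformat,
    "text": str,
}


def plaintext_to_python(parsed, kind):
    """Convert the "p" field of a parsed plaintext payload to a Python value."""
    if parsed is None:  # the column was left NULL for this row
        return None
    return CONVERTERS[kind](parsed["p"])


print(plaintext_to_python({"k": "pt", "p": "-51"}, "int"))
print(plaintext_to_python({"k": "pt", "p": "2024-01-31"}, "date"))
```

Handling `None` matters in this notebook because later rows populate only one encrypted column each.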

### Querying with the encrypted fields

You can also use the encrypted fields for queries.

First, add some rows so that more than one text value and more than one float value are stored:

In [None]:
# data for MATCH
cur.execute("INSERT INTO examples (encrypted_utf8_str) VALUES (%s) ON CONFLICT DO NOTHING",
    (
        CsText("hello, python", "examples", "encrypted_utf8_str").to_db_format(),
    )
)

cur.execute("INSERT INTO examples (encrypted_utf8_str) VALUES (%s) ON CONFLICT DO NOTHING",
    (
        CsText("hello, jupyter", "examples", "encrypted_utf8_str").to_db_format(),
    )
)

# data for ORE
cur.execute("INSERT INTO examples (encrypted_float) VALUES (%s)",
    (
        CsFloat(100.1, "examples", "encrypted_float").to_db_format(),
    )
)

cur.execute("INSERT INTO examples (encrypted_float) VALUES (%s)",
    (
        CsFloat(100.2, "examples", "encrypted_float").to_db_format(),
    )
)

conn.commit()

print("created data for MATCH and ORE queries")

Now you can run a query for records in the `examples` table whose `encrypted_utf8_str` field contains the text `"pyth"`:

In [None]:
# MATCH query for "pyth"
cur.execute("SELECT * FROM examples WHERE cs_match_v1(encrypted_utf8_str) @> cs_match_v1(%s)", (CsText("pyth", "examples", "encrypted_utf8_str").to_db_format(),))

found = cur.fetchall()[0]
print(f"Record Found with MATCH query: {CsRow(found).row}\n")
print(f"Text inside the found record: {CsText.from_parsed_json(found['encrypted_utf8_str'])}")

Similarly, you can query for the exact text `"hello, jupyter"` in the `encrypted_utf8_str` field:

In [None]:
# UNIQUE
cur.execute("SELECT * FROM examples WHERE cs_unique_v1(encrypted_utf8_str) = cs_unique_v1(%s)", (CsText("hello, jupyter", "examples", "encrypted_utf8_str").to_db_format(),))
found = cur.fetchall()[0]
print(f"Record Found with UNIQUE query: {CsRow(found).row}\n")
print(f"Text inside the found record: {CsText.from_parsed_json(found['encrypted_utf8_str'])}")
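
The query cells in this section index the result with `cur.fetchall()[0]`, which raises `IndexError` when no row matches. A small guard avoids that; `first_or_none` is a made-up helper name, shown here as a sketch that works on any list of rows.

```python
def first_or_none(rows):
    """Return the first row of a result list, or None when the list is empty."""
    return rows[0] if rows else None


# Stand-in lists; in the notebook the argument would be cur.fetchall().
print(first_or_none([{"id": 1}, {"id": 2}]))
print(first_or_none([]))
```

In the notebook this would be used as `found = first_or_none(cur.fetchall())`, followed by a check for `None` before printing.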

Finally, a query for a record with `encrypted_float` that is larger than `100.15`:

In [None]:
# ORE
cur.execute("SELECT * FROM examples WHERE cs_ore_64_8_v1(encrypted_float) > cs_ore_64_8_v1(%s)", (CsFloat(100.15, "examples", "encrypted_float").to_db_format(),))
found = cur.fetchall()[0]
print(f"Record Found with ORE query: {CsRow(found).row}\n")
print(f"Float inside the found record: {CsFloat.from_parsed_json(found['encrypted_float'])}")
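
As a sanity check on what the three queries should return, the sketch below applies the same conditions to the plaintext values inserted in this notebook, using ordinary Python comparisons. It assumes the match query behaves like a plain substring test, which an encrypted index may only approximate, so treat it as the expected result rather than a description of how Proxy evaluates the query.

```python
# Plaintext values inserted by the cells in this notebook.
texts = ["hello, world", "hello, python", "hello, jupyter"]
floats = [-0.5, 100.1, 100.2]

match_hits = [t for t in texts if "pyth" in t]  # MATCH: contains "pyth"
unique_hits = [t for t in texts if t == "hello, jupyter"]  # UNIQUE: exact text
ore_hits = [f for f in floats if f > 100.15]  # ORE: greater than 100.15

print(match_hits)   # ['hello, python']
print(unique_hits)  # ['hello, jupyter']
print(ore_hits)     # [100.2]
```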
