Knockoff CLI
---

In this notebook we'll walk through the `run` command, which loads knockoff data configured via the SDK into either an existing persistent database or an ephemerally created temp database.


* [KnockoffContainer](#KnockoffContainer)
    * [Blueprint](#Blueprint)
* [TempDBContainer](#TempDBContainer)
* [Configuration](#config)
* [Example](#Example)


## KnockoffContainer

This is a [declarative container](https://python-dependency-injector.ets-labs.org/containers/declarative.html) that is provided to the CLI with the **--container CONTAINER** option. **CONTAINER** is expected to be the package name for the constructor of a declarative container that provides a [KnockoffDatabaseService](KnockoffDB.ipynb) and a [Blueprint](#Blueprint).

The default container is `knockoff.sdk.container.default:KnockoffContainer`. It should work for most use cases and uses the **DefaultDatabaseService** implementation of the **KnockoffDatabaseService**.
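The "package name" strings used throughout this notebook (for example `knockoff.sdk.container.default:KnockoffContainer`) follow a `module.path:attribute` convention. As a rough sketch of how such a string can be turned into a Python object, assuming the usual import-then-`getattr` approach (knockoff's own loader may differ), the hypothetical helper `resolve_package_name` below imports the module part and looks up the attribute part:

```python
import importlib


def resolve_package_name(package_name):
    """Hypothetical helper: resolve a "module.path:attribute" string to an object."""
    module_path, _, attribute = package_name.partition(":")
    if not module_path or not attribute:
        raise ValueError(f"expected 'module:attribute', got {package_name!r}")
    # Import the module portion, e.g. "knockoff.sdk.container.default"
    obj = importlib.import_module(module_path)
    # Walk the attribute portion, allowing dotted names such as "Class.method"
    for part in attribute.split("."):
        obj = getattr(obj, part)
    return obj


# Standard-library objects resolve the same way a container path would
dumps = resolve_package_name("json:dumps")
print(dumps({"a": 1}))  # {"a": 1}
```

The same string shape appears for the container, the blueprint plan, the `setup_teardown` function, and the declarative base in the configuration below.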

_Note: Blueprint and KnockoffDatabaseService can be extended and injected into the run command by defining a new KnockoffContainer that configures them. Provide that container to the CLI through the --container option and pass the corresponding configuration through the --yaml-config option. Any KnockoffContainer should read its database connection from the database_service.url config path so that it can use the temp database created by the --ephemeral flag._


### Blueprint

This is a class that helps organize [KnockoffDB](KnockoffDB.ipynb) configurations and enables dependency injection via a **Blueprint plan** function.

A **Blueprint plan** is a function that accepts a **KnockoffDB** as input and returns that same instance after configuring it (for example, by adding tables). The plan is passed to the **Blueprint** class's **\_\_init\_\_**. The **run** command calls the **construct** method of the **Blueprint** instance with a **KnockoffDB** instance as its input parameter to build the DataFrames.
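The plan contract described above can be sketched with stand-in classes. `StubKnockoffDB` and `SketchBlueprint` are hypothetical and are not knockoff's classes: the stub only records added tables, and the sketch's `construct` only applies the plan and checks that it returned the same instance (the real `construct` also builds the DataFrames).

```python
class StubKnockoffDB:
    """Hypothetical stand-in for KnockoffDB that only records added tables."""

    def __init__(self):
        self.tables = []

    def add(self, table):
        self.tables.append(table)


class SketchBlueprint:
    """Hypothetical sketch of the Blueprint pattern, not knockoff's class."""

    def __init__(self, plan):
        # The plan function is injected at construction time
        self.plan = plan

    def construct(self, knockoff_db):
        # Apply the plan; a valid plan returns the instance it was given
        result = self.plan(knockoff_db)
        if result is not knockoff_db:
            raise ValueError("a Blueprint plan must return the KnockoffDB it received")
        return result


def noplan(knockoff_db):
    # Adds nothing, analogous to the default noplan plan
    return knockoff_db


def one_table_plan(knockoff_db):
    # Adds a single (placeholder) table, then returns the same instance
    knockoff_db.add("sometable")
    return knockoff_db


db = SketchBlueprint(one_table_plan).construct(StubKnockoffDB())
print(db.tables)  # ['sometable']
```

Because the plan is a plain function, swapping plans only requires pointing the container at a different function, which is what the `blueprint.plan.package` setting does.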


## TempDBContainer
This is another declarative container, used when the **--ephemeral** flag is set. The **TempDBContainer** is provided to the CLI with the **--tempdb-container TEMPDB_CONTAINER** option. **TEMPDB_CONTAINER** is expected to be the package name for the constructor of a declarative container that provides a [TempDatabaseService](TempDatabaseService.ipynb).


The **TempDBService** provided by the **default TempDBContainer** can be configured with any **setup_teardown** generator function, and it uses **SqlAlchemyInitTablesFunc** as its **initialize_tables** function. Both the **setup_teardown** generator function and the SQLAlchemy declarative base used by **SqlAlchemyInitTablesFunc** are configured by package name.
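The generator pattern behind a **setup_teardown** function can be sketched as follows. This sketch assumes such a function performs setup before a single `yield`, yields the temp database URL, and tears down once the generator is resumed or closed. It uses SQLite from the standard library so it runs without a server; the name `sqlite_setup_teardown` and its `base_dir` parameter are hypothetical, and this is not knockoff's `postgres_setup_teardown`, whose signature may differ.

```python
import os
import sqlite3
import tempfile
import uuid


def sqlite_setup_teardown(base_dir=None):
    """Hypothetical setup_teardown generator: create a temp DB, yield its URL, then remove it."""
    # Setup: create a uniquely named SQLite database file
    directory = base_dir or tempfile.gettempdir()
    path = os.path.join(directory, f"test_{uuid.uuid4().hex}.db")
    sqlite3.connect(path).close()
    try:
        # Hand the temp database URL (SQLAlchemy style) to the caller
        yield f"sqlite:///{path}"
    finally:
        # Teardown: runs when the generator is resumed or closed
        if os.path.exists(path):
            os.remove(path)


gen = sqlite_setup_teardown()
url = next(gen)  # setup runs here
print(url)
gen.close()  # teardown runs here
```

A generator keeps setup and teardown in one function, which matches the flow of the `--ephemeral` run: the temp database exists while the command waits for **Enter**, and is destroyed afterwards.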


## <a name="config"></a>Configuration (Default Containers)

The configuration for the containers is provided to the CLI with the **--yaml-config** option. Below is the default configuration for the **default KnockoffContainer and TempDBContainer**. Configuration provided through the **--yaml-config** option takes precedence over the defaults and is overlaid onto them recursively, so nested keys you do not set keep their default values.


```yaml
# KnockoffContainer: KnockoffDatabaseService (DefaultDatabaseService) configuration
database_service:
  # Database instance URL. This defaults to "postgresql://postgres@localhost:5432/postgres"
  # unless the KNOCKOFF_RUN_DB_URL environment variable is set.
  # Note: This is overridden by a temp URL when the --ephemeral flag is used, and it
  # should be used to configure the database connection for any KnockoffContainer
  # to enable the --ephemeral flag feature
  url: ${KNOCKOFF_RUN_DB_URL:postgresql://postgres@localhost:5432/postgres}

# KnockoffContainer: Blueprint configuration
blueprint:
  plan:
    # Package name for a Blueprint plan function, which defaults to
    # a plan that does nothing (noplan) unless the KNOCKOFF_RUN_BLUEPRINT_PLAN
    # environment variable is set
    package: ${KNOCKOFF_RUN_BLUEPRINT_PLAN:knockoff.sdk.blueprint:noplan}
    
# TempDBContainer: TempDBService
tempdb:
  # Database instance URL that will be used to create a temp database within. 
  # This defaults to "postgresql://postgres@localhost:5432/postgres"
  # unless the KNOCKOFF_RUN_DB_URL environment variable is set.
  url: ${KNOCKOFF_RUN_DB_URL:postgresql://postgres@localhost:5432/postgres}
  setup_teardown:
    package: knockoff.tempdb.setup_teardown:postgres_setup_teardown
```

These defaults can be overridden with the `KNOCKOFF_RUN_DB_URL` and `KNOCKOFF_RUN_BLUEPRINT_PLAN` environment variables, but configurations provided via the **--yaml-config** option still take precedence.
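The `${NAME:default}` placeholders in the YAML above resolve to the environment variable when it is set and to the text after the first colon otherwise (dependency-injector's YAML configuration supports this syntax). The hypothetical `interpolate` helper below is only an illustration of that rule, not the loader knockoff actually uses; note that the default itself may contain colons, as the database URL does.

```python
import os
import re

# Matches ${NAME} or ${NAME:default}; NAME stops at the first colon,
# so the default may itself contain colons (e.g. a database URL)
PLACEHOLDER = re.compile(r"\$\{(\w+)(?::([^}]*))?\}")


def interpolate(value, environ=None):
    """Hypothetical helper: replace ${NAME:default} with the env value or the default."""
    environ = os.environ if environ is None else environ

    def substitute(match):
        name, default = match.group(1), match.group(2)
        return environ.get(name, default if default is not None else "")

    return PLACEHOLDER.sub(substitute, value)


raw = "${KNOCKOFF_RUN_DB_URL:postgresql://postgres@localhost:5432/postgres}"
print(interpolate(raw, environ={}))
# postgresql://postgres@localhost:5432/postgres
print(interpolate(raw, environ={"KNOCKOFF_RUN_DB_URL": "postgresql://me@db:5432/app"}))
# postgresql://me@db:5432/app
```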

## Example 

To run this example, you must install knockoff into your Python environment (`pip install knockoff`). This example uses the following data model (`tests.knockoff.data_model:Base`), defined for the knockoff unit tests.

```python
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import JSON, Column, PrimaryKeyConstraint, UniqueConstraint
from sqlalchemy.types import Integer, String, Boolean, DateTime, BigInteger, Float

Base = declarative_base()

SOMETABLE = "sometable"


class SomeTable(Base):
    """A class that can be used for testing"""
    __tablename__ = SOMETABLE
    id = Column(BigInteger, autoincrement=True)
    str_col = Column(String)
    bool_col = Column(Boolean)
    dt_col = Column(DateTime)
    int_col = Column(Integer)
    float_col = Column(Float)
    json_col = Column(JSON)
    __table_args__ = (
        PrimaryKeyConstraint('id'),
        UniqueConstraint('str_col', 'int_col')
    )

```

We will also use the following blueprint plan (`tests.knockoff.blueprint:sometable_blueprint_plan`), defined for the knockoff unit tests. This example uses postgres, and we will run the postgres instance at `postgresql://postgres@localhost:5432/postgres`, as the defaults expect.

```python
from knockoff.sdk.table import KnockoffTable
from tests.knockoff.data_model import SOMETABLE


def sometable_blueprint_plan(knockoff_db):
    table = KnockoffTable(
        SOMETABLE,
        autoload=True,
        size=10,
        # we drop "id" because it's an autoincrement column,
        # so populating it is offloaded to the
        # database sequence
        drop=["id"]
    )
    knockoff_db.add(table)
    return knockoff_db
```

We will use the following configuration to provide the blueprint plan to the **KnockoffContainer** and the data model to the **TempDBContainer**. The rest of the configuration will leverage knockoff defaults.

```yaml
# knockoff.yaml
blueprint:
  plan:
    package: tests.knockoff.blueprint:sometable_blueprint_plan
tempdb:
  initialize_tables:
    base:
      package: tests.knockoff.data_model:Base
```
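To see which values the run command ends up with, the recursive overlay described in the Configuration section can be sketched as a deep dictionary merge of `knockoff.yaml` onto the defaults. This assumes the overlay behaves like a standard recursive dict merge and that neither environment variable is set; `overlay` is a hypothetical helper, not knockoff's code.

```python
import copy


def overlay(defaults, override):
    """Hypothetical helper: recursively overlay `override` onto `defaults` without mutating either."""
    merged = copy.deepcopy(defaults)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key
            merged[key] = overlay(merged[key], value)
        else:
            # Otherwise the override value wins outright
            merged[key] = copy.deepcopy(value)
    return merged


defaults = {
    "database_service": {"url": "postgresql://postgres@localhost:5432/postgres"},
    "blueprint": {"plan": {"package": "knockoff.sdk.blueprint:noplan"}},
    "tempdb": {
        "url": "postgresql://postgres@localhost:5432/postgres",
        "setup_teardown": {"package": "knockoff.tempdb.setup_teardown:postgres_setup_teardown"},
    },
}
knockoff_yaml = {
    "blueprint": {"plan": {"package": "tests.knockoff.blueprint:sometable_blueprint_plan"}},
    "tempdb": {"initialize_tables": {"base": {"package": "tests.knockoff.data_model:Base"}}},
}

config = overlay(defaults, knockoff_yaml)
print(config["blueprint"]["plan"]["package"])  # tests.knockoff.blueprint:sometable_blueprint_plan
print(sorted(config["tempdb"]))  # ['initialize_tables', 'setup_teardown', 'url']
```

The `tempdb` section keeps its default `url` and `setup_teardown` while gaining `initialize_tables`, which is why `knockoff.yaml` only needs the two keys it sets.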

#### Run a postgres instance using docker
```bash
docker run --rm  --name pg-docker -e POSTGRES_HOST_AUTH_METHOD=trust -d -p 5432:5432  postgres:11.9
```

#### Run knockoff
```bash
knockoff run --yaml-config knockoff.yaml --ephemeral
```

Running the above produces the following output.
```bash
[2021-07-20 20:17:16,197] [knockoff.command.run] [MainProcess] [INFO]: TempDatabaseService created temp database:
postgresql://postgres@localhost:5432/test_347b1660e3c346a090d42491b31862c9
[2021-07-20 20:17:16,421] [knockoff.command.run] [MainProcess] [INFO]: knockoff data successfully loaded into database.
Press Enter when finished to destroy temp database.
```

In a separate terminal, we will use a PostgreSQL command-line client to inspect the data generated by knockoff.
```bash
# you can use psql if you prefer
pgcli -U postgres -h localhost -d test_347b1660e3c346a090d42491b31862c9

```


The commands and output from the session are shown below.


```psql
test_347b1660e3c346a090d42491b31862c9> \dt
+----------+-----------+--------+----------+
| Schema   | Name      | Type   | Owner    |
|----------+-----------+--------+----------|
| public   | sometable | table  | postgres |
+----------+-----------+--------+----------+
SELECT 1
Time: 0.026s
test_347b1660e3c346a090d42491b31862c9> select * from sometable;
+------+-------------+------------+---------------------+-----------+-------------------+------------+
| id   | str_col     | bool_col   | dt_col              | int_col   | float_col         | json_col   |
|------+-------------+------------+---------------------+-----------+-------------------+------------|
| 1    | s7-0717678E | True       | 1972-06-17 17:19:28 | 4546      | 16.1508330740386  | {}         |
| 2    | A9-0939855z | True       | 1998-05-22 19:36:12 | 6247      | 67.2822214044621  | {}         |
| 3    | C0-3954708O | False      | 2005-04-26 02:09:57 | 9638      | 120626813.46796   | {}         |
| 4    | I3-2449072j | False      | 1997-09-05 00:11:21 | 901       | -30.7655916319046 | {}         |
| 5    | A5-8162936T | True       | 1996-03-23 03:36:16 | 8960      | 5064.96728253619  | {}         |
| 6    | X6-5655167x | True       | 1972-05-12 14:22:15 | 6356      | -2.15252611360164 | {}         |
| 7    | m6-9313205W | False      | 1986-04-10 17:41:58 | 2290      | -613.820576277695 | {}         |
| 8    | O5-406705P  | False      | 2019-03-08 15:43:31 | 1292      | 7118135362700.77  | {}         |
| 9    | R4-5782891o | False      | 2016-08-02 01:19:21 | 9872      | -5.2662141459354  | {}         |
| 10   | g3-5525789h | True       | 2012-05-10 00:20:37 | 7741      | 9.88884052395397  | {}         |
+------+-------------+------------+---------------------+-----------+-------------------+------------+
SELECT 10
Time: 0.015s
test_347b1660e3c346a090d42491b31862c9>
```

In the original terminal, pressing **Enter** will destroy the database.
