Skip to content

Latest commit

 

History

History
272 lines (245 loc) · 11.9 KB

legacy.md

File metadata and controls

272 lines (245 loc) · 11.9 KB

Legacy YAML Based CLI Documentation

We are moving towards enabling the features available in this legacy yaml config based knockoff cli to the knockoff sdk. In the interim, this functionality is still supported using the legacy subcommand of the knockoff cli. We will also work to support yaml based configuration for the SDK.

Run poetry install/update with psycopg2

Requirements:

  • postgresql (brew install postrgresql)

Run the following command:

pg_config | grep "LDFLAGS ="

Output:

LDFLAGS = -L/usr/local/opt/openssl@1.1/lib -L/usr/local/opt/readline/lib -Wl,-dead_strip_dylibs

Take the value of LDFLAGS and set that environment variable. E.g.:

export LDFLAGS="-L/usr/local/opt/openssl@1.1/lib -L/usr/local/opt/readline/lib -Wl,-dead_strip_dylibs"

You should now be able to run poetry install and/or update commands without failing on psycopg2.

Local Postgres Setup

The following steps can be used to setup a local postgres instance for testing.

Requirements

  • docker
  • poetry (curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python)
  • postgresql (brew install postgresql) or pgcli (brew install pgcli)

Run Postgres

  1. Pull docker image docker pull postgres:11.7
  2. Run docker container: docker run --rm --name pg-docker -e POSTGRES_PASSWORD=docker -d -p 5432:5432 postgres:11.7
    • Note: you can see the running container with docker ps and can terminate it with docker kill pg-docker

You can now access the shell with PGPASSWORD=docker pgcli -h localhost -U postgres or PGPASSWORD=docker psql -h localhost -U postgres.

Tests

Run unit tests

poetry run pytest

The unit tests depend on fixtures using ephemeral postgres databases and/or instances. By default it will attempt to connect to an existing instance at postgresql://postgres@localhost:5432/postgres and will create and destroy databases per fixture. This postgres location can be overridden with the KNOCKOFF_TEST_DB_URI environment variable.

If no external postgres instance is available for testing, but postgresql is installed, the TEST_USE_EXTERNAL_DB environment variable can be set to 0. The fixtures will then rely on the testing.postgresql library to create ephemeral postgres instances per fixture.

If postgres is not available, dependent tests can be disabled with the following environment variable

export TEST_POSTGRES_ENABLED=0

Knockoff Legacy Yaml Based Configuration

Note: This yaml based configuration has been moved under the legacy subcommand for knockoff and a new yaml based configuration will be introduced that relies on the same objects in python based configuartion (knockoff.sdk).

Creating Databases

Knockoff will start by creating any specified databases. This section is optional if you do not need a database created. You can also configure an engine builder to use by providing the config parameter for each configured database (factory function used is knockoff.utilities.orm.sql.EngineBuilder.from_config) otherwise the default engine based on knockoff environment variables will be used. The following yaml will result in the following sql queries:

  • create database mydb;
  • create user myuser with encrypted password 'MYUSER_PASSWORD';
    • MYUSER_PASSWORD is replaced with the value of the corresponding environment variable
  • grant all privileges on database mydb to myuser;
create-databases:
  - name: mydb
    type: postgres
    users:
      - user: myuser
        password_env: MYUSER_PASSWORD

Load existing table definitions

Knockoff uses Pyrseas's yamltodb tool to load existing table definitions into a database.

Example:

In this example there is a table films with the following definition:

+----------+-------------------+-------------+
| Column   | Type              | Modifiers   |
|----------+-------------------+-------------|
| title    | character varying |  not null   |
| director | character varying |             |
| year     | character varying |             |
+----------+-------------------+-------------+
Indexes:
    "films_pkey" PRIMARY KEY, btree (title)

Executing dbtoyaml mydb results in the following yaml that knockoff can be configured to use to load with yamltodb.

schema public:
  description: standard public schema
  owner: postgres
  privileges:
  - PUBLIC:
    - all
  - postgres:
    - all
  table films:
    columns:
    - title:
        not_null: true
        type: character varying
    - director:
        type: character varying
    - year:
        type: character varying
    owner: myuser
    primary_key:
      films_pkey:
        columns:
        - title

Note: If you are running the local postgres setup described above and running from within a docker container on your mac, you can use the following: PGPASSWORD=docker dbtoyaml -H docker.for.mac.host.internal -U postgres mydb

Loading data into tables

Data can be loaded into new or existing tables.

Examples

The following example loads data into an existing table from a provided csv.

knockoff:
  dag:
    - name: films # arbitrary name of node in dag
      type: table # table | prototype | component | part
      table: films # defaults to the name of the node if not provided 
      source:
        strategy: io
        reader: pandas.read_csv
        kwargs:
          filepath_or_buffer: example/films.csv # local or s3:// path
          sep: "|"
      sink:
        database: mydb
        kwargs:
          if_exists: append # defaults to fail
          index: false # Data is loaded into a pandas DataFrame this option ignores the index

The following example loads data into a new table from data defined in the yaml.

knockoff:
  dag:
    - name: films2 # Note: "table" key not specified, so defaults to "film2"
      type: table
      source:
        strategy: io
        reader: inline
        kwargs:
          sep: ","
          data: |
            title,director,year
            t5,d1,2020
            t6,d2,2020
            t7,d1,2020
      sink:
        database: mydb
        user: myuser
        password_env: MYUSER_PASSWORD
        kwargs:
          index: false

Generating fake retail data

Knockoff uses faker to help generate fake retail data that can be used for testing. Hierarchical relationships with various dependencies can be also be modelled with knockoff. This example generates the following tables (in addition to the above examples).

postgres@localhost:mydb> select * from location;
+-------------------------------+---------------+-----------+
| address                       | location_id   | channel   |
|-------------------------------+---------------+-----------|
| 07528 Fischer Track Suite 779 | 1             | nfs       |
| Melissaview, MD 90363         |               |           |
| 1535 Kelly Canyon             | 2             | nso       |
| Rhodesborough, CA 43893       |               |           |
| 216 Kayla Lake Apt. 126       | 3             | nso       |
| South Matthewmouth, OH 36332  |               |           |
| 561 Jones Burg Suite 382      | 4             | nso       |
| Hugheschester, DE 21908       |               |           |
| 042 Robinson Fort Suite 945   | 5             | nfs       |
| Pattersonshire, NC 96317      |               |           |
| 2332 Watkins Road             | 0             | digital   |
| Davidfort, MS 71411           |               |           |
+-------------------------------+---------------+-----------+

postgres@localhost:mydb> select * from product;
+------------+----------+-----------------------+---------------+------------+
| division   | gender   | category              | color         | sku        |
|------------+----------+-----------------------+---------------+------------|
| apparel    | men      | shorts                | PaleGoldenRod | 6357812379 |
| apparel    | women    | pants & tights        | NavajoWhite   | 8332320303 |
| apparel    | men      | tops & t-shirts       | Lavender      | 9243289077 |
| shoes      | men      | lifestyle             | PaleGoldenRod | 7270972977 |
| apparel    | women    | pants & tights        | PaleGoldenRod | 4443641793 |
| apparel    | women    | hoodies & sweatshirts | NavajoWhite   | 6130018459 |
| shoes      | women    | jordan                | Lavender      | 3791231041 |
| apparel    | men      | pants & tights        | PaleGoldenRod | 3899370297 |
| apparel    | men      | shorts                | Lavender      | 7557742055 |
| apparel    | men      | pants & tights        | SkyBlue       | 9785957221 |
| apparel    | women    | shorts                | SkyBlue       | 9979359561 |
| apparel    | women    | tops & t-shirts       | Lavender      | 7006056836 |
| shoes      | women    | jordan                | Lavender      | 4853474331 |
| shoes      | women    | jordan                | NavajoWhite   | 6589395336 |
| apparel    | men      | pants & tights        | Beige         | 7168664719 |
| apparel    | men      | hoodies & sweatshirts | Beige         | 7525844204 |
| apparel    | men      | shorts                | SkyBlue       | 9735336861 |
| shoes      | men      | skateboarding         | SkyBlue       | 6385212885 |
| apparel    | men      | tops & t-shirts       | Beige         | 9735107927 |
| apparel    | women    | pants & tights        | SkyBlue       | 2633853831 |
| apparel    | women    | jackets & vests       | NavajoWhite   | 2758275877 |
| apparel    | men      | shorts                | Lavender      | 1330756304 |
| apparel    | women    | tops & t-shirts       | NavajoWhite   | 9334676293 |
| shoes      | men      | skateboarding         | Lavender      | 6735393792 |
| apparel    | men      | jackets & vests       | Lavender      | 2907811814 |
+------------+----------+-----------------------+---------------+------------+

postgres@localhost:mydb> select * from transactions limit 10;
+---------------+------------+-----------+------------+------------+------------+
| location_id   | sku        | line_id   | order_id   | quantity   | date       |
|---------------+------------+-----------+------------+------------+------------|
| 5             | 7557742055 | 2         | 2957859949 | 0          | 2018-05-06 |
| 1             | 1330756304 | 1         | 3920316859 | 0          | 2018-07-19 |
| 4             | 9243289077 | 3         | 1875617688 | 0          | 2019-10-14 |
| 3             | 9334676293 | 2         | 7317451987 | 0          | 2018-06-09 |
| 0             | 9979359561 | 3         | 1236640244 | 2          | 2019-07-17 |
| 0             | 9735107927 | 1         | 9030486883 | 3          | 2018-04-27 |
| 1             | 9735336861 | 2         | 6196902209 | 2          | 2020-01-21 |
| 3             | 7006056836 | 4         | 1432537227 | 0          | 2019-04-22 |
| 2             | 2907811814 | 1         | 4039132536 | 0          | 2019-09-23 |
| 0             | 4443641793 | 5         | 8648324533 | 2          | 2018-09-06 |
+---------------+------------+-----------+------------+------------+------------+

Run the example:

  1. Run postgres: docker run --rm --name pg-docker -e POSTGRES_PASSWORD=docker -d -p 5432:5432 postgres:11.9
    • Terminate existing container with docker kill pg-docker
    • Note: The example assumes you're running with POSTGRES_PASSWORD=docker and on port 5432
  2. Checkout the repo or download the examples folder
  3. Pull knockoff docker image: docker pull knockoff-factory
docker run --rm -v $PWD/examples/legacy:/examples \
-e KNOCKOFF_DB_HOST='docker.for.mac.host.internal' \
-e KNOCKOFF_DB_USER='postgres' \
-e KNOCKOFF_DB_PASSWORD='docker' \
-e KNOCKOFF_DB_NAME='knockoff' \
-e KNOCKOFF_CONFIG=/examples/knockoff.yaml knockoff-factory:latest knockoff legacy

Note: if you are loading data from an s3 bucket you have access to, you can enable your docker container access to those credentials by adding -v ~/.aws:/root/.aws to the docker run command.