This repository has been archived by the owner on Dec 18, 2023. It is now read-only.

Support to add with properties when creating Trino/Presto tables #58

Merged
merged 1 commit into from
Aug 11, 2021

Conversation

mdesmet
Contributor

@mdesmet mdesmet commented Jul 23, 2021

Fixes #53

By adding with_props in the model, we enable users to add properties when creating tables. Since Presto/Trino supports many different connectors, a Dict[String, String] type provides the necessary flexibility for now.

```
{{
  config(
    materialized='table',
    with_props={
      "format": "'PARQUET'",
      "partitioning": "ARRAY['bucket(id, 2)']",
    }
  )
}}
select 
  * 
from {{ source('inventory', 'products') }}
WHERE weight < 5
```
The following query is executed on Trino:

```
create table "iceberg"."datalake"."low_weight_products__dbt_tmp"
    WITH (format = 'PARQUET',partitioning = ARRAY['bucket(id, 2)'])
  as (
    
select 
  * 
from "postgres"."inventory"."products"
WHERE weight < 5
  )
```
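The rendering step above can be sketched as follows. This is a minimal illustrative sketch, not the plugin's actual Jinja macro; `build_with_clause` is a hypothetical name, and it assumes property values are passed through verbatim (which is why the model config quotes string literals itself, e.g. `"'PARQUET'"`).

```python
def build_with_clause(props):
    """Render a dict of table properties as a Trino/Presto WITH (...) clause.

    Values are not quoted or validated here: they are emitted exactly as
    given, so callers supply SQL-ready fragments like "'PARQUET'" or
    "ARRAY['bucket(id, 2)']".
    """
    if not props:
        return ""
    rendered = ", ".join(f"{key} = {value}" for key, value in props.items())
    return f"WITH ({rendered})"


with_props = {
    "format": "'PARQUET'",
    "partitioning": "ARRAY['bucket(id, 2)']",
}
print(build_with_clause(with_props))
# WITH (format = 'PARQUET', partitioning = ARRAY['bucket(id, 2)'])
```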

@cla-bot

cla-bot bot commented Jul 23, 2021

Thanks for your pull request, and welcome to our community! We require contributors to sign our Contributor License Agreement and we don't seem to have your signature on file. Check out this article for more information on why we have a CLA.

In order for us to review and merge your code, please submit the Individual Contributor License Agreement form attached above. If you have questions about the CLA, or if you believe you've received this message in error, don't hesitate to ping @drewbanin.

CLA has not been signed by users: @mdesmet

@bachng2017

hello @mdesmet
Do you think it is necessary to apply the same implementation to __create_csv_table as well? When seeding, that macro is called instead of create_table_as.

@bachng2017

I need a comma here, before '\n', to run with our Trino/Hive configuration. Not sure about others.
650fb27#diff-0b5044213c0ce72456e38a752114fedb09c0623a7256be2a2fa3c6b4a7291f2bR65

@mdesmet
Contributor Author

mdesmet commented Aug 8, 2021

@bachng2017: the comma is indeed necessary. Serves as a good reminder to myself to actually test before committing last minute changes.

Two follow-up points:

  • The seed macro should indeed also be adapted.
  • We should probably add a test for this feature to the integration tests.

@mdesmet
Contributor Author

mdesmet commented Aug 8, 2021

Hi @bachng2017 ,

I have an iceberg docker setup running and there it works for both the seeds and the models.

dbt_project.yml:

```
seeds:
  trino_project:
      +catalog: iceberg
      +schema: datalake
      +with_props:
          format: "'PARQUET'"
          partitioning: ARRAY['bucket(series_reference, 2)']
```

This generates the following CREATE TABLE statement on Trino:

```
create table "iceberg"."datalake"."trx" (Series_reference VARCHAR, Period DOUBLE, Data_value DOUBLE, Suppressed VARCHAR, STATUS VARCHAR, UNITS VARCHAR, Magnitude INTEGER, Subject VARCHAR, Group_name VARCHAR, Series_title_1 VARCHAR, Series_title_2 VARCHAR, Series_title_3 VARCHAR, Series_title_4 INTEGER, Series_title_5 INTEGER)
WITH (format = 'PARQUET',
  partitioning = ARRAY['bucket(series_reference, 2)'])
```

@bachng2017

cool and thanks :) Seems that the maintainers are waiting for your CLA to be signed. Hope this will be merged soon


@mdesmet
Contributor Author

mdesmet commented Aug 9, 2021

The CLA has been signed after the first commit.

@jtcohen6: any other remarks before this PR can be merged?

Contributor

@jtcohen6 jtcohen6 left a comment


@mdesmet Thanks for the PR, and apologies for the delayed follow-up!

The actual substance of the change here looks fine by me. I have two questions:

  • Do you have a strong feeling about naming the config with_props, versus properties or with_properties? I'm trying to find a canonical name for what these things are in Presto/Trino documentation (a la options in BigQuery or Spark), but a really satisfying answer is eluding me.
  • Did you consider implementing this in the way recommended by Support for table properties #53, and as we've done for other adapters, where each with property would be configured + validated independently? The biggest reason to prefer that approach is that it allows users to set/clobber each config independently, so e.g. they could set format for the entire project but then independently set another_property without clobbering the entire with_props dictionary. The biggest reason to prefer the generic dict approach is if there are dozens/hundreds of properties available, and we'd want to support all of them arbitrarily without naming/validating them specifically—as discussed a bit in Support setting table OPTIONS using config dbt-spark#147 (comment).

In any case, you certainly deserve props for the solid PR! :)
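The set/clobber trade-off described above can be illustrated with a small sketch. This is not plugin code; the dict names are hypothetical, and it assumes standard Python dict semantics to contrast the two config-inheritance behaviors.

```python
# Project-level and model-level property configs (illustrative values).
project_level = {"format": "'PARQUET'", "another_property": "'x'"}
model_level = {"partitioning": "ARRAY['bucket(id, 2)']"}

# Generic-dict behavior: a model-level with_props replaces the whole
# project-level dict, so project-level "format" is lost.
clobbered = model_level

# Per-property behavior: each key is inherited and overridden
# independently, so project-level settings survive alongside
# model-level ones.
merged = {**project_level, **model_level}

print(clobbered)  # only the model-level key remains
print(merged)     # format and another_property survive alongside partitioning
```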


```
import agate


@dataclass
class PrestoConfig(AdapterConfig):
```
Contributor

👍

@mdesmet
Contributor Author

mdesmet commented Aug 10, 2021

Hi @jtcohen6,

The concept is called table properties, so maybe properties is a better name.

I have considered the approach mentioned in #53. However, Presto/Trino differs from Athena in that it supports many more connectors.

For example, take the Kudu connector:

```
CREATE TABLE user_events (
  user_id int WITH (primary_key = true),
  event_name varchar WITH (primary_key = true),
  message varchar,
  details varchar WITH (nullable = true, encoding = 'plain')
) WITH (
  partition_by_hash_columns = ARRAY['user_id'],
  partition_by_hash_buckets = 5,
  number_of_replicas = 3
);
```

Another example is the Hive connector:

```
CREATE TABLE hive.avro.avro_data (
   id bigint
)
WITH (
   format = 'AVRO',
   avro_schema_url = '/usr/local/avro_data.avsc'
)
```

Another example is the ClickHouse connector:

```
CREATE TABLE default.trino_ck (
  id int NOT NULL,
  birthday DATE NOT NULL,
  name VARCHAR,
  age BIGINT,
  logdate DATE NOT NULL
)
WITH (
  engine = 'MergeTree',
  order_by = ARRAY['id', 'birthday'],
  partition_by = ARRAY['toYYYYMM(logdate)'],
  primary_key = ARRAY['id'],
  sample_by = 'id'
);
```

It would be difficult to maintain and keep up with all connectors, hence the freedom of the Dict[str, str] approach.

@jtcohen6
Contributor

Thanks for the thorough response @mdesmet! I'm happy with properties as the config name, and I totally buy your rationale about there being too many connector-specific properties to state them all explicitly. That flexibility is worth the cost of losing finer-grained config inheritance + validation.

We don't currently have a suite of custom tests for Presto-specific functionality in this plugin. The code change looks straightforward, and most of the logic lives inside Jinja macros, so users have the ability to override/reimplement if they need to. If this is something you've tested locally, and verified that it's working for the cases you're interested in, I'd be ok with merging this for inclusion in v0.21.

@mdesmet
Contributor Author

mdesmet commented Aug 11, 2021

Great, it's tested locally and working. Automating the tests is definitely possible by including a few Docker images; that's how I've tested it, see https://github.com/innoverio/modern-data-stack

@jtcohen6
Contributor

Very cool repo! I hadn't seen it before; thanks for linking to it.

I'm thinking more along the lines of custom dbt integration tests to automate in CI, such as these ones for dbt-spark. That leverages a testing framework we've played around with, but haven't all the way locked down. If we find we need automated testing for more complex dbt-presto features, we should set up similar scaffolding.

@jtcohen6 jtcohen6 merged commit 43c6e2b into dbt-labs:master Aug 11, 2021
@jtcohen6
Contributor

I'll backport onto 0.21.latest now, so that this feature will become available in the next v0.21 prerelease.

@bachng2017

hello @mdesmet.
Talking about this, do you think we should have two kinds of properties: the current one and something like table_properties?

In the implementing macro, we would then merge those two properties together before creating the final one. This would help solve the issue of common vs. individual settings while making sure users could use various formats for different connectors.

For example, we could have `transactional: true` set at the project level and `partitioned_by` set at the table level of the config file or in each model. However, increasing the number of properties keywords might make users more confused.

@mdesmet
Contributor Author

mdesmet commented Aug 20, 2021

@bachng2017 : Can you create a new issue for further discussion?

I have the following thoughts:

  • How many levels of merging exist (project level, directory level, ...)? Any key of the properties set at a lower level would override or add to the keys from a higher level.
  • Why name it differently when it really is the same set of properties?
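The level-by-level merge described in the first point could be sketched as follows. This is a hypothetical illustration, not dbt internals: `merge_properties` and the level names are made up, and it assumes a simple "lower level wins per key" rule.

```python
def merge_properties(*levels):
    """Merge property dicts from highest to lowest level.

    Each level is a dict (or None); keys set at a lower level override
    the same keys from higher levels, while unrelated keys are kept.
    """
    merged = {}
    for level in levels:
        merged.update(level or {})
    return merged


project = {"transactional": "true", "format": "'PARQUET'"}
directory = {"format": "'ORC'"}        # overrides the project-level format
model = {"partitioned_by": "ARRAY['logdate']"}  # adds a model-level key

print(merge_properties(project, directory, model))
# {'transactional': 'true', 'format': "'ORC'", 'partitioned_by': "ARRAY['logdate']"}
```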
