Expose location, clustered_by to dbt-spark #43

NielsZeilemaker · 2019-12-22T09:50:31Z

Implemented the Location and Clustered By clauses, which were not yet exposed as model config options.
https://docs.databricks.com/spark/latest/spark-sql/language-manual/create-table.html
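
As a rough illustration of what these two options add to a CREATE TABLE statement, here is a minimal sketch in plain Python. It is a stand-in, not the Jinja macro from this PR: the function name `build_table_options` and its parameters are hypothetical, and appending the table name to a root path is an assumption based on the `location_root` rename later in this thread. The bucket check mirrors the later commit that makes `buckets` required whenever `clustered_by` is set.

```python
from typing import List, Optional


def build_table_options(
    identifier: str,
    location_root: Optional[str] = None,
    clustered_by: Optional[List[str]] = None,
    buckets: Optional[int] = None,
) -> str:
    """Return the optional CREATE TABLE clauses as one string (hypothetical helper)."""
    clauses = []
    if clustered_by:
        # CLUSTERED BY needs a bucket count, so fail early if it is missing.
        if buckets is None:
            raise ValueError("buckets is required when clustered_by is set")
        columns = ", ".join(clustered_by)
        clauses.append(f"clustered by ({columns}) into {buckets} buckets")
    if location_root is not None:
        # Assumption: the table name is appended to the configured root path.
        clauses.append(f"location '{location_root.rstrip('/')}/{identifier}'")
    return " ".join(clauses)
```

Calling `build_table_options("my_table", "/mnt/root", ["id"], 8)` would produce the clustering clause followed by the location clause; with no options set, the helper returns an empty string.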

NielsZeilemaker · 2019-12-22T09:53:02Z

I have a question, though: how can I add tests to verify that the generated SQL statements are correct?

NielsZeilemaker · 2020-01-03T08:42:48Z

@drewbanin I've added some unit tests for the macros as well.
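
One common way to check generated SQL is to compare it with an expected statement after collapsing whitespace, so template indentation does not cause false failures. The sketch below shows that idea only; it is not the test added in this PR. The `normalize_sql` helper, the test class, and the sample statement are all hypothetical, and the rendered string stands in for real macro output.

```python
import re
import unittest


def normalize_sql(sql: str) -> str:
    """Collapse every run of whitespace into one space and trim the ends."""
    return re.sub(r"\s+", " ", sql).strip()


class TestGeneratedSql(unittest.TestCase):
    def test_location_clause(self):
        # Stand-in for the text a rendered macro might return (hypothetical).
        rendered = """
            create table my_table
            using delta
              location '/mnt/root/my_table'
            as select 1
        """
        expected = (
            "create table my_table using delta "
            "location '/mnt/root/my_table' as select 1"
        )
        self.assertEqual(normalize_sql(rendered), expected)
```

A test like this can be run with `python -m unittest` from the directory that contains the file.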

dbt/include/spark/macros/adapters.sql

aaronsteers · 2020-01-28T03:29:38Z

I'm extremely excited about this feature! See my question above regarding a common location prefix for all tables in a project or schema.

test/unit/test_macros.py

jtcohen6

Thanks for the tremendous lift on this, @NielsZeilemaker! This will flesh out the dbt-spark plugin with a lot of sensible configuration options that work differently/only on Spark.

I have two asks:

Could you add an AdapterSpecificConfigs block to impl.py that sets all the Spark-specific configs? You can check out the ones for Redshift, Snowflake, BigQuery, and Postgres. Basically, this is what tells dbt to grab certain configuration keywords from dbt_project.yml and combine them with the ones in model config blocks. You should include the configs added in this PR as well as the ones already supported by the plugin (file_format, partition_by), because we're currently missing those, too :)
Could you add information to the "Model Configuration" section of the README describing the config options you're adding in this PR?
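
To make the first ask concrete, here is a minimal standalone sketch of the idea: a fixed set of adapter-specific keys, and a merge of project-level and model-level values restricted to that set. It does not import dbt and is not dbt's actual implementation. The constant and the `merge_adapter_configs` function are hypothetical, the key names are the ones mentioned in this thread (using `location_root`, the name adopted after the later rename), and letting the model-level value win is an assumption.

```python
from typing import Any, Dict

# Keys named in this thread: the two existing options plus those added by the PR.
ADAPTER_SPECIFIC_CONFIGS = frozenset(
    {"file_format", "partition_by", "location_root", "clustered_by", "buckets"}
)


def merge_adapter_configs(
    project_config: Dict[str, Any], model_config: Dict[str, Any]
) -> Dict[str, Any]:
    """Combine project-level and model-level values for the recognised keys.

    Assumption: a value set in the model's config block overrides the
    project-level value; keys outside the recognised set are ignored.
    """
    merged: Dict[str, Any] = {}
    for key in ADAPTER_SPECIFIC_CONFIGS:
        if key in model_config:
            merged[key] = model_config[key]
        elif key in project_config:
            merged[key] = project_config[key]
    return merged
```

With a project-level `location_root` and a model that only sets `file_format`, the merged result would carry both keys, which is the behaviour the missing block is meant to enable.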

dbt/include/spark/macros/adapters.sql

NielsZeilemaker · 2020-02-06T16:14:09Z

@jtcohen6 I've implemented the changes, let me know if I can do anything else

jtcohen6

This is really close. Could you:

Update location to location_root in two places (README + AdapterSpecificConfigs)
Fix pep8 errors

dbt/adapters/spark/impl.py

README.md

NielsZeilemaker · 2020-02-06T20:03:27Z

@jtcohen6 Done: renamed location to location_root and fixed the pep8 errors.

jtcohen6

Thank you for the great work!

aaronsteers · 2020-02-07T03:39:01Z

Fantastic!!! 🥇

Dandandan · 2020-02-07T07:08:12Z

Maybe it would be good to document the persist_docs feature as well?

NielsZeilemaker · 2020-02-07T07:25:08Z

It's not specific to Spark; BigQuery has the same feature. So I guess it should be documented at the dbt level?

Dandandan · 2020-02-07T07:33:07Z

For BQ it is on the main dbt website indeed.

We might want to add the same info here https://docs.getdbt.com/docs/profile-spark

Dandandan · 2020-02-07T09:00:04Z

I suggested adding persist_docs to the Spark-specific page.

Dandandan · 2020-02-07T09:04:15Z

Btw, this feature (location_root) is incredibly useful for the work at our current client!

Expose location, clustered_by to dbt-spark

5642965

NielsZeilemaker added 2 commits December 22, 2019 10:54

Fixup

ad676ea

Add unit-test for macros

8b0d1e0

Support persist_docs

b167e5f

aaronsteers reviewed Jan 28, 2020

dbt/include/spark/macros/adapters.sql (outdated, resolved)

Dandandan mentioned this pull request Jan 28, 2020

Add support for creating/dropping schema's #40

Merged

aaronsteers reviewed Jan 29, 2020

test/unit/test_macros.py (outdated, resolved)

jtcohen6 reviewed Jan 29, 2020

dbt/include/spark/macros/adapters.sql (outdated, resolved)

dbt/include/spark/macros/adapters.sql (outdated, resolved)

dbt/include/spark/macros/adapters.sql (outdated, resolved)

jtcohen6 mentioned this pull request Jan 30, 2020

0.15.0 upgrade #46

Merged

NielsZeilemaker added 5 commits February 4, 2020 14:17

Fixup, removed log statement

29615d4

Fixup, readme + adapter specific config

f23901c

Make explicit that buckets is required if clustered_by is specified

db59850

Revert spark__get_columns_in_relation

c814994

Switch to location_root

50e1378

jtcohen6 reviewed Feb 6, 2020

dbt/adapters/spark/impl.py (outdated, resolved)

README.md (outdated, resolved)

NielsZeilemaker added 2 commits February 6, 2020 20:59

Fixup, location_root

81c0ef0

Fixup, pep8

f7f8d84

jtcohen6 approved these changes Feb 6, 2020

jtcohen6 merged commit c1a53dd into dbt-labs:master Feb 6, 2020

aaronsteers mentioned this pull request Feb 7, 2020

Materialized tables creation fails on EMR #21

Closed

jtcohen6 mentioned this pull request Apr 3, 2020

Update readme (0.16.0) #73

Merged
