Expose location, clustered_by to dbt-spark #43

Merged · 11 commits · Feb 6, 2020

Conversation

@NielsZeilemaker (Collaborator)

Implemented the Location and Clustered By options, which were not previously exposed by dbt-spark.
https://docs.databricks.com/spark/latest/spark-sql/language-manual/create-table.html
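For context, these options map directly onto Spark's CREATE TABLE syntax (see the Databricks page linked above). A minimal sketch of the kind of DDL this lets dbt generate; the table, column, and path names here are illustrative, not from this PR:

```sql
-- Sketch only: my_db.my_model, created_date, user_id, and the
-- /mnt/data path are invented for illustration.
create table my_db.my_model
using parquet
partitioned by (created_date)
clustered by (user_id) into 8 buckets
location '/mnt/data/my_model'
as
select * from my_db.source_events
```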

@NielsZeilemaker (Collaborator, Author)

One question, though: how can I add tests to verify that the generated SQL statements are correct?

@NielsZeilemaker (Collaborator, Author)

@drewbanin I've added some unit tests for the macros as well.

@aaronsteers (Contributor)

I'm extremely excited about this feature! See my question above regarding a common location prefix for all tables in a project or schema.

@jtcohen6 (Contributor) left a review:

Thanks for the tremendous lift on this, @NielsZeilemaker! This will flesh out the dbt-spark plugin with a lot of sensible configuration options that work differently/only on Spark.

I have two asks:

  • Could you add an AdapterSpecificConfigs block to impl.py that declares all the Spark-specific configs? You can check out the equivalents for Redshift, Snowflake, BigQuery, and Postgres. Basically, this is what tells dbt to pick up certain configuration keywords from dbt_project.yml and combine them with the ones in model config blocks (see the sketch after this list). You should include the configs added in this PR as well as the ones the plugin already supports (file_format, partition_by), because we're currently missing those, too :)
  • Could you add information to the "Model Configuration" section of the README describing the config options you're adding in this PR?
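For illustration, here is roughly what these options could look like in a model config block once this PR lands. This is a sketch, not code from the PR: the model, column, and path names are invented, and location is shown under its final name, location_root, which it acquires later in this review.

```sql
{{
  config(
    materialized = 'table',
    file_format = 'parquet',
    partition_by = 'created_date',
    clustered_by = 'user_id',
    buckets = 8,
    location_root = '/mnt/data'
  )
}}

-- source_events is a made-up upstream model
select * from {{ ref('source_events') }}
```

Setting the same keys under models: in dbt_project.yml is exactly what the AdapterSpecificConfigs block in the first bullet enables.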

(Three resolved inline review threads on dbt/include/spark/macros/adapters.sql, now outdated.)
@jtcohen6 mentioned this pull request on Jan 30, 2020.
@NielsZeilemaker (Collaborator, Author)

@jtcohen6 I've implemented the changes; let me know if there's anything else I can do.

@jtcohen6 (Contributor) left a review:

This is really close. Could you:

  • Update location to location_root in two places (README + AdapterSpecificConfigs)
  • Fix pep8 errors

(Resolved inline review threads on dbt/adapters/spark/impl.py and README.md, now outdated.)
@NielsZeilemaker (Collaborator, Author)

@jtcohen6 Done: renamed location to location_root and fixed the pep8 errors.

@jtcohen6 (Contributor) left a review:

Thank you for the great work!

@jtcohen6 merged commit c1a53dd into dbt-labs:master on Feb 6, 2020.
@aaronsteers (Contributor)

Fantastic!!! 🥇

@Dandandan (Contributor)

Maybe it would be good to document the persist_docs feature as well?

@NielsZeilemaker (Collaborator, Author)

It's not specific to Spark; BigQuery has the same feature. So I guess it should be documented at the dbt level?

@Dandandan (Contributor)

For BigQuery it is indeed documented on the main dbt website.

We might want to add the same info here: https://docs.getdbt.com/docs/profile-spark

@Dandandan (Contributor)

I've suggested adding persist_docs to the Spark-specific page.
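For reference, persist_docs can also be enabled per model from its config block. A minimal sketch; the model body is a placeholder, and which keys you enable is up to you:

```sql
{{
  config(
    materialized = 'table',
    persist_docs = {'relation': true, 'columns': true}
  )
}}

-- With persist_docs enabled, the descriptions from this model's
-- schema.yml are written to the table and column comments.
select 1 as id
```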

@Dandandan (Contributor)

By the way, this feature (location_root) is incredibly useful for the work at our current client!

@jtcohen6 mentioned this pull request on Apr 3, 2020.