Expose location, clustered_by to dbt-spark #43
I have a question, though: how can I add tests to verify that the generated SQL statements are correct?
@drewbanin I've added some unit tests for the macros as well.
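One way to check generated SQL without a running cluster is to render the clause-building logic and compare strings. The sketch below is written in plain Python rather than Jinja, and `clustered_by_clause` is a hypothetical helper that mirrors what a `clustered_by` macro might render. It also assumes the bucket count is a separate config (called `buckets` here), since Spark's `CLUSTERED BY` clause requires `INTO n BUCKETS`. The real dbt-spark macros are Jinja templates, so testing them directly would mean rendering them through a Jinja environment instead.

```python
import unittest


def clustered_by_clause(cols, buckets):
    """Hypothetical Python mirror of a macro that renders Spark's
    CLUSTERED BY clause. Returns "" unless both cols and buckets are set."""
    if cols is None or buckets is None:
        return ""
    if isinstance(cols, str):
        # Accept a single column name as well as a list of names.
        cols = [cols]
    return f"clustered by ({', '.join(cols)}) into {buckets} buckets"


class ClusteredByClauseTest(unittest.TestCase):
    def test_list_of_columns(self):
        self.assertEqual(
            clustered_by_clause(["id", "name"], 4),
            "clustered by (id, name) into 4 buckets",
        )

    def test_single_column_string(self):
        self.assertEqual(
            clustered_by_clause("id", 2),
            "clustered by (id) into 2 buckets",
        )

    def test_missing_buckets_renders_nothing(self):
        self.assertEqual(clustered_by_clause(["id"], None), "")
```

Saved as a file, the tests can be run with `python -m unittest path/to/file.py`. Comparing exact strings is brittle against whitespace changes; normalizing whitespace before comparing is a common compromise.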
I'm extremely excited about this feature! See my question above regarding a common location prefix for all tables in a project or schema.
Thanks for the tremendous lift on this, @NielsZeilemaker! This will flesh out the dbt-spark plugin with many sensible configuration options that work differently on Spark, or only on Spark.
I have two asks:
- Could you add an `AdapterSpecificConfigs` block to `impl.py` that sets all the Spark-specific configs? You can check out the ones for Redshift, Snowflake, BigQuery, and Postgres. Basically, this is what tells dbt to grab certain configuration keywords from `dbt_project.yml` and combine them with the ones in model `config` blocks. You should include the configs added in this PR as well as the ones the plugin already supports (`file_format`, `partition_by`), because we're currently missing those, too :)
- Could you add information to the "Model Configuration" section of the README describing the config options you're adding in this PR?
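The idea behind `AdapterSpecificConfigs` can be sketched as follows. `SparkAdapterSketch` is a hypothetical stand-in for the adapter class in `impl.py`, the key set assumes the configs discussed in this PR (with `buckets` as an assumed companion to `clustered_by`), and `merge_model_config` is a deliberately simplified illustration of "project-level values, overridden by the model's `config` block". It is not dbt's actual config resolution, which also handles folder-level nesting and general configs such as `materialized`.

```python
from typing import Any, Dict


class SparkAdapterSketch:
    """Hypothetical stand-in for the adapter class in impl.py."""

    # Config keys dbt should also read from dbt_project.yml; "buckets" is an
    # assumption (Spark needs a bucket count to go with clustered_by).
    AdapterSpecificConfigs = frozenset(
        {"file_format", "partition_by", "location_root", "clustered_by", "buckets"}
    )


def merge_model_config(
    project_config: Dict[str, Any], model_config: Dict[str, Any]
) -> Dict[str, Any]:
    """Simplified, hypothetical merge: keep the adapter-specific keys set at
    project level, then let values from the model's config() block win."""
    allowed = SparkAdapterSketch.AdapterSpecificConfigs
    merged = {key: value for key, value in project_config.items() if key in allowed}
    merged.update(model_config)
    return merged


if __name__ == "__main__":
    project = {"file_format": "parquet", "location_root": "/mnt/lake", "other": 1}
    model = {"file_format": "delta"}
    print(merge_model_config(project, model))
    # {'file_format': 'delta', 'location_root': '/mnt/lake'}
```

The point of the sketch is the failure mode the review describes: a key missing from the set (here `other`) is silently dropped at project level, which is why `file_format` and `partition_by` need to be listed too.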
@jtcohen6 I've implemented the changes; let me know if I can do anything else.
This is really close. Could you:
- Update `location` to `location_root` in two places (README + `AdapterSpecificConfigs`)
- Fix the pep8 errors
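The name `location_root` reads naturally if the configured value is a root directory rather than a full table path. A minimal sketch, assuming the model's alias is appended to the root, so that a single value set for a whole project or schema (the common prefix asked about earlier in this thread) gives every table its own directory. `location_clause` is a hypothetical name, and the path is not escaped.

```python
from typing import Optional


def location_clause(location_root: Optional[str], alias: str) -> str:
    """Hypothetical helper: build a Spark LOCATION clause by appending the
    model's alias to a shared root directory.

    Returns an empty string when no root is configured, in which case Spark
    creates the table in its default warehouse location.
    """
    if not location_root:
        return ""
    # Drop a trailing slash so "/mnt/lake/" and "/mnt/lake" behave the same.
    root = location_root.rstrip("/")
    return f"location '{root}/{alias}'"


if __name__ == "__main__":
    # One root set in dbt_project.yml yields a separate directory per model.
    for alias in ["orders", "customers"]:
        print(location_clause("/mnt/lake/analytics/", alias))
    # location '/mnt/lake/analytics/orders'
    # location '/mnt/lake/analytics/customers'
```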
@jtcohen6 Done: renamed `location` to `location_root` and fixed the pep8 errors.
Thank you for the great work!
Fantastic!!! 🥇
Maybe it would be good to document the …
It's not specific to Spark; BigQuery has the same feature. So I guess it should be documented at the dbt level?
For BigQuery, it is indeed on the main dbt website. We might want to add the same info here: https://docs.getdbt.com/docs/profile-spark
I suggested additions of …
By the way, this feature (`location_root`) is incredibly useful for the work at our current client!
Implemented the `location` and `clustered_by` options, which were not previously exposed as model configs.
https://docs.databricks.com/spark/latest/spark-sql/language-manual/create-table.html
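To see how the options fit together, here is a simplified Python sketch of the `CREATE TABLE ... AS` statement they combine into, following the clause order in the Databricks reference linked above (`USING`, `PARTITIONED BY`, `CLUSTERED BY ... INTO n BUCKETS`, `LOCATION`, `AS`). `create_table_as` and its parameters are hypothetical names that mirror the dbt configs discussed in this PR; the real dbt-spark macro is Jinja, and this sketch does no identifier quoting or path escaping.

```python
from typing import List, Optional, Union


def create_table_as(
    schema: str,
    alias: str,
    sql: str,
    file_format: str = "parquet",
    partition_by: Optional[List[str]] = None,
    clustered_by: Optional[Union[str, List[str]]] = None,
    buckets: Optional[int] = None,
    location_root: Optional[str] = None,
) -> str:
    """Hypothetical sketch: assemble a Spark CREATE TABLE ... AS statement
    from dbt-style config values, emitting each optional clause only when set."""
    lines = [f"create table {schema}.{alias}", f"using {file_format}"]
    if partition_by:
        lines.append(f"partitioned by ({', '.join(partition_by)})")
    # Bucketing needs both the columns and a bucket count.
    if clustered_by is not None and buckets is not None:
        cols = [clustered_by] if isinstance(clustered_by, str) else clustered_by
        lines.append(f"clustered by ({', '.join(cols)}) into {buckets} buckets")
    if location_root:
        # The table alias is appended to the shared root directory.
        lines.append(f"location '{location_root.rstrip('/')}/{alias}'")
    lines.append(f"as {sql}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(create_table_as(
        "analytics", "orders", "select * from raw.orders",
        partition_by=["order_date"], clustered_by="customer_id",
        buckets=8, location_root="/mnt/lake",
    ))
```

Emitting each clause only when its config is set keeps the default output identical to a plain `create table ... using ... as` statement, so models that don't use the new options are unaffected.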