Use built-in adapter functionality for datatypes #586
Conversation
This turned out awesome! 💪
Is there anything we can do to make this feel even better? e.g., adding in some kind of unit testing?
It seems like package authors will be able to use any of those 3 no matter what, right? But we are just trying to choose which of those we want to encourage? Will the package authors have a variable in hand (like
Otherwise, this is nice and pithy:
Definitely! I'll look into how we might do this
Correct! Which of these do we want to encourage, in a world where they actually all do the same thing? Just a question of the right syntactic sugar to sprinkle in.
Force-pushed from 32a2409 to 9a245dc
I've added tests for these data types (and removed the previous placeholders). These tests check both:
Two hesitations with the current implementation:
All of the actual changes for this PR are happening in this repo only, for now. Things we could do in core/plugins to make the code here slightly cleaner:
Not sure if it's ready for final review, but it could use another look!
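For context, here is a rough sketch of the kind of integration-test model such data-type tests might use. The model shape, literals, and column names are hypothetical illustrations, not the actual test files added in this PR:

```sql
-- Hypothetical sketch: cast literals through the macros and compare the result
-- against a seed of expected values (e.g. via an equality test).
select
    cast('grape'      as {{ dbt_utils.type_string() }})    as string_col,
    cast('1.20'       as {{ dbt_utils.type_numeric() }})   as numeric_col,
    cast('2021-01-01' as {{ dbt_utils.type_timestamp() }}) as timestamp_col
```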
macros/cross_db_utils/datatypes.sql (Outdated)
```diff
 {% macro default__type_numeric() %}
-    numeric(28, 6)
+    {{ return(api.Column.numeric_type("numeric", 28, 6)) }}
```
TODO: SparkSQL wants this to be called `decimal` instead of `numeric`. Investigate whether that works on other standard DBs, or if we should use `translate_type` for it.
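One hypothetical way to resolve that TODO would be to route the generic name through `translate_type` before building the precision/scale string. This assumes the Spark adapter's `Column` class actually translates `numeric` to `decimal`, which would need to be verified:

```sql
{% macro default__type_numeric() %}
    {# Sketch only: translate the generic name per adapter, then apply precision/scale. #}
    {{ return(api.Column.numeric_type(api.Column.translate_type("numeric"), 28, 6)) }}
{% endmacro %}
```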
I feel good about this when you do.
macros/cross_db_utils/datatypes.sql (Outdated)
```diff
+{%- macro get_data_type(dtype) -%}
+  {# if there is no translation for 'dtype', it just returns 'dtype' #}
+  {{ return(api.Column.translate_type(dtype)) }}
+{%- endmacro -%}

 {# string ------------------------------------------------- #}

 {%- macro type_string() -%}
   {{ return(adapter.dispatch('type_string', 'dbt_utils')()) }}
 {%- endmacro -%}

 {% macro default__type_string() %}
     string
 {% endmacro %}

 {%- macro redshift__type_string() -%}
     varchar
 {%- endmacro -%}

 {% macro postgres__type_string() %}
     varchar
 {% endmacro %}

 {% macro snowflake__type_string() %}
-    varchar
+    {{ return(dbt_utils.get_data_type("string")) }}
```
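As a usage note (the model and column names below are made up for illustration), a package model could reach the adapter-appropriate type either through the pithy dedicated macro or through the variable-friendly `get_data_type` proposed in this diff:

```sql
select
    -- fixed, known type: the dedicated macro reads cleanly
    cast(order_id as {{ dbt_utils.type_string() }}) as order_id,
    -- type chosen at runtime: pass the generic name through get_data_type
    cast(ordered_at as {{ dbt_utils.get_data_type('timestamp') }}) as ordered_at
from {{ ref('orders') }}
```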
I would feel good about pushing both `get_data_type(dtype)` and `type_{X}` into dbt-core. But I'm not sure what risks or maintenance burden having both would impose.

They each seem to have their use-cases:
- `type_{X}` is both pithy and clear, but doesn't accept a variable `dtype`
- `get_data_type(dtype)` is a little more verbose, but does accept a variable `dtype`

If we could only choose one option to push into dbt-core, I would choose `type_{X}` because:
- it is compact and clear
- we can always utilize the `api.Column.translate_type(dtype)` syntax for cases when we need a variable `dtype` (sketched below)
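To make that concrete, here is a rough sketch (not the actual dbt-core implementation) of how a core-side `type_{X}` macro could stay pithy while still delegating to `translate_type` under the hood:

```sql
{%- macro type_string() -%}
  {{ return(adapter.dispatch('type_string', 'dbt')()) }}
{%- endmacro -%}

{% macro default__type_string() %}
    {# The macro name carries the "strong typing"; translate_type does the per-adapter mapping. #}
    {{ return(api.Column.translate_type("string")) }}
{% endmacro %}
```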
Okay, I think you make a compelling case in favor of `type_{X}` macros!

`api.Column.translate_type(dtype)` will accept ANY input string, even `api.Column.translate_type('fake_type_xyz')`. It will translate that input string if it's recognized, or just return the input string if it isn't.

So, there is benefit to the "stronger typing" achieved by macros that have the standard type right in the name, even if (behind the scenes) it still just shells out to `api.Column.translate_type`.
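A tiny macro to illustrate that fallback behavior (the logged values will vary by adapter; `fake_type_xyz` and the macro name are made up for this example):

```sql
{% macro demo_translate_type() %}
    {# A recognized generic name gets translated to this adapter's concrete type. #}
    {{ log(api.Column.translate_type('string'), info=True) }}
    {# An unrecognized name is passed through unchanged, with no error raised. #}
    {{ log(api.Column.translate_type('fake_type_xyz'), info=True) }}
{% endmacro %}
```

Running `dbt run-operation demo_translate_type` would print both values.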
Force-pushed from 0c97f43 to 6f19550
I removed
Looks great to me! @jtcohen6 Do these sound like the right next steps?
@dbeatty10 That sounds right to me! After merging the
dev-requirements.txt (Outdated)
```diff
-git+https://github.com/dbt-labs/dbt-bigquery.git
+git+https://github.com/dbt-labs/dbt-core.git@jerco/data-type-macros#egg=dbt-core&subdirectory=core
+git+https://github.com/dbt-labs/dbt-core.git@jerco/data-type-macros#egg=dbt-tests-adapter&subdirectory=tests/adapter
+git+https://github.com/dbt-labs/dbt-bigquery.git@jerco/data-type-macros
```
Now that the `dbt-core` PR is merged, only the `dbt-bigquery` PR (dbt-labs/dbt-bigquery#214) is actually blocking this one, since the Redshift + Snowflake PRs don't have substantive changes (just inheriting tests).

TODO: Add back imports from `main`. There was no reason to remove these:

git+https://github.com/dbt-labs/dbt-redshift.git
git+https://github.com/dbt-labs/dbt-snowflake.git
@jtcohen6 Just pushed these in a commit and CI is running now.

Also added back this one along with `dbt-redshift` and `dbt-snowflake`:

git+https://github.com/dbt-labs/dbt-core.git#egg=dbt-postgres&subdirectory=plugins/postgres

Please let me know if I shouldn't have added that one in -- obviously easy to pull it back out.
resolves #598

This is a:

All pull requests from community contributors should target the main branch (default).

Description & motivation

Follow-up to TODOs left from #577. Experiment with using `api.Column.translate_type` for our existing `type_*` macros.

Feels good:
- These are still dispatched `dbt_utils` macros, so that a project / package maintainer can still intervene / override if needed.

What feels less good:
- Which of these do we want to encourage package authors to use: `{{ api.Column.translate_type('string') }}`, `{{ get_data_type('string') }}`, or `{{ type_string() }}`?
- `dbt-core`: We should aim to reconcile / consolidate the `agate` type conversion methods with `Column` class type translation.

Checklist
- `star()` source)
- `limit_zero()` macro in place of the literal string: `limit 0`
- `dbt_utils.type_*` macros instead of explicit datatypes (e.g. `dbt_utils.type_timestamp()` instead of `TIMESTAMP`) — hah!
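A quick illustration of that last checklist convention (the table and column names here are hypothetical):

```sql
-- Preferred: adapter-aware macro
select cast(created_at as {{ dbt_utils.type_timestamp() }}) as created_at
from {{ ref('events') }}

-- Avoid: hardcoded datatype
-- select cast(created_at as TIMESTAMP) as created_at
```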