Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-3263] [Feature] User space Jinja macro for converting DB API 2.0 type_codes into data_types #8900

Open
3 tasks done
dbeatty10 opened this issue Oct 25, 2023 · 1 comment
Labels
enhancement New feature or request model_contracts Refinement Maintainer input needed

Comments

@dbeatty10
Copy link
Contributor

dbeatty10 commented Oct 25, 2023

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

Different data platforms may have specialized or non-started data types (think MONEY, UUID, GEOMETRY, GEOGRAPHY, etc.) This will cause issues when the type_code from the associated Python database driver is unrecognized.

See below for a prototype that allows converting an unrecognized type_code into the desired data_type.

Example prototype

#8887 (comment)

This prototype can be implemented directly by users. Alternatively, it could be re-written in such a way that it could be customized by adapter maintainers as-needed.

Decision criteria for where adapters-related logic should live

From @MichelleArk here:

In case it's helpful, my decision criteria for where adapters-related logic should live boils down to:

  1. If the implementation does not or should not vary by adapter -> dbt-core
  2. If it's adapter specific + has only one objectively correct implementation -> adapter interface, implemented in python for maintainers to overwrite as necessary.
    • This is also easier to unit test for adapter maintainers!
  3. If it's adapter specific + should be flexible to end-user tweaks and customizations -> adapter interface, implemented in jinja (user space) for end-users to customize entirely how they see fit for their dbt project.

As much as I'd like this mapping of type_code to data_type to have a single 'correct' implementation per adapter, I can see how user-defined types put a bit of a wrench in that. It may be possible to split the implementation into base types in python (could follow a similar pattern to #8357) and user-defined types, but that may introduce more complexity than it's worth...

One possible approach (to be refined)

  1. Create an adapter-specific base lookup table of type_codedata_type (within Python or Jinja)
  2. Allow an adapter-specific supplemental lookup table of type_codedata_type (in user-space Jinja)
  3. Create a dbt-core implementation that combines 1) and 2) above (with the latter taking precedence)
  4. Use 3) above within adapter-specific data_type_code_to_name (Python) implementations (see here and here)

Alternative approach 1

  1. As above
  2. As above
  3. Update get_column_schema_from_query to take 1) and 2) into account (with the latter taking precedence)

Alternative approaches 2 and 3

Same as the two approaches above, but with only a single adapter-specific lookup table (instead of separate base and supplemental lookup tables)

I haven't considered any of the above options deeply enough to have a preference yet.

Describe alternatives you've considered

The main alternative is doing these lookups with user-provided (or package-provided) macros.

The example prototype above can be entirely done by users without there being any interfaces added to dbt-core or adapters.

These macros can be defined in two places:

  1. macros directory of the user's dbt project
  2. within a dbt package installed by the user

It would be easier if dbt can provide base implementations and documentation though.

Who will this benefit?

A primary use-case is being able to properly display the actual/expected data types when contract enforcement fails (c.f. #8887).

A two-fer-one use-case is direct usage of get_column_schema_from_query (like described in #8877) so that Column.data_type contains legit data types.

This will benefit users of dbt-postgres that have data types like money, uuid, etc that don't automatically have a type_code lookup within psycopg2. It also would cover the case of geometry within the postgis distribution of postgres as well as user-defined types.

There might be other adapters besides dbt-postgres that have similar needs.

Are you interested in contributing this feature?

No response

Anything else?

No response

@dbeatty10 dbeatty10 added enhancement New feature or request triage labels Oct 25, 2023
@github-actions github-actions bot changed the title [Feature] User space Jinja macro for converting DB API 2.0 type_codes into data_types [CT-3263] [Feature] User space Jinja macro for converting DB API 2.0 type_codes into data_types Oct 25, 2023
@dbeatty10
Copy link
Contributor Author

@graciegoheen and @jtcohen6 discussed the following in in a video call earlier today:

Impact & utility

This seems primarily for users of postgres-based data platforms that support user-defined types or other atypical types (geography, money, etc). It might not be widely useful outside of dbt-postgres.

Priority

While this wouldn't be a high-priority for us, we're generally in favor of surfacing a macro that users can override that contains their own translation table.

Refinement

Marking as "Refinement" to determine which precise approach we'd like to take. The easiest would be a supplemental definition like this that takes precedence over any other definition (using "merge" rather than "clobber"):

{# Define your lookup table here #}
{% macro get_type_code_lookup_table() %}
  {% set lookup_table = {
      '790': 'MONEY',
      '2950': 'UUID',
  } %}
  {{ return(lookup_table) }}
{% endmacro %}

@dbeatty10 dbeatty10 added Refinement Maintainer input needed and removed triage labels Oct 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request model_contracts Refinement Maintainer input needed
Projects
None yet
Development

No branches or pull requests

2 participants