Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Array macros #5823

Merged
merged 12 commits into from
Sep 26, 2022
Merged

Array macros #5823

merged 12 commits into from
Sep 26, 2022

Conversation

dbeatty10
Copy link
Contributor

@dbeatty10 dbeatty10 commented Sep 13, 2022

related to #5520

Description

All the related pull requests:

Checklist

@dbeatty10 dbeatty10 added the Skip Changelog Skips GHA to check for changelog file label Sep 13, 2022
@cla-bot cla-bot bot added the cla:yes label Sep 13, 2022
@dbeatty10 dbeatty10 removed the Skip Changelog Skips GHA to check for changelog file label Sep 13, 2022
@jtcohen6 jtcohen6 added the Team:Adapters Issues designated for the adapter area of the code label Sep 13, 2022
@dbeatty10 dbeatty10 marked this pull request as ready for review September 22, 2022 01:22
@dbeatty10 dbeatty10 requested review from a team as code owners September 22, 2022 01:22
@dbeatty10 dbeatty10 added the ready_for_review Externally contributed PR has functional approval, ready for code review from Core engineering label Sep 22, 2022
@dbeatty10
Copy link
Contributor Author

This is ready for review.

After it is approved and merged, I will revert the changes to dev-requirements.txt in each of the adapter PRs before merging each of them:

models__array_append_expected_sql = """
select 1 as id, {{ array_construct([1,2,3,4]) }} as array_col union all
select 2 as id, {{ array_construct([4]) }} as array_col
""".lstrip()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the .lstrip() for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question! In this particular case, it isn't needed, and I can remove it :)

More detail:
.lstrip() strips off the leading whitespace so that there isn't an empty newline at the beginning of this "file" (string that ends up being materialized as a file for pytest'ing purposes).

It's useful for seed CSV "files" in particular because it allows the text to be more readable. You can see examples with git grep "lstrip" ./.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the explanation!

# check types equal
expected_cols = get_relation_columns(project.adapter, "expected")
actual_cols = get_relation_columns(project.adapter, "actual")
print(f"Expected: {expected_cols}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to print these?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not strictly necessary. Copied from here.

Having this printing in here is particularly handy if there is a test failure in the future. Definitely came in handy for me during development too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general leaving print statements in can result in debug logs being full of extraneous statements that you then have to sift through so I'd recommend removing it (or just commenting it out so if someone wants to use it to debug they can just remove the comments)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think print() statements in pytest tests only go to go to standard output (stdout) in two cases:

  1. when there is a test failure
  2. if you add the --show-capture command-line option (like pytest ... -s)

Otherwise, pytest "captures" all output so that you don't see anything other than the test summary.

{# all inputs must be the same data type to match postgres functionality #}
{% macro default__array_construct(inputs, data_type) -%}
{% if inputs|length > 0 %}
array[ {{ inputs|join(' , ') }} ]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style nit: do we generally indent within an if block?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if we have a style guide for internal Jinja or not. This particular implementation was copied-pasted from dbt-utils as-is.

{% if inputs|length > 0 %}
array[ {{ inputs|join(' , ') }} ]
{% else %}
array[]::{{data_type}}[]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like the adapter specific PRs don't have the functionality to return an empty array?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This dbt-core PR includes multiple instances of array_construct([]), namely within these tests:

  • tests/adapter/dbt/tests/adapter/utils/fixture_array_construct.py
  • tests/adapter/dbt/tests/adapter/utils/fixture_array_concat.py
  • tests/adapter/dbt/tests/adapter/utils/test_array_append.py

I was hoping that these are sufficient to establish that the adapters handle the base case of an empty array. But if you take a look at those and it doesn't seem sufficient, then let's discuss it further to update the tests accordingly!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as long as we're validating via tests that we don't need to implement the if condition for each adapter that's good to me

{%- endmacro %}

{% macro default__array_concat(array_1, array_2) -%}
array_cat({{ array_1 }}, {{ array_2 }})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if array_concat is more common should we use that as the default instead of postgres syntax?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Between Postgres, Snowflake, BigQuery, and Redshift, it's six of one, half a dozen of the other.

Spark/Databricks refuses to break the tie by using a unique function name for array concatenation:

I don't think this function name is covered in the SQL standard either.


Original implementations in dbt-utils

In an alternate universe:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

such is life (and sql standards)!

@@ -0,0 +1,12 @@
# array_construct

models__array_construct_expected_sql = """
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure how other folks feel but the python standard for constants is to use upper case, see. As I understand the previous convention was based on emulating how we match macros in jinja but since we're not doing that here it's not necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lowercase capitalization you see here is consistent with the tests I'm familiar with. You can see a sample of those existing tests via git grep '_sql = """' ./.

In terms of broader compliance with the PEP 8 style guide, we do use flake8 checks in CI and in pre-commit checks. I haven't read the nitty gritty details of what flake8 does/doesn't cover though. You can see our flake8 config here.

Copy link
Contributor

@colin-rogers-dbt colin-rogers-dbt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style/testing nits but overall LGTM

@dbeatty10 dbeatty10 merged commit 0487b96 into main Sep 26, 2022
@dbeatty10 dbeatty10 deleted the dbeatty/lift-shift-array-macros branch September 26, 2022 19:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla:yes ready_for_review Externally contributed PR has functional approval, ready for code review from Core engineering Team:Adapters Issues designated for the adapter area of the code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants