Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce normalization integration tests #3025

Merged
merged 19 commits into from
Apr 27, 2021

Conversation

ChristopheDuong
Copy link
Contributor

@ChristopheDuong ChristopheDuong commented Apr 22, 2021

What

Closes #2750
Integration tests replicating the standard destination tests but focused on normalization (dbt) instead.

How

I will also add more texts in the readme to describe what this is introducing

You can look at an example PR fixing a bug and affecting the integration tests outputs that are in git with this PR here: #3027

Pre-merge Checklist

  • Run integration tests
  • Publish Docker images

Recommended reading order

  1. test.java
  2. component.ts
  3. the rest

@auto-assign auto-assign bot requested review from davinchia and jrhizor April 22, 2021 17:30
@ChristopheDuong
Copy link
Contributor Author

ChristopheDuong commented Apr 23, 2021

/test connector=bases/base-normalization

🕑 bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/777177153
✅ bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/777177153

@ChristopheDuong ChristopheDuong marked this pull request as draft April 23, 2021 14:09
@ChristopheDuong ChristopheDuong marked this pull request as ready for review April 23, 2021 18:15
@auto-assign auto-assign bot requested a review from sherifnada April 23, 2021 18:15
@ChristopheDuong
Copy link
Contributor Author

I'm still finishing to write documentation/readme on this

@ChristopheDuong
Copy link
Contributor Author

ChristopheDuong commented Apr 23, 2021

/test connector=bases/base-normalization

🕑 bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/778628229
✅ bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/778628229

Copy link
Contributor

@cgardens cgardens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ChristopheDuong this is pretty cool!! I appreciate the readme that is a guide to how to discover all of the tests. I think that will make it much easier for us to navigate.

I remain a little worried that it will be a bit hard to learn some of the rules of normalization because of the way the test cases are split across so many files. Each test case relies on 3 files, so I need to open all of them at once and carefully track how each thing changes as opposed to just having a test that has a name or a comment that explains the behavior we are looking for. That all being said, I think I appreciate how what I'm describing would be hard to do for each different database. You have a clever way with the file naming conventions of handling the different behaviors for different databases. Let's go with the approach as you have it now and we can always iterate from there if we need to.

I would love for you to demo how this works for the team. Especially

  1. how they can use the docs you've written to find out the rules of normalization
  2. live demo of how the diffing stuff works so that you can see how changes you make it normalization change the dbt models.

do you want to figure out a time during one of the syncs next week to present it? this would also be good tooling to demo to the community as well.

@ChristopheDuong
Copy link
Contributor Author

ChristopheDuong commented Apr 27, 2021

/publish connector=bases/base-normalization

🕑 bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/788346118
✅ bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/788346118

Copy link
Contributor

@jrhizor jrhizor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this makes it a lot safer to make changes to normalization!

secrets/

# ignore files copied from dbt-project-template
integration_tests/normalization_test_output/*/*/macros
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make more sense to exclude all of integration_tests/normalization_test_output and specifically include paths that match the generated files that we want included?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes!

I wasn't aware we could do that... thanks

airbyte-integrations/bases/base-normalization/README.md Outdated Show resolved Hide resolved
airbyte-integrations/bases/base-normalization/README.md Outdated Show resolved Hide resolved
airbyte-integrations/bases/base-normalization/README.md Outdated Show resolved Hide resolved
by normalization and dbt (directly in PR too). (_Simply refer to your test suite name in the
`git_versionned_tests` variable in the `base-normalization/integration_tests/test_normalization.py` file_)

We would typically choose small and meaningful test suites to include in git while others more complex tests
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it bad to include the larger test cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would generate a lot more files and makes PR heavier to review, so it's up to us to decide what to include/exclude, we have the choice :)

)
def test_normalization(integration_type: str, test_resource_name: str, setup_test_path):
print("Testing normalization")
destination_type = DestinationType.from_string(integration_type)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: couldn't test_normalization be parameterized in terms of DestinationType values?



@pytest.fixture(scope="package", autouse=True)
def before_all_tests(request):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: might be nicer to have fixture creation nearer to the start of the file

if the test_resource_name is part of git_versionned_tests, then dbt models and final sql outputs
will be written to a folder included in airbyte git repository.

Non-versionned tests will be written in /tmp folders instead.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Non-versionned tests will be written in /tmp folders instead.
Non-versioned tests will be written in /tmp folders instead.

- see additional macros for testing here: https://github.com/fishtown-analytics/dbt-utils#schema-tests
- Data tests are added in .sql files from the data_tests directory and should return 0 records to be successful

We use this mecanism to verify the output of our integration tests.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
We use this mecanism to verify the output of our integration tests.
We use this mechanism to verify the output of our integration tests.

@@ -178,10 +178,11 @@ def write_yaml_sources_file(self, schema_to_source_tables: Dict[str, Set[str]]):
Generate the sources.yaml file as described in https://docs.getdbt.com/docs/building-a-dbt-project/using-sources/
"""
schemas = []
for schema in schema_to_source_tables:
for entry in sorted(schema_to_source_tables.items(), key=lambda kv: kv[1]):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do these need to be sorted?

Copy link
Contributor Author

@ChristopheDuong ChristopheDuong Apr 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's a map of table names, this is to keep it consistent across runs, so they appear in the same order every time in the file and don't cause unnecessary diffs because their position were just inverted randomly

ChristopheDuong and others added 2 commits April 27, 2021 10:23
@ChristopheDuong ChristopheDuong merged commit c2fa3e4 into master Apr 27, 2021
@ChristopheDuong ChristopheDuong deleted the chris/normalization-integration-tests branch April 27, 2021 10:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Tests for Normalization
4 participants