Introduce normalization integration tests #3025
Conversation
Force-pushed "…, will be covering it in integration tests instead": compare 787a4a3 to 3d85a54
/test connector=bases/base-normalization

I'm still finishing writing the documentation/readme for this.

/test connector=bases/base-normalization
...integrations/bases/base-normalization/integration_tests/resources/exchange_rate/catalog.json
@ChristopheDuong this is pretty cool!! I appreciate the readme that is a guide to how to discover all of the tests. I think that will make it much easier for us to navigate.
I remain a little worried that it will be a bit hard to learn some of the rules of normalization because of the way the test cases are split across so many files. Each test case relies on 3 files, so I need to open all of them at once and carefully track how each thing changes as opposed to just having a test that has a name or a comment that explains the behavior we are looking for. That all being said, I think I appreciate how what I'm describing would be hard to do for each different database. You have a clever way with the file naming conventions of handling the different behaviors for different databases. Let's go with the approach as you have it now and we can always iterate from there if we need to.
I would love for you to demo how this works for the team. Especially
- how they can use the docs you've written to find out the rules of normalization
- live demo of how the diffing stuff works so that you can see how changes you make to normalization change the dbt models.
Do you want to figure out a time during one of the syncs next week to present it? This would also be good tooling to demo to the community.
* Version generated/output files from normalization integration tests
* Simplify cast of float columns to string when used as partition key (#3027)
* Bump version of normalization image
/publish connector=bases/base-normalization
I feel like this makes it a lot safer to make changes to normalization!
secrets/

# ignore files copied from dbt-project-template
integration_tests/normalization_test_output/*/*/macros
Would it make more sense to exclude all of integration_tests/normalization_test_output
and specifically include paths that match the generated files that we want included?
yes!
I wasn't aware we could do that... thanks
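For reference, gitignore supports re-including paths under an excluded directory using `!` negation patterns, provided the parent directories themselves are not excluded. A hypothetical sketch of the pattern the reviewer suggests (the exact generated-file paths here are assumptions, not the actual repo layout):

```gitignore
# Exclude everything under the test output directory by default...
integration_tests/normalization_test_output/*
# ...but keep subdirectories traversable so negations below can apply
# (git cannot re-include a file whose parent directory is excluded)
!integration_tests/normalization_test_output/*/
# Re-include only the generated files we want versioned (assumed path)
!integration_tests/normalization_test_output/*/*/models/**
```

The directory negation line is the easy-to-miss part: without it, git never descends into the excluded directories and the deeper `!` rule has no effect.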
by normalization and dbt (directly in PR too). (_Simply refer to your test suite name in the
`git_versionned_tests` variable in the `base-normalization/integration_tests/test_normalization.py` file_)

We would typically choose small and meaningful test suites to include in git while others more complex tests
Is it bad to include the larger test cases?
it would generate a lot more files and make PRs heavier to review, so it's up to us to decide what to include/exclude; we have the choice :)
)
def test_normalization(integration_type: str, test_resource_name: str, setup_test_path):
    print("Testing normalization")
    destination_type = DestinationType.from_string(integration_type)
nit: couldn't `test_normalization` be parameterized in terms of `DestinationType` values?
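The reviewer's nit can be sketched as follows. This is a hedged illustration, not the project's actual code: the `DestinationType` enum here is a minimal stand-in, and the real enum's members and `from_string` signature may differ.

```python
import enum

import pytest


# Hypothetical stand-in for the project's DestinationType enum.
class DestinationType(enum.Enum):
    POSTGRES = "postgres"
    BIGQUERY = "bigquery"
    SNOWFLAKE = "snowflake"

    @classmethod
    def from_string(cls, value: str) -> "DestinationType":
        return cls(value.lower())


# Parametrizing directly over the enum members avoids the string round-trip
# through DestinationType.from_string inside the test body.
@pytest.mark.parametrize("destination_type", list(DestinationType))
def test_normalization(destination_type: DestinationType):
    assert isinstance(destination_type, DestinationType)
```

Parametrizing over enum values also gives each generated test a readable id (e.g. `test_normalization[DestinationType.POSTGRES]`).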
@pytest.fixture(scope="package", autouse=True)
def before_all_tests(request):
nit: might be nicer to have fixture creation nearer to the start of the file
if the test_resource_name is part of git_versionned_tests, then dbt models and final sql outputs
will be written to a folder included in airbyte git repository.

Non-versionned tests will be written in /tmp folders instead.
Suggested change:
-Non-versionned tests will be written in /tmp folders instead.
+Non-versioned tests will be written in /tmp folders instead.
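The versioned-vs-temporary output behavior described above could look something like this sketch. The `GIT_VERSIONED_TESTS` set and both output paths are assumptions for illustration, not the actual names in `test_normalization.py`:

```python
import os
import tempfile

# Hypothetical: the real list lives in the git_versionned_tests variable
# in base-normalization/integration_tests/test_normalization.py.
GIT_VERSIONED_TESTS = {"exchange_rate"}


def output_directory(test_resource_name: str) -> str:
    """Versioned test suites write their dbt models and final SQL into a
    folder tracked by the airbyte git repository; all other suites write
    to a throwaway folder under the system temp directory."""
    if test_resource_name in GIT_VERSIONED_TESTS:
        return os.path.join("integration_tests", "normalization_test_output", test_resource_name)
    return os.path.join(tempfile.gettempdir(), "normalization_test_output", test_resource_name)
```

Keeping only small suites in the versioned set keeps generated-file diffs reviewable while still exercising the larger suites on every run.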
- see additional macros for testing here: https://github.com/fishtown-analytics/dbt-utils#schema-tests
- Data tests are added in .sql files from the data_tests directory and should return 0 records to be successful

We use this mecanism to verify the output of our integration tests.
Suggested change:
-We use this mecanism to verify the output of our integration tests.
+We use this mechanism to verify the output of our integration tests.
@@ -178,10 +178,11 @@ def write_yaml_sources_file(self, schema_to_source_tables: Dict[str, Set[str]]):
        Generate the sources.yaml file as described in https://docs.getdbt.com/docs/building-a-dbt-project/using-sources/
        """
        schemas = []
-       for schema in schema_to_source_tables:
+       for entry in sorted(schema_to_source_tables.items(), key=lambda kv: kv[1]):
Why do these need to be sorted?
it's a map of table names; sorting keeps the output consistent across runs, so entries appear in the same order every time in the file and don't cause unnecessary diffs just because their positions were randomly swapped
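The determinism point can be illustrated with a small sketch. For simplicity this sorts by schema name rather than by the value as in the diff above, and the sample data and dict shape are assumptions for illustration:

```python
# Hypothetical sample input: schema name -> set of source table names.
schema_to_source_tables = {"staging": {"users", "orders"}, "public": {"events"}}

# Python sets have no defined iteration order, so without sorting the
# generated sources.yaml could list schemas and tables in a different
# order on every run, producing spurious diffs in versioned output files.
schemas = []
for schema, tables in sorted(schema_to_source_tables.items()):
    schemas.append({"name": schema, "tables": [{"name": t} for t in sorted(tables)]})
```

After this loop, `schemas` is always in the same order (`public` before `staging`, `orders` before `users`), so serializing it yields a byte-identical file on every run.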
Co-authored-by: Jared Rhizor <jared@dataline.io>
What
Closes #2750
Integration tests replicating the standard destination tests but focused on normalization (dbt) instead.
How
I will also add more text in the readme to describe what this is introducing.
You can look at an example PR fixing a bug and affecting the integration tests outputs that are in git with this PR here: #3027
Pre-merge Checklist
Recommended reading order
1. test.java
2. component.ts