Introduce normalization integration tests #3025

ChristopheDuong · 2021-04-22T17:30:51Z

What

Closes #2750
Integration tests replicating the standard destination tests but focused on normalization (dbt) instead.

How

I will also add more text to the README to describe what this introduces.

For an example of a bug-fix PR that changes the integration test outputs versioned in git, see #3027.

Pre-merge Checklist

Run integration tests
Publish Docker images

Recommended reading order

test.java
component.ts
the rest

ChristopheDuong · 2021-04-23T09:02:18Z

/test connector=bases/base-normalization

🕑 bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/777177153
✅ bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/777177153

ChristopheDuong · 2021-04-23T18:15:30Z

I'm still finishing the documentation/README for this.

ChristopheDuong · 2021-04-23T18:15:48Z

/test connector=bases/base-normalization

🕑 bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/778628229
✅ bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/778628229

...integrations/bases/base-normalization/integration_tests/resources/exchange_rate/catalog.json

cgardens

@ChristopheDuong this is pretty cool!! I appreciate the readme that is a guide to how to discover all of the tests. I think that will make it much easier for us to navigate.

I remain a little worried that it will be a bit hard to learn some of the rules of normalization because of the way the test cases are split across so many files. Each test case relies on 3 files, so I need to open all of them at once and carefully track how each thing changes as opposed to just having a test that has a name or a comment that explains the behavior we are looking for. That all being said, I think I appreciate how what I'm describing would be hard to do for each different database. You have a clever way with the file naming conventions of handling the different behaviors for different databases. Let's go with the approach as you have it now and we can always iterate from there if we need to.

I would love for you to demo how this works for the team. Especially:

how they can use the docs you've written to find out the rules of normalization
a live demo of how the diffing works, so that you can see how changes you make in normalization change the dbt models

Do you want to figure out a time during one of the syncs next week to present it? This would also be good tooling to demo to the community.

ChristopheDuong · 2021-04-27T08:02:59Z

/publish connector=bases/base-normalization

🕑 bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/788346118
✅ bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/788346118

jrhizor

I feel like this makes it a lot safer to make changes to normalization!

jrhizor · 2021-04-26T16:15:44Z

airbyte-integrations/bases/base-normalization/.gitignore

+secrets/
+
+# ignore files copied from dbt-project-template
+integration_tests/normalization_test_output/*/*/macros


Would it make more sense to exclude all of integration_tests/normalization_test_output and specifically include paths that match the generated files that we want included?

yes!

I wasn't aware we could do that... thanks

jrhizor · 2021-04-26T16:24:10Z

airbyte-integrations/bases/base-normalization/README.md

+by normalization and dbt (directly in PR too). (_Simply refer to your test suite name in the
+`git_versionned_tests` variable in the `base-normalization/integration_tests/test_normalization.py` file_)
+
+We would typically choose small and meaningful test suites to include in git while others more complex tests


Is it bad to include the larger test cases?

It would generate a lot more files and make PRs heavier to review, so it's up to us to decide what to include or exclude. We have the choice :)

jrhizor · 2021-04-26T16:41:35Z

airbyte-integrations/bases/base-normalization/integration_tests/test_normalization.py

+)
+def test_normalization(integration_type: str, test_resource_name: str, setup_test_path):
+    print("Testing normalization")
+    destination_type = DestinationType.from_string(integration_type)


nit: couldn't test_normalization be parameterized in terms of DestinationType values?
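
One way to read this suggestion, as a minimal sketch: build the parameter list from the enum itself, so each destination is picked up without repeating its name as a string. The `DestinationType` class below is a hypothetical stand-in with an assumed member list, not the class from this PR; only the `from_string` name comes from the diff.

```python
from enum import Enum
from typing import List


class DestinationType(Enum):
    # Hypothetical stand-in for the enum used in test_normalization.py;
    # the members listed here are an assumption for illustration only.
    BIGQUERY = "bigquery"
    POSTGRES = "postgres"

    @classmethod
    def from_string(cls, value: str) -> "DestinationType":
        # Case-insensitive lookup by member name.
        return cls[value.upper()]


def destination_params() -> List[DestinationType]:
    """Return every enum member, ready to hand to a parametrized test."""
    return list(DestinationType)


# With pytest installed, the test could then receive the enum directly:
#
#   @pytest.mark.parametrize("destination_type", destination_params(), ids=lambda d: d.value)
#   def test_normalization(destination_type: DestinationType, ...):
#       ...
```

The trade-off is that the string-to-enum conversion inside the test body goes away, at the cost of the test IDs depending on the enum's values.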

jrhizor · 2021-04-26T16:45:41Z

airbyte-integrations/bases/base-normalization/integration_tests/test_normalization.py

+
+
+@pytest.fixture(scope="package", autouse=True)
+def before_all_tests(request):


nit: might be nicer to have fixture creation nearer to the start of the file

jrhizor · 2021-04-27T07:54:54Z

airbyte-integrations/bases/base-normalization/integration_tests/test_normalization.py

+    if the test_resource_name is part of git_versionned_tests, then dbt models and final sql outputs
+    will be written to a folder included in airbyte git repository.
+
+    Non-versionned tests will be written in /tmp folders instead.


Suggested change

-    Non-versionned tests will be written in /tmp folders instead.
+    Non-versioned tests will be written in /tmp folders instead.
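
The rule in this docstring can be sketched roughly as follows. This is an illustration under assumptions, not the code from the PR: the function name `output_dir`, the suite name `example_suite`, and the exact folder layout are hypothetical, and the system temp directory stands in for `/tmp`.

```python
import tempfile
from pathlib import Path

# Variable name taken from the README excerpt; its contents here are made up.
git_versionned_tests = ["example_suite"]


def output_dir(repo_root: Path, destination: str, test_resource_name: str) -> Path:
    """Pick where generated dbt models and SQL outputs would be written."""
    if test_resource_name in git_versionned_tests:
        # Versioned suites land inside the repository so changes show up in PR diffs.
        base = repo_root / "integration_tests" / "normalization_test_output"
    else:
        # Everything else goes to a throwaway location outside the repository.
        base = Path(tempfile.gettempdir()) / "normalization_test_output"
    return base / destination / test_resource_name
```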

jrhizor · 2021-04-27T07:59:22Z

airbyte-integrations/bases/base-normalization/integration_tests/test_normalization.py

+        - see additional macros for testing here: https://github.com/fishtown-analytics/dbt-utils#schema-tests
+    - Data tests are added in .sql files from the data_tests directory and should return 0 records to be successful
+
+    We use this mecanism to verify the output of our integration tests.


Suggested change

-    We use this mecanism to verify the output of our integration tests.
+    We use this mechanism to verify the output of our integration tests.
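
The "should return 0 records" rule for data tests can be illustrated without dbt. The sketch below uses an in-memory SQLite database purely as a stand-in; the table, its columns, and the helper `data_test_passes` are hypothetical and not part of the PR.

```python
import sqlite3


def data_test_passes(conn: sqlite3.Connection, test_sql: str) -> bool:
    """A data test selects the offending rows, so it passes when none come back."""
    return conn.execute(test_sql).fetchone() is None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE exchange_rate (id INTEGER, currency TEXT)")
conn.executemany("INSERT INTO exchange_rate VALUES (?, ?)", [(1, "USD"), (2, "EUR")])

# The test query returns every id that appears more than once.
duplicate_ids = "SELECT id FROM exchange_rate GROUP BY id HAVING COUNT(*) > 1"

passes_before = data_test_passes(conn, duplicate_ids)  # no duplicates yet

# Introduce a duplicate id; the same query now returns a row and the test fails.
conn.execute("INSERT INTO exchange_rate VALUES (1, 'GBP')")
passes_after = data_test_passes(conn, duplicate_ids)
```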

jrhizor · 2021-04-27T08:00:56Z

...e-integrations/bases/base-normalization/normalization/transform_catalog/catalog_processor.py

@@ -178,10 +178,11 @@ def write_yaml_sources_file(self, schema_to_source_tables: Dict[str, Set[str]]):
        Generate the sources.yaml file as described in https://docs.getdbt.com/docs/building-a-dbt-project/using-sources/
        """
        schemas = []
-        for schema in schema_to_source_tables:
+        for entry in sorted(schema_to_source_tables.items(), key=lambda kv: kv[1]):


Why do these need to be sorted?

It's a map of table names. Sorting keeps the output consistent across runs: entries appear in the same order every time in the file, so we avoid unnecessary diffs caused only by entries randomly swapping positions.
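
The effect described here can be shown with a small sketch. It assumes ordering by schema name and then by table name, which is a choice made for this illustration and differs from the `key=lambda kv: kv[1]` in the diff; `ordered_sources` and the output shape are hypothetical.

```python
from typing import Dict, List, Set


def ordered_sources(schema_to_source_tables: Dict[str, Set[str]]) -> List[dict]:
    """Build a list whose order does not depend on dict or set iteration order."""
    return [
        # Sets have no stable order, so the table names are sorted as well.
        {"name": schema, "tables": [{"name": table} for table in sorted(tables)]}
        for schema, tables in sorted(schema_to_source_tables.items(), key=lambda kv: kv[0])
    ]


# The same content supplied in a different order yields identical output.
first = ordered_sources({"b": {"y", "x"}, "a": {"z"}})
second = ordered_sources({"a": {"z"}, "b": {"x", "y"}})
```

With the output fixed this way, a diff in the generated file reflects a real change in the catalog rather than iteration order.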

ChristopheDuong added 3 commits April 21, 2021 16:35

Speed normalization unit tests by dropping hubspot catalog (too heavy, will be covering it in integration tests instead

c7d03bc

Add integration tests for normalization

2a0054b

Add dedup test case

88eebac

auto-assign bot requested review from davinchia and jrhizor April 22, 2021 17:30

ChristopheDuong added 2 commits April 22, 2021 19:32

adjust build.gradle

64fe322

add readme for normalization

3d85a54

ChristopheDuong force-pushed the chris/normalization-integration-tests branch from 787a4a3 to 3d85a54 April 22, 2021 17:32

This was referenced Apr 22, 2021

Normalization integration tests output #3026

Merged

Simplify normalization dedup in bigquery #3027

Merged

ChristopheDuong added 3 commits April 22, 2021 20:27

Merge remote-tracking branch 'origin/master' into normalization-tests

73277e8

Share PATH env variable with subprocess calls

7b43243

Handle git non-versionned tests vs versionned ones

13d8eeb

Format code

87aeb5c

ChristopheDuong marked this pull request as draft April 23, 2021 14:09

ChristopheDuong added 2 commits April 23, 2021 20:11

Add tests check to normalization integration tests

e388d53

Merge remote-tracking branch 'origin/master' into normalization-tests

6985cd9

ChristopheDuong marked this pull request as ready for review April 23, 2021 18:15

auto-assign bot requested a review from sherifnada April 23, 2021 18:15

Add docs

fc54ef6

michel-tricot reviewed Apr 23, 2021

...integrations/bases/base-normalization/integration_tests/resources/exchange_rate/catalog.json

cgardens approved these changes Apr 24, 2021

ChristopheDuong added 5 commits April 26, 2021 11:15

Merge remote-tracking branch 'origin/master' into normalization-tests

05667a2

complete docs on normalization integration tests

096f780

format code

17dfca2

Normalization integration tests output (#3026)

998d434

* Version generated/output files from normalization integration tests
* simplify cast of float columns to string when used as partition key (#3027)
* bump version of normalization image

Merge remote-tracking branch 'origin/master' into normalization-tests

f96abd7

jrhizor approved these changes Apr 27, 2021

ChristopheDuong and others added 2 commits April 27, 2021 10:23

Apply suggestions from code review

92a066e

Co-authored-by: Jared Rhizor <jared@dataline.io>

Apply suggestions from code review

ab068ea

ChristopheDuong merged commit c2fa3e4 into master Apr 27, 2021

ChristopheDuong deleted the chris/normalization-integration-tests branch April 27, 2021 10:01
