Catch empty state in incremental SAT #22353

bnchrch · 2023-02-03T02:55:08Z

What

Improves the incremental SAT test to catch the case in which the STATE message yields no cursor value

Whats left todo

Add unit tests
Break test into own test case
Discuss concerns with team

Concerns

@airbytehq/connector-operations I have a few areas im unsure on with these tests that I want to discuss before making any large changes

1. `records_with_state` can emit an empty list even if called with many records

One of the issues here is that the records_with_state function uses the continue function if no state_value is found for a record.

If this happens for all records then BOOM empty list.

In the previous context of the test_two_sequential_reads test that is considered a pass.

But to me that seems like an error, if no state can be infered from any record using the cursor, then they havent given us a passing test config.

Is that right?

Impact if so: changing the SAT to account for it may cause previously passing connectors to fail

2. In `test_two_sequential_reads` we do not ensure that we get data on the second read

We should make sure that the test case can test for two reads that result in data.

Is that right?

Impact if so: changing the SAT to account for it WILL cause previously passing connectors to fail. Latest version of sentry for example

github-actions · 2023-02-03T02:57:34Z

Affected Connector Report

NOTE ⚠️ Changes in this PR affect the following connectors. Make sure to do the following as needed:

Run integration tests
Bump connector or module version
Add changelog
Publish the new version

❌ Sources (63)

Connector	Version	Changelog	Publish
`source-airtable`	`2.0.3`	✅	✅
`source-amazon-ads`	`1.0.0`	✅	✅
`source-amazon-seller-partner`	`0.2.31`	✅	✅
`source-amazon-sqs`	`0.1.0`	✅	✅
`source-amplitude`	`0.1.21`	✅	✅
`source-appsflyer`	`0.1.0`	✅	✅
`source-asana`	`0.1.5`	✅	✅
`source-azure-table`	`0.1.3`	✅	✅
`source-braintree`	`0.1.3`	✅	✅
`source-cart`	`0.2.0`	✅	✅
`source-chargebee`	`0.1.16`	✅	✅
`source-commercetools`	`0.1.0`	✅	✅
`source-confluence`	`0.1.1`	✅	✅
`source-datadog`	`0.1.0`	✅	✅
`source-delighted`	`0.2.0`	✅	✅
`source-drift`	`0.2.5`	✅	✅
`source-facebook-marketing`	`0.2.84`	✅	✅
`source-freshcaller`	`0.1.0`	✅	✅
`source-freshsales`	`0.1.2`	✅	✅
`source-freshservice`	`0.1.1`	✅	✅
`source-github`	`0.4.1`	✅	✅
`source-gitlab`	`1.0.2`	✅	✅
`source-google-ads`	`0.2.9`	✅	✅
`source-google-search-console`	`0.1.20`	✅	✅
`source-greenhouse`	`0.3.0`	✅	✅
`source-harvest`	`0.1.15`	✅	✅
`source-instagram`	`1.0.1`	✅	✅
`source-iterable`	`0.1.23`	✅	✅
`source-klaviyo`	`0.1.12`	✅	✅
`source-lemlist`	`0.1.1`	✅	✅
`source-lever-hiring`	`0.1.3`	✅	✅
`source-linnworks`	`0.1.5`	✅	✅
`source-mailchimp`	`0.3.4`	✅	✅
`source-mailgun`	`0.1.0`	✅	✅
`source-monday`	`0.2.2`	✅	✅
`source-notion`	`1.0.1`	✅	✅
`source-okta`	`0.1.14`	✅	✅
`source-onesignal`	`0.1.2`	✅	✅
`source-openweather`	`0.1.6`	✅	✅
`source-outreach`	`0.1.2`	✅	✅
`source-pardot`	`0.1.1`	✅	✅
`source-paystack`	`0.1.1`	✅	✅
`source-pinterest`	`0.2.2`	✅	✅
`source-pipedrive`	`0.1.13`	✅	✅
`source-plaid`	`0.3.2`	✅	✅
`source-posthog`	`0.1.8`	✅	✅
`source-prestashop`	`0.3.0`	✅	✅
`source-quickbooks-singer`	`0.1.5`	✅	✅
`source-recharge`	`0.2.6`	✅	❌ (diff seed version)
`source-retently`	`0.1.3`	✅	✅
`source-salesforce`	`2.0.1`	✅	✅
`source-salesloft`	`0.1.3`	✅	✅
`source-sendgrid`	`0.3.1`	✅	✅
`source-sentry`	`0.1.11`	✅	✅
`source-strava`	`0.1.2`	✅	✅
`source-surveymonkey`	`0.1.14`	✅	✅
`source-tplcentral`	`0.1.1`	✅	✅
`source-twilio`	`0.1.15`	✅	✅
`source-weatherstack`	`0.1.0`	✅	✅
`source-youtube-analytics`	`0.1.3`	✅	✅
`source-zendesk-sunshine`	`0.1.1`	✅	✅
`source-zendesk-talk`	`0.1.6`	✅	✅
`source-zenloop`	`0.1.4`	✅	✅

See "Actionable Items" below for how to resolve warnings and errors.

✅ Destinations (0)

Connector	Version	Changelog	Publish

See "Actionable Items" below for how to resolve warnings and errors.

✅ Other Modules (0)

Actionable Items

(click to expand)

Category	Status	Actionable Item
Version	❌ mismatch	The version of the connector is different from its normal variant. Please bump the version of the connector.
	⚠ doc not found	The connector does not seem to have a documentation file. This can be normal (e.g. basic connector like `source-jdbc` is not published or documented). Please double-check to make sure that it is not a bug.
Changelog	⚠ doc not found	The connector does not seem to have a documentation file. This can be normal (e.g. basic connector like `source-jdbc` is not published or documented). Please double-check to make sure that it is not a bug.
	❌ changelog missing	There is no chnagelog for the current version of the connector. If you are the author of the current version, please add a changelog.
Publish	⚠ not in seed	The connector is not in the seed file (e.g. `source_definitions.yaml`), so its publication status cannot be checked. This can be normal (e.g. some connectors are cloud-specific, and only listed in the cloud seed file). Please double-check to make sure that it is not a bug.
	❌ diff seed version	The connector exists in the seed file, but the latest version is not listed there. This usually means that the latest version is not published. Please use the `/publish` command to publish the latest version.

erohmensing · 2023-02-03T16:14:12Z

Caveat: I'm also unsure, I haven't done much on incremental tests yet. That being said, my interpretation:

If this happens for all records then BOOM empty list.
In the previous context of the test_two_sequential_reads test that is considered a pass.
But to me that seems like an error, if no state can be infered from any record using the cursor, then they havent given us a passing test config.
Is that right?

I think that sounds right - if the cursor_path supplied in the acceptance-test-config.yml doesn't actually point to anything, that sounds like something that should fail and be updated 🤔

In test_two_sequential_reads we do not ensure that we get data on the second read
We should make sure that the test case can test for two reads that result in data.
Is that right?

Reading brian's message, it sounds to me like ensuring we don't get data on the second read was intentional - basically saying if we do the sync and output the state message that we already processed all the data, if the state message was correct, on a second run, we shouldn't have more data to process. I think we would have to artificially inject new data after the first sync in order to have newly updated cursor values (e.g. updated_at) to process in the second sync?

bnchrch · 2023-02-03T17:23:28Z

Reading brian's message, it sounds to me like ensuring we don't get data on the second read was intentional - basically saying if we do the sync and output the state message that we already processed all the data, if the state message was correct, on a second run, we shouldn't have more data to process. I think we would have to artificially inject new data after the first sync in order to have newly updated cursor values (e.g. updated_at) to process in the second sync?

Ok so that makes sense that we want to test the case second read of same data with new cursor yields no records

Should we explicitly assert that?

Currently in the code:

           # Redacted above: Run first read and assert valid records and state

            output = docker_runner.call_read_with_state(connector_config, configured_catalog_for_incremental, state=state_input)
            records = filter_output(output, type_=Type.RECORD)

            for record_value, state_value, stream_name in records_with_state(records, complete_state, stream_mapping, cursor_paths):
                assert compare_cursor_with_threshold(
                    record_value, state_value, threshold_days
                ), f"Second incremental sync should produce records older or equal to cursor value from the state. Stream: {stream_name}"

records = filter_output(output, type_=Type.RECORD) on second read can, and in the case of sentry:v0.1.9, does return None.

But sometimes it does not.

And only then we test for the case Second incremental sync should produce records older or equal to cursor value from the state

My worry here is by not having test cases explicit for

Cursor from records applied to same records yields None
Second read leads to records ahead of cursor from first read

We end up only ever testing one case but not the other.

😅 But I'm also worried that I dont understand the system well enough here, and this might be a non-issue

bnchrch · 2023-02-03T20:44:04Z

Just going to ping @evantahler and @pedroslopez to get your thoughts.

pedroslopez · 2023-02-03T21:56:12Z

records_with_state on second read can, and in the case of sentry, does return None.

But sometimes it does not.

And only then we test for the case Second incremental sync should produce records older or equal to cursor value from the state

This is intentionally structured this way because the implementation is kind of left up to the connector in how it wants to use state. Specifically, in whether > or >= is used when retrieving records from the API.

An example is, say you save state with the date 2020-01-01. Since this date isn't very granular, the implementation would likely use a "greater than or equal to" to retrieve the records so that we wouldn't miss any new record that may have been added on that date. This means that we may also get repeated records, but that's ok since we have "at least once" deliverability. If there wasn't any data at this point, or the connector implements the state using ">", then we would not get any records.

evantahler · 2023-02-03T21:56:30Z

I'm having trouble parsing that python code (I think the test might be backwards? Is that the new state or starting state that's being compared?) but I think the None case is weird. If the CDK were stronger-typed, I think READ should always produce at least [] rather than None, perhaps?

I also think that Airbyte is a "deliver at least once" system, and we should always be choosing to re-send a record if it's the safer thing to do. So, I think that all sources probably should be using >= cursor comparisons ant not >, and the test should always also produce at least one record from the previous read? I guess that's not a hard rule if the source can be sure that cursors are high-resolution and un-changing. There's another test with "abnormally large state" that should produce no records, where the cursor is far higher than what would be reasonable (e.g. updated_at=9999-01-01)

bnchrch · 2023-02-03T22:16:01Z

@pedroslopez

I realized I had a typo in the message you quoted. (Sorry!)

Instead of records_with_state I meant records = filter_output(output, type_=Type.RECORD)

The second read from test_incremental_two_sequential_reads for sentry currently returns no records. Even before we filter for state.

(I updated the previous comment to correct this)

Does that change your answer at all?

pedroslopez · 2023-02-03T22:18:26Z

@bechurch the filtering I'm referring to happens on the source side or by the API it calls, not something in our testing code. It could still come back with or without records for the reasons I mentioned (no new records have been emitted / using > in the source would lead to no records; or using >= in the source would lead to repeated records)

bnchrch · 2023-02-03T22:22:30Z

@pedroslopez Ok that makes sense.

I'm still stuck on one thing though.

Don't we want to have connectors give us enough data to be able to test in the context of "sequential reads, that both return data"?

bnchrch · 2023-02-03T23:15:28Z

FYI from my wonderful call with Pedro

I'm going to keep this PR focused around empty state during first pass
Im going to add an issue to grooming around testing the return the of second read for data as its likely a useful test case we aren't currently supporting
We likely want to split these cases out into separate named tests

bnchrch · 2023-02-08T17:45:01Z

@connector-operations This is ready for review again.

alafanechere

Nice, could you please run a couple of /test and the connectors that have this problem?

bnchrch · 2023-02-09T22:17:54Z

/test connector=connectors/source-sentry

Note: This test will fail.... but not for the test ive added

🕑 connectors/source-sentry https://github.com/airbytehq/airbyte/actions/runs/4138900260
❌ connectors/source-sentry https://github.com/airbytehq/airbyte/actions/runs/4138900260
🐛 https://gradle.com/s/ajeiok3rxxvnu

Build Failed

Test summary info:

=========================== short test summary info ============================
FAILED test_core.py::TestBasicRead::test_read[inputs0] - Failed: Stream proje...
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/tests/test_core.py:98: The previous and actual specifications are identical.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/tests/test_core.py:507: The previous and actual discovered catalogs are identical.
============== 1 failed, 35 passed, 2 skipped in 60.31s (0:01:00) ==============

bnchrch · 2023-02-09T22:18:55Z

/test connector=connectors/source-airtable

🕑 connectors/source-airtable https://github.com/airbytehq/airbyte/actions/runs/4138906771
✅ connectors/source-airtable https://github.com/airbytehq/airbyte/actions/runs/4138906771
Python tests coverage:

Name                                Stmts   Miss  Cover
-------------------------------------------------------
source_airtable/__init__.py             2      0   100%
source_airtable/streams.py             92      1    99%
source_airtable/source.py              38      1    97%
source_airtable/schema_helpers.py      48      9    81%
source_airtable/auth.py                21     10    52%
-------------------------------------------------------
TOTAL                                 201     21    90%
	 Name                                                    Stmts   Miss  Cover   Missing
	 -------------------------------------------------------------------------------------
	 connector_acceptance_test/base.py                          12      4    67%   16-19
	 connector_acceptance_test/config.py                       141      5    96%   87, 93, 239, 243-244
	 connector_acceptance_test/conftest.py                     217    101    53%   37, 43-45, 50, 55, 78, 84, 90-92, 111, 116-118, 124-126, 132-133, 138-139, 144, 150, 159-168, 174-179, 194, 218, 249, 255, 263-271, 279-292, 300-313, 318-324, 331-342, 349-365
	 connector_acceptance_test/plugin.py                        69     25    64%   22-23, 31, 36, 120-140, 144-148
	 connector_acceptance_test/tests/test_core.py              476    117    75%   53, 58, 97-108, 113-120, 124-125, 129-130, 380, 400, 438, 476-493, 506-517, 521-526, 532, 565-570, 608-615, 658-660, 663, 728-736, 748-751, 756, 812-813, 819, 822, 858-868, 881-906
	 connector_acceptance_test/tests/test_incremental.py       162     14    91%   58-65, 70-83, 252
	 connector_acceptance_test/utils/asserts.py                 39      2    95%   62-63
	 connector_acceptance_test/utils/common.py                  94     10    89%   16-17, 32-38, 72, 75
	 connector_acceptance_test/utils/compare.py                 62     23    63%   21-51, 68, 97-99
	 connector_acceptance_test/utils/connector_runner.py       133     33    75%   24-27, 46-47, 50-54, 57-58, 73-75, 78-80, 83-85, 88-90, 93-95, 124-125, 159-161, 208
	 connector_acceptance_test/utils/json_schema_helper.py     114     13    89%   31-32, 39, 42, 66-69, 97, 121, 203-205
	 -------------------------------------------------------------------------------------
	 TOTAL                                                    1711    347    80%

Build Passed

Test summary info:

=========================== short test summary info ============================
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/plugin.py:63: Skipping TestIncremental.test_two_sequential_reads: Incremental syncs are not supported on this connector.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/tests/test_core.py:98: The previous and actual specifications are identical.
SKIPPED [2] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/tests/test_core.py:507: The previous and actual discovered catalogs are identical.
================== 44 passed, 4 skipped in 142.99s (0:02:22) ===================

bnchrch · 2023-02-09T22:20:06Z

/test connector=connectors/source-twilio

🕑 connectors/source-twilio https://github.com/airbytehq/airbyte/actions/runs/4138914837
✅ connectors/source-twilio https://github.com/airbytehq/airbyte/actions/runs/4138914837
Python tests coverage:

Name                        Stmts   Miss  Cover
-----------------------------------------------
source_twilio/auth.py           8      0   100%
source_twilio/__init__.py       2      0   100%
source_twilio/source.py        28      1    96%
source_twilio/streams.py      283     22    92%
-----------------------------------------------
TOTAL                         321     23    93%
	 Name                                                    Stmts   Miss  Cover   Missing
	 -------------------------------------------------------------------------------------
	 connector_acceptance_test/base.py                          12      4    67%   16-19
	 connector_acceptance_test/config.py                       141      5    96%   87, 93, 239, 243-244
	 connector_acceptance_test/conftest.py                     217    101    53%   37, 43-45, 50, 55, 78, 84, 90-92, 111, 116-118, 124-126, 132-133, 138-139, 144, 150, 159-168, 174-179, 194, 218, 249, 255, 263-271, 279-292, 300-313, 318-324, 331-342, 349-365
	 connector_acceptance_test/plugin.py                        69     25    64%   22-23, 31, 36, 120-140, 144-148
	 connector_acceptance_test/tests/test_core.py              476    117    75%   53, 58, 97-108, 113-120, 124-125, 129-130, 380, 400, 438, 476-493, 506-517, 521-526, 532, 565-570, 608-615, 658-660, 663, 728-736, 748-751, 756, 812-813, 819, 822, 858-868, 881-906
	 connector_acceptance_test/tests/test_incremental.py       162     14    91%   58-65, 70-83, 252
	 connector_acceptance_test/utils/asserts.py                 39      2    95%   62-63
	 connector_acceptance_test/utils/common.py                  94     10    89%   16-17, 32-38, 72, 75
	 connector_acceptance_test/utils/compare.py                 62     23    63%   21-51, 68, 97-99
	 connector_acceptance_test/utils/connector_runner.py       133     33    75%   24-27, 46-47, 50-54, 57-58, 73-75, 78-80, 83-85, 88-90, 93-95, 124-125, 159-161, 208
	 connector_acceptance_test/utils/json_schema_helper.py     114     13    89%   31-32, 39, 42, 66-69, 97, 121, 203-205
	 -------------------------------------------------------------------------------------
	 TOTAL                                                    1711    347    80%

Build Passed

Test summary info:

=========================== short test summary info ============================
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/tests/test_core.py:98: The previous and actual specifications are identical.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/tests/test_core.py:507: The previous and actual discovered catalogs are identical.
================== 38 passed, 2 skipped in 1991.53s (0:33:11) ==================

* Catch state being empty * Update test_two_sequential_reads to catch empty state on first read * Add integration test of empty state * Fix legacy state test * Move state_name to variable * Clean up * Format * Fix rogue test

bnchrch added the connectors/source/test label Feb 3, 2023

bnchrch requested a review from a team February 3, 2023 02:55

bnchrch self-assigned this Feb 3, 2023

bnchrch force-pushed the bechurch/21863-add-incremental-sync-state-empty-test branch from 3d053b3 to 7eda185 Compare February 8, 2023 17:43

bnchrch marked this pull request as ready for review February 8, 2023 17:44

bnchrch changed the title ~~Draft: Catch empty state in incremental SAT~~ Catch empty state in incremental SAT Feb 8, 2023

bnchrch force-pushed the bechurch/21863-add-incremental-sync-state-empty-test branch from 7eda185 to df81f30 Compare February 8, 2023 22:04

bnchrch removed the request for review from a team February 9, 2023 16:37

bnchrch added 8 commits February 9, 2023 08:41

Catch state being empty

dd69775

Update test_two_sequential_reads to catch empty state on first read

c58a5f5

Add integration test of empty state

7e204b8

Fix legacy state test

bfcca79

Move state_name to variable

b7bb47d

Clean up

de643bf

Format

4447a46

Fix rogue test

657ccf9

bnchrch force-pushed the bechurch/21863-add-incremental-sync-state-empty-test branch from 9672dbc to 657ccf9 Compare February 9, 2023 16:45

bnchrch requested review from a team and brianjlai February 9, 2023 16:52

bnchrch requested a review from pedroslopez February 9, 2023 18:25

alafanechere approved these changes Feb 9, 2023

View reviewed changes

bnchrch merged commit ce770d3 into master Feb 9, 2023

bnchrch deleted the bechurch/21863-add-incremental-sync-state-empty-test branch February 9, 2023 23:19

octavia-squidington-iii mentioned this pull request Feb 10, 2023

Bump Airbyte version from 0.40.32 to 0.40.33 #22689

Closed

github-actions bot mentioned this pull request Feb 10, 2023

Bump helm chart version reference to 0.43.36 #22806

Closed

octavia-squidington-iii mentioned this pull request Feb 10, 2023

Bump Airbyte version from 0.40.32 to 0.40.33 #22839

Closed

alafanechere mentioned this pull request Feb 15, 2023

connector-acceptance-test: incremental tests failures on java sources #23098

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Catch empty state in incremental SAT #22353

Catch empty state in incremental SAT #22353

bnchrch commented Feb 3, 2023 •

edited

Loading

github-actions bot commented Feb 3, 2023 •

edited

Loading

❌ Sources (63)

✅ Destinations (0)

✅ Other Modules (0)

Actionable Items

erohmensing commented Feb 3, 2023 •

edited

Loading

bnchrch commented Feb 3, 2023 •

edited

Loading

bnchrch commented Feb 3, 2023

pedroslopez commented Feb 3, 2023

evantahler commented Feb 3, 2023 •

edited

Loading

bnchrch commented Feb 3, 2023

pedroslopez commented Feb 3, 2023 •

edited

Loading

bnchrch commented Feb 3, 2023

bnchrch commented Feb 3, 2023

bnchrch commented Feb 8, 2023

alafanechere left a comment

bnchrch commented Feb 9, 2023 •

edited by github-actions bot

Loading

bnchrch commented Feb 9, 2023 •

edited by github-actions bot

Loading

bnchrch commented Feb 9, 2023 •

edited by github-actions bot

Loading

Catch empty state in incremental SAT #22353

Catch empty state in incremental SAT #22353

Conversation

bnchrch commented Feb 3, 2023 • edited Loading

What

Whats left todo

Concerns

1. records_with_state can emit an empty list even if called with many records

2. In test_two_sequential_reads we do not ensure that we get data on the second read

github-actions bot commented Feb 3, 2023 • edited Loading

Affected Connector Report

❌ Sources (63)

✅ Destinations (0)

✅ Other Modules (0)

Actionable Items

erohmensing commented Feb 3, 2023 • edited Loading

bnchrch commented Feb 3, 2023 • edited Loading

bnchrch commented Feb 3, 2023

pedroslopez commented Feb 3, 2023

evantahler commented Feb 3, 2023 • edited Loading

bnchrch commented Feb 3, 2023

pedroslopez commented Feb 3, 2023 • edited Loading

bnchrch commented Feb 3, 2023

bnchrch commented Feb 3, 2023

bnchrch commented Feb 8, 2023

alafanechere left a comment

Choose a reason for hiding this comment

bnchrch commented Feb 9, 2023 • edited by github-actions bot Loading

Build Failed

bnchrch commented Feb 9, 2023 • edited by github-actions bot Loading

Build Passed

bnchrch commented Feb 9, 2023 • edited by github-actions bot Loading

Build Passed

bnchrch commented Feb 3, 2023 •

edited

Loading

1. `records_with_state` can emit an empty list even if called with many records

2. In `test_two_sequential_reads` we do not ensure that we get data on the second read

github-actions bot commented Feb 3, 2023 •

edited

Loading

erohmensing commented Feb 3, 2023 •

edited

Loading

bnchrch commented Feb 3, 2023 •

edited

Loading

evantahler commented Feb 3, 2023 •

edited

Loading

pedroslopez commented Feb 3, 2023 •

edited

Loading

bnchrch commented Feb 9, 2023 •

edited by github-actions bot

Loading

bnchrch commented Feb 9, 2023 •

edited by github-actions bot

Loading

bnchrch commented Feb 9, 2023 •

edited by github-actions bot

Loading