Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🎉 Source Iterable: Add email validation to list_users stream method execution #8380

Merged
merged 12 commits into from
Dec 6, 2021

Conversation

htrueman
Copy link
Contributor

@htrueman htrueman commented Dec 1, 2021

What

Closes https://github.com/airbytehq/oncall/issues/43

The broken emails may appear in https://api.iterable.com/api/lists/getUsers response. That affects the Events stream read as we we use these emails to generate the Events stream urls so we get the 404 Error.

How

  • We need to validate the https://api.iterable.com/api/lists/getUsers response emails on our side to exclude the broken emails. This way we would avoid the requests with broken url (we use the email as part of the url in Events stream).
  • As solution, the email validation on ListUsers.parse_response() method execution was added. So now we yield records with valid email only.

Recommended reading order

  1. api.py
  2. test_streams.py

🚨 User Impact 🚨

None

Pre-merge Checklist

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • Credentials added to Github CI. Instructions.
  • /test connector=connectors/<name> command is passing.
  • New Connector version released on Dockerhub by running the /publish command described here
  • After the new connector version is published, connector version bumped in the seed directory as described here
  • Seed specs have been re-generated by building the platform and committing the changes to the seed spec files, as described here

@github-actions github-actions bot added area/connectors Connector related issues area/documentation Improvements or additions to documentation labels Dec 1, 2021
@htrueman htrueman requested review from sherifnada and removed request for sherifnada December 1, 2021 12:57
@jrhizor jrhizor temporarily deployed to more-secrets December 1, 2021 12:58 Inactive
Copy link
Contributor

@sergei-solonitcyn sergei-solonitcyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add unittests for this functionality, please.

@htrueman htrueman temporarily deployed to more-secrets December 1, 2021 15:00 Inactive
@htrueman htrueman changed the title 🎉 Source Iterable: Add email validation to list_users stream method execution. 🎉 Source Iterable: Add email validation to list_users stream method execution Dec 1, 2021
@keu
Copy link
Contributor

keu commented Dec 1, 2021

does this fix mean that we now skip some records? can't we fix broken emails somehow? or use another way of retrieving this data?

Add unit test `test_events_slices_response`.
@htrueman htrueman temporarily deployed to more-secrets December 3, 2021 15:03 Inactive
@htrueman
Copy link
Contributor Author

htrueman commented Dec 4, 2021

@sherifnada

I've found the solution on how to properly resolve the issue.
There is actually another endpoint which takes email as a query parameter. So it accepts any string.

My fault that I could not found it before.

@jrhizor jrhizor temporarily deployed to more-secrets December 4, 2021 14:49 Inactive
Update setup.py requirements.
@htrueman htrueman temporarily deployed to more-secrets December 4, 2021 14:58 Inactive
@jrhizor jrhizor temporarily deployed to more-secrets December 4, 2021 15:00 Inactive
Copy link
Contributor

@sherifnada sherifnada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great find and unit testing. My only question is about schema changes:

is this schema change going to break existing user workflows? How to guarantee in the future we understand the schema coming from Iterable? It seems every event can have a different schema. I took the events from the account in question on the OC ticket and counted how often the fields appear in the events coming back from the API:

{'createdAt': 4, 'signupSource': 1, 'emailListIds': 2, 'itblInternal': 4, '_type': 4, 'messageTypeIds': 2, 'channelIds': 2, 'email': 4, 'profileUpdatedAt': 1, 'productRecommendationCount': 1, 'campaignId': 2, 'contentId': 1, 'messageId': 3, 'messageBusId': 1, 'templateId': 3, 'messageTypeId': 1, 'catalogCollectionCount': 1, 'catalogLookupCount': 1, 'channelId': 1, 'recipientState': 2, 'unsubSource': 1}

did you have a strategy in mind when changing the schema for how to handle this?

docs/integrations/sources/iterable.md Outdated Show resolved Hide resolved
@@ -1,7 +1,97 @@
{
"properties": {
"data": {
"type": ["null", "object"]
"lastName": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this schema described anywhere in the API? is it guaranteed that the schema will be the same for a customer's account?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not described anywhere, that what I've got from the records output.
Previously we had this schema:

{
  "properties": {
    "data": {
      "type": ["null", "object"]
    }
  },
  "type": ["null", "object"]
}

which doesn't describe any fields at all.


The output records in previous and current endpoints are almost the same. Except the eventType in prev endpoint output became _type in current endpoint.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So perhaps the only solution on how to be sure about the schema is to put the old schema back.

@@ -1,7 +1,97 @@
{
"properties": {
"data": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this field removed? is the schema between the two endpoints different?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, previously we had output records in the following format:
{"type": "RECORD", "record": {"stream": "events", "data": {"data": {...}}, "emitted_at": 123}}.
It actually doesn't have any sense, just an odd additional wrapper.

@htrueman htrueman temporarily deployed to more-secrets December 6, 2021 11:53 Inactive
@htrueman htrueman temporarily deployed to more-secrets December 6, 2021 14:49 Inactive
@htrueman
Copy link
Contributor Author

htrueman commented Dec 6, 2021

/test connector=connectors/source-iterable

🕑 connectors/source-iterable https://github.com/airbytehq/airbyte/actions/runs/1545161208
✅ connectors/source-iterable https://github.com/airbytehq/airbyte/actions/runs/1545161208
Python tests coverage:

	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                                 Stmts   Miss  Cover
	 ------------------------------------------------------------------------
	 source_acceptance_test/__init__.py                       2      0   100%
	 source_acceptance_test/base.py                          10      4    60%
	 source_acceptance_test/config.py                        76      8    89%
	 source_acceptance_test/conftest.py                     109    109     0%
	 source_acceptance_test/plugin.py                        47     47     0%
	 source_acceptance_test/tests/__init__.py                 4      0   100%
	 source_acceptance_test/tests/test_core.py              235     95    60%
	 source_acceptance_test/tests/test_full_refresh.py       38     27    29%
	 source_acceptance_test/tests/test_incremental.py        69     38    45%
	 source_acceptance_test/utils/__init__.py                 6      0   100%
	 source_acceptance_test/utils/asserts.py                 37      2    95%
	 source_acceptance_test/utils/common.py                  54     24    56%
	 source_acceptance_test/utils/compare.py                 62     25    60%
	 source_acceptance_test/utils/connector_runner.py        82     49    40%
	 source_acceptance_test/utils/json_schema_helper.py     115     14    88%
	 ------------------------------------------------------------------------
	 TOTAL                                                  946    442    53%
	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                  Stmts   Miss  Cover
	 ---------------------------------------------------------
	 source_iterable/__init__.py               2      0   100%
	 source_iterable/api.py                  160     67    58%
	 source_iterable/iterable_streams.py     107     15    86%
	 source_iterable/slice_generators.py      73      0   100%
	 source_iterable/source.py                15      6    60%
	 ---------------------------------------------------------
	 TOTAL                                   357     88    75%

@jrhizor jrhizor temporarily deployed to more-secrets December 6, 2021 14:56 Inactive
Copy link
Contributor

@sherifnada sherifnada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@htrueman resolving the schema is a blocking issue here, without describing the schema the output will not be useful for the user. My suggestion:

  1. for fields which we know exist for all events put them in the top level schema
  2. All other properties which are event dependent should go inside a dict called data or additional_properties inside the record

Update events stream unit tests.
Update events stream schema.
Copy link
Contributor

@sherifnada sherifnada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @htrueman !

@htrueman htrueman temporarily deployed to more-secrets December 6, 2021 18:18 Inactive
@htrueman
Copy link
Contributor Author

htrueman commented Dec 6, 2021

/publish connector=connectors/source-iterable

🕑 connectors/source-iterable https://github.com/airbytehq/airbyte/actions/runs/1546064756
✅ connectors/source-iterable https://github.com/airbytehq/airbyte/actions/runs/1546064756

@htrueman htrueman temporarily deployed to more-secrets December 6, 2021 18:35 Inactive
@jrhizor jrhizor temporarily deployed to more-secrets December 6, 2021 18:37 Inactive
@htrueman htrueman merged commit 17504e9 into master Dec 6, 2021
@htrueman htrueman deleted the htrueman/fix-iterable-connection-failing-with-error branch December 6, 2021 19:07
schlattk pushed a commit to schlattk/airbyte that referenced this pull request Jan 4, 2022
…xecution (airbytehq#8380)

* Update iterable.md docs.

* Update `Events` stream to use `export/userEvents` endpoint.
Update `events` stream schema.
Add `test_events_parse_response` unit test.

* Update events parse_response data collection.
Update events stream unit tests.
Update events stream schema.

* Bump docker version
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants