
🐍 Source Mixpanel: certification preparations #30149

Merged

Conversation

davydov-d
Collaborator

@davydov-d davydov-d commented Sep 5, 2023

What

#29746

How

  • Set the checkpointing interval to 15 records
  • Limit the maximum number of retries to 3
  • Make credentials required
  • Make project_id a required parameter for the Service Account authentication type
  • Improve input config validation
  • Cover input config validation with unit tests
  • Unpin Airbyte CDK to be sure its version is always up to date
  • Add suggested streams
  • Fix date-time field values based on unexpected value reports from datadoghq

🚨 User Impact 🚨

No breaking changes. Config schema was changed, but config migrations will handle it.

@octavia-squidington-iii octavia-squidington-iii added area/connectors Connector related issues area/documentation Improvements or additions to documentation connectors/source/mixpanel labels Sep 5, 2023

github-actions bot commented Sep 5, 2023

Before Merging a Connector Pull Request

Wow! What a great pull request you have here! 🎉

To merge this PR, ensure the following has been done/considered for each connector added or updated:

  • PR name follows PR naming conventions
  • Breaking changes are considered. If a Breaking Change is being introduced, ensure an Airbyte engineer has created a Breaking Change Plan.
  • Connector version has been incremented in the Dockerfile and metadata.yaml according to our Semantic Versioning for Connectors guidelines
  • You've updated the connector's metadata.yaml file with any other relevant changes, including a breakingChanges entry for major version bumps. See metadata.yaml docs
  • Secrets in the connector's spec are annotated with airbyte_secret
  • All documentation files are up to date. (README.md, bootstrap.md, docs.md, etc...)
  • Changelog updated in docs/integrations/<source or destination>/<name>.md with an entry for the new version. See changelog example
  • Migration guide updated in docs/integrations/<source or destination>/<name>-migrations.md with an entry for the new version, if the version is a breaking change. See migration guide example
  • If set, you've ensured the icon is present in the platform-internal repo. (Docs)

If the checklist is complete, but the CI check is failing,

  1. Check for hidden checklists in your PR description

  2. Toggle the github label checklist-action-run on/off to re-run the checklist CI.

@davydov-d davydov-d marked this pull request as ready for review September 5, 2023 09:26
@davydov-d davydov-d added the breaking-change Don't merge me unless you are ready. label Sep 5, 2023

github-actions bot commented Sep 5, 2023

source-mixpanel test report (commit 59b5da5bec) - ❌

⏲️ Total pipeline duration: 257mn24s

Step Result
Connector package install
Build source-mixpanel docker image for platform linux/x86_64
Unit tests
Acceptance tests
Code format checks
Validate airbyte-integrations/connectors/source-mixpanel/metadata.yaml
Connector version semver check
Connector version increment check
QA checks

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-mixpanel test


github-actions bot commented Sep 5, 2023

source-hubplanner test report (commit 3570a2e28c) - ❌

⏲️ Total pipeline duration: 38.58s

Step Result
Connector package install
Build source-hubplanner docker image for platform linux/x86_64
Acceptance tests
Code format checks
Validate airbyte-integrations/connectors/source-hubplanner/metadata.yaml
Connector version semver check
Connector version increment check
QA checks


github-actions bot commented Sep 5, 2023

source-mixpanel test report (commit 3570a2e28c) - ❌

⏲️ Total pipeline duration: 132mn28s

Step Result
Connector package install
Build source-mixpanel docker image for platform linux/x86_64
Unit tests
Acceptance tests
Code format checks
Validate airbyte-integrations/connectors/source-mixpanel/metadata.yaml
Connector version semver check
Connector version increment check
QA checks

@erohmensing erohmensing left a comment

In general, this looks really good! Solid UX improvements. The only blocking comment I have at the moment is around swallowing the config errors, since they won't get raised. I did leave some other suggestions as well, though.

Comment on lines -15 to -16
- config_path: "secrets/config_old.json"
status: "succeed"

Let's remember to delete this afterwards if it's not going to be used anymore. It makes sense that this old config won't be valid anymore.

- dockerImageTag: 0.1.38
+ dockerImageTag: 1.0.0

You're going to need a releases: breakingChanges: entry for this change - the validation should also tell you that, though


Added

Comment on lines +80 to +85
    raise_config_error("Please provide a valid True/False value for the `Select properties by default` parameter.")

if not isinstance(attribution_window, int) or attribution_window < 0:
    raise_config_error("Please provide a valid integer for the `Attribution window` parameter.")
if not isinstance(date_window_size, int) or date_window_size < 1:
    raise_config_error("Please provide a valid integer for the `Date slicing window` parameter.")

So many specific config errors - this is great!


Could we be specific about what is valid? It looks like for the attribution window, valid is positive or 0, and for the date slicing window, it must be positive.


Added comments to spec


That works too. UX-wise, someone might have an opinion on where it should go so that it isn't cluttered, but I think as long as the info is somewhere, it's good for now.
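
As an illustration of the point raised in this thread, here is a minimal sketch of validation messages that state exactly which values are accepted. `ConfigError`, `validate_windows`, and the default values are hypothetical stand-ins, not the connector's actual code; the extra `bool` check is an addition that the snippet under review does not contain.

```python
class ConfigError(ValueError):
    """Hypothetical stand-in for a user-facing configuration error."""


def validate_windows(config: dict) -> dict:
    """Validate the two integer options and name the accepted range in each message."""
    # The defaults below are assumptions made for this sketch.
    attribution_window = config.get("attribution_window", 5)
    date_window_size = config.get("date_window_size", 30)

    # bool is a subclass of int in Python, so True/False are rejected explicitly.
    if isinstance(attribution_window, bool) or not isinstance(attribution_window, int) or attribution_window < 0:
        raise ConfigError("`Attribution window` must be an integer greater than or equal to 0.")
    if isinstance(date_window_size, bool) or not isinstance(date_window_size, int) or date_window_size < 1:
        raise ConfigError("`Date slicing window` must be an integer greater than or equal to 1.")

    return {"attribution_window": attribution_window, "date_window_size": date_window_size}
```

With messages like these, the accepted range is visible in the error itself as well as in the spec description.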

Comment on lines 116 to 117
        except Exception as e:
-            return False, e
+            return False, e.message

Unfortunately doing this here negates all the good work of our config errors! Instead we should remove the exception wrapping here.

Then when we hit a config error, it will be raised and shown to the user (instead of only sending a connection status failed message, which would not contain the traceback).

If we hit an error in the above methods that we don't have handling for, the CDK's check() method which wraps this method will do the proper error handling of emitting an AirbyteTraceMessage (of a "system" error instead of a "config" error)


Fixed
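
The difference discussed in this thread can be sketched as follows. This is a simplified illustration, not the connector's code: `ConfigError`, `validate`, and both `check_connection_*` functions are hypothetical names. The sketch uses `str(e)` because plain Python 3 exceptions have no `message` attribute.

```python
from typing import Any, Mapping, Optional, Tuple


class ConfigError(Exception):
    """Hypothetical user-facing configuration error."""


def validate(config: Mapping[str, Any]) -> Mapping[str, Any]:
    """Raise a specific error when the config is unusable."""
    if "credentials" not in config:
        raise ConfigError("Please provide credentials.")
    return config


def check_connection_swallowing(config: Mapping[str, Any]) -> Tuple[bool, Optional[str]]:
    # Pattern questioned in the review: every exception, including ConfigError,
    # is flattened into a plain "failed" status and never raised.
    try:
        validate(config)
    except Exception as e:
        return False, str(e)
    return True, None


def check_connection_propagating(config: Mapping[str, Any]) -> Tuple[bool, Optional[str]]:
    # No blanket try/except: ConfigError reaches the caller, which can report it
    # as a config error; anything unexpected also propagates to the caller.
    validate(config)
    return True, None
```

In the second variant the caller decides how to classify the failure, which is the behavior the reviewer asks for.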

@@ -84,7 +114,7 @@ def check_connection(self, logger: AirbyteLogger, config: Mapping[str, Any]) ->
             config = self._validate_and_transform(config)
             auth = self.get_authenticator(config)
         except Exception as e:
-            return False, e
+            return False, e.message

# https://github.com/airbytehq/airbyte/pull/27252#discussion_r1228356872
# temporary solution, testing access for all streams to avoid 402 error

The below testing only checks one stream before early returning - I don't think it will actually check all streams. This might be worth looking at, in addition to the exception/error handling down here


This is a workaround for this issue: #27252 (comment)


Right, but the comment says it is testing access for all streams, but it is only testing access for one stream?


Notably, it used to break if successfully connected to one stream, but now returns True:

https://github.com/airbytehq/airbyte/pull/27252/files#diff-ec2bcf88395f3b6aa5b5016d6d83b6b0c0270640e6cbb699c04034b9c9c4e0d4R95

That doesn't look like it was introduced in this PR, so I won't be a stickler about it, but a follow up issue would be good


The snippet from your comment behaves similarly but breaks instead of returning. The comment indicates that the system tests to see if any stream functions correctly. If at least one does, the check is deemed successful. I'll rephrase it for better clarity.


Ah, if that's the case, then yes, I've misinterpreted the comment - thanks

I guess I was thinking continue (onto the next one), not break
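
The "succeed if at least one stream works" behavior agreed on above could look roughly like the sketch below. `check_any_stream` and the probe callables are hypothetical; the real connector probes actual stream objects.

```python
from typing import Callable, Iterable, Optional, Tuple


def check_any_stream(probes: Iterable[Tuple[str, Callable[[], None]]]) -> Tuple[bool, Optional[str]]:
    """Return success as soon as one probe works; fail only if every probe fails."""
    last_error: Optional[str] = None
    for name, probe in probes:
        try:
            probe()
        except Exception as e:
            # This stream is not accessible: remember why and try the next one.
            last_error = f"{name}: {e}"
            continue
        # First working stream: stop here. Using `break` followed by a
        # `return True, None` after the loop would behave the same way.
        return True, None
    return False, last_error
```

Here `continue` applies only to failed probes, while a successful probe ends the check immediately, so the remaining streams are never requested.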

@@ -60,14 +61,14 @@
 "project_id": {
   "order": 1,
   "title": "Project ID",
-  "description": "Your project ID number. See the <a href=\"https://help.mixpanel.com/hc/en-us/articles/115004490503-Project-Settings#project-id\">docs</a> for more information on how to obtain this.",
+  "description": "Your project ID number. See the <a href=\"https://help.mixpanel.com/hc/en-us/articles/115004490503-Project-Settings#project-id\">docs</a> for more information on how to obtain this. Required if you are using a service account to authenticate.",

Is there a way to leverage groups to make this conditionally required? (General question, I don't know the answer to this. But if there is, I'd recommend we do it). Someone from @airbytehq/connector-extensibility might be able to answer this


Changed parameter to required


Ah, that looks like it will do it, I think. Thanks! In that case we can probably remove the "Required if you are using a service account to authenticate." sentence.

Comment on lines 29 to 30
# we assume there's at least 10 records per request returned in average. Given that each request is followed by a 60 seconds
# sleep, we'll have to emit state once per 150 records

If each request is followed by a 60 second sleep anyway, perhaps we could emit state after every request? The 15 minutes is a maximum, and it seems like it won't slow down the connector, since the bottleneck is the API rate-limiting us


Decreased checkpoint interval to emit state every request


hm it looks like it still says we're emitting it every 15 records?
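
The numbers in this thread are easy to mix up, so here is the arithmetic behind the code comment as a small sketch. It assumes the 15-minute maximum mentioned above is the longest acceptable gap between state messages; `checkpoint_budget` is a hypothetical helper, not part of the connector.

```python
def checkpoint_budget(max_gap_seconds: int = 15 * 60,
                      sleep_seconds: int = 60,
                      records_per_request: int = 10) -> tuple:
    """Return (requests, records) that fit into the maximum gap between state messages."""
    # One request per sleep period, so 900 s / 60 s = 15 requests per gap.
    requests_per_gap = max_gap_seconds // sleep_seconds
    # With the assumed 10 records per request, that is 15 * 10 = 150 records.
    return requests_per_gap, requests_per_gap * records_per_request
```

Under these assumptions, 15 is a number of requests and 150 is the corresponding number of records. Emitting state after every request, as suggested, corresponds to an interval of roughly 10 records under the same assumption.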

@github-actions

source-hubplanner test report (commit 7e56b58e00) - ❌

⏲️ Total pipeline duration: 02mn12s

Step Result
Connector package install
Build source-hubplanner docker image for platform linux/x86_64
Acceptance tests
Code format checks
Validate airbyte-integrations/connectors/source-hubplanner/metadata.yaml
Connector version semver check
Connector version increment check
QA checks

@github-actions

source-mixpanel test report (commit 7e56b58e00) - ❌

⏲️ Total pipeline duration: 83mn48s

Step Result
Connector package install
Build source-mixpanel docker image for platform linux/x86_64
Unit tests
Acceptance tests
Code format checks
Validate airbyte-integrations/connectors/source-mixpanel/metadata.yaml
Connector version semver check
Connector version increment check
QA checks


vercel bot commented Sep 20, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment
Name Status Preview Comments Updated (UTC)
airbyte-docs ⬜️ Ignored (Inspect) Visit Preview Sep 27, 2023 9:33am

@tolik0 tolik0 force-pushed the ddavydov/#29746-source-mixpanel-prepare-for-certification branch from 1c34027 to 603d9c7 Compare September 20, 2023 17:49
@github-actions

source-mixpanel test report (commit c4835f5f73) - ❌

⏲️ Total pipeline duration: 53.65s

Step Result
Connector package install
Build source-mixpanel docker image for platform linux/x86_64
Unit tests
Code format checks
Validate airbyte-integrations/connectors/source-mixpanel/metadata.yaml
Connector version semver check
Connector version increment check
QA checks

@erohmensing erohmensing left a comment

Here's what I reviewed:

  • breaking change metadata
  • new config errors

No more blocking changes from me

releases:
  breakingChanges:
    1.0.0:
      message: In this release, the "datetime" column in the "engage" stream will transition from Date/Time/Datetime types to string due to Mixpanel inconsistencies. Additionally, the "credentials" field is now mandatory, and the "project_id" path has been updated, though a config migration will minimize disruptions.

Nice message. Should the user reset the engage stream as a result of the type change? Maybe we're doing it for them?

I think you'll need a migrations file - QA checks should let you know this
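
To make the "config migration" mentioned in the breaking change message more concrete, here is a rough sketch of what such a migration might do. The config layout is an assumption: the sketch supposes that `project_id` moves from the top level into `credentials`, which this thread does not spell out. `migrate_config` is a hypothetical name.

```python
import copy
from typing import Any, Dict


def migrate_config(old: Dict[str, Any]) -> Dict[str, Any]:
    """Return a migrated copy of the config; the input is left untouched."""
    new = copy.deepcopy(old)
    credentials = new.get("credentials")
    # Assumed layout: a top-level project_id is moved under credentials.
    # Configs that already use the new layout are returned unchanged.
    if isinstance(credentials, dict) and "project_id" in new and "project_id" not in credentials:
        credentials["project_id"] = new.pop("project_id")
    return new
```

The function is idempotent, so running it on an already migrated config is harmless.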

@@ -51,7 +51,8 @@ Syncing huge date windows may take longer due to Mixpanel's low API rate-limits

| Version | Date | Pull Request | Subject |
|:--------|:-----------|:---------------------------------------------------------|:------------------------------------------------------------------------------------------------------------|
| 0.1.39 | 2023-09-15 | [30469](https://github.com/airbytehq/airbyte/pull/30469) | Add default primary key `distinct_id` to `Export` stream |
| 1.0.0 | 2023-09-20 | [30149](https://github.com/airbytehq/airbyte/pull/30149) | Prepare for certifying to the next stage |

Can we describe what changed in the actual code instead of referring to certification?

@erohmensing

❓ Question: Is it feasible for us to separate out non-breaking and breaking changes and merge the non-breaking ones first?

In general, these certification PRs are huge. I would love to see them implemented in a more incremental way, so that if there are any issues with any part of the PR, we can roll back as necessary without undoing everything. As a bonus, we can check each thing off as it's done and ship more value to users earlier. cc @katmarkham

@github-actions

source-mixpanel test report (commit 07b5b24246) - ❌

⏲️ Total pipeline duration: 66mn24s

Step Result
Connector package install
Build source-mixpanel docker image for platform linux/x86_64
Unit tests
Acceptance tests
Code format checks
Validate airbyte-integrations/connectors/source-mixpanel/metadata.yaml
Connector version semver check
Connector version increment check
QA checks


@tolik0 tolik0 force-pushed the ddavydov/#29746-source-mixpanel-prepare-for-certification branch from 00a22be to 3dcd3a7 Compare September 26, 2023 18:15
@airbyte-oss-build-runner

source-mixpanel test report (commit 5ca55e43f0) - ❌

⏲️ Total pipeline duration: 70mn11s

Step Result
Connector package install
Build source-mixpanel docker image for platform linux/x86_64
Unit tests
Acceptance tests
Code format checks
Validate airbyte-integrations/connectors/source-mixpanel/metadata.yaml
Connector version semver check
Connector version increment check
QA checks

@airbyte-oss-build-runner

source-mixpanel test report (commit 0550d05799) - ✅

⏲️ Total pipeline duration: 129mn33s

Step Result
Connector package install
Build source-mixpanel docker image for platform linux/x86_64
Unit tests
Acceptance tests
Code format checks
Validate airbyte-integrations/connectors/source-mixpanel/metadata.yaml
Connector version semver check
Connector version increment check
QA checks


@tolik0 tolik0 changed the title from "🚨 🚨 Source Mixpanel: certification preparations" to "🐍 Source Mixpanel: certification preparations" Sep 27, 2023
@@ -75,7 +75,7 @@ class Export(DateSlicesMixin, IncrementalMixpanelStream):
3 queries per second and 60 queries per hour.
"""

-    primary_key: str = "distinct_id"
+    primary_key: Iterable[str] = ["distinct_id", "event", "time"]
Collaborator Author

This is a breaking change I believe
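
One way to see why the key change matters: a destination that deduplicates on the primary key keeps a different set of rows once the key is widened. The sketch below is an illustration with made-up records; `dedupe` is a hypothetical helper, not part of the connector or of any destination.

```python
from typing import Any, Dict, Iterable, List, Sequence


def dedupe(records: Iterable[Dict[str, Any]], key_fields: Sequence[str]) -> List[Dict[str, Any]]:
    """Keep the last record seen for each distinct primary-key value."""
    latest: Dict[tuple, Dict[str, Any]] = {}
    for record in records:
        latest[tuple(record[field] for field in key_fields)] = record
    return list(latest.values())


# Made-up events for one user: with the old key they collapse into a single row.
events = [
    {"distinct_id": "u1", "event": "login", "time": 100},
    {"distinct_id": "u1", "event": "purchase", "time": 200},
]
old_key_rows = dedupe(events, ["distinct_id"])
new_key_rows = dedupe(events, ["distinct_id", "event", "time"])
```

In this example the old key keeps one row per user, while the composite key keeps both events, so data synced before and after the change is deduplicated differently.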

ISO_FORMAT_PATTERN = re.compile(r"^(\d{4}-\d{2}-\d{2})[ t](\d{2}:\d{2}:\d{2})$")


def to_iso_format(s: str) -> str:
Collaborator Author

Let's add some unit tests please for fix_date_time and to_iso_format
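
A sketch of what such tests could exercise is shown below. The pattern is copied from the diff above, but the body of `to_iso_format` is an assumption about its behavior (join date and time with `T`, return anything else unchanged), since only its signature is visible here. `fix_date_time` is not shown in this thread and is left out.

```python
import re

# Pattern as quoted in the diff: date and time separated by a space or a lowercase "t".
ISO_FORMAT_PATTERN = re.compile(r"^(\d{4}-\d{2}-\d{2})[ t](\d{2}:\d{2}:\d{2})$")


def to_iso_format(s: str) -> str:
    """Assumed behavior: normalize 'YYYY-MM-DD hh:mm:ss' to 'YYYY-MM-DDThh:mm:ss'."""
    match = ISO_FORMAT_PATTERN.match(s)
    if match:
        return f"{match.group(1)}T{match.group(2)}"
    # Values that do not match the pattern are passed through untouched.
    return s


def test_to_iso_format() -> None:
    assert to_iso_format("2023-09-05 09:26:00") == "2023-09-05T09:26:00"
    assert to_iso_format("2023-09-05t09:26:00") == "2023-09-05T09:26:00"
    assert to_iso_format("2023-09-05T09:26:00") == "2023-09-05T09:26:00"
    assert to_iso_format("not a date") == "not a date"
```

The test function has the shape pytest collects, and it can also be called directly.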

@tolik0 tolik0 force-pushed the ddavydov/#29746-source-mixpanel-prepare-for-certification branch from 0550d05 to 87e2a6e Compare September 27, 2023 09:27
@airbyte-oss-build-runner

source-mixpanel test report (commit 7760a54497) - ✅

⏲️ Total pipeline duration: 73mn16s

Step Result
Connector package install
Build source-mixpanel docker image for platform linux/x86_64
Unit tests
Acceptance tests
Code format checks
Validate airbyte-integrations/connectors/source-mixpanel/metadata.yaml
Connector version semver check
Connector version increment check
QA checks


@tolik0 tolik0 merged commit f8c3e76 into master Sep 27, 2023
29 checks passed
@tolik0 tolik0 deleted the ddavydov/#29746-source-mixpanel-prepare-for-certification branch September 27, 2023 12:04
Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation connectors/source/mixpanel team/connectors-python

6 participants