Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Auto-Disable Failing Connections feature #11099

Merged
merged 23 commits into from
Mar 18, 2022

Conversation

terencecho
Copy link
Contributor

@terencecho terencecho commented Mar 14, 2022

What

Adds feature to automatically disable failing connections. This addresses issue 9715. When this feature is turned on, it will disable a connection after MAX_FAILED_JOBS_IN_A_ROW_BEFORE_CONNECTION_DISABLE consecutive failures (currently defaults to 100 jobs) or after MAX_DAYS_OF_ONLY_FAILED_JOBS_BEFORE_CONNECTION_DISABLE days of only failed jobs (currently defaults to 14 days).

todo:

  • Notify user when connection has been auto-disable. Will be followed up through this issue.
  • figure out why listJobStatusWithConnection is not filtering by timestamp in unit tests
  • move max job failure limits to config

How

When the AUTO_DISABLE_FAILING_CONNECTIONS feature flag is turned on, after a job failure, the DisableActivity will check if either of the above criteria has been met. If so, it will set the connection status to INACTIVE.

Recommended reading order

  1. airbyte-scheduler/persistence/src/main/java/io/airbyte/scheduler/persistence/DefaultJobPersistence.java
  2. airbyte-workers/src/main/java/io/airbyte/workers/temporal/scheduling/activities/AutoDisableConnectionActivityImpl.java
  3. the rest

@github-actions github-actions bot added area/platform issues related to the platform area/scheduler area/worker Related to worker labels Mar 14, 2022
@terencecho terencecho temporarily deployed to more-secrets March 14, 2022 08:20 Inactive
@terencecho terencecho temporarily deployed to more-secrets March 14, 2022 08:20 Inactive
@terencecho terencecho temporarily deployed to more-secrets March 14, 2022 16:31 Inactive
@terencecho terencecho temporarily deployed to more-secrets March 14, 2022 16:31 Inactive
@terencecho terencecho temporarily deployed to more-secrets March 14, 2022 17:05 Inactive
@terencecho terencecho temporarily deployed to more-secrets March 14, 2022 17:05 Inactive
@terencecho terencecho marked this pull request as ready for review March 14, 2022 21:44
final String JobStatusSelect = "SELECT status FROM jobs ";
return jobDatabase.query(ctx -> ctx
.fetch(JobStatusSelect + "WHERE " +
"CAST(scope AS VARCHAR) = ? AND " +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to cast as a VARCHAR? scope = '?' should be working (same with the config type).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that was the convention since most of the other queries casted the scope, but just ran the unit tests without it and had no problems, so i'll remove it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I misspoke earlier, looks like the convention is not to cast the scope. But ConfigType is normally casted.

@terencecho terencecho temporarily deployed to more-secrets March 16, 2022 22:10 Inactive
@terencecho terencecho temporarily deployed to more-secrets March 16, 2022 22:10 Inactive
@terencecho terencecho temporarily deployed to more-secrets March 16, 2022 23:19 Inactive
@terencecho terencecho temporarily deployed to more-secrets March 16, 2022 23:19 Inactive
* connection to prevent wasting resources
*/
@ActivityMethod
boolean autoDisableFailingConnection(AutoDisableConnectionActivityInput input);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Temporal recommends to use a POJO for the input/output. will help with adding element in the input or the output. This shouldn't require versioning if if we add something in the POJO. Even if it is likely that we will use one it might help to avoid potential issues if we forget it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, added an output object

@terencecho terencecho temporarily deployed to more-secrets March 17, 2022 21:27 Inactive
@terencecho terencecho temporarily deployed to more-secrets March 17, 2022 21:27 Inactive
@terencecho terencecho temporarily deployed to more-secrets March 17, 2022 21:32 Inactive
@terencecho terencecho temporarily deployed to more-secrets March 17, 2022 21:32 Inactive
@terencecho terencecho temporarily deployed to more-secrets March 17, 2022 22:03 Inactive
@terencecho terencecho temporarily deployed to more-secrets March 17, 2022 22:03 Inactive
@terencecho terencecho temporarily deployed to more-secrets March 18, 2022 00:42 Inactive
@terencecho terencecho temporarily deployed to more-secrets March 18, 2022 00:42 Inactive
@terencecho terencecho temporarily deployed to more-secrets March 18, 2022 05:23 Inactive
@terencecho terencecho temporarily deployed to more-secrets March 18, 2022 05:23 Inactive
@terencecho terencecho temporarily deployed to more-secrets March 18, 2022 05:36 Inactive
@terencecho terencecho temporarily deployed to more-secrets March 18, 2022 05:36 Inactive
@terencecho terencecho merged commit f4bb7b2 into master Mar 18, 2022
@terencecho terencecho deleted the tcho/disable-failing-connections branch March 18, 2022 18:13
terencecho added a commit that referenced this pull request Mar 18, 2022
* Fix incorrect var name

* Fix unit test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/platform issues related to the platform area/scheduler area/worker Related to worker
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants