@StephanEwen
Contributor

What is the purpose of the change

This PR introduces Operator Coordinators as part of FLIP-27.

Operator Coordinators are instances that exist once per operator. While the operators run on the TaskManagers, the coordinator runs on the JobManager. The coordinator communicates via events with the operators, typically to assign work.

The first user for those coordinators would be the new source interface. The OperatorCoordinator will run the Source's Split Enumerator.
This change will also allow us to remove InputSplits and the initializeOnMaster / finalizeOnMaster logic in a future step.

Further users we envision are sinks (for coordinated commits of metadata), or iterations (gather progress and coordinate supersteps) as well as simple approximate alignments between streams (event time alignment).

Brief change log

  • Introduce the OperatorCoordinator interface.
  • Add a way to attach a factory (Provider) for the Coordinator to the JobVertex of the JobGraph.
  • Integrate the OperatorCoordinator with the ExecutionJobVertex and the new scheduler (note: integration with the legacy scheduler is not planned).
  • Add the OperatorEvent and support for sending events in both directions between Coordinator and Operator.
  • On the TaskManager / runtime side, Operators register themselves at the OperatorEventDispatcher and obtain a Gateway to send events. That way, operators are not (more strongly than already) tied to the heavyweight Environment object.
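As a rough illustration of the coordinator / operator event exchange described above, here is a minimal in-process sketch. All type and method names in it (Event, RequestWork, AssignWork, SubtaskGateway, WorkCoordinator) are hypothetical and chosen only for this example; they are not the interfaces introduced by this PR, and the real exchange goes over the JobManager / TaskManager RPC rather than direct method calls.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

// Illustrative only: these types mimic the roles described in the change log
// and are NOT the interfaces added by this PR.
public class CoordinatorSketch {

    /** Marker for messages exchanged between the coordinator and operator subtasks. */
    interface Event {}

    /** Sent by a subtask to ask the coordinator for work. */
    static final class RequestWork implements Event {}

    /** Sent by the coordinator to hand out one unit of work. */
    static final class AssignWork implements Event {
        final String work;

        AssignWork(String work) {
            this.work = work;
        }
    }

    /** Hypothetical channel from the coordinator to a single subtask. */
    interface SubtaskGateway {
        void sendToSubtask(int subtask, Event event);
    }

    /** One instance per operator; in the design above it would run on the JobManager. */
    static final class WorkCoordinator {
        private final Queue<String> pending = new ArrayDeque<>();
        private final SubtaskGateway gateway;

        WorkCoordinator(List<String> work, SubtaskGateway gateway) {
            this.pending.addAll(work);
            this.gateway = gateway;
        }

        /** Answers a work request with the next pending unit, if any is left. */
        void handleEventFromSubtask(int subtask, Event event) {
            if (event instanceof RequestWork && !pending.isEmpty()) {
                gateway.sendToSubtask(subtask, new AssignWork(pending.poll()));
            }
        }
    }

    /** Runs a tiny exchange and returns the assignments as "subtask:work" strings. */
    static List<String> demo() {
        List<String> log = new ArrayList<>();
        SubtaskGateway gateway = (subtask, event) -> {
            if (event instanceof AssignWork) {
                log.add(subtask + ":" + ((AssignWork) event).work);
            }
        };
        WorkCoordinator coordinator =
                new WorkCoordinator(Arrays.asList("split-0", "split-1"), gateway);
        coordinator.handleEventFromSubtask(0, new RequestWork());
        coordinator.handleEventFromSubtask(1, new RequestWork());
        coordinator.handleEventFromSubtask(0, new RequestWork()); // nothing left, no reply
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The sketch keeps the coordinator unaware of how events reach the subtasks: it only sees a gateway, which mirrors the idea that operators obtain a gateway instead of depending on a heavier runtime object.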

Verifying this change

This change is internal only so far (a building block for other features, like the new Source API).
The change is tested mainly through unit tests, most importantly
flink-runtime : org.apache.flink.runtime.scheduler.OperatorCoordinatorSchedulerTest.java

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: yes
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? internal feature
  • If yes, how is the feature documented? not applicable (docs will be added for the new source interface)

@StephanEwen StephanEwen changed the title Operator coordinators [FLINK-15099][runtime] (FLIP-27) Add Operator Coordinators and Events Dec 8, 2019
@flinkbot
Collaborator

flinkbot commented Dec 8, 2019

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit f7d3793 (Sun Dec 08 04:13:43 UTC 2019)

Warnings:

  • No documentation files were touched! Remember to keep the Flink docs up to date!
  • Invalid pull request title: No valid Jira ID provided

Mention the bot in a comment to re-run the automated checks.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.


The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

Bot commands

The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

@flinkbot
Collaborator

flinkbot commented Dec 8, 2019

CI report:

Bot commands

The @flinkbot bot supports the following commands:

  • @flinkbot run travis re-run the last Travis build
  • @flinkbot run azure re-run the last Azure build

Contributor

@becketqin becketqin left a comment

@StephanEwen Thanks for the patch. The existing code looks good overall. I only have some minor comments. Some of the missing Javadocs and the parameter / variable renaming could be done in a follow-up patch, given that the release 1.10 code freeze is approaching. We probably also need some more unit test cases for the event passing itself.

@StephanEwen StephanEwen force-pushed the operator_coordinators branch from f7d3793 to 774d4ca Compare February 11, 2020 15:58
@StephanEwen StephanEwen force-pushed the operator_coordinators branch 5 times, most recently from b0a384d to ab016b8 Compare February 13, 2020 20:32
* }
* </pre>
*/
public final class AutoContextClassLoader implements AutoCloseable {
Contributor

duplicate of TemporaryClassLoaderContext

Contributor Author

Would consolidate that in a separate commit. If that gets used beyond plugins, it should reside in a more "universal" package. The AutoContextClassLoader is a tad bit nicer, especially javadoc wise.
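For readers unfamiliar with the idiom under discussion, the following is a minimal sketch of an AutoCloseable that temporarily swaps the thread's context class loader and restores it on close. The class name ScopedContextClassLoader and its factory method are hypothetical, made up for this example; the sketch is not the code of AutoContextClassLoader or TemporaryClassLoaderContext.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class ScopedClassLoaderSketch {

    /** Hypothetical helper: sets a context class loader and restores the old one on close. */
    static final class ScopedContextClassLoader implements AutoCloseable {
        private final Thread thread;
        private final ClassLoader previous;

        private ScopedContextClassLoader(Thread thread, ClassLoader previous) {
            this.thread = thread;
            this.previous = previous;
        }

        /** Installs the given class loader on the current thread and remembers the old one. */
        static ScopedContextClassLoader of(ClassLoader classLoader) {
            Thread thread = Thread.currentThread();
            ClassLoader previous = thread.getContextClassLoader();
            thread.setContextClassLoader(classLoader);
            return new ScopedContextClassLoader(thread, previous);
        }

        /** Restores the class loader that was active before of(...) was called. */
        @Override
        public void close() {
            thread.setContextClassLoader(previous);
        }
    }

    /** Returns true if the loader was swapped inside the block and restored after it. */
    static boolean demo() {
        ClassLoader before = Thread.currentThread().getContextClassLoader();
        ClassLoader temporary = new URLClassLoader(new URL[0], before);
        boolean swapped;
        try (ScopedContextClassLoader ignored = ScopedContextClassLoader.of(temporary)) {
            swapped = Thread.currentThread().getContextClassLoader() == temporary;
        }
        boolean restored = Thread.currentThread().getContextClassLoader() == before;
        return swapped && restored;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Using try-with-resources means the previous class loader is restored even if the guarded code throws.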

Operator Coordinators are instances that exist once per operator. While the operators run on the TaskManagers, the
coordinator runs on the JobManager. The coordinator communicates via events with the operators, typically to
assign work.

The first user for those coordinators would be the new source interface.
Further users we envision are sinks (for coordinated commits of metadata), or iterations (gather progress and
steer supersteps) as well as simple approximate alignments between streams (event time alignment).
@StephanEwen StephanEwen force-pushed the operator_coordinators branch from ab016b8 to 627d2c1 Compare February 13, 2020 22:03
@StephanEwen
Contributor Author

Manually merged in 41b6bfa

5 participants