[FLINK-25206] Add configuration option to disable configuration in user programs #18043
Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
Automated Checks
Last check on commit ba15885 (Tue Dec 07 13:31:33 UTC 2021)
Warnings:
Mention the bot in a comment to re-run the automated checks.
Review Progress
Please see the Pull Request Review Guide for a full explanation of the review process.
Details
The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.
Bot commands
The @flinkbot bot supports the following commands:
AHeise
left a comment
OK, first round of review. Please separate refactors into a separate commit; mixing them in makes the review unnecessarily hard.
flink-runtime/src/main/java/org/apache/flink/runtime/dispatcher/JobStartupFailedException.java
    }

    @Override
    public void submitFailedJob(JobStartupFailedException exception) {
maybe not submit but rather register?
I'd rather keep the name to be consistent with submitJob because the lifecycle is pretty similar to submitting a job that immediately fails.
flink-clients/src/main/java/org/apache/flink/client/ClientUtils.java
    })
    .exceptionally(
            t -> {
                final Optional<JobStartupFailedException> jobStartupFailedOpt =
Is this the right commit for the change? Maybe it should be 3 commits. I rather thought that the scope of this commit ends with PackageProgram...
Hmm, the change is not that big anymore. Do you see it as a blocker?
flink-clients/src/main/java/org/apache/flink/client/program/StreamContextEnvironment.java
    private int jobCounter;

    private final Collection<JobValidationError> errors;
Afaik this field is unneeded.
It is set in the ctor with the forbidden configuration coming from the instantiation of the StreamEnvironment, i.e. StreamExecutionEnvironment.getExecutionEnvironment(Configuration config).
.../java/org/apache/flink/client/deployment/application/ApplicationDispatcherBootstrapTest.java
    if (allowConfigurations) {
        return errors;
    }
    final MapDifference<String, String> diff =
When is originalConfiguration not empty?
During the call of the parent ctor, the StreamExecutionEnvironment translates the configuration into the equivalent sub-configurations [1].
[1] Line 277 in 7641c23: this.configure(this.configuration, this.userClassloader);
2cf1a66 to 96f31d5
dmvk
left a comment
Thanks for the PR @fapaul, I think this is headed in the right direction 👍 I've added a few comments, PTAL.
...main/java/org/apache/flink/client/deployment/application/ApplicationDispatcherBootstrap.java
        resetContextEnvironment();
    }

    private List<String> collectNotAllowedConfigurations() {
I'm not sure about the approach with the configuration diffing here 🤔
Would it make sense to simply make the configuration objects exposed to the user immutable (something along the lines of java.util.Collections#unmodifiableCollection)?
I do not think we can do this because it would break the behavior for users who modify the configurations. In general, I agree 100% that the current situation, where users can mutate these objects and we have to handle it, is definitely not good.
Wouldn't it actually make more sense? We simply don't want users to be able to mutate any configuration.
The only difference would be failing earlier and providing the user with a full stack trace of the problematic call.
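To illustrate the fail-early behaviour suggested here, a minimal sketch using only the JDK follows. It assumes a plain `Map<String, String>` as a stand-in for the configuration; the class `ImmutableConfigSketch` and its methods are hypothetical and not part of Flink or of this PR.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; a plain map stands in for Flink's configuration objects. */
final class ImmutableConfigSketch {

    /** Returns a read-only view of the given settings. */
    static Map<String, String> readOnlyView(Map<String, String> settings) {
        return Collections.unmodifiableMap(settings);
    }

    /** Returns true if a write through the view is rejected at the call site. */
    static boolean mutationRejected(Map<String, String> view) {
        try {
            view.put("parallelism.default", "4");
            return false;
        } catch (UnsupportedOperationException e) {
            // The stack trace of e points at the put(...) call above,
            // i.e. at the exact line of user code that tried to mutate.
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, String> clusterSettings = new HashMap<>();
        clusterSettings.put("parallelism.default", "2");

        Map<String, String> view = readOnlyView(clusterSettings);
        System.out.println("mutation rejected: " + mutationRejected(view));
        System.out.println("value: " + view.get("parallelism.default"));
    }
}
```

The trade-off is the one discussed in this thread: the wrapper stops at the first offending call, and it only works for state that is reachable through a wrappable collection, which is not the case for setter-based classes.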
There could also be a half-way option of letting the user "mutate" the configuration as long as it doesn't change the default value.
I like the idea, but I think the effort would be significant because we would need to monkey-patch all methods in the CheckpointConfig [1] and ExecutionConfig [2].
We deemed it less maintainable.
[1] https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/CheckpointConfig.java
[2] https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/common/ExecutionConfig.java
I tried to rework the configuration management a bit but, unfortunately, did not succeed. The problem is that some configurations you can set on the ExecutionConfig or CheckpointConfig do not have a direct ConfigOption. These configurations basically all load a user class, such as a custom serializer.
We would need to consolidate the resolution of these classes first, before ensuring that all configurations are reflected through ConfigOption, but I do not see this PR as a good point to rework the configuration management.
Agreed, until we get rid of Execution and Checkpoint config, this would be tricky to achieve.
    }

    @Override
    public void submitFailedJob(FatalProgramInvocationException exception) {
Why do we need this method? We didn't really execute the job, so there is nothing to archive.
The idea is that the user can receive the application result so we have to submit something containing the error message.
            + " (either successfully or as result of a failure). Has no effect for other deployment modes.");

    public static final ConfigOption<Boolean> ALLOW_CLIENT_JOB_CONFIGURATIONS =
            ConfigOptions.key("execution.allow-client-job-configurations")
execution.immutable-configuration?
I think in Flink there is a general misunderstanding of where the configuration is coming from. With immutable-configuration, it is unclear to me which configuration is immutable.
    errorMessages.addAll(collectNotAllowedConfigurations());
    if (!errorMessages.isEmpty()) {
        // HACK: We shortcut the StreamGraph to jobgraph translation because we already
        // know that the job needs to fail and can derive the jobId.
The job fails before being submitted, there is no need to generate jobId here. Also wouldn't this break with multiple job submission (we support that in non-ha setups)?
Yes, you are right: in the case of multiple executes we will only capture the first one (#17995 (comment)).
I would see exception handling for multiple jobs as out of scope for this PR because for that we would probably have to rethink how the Application mode interacts with the cluster components.
        return false;
    }
    CheckpointConfig that = (CheckpointConfig) o;
    return checkpointInterval == that.checkpointInterval
This seems fragile: when a new field is added, this might not get updated. Could we compare the serialized form (original vs. user) instead? I'd expect the serialized form to be stable within the same JVM.
I changed to using serialization to check whether something has changed.
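A minimal sketch of change detection via the serialized form, using plain Java serialization, might look as follows. `DemoConfig` is a hypothetical stand-in for a class such as CheckpointConfig, and the helper names are assumptions for illustration; the PR's actual implementation may differ.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Arrays;

/** Hypothetical sketch of detecting mutation by comparing serialized bytes. */
final class SerializedDiffSketch {

    /** Stand-in for a serializable config class; not a Flink type. */
    static final class DemoConfig implements Serializable {
        private static final long serialVersionUID = 1L;
        long checkpointInterval = -1L;
        boolean unalignedCheckpoints = false;
    }

    /** Serializes the value with standard Java serialization. */
    static byte[] serialize(Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Returns true if the current state no longer matches the snapshot. */
    static boolean changedSince(byte[] snapshot, Serializable current) {
        return !Arrays.equals(snapshot, serialize(current));
    }

    public static void main(String[] args) {
        DemoConfig config = new DemoConfig();
        byte[] snapshot = serialize(config); // taken before user code runs

        System.out.println("changed before mutation: " + changedSince(snapshot, config));
        config.checkpointInterval = 1000L; // simulated user mutation
        System.out.println("changed after mutation: " + changedSince(snapshot, config));
    }
}
```

Unlike a hand-written `equals`, this picks up newly added fields automatically. It does rely on the serialized form being deterministic for the fields involved, which is the stability assumption raised in the comment above.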
 * <p>The job will transition to FAILED state, and it will not be recovered.
 */
@Internal
public class FatalProgramInvocationException extends ProgramInvocationException {
Do we need a new exception type? As far as I can tell this does the same thing as ProgramInvocationException (we don't recover from that one either -> so it's also "fatal").
Also PTAL at the UnsuccessfulExecutionException, that seems to be related.
Fatal might not be the right word. WDYT about HandledProgramInvocationException? I do not really want every ProgramInvocationException to cause the program to go to a failed state.
I see the normal ProgramInvocationException as a reason for the JM to shut down, after which HA might recover, and I want to introduce a new exception that allows a transition to a failed state.
@dmvk I have based this feature now on FLINK-25715. Please have another look.
dmvk
left a comment
Thanks for the update @fapaul, this seems to be headed in the right direction 👍. My biggest concern is about the diffing of the configuration, which doesn't seem to behave correctly.
    conf.toMap()
            .forEach(
                    (k, v) ->
                            errors.add(
                                    ConfigurationNotAllowedMessage
                                            .ofConfigurationKeyAndValue(k, v)));
Would throwing an exception here right away make more sense? It could point the user to the exact location where the problem is.
Also, it would avoid passing the errors into the stream environment, which could simplify the change-set a bit.
E.g.:
    conf.toMap()
            .forEach(
                    (k, v) ->
                            errors.add(
                                    ConfigurationNotAllowedMessage
                                            .ofConfigurationKeyAndValue(k, v)));
    throw new MutatedConfigurationException("Supplying a custom configuration for the stream environment is not allowed, because the client-side configuration is disabled.");
Actually, I like that all errors are collected and users immediately see what they are not allowed to set. Otherwise, it might take multiple submissions until they have seen all errors. Regarding the code complexity, it seems okay to me since we are only adding to the StreamContextEnvironment. In fact, I am a bit surprised that the StreamContextEnvironment is marked as @PublicEvolving; I am not sure when users would ever interact with it.
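The difference between the two approaches in this thread can be sketched roughly as below. All names here (`CollectErrorsSketch`, `failFast`, `failWithAll`) are hypothetical and the exception type is a JDK placeholder; this is only meant to show why collecting saves the user repeated submissions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch contrasting fail-fast with collect-then-fail reporting. */
final class CollectErrorsSketch {

    /** Fail-fast: the exception only mentions the first offending key. */
    static void failFast(List<String> forbiddenKeys) {
        for (String key : forbiddenKeys) {
            throw new IllegalStateException("Not allowed to set: " + key);
        }
    }

    /** Collect-then-fail: one exception that lists every offending key. */
    static void failWithAll(List<String> forbiddenKeys) {
        if (forbiddenKeys.isEmpty()) {
            return; // nothing was violated
        }
        List<String> messages = new ArrayList<>();
        for (String key : forbiddenKeys) {
            messages.add("Not allowed to set: " + key);
        }
        throw new IllegalStateException(String.join(System.lineSeparator(), messages));
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("first.key", "second.key");
        try {
            failWithAll(keys);
        } catch (IllegalStateException e) {
            // Both keys are reported in a single failure.
            System.out.println(e.getMessage());
        }
    }
}
```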
    }
    final MapDifference<String, String> diff =
            Maps.difference(originalConfiguration.toMap(), configuration.toMap());
    diff.entriesOnlyOnRight()
This doesn't seem correct, it only covers cases where we add new options to the config. I think we need to cover all three cases:
- Config option has been removed (left side only)
- Config option has changed (diff)
- Config option has been added (right side)
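The three cases listed above can be sketched with plain JDK maps as follows. With Guava's `MapDifference` they would correspond to `entriesOnlyOnLeft()`, `entriesDiffering()`, and `entriesOnlyOnRight()`. The class name, method name, and message format are assumptions for illustration, not the code in this PR.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of a configuration diff that covers all three cases. */
final class ConfigDiffSketch {

    /** Lists every key that was removed, changed, or added relative to the original. */
    static List<String> violations(Map<String, String> original, Map<String, String> current) {
        List<String> errors = new ArrayList<>();
        // Union of both key sets, sorted for a deterministic report.
        Set<String> keys = new TreeSet<>(original.keySet());
        keys.addAll(current.keySet());

        for (String key : keys) {
            boolean before = original.containsKey(key);
            boolean after = current.containsKey(key);
            if (before && !after) {
                errors.add("removed: " + key); // left side only
            } else if (!before && after) {
                errors.add("added: " + key + "=" + current.get(key)); // right side only
            } else if (!Objects.equals(original.get(key), current.get(key))) {
                errors.add("changed: " + key + " (" + original.get(key)
                        + " -> " + current.get(key) + ")"); // differing values
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        Map<String, String> original = new HashMap<>();
        original.put("a", "1");
        original.put("b", "2");
        original.put("c", "3");

        Map<String, String> current = new HashMap<>();
        current.put("a", "1"); // untouched
        current.put("b", "5"); // changed
        current.put("d", "4"); // added; "c" was removed

        violations(original, current).forEach(System.out::println);
    }
}
```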
Good catch. I also updated the other cases.
@dmvk I updated the diffing and addressed your comments.
dmvk
left a comment
Thanks for the update @fapaul, LGTM overall 👍 Could you please add a simple unit test for StreamContextEnvironment that ensures we've properly covered the different violation scenarios (CheckpointConfig, ExecutionConfig, add / remove / edit config options)?
… configurable Add configuration option to disable configuration in user jars. The submission will fail instantly before the job creation.
@dmvk I have added a unit test to verify the different violation scenarios.
dmvk
left a comment
LGTM 👍 Thanks for the PR, good job ;)
What is the purpose of the change
This PR adds an option to disable programmatic configuration in a user program when running in Application mode. If the option is enabled, a program that changes the configuration will result in a failed job.
By default, this configuration is turned off to not break existing setups.
This subsumes #17995
Brief change log
Verifying this change
Added tests to verify the exception is thrown in the correct scenarios and the job result is retrievable after the exception was thrown.
Does this pull request potentially affect one of the following parts:
@Public(Evolving): (yes / no)
Documentation