NIFI-15901 Disable controller services before updating#11201

Open
adwk67 wants to merge 3 commits into apache:main from stackabletech:NIFI-15901

adwk67 commented May 4, 2026

Summary

NIFI-15901

This is similar to #11111 (https://issues.apache.org/jira/browse/NIFI-15801) but applies to controller services.

The failure occurs when a controller service (CS) in a child process group shares an identifier with a CS in the root process group. If a node is disconnected and the child-group CS is deleted on that node, then reconnecting the node throws a FlowSynchronizationException, because the root CS is not disabled before it is updated.
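
To illustrate the failure mode and the fix in isolation, here is a minimal sketch. It assumes a toy model: `ToyService`, `State`, and `updateSafely` are hypothetical names and are not NiFi classes. The only behaviour taken from this PR is that modifying an ENABLED service throws IllegalStateException, so the service has to be disabled before the update is applied. The sketch restores the previous state afterwards, which is a simplification of what the synchronizer does.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model for illustration; these are not NiFi classes.
public class DisableBeforeUpdateSketch {

    enum State { ENABLED, DISABLED }

    /** Toy stand-in for a controller service node. */
    static class ToyService {
        private State state = State.ENABLED;
        private final Map<String, String> properties = new HashMap<>();

        void disable() { state = State.DISABLED; }
        void enable() { state = State.ENABLED; }
        State getState() { return state; }
        Map<String, String> getProperties() { return properties; }

        // Modification is rejected while the service is ENABLED.
        void setProperties(Map<String, String> proposed) {
            if (state == State.ENABLED) {
                throw new IllegalStateException("Cannot modify an ENABLED service");
            }
            properties.putAll(proposed);
        }
    }

    /** Disables the service if needed, applies the update, then restores the prior state. */
    static void updateSafely(ToyService service, Map<String, String> proposed) {
        boolean wasEnabled = service.getState() == State.ENABLED;
        if (wasEnabled) {
            service.disable();
        }
        try {
            service.setProperties(proposed);
        } finally {
            if (wasEnabled) {
                service.enable();
            }
        }
    }

    public static void main(String[] args) {
        ToyService service = new ToyService();
        Map<String, String> proposed = Map.of("url", "http://example.invalid");

        try {
            service.setProperties(proposed); // update without disabling first
        } catch (IllegalStateException e) {
            System.out.println("Direct update rejected: " + e.getMessage());
        }

        updateSafely(service, proposed);
        System.out.println("State after safe update: " + service.getState());
    }
}
```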

Tracking

Please complete the following tracking steps prior to pull request creation.

Issue Tracking

Pull Request Tracking

  • Pull Request title starts with Apache NiFi Jira issue number, such as NIFI-00000
  • Pull Request commit message starts with Apache NiFi Jira issue number, such as NIFI-00000
  • Pull request contains commits signed with a registered key indicating Verified status

Pull Request Formatting

  • Pull Request based on current revision of the main branch
  • Pull Request refers to a feature branch with one commit containing changes

Verification

Please indicate the verification steps performed prior to pull request creation.

Build

  • Build completed using ./mvnw clean install -P contrib-check
    • JDK 21
    • JDK 25

Licensing

  • New dependencies are compatible with the Apache License 2.0 according to the License Policy
  • New dependencies are documented in applicable LICENSE and NOTICE files

Documentation

  • Documentation formatting appears as expected in rendered files

adwk67 marked this pull request as ready for review May 4, 2026 16:51

// disable them (e.g. COMPONENT_ADDED diffs are skipped by AffectedComponentSet).
// We must disable before calling updateControllerService, which calls setProperties
// which calls verifyModifiable and throws IllegalStateException on ENABLED services.
final long stopTimeout = System.currentTimeMillis() + syncOptions.getComponentStopTimeout().toMillis();
Contributor

Should this deadline be recomputed per service inside the loop, like processorStopDeadline in synchronizeProcessors, so later iterations don't inherit an exhausted budget?

Author

Done: d4666a4
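
For readers following the thread, the difference between a single deadline and a per-service deadline can be sketched as below. This is an illustration under stated assumptions, not the code from the commit: `sharedDeadlines`, `perServiceDeadlines`, and the simulated clock readings are hypothetical, and the 10-second timeout is an arbitrary example value.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; the clock readings are simulated, not measured.
public class PerServiceDeadlineSketch {

    /** One deadline computed before the loop: every service shares a single budget. */
    static List<Long> sharedDeadlines(long[] startTimesMillis, Duration timeout) {
        long deadline = startTimesMillis[0] + timeout.toMillis();
        List<Long> remaining = new ArrayList<>();
        for (long now : startTimesMillis) {
            // Later services only get whatever is left of the original budget.
            remaining.add(Math.max(0, deadline - now));
        }
        return remaining;
    }

    /** Deadline recomputed inside the loop: each service gets the full timeout. */
    static List<Long> perServiceDeadlines(long[] startTimesMillis, Duration timeout) {
        List<Long> remaining = new ArrayList<>();
        for (long now : startTimesMillis) {
            long deadline = now + timeout.toMillis();
            remaining.add(deadline - now);
        }
        return remaining;
    }

    public static void main(String[] args) {
        // Simulated clock readings at the moment each service starts being disabled.
        long[] startTimes = {0L, 8_000L, 16_000L};
        Duration timeout = Duration.ofSeconds(10);
        System.out.println("shared:      " + sharedDeadlines(startTimes, timeout));
        System.out.println("per service: " + perServiceDeadlines(startTimes, timeout));
    }
}
```

With these inputs the shared variant leaves 10000, 2000, and 0 milliseconds for the three services, while the per-service variant leaves 10000 milliseconds for each.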

// restoring the controller to its pre-update running state.
if (proposedService.getScheduledState() != org.apache.nifi.flow.ScheduledState.DISABLED) {
context.getControllerServiceProvider().enableControllerServicesAsync(servicesToRestart);
context.getControllerServiceProvider().scheduleReferencingComponents(
Contributor

Should we also call notifyScheduledStateChange for servicesToRestart (ENABLED) and referencesToRestart (RUNNING) here, to match the per-service synchronize(ControllerServiceNode, …) overload?

Author

Done: d4666a4

// Re-enable services and restart components that were stopped for the update,
// restoring the controller to its pre-update running state.
if (proposedService.getScheduledState() != org.apache.nifi.flow.ScheduledState.DISABLED) {
context.getControllerServiceProvider().enableControllerServicesAsync(servicesToRestart);
Contributor

Does enabling via controllerServiceProvider here race with the subsequent componentScheduler.enableControllerServicesAsync loop, and should both paths use the same scheduler?

Author

I've gone with using the componentScheduler for both, as it has already been paused by the caller: d4666a4
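
A rough sketch of why routing both enable paths through one paused scheduler avoids the race raised above. It assumes that a paused scheduler defers enable requests until it is resumed; `ToyScheduler` and its methods are hypothetical stand-ins and do not reproduce the NiFi scheduler API.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of a pausable scheduler; not a NiFi class.
public class PausedSchedulerSketch {

    static class ToyScheduler {
        private boolean paused;
        private final Set<String> pending = new LinkedHashSet<>();
        private final Set<String> enabled = new LinkedHashSet<>();

        void pause() { paused = true; }

        /** While paused, requests are queued (deduplicated) instead of applied. */
        void enableAsync(String serviceId) {
            if (paused) {
                pending.add(serviceId);
            } else {
                enabled.add(serviceId);
            }
        }

        /** Applies every queued request once, in the order first requested. */
        void resume() {
            paused = false;
            enabled.addAll(pending);
            pending.clear();
        }

        List<String> getEnabled() { return new ArrayList<>(enabled); }
    }

    public static void main(String[] args) {
        ToyScheduler scheduler = new ToyScheduler();
        scheduler.pause();                 // the caller pauses before synchronizing

        scheduler.enableAsync("root-cs");  // request from the update path
        scheduler.enableAsync("root-cs");  // request from the later enable loop
        System.out.println("Enabled while paused: " + scheduler.getEnabled());

        scheduler.resume();
        System.out.println("Enabled after resume: " + scheduler.getEnabled());
    }
}
```

Because both requests pass through the same queue, nothing is enabled mid-synchronization and the duplicate request collapses into one.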

// Update all of the Controller Services to match the VersionedControllerService
// Update all Controller Services to match the VersionedControllerService.
// Services may be ENABLED here if the outer "affected components" pass did not
// disable them (e.g. COMPONENT_ADDED diffs are skipped by AffectedComponentSet).
Contributor

Should this reference FlowDifferenceFilters.isComponentUpdateRequired instead of AffectedComponentSet, since that's the filter actually populating updatedVersionedComponentIds?

Author

I've simplified the comment to keep the salient bit: d4666a4. If I am reading the code correctly, AffectedComponentSet and isComponentUpdateRequired do different things: the former marks what gets disabled, the latter marks what gets updated.

Contributor

exceptionfactory left a comment

Tagging @markap14 for additional review. Although there are some similarities to the recent change for Processors, it looks like this needs to be considered in the larger context of calls to the Synchronizer.
