DLPX-86188 [Forwardport to 12.0.0.0] - Management service stuck in activating state after deferred upgrade release(10) to develop(12) #289
Problem
After upgrading from 10 to 11 with an enabled fluentd configuration for a plugin such as elasticsearch-7.far, both fluentd and the management stack fail to start.
Diagnosis
The fluentd logs show that a dependency of elasticsearch-7 cannot be found: the plugin requires the faraday gem at major version 1, but only version 3.x is available. This older gem version was included in the td-agent package in 10.0 (version 4.4.2-1) but is no longer included in 11.0 (version 4.5.0-1), so the td-agent upgrade removes it, among other gems. The fluentd container start script copies the gems from td-agent to a new directory, but the stop script deletes that directory, so the old gems are no longer available after the upgrade.
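The dependency failure described above can be illustrated with a minimal sketch. This is not the fluentd or RubyGems resolver; it only mimics a "same major version" requirement. The function names and the concrete version strings are hypothetical, chosen to mirror the faraday 1 versus 3.x situation.

```python
from typing import Iterable, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Split a dotted version string such as '1.10.0' into integers."""
    return tuple(int(part) for part in version.split("."))


def satisfies_major(installed: Iterable[str], required_major: int) -> bool:
    """Return True if any installed version has the required major version."""
    return any(parse_version(v)[0] == required_major for v in installed)


# Hypothetical gem inventories before and after the td-agent upgrade.
faraday_before_upgrade = ["1.10.0"]  # assumed: a 1.x gem shipped with the older td-agent
faraday_after_upgrade = ["3.0.0"]    # assumed: only a 3.x gem remains

# The plugin needs faraday major version 1.
plugin_loads_before = satisfies_major(faraday_before_upgrade, 1)
plugin_loads_after = satisfies_major(faraday_after_upgrade, 1)
```

Under these assumptions the requirement is met before the upgrade and unmet afterwards, which matches the failure seen in the fluentd logs.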
Solution
Until we decide how to deal with plugins that are missing dependencies after upgrading (DLPX-86157) and we let the stack start normally even if a plugin fails (DLPX-86156), we'll pin td-agent to its version in 10.0 (version 4.4.2-1).
Specifically, here in linux-pkg, we make sure that this older version of td-agent is available when building the appliance.
The companion app-gate review forces virtualization to depend on this older version: https://github.com/delphix/dlpx-app-gate/pull/728.
Testing Done
Provide a clear description of how this change was tested. At minimum
this should include proof that a computer has executed the changed
lines. Ideally this should include an automated test or an explanation
as to why this pull request has no tests.