Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#280 workaround - Add a configurable delay to avoid race condition in initializer #1204

Merged
merged 4 commits into from Nov 26, 2019

Conversation

jtnord
Copy link
Member

@jtnord jtnord commented Nov 20, 2019

The initializer needs to wait until Jenkins has loaded.
Until Jenkins has more milestones add a configurable delay before configuring from code

hacky workaround for #280 until https://issues.jenkins-ci.org/browse/JENKINS-51856 is implemented.

Completely untested drive by PR. may blow up, std.disclaimers etc etc.

Your checklist for this pull request

🚨 Please review the guidelines for contributing to this repository.

  • Make sure you are requesting to pull a topic/feature/bugfix branch (right side) and not your master branch!
  • Ensure that the pull request title represents the desired changelog entry
  • Please describe what you did
  • Link to relevant issues in GitHub or in Jenkins JIRA
  • Link to relevant pull requests, esp. upstream and downstream changes
  • Did you provide a test-case? That demonstrates feature works or fixes the issue.

Copy link
Member

@oleg-nenashev oleg-nenashev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add some documentation for it so that we can refer it from Jenkins changelogs and upgrade guidelines. W.r.t. the fix itself, I think we can accept it as a workaround as long as it does not change the default behavior

jtnord and others added 2 commits November 21, 2019 14:12
the initializer needs to wait until Jenkins has loaded.
Until Jenkins has more milestones add a configurable delay before configuring from code
Copy link
Member

@oleg-nenashev oleg-nenashev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I approve it as a workaround if there is a decision to not revert the patch from the core. Such workaround will definitely cause some issues for users, but we are in the "pick your poison" state. Data loss risk is a significant concern for those who use JCasC outside immutable images.

@Casz @timja Please do not merge until the Jenkins core discussion is concluded in https://groups.google.com/forum/#!topic/jenkinsci-dev/2hPMwmZDZFg

CC @olivergondza

@jtnord
Copy link
Member Author

jtnord commented Nov 21, 2019

@Casz @timja Please do not merge until the Jenkins core discussion is concluded in https://groups.google.com/forum/#!topic/jenkinsci-dev/2hPMwmZDZFg

Regardless of any core decision I think this can be merged if tested (i have not done any). This may allow a workaround for #1171 (so hopefully fixes real issues until JENKINS-51856 lands (which would force a bump of Jenkins for all users that they may not want (if they are LTS users for example)

@jtnord
Copy link
Member Author

jtnord commented Nov 22, 2019

@runzexia can you confirm that this fix works for you? (https://groups.google.com/forum/#!topic/jenkinsci-dev/O_g1kk39TBQ)
there is a build here if you require a binary

README.md Outdated Show resolved Hide resolved
Co-Authored-By: Tim Jacomb <t.jacomb@kainos.com>
@jtnord jtnord requested a review from timja November 22, 2019 11:03
Copy link
Member

@timja timja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on the change, been waiting for feedback on testing of it

1 docs nit though, given we've recently done some work to clean up the README, I'm loathe to add more content to it

README.md Outdated Show resolved Hide resolved
@runzexia
Copy link
Member

I have already built a hpi myself and conducted basic installation tests. The plug-in can work normally. I will continue to test whether the relevant parameters can avoid problems. My environment cannot stably reproduce the exception problem caused by calling save. do you have any good methods to help reproduce the problem?
@jtnord

@jtnord
Copy link
Member Author

jtnord commented Nov 22, 2019

@runzexia The best way I have found when reproducing similar condition is to create a cloud containing a SlowCloud from https://github.com/jenkinsci/make-jenkins-slow-plugin.

I guess you would need this to be in the Jenkins config.xml but not CasC files (otherwise it would slow down both Jenkins and CasC)

@runzexia
Copy link
Member

I tested it in my environment jenkins 2.199.
When I use slow clouds in config.xml (1mins), I can reproduce the exception 100%.

I added the parameter -Dio.jenkins.plugins.casc.ConfigurationAsCode.initialDelay=70000 to my jenkins. After the adjustment, I didn't encounter the exception again.

Copy link
Member

@amuniz amuniz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems reasonable to me.
I think a warning log message when the property is enabled would be good, so people using it do not forget to remove it once JENKINS-51856 is fixed.

@jtnord
Copy link
Member Author

jtnord commented Nov 22, 2019

I think a warning log message when the property is enabled would be good, so people using it do not forget to remove it once JENKINS-51856 is fixed.

When JENKINS-51856 this plugin will need adapting and this workaround can be removed completely, so the property would have no effect.

@jtnord
Copy link
Member Author

jtnord commented Nov 25, 2019

this workaround works where you have the ability to set the configuration property, but it has issues where you are defining the master with code and have no access to set system properties (k8s operator may be such an example)

@oleg-nenashev
Copy link
Member

oleg-nenashev commented Nov 25, 2019 via email

@timja
Copy link
Member

timja commented Nov 25, 2019

@oleg-nenashev ready to ship?
Once this is in let's cut a release

@jetersen
Copy link
Member

jetersen commented Nov 25, 2019

I think we can ship this 👍

(Would be nice, if Jenkins core offered to read a properties file on startup)
(JCasC could also offer to read a properties/yml file before configuration)

@timja
Copy link
Member

timja commented Nov 25, 2019

I think we can ship this 👍

(Would be nice, if Jenkins core offered to read a properties file on startup)
(JCasC could also offer to read a properties/yml file before configuration)

this would cover it:
#81

Copy link
Member

@jetersen jetersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be great to give more room to introduce new steps

Long duration = Long.getLong(ConfigurationAsCode.class.getName() + ".initialDelay");
if (duration != null) {
try {
Thread.sleep(duration);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I think this sleep should be added later into the configure step.

To allow for room to introduce configuration of system properties.

Copy link
Member

@jetersen jetersen Nov 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a better place to put the sleep in, on line 719:

private void configureWith(Mapping entries) throws ConfiguratorException {
// Initialize secret sources
SecretSource.all().forEach(SecretSource::init);
// Check input before actually applying changes,
// so we don't let master in a weird state after some ConfiguratorException has been thrown
final Mapping clone = entries.clone();
checkWith(clone);
final ObsoleteConfigurationMonitor monitor = ObsoleteConfigurationMonitor.get();
monitor.reset();
ConfigurationContext context = new ConfigurationContext(registry);
context.addListener(monitor::record);
try (ACLContext acl = ACL.as(ACL.SYSTEM)) {
invokeWith(entries, (configurator, config) -> configurator.configure(config, context));
}
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is called by configure which is public and can be called when a config is updated to apply it, that would unnecessarily slow down reconfiguration, so would not be the correct place AFAICT (but I'm not a maintainer of the plugin so may be unaware of something else going on here)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with having the sleep inside init it does not allow for much flexibility if we wanted to introduce logic before the sleep happens.

The sleep would have to be modified so it only called during initialization.
A simple static volatile variable would do?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I am totally wrong @timja @oleg-nenashev ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with having the sleep inside init it does not allow for much flexibility if we wanted to introduce logic before the sleep happens.

The code does not support running anything before initialisation AFAICT so that would itself require a code change. thus when you introduce that code you can move/change/refactor this sleep.

The sleep would have to be modified so it only called during initialization.

this is the initialisation?

    @Initializer(after = InitMilestone.EXTENSIONS_AUGMENTED, before = InitMilestone.JOB_LOADED)
    public static void init() throws Exception {      

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

init is the initialisation, however my concern is having the sleep at the very beginning does not allow any room for introducing new functionality like #81

Especially as shown in the example: #81 (comment)

Your of course welcome to push the problem till #81 is implemented.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#81 needs to be run in a much earlier on lifecycle as plugins may use system properties to change their behaviour. (It is also not clear how that would work with features in core that are controlled by properties as by the time of plugin loading it may be too late, but that is a different issue)

@jetersen jetersen dismissed their stale review November 25, 2019 19:27

щ(゜ロ゜щ)

Copy link

@varyvol varyvol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some wording suggestions but looks good.

README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
The required value will be dependant on aspects of your system (cpu/disk) and configuration, and can be found is mostly a trial and error.
A suggestion would be to start with 5000 (5 Seconds) and then increment by 2000 (2 seconds) until you no longer exhibit the issue and finally add 1000 (1 second) for some extra safety.
For example, to delay the configuration by 9 seconds you would use something like the following command `java -Dio.jenkins.plugins.casc.ConfigurationAsCode.initialDelay=9000 -jar jenkins.war`.
Exactly how and where you specify this option depends on the installation method used to install Jenkins.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Exactly how and where you specify this option depends on the installation method used to install Jenkins.
Exactly how and where you specify this option depends on the installation method used to execute Jenkins.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I beleive the original is correct. if you install using a WAR it is different to an RPM. in all cases you may executute using service start jenkins

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point!

@timja
Copy link
Member

timja commented Nov 26, 2019

@jtnord could you please integrate @varyvol suggestions, at least the typo fixes and then we'll merge.

Tried to do it myself but I don't have permission to your fork

Address typos and doc fixes from review

Co-Authored-By: Evaristo Gutiérrez <egutierrez@cloudbees.com>
@jtnord
Copy link
Member Author

jtnord commented Nov 26, 2019

@timja addressed changes. not sure why you couldn't commit allow edits from maintainers is checked
image

@timja
Copy link
Member

timja commented Nov 26, 2019

@timja addressed changes. not sure why you couldn't commit allow edits from maintainers is checked
image

Weird, it gave a permission denied error, will 🚢 this once LGTM finishes running

@timja timja changed the title add a configurable delay to avoid race condition in initializer Add a configurable delay to avoid race condition in initializer Nov 26, 2019
@timja timja added the feature A PR that adds a feature - used by Release Drafter label Nov 26, 2019
@timja timja merged commit 8807f0a into jenkinsci:master Nov 26, 2019
@varyvol
Copy link

varyvol commented Nov 26, 2019

@timja do you plan to release a new version including this?

@oleg-nenashev
Copy link
Member

I can cut a release tomorrow if @timja is busy with other stuff. We have this major fix + Plugin documentation publishing fix which severely impacts plugin users. IMHO they justify the new release

@oleg-nenashev oleg-nenashev changed the title Add a configurable delay to avoid race condition in initializer #2298 workaround - Add a configurable delay to avoid race condition in initializer Nov 26, 2019
@timja
Copy link
Member

timja commented Nov 26, 2019

I can cut a release tomorrow if @timja is busy with other stuff. We have this major fix + Plugin documentation publishing fix which severely impacts plugin users. IMHO they justify the new release

I can do today

@jtnord
Copy link
Member Author

jtnord commented Nov 26, 2019

Thanks timja

@timja
Copy link
Member

timja commented Nov 26, 2019

Currently waiting for release drafter :(
One PR was mislabelled, I fixed it and merged a dep PR to trigger it, 12 minutes and it still hasn't updated :(

@timja
Copy link
Member

timja commented Nov 26, 2019

released https://github.com/jenkinsci/configuration-as-code-plugin/releases/tag/configuration-as-code-1.34

@oleg-nenashev oleg-nenashev changed the title #2298 workaround - Add a configurable delay to avoid race condition in initializer #280 workaround - Add a configurable delay to avoid race condition in initializer Nov 26, 2019
@oleg-nenashev
Copy link
Member

Thanks a lot @timja !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature A PR that adds a feature - used by Release Drafter
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants