Should the install_start watchdog be 3 hours? #61

Closed
jvillal-amp opened this issue Aug 10, 2020 · 1 comment
Labels
enhancement New feature or request

Comments

@jvillal-amp
Contributor

The install_start watchdog appears to be set to 3 hours when the install starts, which seems kind of long for my use cases.

Are installs really taking longer than 30 minutes?

The watchdog gets set to 3 hours when the install process tells the beaker lab controller that it is starting the install. So the install has started at that point.

The reason I am asking is that we sometimes see issues where our install finishes, the system reboots, and then something goes wrong in our firmware and it doesn't boot from the disk. It then takes about 3 hours for the job to get aborted.

Looking at the watchdog flow it appears to be:

  • 30 minutes when the provision process starts
  • 3 hours when the install kickstart hits "/install_start/<recipe_id>" on the lab controller
  • 40 minutes (2400 seconds) when Restraint starts
  • Then the task-specific watchdog time once each task starts.
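
To make the flow above concrete, here is a minimal sketch that puts the three fixed timeouts into a table and computes how long a stalled job waits before being aborted. The stage names and helper functions are hypothetical labels for illustration only; they are not Beaker identifiers.

```python
# Fixed watchdog timeouts from the list above, in seconds.
# The dictionary keys are illustrative labels, not names used by Beaker.
WATCHDOG_SECONDS = {
    "provision_start": 30 * 60,     # 30 minutes
    "install_start": 3 * 60 * 60,   # 3 hours (10800 seconds)
    "restraint_start": 40 * 60,     # 40 minutes (2400 seconds)
}


def minutes_until_abort(stage, minutes_since_stage_started):
    """Minutes left before the watchdog fires, assuming nothing extends it again."""
    total_minutes = WATCHDOG_SECONDS[stage] / 60
    return max(total_minutes - minutes_since_stage_started, 0)


if __name__ == "__main__":
    # An install that finishes after 20 minutes and then fails to boot
    # still waits out the rest of the 3-hour window.
    print(minutes_until_abort("install_start", 20))
```

Under these assumptions, a system that hangs right after a 20-minute install sits idle for the remaining 160 minutes, which matches the roughly 3-hour delay described above.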

install_start code:

  • def install_start(self, recipe_id=None):
        """ Called from %pre of the test machine. We call
        the server's install_start()
        """
        _debug_id = "(unspecified recipe)" if recipe_id is None else recipe_id
        logger.debug("install_start for R:%s" % _debug_id)
        return self.hub.recipes.install_start(recipe_id)
  • def install_start(self, recipe_id=None):
        """
        Records the start of a recipe's installation. The watchdog is extended
        by 3 hours to allow the installation to complete.
        """
        try:
            recipe = Recipe.by_id(recipe_id)
        except InvalidRequestError:
            raise BX(_("Invalid Recipe ID %s" % recipe_id))
        if not recipe.installation:
            raise BX(_('Recipe %s not provisioned yet') % recipe_id)
        installation = recipe.installation
        if not installation.install_started:
            installation.install_started = datetime.utcnow()
            # extend watchdog by 3 hours 60 * 60 * 3
            kill_time = 10800
            logger.debug('Extending watchdog for %s', recipe.t_id)
            recipe.extend(kill_time)
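
The effect of the server-side code can be modelled with a small standalone sketch. `WatchdogSketch` is a hypothetical stand-in, not a Beaker class, and it assumes that extending the watchdog sets the kill time to the current time plus the given number of seconds; the quoted code does not show how `recipe.extend()` is implemented.

```python
from datetime import datetime, timedelta


class WatchdogSketch:
    """Hypothetical model of a recipe watchdog; not Beaker's implementation."""

    def __init__(self, now):
        # Assumed starting point: the 30-minute provision watchdog.
        self.kill_time = now + timedelta(minutes=30)
        self.install_started = None

    def install_start(self, now, kill_seconds=10800):
        # Only the first call records the start and extends the watchdog,
        # mirroring the "if not installation.install_started" guard.
        if self.install_started is None:
            self.install_started = now
            # Assumption: extending means "kill at now + kill_seconds".
            self.kill_time = now + timedelta(seconds=kill_seconds)


if __name__ == "__main__":
    t0 = datetime(2020, 8, 10, 12, 0, 0)
    watchdog = WatchdogSketch(t0)
    watchdog.install_start(t0 + timedelta(minutes=5))
    print(watchdog.kill_time)
```

In this model, a machine that fails to boot after the install is only aborted once `kill_time` passes, since nothing else moves the deadline until Restraint starts.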
@jvillal-amp jvillal-amp added the enhancement New feature or request label Aug 10, 2020
@StykMartin
Contributor

Hi @jvillal-amp,

tl;dr: Yes. The values are correct.

Long version: all values were picked deliberately, based on the systems used at Red Hat and on issues with the NFS / HTTP / FTP services serving all the content. It may look like overkill, but we have breached some of these values in the past.

If your system is connected to a console, make sure that the Install detector and the Panic detector are turned on. They will abort the installation if their criteria are met.
