GDAS diag jobs failing due to insufficient wallclock #1215

Closed
WalterKolczynski-NOAA opened this issue Jan 6, 2023 · 1 comment
Labels
bug Something isn't working


@WalterKolczynski-NOAA
Contributor

Expected behavior
GDAS diag jobs (analdiag and ediag) should complete before hitting the wallclock limit.

Current behavior
GDAS diag jobs (analdiag and ediag) are killed after hitting the wallclock limit, even at low resolution.

Machines affected
Orion, possibly others

To Reproduce

  1. Set up a cycled experiment
  2. Wait for diag jobs to fail

Context
Uncovered during refactoring of j-jobs.

Detailed Description
It is unclear why this has become a problem now, even at low resolution. It may occur only on Orion or only in certain situations.

Additional Information

Possible Implementation
Wallclock will be increased in a package of PRs for the j-job refactoring to eliminate the immediate issue. Longer-term, a more thorough evaluation of resources is needed.

@WalterKolczynski-NOAA WalterKolczynski-NOAA added the bug Something isn't working label Jan 6, 2023
@WalterKolczynski-NOAA WalterKolczynski-NOAA self-assigned this Jan 6, 2023
WalterKolczynski-NOAA added a commit to WalterKolczynski-NOAA/global-workflow that referenced this issue Jan 6, 2023
Diag jobs were failing due to insufficient wall clock, so the
wall clock is increased until a more complete review of the
resources can be completed.

Refs NOAA-EMC#1215
@AndrewEichmann-NOAA
Contributor

@WalterKolczynski-NOAA If this is exclusive to Orion, there have been intermittent system problems there recently, apparently with the filesystem, that slow everything down (builds included).

WalterKolczynski-NOAA added a commit that referenced this issue Jan 15, 2023
Diag jobs were failing due to insufficient wall clock, so the
wall clock is increased until a more complete review of the
resources can be completed.

Refs #1215