Expected behavior
GDAS diag jobs (analdiag and ediag) should complete before hitting the wallclock limit.
Current behavior
GDAS diag jobs (analdiag and ediag) are killed after hitting the wallclock limit, even at low resolution.
Machines affected
Orion, possibly others
To Reproduce
Set up a cycled experiment
Wait for diag jobs to fail
Context
Uncovered during refactoring of j-jobs.
Detailed Description
It is unclear why this has become a problem now, even at low resolutions. It may be limited to Orion or to certain situations.
Additional Information
Possible Implementation
Wallclock will be increased in a package of PRs for the j-job refactoring to eliminate the immediate issue. Longer-term, a more thorough evaluation of resources is needed.
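As a starting point for the longer-term resource evaluation, here is a minimal sketch of how a wallclock limit could be derived from observed job run times. It is not part of the workflow: the function names, the 1.5x safety margin, the 5-minute rounding, and the sample timings are all illustrative assumptions.

```python
import math


def to_seconds(hms: str) -> int:
    """Convert an 'HH:MM:SS' string to a number of seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def to_hms(seconds: int) -> str:
    """Convert a number of seconds to an 'HH:MM:SS' string."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


def suggest_wallclock(elapsed, margin=1.5, round_to=300):
    """Suggest a wallclock limit from observed elapsed times.

    Takes the longest observed run, pads it by `margin` (hypothetical
    default of 1.5), and rounds up to a multiple of `round_to` seconds.
    """
    longest = max(to_seconds(t) for t in elapsed)
    padded = longest * margin
    return to_hms(math.ceil(padded / round_to) * round_to)


# Made-up elapsed times for a diag job, for illustration only.
sample_runs = ["00:12:40", "00:14:05", "00:09:30"]
print(suggest_wallclock(sample_runs))  # prints 00:25:00
```

Padding the longest observed run, rather than the mean, leaves some room for slow-filesystem days, but real timings from several cycles and machines would be needed before picking values.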
Diag jobs were failing due to insufficient wall clock, so the
wall clock is increased until a more complete review of the
resources can be completed.
Refs NOAA-EMC#1215
@WalterKolczynski-NOAA If this is exclusive to Orion, note that there have been intermittent system problems there recently, apparently with the filesystem, that slow everything down (builds, for example).