ci: extend experiment timeout for slurm test #9601

Merged Jul 3, 2024 (10 commits)

Conversation

Contributor

@MikhailKardash MikhailKardash commented Jul 2, 2024

Ticket

None

Description

The Slurm restart test fails on main because the underlying trials time out while waiting on the image pull. This PR does two things:

  1. Bypasses the top-level trial timeout config in the affected test so that it waits for image pulls.
  2. Makes the affected test suite runnable on feature branches.
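
The first item can be pictured with a minimal sketch: a polling helper that uses a suite-wide default timeout unless the calling test passes its own. This is not the code from this PR. `DEFAULT_TRIAL_TIMEOUT`, `wait_for_state`, the state names, and the numbers are all hypothetical, chosen only to illustrate a per-test override.

```python
import time
from typing import Callable, Optional

# Hypothetical suite-wide default in seconds; stands in for the top-level config value.
DEFAULT_TRIAL_TIMEOUT = 600.0


def wait_for_state(
    get_state: Callable[[], str],
    target: str,
    timeout: Optional[float] = None,
    poll_interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll get_state() until it returns target, or raise TimeoutError.

    A per-call timeout, when given, takes precedence over the suite default.
    """
    limit = DEFAULT_TRIAL_TIMEOUT if timeout is None else timeout
    deadline = clock() + limit
    while True:
        state = get_state()
        if state == target:
            return
        if clock() >= deadline:
            raise TimeoutError(f"state {state!r} after {limit}s, wanted {target!r}")
        sleep(poll_interval)
```

Under these assumptions, a test that is known to sit behind a slow image pull would call `wait_for_state(..., timeout=1800)` while every other test keeps the default. The `clock` and `sleep` parameters exist only so the helper can be exercised without real waiting.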

Test Plan

CI passes, specifically: test-e2e-slurm-restart

Checklist

  • Changes have been manually QA'd
  • New features have been approved by the corresponding PM
  • User-facing API changes have the "User-facing API Change" label
  • Release notes have been added as a separate file under docs/release-notes/
    See Release Note for details.
  • Licenses have been included for new code which was copied and/or modified from any external code

@cla-bot cla-bot bot added the cla-signed label Jul 2, 2024

netlify bot commented Jul 2, 2024

Deploy Preview for determined-ui canceled.

Name Link
🔨 Latest commit cb3eda3
🔍 Latest deploy log https://app.netlify.com/sites/determined-ui/deploys/66857b801d8dac00086f36a4

@MikhailKardash MikhailKardash changed the title from "Slurm test" to "ci: extend experiment timeout for slurm test" Jul 2, 2024

codecov bot commented Jul 2, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 51.64%. Comparing base (2dc59ca) to head (cb3eda3).
Report is 4 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #9601   +/-   ##
=======================================
  Coverage   51.63%   51.64%           
=======================================
  Files        1255     1255           
  Lines      152631   152631           
  Branches     3092     3091    -1     
=======================================
+ Hits        78815    78820    +5     
+ Misses      73659    73654    -5     
  Partials      157      157           
Flag      Coverage Δ
backend   43.96% <ø> (+<0.01%) ⬆️
harness   72.76% <ø> (ø)
web       48.63% <ø> (ø)

Flags with carried forward coverage won't be shown.

see 2 files with indirect coverage changes

Contributor

@jgongd jgongd left a comment

LGTM!

Contributor

@carolinaecalderon carolinaecalderon left a comment

LGTM, but let @NicholasBlaskey know what you've decided about "skipping the collect logs step as we do for test-e2e-managed-devcluster". If you include that in this PR, it could fix the failing CI.

@MikhailKardash MikhailKardash merged commit 000c679 into main Jul 3, 2024
92 of 104 checks passed
@MikhailKardash MikhailKardash deleted the slurm_test branch July 3, 2024 19:24
5 participants