Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Add nightly test stability workflow (aka The Gauntlet) #343

Merged
merged 20 commits into from
Mar 19, 2022

Conversation

bryphe-coder
Copy link
Contributor

@kylecarbs has done some awesome work to stamp out and fix intermittent failures to keep our CI stable, fast, and green.

This adds a nightly stability run - where we'll kick off a high-iteration run each night and report to datadog (with a stability- prefix) to keep an eye on any intermittencies that crop up.

@bryphe-coder bryphe-coder self-assigned this Feb 21, 2022
- name: Test with Mock Database
shell: bash
env:
GOCOUNT: 25
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you may want to make this a workflow dispatch variable with a default, so when you run things you can tune the count you want: https://github.blog/changelog/2020-07-06-github-actions-manual-triggers-with-workflow_dispatch/

more docs for how to define inputs here: https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#providing-inputs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neato! I didn't know about this - thanks @jawnsy

Giving it a test run here: 5b867a1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found out apparently those inputs don't get populated for pull requests (thread here: https://github.community/t/how-to-set-a-default-value-for-a-workflow-dispatch-variable/156961) - just for workflows, so added a default here too: a0c1800

@codecov
Copy link

codecov bot commented Feb 21, 2022

Codecov Report

Merging #343 (3dd10a4) into main (22f820c) will increase coverage by 0.09%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##             main     #343      +/-   ##
==========================================
+ Coverage   67.98%   68.08%   +0.09%     
==========================================
  Files         164      164              
  Lines        9309     9309              
  Branches       85       85              
==========================================
+ Hits         6329     6338       +9     
+ Misses       2353     2344       -9     
  Partials      627      627              
Flag Coverage Δ
unittest-go-macos-latest 62.68% <ø> (+0.04%) ⬆️
unittest-go-ubuntu-latest 67.36% <ø> (-0.04%) ⬇️
unittest-go-windows-2022 62.24% <ø> (+0.08%) ⬆️
unittest-js 63.63% <ø> (ø)
Impacted Files Coverage Δ
peerbroker/dial.go 75.43% <0.00%> (-7.02%) ⬇️
peerbroker/listen.go 81.35% <0.00%> (-2.55%) ⬇️
peer/conn.go 80.05% <0.00%> (-0.26%) ⬇️
provisionerd/provisionerd.go 79.77% <0.00%> (+2.20%) ⬆️
provisioner/echo/serve.go 57.50% <0.00%> (+2.50%) ⬆️
codersdk/provisionerdaemons.go 61.53% <0.00%> (+3.07%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 22f820c...3dd10a4. Read the comment docs.

.github/workflows/coder-test-stability.yaml Outdated Show resolved Hide resolved
runs-on: ${{ matrix.os }}
strategy:
matrix:
os:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you might want to consider adding another variable for a run number, so you can run multiple in parallel (e.g. instance: [1, 2] would let you run 2x each base OS, for a total of 6 runs happening in parallel). since this is happening in the background, the main cost is monetary, and running more instances in parallel will help surface flakes sooner

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, we can try it out. Since this is a nightly job, the performance isn't as important as for a PR gate - so it might be actually be OK to run them serially. Trying it out in: e5c4e18

Copy link
Member

@kylecarbs kylecarbs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Big fan! Gotta catch em' all

@bryphe-coder bryphe-coder marked this pull request as ready for review February 23, 2022 05:16
env:
DATADOG_API_KEY: ${{ secrets.DATADOG_API_KEY }}
DD_DATABASE: fake
GIT_COMMIT_MESSAGE: stability-${{ github.event.head_commit.message }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use a custom tag instead, that might be better!

It'll require minor edits to the DataDog script I wrote, but that should be simple enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that sounds good - like maybe an additional trait we could send up?

"test.traits": fmt.Sprintf(`{"database":["%s"]}`, os.Getenv("DD_DATABASE")),

I also noticed we're sending up the workflow:

"ci.pipeline.name":   os.Getenv("GITHUB_WORKFLOW"),

That might be sufficient by itself to differentiate the stability run

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh ya, it looks like that is available to differentiate and filter these runs:
image

So I'll remove this superfluous stability- prefix then

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed in 14345df

@stale
Copy link

stale bot commented Mar 9, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no activity occurs in the next 5 days.

@stale stale bot added the stale This issue is like stale bread. label Mar 9, 2022
@stale stale bot closed this Mar 14, 2022
@bryphe-coder bryphe-coder reopened this Mar 14, 2022
@stale stale bot removed the stale This issue is like stale bread. label Mar 14, 2022

pull_request:
branches:
# TODO: Remove before merge
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this

.github/workflows/coder-test-stability.yaml Outdated Show resolved Hide resolved
@greyscaled greyscaled enabled auto-merge (squash) March 19, 2022 23:05
@greyscaled greyscaled merged commit 64d9d00 into main Mar 19, 2022
@greyscaled greyscaled deleted the bryphe/refactor/add-stability-workflow branch March 19, 2022 23:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants