Fix Windows CI #904

Merged
merged 9 commits into main from fix-windows-ci
Aug 24, 2022

MattiSG
Member

MattiSG commented Aug 23, 2022

Since early August, tests have been failing on Windows in CI. This changeset implements the solution recommended by GitHub Actions. However, tests (actions/runner-images#5949 (comment)) show that this solution is flaky, and we might have to re-run tests when they fail. The observed failure rate is about ¼, which is still a significant improvement over the current 100% failure rate. Read below for an in-depth explanation of the root cause.

This changeset also unifies the Linux and Windows steps to use the default, pre-installed MongoDB v5 on the GitHub Actions runners. This means faster CI tests on Linux, decreased resource consumption, and a single MongoDB version: until now, we were specifying v4 for Linux and relying on the pre-installed v5 for Windows. Unfortunately, the tests are not faster on Windows, as starting the MongoDB service is slow (actions/runner-images#5949 (comment)).

Finally, this changeset updates the Windows runners from Windows Server 2019 to Windows Server 2022, removes dependencies on third-party actions, and adds MongoDB logs to Linux runs.

Root cause analysis

After investigation, this seems to be a consequence of actions/runner-images#5949, where GitHub Actions changed the default status of the MongoDB service that is preinstalled on their Windows runners from “started” to “stopped”. We used to install MongoDB through Chocolatey, a package manager for Windows. However, it seems this never did anything beyond wasting resources, since the package was already preinstalled (actions/runner-images#20) and started. This change from GitHub revealed that we never actually started the service ourselves.

Implementation choices

The most elegant way to provide MongoDB would be the services instruction, as illustrated in 50d8a24.

However, Docker is not supported on Windows runners (actions/runner#904). We thus cannot use the services instruction, which otherwise is very useful to declaratively start a service.
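For readers unfamiliar with it, a declarative service definition of the kind discussed above might look like the sketch below. This is not the content of 50d8a24: the job name, the image tag (picked to match the v5 mentioned earlier), the port mapping and the test command are all assumptions made for illustration.

```yaml
# Hypothetical workflow fragment, not taken from this repository.
jobs:
  test:
    # Service containers rely on Docker, hence a Linux runner here.
    runs-on: ubuntu-latest
    services:
      mongodb:
        image: mongo:5        # assumed image tag
        ports:
          - 27017:27017       # MongoDB's default port, mapped to the runner host
    steps:
      - uses: actions/checkout@v3
      - run: npm test         # assumed test command
```

The appeal is that the runner starts and tears down the database itself, with no imperative start step in the job.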

The alternative would be to start mongod in a dedicated step. However, this has to be done in the background; otherwise the step blocks. Unfortunately, the --fork option is not supported on Windows, and using &, while supported in PowerShell, does not seem to start the server on GitHub Actions.
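Starting the preinstalled service, rather than launching mongod by hand, avoids the background-process problem. A minimal sketch of what such per-platform steps could look like follows. The step names and the service names (MongoDB on Windows, mongod on Linux) are assumptions, and this is not necessarily the syntax that was merged.

```yaml
# Hypothetical steps; service names are assumed, not confirmed by this PR.
steps:
  - name: Start MongoDB service (Windows)
    if: runner.os == 'Windows'
    shell: pwsh
    # Start-Service returns once the service is started, so the step does not block.
    run: Start-Service -Name MongoDB

  - name: Start MongoDB service (Linux)
    if: runner.os == 'Linux'
    run: sudo systemctl start mongod
```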

This is all detailed in https://github.com/orgs/community/discussions/30083.

I decided to keep the major approaches I tried in the Git history, so that the alternatives that were explored can be retrieved.

Long-term solution

I recommend discussing the relevance of dropping MongoDB tests on platforms other than Linux. This would speed up the pipeline significantly, and I believe the cost is acceptable: MongoDB is an optimisation for large-scale libraries. Those are most likely to be run on Linux, and considering our available resources, it sounds fair to me that we don't invest in testing this specific setup cross-platform.
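To make the proposal concrete, here is one possible shape for it, as a sketch only. The OS matrix, the test command and the SKIP_MONGODB_TESTS variable are hypothetical. The test suite would need to read that variable itself, and since it arrives as the string 'true' or 'false', it should be compared explicitly rather than checked for truthiness.

```yaml
# Hypothetical workflow fragment illustrating Linux-only MongoDB tests.
jobs:
  test:
    strategy:
      matrix:
        os: [ubuntu-latest, windows-latest]   # assumed matrix
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v3
      - name: Start MongoDB
        if: runner.os == 'Linux'              # MongoDB is only started on Linux
        run: sudo systemctl start mongod      # assumed service name
      - name: Run tests
        run: npm test                         # assumed test command
        env:
          # Hypothetical variable: 'true' on every platform except Linux.
          SKIP_MONGODB_TESTS: ${{ runner.os != 'Linux' }}
```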

Member

martinratinaud left a comment

This looks good to me and seems to have been a very tough investigation.
Thanks for that

.github/workflows/test.yml (review comment, outdated and resolved)
@martinratinaud
Member

Considering your proposals, I suggest we pick the long-term solution you're proposing.
Whenever a reuser wants to use Windows in conjunction with MongoDB, we can always rework this.

@MattiSG
Member Author

MattiSG commented Aug 24, 2022

I just ran a benchmark with three different syntaxes. They all show a ~10% failure rate (details: actions/runner-images#5949 (comment)), so I switched back to a more readable syntax that hopefully addresses your review comment.

As for the long term solution, let's see how this 10% failure rate behaves in “real life” testing. If it is a problem to manually restart builds 10% of the time (or we want to speed up the pipeline by about 1 minute), then let's invest in disabling MongoDB tests per platform indeed.

MattiSG merged commit 881d899 into main Aug 24, 2022
MattiSG deleted the fix-windows-ci branch August 24, 2022 10:45
@MattiSG
Member Author

MattiSG commented Aug 24, 2022

I just saw that our infrastructure setup already treats deploying MongoDB on anything other than Linux as unsupported:

- include_tasks: mongo.yml
  when:
    […]
    - ansible_distribution != 'Debian' or […]

@martinratinaud
Member

I think this was put in place for deploying on Vagrant on M1; I did not know it also took Windows into account.

@Ndpnt
Member

Ndpnt commented Aug 29, 2022

I just saw that our infrastructure setup already treats deploying MongoDB on anything other than Linux as unsupported:

- include_tasks: mongo.yml
  when:
    […]
    - ansible_distribution != 'Debian' or […]

Not really: it deploys MongoDB on anything other than Debian with ARM architecture.
