[question] Concurrent uploads fail in CI/CD - Good practices in CI/CD #14175
Thanks for your question. If you are doing parallel uploads, I'd recommend the following to remove the potential race conditions completely: first upload the recipe (which contains the shared `conan_sources.tgz`) only once, from a single agent, and only then let the parallel agents upload their binary packages.

It is also planned to implement the upload of multiple binaries in parallel internally (like the `--parallel` argument in Conan 1.X). Please let us know if the above 2 steps help. Also, if you manage to get the server version, that is also relevant.
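A minimal sketch of that two-step flow, assuming a Conan 2 client, a remote named `my-remote`, and a placeholder reference `mypkg/1.0` (none of these names are from the thread):

```bash
#!/usr/bin/env bash
set -euo pipefail

# Step 1 (run once, by a single "main" agent, before the parallel jobs):
# upload only the recipe, including conan_sources.tgz, so the parallel
# jobs never race on it.
conan upload "mypkg/1.0" -r my-remote --only-recipe --confirm

# Step 2 (run by each parallel agent after its build):
# the recipe already exists on the server, so the pre-upload existence
# check skips it and only the new binaries are transferred.
conan upload "mypkg/1.0" -r my-remote --confirm
```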
Hello @memsharded, thanks for the very quick answer! I will search for a solution to guarantee the sources are uploaded only once. It is not straightforward though, because it is difficult to separate the build and the upload parts (a job cannot be locked between two steps, and two jobs do not share the conan cache). I see two solutions: retry the upload when it fails, or share a single conan cache and upload everything from one final job.
I have a follow-up question on sharing the conan cache: from the conan point of view, would concurrent operations on a shared cache be well defined? From what I saw in the conan 1.0 docs, as long as there is no delete operation it should be fine. Is this still true in conan 2.0? If so, all jobs could share the cache, and a single subsequent job could upload all the binaries at once.

On the server side: I do not have delete permissions, and I checked the artifactory server version, which returned:

```json
{
  "version": "7.59.9",
  "revision": "75909900",
  "entitlements": {
    "EVENT_BASED_PULL_REPLICATION": true,
    "SMART_REMOTE_TARGET_FOR_EDGE": false,
    "REPO_REPLICATION": true,
    "MULTIPUSH_REPLICATION": true
  }
}
```

NB: The issue does not seem to arise in a larger project (with jobs lasting a few minutes or more): I guess there is enough variance in the finish time of each job that the first upload arrives well before the others.
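For reference, a hedged way to query that version info, assuming the standard Artifactory system-version REST endpoint with placeholder host and credentials (the exact call used above is not shown in the thread):

```bash
# Query the Artifactory server version (host and auth are illustrative;
# adjust them to your instance).
curl -sS -u "$ARTIFACTORY_USER:$ARTIFACTORY_TOKEN" \
  "https://artifactory.example.com/artifactory/api/system/version"
```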
I am afraid this is not possible. The Conan cache is not concurrency-safe. Conan 1.X had some locking that could alleviate some cases (but not all), but even then it wasn't guaranteed to be good for concurrent usage. Conan 2.0 initially removed all synchronization mechanisms in the cache, because it had to be redesigned from the ground up to be multi-revision ready, and it is guaranteed NOT to be safe for concurrent usage. It is on the roadmap to try to make the 2.0 cache concurrent, and hopefully the new cache design will make it safer for concurrent usage. But this effort hasn't started yet, and it might take a while; there are still other priorities.
Yes, the larger the builds, the more unlikely it is to happen. Note the race condition happens between the check for the existence of the revision on the server and the upload of the files. In most cases this window should be quite short, a couple of seconds at most. It might be a bit larger if the files to upload are big.
I don't see how separating the recipe and binary uploads into two commands would fix the race by itself. I would think using something like X-Checksum-Deploy was the only real fix here (letting artifactory accept the upload from whoever got there first, and return HTTP 201 to any subsequent or concurrent attempts as long as the checksums match). Or I suppose conan could emulate the same: retry the pre-upload check for an existing file with the expected checksum after a failure, and quietly swallow the error and continue if it would now decide to skip the upload that failed.
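For illustration, this is roughly what an Artifactory checksum deploy looks like at the HTTP level, a sketch with placeholder host, repo, and path (the headers are the documented Artifactory ones):

```bash
# Compute the checksum of the file to deploy.
SHA1=$(sha1sum conan_sources.tgz | cut -d' ' -f1)

# With X-Checksum-Deploy, Artifactory links an already-known binary by
# checksum instead of accepting the bytes again; if the checksum matches
# an existing artifact, the PUT succeeds without a re-upload.
curl -sS -u "$ARTIFACTORY_USER:$ARTIFACTORY_TOKEN" \
  -X PUT \
  -H "X-Checksum-Deploy: true" \
  -H "X-Checksum-Sha1: $SHA1" \
  "https://artifactory.example.com/artifactory/my-repo/path/to/conan_sources.tgz"
```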
The idea is that this is done only once, by one agent. It can be done even before the agents start to build, by the main agent, with a recipe-only upload.
The X-Checksum-Deploy is in place and working, and it will avoid most of the problems, as long as the overwrite/delete permissions allow it. But the DELETE permission (needed to overwrite) seems to be what is rejecting these concurrent uploads.
Ok, then I will go for the retry solution, which at least works fine. I do not see another simple solution that does not imply copying a lot of stuff between independent CI/CD jobs. I hope it will be possible to make the conan cache concurrency-safe at some point, as it could drastically improve CI/CD performance for projects with a lot of dependencies by sharing a single conan cache across job instances.
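A minimal sketch of such a retry, assuming a shell-based CI step and the same placeholder `mypkg/1.0` reference (the exact retry the reporter implemented is not shown in the thread):

```bash
# Retry the upload a few times: if another job won the race and the
# revision is now on the server, a later attempt skips the conflicting
# file and succeeds.
ok=0
for attempt in 1 2 3; do
  if conan upload "mypkg/1.0" -r my-remote --confirm; then
    ok=1
    break
  fi
  echo "upload attempt ${attempt} failed, retrying..." >&2
  sleep $((attempt * 5))  # simple linear backoff
done
[ "$ok" -eq 1 ] || exit 1
```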
The pathological case I discovered is on a small package with a very quick CI (about 30s in total). On a large package there is no problem; it would be interesting to see whether, on the same build, adding a large "data.txt" is enough to make the concurrency problem appear again. But from the discussions above I guess it would.
Thanks for the tip, I will keep this in mind as well :)
Regarding this, now in the latest Conan 2.0.17, there is a conf to upload multiple binaries in parallel internally.
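A sketch of enabling that, assuming the conf in question is `core.upload:parallel` (verify the exact name on your version with `conan config list`):

```bash
# Enable parallel uploads in the Conan 2 client (conf name assumed;
# check `conan config list | grep upload` to confirm on your version).
echo "core.upload:parallel=8" >> ~/.conan2/global.conf
conan upload "mypkg/1.0" -r my-remote --confirm
```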
Hi,
I have an issue on a Gitlab CI/CD pipeline building and uploading a simple C++ package. I have 4 parallel jobs (they do not share any data, in particular they have independent conan caches) that all build & upload the same package with different configurations (typically debug/release + static/shared library).
By doing so, I randomly get failing jobs returning a "needs DELETE permission" error at the `conan upload` step. Sometimes all jobs pass, sometimes one or two fail with the same error; when re-run independently, these failed jobs are successful. To be more specific: the error arises during the upload of the sources (`conan_sources.tgz`), which are identical between all four jobs (only the build config is different). Example of the error I get when doing `conan upload`:

[screenshot: "needs DELETE permission" error during `conan upload`]

It seems to be a concurrency issue on the (artifactory) server side, and I naturally wonder if my use of conan is well defined or not. Having completely independent jobs firing uploads of the same recipe revision at the same time might not be a good idea, although it is bound to happen in general. Is there a good practice I am missing here to automate the upload of several conan packages? Otherwise I guess I have to dig into the artifactory server configuration?
OS: Linux
Docker image: conanio/gcc10
Conan version: 2.0.7
Minimal example to reproduce the bug: I used the template project given by `conan new cmake_lib -d name=test_parallel_ci -d version=0.1` and a minimal gitlab CI with 4 independent parallel jobs building debug/release, static/shared versions of the library.

NB: I have no particular knowledge of the artifactory server I am using.
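For concreteness, a sketch of what each of the four parallel jobs runs, assuming a Conan 2 client and a remote named `my-remote` (the actual .gitlab-ci.yml is not shown in the thread; `BUILD_TYPE` and `SHARED` stand in for the job matrix):

```bash
# Each job gets a different (BUILD_TYPE, SHARED) pair:
# (Debug|Release) x (True|False).
conan new cmake_lib -d name=test_parallel_ci -d version=0.1
conan create . -s build_type="${BUILD_TYPE}" -o "*:shared=${SHARED}"
conan upload "test_parallel_ci/0.1" -r my-remote --confirm
```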