Renovating CI and usage of HZDR GitLab CI #967
Here is my idea for how we can provide the CI environment through a container:
|
I suggest creating one container for each architecture, e.g. BaseNvidia, BaseX86, BaseOpenPower and BaseArm. |
Do you mean the CI job should look something like this?
$> docker pull x86container
$> docker run x86container
$container> spack load boost@1.67
$container> spack load cuda@10.2
$container> git clone alpaka
$container> cmake && make && ctest |
Yes, cmake should also be loaded by Spack, but in general this should be the easiest way to maintain a huge number of different dependency versions without creating a container for each combination. @ax3l Could you please have a look at our ideas? |
Okay, then we have to check whether a CI can build this container, because it needs a lot of compute power. I think free runners like GitHub Actions and GitLab.com runners will simply time out, and I'm not sure whether our runners can build Docker or Singularity containers because of security restrictions. |
As far as I know our CI can do it; otherwise we can use our dev system in the office to build the container. |
Okay. Last time I had some problems building Singularity containers, but maybe something has changed in the meantime. The dev system should not be the solution, because access to it is restricted and it does not allow flexible development. |
Sounds good! I am not sure if you need Spack yet, since the goal is to keep alpaka very low on dependencies besides vendor libs. For optimal usage of free CI services and distribution by task (e.g. one service for sanitizers, one for Windows, one for the HPC stack, etc.) I recommend checking out ADIOS2's CI integration with various services. It's quite brilliant and does not oversubscribe any single one, so you avoid backlog due to multiple jobs in the build matrix. If you want to run some things within Docker containers, CircleCI is a good candidate that supports this by default and a bit better than Travis. Also use GitHub Actions extensively; it's reliable and has a lot of resources. On top of that, use Azure Pipelines, which also offers a lot of free resources for the three major OSes. If you want to use Spack for CI, be aware that cloud providers constantly change your micro-architecture. To use Spack efficiently, disable micro-arch optimizations and go with generic x86, as shown in openPMD-api. I also run a nightly integration test of openPMD-api via Azure Pipelines to check that all my packages can be installed. Does this help a little? |
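The generic-x86 advice above can be expressed in a Spack configuration. A minimal sketch of a `packages.yaml` that pins a generic micro-architecture (the exact openPMD-api setup may differ):

```yaml
# ~/.spack/packages.yaml (sketch): disable micro-arch optimizations
# so installed binaries stay portable across changing cloud CPUs.
packages:
  all:
    target: [x86_64]
```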
Thanks for the hints. At the moment I am working on a prototype for the mega Spack container (I have a strange problem with alpaka and I'm not sure if my environment is corrupted, so I need the container for verification). I have also recognized the target problem; at the moment I use a generic target. For the container I implemented an incremental container build system, because otherwise using Spack is not possible; the install routine is not stable enough. A nice side effect is that we can distribute the build into small jobs, like in the CI. |
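The incremental build idea can be sketched as a Dockerfile where each Spack package gets its own layer, so a failed install only invalidates that layer and the build can resume from the last successful package (a sketch with a hypothetical base image and package versions, not the actual container recipe):

```dockerfile
# Sketch: incremental Spack container build.
FROM ubuntu:20.04

# Base tools needed by Spack itself.
RUN apt-get update && apt-get install -y git build-essential python3 curl

# Install Spack.
RUN git clone https://github.com/spack/spack.git /opt/spack
ENV PATH=/opt/spack/bin:$PATH

# One package per RUN: each creates its own cached layer, so an
# unstable `spack install` only throws away its own step on failure.
RUN spack install cmake
RUN spack install boost@1.67.0
RUN spack install cuda@10.2
```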
Spack will only be used to maintain the boost, compiler and maybe CMake versions. As @SimeonEhrig said, the architecture of the packages can be something generic.
We will stay with a few tests on free instances, e.g. for Windows and OSX, but move the complex matrix to our in-house CI to reduce the load on the free instances. |
I implemented a first version of the alpaka Spack CI container and gathered some information: @psychocoderHPC In general, I think we need to split the container. It is possible that the container is too big for a registry and CI runner. Maybe we can split it by compiler, e.g. a gcc container, a clang container, a CUDA container ... |
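Such a split could look roughly like this in a GitLab CI configuration; the registry paths and image names here are hypothetical placeholders:

```yaml
# .gitlab-ci.yml (sketch): one container per compiler family.
.build_and_test: &build_and_test
  script:
    - cmake -S . -B build
    - cmake --build build
    - ctest --test-dir build

test_gcc:
  image: registry.example.com/alpaka-ci/gcc:latest
  <<: *build_and_test

test_clang:
  image: registry.example.com/alpaka-ci/clang:latest
  <<: *build_and_test

test_cuda:
  image: registry.example.com/alpaka-ci/cuda:latest
  <<: *build_and_test
```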
Splitting by compiler IMO makes sense. I can test the container on our dev system. That's the only system with Docker where I have access.
On 31 March 2020 at 17:25:16 CEST, Simeon Ehrig <notifications@github.com> wrote:
…I implemented a first version of the alpaka Spack CI container and gathered some information:
ComputationalRadiationPhysics/crp-container#1
@psychocoderHPC
Can you please verify whether the container works? I get an error with CUDA, but I had the same error on another system, so I'm not sure whether something is wrong with my environment or it is an error in alpaka. I sent you the test instructions via Mattermost.
In general, I think we need to split the container. It is possible that the container is too big for a registry and CI runner. Maybe we can split it by compiler, e.g. a gcc container, a clang container, a CUDA container ...
|
All the build jobs have now been moved to GitHub Actions. |
@ax3l @BenjaminW3
|
I would really like to use the pre-installed docker images in the github actions. This would make the CI much faster. |
@SimeonEhrig already prepared Docker containers for alpaka. We currently use those in cupla. |
alpaka has enough files to compile. We should not group multiple tests together: one job per configuration. |
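A one-job-per-configuration setup in GitHub Actions, optionally running inside a prebuilt image, could be sketched like this (image name and matrix values are hypothetical):

```yaml
# .github/workflows/ci.yml (sketch)
jobs:
  test:
    strategy:
      fail-fast: false        # one failing configuration does not cancel the rest
      matrix:
        compiler: [g++-9, clang++-10]
        build_type: [Debug, Release]
    runs-on: ubuntu-latest
    container:
      image: ghcr.io/example/alpaka-ci:latest   # hypothetical prebuilt image
    steps:
      - uses: actions/checkout@v2
      - run: cmake -S . -B build -DCMAKE_CXX_COMPILER=${{ matrix.compiler }} -DCMAKE_BUILD_TYPE=${{ matrix.build_type }}
      - run: cmake --build build
      - run: ctest --test-dir build
```

With `fail-fast: false`, each matrix entry is an independent job, so a broken clang Debug build does not hide results from the other configurations.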
When I set up the GitLab CI, it should not be much extra work to use the images for GitHub Actions as well.
In the case of alpaka, it could make sense. I think we need to experiment a little bit. |
Especially since we want to run the CUDA and HIP tests on the internal GitLab CI, because we cannot run those tests on free hardware, and those builds have long compile times as well as test run-times. |
I know we are going for Docker containers; I just want to mention that Spack has a new feature for GitLab called |
The timeout should not be the problem; we can set it to up to 3 h. The problem is the number of runners. At the moment we have just one AMD and one Nvidia x86 runner, but if we generate enough utilization, it should not be a problem to buy new hardware. |
There is some new pressure to implement CUDA and HIP runtime tests: #1340 @BenjaminW3 For the runtime CI, I want to use the prebuilt Docker containers which I already create here. We already use them for cupla and PIConGPU. But I constantly have problems with server timeouts. Therefore, I want to ask whether I can take over the travis_retry.sh script for the container project? |
travis_retry originates from the open-source Travis repository. The licenses looked compatible to me back when I added it. I do not know where the license header with my name came from; we once added license headers to all files automatically, so maybe it slipped through. |
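A retry helper in the spirit of travis_retry can be sketched in a few lines of portable shell (this is a hypothetical reimplementation for illustration, not the licensed travis_retry.sh itself):

```shell
#!/bin/sh
# Sketch of a retry wrapper: re-run a command up to 3 times
# before giving up, to ride out transient server timeouts.
# Usage: retry <command> [args...]
retry() {
  max_attempts=3
  attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "retry: '$*' failed after $max_attempts attempts" >&2
      return 1
    fi
    echo "retry: attempt $attempt of '$*' failed, retrying..." >&2
    attempt=$((attempt + 1))
    sleep 1
  done
}

# Example: wrap a flaky network step such as a registry pull.
retry echo "pulled image"   # prints "pulled image"
```

Wrapping flaky steps such as `retry docker pull myimage` keeps a single transient registry timeout from failing the whole CI job.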
GitLab CI introduces macOS runners: https://about.gitlab.com/blog/2021/08/23/build-cloud-for-macos-beta/ |
Discussion points