Renovating CI and usage of HZDR GitLab CI #967

Open
SimeonEhrig opened this issue Mar 27, 2020 · 24 comments

SimeonEhrig commented Mar 27, 2020

Discussion points

  • how to distribute the build matrix to GitHub Actions and HZDR CI
  • what can be shared between the CIs, e.g. docker container
  • ...
@SimeonEhrig

That's my idea of how we can provide the environment for the CI through a container:

  • describe the containers with hpccm -> it is a recipe generator that allows describing the recipes in Python and generating Docker and Singularity recipes from them (see the recipe sketch below)
  • build the container with the environment once and not for every test run
    • this means the container contains the right compiler (version), Boost version, CUDA and so on
    • the built container will be pushed to a container registry
    • the CI simply pulls the container, clones alpaka, and builds and runs the tests
  • it can be done by a separate CI pipeline or in an extra repo
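
A minimal sketch of what such an hpccm recipe could look like (the base image, compiler and dependency versions below are placeholders, not a decided configuration):

```python
# hypothetical hpccm recipe sketch (e.g. recipe.py); hpccm injects the building
# blocks into the recipe's scope, so no imports are needed here. Generate with:
#   hpccm --recipe recipe.py --format docker      > Dockerfile
#   hpccm --recipe recipe.py --format singularity > Singularity.def
Stage0 += baseimage(image='nvidia/cuda:10.2-devel-ubuntu18.04')  # CUDA base image
Stage0 += gnu(version='8')                                       # GCC toolchain
Stage0 += cmake(eula=True, version='3.16.5')                     # CMake
Stage0 += boost(version='1.67.0')                                # Boost
```

The same recipe file yields both a Dockerfile and a Singularity definition, which is the main motivation for describing the environment with hpccm instead of maintaining two recipe formats by hand.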

@psychocoderHPC

I suggest creating one container for each architecture, e.g. BaseNvidia, BaseX86, BaseOpenPower and BaseArm.
All containers should have the dependencies (e.g. Boost, CMake) compiled in all needed versions (best with Spack).
I am not sure if we should provide a separate container for each compiler. Maybe it is easier for us to provide the base container with multiple compilers, compiled with Spack.

@SimeonEhrig

Do you mean the CI job should look something like this:

$> docker pull x86container
$> docker run x86container
$container> spack load boost@1.67
$container> spack load cuda@10.2
$container> git clone alpaka
$container> cmake && make && ctest

@psychocoderHPC

Do you mean the CI job should look something like this:

$> docker pull x86container
$> docker run x86container
$container> spack load boost@1.67
$container> spack load cuda@10.2
$container> git clone alpaka
$container> cmake && make && ctest

Yes, CMake should also be loaded via Spack, but in general this should be the easiest way to maintain a large number of different dependency versions without creating a container for each combination.
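
To illustrate, installing several dependency versions side by side in one base container could look roughly like this (the package and compiler versions are only examples, not the planned matrix):

```bash
# inside the base container image build: install all needed versions via Spack
spack install gcc@9.2.0
spack compiler find "$(spack location -i gcc@9.2.0)"   # register the freshly built compiler
spack install cmake@3.16.5
spack install boost@1.67.0 %gcc@9.2.0
spack install boost@1.70.0 %gcc@9.2.0

# a CI job later just selects the combination it needs
spack load cmake@3.16.5
spack load boost@1.67.0
```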

@ax3l Could you please have a look at our ideas?

@SimeonEhrig

Okay, then we have to check whether a CI can build this container, because it needs a lot of compute power. I think free runners like GitHub Actions and GitLab.com runners will simply run into a timeout, and I'm not sure whether our runners can build Docker or Singularity containers because of security restrictions.

@psychocoderHPC

As far as I know, our CI can do it, or we use our dev system in the office to build the container.

@SimeonEhrig

Okay. Last time I had some problems building Singularity containers, but maybe something has changed in the meantime.

The dev system should not be the solution, because it has restricted access and does not allow flexible development.

ax3l commented Mar 31, 2020

@ax3l Could you please have a look at our ideas?

Sounds good!

I am not sure if you already need Spack, since the goal is to keep alpaka very low on dependencies besides vendor libs.

For optimal usage of free CI services and distribution by task (e.g. one service for sanitizers, one for Windows, one for the HPC stack, etc.) I recommend checking out ADIOS2's CI integration into various services. It's quite brilliant and does not oversubscribe a single one, so you avoid a backlog due to multiple jobs in the build matrix. If you want to run some things within Docker containers, CircleCI is a good candidate that supports this by default and a bit better than Travis. Also use GitHub Actions extensively; it's reliable and has a lot of resources. On top of that, use Azure Pipelines, which also offers a lot of free resources for the three major OSes.

If you want to use Spack for CI, be aware that cloud providers constantly change your micro-architecture. In order to use Spack efficiently, disable micro-arch optimizations and go with generic x86, as shown in openPMD-api. I also run a nightly integration test of openPMD-api via Azure Pipelines to check whether all my packages can be installed.
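
For reference, pinning a generic target in Spack so the binaries do not depend on the runner's current micro-architecture can be done directly on the spec (the package and version here are just examples):

```bash
# build for a generic x86-64 target instead of e.g. skylake/cascadelake
spack install boost@1.67.0 target=x86_64
```

The same preference can also be set once for all packages via the target entry in Spack's packages.yaml configuration.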

Does this help a little?

@SimeonEhrig

Thanks for the hints. At the moment I am working on a prototype for the mega Spack container (I have a strange problem with alpaka and I'm not sure whether my environment is corrupted, so I need the container for verification). I had also noticed the target problem. At the moment I use sandybridge as the target, but your solution is better.

For the container I implemented an incremental container build system; otherwise using Spack is not possible, because the install routine is not stable enough. A nice side effect is that we can split the build into small jobs, like in the CI.
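
A rough sketch of the incremental idea (registry, tags and the split between stages are placeholders): each stage image builds on the previous one and installs only a few packages, so a failing Spack install only restarts the last stage instead of the whole container build.

```bash
# stage 1: base image with Spack itself
docker build -t registry.example.com/alpaka-ci:stage1 -f Dockerfile.stage1 .
docker push registry.example.com/alpaka-ci:stage1

# stage 2: FROM ...:stage1, adds the compilers
docker build -t registry.example.com/alpaka-ci:stage2 -f Dockerfile.stage2 .
docker push registry.example.com/alpaka-ci:stage2

# stage 3: FROM ...:stage2, adds the Boost versions
docker build -t registry.example.com/alpaka-ci:stage3 -f Dockerfile.stage3 .
docker push registry.example.com/alpaka-ci:stage3
```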

@psychocoderHPC

I am not sure if you already need Spack, since the goal is to keep alpaka very low on dependencies besides vendor libs.

Spack will only be used to maintain the X different Boost, compiler, and maybe CMake versions. As @SimeonEhrig said, the target architecture of the packages can be something generic.

For optimal usage of free CI services

We will keep a few tests on free instances, e.g. for Windows and OSX, but move the complex matrix to our in-house CI to reduce the load on the free instances.

@SimeonEhrig

I implemented a first version of the alpaka Spack CI container and gathered some information:
ComputationalRadiationPhysics/crp-container#1

@psychocoderHPC
Can you please verify whether the container works? I get an error with CUDA, but I had the same error on another system, so I'm not sure whether something is wrong with my environments or whether it is an error in alpaka. I sent you the test instructions via Mattermost.

In general, I think we need to split the container. It is possible that the container is too big for a registry and the CI runners. Maybe we can split it by compiler, e.g. a GCC container, a Clang container, a CUDA container, ...

psychocoderHPC commented Mar 31, 2020 via email

@BenjaminW3

All the build jobs have now been moved to GitHub Actions.
This should make the builds faster, and we get rid of the second platform.
I will try to clean up the Linux builds a bit more so that they get faster.

@BenjaminW3 changed the title from "Renovating and moving CI to GitHub Actions and HZDR GitLab CI" to "Renovating CI and usage of HZDR GitLab CI" on Jun 7, 2020
@SimeonEhrig

@ax3l @BenjaminW3
Current status of the GitLab CI:

  • The concept differs slightly from the current concept of the GitHub Actions CI. The container already contains the necessary dependencies like CMake, Boost, GCC, and so on. The test itself simply downloads the image, clones the project repo into it, and builds and runs the tests (a minimal job sketch follows after this list). The concept is still in the testing phase, but I hope that we will get some advantages: reduced test run times, no need for caching mechanisms, and the possibility to easily set up the CI environment on a local system via docker pull to perform quick tests and develop applications.
  • The containers are shared by alpaka, cupla and PIConGPU (https://gitlab.com/hzdr/crp/alpaka-group-container): cupla already uses the infrastructure, @psychocoderHPC is currently integrating it into PIConGPU, and alpaka is in the planning stage.
  • In order to be able to install software into the container, as is possible with the current approach, @psychocoderHPC suggests that we extend the bash scripts so that they check whether a software package is already available and, if not, install it.
    • At the moment we only have specialized runners (1x x86 + AMD GPU, 1x x86 + Nvidia GPU, 1x Power8 + 2x Nvidia GPUs, 1x ARM Cavium ThunderX), so we are lacking a bit in x86 CPUs, but with enough utilization it should be possible to buy some x86 nodes for the CI.
  • The distribution of jobs also differs from the free runners. We have a few strong runners, so it makes more sense to run many configurations in a single job to save startup and container initialization time.
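
A minimal sketch of what such a GitLab CI job could look like (image name, runner tag and CMake options are placeholders, not the actual configuration):

```yaml
# hypothetical .gitlab-ci.yml job: the image already ships compiler, CMake and Boost
test-gcc:
  image: registry.gitlab.com/hzdr/crp/alpaka-group-container/alpaka-ci-ubuntu18.04:latest
  tags: [x86]
  script:
    - git clone https://github.com/alpaka-group/alpaka.git
    - mkdir alpaka/build && cd alpaka/build
    - cmake .. -DBUILD_TESTING=ON
    - cmake --build . --parallel 2
    - ctest --output-on-failure
```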

BenjaminW3 commented Jul 14, 2020

I would really like to use the pre-installed Docker images in GitHub Actions. This would make the CI much faster.

@psychocoderHPC

I would really like to use the pre-installed Docker images in GitHub Actions. This would make the CI much faster.

@SimeonEhrig has already prepared Docker containers for alpaka. We currently use those in cupla.
We are currently fixing the last issues, e.g. that it was not possible to select the GCC version when the CUDA container is used.

@psychocoderHPC

  • The distribution of jobs also differs from the free runners. We have a few strong runners, so it makes more sense to run many configurations in a single job to save startup and container initialization time.

alpaka has enough files to compile. We should not group multiple tests together; one job per configuration.

@SimeonEhrig

I would really like to use the pre-installed Docker images in GitHub Actions. This would make the CI much faster.

Once I have set up the GitLab CI, it should not be much extra work to use the images for GitHub Actions as well.
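
For comparison, running a GitHub Actions job inside the same prebuilt image could look roughly like this (the image name and build commands are placeholders):

```yaml
# hypothetical GitHub Actions workflow using the prebuilt CI image
name: linux-gcc
on: [push, pull_request]
jobs:
  build:
    runs-on: ubuntu-latest
    container:
      image: registry.gitlab.com/hzdr/crp/alpaka-group-container/alpaka-ci-ubuntu18.04:latest
    steps:
      - uses: actions/checkout@v2
      - run: |
          mkdir build && cd build
          cmake .. -DBUILD_TESTING=ON
          cmake --build . --parallel 2
          ctest --output-on-failure
```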

  • The distribution of jobs also differs from the free runners. We have a few strong runners, so it makes more sense to run many configurations in a single job to save startup and container initialization time.

alpaka has enough files to compile. We should not group multiple tests together; one job per configuration.

In the case of alpaka, it could make sense. I think we need to experiment a little bit.

@BenjaminW3

Especially since we want to run the CUDA and HIP tests on the internal GitLab CI (because we cannot run those tests on free hardware), the compile times and test run times for those builds are already long enough.
The goal should still be to have the compile-only builds in the free CI, since they should also be available to forks. They would largely benefit from the pre-baked test Docker image with all the Clang, GCC and CUDA versions already installed.

ax3l commented Jul 15, 2020

I know we are going for Docker containers; I just want to mention that a new Spack feature for GitLab called spack ci ("Spack Pipelines") has landed, which might be useful at some point for our pipelines or container partitioning steps (or not): https://spack.readthedocs.io/en/latest/pipelines.html

@SimeonEhrig

Especially since we want to run the CUDA and HIP tests on the internal GitLab CI (because we cannot run those tests on free hardware), the compile times and test run times for those builds are already long enough.

The timeout should not be the problem; we can set the timeout up to 3 hours. The problem is the number of runners. At the moment we have just one AMD and one Nvidia x86 runner. But if we generate enough utilization, it should not be a problem to buy new hardware.

@SimeonEhrig

There is some new pressure to implement CUDA and HIP runtime tests: #1340
I will try to integrate something in the next weeks.

@BenjaminW3 For the runtime CI, I want to use the prebuilt Docker containers which I already create here. We already use them for cupla and PIConGPU. But I constantly have problems with server timeouts. Therefore, I want to ask whether I can take over the travis_retry.sh script for the container project?
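
For context, such a retry wrapper boils down to something like the following minimal sketch (this is not the original travis_retry.sh; the retry count, delay and the pushed image are placeholders):

```bash
#!/usr/bin/env bash
# retry a flaky command a few times before giving up
retry() {
    local attempt=1 max_attempts=3
    until "$@"; do
        if [ "${attempt}" -ge "${max_attempts}" ]; then
            echo "command failed after ${max_attempts} attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep 30
    done
}

# example: retry a registry push that occasionally runs into a server timeout
retry docker push registry.example.com/alpaka-ci:latest
```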

BenjaminW3 commented Jun 15, 2021

travis_retry originates from the open-source Travis repository. The licenses looked compatible to me back when I added it. I do not know where the license header with my name came from. We once added license headers to all files automatically; maybe it slipped through.

@SimeonEhrig

GitLab CI introduces macOS runners: https://about.gitlab.com/blog/2021/08/23/build-cloud-for-macos-beta/
