Renovating CI and usage of HZDR GitLab CI #967

Open
SimeonEhrig opened this issue Mar 27, 2020 · 24 comments

SimeonEhrig commented Mar 27, 2020

Discussion points

  • how to distribute the build matrix to GitHub Actions and HZDR CI
  • what can be shared between the CIs, e.g. docker container
  • ...
@SimeonEhrig

That's my idea of how we can provide the environment for the CI through a container:

  • describe the containers with hpccm -> it is a recipe generator that allows describing the recipes in Python and generating Docker and Singularity recipes from them (see the recipe sketch below)
  • build the container with the environment once and not for every test run
    • this means the container contains the right compiler (version), Boost version, CUDA and so on
    • the built container will be pushed to a container registry
    • the CI simply pulls the container, clones alpaka, and builds and runs the tests
  • it can be done by a separate CI pipeline or in an extra repo
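
A minimal sketch of what such an hpccm recipe could look like (the base image, compiler and dependency versions below are placeholders, not a decided configuration):

```python
# hypothetical hpccm recipe sketch (e.g. recipe.py); hpccm injects the building
# blocks into the recipe's scope, so no imports are needed here. Generate with:
#   hpccm --recipe recipe.py --format docker      > Dockerfile
#   hpccm --recipe recipe.py --format singularity > Singularity.def
Stage0 += baseimage(image='nvidia/cuda:10.2-devel-ubuntu18.04')  # CUDA base image
Stage0 += gnu(version='8')                                       # GCC toolchain
Stage0 += cmake(eula=True, version='3.16.5')                     # CMake
Stage0 += boost(version='1.67.0')                                # Boost
```

The same recipe file yields both a Dockerfile and a Singularity definition, which is the main motivation for describing the environment with hpccm instead of maintaining two recipe formats by hand.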

@psychocoderHPC

I suggest creating one container for each architecture, e.g. BaseNvidia, BaseX86, BaseOpenPower and BaseArm.
All containers should have the dependencies (e.g. Boost, CMake) compiled in all needed versions (best with Spack).
I am not sure if we should provide a separate container for each compiler. Maybe it is easier for us to provide the base container with multiple compilers, compiled with Spack.

@SimeonEhrig

Do you mean the CI job should look something like this:

$> docker pull x86container
$> docker run x86container
$container> spack load boost@1.67
$container> spack load cuda@10.2
$container> git clone alpaka
$container> cmake && make && ctest

@psychocoderHPC

Do you mean the CI job should look something like this:

$> docker pull x86container
$> docker run x86container
$container> spack load boost@1.67
$container> spack load cuda@10.2
$container> git clone alpaka
$container> cmake && make && ctest

Yes, CMake should also be loaded via Spack, but in general this should be the easiest way to maintain a large number of different dependency versions without creating a container for each combination.
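
To illustrate, installing several dependency versions side by side in one base container could look roughly like this (the package and compiler versions are only examples, not the planned matrix):

```bash
# inside the base container image build: install all needed versions via Spack
spack install gcc@9.2.0
spack compiler find "$(spack location -i gcc@9.2.0)"   # register the freshly built compiler
spack install cmake@3.16.5
spack install boost@1.67.0 %gcc@9.2.0
spack install boost@1.70.0 %gcc@9.2.0

# a CI job later just selects the combination it needs
spack load cmake@3.16.5
spack load boost@1.67.0
```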

@ax3l Could you please have a look at our ideas?

@SimeonEhrig

Okay, then we have to check whether a CI can build this container, because it needs a lot of compute power. I think free runners like GitHub Actions and GitLab.com runners will simply run into a timeout, and I'm not sure whether our runners can build Docker or Singularity containers because of security restrictions.

@psychocoderHPC

As far as I know, our CI can do it, or we use our dev system in the office to build the container.

@SimeonEhrig

Okay. Last time I had some problems building Singularity containers, but maybe something has changed in the meantime.

The dev system should not be the solution, because it has restricted access and does not allow flexible development.

ax3l commented Mar 31, 2020

@ax3l Could you please have a look at our ideas?

Sounds good!

I am not sure if you already need Spack, since the goal is to keep alpaka very low on dependencies besides vendor libs.

For optimal usage of free CI services and distribution by task (e.g. one service for sanitizers, one for Windows, one for the HPC stack, etc.) I recommend checking out ADIOS2's CI integration into various services. It's quite brilliant and does not oversubscribe a single one, so you avoid a backlog due to multiple jobs in the build matrix. If you want to run some things within Docker containers, CircleCI is a good candidate that supports this by default and a bit better than Travis. Also use GitHub Actions extensively; it's reliable and has a lot of resources. On top of that, use Azure Pipelines, which also offers a lot of free resources for the three major OSes.

If you want to use Spack for CI, be aware that cloud providers constantly change your micro-architecture. In order to use Spack efficiently, disable micro-arch optimizations and go with generic x86, as shown in openPMD-api. I also run a nightly integration test of openPMD-api via Azure Pipelines to check whether all my packages can be installed.
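
For reference, pinning a generic target in Spack so the binaries do not depend on the runner's current micro-architecture can be done directly on the spec (the package and version here are just examples):

```bash
# build for a generic x86-64 target instead of e.g. skylake/cascadelake
spack install boost@1.67.0 target=x86_64
```

The same preference can also be set once for all packages via the target entry in Spack's packages.yaml configuration.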

Does this help a little?

@SimeonEhrig

Thanks for the hints. At the moment I am working on a prototype for the mega Spack container (I have a strange problem with alpaka and I'm not sure whether my environment is corrupted, so I need the container for verification). I had also noticed the target problem. At the moment I use sandybridge as the target, but your solution is better.

For the container I implemented an incremental container build system; otherwise using Spack is not possible, because the install routine is not stable enough. A nice side effect is that we can split the build into small jobs, like in the CI.
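
A rough sketch of the incremental idea (registry, tags and the split between stages are placeholders): each stage image builds on the previous one and installs only a few packages, so a failing Spack install only restarts the last stage instead of the whole container build.

```bash
# stage 1: base image with Spack itself
docker build -t registry.example.com/alpaka-ci:stage1 -f Dockerfile.stage1 .
docker push registry.example.com/alpaka-ci:stage1

# stage 2: FROM ...:stage1, adds the compilers
docker build -t registry.example.com/alpaka-ci:stage2 -f Dockerfile.stage2 .
docker push registry.example.com/alpaka-ci:stage2

# stage 3: FROM ...:stage2, adds the Boost versions
docker build -t registry.example.com/alpaka-ci:stage3 -f Dockerfile.stage3 .
docker push registry.example.com/alpaka-ci:stage3
```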

@psychocoderHPC

I am not sure if you already need Spack, since the goal is to keep alpaka very low on dependencies besides vendor libs.

Spack will only be used to maintain the X different Boost, compiler, and maybe CMake versions. As @SimeonEhrig said, the target architecture of the packages can be something generic.

For optimal usage of free CI services

We will keep a few tests on free instances, e.g. for Windows and OSX, but move the complex matrix to our in-house CI to reduce the load on the free instances.

@SimeonEhrig

I implemented a first version of the alpaka Spack CI container and gathered some information:
ComputationalRadiationPhysics/crp-container#1

@psychocoderHPC
Can you please verify whether the container works? I get an error with CUDA, but I had the same error on another system, so I'm not sure whether something is wrong with my environments or whether it is an error in alpaka. I sent you the test instructions via Mattermost.

In general, I think we need to split the container. It is possible that the container is too big for a registry and the CI runners. Maybe we can split it by compiler, e.g. a GCC container, a Clang container, a CUDA container, ...

psychocoderHPC commented Mar 31, 2020 via email

@BenjaminW3

All the build jobs have now been moved to GitHub Actions.
This should make the builds faster, and we get rid of the second platform.
I will try to clean up the Linux builds a bit more so that they get faster.

@BenjaminW3 changed the title from "Renovating and moving CI to GitHub Actions and HZDR GitLab CI" to "Renovating CI and usage of HZDR GitLab CI" on Jun 7, 2020
@SimeonEhrig

@ax3l @BenjaminW3
Current status of the GitLab CI:

  • The concept differs slightly from the current concept of the GitHub Actions CI. The container already contains the necessary dependencies like CMake, Boost, GCC, and so on. The test itself simply downloads the image, clones the project repo into it, and builds and runs the tests (a minimal job sketch follows after this list). The concept is still in the testing phase, but I hope that we will get some advantages: reduced test run times, no need for caching mechanisms, and the possibility to easily set up the CI environment on a local system via docker pull to perform quick tests and develop applications.
  • The containers are shared by alpaka, cupla and PIConGPU (https://gitlab.com/hzdr/crp/alpaka-group-container): cupla already uses the infrastructure, @psychocoderHPC is currently integrating it into PIConGPU, and alpaka is in the planning stage.
  • In order to be able to install software into the container, as is possible with the current approach, @psychocoderHPC suggests that we extend the bash scripts so that they check whether a software package is already available and, if not, install it.
    • At the moment we only have specialized runners (1x x86 + AMD GPU, 1x x86 + Nvidia GPU, 1x Power8 + 2x Nvidia GPUs, 1x ARM Cavium ThunderX), so we are lacking a bit in x86 CPUs, but with enough utilization it should be possible to buy some x86 nodes for the CI.
  • The distribution of jobs also differs from the free runners. We have a few strong runners, so it makes more sense to run many configurations in a single job to save startup and container initialization time.
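
A minimal sketch of what such a GitLab CI job could look like (image name, runner tag and CMake options are placeholders, not the actual configuration):

```yaml
# hypothetical .gitlab-ci.yml job: the image already ships compiler, CMake and Boost
test-gcc:
  image: registry.gitlab.com/hzdr/crp/alpaka-group-container/alpaka-ci-ubuntu18.04:latest
  tags: [x86]
  script:
    - git clone https://github.com/alpaka-group/alpaka.git
    - mkdir alpaka/build && cd alpaka/build
    - cmake .. -DBUILD_TESTING=ON
    - cmake --build . --parallel 2
    - ctest --output-on-failure
```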

BenjaminW3 commented Jul 14, 2020

I would really like to use the pre-installed Docker images in GitHub Actions. This would make the CI much faster.

@psychocoderHPC

I would really like to use the pre-installed Docker images in GitHub Actions. This would make the CI much faster.

@SimeonEhrig has already prepared Docker containers for alpaka. We currently use those in cupla.
We are currently fixing the last issues, e.g. that it was not possible to select the GCC version when the CUDA container is used.

@psychocoderHPC

  • The distribution of jobs also differs from the free runners. We have a few strong runners, so it makes more sense to run many configurations in a single job to save startup and container initialization time.

alpaka has enough files to compile. We should not group multiple tests together; one job per configuration.

@SimeonEhrig

I would really like to use the pre-installed Docker images in GitHub Actions. This would make the CI much faster.

Once I have set up the GitLab CI, it should not be much extra work to use the images for GitHub Actions as well.
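
For comparison, running a GitHub Actions job inside the same prebuilt image could look roughly like this (the image name and build commands are placeholders):

```yaml
# hypothetical GitHub Actions workflow using the prebuilt CI image
name: linux-gcc
on: [push, pull_request]
jobs:
  build:
    runs-on: ubuntu-latest
    container:
      image: registry.gitlab.com/hzdr/crp/alpaka-group-container/alpaka-ci-ubuntu18.04:latest
    steps:
      - uses: actions/checkout@v2
      - run: |
          mkdir build && cd build
          cmake .. -DBUILD_TESTING=ON
          cmake --build . --parallel 2
          ctest --output-on-failure
```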

  • The distribution of jobs also differs from the free runners. We have a few strong runners, so it makes more sense to run many configurations in a single job to save startup and container initialization time.

alpaka has enough files to compile. We should not group multiple tests together; one job per configuration.

In the case of alpaka, it could make sense. I think we need to experiment a little bit.

@BenjaminW3

Especially since we want to run the CUDA and HIP tests on the internal GitLab CI (because we cannot run those tests on free hardware), the compile times and test run times for those builds are already long enough.
The goal should still be to have the compile-only builds in the free CI, since they should also be available to forks. They would largely benefit from the pre-baked test Docker image with all the Clang, GCC and CUDA versions already installed.

ax3l commented Jul 15, 2020

I know we are going for Docker containers; I just want to mention that a new Spack feature for GitLab called spack ci ("Spack Pipelines") has landed, which might be useful at some point for our pipelines or container partitioning steps (or not): https://spack.readthedocs.io/en/latest/pipelines.html

@SimeonEhrig

Especially since we want to run the CUDA and HIP tests on the internal GitLab CI (because we cannot run those tests on free hardware), the compile times and test run times for those builds are already long enough.

The timeout should not be the problem; we can set the timeout up to 3 hours. The problem is the number of runners. At the moment we have just one AMD and one Nvidia x86 runner. But if we generate enough utilization, it should not be a problem to buy new hardware.

@SimeonEhrig

There is some new pressure to implement CUDA and HIP runtime tests: #1340
I will try to integrate something in the next weeks.

@BenjaminW3 For the runtime CI, I want to use the prebuilt Docker containers which I already create here. We already use them for cupla and PIConGPU. But I constantly have problems with server timeouts. Therefore, I want to ask whether I can take over the travis_retry.sh script for the container project?
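
For context, such a retry wrapper boils down to something like the following minimal sketch (this is not the original travis_retry.sh; the retry count, delay and the pushed image are placeholders):

```bash
#!/usr/bin/env bash
# retry a flaky command a few times before giving up
retry() {
    local attempt=1 max_attempts=3
    until "$@"; do
        if [ "${attempt}" -ge "${max_attempts}" ]; then
            echo "command failed after ${max_attempts} attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep 30
    done
}

# example: retry a registry push that occasionally runs into a server timeout
retry docker push registry.example.com/alpaka-ci:latest
```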

BenjaminW3 commented Jun 15, 2021

travis_retry originates from the open-source Travis repository. The licenses looked compatible to me back when I added it. I do not know where the license header with my name came from. We once added license headers to all files automatically; maybe it slipped through.

@SimeonEhrig

GitLab CI introduces macOS runners: https://about.gitlab.com/blog/2021/08/23/build-cloud-for-macos-beta/
