🐳 automated docker image build fails for e-mission-server #926

Closed
shankari opened this issue Jun 25, 2023 · 18 comments

Comments

@shankari
Contributor

Example run:
https://github.com/e-mission/e-mission-server/actions/runs/5371438586/jobs/9744656517

the build looks like it succeeded, and the image was pushed successfully to the server.
But in reality, the image is corrupted because it does not have the conda environment installed

See error around line 382

Successfully installed at /root/miniconda-23.1.0. Please activate with 'source setup/activateXXX.sh' in every terminal where you want to use conda
For conda, found 23.1.0, expected 23.1.0, all is good!
Installing using conda now
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... /root/miniconda-23.1.0/etc/profile.d/conda.sh: line 1:    88 Killed                  ( "$CONDA_EXE" $_CE_M $_CE_CONDA "$@" )
Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... done

After that, only a few packages are targeted for download

  environment location: /root/miniconda-23.1.0

  added / updated specs:
    - cryptography=40.0.2
    - wheel=0.40.0


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    _libgcc_mutex-0.1          |      conda_forge           3 KB  conda-forge
    _openmp_mutex-4.5          |       2_kmp_llvm           6 KB  conda-forge
    ca-certificates-2023.5.7   |       hbcca054_0         145 KB  conda-forge
    certifi-2023.5.7           |     pyhd8ed1ab_0         149 KB  conda-forge
    conda-23.1.0               |   py39hf3d152e_0         906 KB  conda-forge
    cryptography-40.0.2        |   py39h079d5ae_0         1.4 MB  conda-forge
    libgcc-ng-13.1.0           |       he5830b7_0         758 KB  conda-forge
    llvm-openmp-12.0.1         |       h4bd325d_1         2.8 MB  conda-forge
    openssl-3.1.1              |       hd590300_0         2.5 MB  conda-forge
    python-3.9.7               |hf930737_3_cpython        27.5 MB  conda-forge
    python_abi-3.9             |           3_cp39           6 KB  conda-forge
    wheel-0.40.0               |     pyhd8ed1ab_0          54 KB  conda-forge
    ------------------------------------------------------------
                                           Total:        36.3 MB

If run manually, the image builds successfully.

$ docker build -t shankari/e-mission-server:gis-based-mode-detection-jun-25-manual-build .
[+] Building 789.0s (18/18) FINISHED
 => [internal] load build definition from Dockerfile                                                     0.0s
 => => transferring dockerfile: 934B                                                                     0.0s
 => [internal] load .dockerignore                                                                        0.0s
 => => transferring context: 2B                                                                          0.0s
 => [internal] load metadata for docker.io/library/ubuntu:jammy                                          2.8s
 => [auth] library/ubuntu:pull token for registry-1.docker.io                                            0.0s
 => [ 1/12] FROM docker.io/library/ubuntu:jammy@sha256:6120be6a2b7ce665d0cbddc3ce6eae60fe94637c6a669853  0.0s
 => => resolve docker.io/library/ubuntu:jammy@sha256:6120be6a2b7ce665d0cbddc3ce6eae60fe94637c6a66985312  0.0s
 => => sha256:6120be6a2b7ce665d0cbddc3ce6eae60fe94637c6a66985312d1f02f63cc0bcd 1.13kB / 1.13kB           0.0s
 => => sha256:83f0c2a8d6f266d687d55b5cb1cb2201148eb7ac449e4202d9646b9083f1cee0 424B / 424B               0.0s
 => => sha256:99284ca6cea039c7784d1414608c6e846dd56830c2a13e1341be681c3ffcc8ac 2.30kB / 2.30kB           0.0s
 => [internal] load build context                                                                        2.8s
 => => transferring context: 221.89MB                                                                    2.8s
 => [ 2/12] WORKDIR /usr/src/app                                                                         0.1s
 => [ 3/12] RUN apt-get -y -qq update                                                                    6.5s
 => [ 4/12] RUN apt-get install -y -qq curl                                                              6.4s
 => [ 5/12] RUN apt-get install -y -qq wget                                                              2.4s
 => [ 6/12] RUN apt-get install -y jq                                                                    2.6s
 => [ 7/12] RUN apt-get -y remove --purge build-essential                                                1.3s
 => [ 8/12] RUN apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*                           0.2s
 => [ 9/12] COPY . .                                                                                     3.3s
 => [10/12] RUN chmod u+x ./.docker/setup_config.sh                                                      0.2s
 => [11/12] RUN bash -c "./.docker/setup_config.sh"                                                    745.3s
 => [12/12] RUN chmod u+x ./.docker/docker_start_script.sh                                               0.3s
 => exporting to image                                                                                  17.3s
 => => exporting layers                                                                                 17.3s
 => => writing image sha256:825375a911c07c43851824e7cfe80da4b0cb6469027f09d806b21d46cbd9f25d             0.0s
 => => naming to docker.io/shankari/e-mission-server:gis-based-mode-detection-jun-25-manual-build                         0.0s

Issues:

  • If one of the steps of the image build did not work, then why did the image build not fail?
  • why is the image build failing in the first place? Is it a memory issue? I do see an error Solving environment: ...working... failed with initial frozen solve. Retrying with flexible solve.
    • will using -m work to increase memory and allow the build to complete?
    • has something else gone wrong with the docker build (maybe deprecated packages?) that is causing it to be so slow? Note that I have seen lots of automated test install failures over the weekend that passed on retry
    • will switching to libmamba (or mamba in general) solve our problems?
@shankari
Contributor Author

for (1) I know that the job fails if docker commands fail in general (e.g. https://github.com/e-mission/e-mission-server/actions/runs/5370984366/jobs/9743442408) but it is not failing if this particular command fails. In the case where it correctly failed, we see Error: Process completed with exit code 1.
In this case, there was a killed process in the middle of a setup script

Successfully installed at /root/miniconda-23.1.0. Please activate with 'source setup/activateXXX.sh' in every terminal where you want to use conda
For conda, found 23.1.0, expected 23.1.0, all is good!
Installing using conda now
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... /root/miniconda-23.1.0/etc/profile.d/conda.sh: line 1:    88 Killed                  ( "$CONDA_EXE" $_CE_M $_CE_CONDA "$@" )
Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... done

which is probably called from setup/setup.sh or something in the Dockerfile

So the fix for this is probably to use set -e so that you can fail the setup script if any line within it fails.

@shankari
Contributor Author

For (2), preventing the process from being killed in the first place, conda/conda#8051 seems to indicate that the long-term fix is to switch to libmamba. We may also want to just switch to mamba in general because it also supports conda-tree, which will allow us to optimize/remove unused conda dependencies.

@shankari
Contributor Author

for testing (1), you already have a PR. It is not building the docker image workflow (probably needs to already be merged to master) but it does build a docker image in test-with-docker so you should be able to see if that builds successfully.

In terms of seeing "If one of the steps of the image build did not work, then why did the image build not fail?" you can also run the docker command locally and build an image manually. You can then introduce a failure in the scripts run from docker and try and see if the local docker build fails.

This can happen in parallel with the testing using Github Actions.

@shankari
Contributor Author

shankari commented Jun 29, 2023

Assuming that it is a memory issue (which seems likely), you can also try to force the error by reducing the resources for docker on your laptop. Get it down to 4GB or something and it will probably fail the same way. And if it doesn't fail the same way, it is probably not a memory issue, which is also interesting, although harder to debug.

@nataliejschultz

The image build is successful on my local machine. I was able to recreate the error in my fork of e-mission-server using GitHub actions. The error is occurring while running setup_config.sh, specifically during the command conda install -c conda-forge cryptography=40.0.2 wheel=0.40.0 . I've started trying different commands in the setup script to see what happens.

@nataliejschultz

nataliejschultz commented Jul 5, 2023

Adding "set -e" before the command did not cause the setup to fail. Instead, the program completely skipped this section that we saw before:

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... done

But continued with the next steps and said that the image was successfully built.

Adding set -eE and trap 'echo error' ERR command as suggested here didn't do anything.

Since I was able to find the exact command that is erroring, I added -vv to the Conda install to see the verbose output. This resulted in a different error
Solving environment: ...working... /root/miniconda-23.1.0/etc/profile.d/conda.sh: line 1: 89 Killed ( "$CONDA_EXE" $_CE_M $_CE_CONDA "$@" ):

@shankari
Contributor Author

shankari commented Jul 7, 2023

Tried to test how scripts error out:

  1. Created a bash script with the single line tail -f /dev/null
  2. Ran it

Test 1

  1. In another terminal, killed the bash script
  2. The return code was 143, which is an error return code
$ bash /tmp/test_kill.sh
Terminated: 15
$ echo $?
143

$ ps -aef | grep kill
1901711793  7639  6985   0 11:39AM ttys022    0:00.01 bash /tmp/test_kill.sh
1901711793  7752  7674   0 11:39AM ttys028    0:00.00 grep kill
$ kill 7639

Test 2:

  1. in another terminal, killed the tail command
  2. the return code was still 143, which is an error return code
$ bash /tmp/test_kill.sh
/tmp/test_kill.sh: line 1:  8230 Terminated: 15          tail -f /dev/null
$ echo $?
143

$ ps -aef | grep tail
1901711793  7640     1   0 11:39AM ttys022    0:00.01 tail -f /dev/null
1901711793  8230  8229   0 11:41AM ttys022    0:00.01 tail -f /dev/null
1901711793  8278  7674   0 11:41AM ttys028    0:00.00 grep tail
$ kill 8230
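
The two tests above can also be reproduced without a second terminal. This is only a sketch: status_after_signal is a hypothetical helper name, and it assumes the usual shell convention that a child killed by a signal is reported as 128 plus the signal number.

```shell
# Hypothetical helper: start a long-running child, send it a signal,
# and print the exit status that the parent shell observes.
status_after_signal() {
    local sig="$1"
    local status=0
    tail -f /dev/null &              # same long-running command as in the tests above
    local pid=$!
    kill -s "$sig" "$pid"
    wait "$pid" 2>/dev/null || status=$?
    echo "$status"
}

status_after_signal TERM   # prints 143 (128 + 15, what a plain `kill` sends)
status_after_signal KILL   # prints 137 (128 + 9)
```

The second call is included because a process killed for lack of memory receives SIGKILL rather than SIGTERM, so its status would be 137 instead of 143.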

@shankari
Contributor Author

shankari commented Jul 7, 2023

Now, add this bash script to setup_config.sh and kill it

Bingo!

$ git diff
diff --git a/.docker/setup_config.sh b/.docker/setup_config.sh
index 22d64b1d..45cfa34e 100644
--- a/.docker/setup_config.sh
+++ b/.docker/setup_config.sh
@@ -1,4 +1,6 @@
 echo "About to start conda update, this may take some time..."
+
+bash /tmp/test_kill.sh
 source setup/setup_conda.sh Linux-x86_64
 # now install the emission environment
 source setup/setup.sh

And then

$ ps -aef | grep tail
1901711793 10862 10861   0 11:50AM ttys022    0:00.00 tail -f /dev/null
1901711793 10903  7674   0 11:50AM ttys028    0:00.01 grep tail
$ kill 10862

Generates

$ bash .docker/setup_config.sh
About to start conda update, this may take some time...
/tmp/test_kill.sh: line 1: 10862 Terminated: 15          tail -f /dev/null
Installing for version 23.1.0 and platform Linux-x86_64
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
 28 66.6M   28 19.0M    0     0  2316k      0  0:00:29  0:00:08  0:00:21 2673k^C

So when the process is killed, the script does not error out but instead continues to the next step.

@shankari
Contributor Author

shankari commented Jul 7, 2023

So what if we try set -e as suggested

$ git diff
diff --git a/.docker/setup_config.sh b/.docker/setup_config.sh
index 22d64b1d..2570ad00 100644
--- a/.docker/setup_config.sh
+++ b/.docker/setup_config.sh
@@ -1,4 +1,7 @@
+set -e
 echo "About to start conda update, this may take some time..."
+
+bash /tmp/test_kill.sh
 source setup/setup_conda.sh Linux-x86_64
 # now install the emission environment
 source setup/setup.sh

And then

$ ps -aef | grep tail
1901711793 11890 11889   0 11:53AM ttys022    0:00.00 tail -f /dev/null
1901711793 11917  7674   0 11:53AM ttys028    0:00.00 grep tail
$ kill 11890

Generates

$ bash .docker/setup_config.sh
About to start conda update, this may take some time...
/tmp/test_kill.sh: line 1: 11890 Terminated: 15          tail -f /dev/null
$ echo $?
143
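
The contrast between the last two experiments can be put side by side in one self-contained sketch. run_wrapper is a hypothetical stand-in for setup_config.sh, and the inner `bash -c 'kill -TERM $$'` stands in for the killed tail process; nothing here touches the real scripts.

```shell
# Hypothetical stand-in for setup_config.sh. The first argument becomes the
# first line of the script: ":" (a no-op) or "set -e".
run_wrapper() {
    bash -c "
        $1
        bash -c 'kill -TERM \$\$'
        echo 'next step ran'
    "
}

status=0
run_wrapper ":" || status=$?
echo "without set -e: exit status $status"   # next step runs, status 0

status=0
run_wrapper "set -e" || status=$?
echo "with set -e: exit status $status"      # stops at the killed step, status 143
```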

@shankari
Contributor Author

shankari commented Jul 7, 2023

Let's verify whether docker build actually does fail if one of the commands fails

Trying to build with

$ git diff
diff --git a/.docker/setup_config.sh b/.docker/setup_config.sh
index 22d64b1d..bf2b82f6 100644
--- a/.docker/setup_config.sh
+++ b/.docker/setup_config.sh
@@ -1,4 +1,7 @@
+set -e
 echo "About to start conda update, this may take some time..."
+
+exit 143
 source setup/setup_conda.sh Linux-x86_64
 # now install the emission environment
 source setup/setup.sh

we get

$ docker build -t shankari/e-mission-server:test-failure .
[+] Building 90.8s (16/17)
 => [internal] load build definition from Dockerfile                                                                                0.0s
 => => transferring dockerfile: 43B                                                                                                 0.0s
 => [internal] load .dockerignore                                                                                                   0.0s
 => => transferring context: 2B                                                                                                     0.0s
 => [internal] load metadata for docker.io/library/ubuntu:jammy                                                                     2.6s
 => [auth] library/ubuntu:pull token for registry-1.docker.io                                                                       0.0s
 => [ 1/12] FROM docker.io/library/ubuntu:jammy@sha256:0bced47fffa3361afa981854fcabcd4577cd43cebbb808cea2b1f33a3dd7f508            49.6s
 => => resolve docker.io/library/ubuntu:jammy@sha256:0bced47fffa3361afa981854fcabcd4577cd43cebbb808cea2b1f33a3dd7f508               0.0s
 => => sha256:0bced47fffa3361afa981854fcabcd4577cd43cebbb808cea2b1f33a3dd7f508 1.13kB / 1.13kB                                      0.0s
 => => sha256:b060fffe8e1561c9c3e6dea6db487b900100fc26830b9ea2ec966c151ab4c020 424B / 424B                                          0.0s
 => => sha256:5a81c4b8502e4979e75bd8f91343b95b0d695ab67f241dbed0d1530a35bde1eb 2.30kB / 2.30kB                                      0.0s
 => => sha256:3153aa388d026c26a2235e1ed0163e350e451f41a8a313e1804d7e1afb857ab4 29.53MB / 29.53MB                                   48.5s
 => => extracting sha256:3153aa388d026c26a2235e1ed0163e350e451f41a8a313e1804d7e1afb857ab4                                           0.9s
 => [internal] load build context                                                                                                   0.8s
 => => transferring context: 21.99MB                                                                                                0.7s
 => [ 2/12] WORKDIR /usr/src/app                                                                                                    0.4s
 => [ 3/12] RUN apt-get -y -qq update                                                                                              14.1s
 => [ 4/12] RUN apt-get install -y -qq curl                                                                                         9.7s
 => [ 5/12] RUN apt-get install -y -qq wget                                                                                         3.2s
 => [ 6/12] RUN apt-get install -y jq                                                                                               3.5s
 => [ 7/12] RUN apt-get -y remove --purge build-essential                                                                           1.8s
 => [ 8/12] RUN apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*                                                      0.3s
 => [ 9/12] COPY . .                                                                                                                4.9s
 => [10/12] RUN chmod u+x ./.docker/setup_config.sh                                                                                 0.3s
 => ERROR [11/12] RUN bash -c "./.docker/setup_config.sh"                                                                           0.3s
------
 > [11/12] RUN bash -c "./.docker/setup_config.sh":
#16 0.286 About to start conda update, this may take some time...
------
executor failed running [/bin/sh -c bash -c "./.docker/setup_config.sh"]: exit code: 143

@shankari
Contributor Author

shankari commented Jul 7, 2023

so it seems like this should just work.

  • concretely if a process (e.g. tail -f) is killed, it generates an error code
  • if the shell script that wraps it (e.g. setup_config.sh) has set -e (and only if it has set -e) the shell script also fails with an error code
  • if a shell script run by a Dockerfile during docker build fails, then the docker build fails

We have shown these in the last 4 comments.
So I don't see why set -e does not work for the conda install process being killed.
I would like to see the related logs
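
For anyone who wants to check the last link of that chain without waiting for a docker build, here is a rough simulation of the `/bin/sh -c bash -c ...` wrapping seen in the build error above. The temporary script is a hypothetical stand-in for setup_config.sh, and this only imitates the shell layering, not docker itself.

```shell
# Simulate the shell layering of the RUN step: /bin/sh -c 'bash <script>'.
simulate_run_step() {
    local script status=0
    script=$(mktemp)
    cat > "$script" <<'EOF'
set -e
echo "About to start conda update, this may take some time..."
exit 143
echo "never reached"
EOF
    /bin/sh -c "bash \"$script\"" || status=$?
    rm -f "$script"
    echo "RUN step exit code: $status"
}

simulate_run_step   # last line printed: RUN step exit code: 143
```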

One caveat may be that conda itself is a wrapper that does not have set -e but we should be able to figure that out by adding

@@ -9,6 +12,7 @@ source setup/setup.sh
 ## level instead of upgrading to cryptography=40)
 ## So we just manually upgrade the failing dependencies in the base image
 conda install -c conda-forge cryptography=40.0.2 wheel=0.40.0
+echo $?

Look forward to seeing logs soon
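
One thing to watch for when adding the echo: once set -e is active, a bare `echo $?` on the line after a failing command is never reached, because the script exits first. A minimal sketch of the difference, where the `bash -c "kill -KILL $$"` line is a hypothetical stand-in for the killed conda call and run_script is an illustrative helper:

```shell
# Run a script body in a child bash and report the child's exit status.
run_script() {
    local status=0
    bash -c "$1" 2>/dev/null || status=$?
    echo "script exit status: $status"
}

# Variant 1: echo on the next line, with set -e active.
with_echo='set -e
bash -c "kill -KILL \$\$"
echo "conda status: $?"'

# Variant 2: capture the status on the same line; the left side of || is
# exempt from set -e, so the script keeps going and can log the value.
with_capture='set -e
status=0
bash -c "kill -KILL \$\$" || status=$?
echo "conda status: $status"'

run_script "$with_echo"      # only prints: script exit status: 137
run_script "$with_capture"   # prints: conda status: 137, then script exit status: 0
```

Note that variant 2 deliberately swallows the failure, so it is only useful for logging; the script would need an explicit `exit "$status"` afterwards to still fail the build.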

@nataliejschultz

After researching and running tests, it is apparent that the issue is due to conda not having enough memory to complete the install. The error occurs during setup_config.sh, but does not necessarily have to occur while running a specific command. See this test run, where the error occurs despite commenting out the (modified version of the) conda install command:

-if [[ $(conda install -c conda-forge cryptography=40.0.2 wheel=0.40.0) == *Killed* ]]; then
-    echo "!!!!! Error !!!!!"
-    exit code 1
- fi
# if [[ $(conda install -c conda-forge cryptography=40.0.2 wheel=0.40.0) == *Killed* ]]; then
#     echo "!!!!! Error !!!!!"
#     exit code 1
# fi
# conda install -vv -c conda-forge cryptography=40.0.2 wheel=0.40.0

Installing using conda now
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... finished installing e-mission environment
finished installing conda
/root/miniconda-23.1.0/etc/profile.d/conda.sh: line 1:    88 Killed                  ( "$CONDA_EXE" $_CE_M $_CE_CONDA "$@" )
Will remove 43 (55.0 MB) tarball(s).

This was unexpected, and meant that editing the conda install command to flag the error wouldn't work.

Additionally, I was able to build the docker image successfully by setting my local docker resources to 12GB, rather than the default 8GB. This essentially verified that it was a memory issue, and that migrating to mamba might be the best long-term solution.

Let's compare the outputs of running the docker build with different modifications. For this first local run, I have my docker resources set to 3.8GB, so there is no way it will succeed:

This is a run reflective of the current state of the Dockerfile and setup_config.sh

#15 11.02 Installing using conda now
#15 11.67 Collecting package metadata (repodata.json): ...working... done
#15 105.8 Solving environment: ...working... /root/miniconda-23.1.0/etc/profile.d/conda.sh: line 1:    94 Killed                  ( "$CONDA_EXE" $_CE_M $_CE_CONDA "$@" )
#15 316.2 finished installing e-mission environment
#15 317.1 Collecting package metadata (current_repodata.json): ...working... done
#15 333.4 Solving environment: ...working... failed with initial frozen solve. Retrying with flexible solve.
#15 333.4 Collecting package metadata (repodata.json): ...working... done
#15 345.6 Solving environment: ...working... done

It does not fail despite the killed error, and continues later on to say that it has finished setup_config.sh:

#15 390.7 finished setup_config.sh                   
#15 DONE 390.8s

The error is not reflected in the exit status either:

$ echo $?
0

First try
My first attempt at making the error abort the build was to add set -e to setup_config.sh, just above where I thought the issue was occurring:

-conda install -c conda-forge cryptography=40.0.2 wheel=0.40.0 

set -e 
conda install -c conda-forge cryptography=40.0.2 wheel=0.40.0 

This resulted in a run that appears successful, with no error messages.

Installing using conda now
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... done

...

finished installing conda

It appears that set -e is not doing what it is supposed to do, and is preventing the error from showing up.

I then tried this suggestion, with set -e in the same location and an added trap command:

-set -e 
+set -eE

+trap 'echo Something went wrong!' ERR 
+conda install -c conda-forge cryptography=40.0.2 wheel=0.40.0 

The error occurred the same as if I hadn't changed anything:

Installing using conda now
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... finished installing e-mission environment
/root/miniconda-23.1.0/etc/profile.d/conda.sh: line 1:    88 Killed                  ( "$CONDA_EXE" $_CE_M $_CE_CONDA "$@" )

A result similar to my first attempt occurred in a push from 2 days ago, when I un-commented set -e at the top of setup_config.sh and ran a modified version of the conda install command.
Note: At this point, I didn't know that the command itself wasn't causing the issue, so I was trying to have it echo an error message if it caught the word "Killed" in the output.

-# set -e
+set -e
 echo "About to start conda update, this may take some time..."

- conda install -c conda-forge cryptography=40.0.2 wheel=0.40.0
+if [[ $(conda install -vv -c conda-forge cryptography=40.0.2 wheel=0.40.0) == *Killed* ]]; then
+  echo "!!!!! Error !!!!!"

+fi

When run in GH actions, the run succeeded and the Killed error never showed:

Installing using conda now
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... done
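
A side note on the pattern-match approach, as a hedged sketch rather than a diagnosis of this particular run: command substitution only captures stdout, while the "Killed" notice is written to stderr by the shell that was waiting on the process, and a command used as an `if` condition is exempt from set -e. So matching the captured output for "Killed" is unlikely to catch the failure, whereas the exit status does. killed_step below is a hypothetical stand-in for the conda call, and `case` is used in place of `[[ ... == *Killed* ]]` only to keep the sketch portable.

```shell
# Hypothetical stand-in: prints progress on stdout, then dies from SIGKILL.
killed_step() {
    echo "Solving environment: ...working..."
    bash -c 'kill -KILL $$'
}

# Mirrors the attempted check: glob-match the captured stdout.
check_by_output() {
    case "$(killed_step)" in
        *Killed*) echo "detected" ;;
        *)        echo "missed" ;;
    esac
}

# Alternative: look at the exit status instead of the text.
check_by_status() {
    local status=0
    killed_step >/dev/null 2>&1 || status=$?
    echo "exit status: $status"
}

check_by_output   # prints: missed
check_by_status   # prints: exit status: 137
```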

From these tests, it would appear that set -e does not work as intended. I tried many other iterations, one of which included attempting to cause the Dockerfile itself to catch the error:

-RUN bash -c "./.docker/setup_config.sh" && echo "finished setup_config.sh"

+RUN bash -c "./.docker/setup_config.sh" && echo "finished setup_config.sh" 
+
+RUN if [ -z "$_CE_CONDA" ]; then echo 'Environment variable _CE_CONDA must be specified. Exiting.'; exit 1; fi

Which did error out:

Environment variable _CE_CONDA must be specified. Exiting.
The command '/bin/sh -c if [ -z "$_CE_CONDA" ]; then echo 'Environment variable _CE_CONDA must be specified. Exiting.'; exit 1; fi' returned a non-zero code: 1
Error: Process completed with exit code 1.

However, it was erroring out regardless of memory allocation. After meeting with @shankari, I found out that I had just tried to change too many things at once. set -e errors out the script successfully if it's placed at the very top of setup_config.sh, with no modifications to the conda command or any other part of the process:

Installing using conda now
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... /root/miniconda-23.1.0/etc/profile.d/conda.sh: line 1:    88 Killed                  ( "$CONDA_EXE" $_CE_M $_CE_CONDA "$@" )
The command '/bin/sh -c bash -c "./.docker/setup_config.sh" && echo "finished setup_config.sh"' returned a non-zero code: 137

Error: Process completed with exit code 137.
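
For reference, the 137 follows the usual shell convention of 128 plus the signal number, so it corresponds to signal 9 (SIGKILL), which is the signal the Linux out-of-memory killer sends. That fits the memory explanation, though the exit status alone does not prove it. A small sketch for decoding such statuses; describe_status is a hypothetical helper, not something in the repo:

```shell
# Hypothetical helper: describe an exit status, naming the signal when the
# status follows the 128 + signal-number convention.
describe_status() {
    if [ "$1" -gt 128 ]; then
        echo "status $1: killed by SIG$(kill -l "$1")"
    else
        echo "status $1: exited with code $1"
    fi
}

describe_status 137   # status 137: killed by SIGKILL
describe_status 143   # status 143: killed by SIGTERM
describe_status 1     # status 1: exited with code 1
```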

@shankari
Contributor Author

shankari commented Jul 7, 2023

> After meeting with @shankari, I found out that I had just tried to change too many things at once. set -e errors out the script successfully if it's placed at the very top of setup_config.sh, with no modifications to the conda command or any other part of the process:

Yay! Yes, one step at a time is the way to go.
And I have always put set commands at the top of the file.
I guess there is no reason why it should not work later (it sets the variable when it is executed), but that's just the convention I have always seen 😄

So I should be seeing a PR soon? Don't forget to put it into "To Review" in the project when it is ready.
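
To illustrate the point about set -e taking effect only when it is executed, here is a minimal sketch. The two script bodies are hypothetical and identical except for where set -e sits; `false` stands in for any failing step.

```shell
# Run a script body in a child bash and report its exit status.
run_case() {
    local status=0
    bash -c "$1" || status=$?
    echo "exit status: $status"
}

# set -e after the failing step: the failure has already been ignored.
run_case 'false; set -e; echo "reached the end"'
# prints: reached the end, then exit status: 0

# set -e at the top: the failing step aborts the script.
run_case 'set -e; false; echo "reached the end"'
# prints: exit status: 1
```

So a set -e placed later in the file does work, but only for the commands that come after it, which is one more reason to keep it on the first line.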

@shankari
Contributor Author

shankari commented Jul 10, 2023

Options are:

  • libmamba: with conda
    • pro: only need to switch library
    • con: might not be enough
  • mamba: switch everything over, including the CLI
    • pro: better long-term alignment
    • con: change may get more complex than we would like

Quick back of the envelope for how much we would need to replace in the server:
~ 20 locations, all in shell scripts

  • Is it OK to only tackle the server or are there other repos that also need to be changed?
    • would be good to do eventually but will they break if they are not changed right now?
    • probably not, server sets up the environment for everything

Decision: Try libmamba for ~ one day (combined with PR for the first task, one day)
If that doesn't work, switch to full mamba.

@nataliejschultz

I have tested libmamba both on GH actions and locally, by adding to the setup_conda.sh script:

hash -r
conda install -n base conda-libmamba-solver
conda config --set solver libmamba
conda config --set always_yes yes

Making libmamba the solver allowed the script to run successfully on GH actions, and was about twice as fast as previous runs. This increase in speed was also observed in my local tests. However, when running locally, I still needed to increase docker resources to 12GB for the build to succeed.

Based on my tests, I think that libmamba could work as a temporary solution, but is not the best long-term solution. I don't know the specifics of how much memory is allocated to docker processes on our server, though, so some input from @shankari would be enlightening.

@shankari
Contributor Author

shankari commented Jul 11, 2023

@nataliejschultz we do not install conda on our server. The conda install happens as part of the image build, so as part of the GH actions. Then we just run the image on our server. For additional context, you can see the PR related to the docker image build action (image_build_push.yml), e-mission/e-mission-server#875, fixing #752. You can find this history yourself in the future by using the file history.

If this works, please go ahead and submit a PR. I can merge it before I rebuild the images for the release, and you can have two changes included in the release 😄

@shankari
Contributor Author

Confirmed that after this merge, the docker build fails as expected.
https://github.com/e-mission/e-mission-server/actions/runs/5514034360/jobs/10052835910

Successfully installed at /root/miniconda-23.1.0. Please activate with 'source setup/activateXXX.sh' in every terminal where you want to use conda
For conda, found 23.1.0, expected 23.1.0, all is good!
Installing using conda now
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... /root/miniconda-23.1.0/etc/profile.d/conda.sh: line 1:    89 Killed                  ( "$CONDA_EXE" $_CE_M $_CE_CONDA "$@" )
The command '/bin/sh -c bash -c "./.docker/setup_config.sh"' returned a non-zero code: 137

Error: Process completed with exit code 137.

@shankari
Contributor Author

Test with docker also failed

web-server_1  | Collecting package metadata (repodata.json): ...working... done
db_1          | {"t":{"$date":"2023-07-10T23:41:33.393+00:00"},"s":"I",  "c":"COMMAND",  "id":20499,   "ctx":"ftdc","msg":"serverStatus was very slow","attr":{"timeStats":{"after basic":91,"after asserts":194,"after connections":393,"after electionMetrics":1532,"after extra_info":2729,"after flowControl":3524,"after freeMonitoring":4191,"after globalLock":4191,"after locks":4191,"after logicalSessionRecordCache":4203,"after mirroredReads":4203,"after network":4203,"after opLatencies":4203,"after opReadConcernCounters":4203,"after opcounters":4203,"after opcountersRepl":4203,"after oplogTruncation":4216,"after repl":4216,"after security":4216,"after storageEngine":4216,"after tcmalloc":4216,"after trafficRecording":4226,"after transactions":4226,"after transportSecurity":4226,"after twoPhaseCommitCoordinator":4226,"after wiredTiger":4238,"at end":4263}}}
web-server_1  | Solving environment: ...working... /root/miniconda-23.1.0/etc/profile.d/conda.sh: line 1:    89 Killed                  ( "$CONDA_EXE" $_CE_M $_CE_CONDA "$@" )
web-server_1  | setup/setup_tests.sh: line 7: python: command not found
web-server_1  | setup/setup_tests.sh: line 8: python: command not found
web-server_1  | Running tests...
