testing: migrate to trampoline_v2 #3860
|
Currently most of the tests are passing locally with the new script. |
|
Somehow one of the tests is not passing locally. I'll skip that test for local runs. |
|
I ran py-3.7 for all the tests with the following command, and they all passed!
$ TERM=vt100 \
TRAMPOLINE_BUILD_FILE=.kokoro/tests/run_tests.sh \
.kokoro/trampoline_v2.sh 2>&1 | tee /tmp/build.log
|
Rebased onto master, and now trying out Trampoline V2 with the py-3.7 build. Fingers crossed 🤞 |
|
Doh. The regex part was wrong. |
|
All the presubmit builds passed. This is amazing :) However, I need to make some more changes. |
Why set these at build time? Why can't they just be set at execution time?
Sorry for the long explanation.
- To allow sudo we need a real username
- One of the tests (compute/oslogin iirc) needs a real username
So we need a real username for those reasons, which means we need to create a user and allow sudo at build time. And once we create a user, we can also add that user to the docker group.
Then we can pass --user=${user_uid}:${user_gid}, which is better than --user=${user_uid}:${docker_gid}.
Files created with the former will have the same uid:gid as you, but the latter will create files with a different gid.
Use case: we can use the script to generate the README:
scripts/run_tests_local.sh cdn readmegen
This will generate README.rst in the cdn directory with the same uid and gid as you.
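The flag described above can be sketched in shell. This is a minimal illustration and not the code in trampoline_v2.sh: the function name `caller_user_flag` is hypothetical, and it only assembles the flag from `id -u` and `id -g` rather than invoking docker.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): build the --user flag from the
# caller's own uid and gid, so that files written into a bind-mounted
# directory keep the caller's ownership.
caller_user_flag() {
  local user_uid user_gid
  user_uid="$(id -u)"
  user_gid="$(id -g)"
  printf -- '--user %s:%s\n' "${user_uid}" "${user_gid}"
}

# Example: print the flag that would be passed to `docker run`.
caller_user_flag
```

Passing the docker group's gid in place of `$(id -g)` would give the `uid:docker_gid` variant, at the cost of files being created with a gid that is not the caller's.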
So basically, the direct answer to your question is:
Yes, we can pass --user=${user_uid}:${docker_gid} and docker works, but some other tests will fail (compute/oslogin, and potentially other tests that need sudo access).
Docker already has functionality for setting a default user/group as part of a Dockerfile. What do we gain by having trampoline manage this instead of letting the container do it itself?
I think my concern is that the container is supposed to be the encapsulation of the entire environment, including users/groups. I don't think it makes sense to have trampoline play a part in managing that. If folks want to have a different user/group by default, they should set that in their Dockerfile.
The current code makes sure that docker runs with the same uid:gid as the caller of the trampoline_v2.sh script. This is dynamic: when you run locally, it's your usual uid:gid; when it runs in Kokoro, it's the uid:gid of the kbuilder user, and so on.
So, to summarize:
- I want to use --user "${user_uid}:${user_gid}" to make sure the container creates files with the same uid:gid as the caller.
- It works for most of our tests, but some tests are failing.
- compute/oslogin: We need a real username for this uid -> we need to create the user in the container OS. This is why we are passing the build arg.
- docker in docker: We need to make sure the user in the container has permission on /var/run/docker.sock. -> This is why we add our user to a group with the same id as the docker group in the host OS.
- sudo: Some tests potentially need root access, so we need to add a sudoers entry on the fly. Actually, our run_tests.sh needs it for storing cloudsql_proxy in /bin.
So there are reasons. If you can come up with a solution that passes the tests for both compute/oslogin and vision/automl, I'm open to it.
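The three requirements in the summary above (a real username, membership in a group matching the host's docker gid, and a sudoers entry) could be sketched roughly as follows. This is only an assumption about what such build-time steps might look like, not the Dockerfile from this PR: the function name, its argument order, the group name `docker_host`, and the gid `999` are all hypothetical, and the function only prints the commands so it can run without root.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not from the PR): print the commands an image build
# could run to create a user matching the caller, add it to a group that has
# the host's docker gid, and grant it passwordless sudo.
print_user_setup() {
  local name="$1" uid="$2" gid="$3" docker_gid="$4"
  echo "groupadd --gid ${gid} ${name}"
  echo "useradd --uid ${uid} --gid ${gid} --create-home ${name}"
  # "docker_host" is an assumed group name for the docker socket's gid.
  echo "groupadd --gid ${docker_gid} docker_host"
  echo "usermod --append --groups docker_host ${name}"
  echo "echo '${name} ALL=(ALL) NOPASSWD:ALL' > /etc/sudoers.d/${name}"
}

# Example: the caller's identity plus an illustrative docker gid of 999.
print_user_setup "$(id -un)" "$(id -u)" "$(id -g)" 999
```

A real version would also have to handle the case where the caller's gid and the docker gid coincide, since the second `groupadd` would then fail.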
You can download this branch and try out the script with the following command:
$ scripts/decrypt-secrets.sh # Make sure we have the secrets
$ scripts/run_tests_local.sh compute/oslogin py-3.7
$ scripts/run_tests_local.sh vision/automl py-3.7
# or
$ RUN_TESTS_DIRS=compute/oslogin:vision/automl \
TRAMPOLINE_BUILD_FILE=.kokoro/tests/run_tests.sh \
.kokoro/trampoline_v2.sh
These commands start by building the Docker image, so you can test your proposed solution end to end.
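The second form above passes several directories in one colon-separated RUN_TESTS_DIRS value. How run_tests.sh actually consumes that variable is not shown in this thread; the following is only a sketch of one way to split such a value, with `split_test_dirs` as a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not from the PR): split a colon-separated
# RUN_TESTS_DIRS-style value into one directory per line.
split_test_dirs() {
  local dirs=()
  IFS=':' read -r -a dirs <<< "$1"
  printf '%s\n' "${dirs[@]}"
}

# Example with the two directories used in this thread.
split_test_dirs "compute/oslogin:vision/automl"
```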
Doesn't this also mean that it rebuilds (and potentially pushes a new image) every time a new user runs it?
Yes, the first run takes a while, but after that runs are fast because docker pull can detect the local layers and docker build uses --cache-from. These commands never upload anything by default. You have to provide two environment variables to enable that option, like this:
TRAMPOLINE_IMAGE=gcr.io/tmatsuo-test/my-python-multi \
TRAMPOLINE_IMAGE_UPLOAD=true \
scripts/run_tests_local.sh compute/oslogin py-3.7
Having said that, the upload option is not intended to be used by humans.
My intention is to set that flag only in one of the periodic builds. When the tests are all green, it will push the built image. We won't need the Cloud Build hook any more (although it won't harm anything).
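The opt-in behaviour described here, where nothing is uploaded unless both variables are provided, can be sketched as a small guard. This is an assumption about the shape of the check, not the actual logic in trampoline_v2.sh; `should_upload_image` is a hypothetical function name, and requiring the value to be exactly `true` is also an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not from the PR): allow an upload only when
# TRAMPOLINE_IMAGE is set and TRAMPOLINE_IMAGE_UPLOAD is exactly "true".
should_upload_image() {
  [[ -n "${TRAMPOLINE_IMAGE:-}" && "${TRAMPOLINE_IMAGE_UPLOAD:-}" == "true" ]]
}

# Example: report the decision instead of pushing anything.
if should_upload_image; then
  echo "would push ${TRAMPOLINE_IMAGE}"
else
  echo "upload disabled"
fi
```

Defaulting to "disabled" keeps local runs from pushing images by accident, which matches the intent that only one periodic build sets the flag.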
I also hope that in the future, we can merge all the periodic builds into one. If we pass down multiple nox sessions, it might be possible. The tests will take a very long time, so we may want a tool to run tests in parallel. I believe someone is developing such a tool.
|
@kurtisvg Thanks for the review! I think I addressed your comments and answered your questions. PTAL |
|
Don't worry about the Python 3.8 failure. That's the one I manually started and stopped once I confirmed the build cop bot is working. |
also prevents decrypt-secrets.sh from overwriting existing files. also stops an unnecessary (and harmful) workaround.
we can add the user to the docker group for accessing docker socket.
|
lint and py-2.7 passed. I'll merge this now with the admin power. |
This PR changes the build file to trampoline_v2.sh for all the jobs. trampoline_v2.sh does 3 things in a way that is somewhat compatible with trampoline_v1.
It also behaves better in many ways.
.trampolinerc.
Python specific info
To run this script, first decrypt our test secret by running scripts/decrypt-secrets.sh. This needs the secret-accessor role on our Cloud Project.
Then run the script:
You can optionally change these environment variables:
I also added a handy script for running tests locally. scripts/run_tests_local.sh takes only one required argument, and runs test sessions for that directory. Example: