-
Notifications
You must be signed in to change notification settings - Fork 6
Improve layer reuse of python images #39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve layer reuse of python images #39
Conversation
3a9f439 to
690ff56
Compare
* by first installing system dependencies and then python Setting python version incl. patch Fix python version env var Improve readability of docker build output Use PR-base-image for datascience-image in a PR Enable buildKit layer caching Fix layer caching Remove breaking cache-parameter Next attempt to get buildcache to work Fix pip installation for datascience image Tweaked order of ARGs to invalidate less layers Use all caches in datascience builds Speed up builds by not loading images unnecessarily Ensure big apt-get layer is shared Don't build gpu images for now Fix main-builcache declaration Extracting apt-get into build-step to ensure reuse Share cache among all builds Build base image separately Strictly separate base from python image build Move build-arg so it's avaiable
0929552 to
cf6c086
Compare
otherwise manifests are including "architecture": "unknown"
|
It seems that vanilla python:3.9 is working. So the issue needs to be solved in installing python :/ |
|
Repro steps: Testing project on staging |
chudyandrej
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that the new installation of Python is causing issues with the certificate.
to prevent errors like `certificate verify failed: unable to get local issuer certificate`
| ARG PYTHON_VERSION | ||
| # Layers will be different between python versions from here onwards because of the build-arg | ||
|
|
||
| COPY --from=builder "/usr/local/bin/python${PYTHON_VERSION}" "/usr/local/bin/python${PYTHON_VERSION}" | ||
| COPY --from=builder "/usr/local/bin/pip${PYTHON_VERSION}" "/usr/local/bin/pip${PYTHON_VERSION}" | ||
| COPY --from=builder "/usr/local/lib/python${PYTHON_VERSION}" "/usr/local/lib/python${PYTHON_VERSION}" | ||
|
|
||
| RUN update-alternatives --install /usr/bin/python python "/usr/local/bin/python${PYTHON_VERSION}" 1 | ||
| RUN update-alternatives --install /usr/bin/pip pip "/usr/local/bin/pip${PYTHON_VERSION}" 1 | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic is somehow copied from official Python because I'm not sure if it is the recommended way or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am just afraid about some side effects that could not be transferred from the first stage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But if you are sure that it is fine, I'm okay with that. It's working, so I think we are good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's basically just for symlinking python and pip binaries.
Official python is doing it differently, which we can of course adapt to.
# make some useful symlinks that are expected to exist ("/usr/local/bin/python" and friends)
RUN set -eux; \
for src in idle3 pip3 pydoc3 python3 python3-config; do \
dst="$(echo "$src" | tr -d 3)"; \
[ -s "/usr/local/bin/$src" ]; \
[ ! -e "/usr/local/bin/$dst" ]; \
ln -svT "$src" "/usr/local/bin/$dst"; \
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, let's discuss this. I think I don't understand how this can work.
.circleci/config.yml
Outdated
| matrix: | ||
| parameters: | ||
| python-version: | ||
| - "3.8.19" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a link to a table with the exact versions of Python that can be listed here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| - "3.8.19" | |
| # Latest versions from this list https://www.python.org/doc/versions/ | |
| - "3.8.19" |
.circleci/config.yml
Outdated
| matrix: | ||
| parameters: | ||
| python-version: | ||
| - "3.8.19" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we create a template here to avoid repetitions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean with template?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Define:
&pythonVersions
Use:
*pythonVersions
|
PLA-3118 |
…use-of-python-images Improve layer reuse of python images

After this PR all deepnote/python images are sharing a common dependency layer, saving a roundtrip of ca 200mb each.
I shied away of doing something similar to the deepnote-datascience dependencies, as the chance that the packages would be dependent on each specific python version seemed too high.
If that would not be the case, we could build one site-packages bundle for all python versions below 3.11 and one for above 3.11 in a separate base image. Maybe we can come back to it later.