@hc2p
Contributor

@hc2p hc2p commented Nov 7, 2024

After this PR, all deepnote/python images share a common dependency layer, saving a round trip of about 200 MB for each image.
I shied away from doing something similar for the deepnote-datascience dependencies, as the chance that those packages depend on the specific Python version seemed too high.
If that turns out not to be the case, we could build one site-packages bundle for all Python versions below 3.11 and one for the versions above 3.11 in a separate base image. Maybe we can come back to it later.

@hc2p hc2p force-pushed the hannes/pla-3118-improve-layer-reuse-of-python-images branch 3 times, most recently from 3a9f439 to 690ff56 Compare November 8, 2024 13:46
* by first installing system dependencies and then python

Setting python version incl. patch

Fix python version env var

Improve readability of docker build output

Use PR-base-image for datascience-image in a PR

Enable buildKit layer caching

Fix layer caching

Remove breaking cache-parameter

Next attempt to get buildcache to work

Fix pip installation for datascience image

Tweaked order of ARGs to invalidate less layers

Use all caches in datascience builds

Speed up builds by not loading images unnecessarily

Ensure big apt-get layer is shared

Don't build gpu images for now

Fix main-builcache declaration

Extracting apt-get into build-step to ensure reuse

Share cache among all builds

Build base image separately

Strictly separate base from python image build

Move build-arg so it's avaiable
@hc2p hc2p force-pushed the hannes/pla-3118-improve-layer-reuse-of-python-images branch from 0929552 to cf6c086 Compare November 8, 2024 14:05
@hc2p hc2p requested a review from chudyandrej November 8, 2024 14:10
otherwise manifests are including "architecture": "unknown"
@chudyandrej
Contributor

It seems we have an issue with the certificates.
[Image: Screenshot 2024-11-10 at 22 19 42]

Any chance that we forgot to install something?

@chudyandrej
Contributor

It seems that the vanilla python:3.9 image works, so the issue must lie in how we install Python :/

@chudyandrej
Contributor

Repro steps: Testing project on staging

Contributor

@chudyandrej chudyandrej left a comment

It seems that the new installation of Python is causing issues with the certificate.

to prevent errors like `certificate verify failed: unable to get local issuer certificate`
Comment on lines +39 to +48
ARG PYTHON_VERSION
# Layers will be different between python versions from here onwards because of the build-arg

COPY --from=builder "/usr/local/bin/python${PYTHON_VERSION}" "/usr/local/bin/python${PYTHON_VERSION}"
COPY --from=builder "/usr/local/bin/pip${PYTHON_VERSION}" "/usr/local/bin/pip${PYTHON_VERSION}"
COPY --from=builder "/usr/local/lib/python${PYTHON_VERSION}" "/usr/local/lib/python${PYTHON_VERSION}"

RUN update-alternatives --install /usr/bin/python python "/usr/local/bin/python${PYTHON_VERSION}" 1
RUN update-alternatives --install /usr/bin/pip pip "/usr/local/bin/pip${PYTHON_VERSION}" 1

Contributor

@chudyandrej chudyandrej Nov 13, 2024

Is this logic copied from the official Python image? I'm asking because I'm not sure whether it is the recommended way.

Contributor

@chudyandrej chudyandrej Nov 13, 2024

I'm just worried that some side effects of the installation might not be carried over from the first stage.

Contributor

But if you are sure that it is fine, I'm okay with that. It's working, so I think we are good.

Contributor Author

It's basically just for symlinking the python and pip binaries.
The official Python image does it differently, and we can of course adopt that approach:

# make some useful symlinks that are expected to exist ("/usr/local/bin/python" and friends)
RUN set -eux; \
	for src in idle3 pip3 pydoc3 python3 python3-config; do \
		dst="$(echo "$src" | tr -d 3)"; \
		[ -s "/usr/local/bin/$src" ]; \
		[ ! -e "/usr/local/bin/$dst" ]; \
		ln -svT "$src" "/usr/local/bin/$dst"; \
	done

Contributor

Ok, let's discuss this. I think I don't understand how this can work.

matrix:
  parameters:
    python-version:
      - "3.8.19"
Contributor

Can you add a link to a table with the exact versions of Python that can be listed here?

Contributor Author

Suggested change
- "3.8.19"
# Latest versions from this list https://www.python.org/doc/versions/
- "3.8.19"

matrix:
  parameters:
    python-version:
      - "3.8.19"
Contributor

Can we create a template here to avoid repetitions?

Contributor Author

What do you mean by template?

Contributor

Define:
&pythonVersions

Use:
*pythonVersions
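
For illustration, the anchor/alias idea could look roughly like the sketch below. It assumes the CI config is plain YAML and that anchors are resolved before the matrix is read. The `aliases` key, the workflow name `build`, and the job names `build-python-image` and `build-datascience-image` are hypothetical placeholders; only `matrix`, `parameters`, `python-version`, and `"3.8.19"` come from the diff.

```yaml
# Hypothetical excerpt: `aliases`, the workflow name, and the job names are
# placeholders and are not taken from the real config.
aliases:
  # Define the version list once and label it with an anchor.
  python-versions: &pythonVersions
    - "3.8.19"
    # ...the remaining versions would be listed here, once

workflows:
  build:
    jobs:
      - build-python-image:
          matrix:
            parameters:
              # Reuse the anchored list through an alias.
              python-version: *pythonVersions
      - build-datascience-image:
          matrix:
            parameters:
              python-version: *pythonVersions
```

With this shape, adding or dropping a Python version touches a single list. Whether the CI service accepts an extra top-level key such as `aliases` is an assumption worth checking against its config schema.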

@hc2p
Contributor Author

hc2p commented Nov 18, 2024

PLA-3118

@hc2p hc2p merged commit df2f021 into main Nov 18, 2024
12 checks passed
@hc2p hc2p deleted the hannes/pla-3118-improve-layer-reuse-of-python-images branch November 18, 2024 15:25
hc2p added a commit that referenced this pull request Feb 4, 2025
…use-of-python-images

Improve layer reuse of python images