Improve performance in the Python runtime module #6884

@helderco helderco commented Mar 14, 2024

This PR adds uv support, using it by default to install python packages.

The need for speed

The main motivation is performance. Some quick numbers:

Without uv:

  • python -m venv → ≈2.08s
  • pip install -r lock → ≈4s (cold)
  • pip install -e sdk → ≈6.30s/3.56s (cold/warm)

With uv:

  • uv venv → ≈0.03s
  • uv pip install -r lock → ≈0.9s (cold)
  • uv pip install -e sdk → ≈1.36s/0.76s (cold/warm)
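
Taken at face value, the informal timings above give the rough speedup ratios computed below. This is only arithmetic on the quoted numbers; the step labels are illustrative names, not identifiers from the module.

```python
# Informal timings quoted above, in seconds: (without uv, with uv).
timings = {
    "create venv": (2.08, 0.03),
    "install from lock (cold)": (4.0, 0.9),
    "install sdk, editable (cold)": (6.30, 1.36),
    "install sdk, editable (warm)": (3.56, 0.76),
}


def speedup(before: float, after: float) -> float:
    """Return how many times faster `after` is compared to `before`."""
    return before / after


speedups = {step: speedup(b, a) for step, (b, a) in timings.items()}

for step, ratio in speedups.items():
    print(f"{step}: ~{ratio:.1f}x faster with uv")
```

Creating the virtual env comes out at roughly 69x faster, and each install step at roughly 4x to 5x faster.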

These numbers were not properly benchmarked, but the speedup is real, and uv unlocks a few things:

  • It’s a single binary that joins virtualenv, pip and pip-tools so it’s cheap to install.
  • Creating a virtual env is practically free now (with pip, on the other hand, no virtual env is created and the system install is used instead).
  • No longer need to worry about setting up a cache volume for the virtual env with a cache key that would have to be unique for each module. 😶‍🌫️
  • We can now generate a pinned requirements.lock file on dagger init --sdk=python by default.
  • Installing from the lock file with --no-deps is faster.
  • The global cache in uv is instantly reusable for other modules, making it faster for new dagger init --sdk=python!
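
As a rough sketch of the install sequence these points describe, the snippet below builds the uv command lines without running them. The exact flags and ordering used by the runtime module are not shown in this description, so treat the sequence as an assumption; the `requirements.lock` and `sdk` paths come from the timings above, and `install_commands` is a hypothetical helper name.

```python
import shlex


def install_commands(
    lock_file: str = "requirements.lock", sdk_dir: str = "sdk"
) -> list[list[str]]:
    """Build (but do not run) an assumed uv install sequence."""
    return [
        # Cheap enough with uv to always create a virtual env.
        ["uv", "venv"],
        # The lock file is assumed to be complete, so dependency
        # resolution can be skipped with --no-deps.
        ["uv", "pip", "install", "--no-deps", "-r", lock_file],
        # Editable install of the SDK sources, after the dependencies.
        ["uv", "pip", "install", "-e", sdk_dir],
    ]


commands = install_commands()
for argv in commands:
    print(shlex.join(argv))
```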

There were other efforts at improving performance beyond uv too:

  • More intelligent pipeline setup to avoid cache invalidation.
  • All API calls are made concurrently, once, at the beginning of execution.
  • Minimal differences between the Codegen and RuntimeModule functions, to reuse as much layer cache as possible.
  • Installing dependencies first, sources last.
  • Compiling bytecode during install, so it can be reused in the final exec /runtime step.
  • Favoring mounts over copies.
  • Smart and dynamic pipeline creation based on an initial .Entries() fetch.
  • Simpler generation from template without doing an exec with sed.
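
Two of these ideas, making all calls concurrently up front and building the pipeline from an initial entries listing, can be illustrated with a small stand-alone sketch. It is a Python stand-in, not the module's code: `fetch`, `load_inputs`, `plan`, and the step descriptions are hypothetical, and the sleep only simulates API latency.

```python
import asyncio


async def fetch(name: str, value):
    """Stand-in for one API call (hypothetical); sleeps to simulate latency."""
    await asyncio.sleep(0.01)
    return name, value


async def load_inputs() -> dict:
    # Issue every call up front and await them together,
    # instead of paying for each round trip one after another.
    results = await asyncio.gather(
        fetch("entries", ["pyproject.toml", "requirements.lock", "src"]),
        fetch("module_name", "main"),
    )
    return dict(results)


def plan(entries: list[str]) -> list[str]:
    """Choose pipeline steps from the directory listing alone."""
    steps = ["create venv"]
    if ".python-version" in entries:
        steps.insert(0, "read pinned Python version")
    if "requirements.lock" in entries:
        steps.append("install pinned dependencies from requirements.lock")
    else:
        steps.append("resolve and install dependencies from pyproject.toml")
    # Sources come last so that editing them does not invalidate
    # the cached dependency layers.
    steps.append("mount sources and install the module")
    return steps


inputs = asyncio.run(load_inputs())
steps = plan(inputs["entries"])
print(steps)
```

Deciding from the listing means files that do not exist are never requested, and the dependency steps stay identical between runs whenever only the sources change.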

Configurability

uv is still very new, but it’s already production ready and is being adopted in the community like wildfire 🔥. Still, in case there’s any conflict or issue, I took the opportunity to introduce configurability into the runtime module. There are more configuration options to come, but for now these are available:

  • New [tool.dagger] table in pyproject.toml to add dagger-specific configuration:
    • use-uv = false to disable uv and use pip instead.
    • uv-version = "==0.1.27" to override the default pinned uv version.
    • base-image = "mypython:3.12" to use a different base image (see footnote 1), e.g., to add system packages, certificates, etc.
  • .python-version file, as used by pyenv and rye, to use a different Python version:
    • Can also be used to pin it and avoid an automatic upgrade with a new version of dagger.
    • With rye, can be created with rye pin 3.12.
  • requirements.lock file: if it exists, it is assumed to contain all dependencies.
    • This file will be created automatically for new modules.

For more advanced use cases, the pipelines have been composed more granularly so they can be reused in a custom SDK module that uses this one as a dependency, in order to customize some part of the process. Without this, you’d have to duplicate most of the code.

Note that adding configurability adds some overhead, but I feel it's necessary. There's also a flag to disable it for custom SDK use cases.

Default template

The default template has a couple of changes:

Reproducibility

Summarizing from above, you have stronger guarantees for a reproducible environment, with:

  • The requirements.lock file
  • The pinned python version in .python-version
  • The pinned uv version in uv-version config
  • Hashes in the lock file for verifying the downloaded packages
    • Pip checks the hashes. uv doesn’t yet, but will.
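
The hash check amounts to comparing a digest of each downloaded file against the digests pinned in the lock file. The sketch below shows the idea with `hashlib`; it is simplified (real lock files usually put each `--hash` on its own continuation line), and `example-pkg`, `parse_requirement`, and `is_trusted` are hypothetical names.

```python
import hashlib


def parse_requirement(line: str) -> tuple[str, set[str]]:
    """Split 'name==version --hash=sha256:...' into the pin and its hashes."""
    parts = line.split()
    pin = parts[0]
    hashes = {
        part.removeprefix("--hash=")
        for part in parts[1:]
        if part.startswith("--hash=")
    }
    return pin, hashes


def is_trusted(data: bytes, allowed: set[str]) -> bool:
    """Accept the download only if its sha256 digest is one of the pinned ones."""
    digest = "sha256:" + hashlib.sha256(data).hexdigest()
    return digest in allowed


# Pretend download and a matching lock file entry.
artifact = b"pretend this is a wheel"
line = f"example-pkg==1.0.0 --hash=sha256:{hashlib.sha256(artifact).hexdigest()}"

pin, hashes = parse_requirement(line)
print(pin, is_trusted(artifact, hashes))
print(pin, is_trusted(b"tampered bytes", hashes))
```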

What's next?

There are more configuration options and performance gains to follow, but they need more profiling first, and tracing will help with that.

Footnotes

  1. Use at your own risk, since we can’t guarantee support for custom deviations. For best compatibility, extend from the default image that dagger uses: python:3.11-slim. You may need to watch out for when the default image is updated.

@helderco helderco requested a review from sipsma March 14, 2024 23:32
@helderco helderco force-pushed the helder/dev-3558-improve-performance-with-uv-and-add-basic-configurability-to branch from 1cc3fab to ece26ac Compare March 16, 2024 09:39
@helderco helderco force-pushed the helder/dev-3558-improve-performance-with-uv-and-add-basic-configurability-to branch from ece26ac to a3159ea Compare March 19, 2024 17:06
@helderco helderco modified the milestones: v0.10.3, v0.10.x Mar 20, 2024
@gerhard gerhard left a comment

Love the PR description - it's world class 🥇🚀

@helderco helderco modified the milestones: v0.10.x, v0.11.0 Mar 28, 2024
Signed-off-by: Helder Correia <174525+helderco@users.noreply.github.com>
Because of Rye: astral-sh/rye#895

Signed-off-by: Helder Correia <174525+helderco@users.noreply.github.com>
@helderco helderco force-pushed the helder/dev-3558-improve-performance-with-uv-and-add-basic-configurability-to branch from a3159ea to 6ecefa9 Compare March 30, 2024 10:54
@helderco

Added missing tests.

Signed-off-by: Helder Correia <174525+helderco@users.noreply.github.com>
@helderco helderco force-pushed the helder/dev-3558-improve-performance-with-uv-and-add-basic-configurability-to branch from 6ecefa9 to 1608b08 Compare March 30, 2024 11:13
@helderco helderco merged commit 5c96054 into dagger:main Apr 1, 2024
43 checks passed
@helderco helderco deleted the helder/dev-3558-improve-performance-with-uv-and-add-basic-configurability-to branch April 1, 2024 17:40
vikram-dagger pushed a commit to vikram-dagger/dagger that referenced this pull request May 3, 2024
* Improve performance in the Python runtime module

Signed-off-by: Helder Correia <174525+helderco@users.noreply.github.com>

* Fix tests

Signed-off-by: Helder Correia <174525+helderco@users.noreply.github.com>

* Add change log

Signed-off-by: Helder Correia <174525+helderco@users.noreply.github.com>

* Add setting for overriding uv version

Signed-off-by: Helder Correia <174525+helderco@users.noreply.github.com>

* Remove root editable install in requirements.lock file

Because of Rye: astral-sh/rye#895

Signed-off-by: Helder Correia <174525+helderco@users.noreply.github.com>

* Add tests

Signed-off-by: Helder Correia <174525+helderco@users.noreply.github.com>

---------

Signed-off-by: Helder Correia <174525+helderco@users.noreply.github.com>
Development

Successfully merging this pull request may close these issues.

Python modules: dagger init should create code from template in src/main/__init__.py instead of src/main.py