nv-tusharma
Contributor

@nv-tusharma nv-tusharma commented Jul 31, 2025

Overview:

Milestone 1 in DEP:
ai-dynamo/enhancements#8

This PR adds a Dynamo base container that can be used for faster prototyping and faster developer validation of PR changes. As part of the PR, we change the pre-merge check to build this Dynamo base image instead of the full fat vLLM container. As a result of this change and further optimizations to the build, PR validation time drops from approximately 1 hour to approximately 20 minutes.

Details:

  • Introduce a Dockerfile (Dockerfile.dynamo) which provides NIXL/UCX, Dynamo, and NATS/ETCD
  • Change the GitHub pre-merge workflow to build the Dynamo base container instead of the full fat vLLM container
  • Remove Dockerfile.none and add Dockerfile.dynamo as a replacement
  • Add integration testing as a pre-merge check for validation, using docker compose to start the NATS and ETCD services

Where should the reviewer start?

  • Dockerfile.dynamo

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

ref: OPS-565, OPS-607

Summary by CodeRabbit

  • New Features
    • Introduced a new multi-stage Dockerfile for the Dynamo project, supporting multiple architectures and enhanced build processes.
  • Chores
    • Updated the build script and Docker-related configurations to replace the "NONE" framework with "DYNAMO".
    • Updated the .dockerignore file to exclude certain Dockerfiles from the build context.
    • Removed the old Dockerfile for the "NONE" framework.
    • Enhanced the GitHub Actions workflow to support the "dynamo" framework, expanded testing, and improved service management during CI.
  • Tests
    • Marked a failing integration test to be ignored during CI runs.

@github-actions github-actions bot added the build label Jul 31, 2025
@nv-tusharma nv-tusharma changed the title build: introduce dynamo base container for faster Github CI testing build: introduce dynamo base container for faster Github CI validation Jul 31, 2025
@nv-tusharma nv-tusharma changed the title build: introduce dynamo base container for faster Github CI validation build: introduce dynamo base container Jul 31, 2025
@nv-tusharma
Contributor Author

nv-tusharma commented Aug 7, 2025

Adding Misha's comments here for transparency:

  1. Make sure that dependencies and installation have separate stages.
  • The Dockerfile uses a multi-stage approach with three primary stages: base, wheel_builder, and dev. The base stage builds all the Dynamo dependencies (NIXL, UCX, Python deps), wheel_builder builds the Dynamo wheel, and dev installs the artifacts built in the previous stages into the container.
  2. Library configuration should be resolved in ld.so.conf.
  • Removed references to LD_LIBRARY_PATH and used ld.so.conf instead.
  3. The final directory structure shouldn't contain a directory like /workspace; it's bad practice.
  • Changed to host artifacts in /opt/dynamo instead of /workspace.
  4. Comments within a single instruction step must be removed or moved before the step.
  • I've kept these for now since they clarify which apt dependencies map to which package being installed.
  5. Python virtual environment: if it works fine, maybe we can compile a minimal version of Python with pip and use it, giving us full control over it.
  • See L171-177; we are using a Python virtual environment and installing all Python-specific packages to that location.
  6. NIXL build: wondering if the build configuration could handle the logic you placed in the Dockerfile.
  • OPS-597 tracks this task.
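
For readers who have not opened Dockerfile.dynamo, here is a rough sketch of how the points above can fit together. Only the stage names (base, wheel_builder, dev) and the /opt/dynamo prefix come from this thread. The base image, the /opt/example/lib directory, the venv and wheelhouse paths, and the wheel build command are placeholders assumed for illustration, so this is not the actual file.

```dockerfile
# Sketch only: the base image, paths, and build commands below are assumptions.
ARG BASE_IMAGE=ubuntu:24.04

# Stage 1 (base): system and Python dependencies shared by the later stages.
FROM ${BASE_IMAGE} AS base
RUN apt-get update && \
    apt-get install -y --no-install-recommends python3 python3-venv python3-pip && \
    rm -rf /var/lib/apt/lists/*

# Register a hypothetical native library directory with the dynamic linker
# instead of exporting LD_LIBRARY_PATH.
RUN mkdir -p /opt/example/lib && \
    echo "/opt/example/lib" > /etc/ld.so.conf.d/example.conf && \
    ldconfig

# Create the virtual environment that receives all Python packages.
RUN python3 -m venv /opt/dynamo/venv
ENV PATH="/opt/dynamo/venv/bin:${PATH}"

# Stage 2 (wheel_builder): build a wheel from the source tree.
# Assumes a standard pyproject.toml; the real project may use another build tool.
FROM base AS wheel_builder
WORKDIR /build
COPY . /build
RUN pip install build && python -m build --wheel --outdir /build/dist

# Stage 3 (dev): install only the built artifacts, hosted under /opt/dynamo.
FROM base AS dev
COPY --from=wheel_builder /build/dist/ /opt/dynamo/wheelhouse/
RUN pip install /opt/dynamo/wheelhouse/*.whl
WORKDIR /opt/dynamo
```

Because dev starts from base rather than from wheel_builder, the build tooling and source tree used to produce the wheel do not end up in the final image.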

@nv-tusharma
Contributor Author

@ryanolson

Great start. Are we building this base image every time? If so are we getting good cache reuse if the dockerfile hasn't changed?

We don't have caching set up for containers built via GitHub due to issues with the container registry. As a next step, we will look to set up an external registry that we can use to enable external caching. For now, the Dockerfile relies on Docker's default build cache: instructions that are expected to change more frequently (such as those affected by Dynamo source changes) are placed towards the end of the Dockerfile rather than the beginning, so the earlier layers can be reused.
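
To illustrate the ordering idea, a minimal sketch (not the PR's Dockerfile) might look like the following. The base image, requirements.txt, and the /opt/dynamo/src path are assumptions made for the example.

```dockerfile
# Sketch only: file names, paths, and commands are assumptions.
FROM ubuntu:24.04

# Rarely changing layers first: system packages and the virtual environment.
RUN apt-get update && \
    apt-get install -y --no-install-recommends python3 python3-venv && \
    rm -rf /var/lib/apt/lists/*
RUN python3 -m venv /opt/dynamo/venv
ENV PATH="/opt/dynamo/venv/bin:${PATH}"

# Dependency manifest next: these layers are rebuilt only when
# requirements.txt itself changes.
COPY requirements.txt /tmp/requirements.txt
RUN pip install -r /tmp/requirements.txt

# Frequently changing source last: a source edit invalidates the cache
# only from this COPY onward.
COPY . /opt/dynamo/src
RUN pip install /opt/dynamo/src
```

With this layout, a change to the source tree reuses the cached package and dependency layers, provided the cache from the previous build is still available on the builder.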

@ryanolson
Contributor

Can we use NGC?

@nv-tusharma
Contributor Author

@ryanolson NGC would definitely be a viable option.

@nv-anants nv-anants merged commit 10f4302 into main Aug 7, 2025
12 of 13 checks passed
@nv-anants nv-anants deleted the dynamo-base-container-dev branch August 7, 2025 20:24
mkhazraee pushed a commit to whoisj/dynamo that referenced this pull request Aug 8, 2025
Signed-off-by: Tushar Sharma <tusharma@nvidia.com>
Co-authored-by: Anant Sharma <anants@nvidia.com>
krishung5 pushed a commit that referenced this pull request Aug 12, 2025
Signed-off-by: Tushar Sharma <tusharma@nvidia.com>
Co-authored-by: Anant Sharma <anants@nvidia.com>