@vishalkadam47
Contributor

This PR introduces GPU support in Dokploy by leveraging Docker Swarm's native GPU resource management capabilities. The implementation follows the Docker Swarm GPU configuration guidelines, enabling seamless integration of GPU resources across the infrastructure.

Key aspects of the implementation:

  • Configured Docker Swarm to advertise GPU resources using DOCKER_RESOURCE_GPU
  • Integrated GPU resource allocation for container deployments
  • Added API endpoint for GPU setup management
  • Implemented system-wide GPU constants for consistent resource handling
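
As a rough illustration of the first bullet, the sketch below builds a Docker `daemon.json` payload that registers the NVIDIA runtime and advertises GPUs to Swarm through `node-generic-resources`. The function name, the `GPU` resource kind, and the UUID are assumptions made for illustration and are not taken from this PR; the kind is named `GPU` so that it would line up with the `DOCKER_RESOURCE_GPU` variable mentioned above.

```typescript
// Hypothetical sketch: the names below are illustrative, not Dokploy's actual code.
interface DockerDaemonConfig {
  runtimes: Record<string, { path: string; runtimeArgs: string[] }>;
  "default-runtime": string;
  "node-generic-resources": string[];
}

// Build a daemon.json payload that registers the NVIDIA runtime and
// advertises each GPU to Swarm as a generic resource of kind "GPU".
function buildDaemonConfig(gpuUuids: string[]): DockerDaemonConfig {
  return {
    runtimes: {
      nvidia: { path: "nvidia-container-runtime", runtimeArgs: [] },
    },
    "default-runtime": "nvidia",
    "node-generic-resources": gpuUuids.map((uuid) => `GPU=${uuid}`),
  };
}

// "GPU-00000000" is a placeholder UUID, not a real device identifier.
const exampleDaemonConfig = buildDaemonConfig(["GPU-00000000"]);
console.log(JSON.stringify(exampleDaemonConfig, null, 2));
```

The Docker daemon would need a restart after such a file is written for the runtime and resources to take effect.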

The GPU support works at the server level, allowing applications to use GPU resources when they are available.

This makes Dokploy suitable for deploying and managing GPU-dependent applications through Docker Swarm's native resource management, while maintaining compatibility with non-GPU environments.

@Siumauricio Siumauricio self-assigned this Oct 27, 2024
Contributor

@Siumauricio Siumauricio left a comment

The implementation looks good, though I haven't tested it yet. We have two ways of deploying applications:

  1. Dokploy Server (done)
  2. Remote servers (missing)

Can you add support for remote servers?

@vishalkadam47
Contributor Author

vishalkadam47 commented Oct 27, 2024

The GPU support in this PR works at both the Dokploy server level and the remote server level. This means that GPU-enabled containers can be deployed on the Dokploy server and on remote servers connected to the Dokploy platform.

Remote Server GPU Support: The user needs to ensure that the necessary GPU hardware and drivers are installed and configured on the remote server. Once this is done, the user can add the GPU configuration to their docker-compose.yml file, and the GPU-enabled containers will be able to use the GPU resources on the remote server.

    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities:
                - gpu
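
If the compose definition is generated in code rather than written by hand, the same reservation could be attached with a small helper. This is only a sketch: the `ComposeService` type is a minimal, hypothetical subset of a compose service, and `withGpuReservation` is an illustrative name, not something from this PR.

```typescript
// Hypothetical types: a minimal subset of a compose service definition.
interface DeviceReservation {
  driver: string;
  count: number | "all";
  capabilities: string[];
}

interface ComposeService {
  image: string;
  deploy?: {
    resources?: { reservations?: { devices?: DeviceReservation[] } };
  };
}

// Return a copy of the service with an NVIDIA GPU reservation appended,
// mirroring the deploy.resources.reservations.devices block shown above.
function withGpuReservation(
  service: ComposeService,
  count: number | "all" = "all",
): ComposeService {
  const devices = service.deploy?.resources?.reservations?.devices ?? [];
  return {
    ...service,
    deploy: {
      ...service.deploy,
      resources: {
        ...service.deploy?.resources,
        reservations: {
          ...service.deploy?.resources?.reservations,
          devices: [...devices, { driver: "nvidia", count, capabilities: ["gpu"] }],
        },
      },
    },
  };
}

// "example/app:latest" is a placeholder image name.
const gpuService = withGpuReservation({ image: "example/app:latest" });
console.log(JSON.stringify(gpuService, null, 2));
```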

Additional Implementation: If required, we can add an option under Deployment > Setup Server for a new server that, when checked, runs a script to verify that the necessary GPU hardware and drivers are installed and configured on the remote server.

For Reference: You can review the recording here

Contributor

@Siumauricio Siumauricio left a comment

I don't know where to activate it: I don't see any way to trigger the setupGPU method other than through the API. You need to expose a way to do it through the UI.

@vishalkadam47 vishalkadam47 changed the title Implement Server-Level GPU Support for Docker Swarm and Add Blender Template Implement Remote server and Dokploy Server - GPU Support for Docker Swarm Nov 2, 2024
Contributor Author

@vishalkadam47 vishalkadam47 left a comment

Made the changes as suggested.

Contributor

@Siumauricio Siumauricio left a comment

I tested it: it works great and the UI looks awesome.

I rented a GPU Droplet on DigitalOcean and opened the GPU Setup, and I got this, which is fine:

Screenshot 2024-11-07 at 12 32 41 AM

Then I tried to set up the GPU and got this error:

"NVIDIA drivers or runtime not installed. Please install them first.",

Then I looked into the code and commented out this code. I think we need to change something here: the code validates whether the Docker GPU runtime is already enabled in Docker Swarm, but that is wrong because the setup has not run yet. We should run the setup first and only then perform that validation.

GPU Status Check Result: {
  driverInstalled: true,
  driverVersion: '535.183.06',
  runtimeInstalled: false,
  runtimeConfigured: false,
  availableGPUs: 1,
  swarmEnabled: false,
  gpuResources: 0,
  gpuModel: 'NVIDIA H100 80GB HBM3',
  memoryInfo: '81559 MiB',
  cudaSupport: true,
  cudaVersion: '12.2'
}

image

With that code commented out, the setup works:

Screenshot 2024-11-07 at 12 39 00 AM

Also, before running the GPU setup I tried to deploy the Blender template and got this error:
Screenshot 2024-11-07 at 12 39 30 AM

After activating the GPU, it works:
Screenshot 2024-11-07 at 12 40 13 AM

I think it works very well; it just needs the validation changes I mentioned above.
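
The ordering problem described in this comment could be sketched as two separate checks: one for prerequisites before setup runs, and one for verification afterwards. The `GpuStatus` shape is a subset of the "GPU Status Check Result" above; `canStartSetup` and `isSetupComplete` are hypothetical names, and which fields belong in each check is an assumption rather than the PR's actual logic.

```typescript
// Subset of the fields shown in the "GPU Status Check Result" output.
interface GpuStatus {
  driverInstalled: boolean;
  runtimeInstalled: boolean;
  runtimeConfigured: boolean;
  availableGPUs: number;
  swarmEnabled: boolean;
  gpuResources: number;
}

// Hypothetical helper: prerequisites that must hold BEFORE setup runs.
// It deliberately ignores runtimeConfigured, swarmEnabled and gpuResources,
// because those are outcomes of the setup itself.
function canStartSetup(s: GpuStatus): boolean {
  return s.driverInstalled && s.availableGPUs > 0;
}

// Hypothetical helper: conditions to verify AFTER setup has run.
function isSetupComplete(s: GpuStatus): boolean {
  return s.runtimeConfigured && s.swarmEnabled && s.gpuResources > 0;
}

// Values copied from the status output reported for the GPU droplet.
const dropletStatus: GpuStatus = {
  driverInstalled: true,
  runtimeInstalled: false,
  runtimeConfigured: false,
  availableGPUs: 1,
  swarmEnabled: false,
  gpuResources: 0,
};

console.log(canStartSetup(dropletStatus)); // true
console.log(isSetupComplete(dropletStatus)); // false
```

Whether `runtimeInstalled` should also be a prerequisite is left out of this sketch on purpose, since the reported droplet worked with it set to false.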

@vishalkadam47
Contributor Author

NVIDIA Container Runtime is not installed. Please follow these steps:

  1. Install NVIDIA Container Runtime:
    Ubuntu/Debian:
        curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | \
          sudo apt-key add -
        distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
        curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.list | \
          sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
        sudo apt-get update
        sudo apt-get install nvidia-container-runtime

Note: some VPS providers may have nvidia-container-runtime pre-installed. @Siumauricio, I will update the checks to handle this properly.

  2. NVIDIA Container Runtime is installed but not configured.
    Configuration is handled when we click the Enable GPU button in the UI.
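
The two situations described above (runtime missing versus installed but not configured) could be distinguished with a small state function. This is a minimal sketch: `runtimeState` and `runtimeMessage` are hypothetical names, and the wording for the "ready" case is an assumption, since the thread only gives messages for the other two.

```typescript
type RuntimeState = "missing" | "unconfigured" | "ready";

// Classify the NVIDIA Container Runtime using the two status flags
// that appear in the GPU status check result.
function runtimeState(s: {
  runtimeInstalled: boolean;
  runtimeConfigured: boolean;
}): RuntimeState {
  if (!s.runtimeInstalled) return "missing";
  if (!s.runtimeConfigured) return "unconfigured";
  return "ready";
}

// Map each state to a user-facing message.
function runtimeMessage(state: RuntimeState): string {
  switch (state) {
    case "missing":
      return "NVIDIA Container Runtime is not installed. Please install it first.";
    case "unconfigured":
      return "NVIDIA Container Runtime is installed but not configured. Click Enable GPU to configure it.";
    case "ready":
      // Assumed wording: not taken from the thread.
      return "NVIDIA Container Runtime is installed and configured.";
  }
}

console.log(runtimeMessage(runtimeState({ runtimeInstalled: true, runtimeConfigured: false })));
```

Keeping the "missing" and "unconfigured" cases separate means only the first one needs to block the Enable GPU action.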

@Siumauricio
Contributor

I don't think that's the solution, because I didn't have to install anything on that GPU droplet. There is only one validation that should be skipped when running the GPU configuration, and that is the portion of the code that I commented out.

@vishalkadam47
Contributor Author

vishalkadam47 commented Nov 7, 2024

I understand your point. Some GPU VM instances come with everything pre-installed, so no additional setup is needed. However, the daemon.json configuration requires the NVIDIA Container Runtime to be installed, which is why we need a check that notifies users to install it if it's missing.

To improve this, I've separated out that check, which should make it more effective. I'll push an update shortly and would appreciate your feedback.

This check is also valuable for local GPU setups, since we use the same components for both Dokploy Server and Remote Server.

vishalkadam47 and others added 3 commits November 11, 2024 23:18
- Add gpu status refresh with useEffect
- Update docker-compose.yml configuration
- Modify gpu setup scripts
- Improve gpu support checks
@vishalkadam47 vishalkadam47 changed the title Implement Remote server and Dokploy Server - GPU Support for Docker Swarm feat: Implement Remote server and Dokploy Server - GPU Support for Docker Swarm Nov 13, 2024
Contributor Author

@vishalkadam47 vishalkadam47 left a comment

removed console logs and error handling

Contributor Author

@vishalkadam47 vishalkadam47 left a comment

removed sleep function and updated import

Contributor

@Siumauricio Siumauricio left a comment

Thank you for this huge PR ❤️

@Siumauricio Siumauricio merged commit af84942 into Dokploy:canary Nov 17, 2024
@Siumauricio Siumauricio deleted the feature/gpu-support-blender-template branch November 17, 2024 15:48
@Siumauricio Siumauricio mentioned this pull request Nov 18, 2024