[Bug]: Fleet plan includes offers from backends without multi-node support with placement: cluster #2300

@r4victor

Description

Steps to reproduce

  1. Create a fleet configuration with placement: cluster.
  2. Apply configuration.
  3. Notice offers from clouds that do not support multi-node:
✗ dstack apply -f .dstack/confs/fleet.yaml            
 Project        r4victor-main                          
 User           victor                                 
 Configuration  .dstack/confs/fleet.yaml               
 Type           fleet                                  
 Fleet type     cloud                                  
 Nodes          2                                      
 Placement      cluster                                
 Resources      2..xCPU, 8GB.., 8xa100, 100GB.. (disk) 
 Spot policy    on-demand                              

 #  BACKEND  REGION            INSTANCE     RESOURCES        SPOT  PRICE      
 1  lambda   europe-central-1  gpu_8x_a100  124xCPU,         no    $10.32     
                                            1933GB, 8xA100                    
                                            (40GB),                           
                                            6598.7GB (disk)                   
 2  lambda   asia-northeast-1  gpu_8x_a100  124xCPU,         no    $10.32     
                                            1933GB, 8xA100                    
                                            (40GB),                           
                                            6598.7GB (disk)                   
 3  vastai   us-washington     17831846     96xCPU, 1814GB,  no    $13.8944   
                                            8xA100 (80GB),                    
                                            100.0GB (disk)                    
    ...                                                                       
 Shown 3 of 37 offers, $51.2073 max

Specifying 8xa100 in the resources helps bring offers from backends without multi-node support to the top of the list.

Actual behaviour

This is caused by the multi-node check, which relies on fleet_model, and fleet_model does not exist when the plan is requested:

multinode = fleet.spec.configuration.placement == InstanceGroupPlacement.CLUSTER

Note that offers from backends without multi-node support are filtered out when the fleet is actually provisioned, so this bug only affects the fleet plan.
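One possible direction, sketched below under stated assumptions, is to derive the multi-node flag from the fleet spec alone, so that the same filter applies when only a plan is requested and no fleet_model exists yet. Everything here is a self-contained illustration and not dstack's actual code: `Offer`, `FleetSpec`, `filter_plan_offers`, and the `MULTINODE_BACKENDS` set are hypothetical names, and `InstanceGroupPlacement` is redefined locally only to keep the example runnable.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class InstanceGroupPlacement(str, Enum):
    # Local stand-in for the enum referenced in the issue.
    ANY = "any"
    CLUSTER = "cluster"


# Illustrative placeholder: the real set of multi-node-capable backends
# is defined by dstack, not by this sketch.
MULTINODE_BACKENDS = {"aws"}


@dataclass
class Offer:
    # Hypothetical, simplified offer record.
    backend: str
    instance: str
    price: float


@dataclass
class FleetSpec:
    # Hypothetical, simplified spec: only the field the check needs.
    placement: Optional[InstanceGroupPlacement] = None


def is_multinode(spec: FleetSpec) -> bool:
    # Depends only on the spec, so it works before any fleet model exists.
    return spec.placement == InstanceGroupPlacement.CLUSTER


def filter_plan_offers(spec: FleetSpec, offers: List[Offer]) -> List[Offer]:
    # For cluster placement, keep only offers from multi-node-capable backends;
    # otherwise return the offers unchanged.
    if not is_multinode(spec):
        return list(offers)
    return [o for o in offers if o.backend in MULTINODE_BACKENDS]
```

With a spec whose placement is cluster, the lambda and vastai offers from the plan above would be dropped under this sketch, matching what already happens at provisioning time.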

Expected behaviour

No response

dstack version

master

Server logs

Additional information

No response

Labels

bug: Something isn't working
