[Bug]: Orphan Hot Aisle instances because of min reservation period #3241

@jvstme

Description

Steps to reproduce

  1. Provision an instance with 2x MI300X and 26x Xeon Platinum 8470 on Hot Aisle.
  2. Terminate it.

Actual behaviour

The instance is stuck in terminating for 15 minutes.

> dstack fleet
 FLEET        INSTANCE  BACKEND                   RESOURCES                                     PRICE  STATUS       CREATED     
 funny-gecko  0         hotaisle (us-michigan-1)  cpu=26 mem=448GB disk=12288GB MI300X:192GB:2  $3.98  terminating  20 mins ago

It is then marked as terminated, although it keeps running on Hot Aisle.

The reason is that Hot Aisle VMs now have a minimum reservation period during which they cannot be deleted.
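The issue does not say how long the minimum reservation period is or whether Hot Aisle's API reports it. Assuming the backend can learn the VM's creation time and the period length, deciding whether a delete request can succeed reduces to simple time arithmetic. A minimal sketch; `reservation_remaining` and `can_delete` are hypothetical helpers, not existing dstack functions:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def reservation_remaining(
    created_at: datetime,
    min_reservation: timedelta,
    now: Optional[datetime] = None,
) -> timedelta:
    """Return how long until the VM's minimum reservation period elapses.

    Hypothetical helper: assumes created_at is timezone-aware and that the
    period length is known. Returns timedelta(0) once deletion is allowed.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    remaining = created_at + min_reservation - now
    # Never report a negative remaining time
    return max(remaining, timedelta(0))


def can_delete(
    created_at: datetime,
    min_reservation: timedelta,
    now: Optional[datetime] = None,
) -> bool:
    """True if the reservation period has fully elapsed."""
    return reservation_remaining(created_at, min_reservation, now) == timedelta(0)
```

For example, a VM created at 17:00 with a hypothetical 1-hour minimum reservation has 40 minutes remaining at 17:20, and `can_delete` becomes true at 18:00.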

Expected behaviour

Keep the instance in terminating until it can actually be deleted.
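One way to get this behaviour is to move an instance to terminated only after the provider confirms deletion, and otherwise leave it in terminating for the next background pass. The sketch below is illustrative only: `Instance`, `InstanceStatus`, and `process_terminating` are hypothetical stand-ins, not dstack's actual models or the code in process_instances.py, and `terminate` stands in for a backend call such as the Hot Aisle `terminate_instance` seen in the traceback.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class InstanceStatus(str, Enum):
    TERMINATING = "terminating"
    TERMINATED = "terminated"


@dataclass
class Instance:
    name: str
    status: InstanceStatus = InstanceStatus.TERMINATING
    last_error: Optional[str] = None


def process_terminating(instance: Instance, terminate: Callable[[str], None]) -> None:
    """One background pass over a terminating instance (hypothetical).

    The instance is marked terminated only if the provider call succeeds.
    On failure it stays in terminating and is retried on the next pass,
    instead of being marked terminated after a fixed number of attempts.
    """
    if instance.status != InstanceStatus.TERMINATING:
        return
    try:
        terminate(instance.name)
    except Exception as e:
        # e.g. Hot Aisle rejecting deletion during the minimum reservation period
        instance.last_error = repr(e)
        return
    instance.status = InstanceStatus.TERMINATED
    instance.last_error = None
```

Retrying on every error is a trade-off: an error that never clears would keep the instance in terminating indefinitely. A real fix would probably keep retrying only while the reservation period has not elapsed (or on errors known to be transient) and still alert the user otherwise.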

Also, do our best to communicate this limitation to the user. Possible options include describing it in the docs, adding offer notes to the run plan, and asking for additional confirmation in the CLI before terminating instances whose reservation period has not yet elapsed.
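The CLI confirmation could look roughly like this, assuming the server can tell the CLI each instance's creation time and minimum reservation period (an assumption; the issue does not say this data is available). `InstanceInfo`, `reservation_warnings`, and `confirm_termination` are hypothetical names, not part of the dstack CLI:

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, List, Optional


@dataclass
class InstanceInfo:
    name: str
    backend: str
    created_at: datetime
    # None means the backend has no minimum reservation period
    min_reservation: Optional[timedelta] = None


def reservation_warnings(instances: List[InstanceInfo], now: datetime) -> List[str]:
    """Return one warning per instance whose reservation period has not elapsed."""
    warnings = []
    for inst in instances:
        if inst.min_reservation is None:
            continue
        remaining = inst.created_at + inst.min_reservation - now
        if remaining > timedelta(0):
            # Round up so that a few seconds left still shows as 1 min
            minutes = math.ceil(remaining.total_seconds() / 60)
            warnings.append(
                f"{inst.name} ({inst.backend}) cannot be deleted for another "
                f"{minutes} min and will stay in terminating until then."
            )
    return warnings


def confirm_termination(
    instances: List[InstanceInfo],
    now: Optional[datetime] = None,
    ask: Callable[[str], str] = input,
) -> bool:
    """Ask for extra confirmation only if some reservation has not elapsed."""
    if now is None:
        now = datetime.now(timezone.utc)
    warnings = reservation_warnings(instances, now)
    if not warnings:
        return True
    for w in warnings:
        print(f"Warning: {w}")
    return ask("Terminate anyway? [y/N] ").strip().lower() == "y"
```

Instances on backends without a reservation period skip the extra prompt entirely, so the confirmation only appears when it is relevant.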

dstack version

0.19.34

Server logs

[17:17:41] ERROR    dstack._internal.server.background.tasks.process_instances:962 Failed all attempts to terminate instance funny-gecko-0. Please        
                    terminate the instance manually to avoid unexpected charges. Error: HTTPError('400 Client Error: Bad Request for url:                 
                    https://admin.hotaisle.app/api/teams/team-name/virtual_machines/enc1-gpuvm005/')                                                       
                    Traceback (most recent call last):                                                                                                    
                      File "/dstack/src/dstack/_internal/server/background/tasks/process_instances.py", line 941, in _terminate    
                        await run_async(                                                                                                                  
                      File "/dstack/src/dstack/_internal/utils/common.py", line 21, in run_async                                   
                        return await asyncio.get_running_loop().run_in_executor(None, func_with_args)                                                     
                      File "/usr/lib64/python3.10/concurrent/futures/thread.py", line 58, in run                                                          
                        result = self.fn(*self.args, **self.kwargs)                                                                                       
                      File "/dstack/src/dstack/_internal/core/backends/hotaisle/compute.py", line 126, in terminate_instance       
                        self.api_client.terminate_virtual_machine(vm_name)                                                                                
                      File "/dstack/src/dstack/_internal/core/backends/hotaisle/api_client.py", line 83, in                        
                    terminate_virtual_machine                                                                                                             
                        response.raise_for_status()                                                                                                       
                      File "/dstack/venv/lib64/python3.10/site-packages/requests/models.py", line 1024, in raise_for_status        
                        raise HTTPError(http_error_msg, response=self)                                                                                    
                    requests.exceptions.HTTPError: 400 Client Error: Bad Request for url:                                                                 
                    https://admin.hotaisle.app/api/teams/team-name/virtual_machines/enc1-gpuvm005/

Additional information

We raised concerns with Hot Aisle about the limitations of minimum reservation periods. They will discuss it internally.

Labels

bug (Something isn't working)