
[Bug]: Cannot run configurations on TensorDock #1504

Closed
jvstme opened this issue Aug 5, 2024 · 0 comments · Fixed by #1506
Labels: bug (Something isn't working)
jvstme commented Aug 5, 2024

Steps to reproduce

> cat .dstack.yml
type: dev-environment
ide: vscode

> dstack apply -b tensordock --gpu a100 --region czechrepublic --max-price 2.1

Actual behaviour

 #  BACKEND     REGION         INSTANCE                              RESOURCES                                     SPOT  PRICE    
 1  tensordock  czechrepublic  353c7146-1d73-41ae-b19c-80a578297621  28xCPU, 158GB, 1xA100 (80GB), 158.0GB (disk)  no    $2.035   

Submit a new run? [y/n]: y
shy-ladybug-1 provisioning completed (failed)
All provisioning attempts failed. This is likely due to cloud providers not having enough capacity. Check CLI and server logs for more details.

Expected behaviour

Configuration runs successfully.

dstack version

master

Server logs

[17:43:37] ERROR    dstack._internal.server.background.tasks.process_submitted_jobs:380 job(978f83)shy-ladybug-1-0-0: got exception when launching            
                    353c7146-1d73-41ae-b19c-80a578297621 in tensordock/czechrepublic                                                                          
                    Traceback (most recent call last):                                                                                                        
                      File "/home/jvstme/git/dstack/dstack/src/dstack/_internal/server/background/tasks/process_submitted_jobs.py", line 359, in              
                    _run_job_on_new_instance                                                                                                                  
                        job_provisioning_data = await run_async(                                                                                              
                      File "/home/jvstme/git/dstack/dstack/src/dstack/_internal/server/utils/common.py", line 24, in run_async                                
                        return await asyncio.get_running_loop().run_in_executor(None, func_with_args)                                                         
                      File "/usr/lib64/python3.8/concurrent/futures/thread.py", line 57, in run                                                               
                        result = self.fn(*self.args, **self.kwargs)                                                                                           
                      File "/home/jvstme/git/dstack/dstack/src/dstack/_internal/core/backends/tensordock/compute.py", line 123, in run_job                    
                        return self.create_instance(instance_offer, instance_config)                                                                          
                      File "/home/jvstme/git/dstack/dstack/src/dstack/_internal/core/backends/tensordock/compute.py", line 98, in create_instance             
                        ssh_port={v: k for k, v in resp["port_forwards"].items()}["22"],                                                                      
                    KeyError: '22'
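The failing line in compute.py inverts the `port_forwards` mapping returned by the TensorDock API and then indexes the result by the string `"22"`. A minimal sketch (the `resp` payloads below are hypothetical, since the actual API response is not shown in the logs) reproduces the `KeyError` when no forward for internal port 22 is present in the mapping:

```python
# Hypothetical response shapes; the real TensorDock payload is not in the logs.
resp_ok = {"port_forwards": {"40022": "22", "40080": "80"}}  # external -> internal
resp_bad = {"port_forwards": {"40080": "80"}}                # no forward for port 22

def ssh_port(resp):
    # The expression from compute.py line 98: invert the mapping, index by "22".
    return {v: k for k, v in resp["port_forwards"].items()}["22"]

print(ssh_port(resp_ok))  # prints 40022
try:
    ssh_port(resp_bad)
except KeyError as e:
    print(f"KeyError: {e}")  # prints KeyError: '22' — matches the traceback
```

This suggests the API either omitted the SSH forward for this instance or returned the ports in a different type or shape than the code expects; either way, an unguarded dict lookup turns that into an unhandled exception during provisioning.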

Additional information

No response

@jvstme jvstme added the bug Something isn't working label Aug 5, 2024
@jvstme jvstme self-assigned this Aug 5, 2024
@jvstme jvstme closed this as completed Aug 5, 2024