Show run error on the run page in the UI #1655

@r4victor

Currently, the run error (termination reason) can be seen via the CLI with dstack ps -v or via the API:

 NAME                  BACKEND    INSTANCE           PRICE     STATUS      SUBMITTED     ERROR
 shaggy-husky-1        local      local              $0.0      failed      4 weeks ago   JOB_FAILED                    
                                                                                         (CONTAINER_EXITED_WITH_ERROR) 
 heavy-crab-1          local      local              $0.0      terminated  4 weeks ago   STOPPED_BY_USER               
 tame-fox-1            local      local              $0.0      terminated  4 weeks ago   STOPPED_BY_USER               
 ordinary-wombat-1     local      local              $0.0      done        4 weeks ago   ALL_JOBS_DONE

There should also be a way to see run errors in the UI, as users expect (e.g. #1654). Add an Error field next to Status on the run page. It should display run.termination_reason (run.jobs[0].job_submissions[-1].termination_reason). Here is the CLI logic:

def _get_run_error(run: Run) -> str:
    if run._run.termination_reason is None:
        return ""
    if len(run._run.jobs) > 1:
        return run._run.termination_reason.name
    run_job_termination_reason = _get_run_job_termination_reason(run)
    # For failed runs, also show termination reason to provide more context.
    # For other run statuses, the job termination reason will duplicate run status.
    if run_job_termination_reason is not None and run._run.termination_reason in [
        RunTerminationReason.JOB_FAILED,
        RunTerminationReason.SERVER_ERROR,
        RunTerminationReason.RETRY_LIMIT_EXCEEDED,
    ]:
        return f"{run._run.termination_reason.name}\n({run_job_termination_reason.name})"
    return run._run.termination_reason.name


def _get_run_job_termination_reason(run: Run) -> Optional[JobTerminationReason]:
    for job in run._run.jobs:
        if len(job.job_submissions) > 0:
            if job.job_submissions[-1].termination_reason is not None:
                return job.job_submissions[-1].termination_reason
    return None
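
For the UI side, the same rules could be applied directly to the run object returned by the API. The sketch below is only an illustration: it assumes the run arrives as a plain dict with the field names used above (termination_reason, jobs, job_submissions) and that reasons are serialized as upper-case strings, which may not match the real payload. The names format_run_error, last_job_termination_reason and REASONS_WITH_JOB_DETAILS are hypothetical.

```python
from typing import Any, Optional

# Run termination reasons for which the job-level reason adds context
# (the same three values the CLI logic checks).
REASONS_WITH_JOB_DETAILS = {"JOB_FAILED", "SERVER_ERROR", "RETRY_LIMIT_EXCEEDED"}


def last_job_termination_reason(run: dict[str, Any]) -> Optional[str]:
    """Return the reason from the latest submission of the first job that has one."""
    for job in run.get("jobs", []):
        submissions = job.get("job_submissions", [])
        if submissions and submissions[-1].get("termination_reason") is not None:
            return submissions[-1]["termination_reason"]
    return None


def format_run_error(run: dict[str, Any]) -> str:
    """Build the text for the Error field (assumed payload shape, see above)."""
    reason = run.get("termination_reason")
    if reason is None:
        return ""
    # Multi-job runs show only the run-level reason.
    if len(run.get("jobs", [])) > 1:
        return reason
    job_reason = last_job_termination_reason(run)
    if job_reason is not None and reason in REASONS_WITH_JOB_DETAILS:
        return f"{reason} ({job_reason})"
    return reason


# Hypothetical payload for a failed single-job run.
failed_run = {
    "termination_reason": "JOB_FAILED",
    "jobs": [
        {"job_submissions": [{"termination_reason": "CONTAINER_EXITED_WITH_ERROR"}]}
    ],
}
print(format_run_error(failed_run))
```

Unlike the CLI, this sketch joins the two reasons with a space rather than a newline, on the assumption that the Error field is rendered on a single line next to Status.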
