Conflicts with other profiling tools #56

@FindHao

Hi, I'm using DCGM on a server with multiple A100 GPUs. When another profiling tool is running, such as a CUPTI-based profiler, dcgmi raises the error "the third-party profiling module returned an unrecoverable error". That makes sense. But if the other profiler only profiles one GPU and I point dcgmi at a different GPU, for example with dcgmi -i 1, the error doesn't make sense.
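
A rough sketch of the scenario, for illustration only. It assumes Nsight Compute (`ncu`) as the CUPTI-based profiler, a placeholder application `./my_app`, and profiling field 1002 (`DCGM_FI_PROF_SM_ACTIVE`) as the metric being watched; the actual tools, GPU indices, and fields involved may differ.

```shell
#!/usr/bin/env bash
# Hypothetical reproduction sketch: profiler choice, app name, and field ID are assumptions.
PROFILED_GPU=0      # GPU the other profiler is attached to
MONITORED_GPU=1     # GPU that DCGM is asked to watch
FIELD_ID=1002       # DCGM_FI_PROF_SM_ACTIVE, a profiling-module field

# Command a CUPTI-based profiler session might use (Nsight Compute assumed).
PROFILER_CMD="ncu --devices ${PROFILED_GPU} ./my_app"
# DCGM query restricted to the GPU the profiler is not touching.
DCGM_CMD="dcgmi dmon -i ${MONITORED_GPU} -e ${FIELD_ID} -c 5"

echo "terminal 1: ${PROFILER_CMD}"
echo "terminal 2: ${DCGM_CMD}"

# Run the DCGM side only if dcgmi is installed; the profiler should already be running.
if command -v dcgmi >/dev/null 2>&1; then
  ${DCGM_CMD} || echo "dcgmi failed with exit code $?"
fi
```

With the profiler active on GPU 0 only, the expectation is that the second command succeeds, since it only asks for metrics from GPU 1.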

Is it possible to add a feature so that DCGM still works when the specified GPUs are not in use by other processes and will not be used by them in the future?
