How many GPUs used counter #45

Open
stephenlienharrell opened this issue Aug 7, 2023 · 2 comments

@stephenlienharrell
Member

We need a counter on the new version that says how many GPUs were used for a job.

@stephenlienharrell
Member Author

We need to separate the per-GPU counter data in order to implement this correctly.

@nicejunjie
Collaborator

A preliminary implementation is done and online for LS6.
Limitations:
1) Raw data for individual GPUs is merged in the database on import, so only the total percentage is available.
2) A few nodes in gpu-a100-small and gpu-dev don't seem to have GPU recording enabled by the monitor, so no GPU data is recorded, e.g.: https://ls6-stats.tacc.utexas.edu/machine/job/1473810/

Possible workaround without changing the database structure: set "event" to "utilization_$gpunumber" instead of "utilization" when importing, then extract "$gpunumber" in views.py.
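
A rough sketch of how that workaround could look, in case it helps. This is not the project's actual import or views.py code: the function names, the (host, event, value) row layout, and the "used" threshold are all assumptions made for illustration. Only the "utilization_$gpunumber" naming comes from the suggestion above.

```python
import re
from collections import defaultdict

# Naming scheme proposed above: "utilization_<gpunumber>".
_GPU_EVENT = re.compile(r"^utilization_(\d+)$")


def tag_event(event, gpu_number):
    """Import side (hypothetical helper): append the GPU index to the event name."""
    return f"{event}_{gpu_number}"


def parse_gpu_number(event):
    """View side (hypothetical helper): return the GPU index, or None if absent."""
    match = _GPU_EVENT.match(event)
    return int(match.group(1)) if match else None


def count_gpus_used(rows, threshold=0.0):
    """Count distinct (host, gpu) pairs whose peak utilization exceeds threshold.

    rows is an iterable of (host, event, value) tuples. That layout is an
    assumption for this sketch, not the real database schema.
    """
    peak = defaultdict(float)
    for host, event, value in rows:
        gpu = parse_gpu_number(event)
        if gpu is None:
            continue  # skip merged "utilization" rows and non-GPU events
        key = (host, gpu)
        peak[key] = max(peak[key], value)
    return sum(1 for value in peak.values() if value > threshold)
```

With rows tagged this way, the per-job counter would be a single call to count_gpus_used over the job's rows. Rows that still carry the merged "utilization" event are skipped, so older data would report zero GPUs rather than a wrong count.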

@stephenlienharrell removed this from the 2.4 Update milestone Sep 16, 2024