I'm running marker-pdf 0.2.8 on a RHEL 7 machine with 8 GPUs. I am trying to leverage all 8 of those GPUs, but only GPU 0 is getting utilized.

Console output, after running the conversion command:
2024-05-23 12:02:29,517 INFO worker.py:1749 -- Started a local Ray instance.
Loaded detection model vikp/surya_det2 on device cuda with dtype torch.float16
Loaded detection model vikp/surya_layout2 on device cuda with dtype torch.float16
Loaded reading order model vikp/surya_order on device cuda with dtype torch.float16
Loaded texify model to cuda with torch.float16 dtype
Converting 40217 pdfs in chunk 1/1 with 5 processes, and storing in path/to/my/output/dir
1%|█▌ | 382/40217 [56:37<80:05:59, 7.24s/it]
Only GPU 0 gets utilized (75%). The other seven show just 4 MiB of memory usage each, with no utilization and no processes tied to them. Output from nvidia-smi:
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Tesla V100-PCIE-16GB On | 00000000:08:00.0 Off | 0 |
| N/A 42C P0 222W / 250W | 10132MiB / 16384MiB | 75% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 1 Tesla V100-PCIE-16GB On | 00000000:0B:00.0 Off | 0 |
| N/A 27C P0 27W / 250W | 4MiB / 16384MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 2 Tesla V100-PCIE-16GB On | 00000000:0E:00.0 Off | 0 |
| N/A 38C P0 26W / 250W | 4MiB / 16384MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 3 Tesla V100-PCIE-16GB On | 00000000:11:00.0 Off | 0 |
| N/A 34C P0 26W / 250W | 4MiB / 16384MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 4 Tesla V100-PCIE-16GB On | 00000000:16:00.0 Off | 0 |
| N/A 27C P0 24W / 250W | 4MiB / 16384MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 5 Tesla V100-PCIE-16GB On | 00000000:19:00.0 Off | 0 |
| N/A 28C P0 29W / 250W | 4MiB / 16384MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 6 Tesla V100-PCIE-16GB On | 00000000:1C:00.0 Off | 0 |
| N/A 32C P0 23W / 250W | 4MiB / 16384MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 7 Tesla V100-PCIE-16GB On | 00000000:22:00.0 Off | 0 |
| N/A 32C P0 25W / 250W | 4MiB / 16384MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 37553 C ray::process_single_pdf 4026MiB |
| 0 N/A N/A 37554 C ray::process_single_pdf 4026MiB |
| 0 N/A N/A 38661 C ray::process_single_pdf 818MiB |
| 0 N/A N/A 73566 C ...conda_envs/py311agentsds/bin/python 1298MiB |
+---------------------------------------------------------------------------------------+
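For what it's worth, as far as I understand, Ray only assigns GPUs to a task when the task declares num_gpus, and it then exports CUDA_VISIBLE_DEVICES inside each worker accordingly. Below is a minimal sketch (not marker's code) to confirm that Ray itself can spread tasks across all 8 devices on this node; if every task reports the same device, the problem is in how the conversion tasks are scheduled rather than in the hardware or driver:

import os
import ray

ray.init()  # auto-detects the GPUs on the node

@ray.remote(num_gpus=1)  # request one whole GPU per task
def which_gpu(task_id):
    # Ray sets CUDA_VISIBLE_DEVICES in each worker to the GPU(s) it assigned
    return task_id, os.environ.get("CUDA_VISIBLE_DEVICES")

# With num_gpus=1 on an 8-GPU node, these 8 tasks should report 8 different devices.
print(ray.get([which_gpu.remote(i) for i in range(8)]))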
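If Ray scheduling does turn out to be the issue, a brute-force workaround is to shard the job manually: launch one conversion process per GPU and pin each to a single device with CUDA_VISIBLE_DEVICES. This is only a sketch; the script name and the --num_chunks/--chunk_idx flags are assumptions standing in for whatever command produced the log above, and the input path is a placeholder:

import os
import subprocess

NUM_GPUS = 8
procs = []
for gpu in range(NUM_GPUS):
    # each child process sees exactly one device, so "cuda" maps to that GPU
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
    procs.append(subprocess.Popen(
        ["python", "convert.py", "path/to/my/input/dir", "path/to/my/output/dir",
         "--num_chunks", str(NUM_GPUS), "--chunk_idx", str(gpu)],
        env=env,
    ))
for p in procs:
    p.wait()  # block until all eight shards finish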