
Only 1 Tesla T4 GPU is used for 25 streams and encoding can't keep up on 2.6.2 #5590

Closed
alfred-stokespace opened this issue Sep 20, 2023 · 5 comments

Short description

I have 25 inbound RTMP streams (1080p). Only some of them can sustain a broadcast status of around 1.00x; the rest drop to 0.01x.
I have evidence that only one of the four T4 GPUs on this EC2 instance is being engaged.

Environment

  • Ubuntu 20.04.6 LTS
  • Java version: openjdk 11.0.20 2023-07-18
  • Ant Media Server version: 2.6.2
  • Browser name and version: N/A

Steps to reproduce

  1. Install 2.6.2 on g4dn.12xlarge
  2. Have 25 RTMP 1080p 30 fps streams (about 4-8 Mbps each) being transcoded to two renditions (480p and 720p)
  3. Check the nvidia-smi output
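To make step 3 easier to repeat, a small helper like the following can flag when a single GPU is doing all the work. This is an illustrative sketch: the CSV sample string mirrors the values from the nvidia-smi dump below rather than live data, and the 10% idle threshold is an arbitrary choice.

```python
# Sketch: summarize per-GPU utilization from nvidia-smi's CSV query mode, e.g.
#   nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv,noheader,nounits
# The sample string below is illustrative, not captured output.

def parse_gpu_util(csv_text):
    """Return {gpu_index: utilization_percent} from nvidia-smi CSV output."""
    util = {}
    for line in csv_text.strip().splitlines():
        index, gpu_util, _mem_used = [f.strip() for f in line.split(",")]
        util[int(index)] = int(gpu_util)
    return util

def imbalanced(util, threshold=10):
    """True if exactly one GPU is busy while the rest sit below `threshold` percent."""
    busy = [u for u in util.values() if u >= threshold]
    return len(util) > 1 and len(busy) == 1

sample = """\
0, 97, 14562
1, 0, 5
2, 0, 129
3, 0, 133"""

u = parse_gpu_util(sample)
print(u)              # {0: 97, 1: 0, 2: 0, 3: 0}
print(imbalanced(u))  # True
```

Run periodically (e.g. under `watch`), this makes the single-GPU pileup obvious without eyeballing the full nvidia-smi table.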

Expected behavior

Same performance as 2.4.3, which can handle the exact same camera sources and rendition count/type.
On 2.4.3, all four GPUs are utilized and all 25 streams keep up at 99 to 101 percent broadcast status.

Actual behavior

Only a fraction of the streams can keep up, and nvidia-smi shows the following:

Wed Sep 20 14:56:59 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:1B.0 Off |                    0 |
| N/A   60C    P0    41W /  70W |  14562MiB / 15360MiB |     97%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla T4            On   | 00000000:00:1C.0 Off |                    0 |
| N/A   28C    P8    15W /  70W |      5MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla T4            On   | 00000000:00:1D.0 Off |                    0 |
| N/A   32C    P0    26W /  70W |    129MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   34C    P0    27W /  70W |    133MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1118      C   ...11-openjdk-amd64/bin/java    14638MiB |
|    2   N/A  N/A      1118      C   ...11-openjdk-amd64/bin/java      119MiB |
|    3   N/A  N/A      1118      C   ...11-openjdk-amd64/bin/java      123MiB |
+-----------------------------------------------------------------------------+

Logs

will send to support upon request.

@alfred-stokespace (Author)

Alongside the nvidia-smi output, here is the top output:

top - 14:57:56 up 12 min,  0 users,  load average: 11.03, 7.36, 3.29
Tasks: 522 total,   1 running, 521 sleeping,   0 stopped,   0 zombie
%Cpu(s):  4.6 us,  4.0 sy,  0.0 ni, 91.3 id,  0.0 wa,  0.0 hi,  0.1 si,  0.0 st
MiB Mem : 191074.8 total, 170309.6 free,  18189.4 used,   2575.8 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used. 170512.8 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   1118 antmedia  20   0  113.0g  16.9g   7.6g S 398.0   9.0  29:50.17 java
   1040 root     -51   0       0      0      0 S   3.0   0.0   0:11.96 irq/92-nvidia
   1044 root     -51   0       0      0      0 S   2.0   0.0   0:06.16 irq/96-nvidia
    472 root      20   0       0      0      0 I   0.3   0.0   0:00.01 kworker/31:2-mm_percpu_wq
  10596 root      20   0 2353880  23924  12352 S   0.3   0.0   0:00.62 ssm-session-wor
  12493 root      20   0   11552   4372   3228 R   0.3   0.0   0:00.02 top
      1 root      20   0  169572  12868   8244 S   0.0   0.0   0:04.23 systemd
      2 root      20   0       0      0      0 S   0.0   0.0   0:00.01 kthreadd
      3 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_gp
      4 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_par_gp
      5 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 slub_flushwq
      6 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 netns
      8 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 kworker/0:0H-events_highpri
      9 root      20   0       0      0      0 I   0.0   0.0   0:00.21 kworker/u96:0-events_unbound
     10 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 mm_percpu_wq
     11 root      20   0       0      0      0 S   0.0   0.0   0:00.00 rcu_tasks_rude_
     12 root      20   0       0      0      0 S   0.0   0.0   0:00.00 rcu_tasks_trace
     13 root      20   0       0      0      0 S   0.0   0.0   0:00.00 ksoftirqd/0
     14 root      20   0       0      0      0 I   0.0   0.0   0:00.34 rcu_sched
     15 root      rt   0       0      0      0 S   0.0   0.0   0:00.00 migration/0
     16 root     -51   0       0      0      0 S   0.0   0.0   0:00.00 idle_inject/0
     18 root      20   0       0      0      0 S   0.0   0.0   0:00.00 cpuhp/0
     19 root      20   0       0      0      0 S   0.0   0.0   0:00.00 cpuhp/1
     20 root     -51   0       0      0      0 S   0.0   0.0   0:00.00 idle_inject/1
     21 root      rt   0       0      0      0 S   0.0   0.0   0:00.87 migration/1
     22 root      20   0       0      0      0 S   0.0   0.0   0:00.00 ksoftirqd/1
     24 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 kworker/1:0H-events_highpri
     25 root      20   0       0      0      0 S   0.0   0.0   0:00.00 cpuhp/2
     26 root     -51   0       0      0      0 S   0.0   0.0   0:00.00 idle_inject/2
     27 root      rt   0       0      0      0 S   0.0   0.0   0:00.87 migration/2
     28 root      20   0       0      0      0 S   0.0   0.0   0:00.00 ksoftirqd/2
     30 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 kworker/2:0H-kblockd
     31 root      20   0       0      0      0 S   0.0   0.0   0:00.00 cpuhp/3
     32 root     -51   0       0      0      0 S   0.0   0.0   0:00.00 idle_inject/3
     33 root      rt   0       0      0      0 S   0.0   0.0   0:00.89 migration/3
     34 root      20   0       0      0      0 S   0.0   0.0   0:00.00 ksoftirqd/3
     36 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 kworker/3:0H-kblockd
     37 root      20   0       0      0      0 S   0.0   0.0   0:00.00 cpuhp/4
     38 root     -51   0       0      0      0 S   0.0   0.0   0:00.00 idle_inject/4
     39 root      rt   0       0      0      0 S   0.0   0.0   0:00.89 migration/4
     40 root      20   0       0      0      0 S   0.0   0.0   0:00.00 ksoftirqd/4
     42 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 kworker/4:0H-events_highpri
     43 root      20   0       0      0      0 S   0.0   0.0   0:00.00 cpuhp/5
     44 root     -51   0       0      0      0 S   0.0   0.0   0:00.00 idle_inject/5
     45 root      rt   0       0      0      0 S   0.0   0.0   0:00.90 migration/5
     46 root      20   0       0      0      0 S   0.0   0.0   0:00.00 ksoftirqd/5
     48 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 kworker/5:0H-events_highpri
     49 root      20   0       0      0      0 S   0.0   0.0   0:00.00 cpuhp/6
     50 root     -51   0       0      0      0 S   0.0   0.0   0:00.00 idle_inject/6

@alfred-stokespace (Author)

I reduced the stream count to 15 and am still seeing the same issue:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:1B.0 Off |                    0 |
| N/A   54C    P0    46W /  70W |  11059MiB / 15360MiB |     70%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla T4            On   | 00000000:00:1C.0 Off |                    0 |
| N/A   28C    P8    14W /  70W |      5MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla T4            On   | 00000000:00:1D.0 Off |                    0 |
| N/A   28C    P8    13W /  70W |      5MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   29C    P8    14W /  70W |      5MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1113      C   ...11-openjdk-amd64/bin/java    11036MiB |
+-----------------------------------------------------------------------------+

@mekya (Contributor) commented Sep 25, 2023

Hi @alfred-stokespace,

Thank you for the issue.

It seems that Ant Media Server cannot utilize all of the GPUs for some reason.
Let's schedule the issue.

Regards
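As an aside for anyone hitting this before a fix lands: pipelines that drive FFmpeg directly can spread encodes across devices round-robin via h264_nvenc's `-gpu` option. The sketch below is hypothetical (stream URLs, output names, and the bitrate ladder are made up for illustration) and is not how Ant Media Server selects devices internally.

```python
# Hypothetical sketch: round-robin GPU assignment for FFmpeg h264_nvenc encodes.
# NOT Ant Media Server's internal device selection; URLs/filenames are made up.

NUM_GPUS = 4  # a g4dn.12xlarge exposes four T4s

def nvenc_cmd(stream_index, rtmp_url):
    """Build an FFmpeg command that pins both renditions of one stream to one GPU."""
    gpu = stream_index % NUM_GPUS  # round-robin device assignment
    return [
        "ffmpeg", "-i", rtmp_url,
        # 720p rendition
        "-map", "0:v", "-c:v", "h264_nvenc", "-gpu", str(gpu),
        "-s", "1280x720", "-b:v", "2500k", f"out_{stream_index}_720p.flv",
        # 480p rendition, same GPU so the stream's decode/encode stay co-located
        "-map", "0:v", "-c:v", "h264_nvenc", "-gpu", str(gpu),
        "-s", "854x480", "-b:v", "1000k", f"out_{stream_index}_480p.flv",
    ]

# 25 streams land on GPUs 0..3 in rotation, roughly 6-7 streams per device.
cmds = [nvenc_cmd(i, f"rtmp://localhost/live/cam{i}") for i in range(25)]
```

With this kind of pinning, nvidia-smi should show the Java (or ffmpeg) process on every GPU rather than piling 14+ GiB onto GPU 0.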

@lastpeony (Contributor) commented Oct 8, 2023

Hi @alfred-stokespace
We appreciate your thorough bug report.

The problem has been resolved and will soon be merged into the master branch for the next release. If you prefer not to wait, we can offer you an early build snapshot that includes the fix.
Please inform us of your preference.
https://gitlab.com/Ant-Media/Ant-Media-Enterprise/-/merge_requests/626

@mekya mekya self-assigned this Oct 9, 2023
@mekya (Contributor) commented Oct 23, 2023

Hi guys,
The PR is merged into master.

Please check it, and feel free to re-open this issue if it does not work for you.
