Chunking over frequency instead of time #39

Merged: 8 commits into desh2608:master on Nov 20, 2023

Conversation

@popcornell (Contributor) commented Nov 14, 2023

Most of the code comes from @boeddeker, who also raised the underlying issue in #33.

I am re-running the code on CHiME-7 to see if it will match the previous version.
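
Roughly, the pattern is the following: GSS estimates its spatial mixture model independently per frequency bin, so the STFT tensor can be split along the frequency axis instead of the time axis. A minimal numpy sketch with illustrative names and shapes (not the repo's actual API):

import numpy as np

def process_in_freq_chunks(Obs, num_chunks, process_fn):
    # Obs: complex STFT tensor of shape (D, T, F) = (mics, frames, freq bins).
    # Splitting along F bounds peak GPU memory; chunks are processed one by one.
    chunks = np.array_split(Obs, num_chunks, axis=-1)
    return np.concatenate([process_fn(c) for c in chunks], axis=-1)

# Toy usage with an identity "processor":
Obs = np.random.randn(4, 100, 513) + 1j * np.random.randn(4, 100, 513)
out = process_in_freq_chunks(Obs, num_chunks=3, process_fn=lambda c: c)
assert np.allclose(out, Obs)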

-    def __call__(self, Obs, acitivity_freq):
-        initialization = cp.asarray(acitivity_freq, dtype=cp.float64)
+    def __call__(self, Obs, activity_freq):
@popcornell (Contributor, Author):

I've just corrected the spelling here.

@desh2608 (Owner):

Awesome! I'll merge once you can verify that the performance remains unchanged (which I believe it should) :)

@popcornell (Contributor, Author):

There seems to be some inconsistency when I change the number of GPUs (I don't think this depends on this PR, however).
It seems that the more GPUs I use, the higher the memory occupation?!

With 3 GPUs:

2023-11-15:20:42:15,574 INFO     [enhancer.py:207] Processing batch 1 ('S26', 'P29'): 1 segments = 8.89s (total: 0 segments)
2023-11-15:20:42:20,890 WARNING  [enhancer.py:247] Out of memory error while processing the batch. Trying again with 2 chunks.
2023-11-15:20:42:32,172 INFO     [enhancer.py:207] Processing batch 2 ('S26', 'P30'): 1 segments = 18.55s (total: 1 segments)
2023-11-15:20:42:32,706 WARNING  [enhancer.py:247] Out of memory error while processing the batch. Trying again with 2 chunks.
2023-11-15:20:42:32,863 WARNING  [enhancer.py:247] Out of memory error while processing the batch. Trying again with 3 chunks.
2023-11-15:20:42:46,44 INFO     [enhancer.py:207] Processing batch 3 ('S26', 'P31'): 1 segments = 63.23s (total: 2 segments)
2023-11-15:20:42:46,350 WARNING  [enhancer.py:247] Out of memory error while processing the batch. Trying again with 2 chunks.
2023-11-15:20:42:46,589 WARNING  [enhancer.py:247] Out of memory error while processing the batch. Trying again with 3 chunks.
2023-11-15:20:42:46,810 WARNING  [enhancer.py:247] Out of memory error while processing the batch. Trying again with 4 chunks.
2023-11-15:20:42:47,21 WARNING  [enhancer.py:247] Out of memory error while processing the batch. Trying again with 5 chunks.
2023-11-15:20:43:04,114 INFO     [enhancer.py:207] Processing batch 4 ('S26', 'P32'): 1 segments = 15.86s (total: 3 segments)

With 2 GPUs:

2023-11-15:20:46:13,89 INFO     [enhancer.py:207] Processing batch 1 ('S26', 'P29'): 1 segments = 8.89s (total: 0 segments)
2023-11-15:20:46:20,761 INFO     [enhancer.py:207] Processing batch 2 ('S26', 'P30'): 1 segments = 18.55s (total: 1 segments)
2023-11-15:20:46:30,84 INFO     [enhancer.py:207] Processing batch 3 ('S26', 'P31'): 1 segments = 63.23s (total: 2 segments)
2023-11-15:20:46:30,625 WARNING  [enhancer.py:247] Out of memory error while processing the batch. Trying again with 2 chunks.
2023-11-15:20:46:30,859 WARNING  [enhancer.py:247] Out of memory error while processing the batch. Trying again with 3 chunks.
2023-11-15:20:46:31,79 WARNING  [enhancer.py:247] Out of memory error while processing the batch. Trying again with 4 chunks.
2023-11-15:20:46:42,373 INFO     [enhancer.py:207] Processing batch 4 ('S26', 'P32'): 1 segments = 15.86s (total: 3 segments)

With 1 GPU:

2023-11-15:20:44:19,322 INFO     [enhancer.py:207] Processing batch 1 ('S26', 'P29'): 1 segments = 8.89s (total: 0 segments)
2023-11-15:20:44:26,218 INFO     [enhancer.py:207] Processing batch 2 ('S26', 'P30'): 1 segments = 18.55s (total: 1 segments)
2023-11-15:20:44:30,689 INFO     [enhancer.py:207] Processing batch 3 ('S26', 'P31'): 1 segments = 63.23s (total: 2 segments)
2023-11-15:20:44:38,79 INFO     [enhancer.py:207] Processing batch 4 ('S26', 'P32'): 1 segments = 15.86s (total: 3 segments)
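
For reference, the "Trying again with N chunks" fallback in these logs follows roughly this pattern (a simplified sketch, not the exact enhancer.py code):

import logging

import cupy as cp

def enhance_with_fallback(batch, process_fn, max_chunks=8):
    # On GPU out-of-memory, retry the batch split into more (smaller) chunks.
    num_chunks = 1
    while True:
        try:
            return process_fn(batch, num_chunks)
        except cp.cuda.memory.OutOfMemoryError:
            if num_chunks >= max_chunks:
                raise
            num_chunks += 1
            logging.warning(
                "Out of memory error while processing the batch. "
                "Trying again with %d chunks.", num_chunks
            )
            # Return cached blocks to the device before retrying.
            cp.get_default_memory_pool().free_all_blocks()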

@desh2608 (Owner):

> There seems to be some inconsistency when I change the number of GPUs (I don't think this depends on this PR, however). It seems that the more GPUs I use, the higher the memory occupation?! [...]

That's strange. I have never seen this happen before. Can you check if your GPUs are configured to not share memory?

@popcornell (Contributor, Author):

> That's strange. I have never seen this happen before. Can you check if your GPUs are configured to not share memory?

Yep, they were in DEFAULT mode. I changed them to exclusive mode and it does not happen anymore.
Maybe I should put a check for the GPUs' compute mode into the code?

@boeddeker (Contributor):

> Maybe I should put a check for the GPUs' compute mode into the code?

Something to prevent it would be great; I had this issue in CHiME-7.
It would be great to have the check work so that the user doesn't have to change the mode by hand.

@desh2608 (Owner):

I think it should be sufficient to add this to the README (perhaps as an FAQ) instead of restricting certain modes in the processing. OOM issues can happen for a variety of reasons, such as GPU memory not being cleared by a previous process or misconfigured nodes, and we cannot expect to solve all such problems.

@desh2608 (Owner) left a review comment:

The changes look good to me. I'll wait in case you want to add something to the README about the GPU configuration. Let me know when it's ready to merge.

     initialization = cp.where(initialization == 0, 1e-10, initialization)
     initialization = initialization / cp.sum(initialization, keepdims=True, axis=0)
-    initialization = cp.repeat(initialization[None, ...], 513, axis=0)
+    initialization = cp.repeat(initialization[None, ...], F, axis=0)
@desh2608 (Owner):

Good catch!
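
Presumably the reason this matters: 513 is the bin count of a 1024-point STFT (1024 // 2 + 1) and only holds when a chunk spans the full spectrum; with frequency chunking, each chunk carries just F bins. A toy numpy illustration of the corrected broadcast (shapes are hypothetical):

import numpy as np

num_spk, T, F = 3, 200, 171          # e.g. one third of 513 bins in this chunk
activity = np.random.rand(num_spk, T)
init = np.where(activity == 0, 1e-10, activity)
init = init / init.sum(axis=0, keepdims=True)   # normalize across speakers
init = np.repeat(init[None, ...], F, axis=0)    # shape (F, num_spk, T), not (513, ...)
assert init.shape == (F, num_spk, T)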

@popcornell (Contributor, Author) commented Nov 16, 2023

I think I can grep the compute mode from nvidia-smi -q, but I'm not sure it will work on all clusters out there.
I can, however, add an additional arg to disable this check, with a big warning.
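
Concretely, the check could look something like this (a sketch only; parsing nvidia-smi -q output is exactly the portability concern above, and in Default compute mode several processes can share one GPU, which seems to be what compounded the memory use here):

import subprocess

def default_mode_gpus():
    # `nvidia-smi -q` prints one "Compute Mode" field per GPU
    # (e.g. Default, Exclusive_Process, Prohibited).
    out = subprocess.run(
        ["nvidia-smi", "-q"], capture_output=True, text=True, check=True
    ).stdout
    modes = [
        line.split(":", 1)[1].strip()
        for line in out.splitlines()
        if "Compute Mode" in line
    ]
    # Indices of GPUs still in Default (shared) compute mode.
    return [i for i, m in enumerate(modes) if m == "Default"]

if __name__ == "__main__":
    shared = default_mode_gpus()
    if shared:
        raise RuntimeError(
            f"GPUs {shared} are in Default compute mode; consider "
            "Exclusive_Process to keep jobs from sharing a device."
        )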

@desh2608 (Owner):

> I think I can grep the compute mode from nvidia-smi -q, but I'm not sure it will work on all clusters out there. I can, however, add an additional arg to disable this check, with a big warning.

You could put these instructions in the README, so that users running the code can check. No need to add it in the code, IMO.

@popcornell (Contributor, Author):

After discussing offline with @boeddeker, we added a new utility called gpu_check.
The idea is to use it like this (e.g. in the CHiME-7 asr1 recipe):

$cmd JOB=1:$nj  ${exp_dir}/${dset_name}/${dset_part}/log/enhance.JOB.log \
    gss utils gpu_check $nj $cmd \& gss enhance cuts \
      ${exp_dir}/${dset_name}/${dset_part}/cuts.jsonl.gz ${exp_dir}/${dset_name}/${dset_part}/split$nj/cuts_per_segment.JOB.jsonl.gz \
       ${exp_dir}/${dset_name}/${dset_part}/enhanced \
      --bss-iterations $gss_iterations \
      --context-duration 15.0 \
      --use-garbage-class \
      --min-segment-length 0.0 \
      --max-segment-length $max_segment_length \
      --max-batch-duration $max_batch_duration \
      --max-batch-cuts 1 \
      --num-buckets 4 \
      --num-workers 4 \
      --force-overwrite \
      --duration-tolerance 3.0 \
       ${affix} || exit 1

However, when used like this, it will not exit when gpu_check raises an exception.
My bash is bad; do you know how to make it exit?

@desh2608 (Owner):

> After discussing offline with @boeddeker, we added a new utility called gpu_check. [...] However, when used like this, it will not exit when gpu_check raises an exception. My bash is bad; do you know how to make it exit?

The exit should work if any of the jobs fails. But I think this whole GPU check thing is overkill. GPU memory issues can happen in any program, and I don't see why they need to be handled in this repo specifically. I can add it if you are using it in ESPNet, but I personally think this is not the right place to solve this issue.

@popcornell (Contributor, Author):

In the meantime, I confirm I get the same results as with the old version on CHiME-7:

###################################################
### Metrics for all Scenarios ###
###################################################
+----+------------+---------------+---------------+----------------------+----------------------+--------+-----------------+-------------+--------------+----------+
|    | scenario   |   num spk hyp |   num spk ref |   tot utterances hyp |   tot utterances ref |   hits |   substitutions |   deletions |   insertions |      wer |
|----+------------+---------------+---------------+----------------------+----------------------+--------+-----------------+-------------+--------------+----------|
|  0 | chime6     |             8 |             8 |                 6644 |                 6644 |  42884 |           11672 |        4325 |         3107 | 0.324451 |
|  0 | dipco      |            20 |            20 |                 3673 |                 3673 |  22175 |            5817 |        1974 |         2210 | 0.333745 |
|  0 | mixer6     |           118 |           118 |                14804 |                14804 | 126632 |           15991 |        6358 |         7815 | 0.202469 |
+----+------------+---------------+---------------+----------------------+----------------------+--------+-----------------+-------------+--------------+----------+
####################################################################
### Macro-Averaged Metrics across all Scenarios (Ranking Metric) ###
####################################################################
+----+---------------+---------------+---------------+----------------------+----------------------+--------+-----------------+-------------+--------------+----------+
|    | scenario      |   num spk hyp |   num spk ref |   tot utterances hyp |   tot utterances ref |   hits |   substitutions |   deletions |   insertions |      wer |
|----+---------------+---------------+---------------+----------------------+----------------------+--------+-----------------+-------------+--------------+----------|
|  0 | macro-average |       48.6667 |       48.6667 |              8373.67 |              8373.67 |  63897 |           11160 |        4219 |      4377.33 | 0.286888 |
+----+---------------+---------------+---------------+----------------------+----------------------+--------+-----------------+-------------+--------------+----------+

@popcornell (Contributor, Author):

Added some lines to the README.md.

@desh2608 (Owner) left a review comment:

Thanks for the contribution!

@desh2608 merged commit e74d9f4 into desh2608:master on Nov 20, 2023
1 check passed
Successfully merging this pull request may close these issues.

Chunking along time frames to save GPU Ram? Why not along frequency dim?