About start training: IndexError: tuple index out of range. #58

TungyuYoung · 2022-03-26T17:01:51Z

Hi, Dr.Gong,
I am using AST on my own dataset. I created the .json and .csv files according to the guide. However, when I run run.sh, an error occurs that I haven't been able to fix. The error is shown below:

I don't know how to solve it. I would appreciate it if you could tell me the reason.

Yours

TungyuYoung · 2022-03-26T18:32:41Z

I looked into the cause for a while. I printed the shape of fbank, which is [128, 1024]. It seems that the spectrograms haven't been converted to RGB channels. Would you please tell me how to fix it?

YuanGongND · 2022-03-26T18:37:51Z

I cannot tell the reason either. But there is no RGB concept in an audio spectrogram; it has just a single channel. [128, 1024] means 128 frequency bins and 1024 time frames, which looks correct. The problem seems to happen in the spectrogram masking.
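To relate the 1024 time frames to audio duration, here is a small arithmetic sketch. It assumes 16 kHz audio, a 25 ms window and a 10 ms frame shift, with Kaldi-style framing that drops any partial frame at the end; these values are illustrative assumptions, so check them against your own feature settings. `num_fbank_frames` and `frames_to_seconds` are hypothetical helper names.

```python
# Arithmetic sketch: how many spectrogram frames does a clip produce?
# Assumptions (illustrative, not read from the AST code): 16 kHz audio,
# 25 ms analysis window, 10 ms frame shift, partial frames dropped.

def num_fbank_frames(duration_s: float, sample_rate: int = 16000,
                     frame_length_ms: float = 25.0,
                     frame_shift_ms: float = 10.0) -> int:
    """Return the number of full analysis frames that fit in a clip."""
    n_samples = int(round(duration_s * sample_rate))
    win = int(sample_rate * frame_length_ms / 1000)  # 400 samples at 16 kHz
    hop = int(sample_rate * frame_shift_ms / 1000)   # 160 samples at 16 kHz
    if n_samples < win:
        return 0  # clip shorter than one window: no full frame
    return 1 + (n_samples - win) // hop


def frames_to_seconds(n_frames: int, frame_shift_ms: float = 10.0) -> float:
    """Approximate duration spanned by n_frames at a fixed frame shift."""
    return n_frames * frame_shift_ms / 1000


print(frames_to_seconds(1024))  # 10.24 -> a 1024-frame input covers about 10 s
print(num_fbank_frames(2.5))    # 248   -> a 2.5 s clip fills only about 1/4 of it
```

Under these assumptions, a 2.5 s clip yields about 248 frames, so a 1024-frame input would be mostly padding, which is consistent with the later advice in this thread to lower input_tdim.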

TungyuYoung · 2022-03-26T18:56:56Z

Alright. I checked line 191 in ./dataloader; it seems that the problem happens in TimeMasking.

TungyuYoung · 2022-03-26T19:01:03Z

What's more, the average clip length in my dataset is only 2.5 s. Do you think such short audio will cause errors in the TimeMasking and affect the final performance of the model?

YuanGongND · 2022-03-27T00:55:10Z

The length depends on input_tdim. For your case, you should modify run.py to set input_tdim=250. timem should be smaller than input_tdim. Again, I suggest starting from either the speechcommands or the esc50 recipe to get more familiar with the code.
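A minimal sketch of what changing input_tdim implies for each clip, plus the timem constraint mentioned above. It assumes a NumPy [freq, time] array (matching the [128, 1024] shape printed earlier) and zero-padding at the end of the time axis; `pad_or_trim` and `check_masking_config` are hypothetical names, and the real AST dataloader may pad differently.

```python
import numpy as np


def pad_or_trim(fbank: np.ndarray, input_tdim: int) -> np.ndarray:
    """Zero-pad or truncate a [freq, time] spectrogram to input_tdim frames."""
    n_freq, n_frames = fbank.shape
    if n_frames < input_tdim:
        # Append zero frames at the end of the time axis.
        pad = np.zeros((n_freq, input_tdim - n_frames), dtype=fbank.dtype)
        return np.concatenate([fbank, pad], axis=1)
    # Otherwise keep only the first input_tdim frames.
    return fbank[:, :input_tdim]


def check_masking_config(input_tdim: int, timem: int) -> None:
    """Reject a time-mask width that is not smaller than the input length."""
    if not 0 <= timem < input_tdim:
        raise ValueError(
            f"timem={timem} must satisfy 0 <= timem < input_tdim={input_tdim}")


clip = np.random.rand(128, 248).astype(np.float32)  # about 2.5 s of frames (assumed)
x = pad_or_trim(clip, input_tdim=250)
print(x.shape)  # (128, 250)
check_masking_config(input_tdim=250, timem=48)  # 48 is an arbitrary illustrative width
```

Choosing input_tdim close to the real clip length keeps most frames informative, and the check makes a too-large timem fail early instead of inside the masking transform.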

TungyuYoung · 2022-03-27T16:06:10Z

Hi, Dr. Gong
I commented out the timem line and the program ran successfully. For my own dataset, I split the data into test and train sets: 20 percent of each class for test and the remaining 80 percent for train. After training, I got a strange result:

0.908529570359724
0.965062499999999
0.685776370522133
1
2.56357337530442

These are the values from wa_result.csv.
I'm very confused about this. Besides, the recall is 1 at every epoch. I also tried using inference.py to predict on audio files, and it seems to work well.
I would appreciate it if you could tell me how you would approach this problem.

Yours
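The per-class 80/20 split described in the post above can be sketched in plain Python. This assumes one label per clip and a list of (path, label) pairs; `stratified_split` is a hypothetical helper, and multi-label data would need a different scheme.

```python
import random
from collections import defaultdict


def stratified_split(items, test_fraction=0.2, seed=0):
    """Split (path, label) pairs so each class contributes ~test_fraction to test."""
    by_label = defaultdict(list)
    for path, label in items:
        by_label[label].append((path, label))

    rng = random.Random(seed)  # fixed seed -> reproducible split
    train, test = [], []
    for label in sorted(by_label):
        group = list(by_label[label])
        rng.shuffle(group)
        n_test = int(round(len(group) * test_fraction))
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


items = [(f"dog_{i}.wav", "dog") for i in range(10)] + \
        [(f"cat_{i}.wav", "cat") for i in range(5)]
train, test = stratified_split(items)
print(len(train), len(test))  # 12 3
```

Splitting within each class keeps every class represented in the test set in roughly the same proportion as in the full dataset.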

Mxnet123 · 2022-03-29T07:27:26Z

I also encountered the same problem. Could you please tell me how to solve it in detail? Thank you very much.

TungyuYoung · 2022-03-29T07:32:24Z

> I also encountered the same problem. Could you please tell me how to solve it in detail? Thank you very much.

What exactly is your problem?

Mxnet123 · 2022-03-29T07:34:24Z

About start training: IndexError: tuple index out of range?

TungyuYoung · 2022-03-29T07:37:03Z

> About start training: IndexError: tuple index out of range?

I commented out these lines in ast/src/dataloader.py (lines 192 to 193 at commit 7b2fe70):

```python
if self.timem != 0:
    fbank = timem(fbank)
```

Then I performed data augmentation separately before training and used the augmented data as input.

Mxnet123 · 2022-03-29T07:43:11Z

Okay, thank you very much. I'll try it.

YuanGongND · 2022-05-08T22:00:57Z

OK, I finally found the reason.

This is due to a torchaudio issue. We use torchaudio 0.8.1, in which the input to the masking transforms can be [freq, time], while newer versions of torchaudio only accept [1, freq, time].
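The error message fits this explanation: if a masking routine assumes [channel, freq, time] and reads the time length from axis 2, a 2-D [freq, time] input has no axis 2. The toy NumPy function below is a simplified illustration of that pattern, not torchaudio's actual implementation; `toy_time_mask` is a hypothetical name.

```python
import numpy as np


def toy_time_mask(spec: np.ndarray, start: int, width: int) -> np.ndarray:
    """Zero frames [start, start + width) of a [channel, freq, time] array."""
    n_frames = spec.shape[2]  # shape is a tuple; a 2-D array has no index 2
    end = min(start + width, n_frames)
    out = spec.copy()
    out[:, :, start:end] = 0.0
    return out


fbank = np.ones((128, 1024), dtype=np.float32)  # [freq, time], as printed in the thread
try:
    toy_time_mask(fbank, start=0, width=10)
except IndexError as err:
    print(err)  # tuple index out of range

masked = toy_time_mask(fbank[np.newaxis], start=5, width=10)  # [1, freq, time]
print(masked.shape)  # (1, 128, 1024)
```

Adding a leading axis of size 1 is enough to satisfy the 3-D convention without changing the data.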

I have fixed it with a workaround that works for both old and new torchaudio versions, in ast/src/dataloader.py (lines 190 to 197 at commit b708675):

```python
# this is just to satisfy newer torchaudio versions, which only accept [1, freq, time]
fbank = fbank.unsqueeze(0)
if self.freqm != 0:
    fbank = freqm(fbank)
if self.timem != 0:
    fbank = timem(fbank)
# squeeze it back, it is just a trick to satisfy newer torchaudio versions
fbank = fbank.squeeze(0)
```

Your workaround (commenting out the time masking) might cause an inaccurate masking span and lead to a performance drop (which might be small). I would suggest using our fixed code; it is very simple.
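The same add-then-remove-a-channel trick can be wrapped in a small helper so that any mask expecting [1, freq, time] can be applied to a [freq, time] spectrogram. This is a NumPy sketch: `apply_spec_masks` and `zero_first_frames` are hypothetical, and in the actual dataloader the callables would be the torchaudio FrequencyMasking and TimeMasking transforms operating on torch tensors, with unsqueeze(0)/squeeze(0) in place of the NumPy calls.

```python
from typing import Callable, Sequence

import numpy as np

Masker = Callable[[np.ndarray], np.ndarray]


def apply_spec_masks(fbank: np.ndarray, maskers: Sequence[Masker]) -> np.ndarray:
    """Apply masks that expect [1, freq, time] to a [freq, time] spectrogram."""
    if fbank.ndim != 2:
        raise ValueError(f"expected [freq, time], got shape {fbank.shape}")
    x = np.expand_dims(fbank, 0)   # [freq, time] -> [1, freq, time]
    for mask in maskers:
        x = mask(x)
    return np.squeeze(x, axis=0)   # [1, freq, time] -> [freq, time]; errors if axis 0 != 1


def zero_first_frames(x: np.ndarray, n: int = 5) -> np.ndarray:
    """Hypothetical stand-in for a time mask; insists on a 3-D input."""
    assert x.ndim == 3, "mask expects [1, freq, time]"
    y = x.copy()
    y[:, :, :n] = 0.0
    return y


out = apply_spec_masks(np.ones((128, 250), dtype=np.float32), [zero_first_frames])
print(out.shape)  # (128, 250)
```

One design difference worth noting: np.squeeze with axis=0 raises an error if a mask changed the leading dimension, whereas torch's squeeze(0) silently returns the tensor unchanged when that dimension is not 1.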

YuanGongND · 2022-05-08T22:05:36Z

You can use the Colab script to find the bug https://colab.research.google.com/github/YuanGongND/ast/blob/master/colab/torchaudio_SpecMasking_1_1.ipynb
