About start training: IndexError: tuple index out of range. #58

Open
TungyuYoung opened this issue Mar 26, 2022 · 13 comments

Comments

@TungyuYoung

TungyuYoung commented Mar 26, 2022

Hi, Dr. Gong,
I am using AST on my own dataset. I have created the .json and .csv files according to the guide. However, when I run run.sh, an error occurs that I have not been able to fix. The error is shown below:
[screenshot of the error traceback]
I don't know how to solve it. I would appreciate it if you could tell me the reason.

Yours

@TungyuYoung
Author

I looked into the cause for a while. I printed the shape of fbank, which is [128, 1024]. It seems that the spectrograms haven't been converted to RGB channels. Would you please tell me how to fix it?

@YuanGongND
Owner

I cannot tell the reason either. But there is no RGB concept in an audio spectrogram; it is just single-channel information. [128, 1024] means 128 frequency bins and 1024 time frames, which looks correct. The problem seems to occur in spectrogram masking.

@TungyuYoung
Author

Alright. I checked line 191 in the file ./dataloader, and it seems that the problem occurs in TimeMasking.

@TungyuYoung
Author

What's more, the average clip length in my dataset is only 2.5 s. Do you think such short audio will cause errors in the TimeMasking step and affect the final performance of the model?

@YuanGongND
Owner

The length depends on input_tdim. For your case, you should modify run.py to set input_tdim=250. timem should be smaller than input_tdim. Again, I suggest starting from either the speechcommands or the esc50 recipe to get more familiar with the code.
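The relationship between clip length, input_tdim, and timem can be sketched in plain Python. This is only an illustration: the 10 ms frame shift, the helper names `num_frames` and `check_time_mask`, and the timem value of 48 are assumptions made for the example, not settings taken from this thread.

```python
def num_frames(duration_s: float, frame_shift_ms: float = 10.0) -> int:
    """Approximate number of spectrogram frames for a clip of the given length."""
    return round(duration_s * 1000.0 / frame_shift_ms)


def check_time_mask(input_tdim: int, timem: int) -> None:
    """Raise if the maximum time-mask width is not smaller than the input length."""
    if timem >= input_tdim:
        raise ValueError(
            f"timem={timem} must be smaller than input_tdim={input_tdim}"
        )


# A 2.5 s clip at an assumed 10 ms frame shift gives the 250 frames suggested above.
input_tdim = num_frames(2.5)

# Hypothetical timem value; it passes because it is smaller than input_tdim.
check_time_mask(input_tdim, timem=48)
print(input_tdim)
```

With a different frame shift the frame count changes, so input_tdim should be recomputed from the actual feature settings.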

@TungyuYoung
Author

Hi, Dr. Gong,
I commented out the timem step and the program ran successfully. I split my own dataset into training and test sets: 20 percent of each class for testing and the rest for training. After training, I got a strange result, as shown:

0.908529570359724
0.965062499999999
0.685776370522133
1
2.56357337530442

These are the values from wa_result.csv.
I'm very confused about this. Besides, the recall is 1 in every epoch. I also tried using inference.py to predict on an audio file, and it seems to work well.
I would appreciate it if you could share your thoughts on how to solve this problem.

Yours

@Mxnet123

I also encountered the same problem. Could you please tell me how to solve it in detail? Thank you very much.

@TungyuYoung
Author

> I also encountered the same problem, could you please tell me how to solve it in detail,Thank you very much

What exactly is your problem?

@Mxnet123

About start training: IndexError: tuple index out of range?

@TungyuYoung
Author

TungyuYoung commented Mar 29, 2022

> About start training: IndexError: tuple index out of range?

I commented out

    if self.timem != 0:
        fbank = timem(fbank)

directly. Then I performed data augmentation before training and used the augmented data as input.

@Mxnet123

Okay, thank you very much. I'm trying it now.

@YuanGongND
Owner

YuanGongND commented May 8, 2022

OK, I finally found the reason.

This is due to a torchaudio issue. We use torchaudio 0.8.1, in which the input to the masking transforms can be [freq, time], while newer torchaudio versions only accept [1, freq, time].
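The shape mismatch can be illustrated without torchaudio. The following is a plain-Python sketch, not torchaudio's actual code: it assumes a masking routine looks up the time axis at index 2, as it would for a [1, freq, time] layout, and the helper `axis_length` is hypothetical.

```python
def axis_length(shape: tuple, axis: int) -> int:
    """Return the size of `axis`, as a masking routine might look it up."""
    return shape[axis]


TIME_AXIS = 2  # assumption: time is axis 2 in a [1, freq, time] layout

old_shape = (128, 1024)        # [freq, time], the shape reported in this thread
new_shape = (1,) + old_shape   # [1, freq, time], the effect of unsqueeze(0)

try:
    axis_length(old_shape, TIME_AXIS)
    error = None
except IndexError as exc:
    error = str(exc)

print(error)                              # message raised for the 2-D shape
print(axis_length(new_shape, TIME_AXIS))  # number of time frames in the 3-D shape
```

Under that assumption, a two-element shape has no index 2, which is consistent with the reported error, while the shape with a leading dimension resolves to the 1024 time frames.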

I have fixed it with a workaround (works for both old and new torchaudio) at

ast/src/dataloader.py

Lines 190 to 197 in b708675

    # this is just to satisfy new torchaudio version, which only accept [1, freq, time]
    fbank = fbank.unsqueeze(0)
    if self.freqm != 0:
        fbank = freqm(fbank)
    if self.timem != 0:
        fbank = timem(fbank)
    # squeeze it back, it is just a trick to satisfy new torchaudio version
    fbank = fbank.squeeze(0)

Your workaround (commenting out the time masking) might lead to an inaccurate masking span and a performance drop (though it might be small). I would suggest using our fixed code. It is very simple.
