In the README it is suggested that you use audio pre-processing similar to Zisserman et al. However, they use 40 filterbank channels throughout their code (e.g. in the yousaidthat repository: https://github.com/joonson/yousaidthat/blob/98b51812894497cb6c2b65a7ae147067609fc6ca/run_demo.m#L22).
I was wondering whether there was a reason for choosing 13, or whether it had simply been mixed up with the number of cepstral coefficients.
Thanks,
@roodrallec Our code was written before theirs was released, and we chose the number of filterbank channels according to their paper, which I believe was 13 in the original version. Or at least it was 13 in SyncNet :)
Talking-Face-Generation-DAVS/preprocess/savemfcc.m (line 7 at c0233ac)