
question, performance not match with the paper and feature generation #6

Closed
meixitu opened this issue Jan 4, 2018 · 2 comments

meixitu commented Jan 4, 2018

Hi,
Thanks for your wonderful work; it has really helped me a lot.
I have several questions about this project:
1. I ran the code with your train_commands.txt and found the performance is slightly worse than the results in Table 7 of the paper: for the small DS-CNN model, the highest validation accuracy is 92.98% from the code versus 93.6% in the paper. Did you get the Table 7 performance with the same code settings?

2. In train.py, the test dataset is evaluated only after training is done, so it does not use the checkpoint where the validation accuracy is highest. Did you calculate the test accuracy in the paper with the same method?

3. Did you compare the performance of LFBE vs. MFCC? Google's paper uses LFBE, but MFCC allows a smaller feature set, and you use only 10 MFCC features. If we use more MFCC features, can we get higher accuracy?

4. Do you consider feature normalization to handle the different signal power ranges?

5. If the signal power of a frame is zero, how do you calculate log(LFBE)? I can't find it in the code. In general one uses log(LFBE + delta), where delta is a small constant; what delta value do you use?

6. Many papers use window_size_ms = 25 or 30 ms with window_stride_ms = 10 ms, but for DS-CNN you use window_size_ms = 40 ms and window_stride_ms = 20 ms. I understand that a larger window stride reduces the number of operations, but I don't see why a 40 ms window size is used: at a 16 kHz sample rate it requires a 1024-point FFT, which is power-hungry.

7. Running the training takes almost 4 hours on a GeForce 1080 Ti GPU with an E5-2650 CPU, but I saw in another of your replies that you need only 1 hour. Is there any way to speed it up? I found that feature generation takes most of the time.

Thanks
Jinhong

meixitu changed the title from "performance not match with the paper" to "question, performance not match with the paper and feature generation" on Jan 4, 2018
navsuda (Collaborator) commented Jan 8, 2018

Hi @zhangjinhong17

  1. Such a difference is expected because of differences in weight initialization.
  2. The Table 7 accuracies are obtained from the checkpoint with the highest accuracy on the validation set (i.e., the last saved checkpoint). Use test.py to evaluate the accuracy of your checkpoints.
  3. We did not compare LFBE vs. MFCC, and we did not observe higher performance (i.e., accuracy) using more MFCC features. It may depend on the number and type of output words you are classifying; for example, you may need a higher resolution (i.e., more features) to differentiate "light" vs. "flight".
  4. If you are asking about batch normalization to normalize the features across different inputs, it seems to work fine with this dataset. It would be interesting to see how well the batch norm parameters generalize to another dataset.
  5. MFCC computation is part of TensorFlow, where a delta of 1e-12 is used; see https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/mfcc.cc#L60 for details. A small sketch of this flooring is shown after this list.
  6. That's a good point: 40 ms gives 640 samples, which would have to be padded to 1024 samples to perform the FFT. However, the total number of operations in the neural network is typically much higher than the number of computations in the FFT, so it does not matter much; it might start to matter when the network is squeezed down to <1 MOps per inference. In our case, the 40 ms window size was the result of the initial hyperparameter search (the arithmetic is sketched after this list).
  7. Training time is a function of the network size; from what we have seen, for small networks you should get good enough accuracy within the first hour, with only incremental accuracy improvement (~1-2%) after that.
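
As a minimal sketch of the flooring mentioned in point 5 (illustrative NumPy only; the actual computation happens inside TensorFlow's C++ MFCC kernel linked above, and the function name here is made up):

```python
import numpy as np

# Small floor applied to the filterbank energies before taking the log,
# matching the 1e-12 delta referenced in the linked mfcc.cc.
FILTERBANK_FLOOR = 1e-12

def log_filterbank_energies(mel_energies):
    """Log of mel filterbank energies, flooring zeros so the log stays finite."""
    mel_energies = np.asarray(mel_energies, dtype=np.float64)
    return np.log(np.maximum(mel_energies, FILTERBANK_FLOOR))

# An all-zero (silent) frame maps to log(1e-12) ~= -27.6 instead of -inf.
print(log_filterbank_energies(np.zeros(40)))
```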
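
And the window-size arithmetic from point 6, as a quick check (assuming a power-of-two FFT, as in the question):

```python
import math

SAMPLE_RATE = 16000   # Hz
WINDOW_SIZE_MS = 40   # DS-CNN window size from train_commands.txt

samples_per_window = SAMPLE_RATE * WINDOW_SIZE_MS // 1000   # 640 samples
fft_size = 2 ** math.ceil(math.log2(samples_per_window))    # next power of two: 1024

print(samples_per_window, fft_size)  # 640 1024
```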

meixitu (Author) commented Jan 9, 2018

Hi @navsuda,

1. You are right. I ran the same code twice, and the results were slightly different.

Thanks for your other replies.

Thanks
Jinhong

navsuda closed this as completed on Jan 18, 2018