Regarding input data #11

Closed
abhishekkritarth opened this issue Mar 21, 2018 · 14 comments

@abhishekkritarth

Hi @astorfi
I have gone through your code. When I extract MFCC features for a sample audio file, the result has shape (420,40), where 420 is the number of frames and 40 is the number of features. But the sample MFEC feature file in your code has shape (3209,40,3). As I understand it, 3209 is the number of frames, 40 is the number of features, and 3 is the number of channels. I don't understand what the channels are used for. Could you please suggest how to create the Feature_mfec.npy file in your format?

@mmubeen-6

@abhishekkritarth, it looks like you have not read the corresponding paper closely. The first channel holds the actual MFEC features of the audio, while the second and third channels hold the first and second derivatives of the MFEC features, respectively. They can be easily extracted using the functions available in the speechpy library.

@astorfi
Owner

astorfi commented Mar 24, 2018

@abhishekkritarth For further information please take a look at the associated paper:
Text-Independent Speaker Verification Using 3D Convolutional Neural Networks.

The 3 is the number of channels, which consist of the static, first-order derivative, and second-order derivative features.
As @mmubeen-6 kindly mentioned, they can be extracted directly using SpeechPy.

Please let me know if you have any other questions.
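
A minimal sketch of the channel layout described above, for readers who want to see how a (frames, features) matrix becomes a (frames, features, 3) array. It uses only NumPy and approximates the derivatives with `np.gradient`, which is an assumption made for illustration; SpeechPy computes its own derivative features (its documentation lists `speechpy.feature.lmfe` for MFEC and `speechpy.feature.extract_derivative_feature` for the three-channel cube, which should be checked against the installed version), and its numbers will likely differ from this approximation. The function name `stack_with_derivatives` and the random input are hypothetical.

```python
import numpy as np


def stack_with_derivatives(static):
    """Stack static features with approximate 1st and 2nd time derivatives.

    static: array of shape (num_frames, num_features)
    returns: array of shape (num_frames, num_features, 3)
    """
    # Finite-difference derivative along the time (frame) axis.
    # This is a stand-in, not SpeechPy's derivative formula.
    delta = np.gradient(static, axis=0)
    delta2 = np.gradient(delta, axis=0)
    # Channel 0: static, channel 1: first derivative, channel 2: second derivative.
    return np.stack([static, delta, delta2], axis=-1)


# Placeholder for real MFEC features of shape (420, 40).
rng = np.random.default_rng(0)
static = rng.standard_normal((420, 40))

cube = stack_with_derivatives(static)
print(cube.shape)  # (420, 40, 3)
```

The result can be written with `np.save` to produce a `.npy` file with the same three-channel layout as the sample data.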

@astorfi astorfi closed this as completed Mar 29, 2018
@cp2923

cp2923 commented Apr 7, 2018

@astorfi Sorry to bother you. This might be a silly question, but I want to make sure I understand correctly.
The 3209, which is the number of MFEC frames, should vary between samples. But I don't need to trim it to 80 (the input frame size you mention in the paper) before training. Am I correct?

@astorfi
Owner

astorfi commented Apr 7, 2018

@cp2923 No problem.

There is no trimming. However, you have to choose how many frames you need. In my paper, exactly 80 consecutive frames are used, and that is a requirement!
For example, 80 consecutive frames correspond to 0.8 s when the features are generated with a 10 ms stride.
The 3209 you mentioned is for the whole sound file.

Please let me know if you have any other questions.
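
The frame selection described above can be sketched as a simple slice along the time axis. This is an illustration under stated assumptions, not code from the repository: the helper name `crop_consecutive_frames`, the random choice of the start index, and the all-zero placeholder array are hypothetical; only the shapes (3209,40,3) and (80,40,3) and the 10 ms stride come from the thread.

```python
import numpy as np


def crop_consecutive_frames(features, num_frames=80, start=None, rng=None):
    """Return `num_frames` consecutive frames from a (frames, features, channels) array."""
    total = features.shape[0]
    if total < num_frames:
        raise ValueError(f"need at least {num_frames} frames, got {total}")
    last_start = total - num_frames
    if start is None:
        # Assumption: pick the window position uniformly at random.
        if rng is None:
            rng = np.random.default_rng()
        start = int(rng.integers(0, last_start + 1))
    elif not 0 <= start <= last_start:
        raise ValueError(f"start must be in [0, {last_start}]")
    return features[start:start + num_frames]


# Placeholder for the features of one whole sound file.
features = np.zeros((3209, 40, 3), dtype=np.float32)

cube = crop_consecutive_frames(features, start=100)
print(cube.shape)  # (80, 40, 3)

# With a 10 ms stride, 80 frames span roughly 80 * 0.010 = 0.8 seconds.
frame_stride_sec = 0.010
window_sec = 80 * frame_stride_sec
```

Because the slice keeps the frames contiguous, the temporal order inside each cube is preserved, which is what "consecutive" requires.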

@astorfi astorfi reopened this Apr 7, 2018
@astorfi
Owner

astorfi commented Apr 7, 2018

@cp2923 I just reopened this issue, so please close it once your question has been answered.
Thanks

@cp2923

cp2923 commented Apr 7, 2018

@astorfi Thank you for your quick reply. So for the (3209,40,3) npy file, should I divide it into 40 npy files as the input to create_development.py?

@astorfi
Owner

astorfi commented Apr 7, 2018

@cp2923 I am a little confused about what you are trying to do. Let's say you want to feed an (80,40,3) cube to the network (ignoring the batch dimension for now). You take those 80 frames from the 3209, right? The create_development.py file is specific to the dataset that I used and is just a sample, so don't let it confuse you. Please read the paper in detail to understand the input pipeline.
Please refer to this post as well.

In any case, please do not hesitate to contact me if you have an issue.
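
To make the shapes in this exchange concrete, here is a hedged sketch that cuts one (3209,40,3) array into several (80,40,3) cubes and stacks them along a leading batch dimension. The non-overlapping hop of 80 frames is an assumption chosen for illustration (the thread does not say how windows should be spaced), and `split_into_cubes` is a hypothetical helper, not part of the repository. With that hop, 3209 frames yield 40 full windows, and the last 9 frames are left over.

```python
import numpy as np


def split_into_cubes(features, num_frames=80, hop=80):
    """Cut (frames, features, channels) into a batch of fixed-length cubes.

    Returns an array of shape (num_cubes, num_frames, features, channels).
    Frames that do not fill a complete window at the end are dropped.
    """
    total = features.shape[0]
    if total < num_frames:
        raise ValueError(f"need at least {num_frames} frames, got {total}")
    # Start indices of every window that fits entirely inside the file.
    starts = range(0, total - num_frames + 1, hop)
    return np.stack([features[s:s + num_frames] for s in starts])


# Placeholder for the features of one whole sound file.
features = np.zeros((3209, 40, 3), dtype=np.float32)

cubes = split_into_cubes(features)
print(cubes.shape)  # (40, 80, 40, 3)
```

A smaller `hop` produces overlapping windows and therefore more cubes from the same file; whether that is desirable depends on the training setup described in the paper.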

@cp2923

cp2923 commented Apr 8, 2018

@astorfi I think I understand now. May I ask one more question? Why did you create two sessions under sample_data/1? Do I have to create a folder for each npy file? Thanks!

@astorfi
Owner

astorfi commented Apr 8, 2018

@cp2923 Sure. That was just arbitrary naming for the different files. There is nothing more to it.

@cp2923

cp2923 commented Apr 8, 2018

@astorfi Thank you for your patience!
Sorry, I can't find how to close the issue here.

@astorfi
Owner

astorfi commented Apr 8, 2018

@cp2923 My pleasure. I will close this.

@astorfi astorfi closed this as completed Apr 8, 2018
@cp2923

cp2923 commented Apr 8, 2018

@astorfi
Hi, sorry to bother you again. I am still a little confused after reading create_development.
Do folders 1 and 2 under sample_data denote two speakers?
What's the difference between train_files_subjects_list and train_files_subjects_ids?

@astorfi
Owner

astorfi commented Apr 9, 2018

@cp2923 Please ignore those naming conventions. They are specific to the database. Please follow the paper.

@cp2923

cp2923 commented Apr 9, 2018

@astorfi Thanks for your reply.
These questions seem too detailed to be covered in your paper. I think I need to know what train_files_subjects_list and train_files_subjects_ids are before I can build the data input correctly.
