ValueError: Convolution expects input with rank 4, got 5 #38
Comments
I'm getting this error also.
@imranparuk Generally, I think the newer TensorFlow APIs differ from the ones the author used. I tried to use
I have tried multiple older versions of TF with this code base, and no combination seems to work. I have also tried conv3d, but the shapes of the nets don't correspond to what is written in the paper; it fails because of incorrect dimensions going into tf.squeeze. It would be great if the author could help resolve the issue.
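For intuition on the rank mismatch in the issue title, here is a minimal NumPy sketch; the shapes are illustrative, not the repo's exact ones. A rank-5 feature cube only reduces to the rank 4 that 2D convolution expects when the extra axis really is a singleton, which is why an upstream shape error surfaces at the squeeze.

```python
import numpy as np

# Illustrative shapes: (batch, depth, height, width, channel).
x = np.zeros((1, 20, 80, 40, 1))   # rank 5, trailing singleton channel axis

y = np.squeeze(x, axis=-1)         # drops the singleton -> rank 4
assert y.shape == (1, 20, 80, 40)

# Squeezing a non-singleton axis raises, which is how a shape mismatch
# earlier in the pipeline typically shows up:
try:
    np.squeeze(x, axis=1)          # axis 1 has size 20, not 1
except ValueError as e:
    print("squeeze failed:", e)
```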
@imranparuk I've just removed the
I think I solved it. Use an older version of TensorFlow, and uninstall your current version of numpy. Let TensorFlow install the numpy version it requires.
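For anyone trying this, a sketch of that clean-up; the exact TensorFlow version pin is an assumption (v1.0.0 is the version mentioned later in this thread):

```shell
# Remove the mismatched packages, then let the TF install pull in its own numpy.
pip uninstall -y tensorflow numpy
pip install "tensorflow==1.0.0"
```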
@imranparuk Wow, it seems to be from a version of ancient times. By the way, I'm planning to rewrite the code in Keras, following the paper as closely as possible. It should be out by the 3rd of September.
@ArvinSiChuan I would contribute to that.
@ArvinSiChuan @imranparuk Thank you both for your contributions.
@astorfi Which dataset would you want to use? I would think the mvu multimodal dataset reported in the paper isn't publicly available. I'm looking for datasets suited to this task; if you could suggest some, that would be perfect!
@astorfi @ArvinSiChuan I'm also finding it very difficult to get the multi-modal dataset from the paper. It would be great if we could use an open-source dataset.
@imranparuk I'm considering the datasets in
Perhaps the VoxCeleb dataset is one of the best options.
@ArvinSiChuan @imranparuk I agree that one of the problems is that the dataset is restricted. It would take a lot of effort for me to tune it for a new dataset, as I am not working on this project anymore.
@astorfi I have actually been using the VoxCeleb dataset. I wanted to try Mozilla Open Voice, but it seems to require conversion from mp3 to wav, and I've been too lazy to do that. My question is: is it illegal or unethical to include these datasets in your own repository, given that they all require you to formally request them?
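On the mp3-to-wav conversion: a batch conversion is only a few lines if ffmpeg is on the path. A sketch, where the helper names and the mono 16 kHz target are my own assumptions, not from the repo:

```python
import subprocess
from pathlib import Path

def mp3_to_wav_cmd(mp3_path, wav_path, sample_rate=16000):
    # Mono, 16 kHz wav output; tweak to match the feature extractor's expectations.
    return ["ffmpeg", "-y", "-i", str(mp3_path),
            "-ac", "1", "-ar", str(sample_rate), str(wav_path)]

def convert_folder(src_dir, dst_dir, sample_rate=16000):
    # Convert every .mp3 under src_dir into a .wav of the same stem under dst_dir.
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for mp3 in sorted(Path(src_dir).glob("*.mp3")):
        subprocess.run(mp3_to_wav_cmd(mp3, dst / (mp3.stem + ".wav"), sample_rate),
                       check=True)
```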
@imranparuk From my point of view, as long as we cite the authors and follow the license (CC BY-SA 4.0 for VoxCeleb), we can use the dataset in our experiments. But I would rather explain which dataset is used and how, than include the actual dataset files in a repo.
@imranparuk @astorfi I'm now working on MFEC at Here. Keras code, docs, etc. will be coming soon. Would you help me find bugs and such?
@ArvinSiChuan Thank you for your effort. Sure, I will be more than happy to help. @Article{torfi2018speechpy,
@astorfi I have a problem with the model. I've built the model with a structure like:
Is this structure the same as yours? Or is my input feature pipeline incorrect? Or is it something with the optimizer? I get zero accuracy in Keras evaluation. Could you help me find out what the problem is?
After moving to TensorFlow v1.0.0, I could get past training, but enrollment is failing. Any idea why this is happening? Thanks!
@sivagururaman The error there explains that the problem is
@ArvinSiChuan, do I need to move to some other Python version? Something like 2.7?
@sivagururaman Yes, that's one way. You can also try to use
@ArvinSiChuan Okay, thanks. Will try the suggestion and get back if I get stuck somewhere else.
@ArvinSiChuan Thanks! It worked. Now, how do I go about using our own dataset for training, enrollment, and evaluation? Our problem space is a subset of this, I suppose, as we only need text-dependent speaker ID.
@ArvinSiChuan I am now able to run the demo as needed. Do you have the input preprocessing for wav files handy? I want to extract the input features of my own clip and use the rest of the network for evaluation.
@sivagururaman You could refer to the input part in the code folder.
@ArvinSiChuan Thanks. I am going through that now.
@ArvinSiChuan One quick question:
@sivagururaman Good question. The input portion of the code seems incomplete; we should take some initiative and complete it eventually. Try something like this (but better) maybe?

```python
import numpy as np
import tables

# AudioDataset, Compose, CMVN, Feature_Cube and ToOutput come from the repo's
# 0-input feature-extraction code; import them from wherever that module lives
# in your checkout. `args` is an argparse namespace with an `audio_dir` field.

datasetTest = AudioDataset(
    files_path='/home/imran/Documents/projects/wd/wd-3d-cnn/code/0-input/file_path_enrollment_eval.txt',
    audio_dir=args.audio_dir,
    transform=Compose([CMVN(), Feature_Cube(cube_shape=(20, 80, 40), augmentation=True), ToOutput()]))
datasetTrain = AudioDataset(
    files_path='/home/imran/Documents/projects/wd/wd-3d-cnn/code/0-input/file_path_enrollment_enroll.txt',
    audio_dir=args.audio_dir,
    transform=Compose([CMVN(), Feature_Cube(cube_shape=(20, 80, 40), augmentation=True), ToOutput()]))

# Each index yields one sample; a batch is just a list of samples, e.g.:
#   batch_features = [datasetTest[idx][0] for idx in range(32)]
#   len(batch_features) == 32

fileh = tables.open_file('da_dataset.h5', mode='w')
float_atom = tables.Float32Atom()
int_atom = tables.Int32Atom()
array_a = fileh.create_earray(fileh.root, 'label_evaluation', int_atom, (0,))
array_b = fileh.create_earray(fileh.root, 'label_enrollment', int_atom, (0,))
array_c = fileh.create_earray(fileh.root, 'utterance_evaluation', float_atom, (0, 80, 40, 1))
array_d = fileh.create_earray(fileh.root, 'utterance_enrollment', float_atom, (0, 80, 40, 1))

for x in range(len(datasetTest)):
    feature, label = datasetTest[x]
    # Reorder axes, keep a single frame along the last axis, and drop the
    # leading singleton axis so the result matches the (80, 40, 1) earray rows.
    feature = feature.swapaxes(1, 2).swapaxes(2, 3)
    feature = feature[:, :, :, 0:1]
    feature = np.squeeze(np.array(feature), axis=0)
    array_a.append(np.array([label]))
    array_c.append(np.array([feature]))

for x in range(len(datasetTrain)):
    feature, label = datasetTrain[x]
    feature = feature.swapaxes(1, 2).swapaxes(2, 3)
    feature = feature[:, :, :, 0:1]
    feature = np.squeeze(np.array(feature), axis=0)
    array_b.append(np.array([label]))
    array_d.append(np.array([feature]))

fileh.close()
```
@imranparuk I could not test this input code. Will do so in the coming days and let you know of the updates. Just after a glance at the code: I understand that we would have da_dataset.h5, which will contain the enrollment and evaluation data. If I need to test with another sample, say a new utterance.wav, how do I do that?
@sivagururaman Another good question... I wrote my own prediction code to do that, but it's too long to post here. A simpler way would be to create a dataset file the same way, but with a text file that only has 1 item, then pass it to the model in a similar way. I leave the task of posting the code to another user...
@imran - did you mean test file instead of text file?
Regards,
Sivagururaman
@sivagururaman no, there is a text file provided with the git repo which has a particular format to identify speakers:
0 file1.wav
1 file2.wav
1 file3.wav
2 file4.wav
etc.
@imran - Thanks. Will go over the format and see if that helps our run here.
Thanks!
@imran - If you do not mind, can you also share your prediction code with me, which I can use for reference?
Thanks!
@imranparuk, thank you for sharing your code. I tried it and it gave the following (please note that I used
I think this won't work. If the first shape dimension (8) is the index of speakers in my file_path.txt (there are 9 wav files), then we only took one utterance for each of them (the last one = 0 must be the utterance index). That will give an error about the number of utterances when the demo is run.
@MSAlghamdi Hey man, I actually stopped using the static dataset method (where you extract the features beforehand). I have started a project based off this one, but written in Keras. PS: @astorfi I will make sure you are given credit for your work. I just created this project and haven't had time to complete the README.md.
@imranparuk Good work! Thank you for sharing it. I still hope to do it in a simpler, static way. I tried another method that has some issues.
The .h5 file was created with the right shape, but another issue popped up when I ran the demo. The h5 file posted with the project has the same structure as mine, but mine contains negative numbers. I think this is an issue with how the features are generated in the input file.py.
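One note on the negative numbers: MFEC features are log filterbank energies, and the log of an energy below 1.0 is negative, so negative values in the h5 file are not necessarily a bug by themselves. A quick NumPy illustration (the energy values here are made up):

```python
import numpy as np

# log(energy) is negative whenever energy < 1.0 and positive above it,
# so a mix of signs in log-energy features is expected.
energies = np.array([0.01, 0.5, 2.0, 10.0])
log_energies = np.log(energies)
assert (log_energies[:2] < 0).all()
assert (log_energies[2:] > 0).all()
```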
Thank you so much for your effort and great work.
Thank you @astorfi for your kindness and your great project. My master's thesis is an evaluation of yours against other SV systems, and it seems yours has the ability to beat them. I'm just stuck on yours because of the h5 file issues.
Thanks for your kind words. Please consider the following directions:
Best
When I run run.sh, it shows something wrong: