Examples do not run and have errors, possibly relating to the value of nj #3

Closed
tfburns opened this issue Oct 25, 2020 · 4 comments

@tfburns

tfburns commented Oct 25, 2020

Description

Following the installation instructions and building either the release or debug configuration, the TestDll fails to execute both the YesNo and LibriSpeech examples. Both show a similar early Kaldi warning (FST is not stochastic!) and, according to the Kaldi user issues and discussions I found, their errors appear to be due to an incorrect value set for nj ("number of jobs"). Kaldi documentation states that "Generally speaking you can reduce the value of the --nj option without affecting the outcome, but there are some situations where the --nj options given to multiple scripts must match, or a later stage will crash." In Kaldi command-line examples, the value is set with an option such as --nj 1 for 1 job.

Version & Hardware

Operating System version: Windows 10 Home 64-bit

Computer: i7 CPU, 16GB RAM

Computer Processors/Cores: 4 cores

Visual Studio Version: Visual Studio Community 2017 (version 15.9.28)

Visual Studio Configuration: Release or Debug 64-bit

Steps to Reproduce In Case Of An Error

  1. Follow installation instructions as described.
  2. Run examples.

Expected behavior: Expect no errors

Actual behavior: Get error

Reproduces how often: 100%

Which Steps Have You Tried To Debug The Problem In Case Of An Error

In the YesNo test case, it seems the value of nj is too large. This Kaldi issue discusses a similar case: kaldi-asr/kaldi#2320
In those cases users are advised to lower nj in the relevant Kaldi shell scripts. I tried passing --nj 1 via the mfcc.conf file, but Kaldi did not accept it as an option. I don't know where to change the nj value in VoiceBridge.

In the LibriSpeech test case, there seems to be a mismatch, when splitting the data directory, between the number of speakers and the number of output .scp files. However, I can find no mention of "splitting" in LibriSpeech.cpp. This Kaldi discussion suggests it may again be an issue with the value set for nj: https://groups.google.com/g/kaldi-help/c/nV8FcnjoxJY
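
For readers unfamiliar with the two split errors in the logs below, here is a minimal C++ sketch of the kind of check that could produce them. It is not VoiceBridge or Kaldi code: `splitEvenly` and its error text are hypothetical, and the sketch only illustrates why a job count (nj) larger than the number of items (utterances or speakers) cannot be satisfied.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: split `items` (e.g. speaker IDs or wav.scp lines)
// into `nj` contiguous pieces whose sizes differ by at most one.
// Throws if nj is zero or exceeds the number of items, because at least
// one piece would then be empty.
std::vector<std::vector<std::string>> splitEvenly(
    const std::vector<std::string>& items, std::size_t nj) {
  if (nj == 0) throw std::invalid_argument("nj must be at least 1");
  if (nj > items.size()) {
    throw std::invalid_argument(
        "too many pieces: " + std::to_string(nj) + " jobs for " +
        std::to_string(items.size()) + " items; reduce nj");
  }

  std::vector<std::vector<std::string>> pieces(nj);
  const std::size_t base = items.size() / nj;
  const std::size_t extra = items.size() % nj;  // first `extra` pieces get one more
  std::size_t next = 0;
  for (std::size_t j = 0; j < nj; ++j) {
    const std::size_t count = base + (j < extra ? 1 : 0);
    const auto first = items.begin() + static_cast<std::ptrdiff_t>(next);
    pieces[j].assign(first, first + static_cast<std::ptrdiff_t>(count));
    next += count;
  }
  assert(next == items.size());  // every item was assigned to exactly one piece
  return pieces;
}
```

With 4 speakers, `splitEvenly(speakers, 4)` yields four one-speaker pieces, while `splitEvenly(speakers, 8)` throws, which mirrors the "number of speakers 4 is less than the number of output .scp files 8" situation.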

Logs

YesNo test:

[INFO]  ***************************************
[INFO]  * WELCOME TO VOICEBRIDGE FOR WINDOWS! *
[INFO]  ***************************************
[INFO]  Preparing data...
[INFO]  Creating backup of existing data directory...
[INFO]  Creating archive "C:/VoiceBridgeProjects\YesNo\data.zip"
[INFO]  Data succesfully backed up to "C:/VoiceBridgeProjects\YesNo\backup-data20201026-014714.zip"
[INFO]  Preparing new data...
[INFO]  Creating backup of language model...
[INFO]  Creating archive "C:/VoiceBridgeProjects\YesNo\input\task.zip"
[INFO]  Language model succesfully backed up to "C:/VoiceBridgeProjects\YesNo\input\task20201026-014714.zip"
[INFO]  0.549
[INFO]  Loading corpus C:/VoiceBridgeProjects\YesNo\data\full_text.txt...
[INFO]  0.580
[INFO]  Smoothing[1] = ModKN
[INFO]  0.580
[INFO]  Smoothing[2] = ModKN
[INFO]  0.581
[INFO]  Smoothing[3] = ModKN
[INFO]  0.581
[INFO]  Set smoothing algorithms...
[INFO]  0.582
[INFO]  Y 0.000000e+00
[INFO]  0.582
[INFO]  Y 0.000000e+00
[INFO]  0.583
[INFO]  Y 1.000000e+00
[INFO]  Estimating full n-gram model...
[INFO]  Saving vocabulary to C:/VoiceBridgeProjects\YesNo\data\vocab.txt.temp...
[INFO]  Saving LM to C:/VoiceBridgeProjects\YesNo\input\task.arpabo...
[INFO]  Data preparation succeeded!
[INFO]  Preparing dictionary...
[WARNING]       The reference dictionary does not exist or empty. Expecting to have a ready lexicon...
[INFO]  Found already existing lexicon and using it...
[INFO]  Silence phones saved to:
[INFO]    C:/VoiceBridgeProjects\YesNo\data/local\dict\silence_phones.txt
[INFO]  Optional silence saved to:
[INFO]    C:/VoiceBridgeProjects\YesNo\data/local\dict\optional_silence.txt
[INFO]  Non-silence phones saved to:
[INFO]    C:/VoiceBridgeProjects\YesNo\data/local\dict\nonsilence_phones.txt
[INFO]  Extra triphone clustering-related questions saved to:
[INFO]    C:/VoiceBridgeProjects\YesNo\data/local\dict\extra_questions.txt
[INFO]  Preparing language features...
[INFO]  C:/VoiceBridgeProjects\YesNo\data/local\dict\silence_phones.txt is OK!
[INFO]  C:/VoiceBridgeProjects\YesNo\data/local\dict\optional_silence.txt is OK!
[INFO]  C:/VoiceBridgeProjects\YesNo\data/local\dict\nonsilence_phones.txt is OK!
[INFO]  C:/VoiceBridgeProjects\YesNo\data/local\dict\silence_phones.txt and
C:/VoiceBridgeProjects\YesNo\data/local\dict\nonsilence_phones.txt do not overlap! OK!
[INFO]    --> extra_questions.txt is validated with succes!
[INFO]    --> Dictionaries are validated with succes in C:/VoiceBridgeProjects\YesNo\data/local\dict.
[INFO]    --> Creating C:/VoiceBridgeProjects\YesNo\data/local\dict\lexiconp.txt from C:/VoiceBridgeProjects\YesNo\data/local\dict\lexicon.txt.
[INFO]    --> Creating C:/VoiceBridgeProjects\YesNo\data/local\lang\align_lexicon.txt from C:/VoiceBridgeProjects\YesNo\data/local\lang\lexiconp.txt.
[INFO]    --> Validating output directory...
[INFO]  Checking C:/VoiceBridgeProjects\YesNo\data\lang\phones.txt...
[INFO]    --> "phones.txt" is OK.
[INFO]  Checking C:/VoiceBridgeProjects\YesNo\data\lang\words.txt...
[INFO]    --> "words.txt" is OK.
[INFO]  Checking disjoint: silence.txt, nonsilence.txt, disambig.txt ...
[INFO]    --> silence.txt and nonsilence.txt are disjoint.
[INFO]    --> silence.txt and disambig.txt are disjoint.
[INFO]    --> disambig.txt and nonsilence.txt are disjoint.
[INFO]    --> disjoint property is OK.
[INFO]  Checking summation: silence.txt, nonsilence.txt, disambig.txt ...
[INFO]    --> summation property is OK.
[INFO]  Checking C:/VoiceBridgeProjects\YesNo\data\lang\phones\context_indep{.txt, .int, .csl} ...
[INFO]    --> 1 entry/entries in "context_indep.txt".
[INFO]  "context_indep.int" corresponds to "context_indep.txt".
[INFO]    --> "context_indep.csl" corresponds to "context_indep.txt".
[INFO]    --> C:/VoiceBridgeProjects\YesNo\data\lang\phones\context_indep{.txt, .int, .csl} are OK!
[INFO]  Checking C:/VoiceBridgeProjects\YesNo\data\lang\phones\nonsilence{.txt, .int, .csl} ...
[INFO]    --> 7 entry/entries in "nonsilence.txt".
[INFO]  "nonsilence.int" corresponds to "nonsilence.txt".
[INFO]    --> "nonsilence.csl" corresponds to "nonsilence.txt".
[INFO]    --> C:/VoiceBridgeProjects\YesNo\data\lang\phones\nonsilence{.txt, .int, .csl} are OK!
[INFO]  Checking C:/VoiceBridgeProjects\YesNo\data\lang\phones\silence{.txt, .int, .csl} ...
[INFO]    --> 1 entry/entries in "silence.txt".
[INFO]  "silence.int" corresponds to "silence.txt".
[INFO]    --> "silence.csl" corresponds to "silence.txt".
[INFO]    --> C:/VoiceBridgeProjects\YesNo\data\lang\phones\silence{.txt, .int, .csl} are OK!
[INFO]  Checking C:/VoiceBridgeProjects\YesNo\data\lang\phones\optional_silence{.txt, .int, .csl} ...
[INFO]    --> 1 entry/entries in "optional_silence.txt".
[INFO]  "optional_silence.int" corresponds to "optional_silence.txt".
[INFO]    --> "optional_silence.csl" corresponds to "optional_silence.txt".
[INFO]    --> C:/VoiceBridgeProjects\YesNo\data\lang\phones\optional_silence{.txt, .int, .csl} are OK!
[INFO]  Checking C:/VoiceBridgeProjects\YesNo\data\lang\phones\disambig{.txt, .int, .csl} ...
[INFO]    --> 3 entry/entries in "disambig.txt".
[INFO]  "disambig.int" corresponds to "disambig.txt".
[INFO]    --> "disambig.csl" corresponds to "disambig.txt".
[INFO]    --> C:/VoiceBridgeProjects\YesNo\data\lang\phones\disambig{.txt, .int, .csl} are OK!
[INFO]  Checking C:/VoiceBridgeProjects\YesNo\data\lang\phones\roots{.txt, .int} ...
[INFO]    --> 6 entry/entries in "roots.txt".
[INFO]    --> cat.int corresponds to cat.txt.
[INFO]    --> C:/VoiceBridgeProjects\YesNo\data\lang\phones\roots{.txt, .int} are OK!
[INFO]  Checking C:/VoiceBridgeProjects\YesNo\data\lang\phones\sets{.txt, .int} ...
[INFO]    --> 6 entry/entries in "sets.txt".
[INFO]    --> cat.int corresponds to cat.txt.
[INFO]    --> C:/VoiceBridgeProjects\YesNo\data\lang\phones\sets{.txt, .int} are OK!
[INFO]  Checking C:/VoiceBridgeProjects\YesNo\data\lang\phones\extra_questions{.txt, .int} ...
[INFO]    --> 2 entry/entries in "extra_questions.txt".
[INFO]    --> cat.int corresponds to cat.txt.
[INFO]    --> C:/VoiceBridgeProjects\YesNo\data\lang\phones\extra_questions{.txt, .int} are OK!
[INFO]  Checking optional_silence.txt ...
[INFO]    --> C:/VoiceBridgeProjects\YesNo\data\lang\phones\optional_silence.txt is OK.
[INFO]  Checking disambiguation symbols: #0 and #1
[INFO]    --> phones/disambig.txt has "#0" and "#1".
[INFO]    --> phones/disambig.txt is OK.
[INFO]  Checking topo ...
[INFO]    --> "topo" is OK.
[INFO]  Checking word-level disambiguation symbols...
[INFO]  Checking C:/VoiceBridgeProjects\YesNo\data\lang\oov{.txt, .int} ...
[INFO]    --> 1 entry/entries in "oov.txt".
[INFO]    --> cat.int corresponds to cat.txt.
[INFO]    --> C:/VoiceBridgeProjects\YesNo\data\lang\oov{.txt, .int} are OK!
[INFO]    --> lang/L.fst is olabel sorted.
[INFO]    --> lang/path_L_disambig.fst is olabel sorted.
[INFO]    --> SUCCESS [validating lang directory C:/VoiceBridgeProjects\YesNo\data\lang ].
[INFO]  Preparing language models for test...
[INFO]  Reduced num-states from 10 to 10
[INFO]  min weigth=-5.77482e-07 max weigth=-0.693147.
[WARNING]        FST is not stochastic!C:/VoiceBridgeProjects\YesNo\data\lang_test_tg\G.fst.
[INFO]  Language models for test preparation succeeded!


[INFO]  Starting MFCC features extraction...
[INFO]    --> Validating utt2spk...
[INFO]    --> Validating spk2utt...
[INFO]    --> Validating text...
[INFO]    --> Validating wav.scp...
[INFO]  Successfully validated data-directory C:/VoiceBridgeProjects\YesNo\data\train_yesno.
[INFO]  No segments file exists: assuming wav.scp indexed by utterance.
[INFO]  Succeeded creating MFCC features for train_yesno.
[INFO]    --> Validating utt2spk...
[INFO]    --> Validating spk2utt...
[INFO]    --> Validating text...
[INFO]    --> Validating wav.scp...
[INFO]  Successfully validated data-directory C:/VoiceBridgeProjects\YesNo\data\test_yesno.
[INFO]  No segments file exists: assuming wav.scp indexed by utterance.
[ERROR]  [SplitScp] You are splitting into too many pieces! [reduce number of jobs (nj)].
[ERROR] Feature extraction failed.

LibriSpeech test:

I will not post the entire console output since it is very long. The first warning is [WARNING] FST is not stochastic!C:/VoiceBridgeProjects\LibriSpeech\data\lang_test_tg\G.fst. (which also appeared in the YesNo example), and it shows just after the MFCC features extraction step. Many more warnings start to appear in stages 4 and 5.

Stage 4 shows the following output:

[INFO]  STAGE 4: Maximum Likelihood re-estimation of GMM-based acoustic model...
[INFO]  TransitionModel::Update, objf change is 0.0156152 per frame over 1.7508e+06 frames.
[INFO]  0 probabilities floored, 531 out of 1046 transition-states skipped due to insuffient data (it is normal to have some skipped.)
[WARNING]       Gaussian has too little data but not removing it because it is the last Gaussian: i = 0, occ = 0, weight = 1
[WARNING]       Gaussian has too little data but not removing it because it is the last Gaussian: i = 0, occ = 0, weight = 1
[WARNING]       Gaussian has too little data but not removing it because it is the last Gaussian: i = 0, occ = 0, weight = 1
[WARNING]       Gaussian has too little data but not removing it because it is the last Gaussian: i = 0, occ = 0, weight = 1
[WARNING]       Gaussian has too little data but not removing it because it is the last Gaussian: i = 0, occ = 0, weight = 1
[INFO]  0 variance elements floored in 0 Gaussians, out of 127
[INFO]  Removed 0 Gaussians due to counts < --min-gaussian-occupancy=3 and --remove-low-count-gaussians=true
[INFO]  Split 127 states with target = 127, power = 0.25, perturb_factor = 0.01 and min_count = 20, split #Gauss from 127 to 127

Here is a typical pass in Stage 5:

[INFO]   >> Pass 7.
[WARNING]       The pdfs for the silence phones may be shared by other phones (note: this probably does not matter.)
[WARNING]       Retrying utterance 5639-40744-0012 with beam 40
[WARNING]       Retrying utterance 4446-2273-0016 with beam 40
[WARNING]       Did not successfully decode file 4446-2273-0016, len = 963
[WARNING]       Retrying utterance 5142-36377-0012 with beam 40
[WARNING]       No alignment for utterance 4446-2273-0016
[INFO]  TransitionModel::Update, objf change is 0.00101131 per frame over 1.74984e+06 frames.
[INFO]  12 probabilities floored, 538 out of 1046 transition-states skipped due to insuffient data (it is normal to have some skipped.)
[WARNING]       Gaussian has too little data but not removing it because it is the last Gaussian: i = 0, occ = 0, weight = 1
[WARNING]       Gaussian has too little data but not removing it because it is the last Gaussian: i = 0, occ = 0, weight = 1
[WARNING]       Gaussian has too little data but not removing it because it is the last Gaussian: i = 0, occ = 0, weight = 1
[WARNING]       Gaussian has too little data but not removing it because it is the last Gaussian: i = 0, occ = 0, weight = 1
[WARNING]       Gaussian has too little data but not removing it because it is the last Gaussian: i = 0, occ = 0, weight = 1
[INFO]  0 variance elements floored in 0 Gaussians, out of 272
[INFO]  Removed 0 Gaussians due to counts < --min-gaussian-occupancy=10 and --remove-low-count-gaussians=true
[INFO]  Split 127 states with target = 301, power = 0.25, perturb_factor = 0.01 and min_count = 20, split #Gauss from 272 to 301

After 39 passes, stage 5 ends with the following error:

[INFO]  The optional-silence phone SIL occupies 4.9% of frames overall.
[INFO]  Limiting the stats to the 88.8% of frames not covered by an utterance-[begin/end] phone, optional-silence SIL occupies 0.0% of frames.
[INFO]  Assuming 100 frames per second, the alignments represent 4.86 hours of data, or 4.62 hours if SIL frames are excluded.
[INFO]  Utterance-internal optional-silences SIL comprise 0.0% of utterance-internal phones, with duration (median, mean, 95-percentile) = (0, 0.0, 0).
[INFO]  Done training monophone system in C:/VoiceBridgeProjects\LibriSpeech\data\train_librispeech\mono0a.
[INFO]  num-pdfs 127
[INFO]  context-width 1
[INFO]  central-position 0
[INFO]  min weigth=-0.024336 max weigth=-0.0246625.
[INFO]  LG is not stochastic.
[INFO]  min weigth=-0.024336 max weigth=-0.0246625.
[INFO]  CLG is not stochastic.
[INFO]  min weigth=0.282973 max weigth=-0.070831.
[INFO]  HCLGa is not stochastic.
[INFO]  number of phones 342
[INFO]  number of pdfs 127
[INFO]  number of transition-ids 2172
[INFO]  number of transition-states 1046

[INFO]  Decoding language model tg...
[INFO]  Need to split the data directory. Splitting...
[ERROR]  Refusing to split data because number of speakers 4 is less than the number of output .scp files 8.
[ERROR] Decoding failed for language model tg


[INFO]  *****************
[INFO]  ****  ERROR! ****
[INFO]  *****************
@AI-TOOLKIT
Owner

Hello Tom,
Based on your computer's description, the examples should work immediately (nobody has reported such a problem so far). The test system for VoiceBridge was a 4-thread computer, so if your system is the same there should not be any problem unless you have modified something.
The output indicates that you have probably changed the number of threads to 8 instead of leaving the default of 4, and this may be the problem.

In the examples, VoiceBridge automatically determines the number of threads and the splitting. This is an improvement over Kaldi. There are many more such improvements, so the VoiceBridge implementation is quite different from Kaldi.
In the example code this is set with the following line: "int numthreads = concurentThreadsSupported;"

Br,
Zoltan
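
The thread does not show how `concurentThreadsSupported` is obtained. As a hedged sketch, assuming it comes from the standard library's hardware-concurrency query, the following C++ shows one way to detect the thread count and cap the job count by the number of speakers. `detectThreads` and `chooseJobs` are hypothetical names, not VoiceBridge functions.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>

// std::thread::hardware_concurrency() may return 0 when the value cannot
// be determined, so fall back to a single thread in that case.
unsigned detectThreads() {
  const unsigned n = std::thread::hardware_concurrency();
  return n == 0 ? 1u : n;
}

// Hypothetical policy: never use more jobs than there are speakers,
// and always use at least one job.
unsigned chooseJobs(unsigned threads, unsigned numSpeakers) {
  assert(numSpeakers > 0 && "need at least one speaker");
  return std::max(1u, std::min(threads, numSpeakers));
}
```

With the numbers from the failing LibriSpeech log, `chooseJobs(8, 4)` returns 4, so a split by speaker would not be asked for more pieces than there are speakers.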

@tfburns
Author

tfburns commented Oct 25, 2020

Hi Zoltan,
I didn't change anything so not sure why both examples fail to run.

@AI-TOOLKIT
Owner

AI-TOOLKIT commented Oct 25, 2020

Hello Tom,
The line in your output "the number of output .scp files 8" indicates that there are too many scp files. There should be only 4. This may happen if you change the code and set 8 cores instead of 4, or if the folders have become cluttered (for example, if you copied the scp files manually into the same folder).
Restart your test with the original files (clean up or replace the example data folders). The examples should work fine on your system without any modification!

Update: I have just tested VoiceBridge with the YesNo example (TestYesNo()) on a system similar to yours, and it works fine, with the last lines of the output as follows:

...
[INFO] Set1: WER 2.09% +- 3.72%

[INFO] *****************
[INFO] **** ALL OK! ****
[INFO] *****************

Br,
Zoltan

@tfburns
Author

tfburns commented Oct 26, 2020

Hi Zoltan,
Okay, that was it! I didn't modify the code but I must have copied and pasted files into an incorrect folder while following the installation instructions. Perhaps the readme can be made a bit clearer, e.g. showing full relative paths of where things are meant to go?
Thanks a lot and sorry for the trouble,
Tom

tfburns closed this as completed Oct 26, 2020