Hainan's RNNLM setup #37

Status: Open. Wants to merge 186 commits into base: master.

Commits (186)
d0a06e3
in progress...
hainan-xv Sep 1, 2016
03f6bd8
change design - have 2 wordlists
hainan-xv Sep 5, 2016
1bae64a
can train RNNLMs
hainan-xv Sep 12, 2016
912770f
Added evaluation binary. Not tested yet
hainan-xv Sep 12, 2016
451e2a4
evaluation reveals problems...
hainan-xv Sep 21, 2016
1d4e7c8
binaries are working now
hainan-xv Sep 27, 2016
55c41b1
small change
hainan-xv Sep 27, 2016
f7181e4
Merge branch 'master' into rnnlm
hainan-xv Sep 27, 2016
3805d0f
fix some bugs
hainan-xv Oct 24, 2016
1ba7357
finally working
hainan-xv Oct 25, 2016
5212f54
implemented LSTM-LM
hainan-xv Oct 27, 2016
1b7b23b
aesthetic changes
hainan-xv Oct 28, 2016
2f22d48
more aesthetic changes
hainan-xv Oct 28, 2016
52e35c6
scripts are pretty and runnable
hainan-xv Oct 29, 2016
37dc6c5
ami lstm tuned - better than rnnlm
hainan-xv Oct 31, 2016
9ed83d6
divide egs into archives for rnnlm training
hainan-xv Oct 31, 2016
d00217c
add script for WSJ (untuned)
hainan-xv Nov 1, 2016
52652ec
add words per second diagnostics; add PPL with OOS penalties
hainan-xv Nov 2, 2016
0f85a2d
starting to write the new LmNnetCompute class
hainan-xv Nov 2, 2016
bd11e28
starting to write the new LmNnetCompute class, include new files
hainan-xv Nov 2, 2016
b12b1d0
add show-progress
hainan-xv Nov 3, 2016
2755d54
continue implementing RNNLM; change designs
hainan-xv Nov 4, 2016
49737e9
buggy rnnlm-init
hainan-xv Nov 4, 2016
e582e09
fix the memory bug in LmNnet; rnnlm-init is working
hainan-xv Nov 4, 2016
e1b33cd
there is a bug in rnnlm-train; included testing scripts
hainan-xv Nov 4, 2016
7dd21ad
still debugging
hainan-xv Nov 4, 2016
3ea9ef3
still debugging...
hainan-xv Nov 7, 2016
5df77b4
realized that I used a bad design w.r.t. egs input projections... com…
hainan-xv Nov 7, 2016
75b323d
new design...
hainan-xv Nov 7, 2016
a4d40c9
now able to get derivatives w.r.t. input
hainan-xv Nov 7, 2016
e61de3c
can't find the bug...
hainan-xv Nov 7, 2016
7cfe475
found the bug. it seems to be working now
hainan-xv Nov 8, 2016
7e20e28
re-write the first layers with CPU; not finished yet
hainan-xv Nov 9, 2016
7acd96a
re-write the underlying Component class to use CPU matrices
hainan-xv Nov 10, 2016
cd2ea74
re-write the underlying Component class to use CPU matrices - added l…
hainan-xv Nov 10, 2016
f75e5c9
get rid of some redundant code
hainan-xv Nov 14, 2016
169fc0f
re-wrote backpropagation with sparse matrix; buggy
hainan-xv Nov 14, 2016
b23955d
back using GPU for output layers
hainan-xv Nov 15, 2016
42a3ac9
done with regular RNNLMs with the new design!
hainan-xv Nov 16, 2016
86abb85
setup tested to be correct
hainan-xv Nov 17, 2016
e7c5d8a
linear component
hainan-xv Nov 19, 2016
03eabbd
linear component, make the config more flexible
hainan-xv Nov 20, 2016
147cf87
add rnnlm-compute-prob; add a lot of const to the classes for better …
hainan-xv Nov 21, 2016
5c4da99
fix bugs in previous commit
hainan-xv Nov 21, 2016
ecb4e7f
small changes
hainan-xv Nov 22, 2016
063b140
starting the sample-affine-logsoftmax-component
hainan-xv Nov 27, 2016
9299526
add sampling and forward code
hainan-xv Nov 28, 2016
2b345ef
add code for back-prop; so far only implemented on CPU for easier deb…
hainan-xv Nov 29, 2016
0d1eba4
changes in architecture
hainan-xv Dec 5, 2016
2352a5b
new base classes finished
hainan-xv Dec 5, 2016
3a16b1f
runnable but getting nan's
hainan-xv Dec 5, 2016
a4d32f8
still getting nan's but I think I know why
hainan-xv Dec 6, 2016
d20661b
done with the normalized layer; to run on data
hainan-xv Dec 7, 2016
0e8e5b5
runnable; getting nan's
hainan-xv Dec 7, 2016
3b44e8b
re-write in cu-matrix
hainan-xv Dec 9, 2016
78eb7c1
preparing for reviews
hainan-xv Dec 19, 2016
d7c5a73
to review
hainan-xv Dec 19, 2016
02f0498
fixed some small bugs; no *unexpected* nan's anymore
hainan-xv Dec 21, 2016
e18eb37
Merge branch 'master' of https://github.com/hainan-xv/kaldi
hainan-xv Dec 21, 2016
0438594
Merge branch 'master' into my-rnnlm
hainan-xv Dec 21, 2016
14d0bd6
add per-comp-max-change code; still not training correctly
hainan-xv Dec 21, 2016
2c12ba5
re-design some of the code
hainan-xv Dec 22, 2016
69510ca
more changes
hainan-xv Dec 26, 2016
131c8d9
add code for converting prob vectors
hainan-xv Dec 27, 2016
07e0a9b
fix a bug when a correct output label appears more than once in a min…
hainan-xv Dec 27, 2016
ff1ab0e
pipeline draft finished
hainan-xv Dec 27, 2016
b3f0c08
running now
hainan-xv Dec 28, 2016
d9677a3
fix rand() issue
hainan-xv Dec 28, 2016
0670dc5
sorting before sampling; fix Rand() issue for Windows
hainan-xv Dec 29, 2016
22a84c6
adding the unigram code; made it worse...
hainan-xv Dec 30, 2016
1e555ee
cleaned a bit
hainan-xv Dec 30, 2016
167dfd9
test ideas...
hainan-xv Dec 30, 2016
c1c44d6
test ideas; add some comment for NormalizeVec()
hainan-xv Dec 30, 2016
47b9fa0
new linear transform added which helped PPL; add more comments for th…
hainan-xv Dec 30, 2016
0ff3342
the normalized-layer method works like magic
hainan-xv Jan 2, 2017
53d2b63
fixed small bugs
hainan-xv Jan 3, 2017
2d2dfe2
add test code for SamplingWithoutReplacement()
Dec 31, 2016
b38eb6a
modify code to ensure the test won't terminate if the convergence te…
Jan 1, 2017
3580ed9
add a normalize-to-one component
hainan-xv Jan 4, 2017
ce2af52
restructured duplicate code; add test for k = 1, 2, and n
Jan 5, 2017
cd110a0
use pointer instead of reference in function args; minor changes like…
Jan 5, 2017
e245a3d
applied several tricks to the normalized-component method; getting ve…
hainan-xv Jan 7, 2017
34e78a1
sigmoid + normalize trick
hainan-xv Jan 8, 2017
ed64185
more tricks...
hainan-xv Jan 9, 2017
95a6f96
Merge pull request #1 from keli78/my-rnnlm
hainan-xv Jan 10, 2017
8194443
tuning sampling method; found small bug in sampling code
hainan-xv Jan 11, 2017
0540dd7
fix sampling bug
hainan-xv Jan 11, 2017
d0bffb9
major improvement on the sampling method
hainan-xv Jan 11, 2017
bc21318
fix one hidden issue in the sampling code caused by numerical problem…
hainan-xv Jan 12, 2017
a7fb871
make rnnlm-compute-probs work; can report train and dev PPLs (normali…
hainan-xv Jan 13, 2017
a3d19ca
writing rnnlm-show-progress (in progress still)
hainan-xv Jan 17, 2017
3e574ae
rnnlm-show-progress, seg fault
hainan-xv Jan 17, 2017
33a004c
rnnlm-compute-probs working
hainan-xv Jan 18, 2017
ebab9f7
supporting new objf with no sampling now
hainan-xv Jan 19, 2017
3a664b9
rnnlm merge with shortcut
hainan-xv Jan 20, 2017
f9e81b8
some changes to make it more efficient
hainan-xv Jan 21, 2017
1795a5c
cuda kernels for sparse matrix affine forward/backward prop
freewym Jan 24, 2017
ae9b986
Merge pull request #4 from freewym/shortcut2
hainan-xv Jan 24, 2017
6a43ff7
use binary search instead of linear search in sampling; fix a bug in …
hainan-xv Jan 24, 2017
1a8536c
fix a bug where input matrix to nnet3 in rnnlm is uninitialized
hainan-xv Jan 27, 2017
3b70acf
rebuild comparable baseline
hainan-xv Jan 27, 2017
2af0e93
add NaturalGradientLinearComponent
hainan-xv Jan 28, 2017
1142e60
natural gradient on rnnlm
hainan-xv Jan 28, 2017
c79e1b1
aesthetic changes
hainan-xv Jan 30, 2017
09b35fd
more aesthetic changes
hainan-xv Feb 3, 2017
62c48b2
sample a word, version 1 (IO is written by myself)
Jan 23, 2017
7ae1bff
sample a word version 2 (use fst Symbol table and kaldi io)
Feb 8, 2017
dd8aea9
more aesthetic changes; add test code
hainan-xv Feb 8, 2017
d6ae8e7
add natural gradient for the output layer
hainan-xv Feb 20, 2017
799f355
fix a bug with max-change for the in/out layers
hainan-xv Feb 21, 2017
0f0870b
tuning
hainan-xv Feb 26, 2017
b49997a
adversarial training
freewym Feb 1, 2017
a6adf89
added NnetComputer object to adversarial step to account for the code …
freewym Feb 1, 2017
e8df0c5
Travis re-check
freewym Feb 1, 2017
0aa1bc5
fix
freewym Feb 2, 2017
7c23ceb
fix2
freewym Feb 2, 2017
bac0a08
added natural gradient freezing code
freewym Feb 3, 2017
8fe4b95
fix3
freewym Feb 4, 2017
c2e65e0
fix4
freewym Feb 4, 2017
6b074f2
added option adversarial-training-prob; changes according to the last…
freewym Feb 18, 2017
e3da23c
assert fix
freewym Feb 18, 2017
2c2312b
changes from < to <= for adversarial_training_prob to make it more re…
freewym Feb 18, 2017
511e136
change from adversarial-training-prob to adversarial-training-inter…
freewym Feb 19, 2017
6ea1135
2-stage sampling works
hainan-xv Mar 3, 2017
049c437
try out non-natural affine for LM
hainan-xv Mar 3, 2017
69e328b
merge with latest master
hainan-xv Mar 3, 2017
6c5da3f
merge wangs update method with rnnlm
hainan-xv Mar 3, 2017
5967298
rnnlm + wang's update
hainan-xv Mar 3, 2017
bcba8fb
unigram initialization for non-natural gradient layer
hainan-xv Mar 7, 2017
b5bdf3c
new idea about grouping...; back the old code
hainan-xv Mar 13, 2017
2268dad
grouping code finished but undebugged
hainan-xv Mar 13, 2017
868cf8c
grouping with bigram done
hainan-xv Mar 14, 2017
c8baae7
to add thorough test code for 2-stage sampling
hainan-xv Mar 20, 2017
dd777c0
add test for DoGroupingCDF
hainan-xv Mar 21, 2017
efcf0bb
generate samples in egs
hainan-xv Mar 27, 2017
1fdc779
generate samples in egs, seems to be working
hainan-xv Mar 27, 2017
2fcb4f8
using double in sampling to prevent issues
hainan-xv Mar 28, 2017
04996e8
fix a bug in computing weights of histories
Mar 28, 2017
1190b3d
rnnlm training draft finished
hainan-xv Apr 3, 2017
62e5f9b
Add history-weight test
Apr 6, 2017
03ec444
fixed Makefile
freewym Apr 6, 2017
60608b4
Merge pull request #9 from freewym/wangs-update
hainan-xv Apr 6, 2017
12d387a
debugging...
hainan-xv Apr 12, 2017
eb37f98
Merge branch 'wangs-update' of https://github.com/hainan-xv/kaldi int…
hainan-xv Apr 12, 2017
f9db3f3
a natural gradient bug... reasons yet to find
hainan-xv Apr 12, 2017
f92dd96
get rid of alternative classes
hainan-xv Apr 17, 2017
e6d9d19
get rid of LmComponent base class
hainan-xv Apr 17, 2017
77c7306
fix bug in sampling
hainan-xv Apr 18, 2017
045398c
fix more bugs in sampling
hainan-xv Apr 18, 2017
d61ff54
new sampling code passes tests
hainan-xv Apr 18, 2017
26a2a11
fixed the sampling bug; pipeline working in easy conditions
hainan-xv Apr 19, 2017
4fdde14
fixed a bug in nnet-example io functions
hainan-xv Apr 19, 2017
8403b67
still buggy...
hainan-xv Apr 19, 2017
d1202d3
fixed more bugs
hainan-xv Apr 20, 2017
b2d4bac
still can't find the bug...
hainan-xv Apr 21, 2017
38ef328
got rid of normalizeonecomponent
hainan-xv Apr 24, 2017
12f119e
add interface for lattice-rescoring
hainan-xv Apr 25, 2017
2103485
Add ComputeOutputWords function; remove auto; remove histories and hi…
Apr 28, 2017
449c15b
Merge branch 'wangs-update' into rnnlm-shortcut
keli78 Apr 28, 2017
af2a4c2
add test code for LmOutputComponent
hainan-xv Apr 28, 2017
7de1e33
adding the ComputeLogProb interface to LmOutputComponent
hainan-xv Apr 28, 2017
9506f72
Merge pull request #8 from keli78/rnnlm-shortcut
hainan-xv Apr 28, 2017
af29d26
found bugs in natural gradient init/IO code
hainan-xv May 2, 2017
9fc7168
change the conditionlogprob function to be const
hainan-xv May 3, 2017
f23806c
fixed a small bug; found a big bug; will fix by tomorrow
hainan-xv May 4, 2017
4c4a6c9
fixed the bug. seems to be working
hainan-xv May 4, 2017
e84e403
turning off preconditioning on the output side
hainan-xv May 6, 2017
7fa005c
fix bug in ComputeLogprobOfWordGivenHistory which didn't consider the…
hainan-xv May 6, 2017
ef6a357
change how backprop is done in training; now call backprop only once
hainan-xv May 7, 2017
9bddfc5
merge with latest master
hainan-xv May 9, 2017
595bd10
nnet3 rnnlm rescoring
freewym May 1, 2017
265b9fa
fix
freewym May 4, 2017
799111f
fix Makefile
freewym May 5, 2017
7de184e
bug fix
freewym May 6, 2017
c1d5f0c
bug fix related to zero-size of an internal matrix
freewym May 7, 2017
9f88f56
more efficient implementation
freewym May 10, 2017
e1925eb
add assert
freewym May 10, 2017
ffc8a11
Merge pull request #11 from freewym/wangs-update
hainan-xv May 10, 2017
fad5050
change the nneteg to store samples as an IO
hainan-xv May 11, 2017
d95a352
merge with kaldi_52
hainan-xv May 11, 2017
d6b1e4c
merge with kaldi_52
hainan-xv May 11, 2017
ba5b446
fixed the scripts; working now
hainan-xv May 11, 2017
7e179be
script tested
hainan-xv May 12, 2017
fb4eb99
lattice-rescoring supports separate wordlists for input and output
hainan-xv May 12, 2017
c41bbc9
should work now
hainan-xv May 17, 2017
388fcf5
ready
hainan-xv May 17, 2017
egs/ami/s5/local/nnet3-rnnlm/run-lstm.sh (149 additions, 0 deletions)
@@ -0,0 +1,149 @@
#!/bin/bash

train_text=data/sdm1/train/text
dev_text=data/sdm1/dev/text

type=lstm

num_words_in=10000
num_words_out=10000

stage=-100
sos="<s>"
Review comment (Owner Author): this is typically called bos, not sos.

eos="</s>"
oos="<oos>"
Review comment (Owner Author): I assume this means "out of set"; maybe best to clarify via a comment?
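
One possible clarifying comment (a suggestion only, not the PR's code):

    oos="<oos>"   # out-of-set symbol: stands in for any word not in the input/output wordlists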


max_param_change=20
num_iters=30

shuffle_buffer_size=5000 # This "buffer_size" variable controls randomization of the samples
minibatch_size=64

initial_learning_rate=0.01
final_learning_rate=0.0005
learning_rate_decline_factor=1.1

num_lstm_layers=1
cell_dim=64
hidden_dim=256
recurrent_projection_dim=0
non_recurrent_projection_dim=64
norm_based_clipping=true
clipping_threshold=30
label_delay=0 # 5
splice_indexes=0

. cmd.sh
. path.sh
. parse_options.sh || exit 1;

outdir=data/sdm1/lstm-$initial_learning_rate-$final_learning_rate-$learning_rate_decline_factor-$minibatch_size-$type
srcdir=data/local/dict

set -e

mkdir -p $outdir

if [ $stage -le -4 ]; then
cat $srcdir/lexicon.txt | awk '{print $1}' | grep -v -w '!SIL' > $outdir/wordlist.all

cat $train_text | awk -v w=$outdir/wordlist.all \
'BEGIN{while((getline<w)>0) v[$1]=1;}
{for (i=2;i<=NF;i++) if ($i in v) printf $i" ";else printf "<unk> ";print ""}'|sed 's/ $//g' \
| shuf --random-source=$train_text > $outdir/train.txt.0
Review comment (Owner Author): we don't rely on shuf; it's not always installed and we don't like adding dependencies. We typically use utils/shuffle_list.pl.
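
For instance, the last stage of the pipeline above might become something like this (a sketch; I believe utils/shuffle_list.pl takes an --srand option for a reproducible seed, standing in for shuf's --random-source):

    cat $train_text | awk ... | utils/shuffle_list.pl --srand 0 > $outdir/train.txt.0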


cat $dev_text | awk -v w=$outdir/wordlist.all \
'BEGIN{while((getline<w)>0) v[$1]=1;}
{for (i=2;i<=NF;i++) if ($i in v) printf $i" ";else printf "<unk> ";print ""}'|sed 's/ $//g' \
| shuf --random-source=$dev_text > $outdir/dev.txt.0

cat $outdir/train.txt.0 $outdir/wordlist.all | sed "s= =\n=g" | grep . | sort | uniq -c | sort -k1 -n -r | awk '{print $2,$1}' > $outdir/unigramcounts.txt

echo $sos 0 > $outdir/wordlist.in
Review comment (Owner Author): Re the fact that you use the same symbol for both the BOS and EOS symbols: at some point we may want to revisit this. The issue is that it's common in RNNLM work to share/tie the input and output parameter matrices, and in that case it may be desirable to have separate symbols for BOS and EOS; otherwise the model is forced to share their representation, which might not be ideal.
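
If the symbols were kept distinct, the start of the list-building might look like this (a hypothetical sketch only, not the PR's code; the 1+NR id offsets below would need adjusting to match):

    echo "<s>"  0 >  $outdir/wordlist.in   # BOS keeps id 0
    echo "</s>" 1 >> $outdir/wordlist.in   # EOS gets its own id, so tied matrices learn separate embeddings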

echo $oos 1 >> $outdir/wordlist.in
cat $outdir/unigramcounts.txt | head -n $num_words_in | awk '{print $1,1+NR}' >> $outdir/wordlist.in

echo $eos 0 > $outdir/wordlist.out
echo $oos 1 >> $outdir/wordlist.out

cat $outdir/unigramcounts.txt | head -n $num_words_out | awk '{print $1,1+NR}' >> $outdir/wordlist.out

cat $outdir/train.txt.0 | awk -v sos="$sos" -v eos="$eos" '{print sos,$0,eos}' > $outdir/train.txt
cat $outdir/dev.txt.0 | awk -v sos="$sos" -v eos="$eos" '{print sos,$0,eos}' > $outdir/dev.txt
fi

num_words_in=`wc -l $outdir/wordlist.in | awk '{print $1}'`
num_words_out=`wc -l $outdir/wordlist.out | awk '{print $1}'`

if [ $stage -le -3 ]; then
rnnlm-get-egs $outdir/train.txt $outdir/wordlist.in $outdir/wordlist.out ark,t:$outdir/egs
fi

if [ $stage -le -2 ]; then

steps/rnnlm/make_lstm_configs.py \
Review comment (Owner Author): we'll want to have a more xconfig-based mechanism for this.
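
Such a mechanism might accept a config along these lines (a hypothetical sketch using generic nnet3 xconfig layer types; the names and options here are illustrative assumptions, not part of this PR):

    input name=input dim=10000
    lstm-layer name=lstm1 cell-dim=64 delay=-1
    output-layer name=output dim=10000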

--splice-indexes "$splice_indexes " \
--num-lstm-layers $num_lstm_layers \
--feat-dim $num_words_in \
--cell-dim $cell_dim \
--hidden-dim $hidden_dim \
--recurrent-projection-dim $recurrent_projection_dim \
--non-recurrent-projection-dim $non_recurrent_projection_dim \
--norm-based-clipping $norm_based_clipping \
--clipping-threshold $clipping_threshold \
--num-targets $num_words_out \
--label-delay $label_delay \
$outdir/configs || exit 1;

fi

if [ $stage -le 0 ]; then
nnet3-init --binary=false $outdir/configs/layer1.config $outdir/0.mdl
fi


cat data/local/dict/lexicon.txt | awk '{print $1}' > $outdir/wordlist.all.1
cat $outdir/wordlist.in $outdir/wordlist.out | awk '{print $1}' > $outdir/wordlist.all.2
cat $outdir/wordlist.all.[12] | sort -u > $outdir/wordlist.all
#rm $outdir/wordlist.all.[12]
cp $outdir/wordlist.all $outdir/wordlist.rnn
touch $outdir/unk.probs

mkdir -p $outdir/log/
if [ $stage -le $num_iters ]; then
start=1
# if [ $stage -gt 1 ]; then
# start=$stage
# fi
learning_rate=$initial_learning_rate

for n in `seq $start $num_iters`; do
echo for iter $n, learning rate is $learning_rate
[ $n -ge $stage ] && (
$cuda_cmd $outdir/log/train.rnnlm.$n.log nnet3-train \
--max-param-change=$max_param_change "nnet3-copy --learning-rate=$learning_rate $outdir/$[$n-1].mdl -|" \
"ark:nnet3-shuffle-egs --buffer-size=$shuffle_buffer_size --srand=$n ark:$outdir/egs ark:- | nnet3-merge-egs --minibatch-size=$minibatch_size ark:- ark:- |" $outdir/$n.mdl
)

learning_rate=`echo $learning_rate | awk -v d=$learning_rate_decline_factor '{printf("%f", $1/d)}'`
if (( $(echo "$final_learning_rate > $learning_rate" |bc -l) )); then
learning_rate=$final_learning_rate
fi

[ $n -ge $stage ] && (
nw=`wc -l $outdir/wordlist.all | awk '{print $1 - 3}'` # <s>, </s>, <oos>
nw=`wc -l $outdir/wordlist.all | awk '{print $1}'` # <s>, </s>, <oos>
# nw=`wc -l data/sdm1/cued_rnn_ce_1/unigram.counts | awk '{print $1}'`
# $decode_cmd $outdir/dev.ppl.$n.log rnnlm-eval $outdir/$n.mdl $outdir/wordlist.in $outdir/wordlist.out $outdir/dev.txt $outdir/dev-probs-iter-$n.txt
echo $decode_cmd $outdir/dev.ppl.$n.log rnnlm-eval --num-words=$nw $outdir/$n.mdl $outdir/wordlist.in $outdir/wordlist.out $outdir/dev.txt $outdir/dev-probs-iter-$n.txt
$decode_cmd $outdir/dev.ppl.$n.log rnnlm-eval --num-words=$nw $outdir/$n.mdl $outdir/wordlist.in $outdir/wordlist.out $outdir/dev.txt $outdir/dev-probs-iter-$n.txt
nw=`cat $outdir/dev.txt | awk '{a+=NF-1}END{print a}' `
to_cost=`cat $outdir/dev-probs-iter-$n.txt | awk '{a+=$1}END{print -a}'`
ppl=`echo $to_cost $nw | awk '{print exp($1/$2)}'`
echo DEV PPL on model $n.mdl is $ppl | tee $outdir/log/dev.ppl.$n.txt
) &
done
cp $outdir/$num_iters.mdl $outdir/rnnlm
fi

./local/rnnlm/run-rescoring.sh --rnndir $outdir/ --type $type
egs/ami/s5/local/nnet3-rnnlm/run-rescoring.sh (35 additions, 0 deletions)
@@ -0,0 +1,35 @@
#!/bin/bash

mic=sdm1
n=50
ngram_order=4
rnndir=data/nnet3_rnnlm_200_256_0
id=rnn

. ./utils/parse_options.sh
. ./cmd.sh
. ./path.sh

set -e

local/nnet3-rnnlm/run-rnnlm-train.sh --use-gpu yes --stage -20 --num-iters 160

[ ! -f $rnndir/rnnlm ] && echo "Can't find RNNLM model" && exit 1;

final_lm=ami_fsh.o3g.kn
LM=$final_lm.pr1-7

for decode_set in dev eval; do
( dir=exp/$mic/nnet3/tdnn_sp/
decode_dir=${dir}/decode_${decode_set}

steps/lmrescore_rnnlm_lat.sh \
--cmd "$decode_cmd --mem 16G" \
--rnnlm-ver nnet3rnnlm --weight 0.5 --max-ngram-order $ngram_order \
data/lang_$LM $rnndir \
data/$mic/${decode_set}_hires ${decode_dir} \
${decode_dir}.rnnlm.lat.${ngram_order}gram
) &
done

wait