Joint training with new l2reg technique #4

Open

pegahgh wants to merge 37 commits into leaky-hmm-merge-xent from joint-training-l2reg

Conversation

@pegahgh commented Jan 29, 2016

No description provided.

Tim Hutt and others added 30 commits (January 11, 2016 10:49):

  • There were some stray tabs in the Windows INSTALL file.
  • Patch on windows requires CRLF. This should leave them unchanged.
  • Should be fixed in the windows_line_endings branch.
  • Previous version would only look in the root dir which obviously doesn't work.
  • The previous commits stop git mangling the file endings, but the file endings in the repository were LF, where they should have been CRLF.
  • It previously didn't mention that you need --enable-openblas (if using OpenBLAS).
  • arpa-file-parser.cc: Added a warning when declared count of n-grams is 0
  • const-arpa-lm.cc: Print an extended error message instead of simple ASSERT
  • Separated ARPA parsing from const LM construction
  • …n leaky-hmm (got rid of the 'special state').
@danpovey (Owner):

Pegah, development is now in the 'chain' branch in the official kaldi
repo. Could you please try to apply your changes to that one instead?
There have not been many code changes between those two branches.

Dan

On Thu, Jan 28, 2016 at 7:28 PM, pegahgh notifications@github.com wrote:

  Commit Summary

  • added new l2-regularization method, which regresses the chain output to
    be a linear function of the cross-entropy output
  • Merge branch 'leaky-hmm-merge-xent' of
    https://github.com/danpovey/kaldi into joint-training-l2reg
  • some modifications to the new l2_regularization method
  • small fix to chain-training.cc

src/chain/chain-training.cc:

-  BaseFloat scale = supervision.weight * opts.l2_regularize;
-  *l2_term = -0.5 * scale * TraceMatMat(nnet_output, nnet_output, kTrans);
-  if (nnet_output_deriv)
-    nnet_output_deriv->AddMat(-1.0 * scale, nnet_output);
+  BaseFloat scale_coeff = supervision.weight * opts.l2_regularize;
+  // If xent_output is provided, the l2 penalty tries to regress the chain
+  // output to be a linear function of the cross-entropy output.
+  // It minimizes -0.5 * l2_regularize * l2_norm(diag(scale) * x + offset - y)^2,
+  // where x is the cross-entropy output and y is the chain output.
+  if (xent_output) {
+    // compute offset and scale
+    // The objective is to minimize L w.r.t. scale_i, offset_i,
+    // L = -0.5 * l2_regularize *
+    //     \sum_{j=1}^{minibatch_size}(\sum_i (nnet_output_ji - target_ji)^2),
+    // where target_ji = scale_i * xent_output_ji + offset_i.
+    // scale_i = \sum_j (nnet_output_ji * xent_output_ji) / \sum_j(xent_output_ji^2)
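(Editorial restatement, not text from the PR: writing y_{ji} for nnet_output_ji, x_{ji} for xent_output_ji, and J for the minibatch size, the objective in the comment above is

    L = -\tfrac{1}{2}\,\lambda \sum_{j=1}^{J} \sum_i \bigl( y_{ji} - (\mathrm{scale}_i\, x_{ji} + \mathrm{offset}_i) \bigr)^2,
    \qquad \lambda = \mathrm{l2\_regularize},

and the comment proposes \mathrm{scale}_i = \sum_j x_{ji}\, y_{ji} \,/\, \sum_j x_{ji}^2.)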
@danpovey (Owner):

Pegah, I don't think this equation is right. You are first optimizing w.r.t. 'scale' and then 'offset'-- they need to be optimized jointly.

@danpovey (Owner):

btw, this is linear regression in one dimension. you could just look up the equations if you want.
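(Editorial sketch of the textbook equations referred to here, in the notation above: minimizing \sum_j (y_{ji} - \mathrm{scale}_i x_{ji} - \mathrm{offset}_i)^2 jointly over \mathrm{scale}_i and \mathrm{offset}_i gives, with \bar{x}_i = \tfrac{1}{J}\sum_j x_{ji} and \bar{y}_i = \tfrac{1}{J}\sum_j y_{ji},

    \mathrm{scale}_i = \frac{\sum_j (x_{ji} - \bar{x}_i)(y_{ji} - \bar{y}_i)}{\sum_j (x_{ji} - \bar{x}_i)^2},
    \qquad
    \mathrm{offset}_i = \bar{y}_i - \mathrm{scale}_i\,\bar{x}_i.)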

@danpovey (Owner):

i'm deleting your relevant jobs till this is fixed.

@pegahgh (Author):

Dear Dan,
No, I didn't optimize first w.r.t. scale and then offset. I solved them
jointly: I substituted offset as a function of scale into the first equation
and then solved for scale.
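(Editorial check of that substitution, keeping all the terms: \partial L / \partial\,\mathrm{offset}_i = 0 gives \mathrm{offset}_i = \bar{y}_i - \mathrm{scale}_i\,\bar{x}_i; substituting this into \partial L / \partial\,\mathrm{scale}_i = 0 and using \sum_j x_{ji} = J\bar{x}_i gives

    \mathrm{scale}_i = \frac{\sum_j x_{ji} y_{ji} - J\,\bar{x}_i\,\bar{y}_i}{\sum_j x_{ji}^2 - J\,\bar{x}_i^2},

which is the centered formula above, not the uncentered \sum_j x_{ji} y_{ji} / \sum_j x_{ji}^2 in the code comment.)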


@danpovey (Owner):

Something doesn't seem right about the equation: it doesn't seem like it has
the correct shift invariance. I think you missed a term involving
scale * offset when you computed the derivatives and solved.
Dan
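(Editorial note on the shift-invariance test: replacing x_{ji} by x_{ji} + c for a constant c leaves the centered \mathrm{scale}_i above unchanged and shifts \mathrm{offset}_i by -\mathrm{scale}_i\,c, as a correct joint fit should. The uncentered formula changes to

    \frac{\sum_j x_{ji} y_{ji} + c \sum_j y_{ji}}{\sum_j x_{ji}^2 + 2c \sum_j x_{ji} + J c^2},

which depends on c; the dropped terms are exactly those involving \mathrm{scale}_i \cdot \mathrm{offset}_i, so the two formulas coincide only when \bar{x}_i = 0.)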

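For illustration only, a self-contained sketch of the jointly-optimal per-dimension fit discussed above, in plain C++ with no Kaldi types; the function and variable names here are hypothetical and do not come from the PR:

#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration (not Kaldi code): for each output dimension i,
// fit y_ji ~= scale_i * x_ji + offset_i by least squares, solving for
// scale_i and offset_i jointly:
//   scale_i  = cov(x_i, y_i) / var(x_i)
//   offset_i = mean(y_i) - scale_i * mean(x_i)
void FitPerDimScaleOffset(const std::vector<std::vector<double> > &x,  // [minibatch][dim], xent output
                          const std::vector<std::vector<double> > &y,  // [minibatch][dim], chain output
                          std::vector<double> *scale,
                          std::vector<double> *offset) {
  assert(!x.empty() && x.size() == y.size());
  std::size_t J = x.size(), dim = x[0].size();
  std::vector<double> sum_x(dim, 0.0), sum_y(dim, 0.0),
                      sum_xx(dim, 0.0), sum_xy(dim, 0.0);
  for (std::size_t j = 0; j < J; j++) {
    for (std::size_t i = 0; i < dim; i++) {
      sum_x[i]  += x[j][i];
      sum_y[i]  += y[j][i];
      sum_xx[i] += x[j][i] * x[j][i];
      sum_xy[i] += x[j][i] * y[j][i];
    }
  }
  scale->assign(dim, 0.0);
  offset->assign(dim, 0.0);
  for (std::size_t i = 0; i < dim; i++) {
    double mean_x = sum_x[i] / J, mean_y = sum_y[i] / J;
    // Centered sums: num = sum_j (x - mean_x)(y - mean_y),
    //                den = sum_j (x - mean_x)^2.
    double num = sum_xy[i] - J * mean_x * mean_y;
    double den = sum_xx[i] - J * mean_x * mean_x;
    (*scale)[i]  = (den > 0.0 ? num / den : 0.0);
    (*offset)[i] = mean_y - (*scale)[i] * mean_x;
  }
}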

danpovey pushed a commit that referenced this pull request Nov 7, 2019