Joint training with new l2reg technique #4

Open

pegahgh wants to merge 37 commits into leaky-hmm-merge-xent from joint-training-l2reg

Conversation

@pegahgh commented Jan 29, 2016

No description provided.

Tim Hutt and others added 30 commits (January 11, 2016 10:49):

  • There were some stray tabs in the Windows INSTALL file.
  • Patch on windows requires CRLF. This should leave them unchanged.
  • Should be fixed in the windows_line_endings branch.
  • Previous version would only look in the root dir which obviously doesn't work.
  • The previous commits stop git mangling the file endings, but the file endings in the repository were LF, where they should have been CRLF.
  • It previously didn't mention that you need --enable-openblas (if using OpenBLAS).
  • arpa-file-parser.cc: Added a warning when declared count of n-grams is 0
  • const-arpa-lm.cc: Print an extended error message instead of simple ASSERT
  • Separated ARPA parsing from const LM construction
  • …n leaky-hmm (got rid of the 'special state').
@danpovey (Owner):

Pegah, development is now in the 'chain' branch in the official kaldi
repo. Could you please try to apply your changes to that one instead?
There have not been many code changes between those two branches.

Dan

On Thu, Jan 28, 2016 at 7:28 PM, pegahgh notifications@github.com wrote:

  Commit Summary

  • added new l2-regularization method, which regresses the chain output to
    be a linear function of the cross-entropy output
  • Merge branch 'leaky-hmm-merge-xent' of
    https://github.com/danpovey/kaldi into joint-training-l2reg
  • some modifications to the new l2_regularization method
  • small fix to chain-training.cc

src/chain/chain-training.cc:

-  BaseFloat scale = supervision.weight * opts.l2_regularize;
-  *l2_term = -0.5 * scale * TraceMatMat(nnet_output, nnet_output, kTrans);
-  if (nnet_output_deriv)
-    nnet_output_deriv->AddMat(-1.0 * scale, nnet_output);
+  BaseFloat scale_coeff = supervision.weight * opts.l2_regularize;
+  // If xent_output is provided, the l2 penalty tries to regress the chain
+  // output to be a linear function of the cross-entropy output.
+  // It minimizes -0.5 * l2_regularize * l2_norm(diag(scale) * x + offset - y)^2,
+  // where x is the cross-entropy output and y is the chain output.
+  if (xent_output) {
+    // compute offset and scale
+    // The objective is to minimize L w.r.t. scale_i, offset_i,
+    // L = -0.5 * l2_regularize *
+    //     \sum_{j=1}^{minibatch_size}(\sum_i (nnet_output_ji - target_ji)^2),
+    // where target_ji = scale_i * xent_output_ji + offset_i.
+    // scale_i = \sum_j (nnet_output_ji * xent_output_ji) / \sum_j(xent_output_ji^2)
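(Editorial restatement, not text from the PR: writing y_{ji} for nnet_output_ji, x_{ji} for xent_output_ji, and J for the minibatch size, the objective in the comment above is

    L = -\tfrac{1}{2}\,\lambda \sum_{j=1}^{J} \sum_i \bigl( y_{ji} - (\mathrm{scale}_i\, x_{ji} + \mathrm{offset}_i) \bigr)^2,
    \qquad \lambda = \mathrm{l2\_regularize},

and the comment proposes \mathrm{scale}_i = \sum_j x_{ji}\, y_{ji} \,/\, \sum_j x_{ji}^2.)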
@danpovey (Owner):

Pegah, I don't think this equation is right. You are first optimizing w.r.t. 'scale' and then 'offset'-- they need to be optimized jointly.

@danpovey (Owner):

btw, this is linear regression in one dimension. you could just look up the equations if you want.
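(Editorial sketch of the textbook equations referred to here, in the notation above: minimizing \sum_j (y_{ji} - \mathrm{scale}_i x_{ji} - \mathrm{offset}_i)^2 jointly over \mathrm{scale}_i and \mathrm{offset}_i gives, with \bar{x}_i = \tfrac{1}{J}\sum_j x_{ji} and \bar{y}_i = \tfrac{1}{J}\sum_j y_{ji},

    \mathrm{scale}_i = \frac{\sum_j (x_{ji} - \bar{x}_i)(y_{ji} - \bar{y}_i)}{\sum_j (x_{ji} - \bar{x}_i)^2},
    \qquad
    \mathrm{offset}_i = \bar{y}_i - \mathrm{scale}_i\,\bar{x}_i.)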

@danpovey (Owner):

i'm deleting your relevant jobs till this is fixed.

@pegahgh (Author):

Dear Dan,
No, I didn't optimize first w.r.t. scale and then offset. I solved them
jointly: I substituted offset as a function of scale into the first equation
and then solved for scale.
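(Editorial check of that substitution, keeping all the terms: \partial L / \partial\,\mathrm{offset}_i = 0 gives \mathrm{offset}_i = \bar{y}_i - \mathrm{scale}_i\,\bar{x}_i; substituting this into \partial L / \partial\,\mathrm{scale}_i = 0 and using \sum_j x_{ji} = J\bar{x}_i gives

    \mathrm{scale}_i = \frac{\sum_j x_{ji} y_{ji} - J\,\bar{x}_i\,\bar{y}_i}{\sum_j x_{ji}^2 - J\,\bar{x}_i^2},

which is the centered formula above, not the uncentered \sum_j x_{ji} y_{ji} / \sum_j x_{ji}^2 in the code comment.)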


@danpovey (Owner):

Something doesn't seem right about the equation: it doesn't seem like it has
the correct shift invariance. I think you missed a term involving
scale * offset when you computed the derivatives and solved.
Dan
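(Editorial note on the shift-invariance test: replacing x_{ji} by x_{ji} + c for a constant c leaves the centered \mathrm{scale}_i above unchanged and shifts \mathrm{offset}_i by -\mathrm{scale}_i\,c, as a correct joint fit should. The uncentered formula changes to

    \frac{\sum_j x_{ji} y_{ji} + c \sum_j y_{ji}}{\sum_j x_{ji}^2 + 2c \sum_j x_{ji} + J c^2},

which depends on c; the dropped terms are exactly those involving \mathrm{scale}_i \cdot \mathrm{offset}_i, so the two formulas coincide only when \bar{x}_i = 0.)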

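For illustration only, a self-contained sketch of the jointly-optimal per-dimension fit discussed above, in plain C++ with no Kaldi types; the function and variable names here are hypothetical and do not come from the PR:

#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration (not Kaldi code): for each output dimension i,
// fit y_ji ~= scale_i * x_ji + offset_i by least squares, solving for
// scale_i and offset_i jointly:
//   scale_i  = cov(x_i, y_i) / var(x_i)
//   offset_i = mean(y_i) - scale_i * mean(x_i)
void FitPerDimScaleOffset(const std::vector<std::vector<double> > &x,  // [minibatch][dim], xent output
                          const std::vector<std::vector<double> > &y,  // [minibatch][dim], chain output
                          std::vector<double> *scale,
                          std::vector<double> *offset) {
  assert(!x.empty() && x.size() == y.size());
  std::size_t J = x.size(), dim = x[0].size();
  std::vector<double> sum_x(dim, 0.0), sum_y(dim, 0.0),
                      sum_xx(dim, 0.0), sum_xy(dim, 0.0);
  for (std::size_t j = 0; j < J; j++) {
    for (std::size_t i = 0; i < dim; i++) {
      sum_x[i]  += x[j][i];
      sum_y[i]  += y[j][i];
      sum_xx[i] += x[j][i] * x[j][i];
      sum_xy[i] += x[j][i] * y[j][i];
    }
  }
  scale->assign(dim, 0.0);
  offset->assign(dim, 0.0);
  for (std::size_t i = 0; i < dim; i++) {
    double mean_x = sum_x[i] / J, mean_y = sum_y[i] / J;
    // Centered sums: num = sum_j (x - mean_x)(y - mean_y),
    //                den = sum_j (x - mean_x)^2.
    double num = sum_xy[i] - J * mean_x * mean_y;
    double den = sum_xx[i] - J * mean_x * mean_x;
    (*scale)[i]  = (den > 0.0 ? num / den : 0.0);
    (*offset)[i] = mean_y - (*scale)[i] * mean_x;
  }
}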

danpovey pushed a commit that referenced this pull request Nov 7, 2019