Joint training with new l2reg technique #4
base: leaky-hmm-merge-xent
Conversation
There were some stray tabs in the Windows INSTALL file.
Annoyingly, github's zip download doesn't seem to preserve line endings. See http://stackoverflow.com/questions/17347611/downloading-a-zip-from-github-removes-newlines-from-text-files
Patch on windows requires CRLF. This should leave them unchanged.
Should be fixed in the windows_line_endings branch.
Previous version would only look in the root dir which obviously doesn't work.
The previous commits stop git mangling the file endings, but the file endings in the repository were LF, where they should have been CRLF.
…ns, to work for larger matrices.
Windows line endings
Windows docs
It previously didn't mention that you need --enable-openblas (if using OpenBLAS).
…s the most recent commit).
arpa-file-parser.cc: Added a warning when the declared count of n-grams is 0
const-arpa-lm.cc: Print an extended error message instead of a simple ASSERT
Separated ARPA parsing from const LM construction
Windows docs
…n leaky-hmm (got rid of the 'special state').
… linear function of cross-entropy
Pegah, development is now in the 'chain' branch in the official kaldi.
Dan
…-training-l2reg
// L = -0.5 * l2_regularize *
//     \sum_{j=1}^{minibatch_size} (\sum_i (nnet_output_ji - target_ji)^2),
// where target_ji = scale_i * xent_output_ji + offset_i.
// scale_i = \sum_j (nnet_output_ji * xent_output_ji) / \sum_j (xent_output_ji^2)
Pegah, I don't think this equation is right. You are first optimizing w.r.t. 'scale' and then 'offset'-- they need to be optimized jointly.
btw, this is linear regression in one dimension. you could just look up the equations if you want.
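For reference, here is a minimal standalone sketch of that one-dimensional regression (illustrative code only, not part of Kaldi; the helper name `FitScaleOffset` is hypothetical). Setting the derivatives of L with respect to scale and offset to zero *simultaneously* gives the familiar centered slope and intercept:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Jointly optimal least-squares fit y ~= scale * x + offset in one dimension.
// scale = sum_j (x_j - mean_x)(y_j - mean_y) / sum_j (x_j - mean_x)^2
// offset = mean_y - scale * mean_x
void FitScaleOffset(const std::vector<double> &x,
                    const std::vector<double> &y,
                    double *scale, double *offset) {
  size_t n = x.size();
  double mean_x = 0.0, mean_y = 0.0;
  for (size_t j = 0; j < n; j++) { mean_x += x[j]; mean_y += y[j]; }
  mean_x /= n;
  mean_y /= n;
  double cov = 0.0, var = 0.0;
  for (size_t j = 0; j < n; j++) {
    cov += (x[j] - mean_x) * (y[j] - mean_y);  // covariance numerator
    var += (x[j] - mean_x) * (x[j] - mean_x);  // variance numerator
  }
  *scale = cov / var;
  *offset = mean_y - (*scale) * mean_x;
}
```

Note the mean-centering: the uncentered formula \sum_j x_j y_j / \sum_j (x_j^2) is only the optimum when the offset is constrained to zero.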
i'm deleting your relevant jobs till this is fixed.
Dear Dan,
Hi. No, I didn't optimize first w.r.t. scale and then offset: I solved them jointly, substituting offset as a function of scale into the first equation and then solving for scale.
On Thu, Jan 28, 2016 at 8:42 PM, Daniel Povey notifications@github.com wrote:
In src/chain/chain-training.cc
#4 (comment):

-  BaseFloat scale = supervision.weight * opts.l2_regularize;
-  *l2_term = -0.5 * scale * TraceMatMat(nnet_output, nnet_output, kTrans);
-  if (nnet_output_deriv)
-    nnet_output_deriv->AddMat(-1.0 * scale, nnet_output);
+  BaseFloat scale_coeff = supervision.weight * opts.l2_regularize;
+  // If xent_output provided, l2 penalty is trying to regress the chain output
+  // to be a linear function of cross-entropy output.
+  // It minimizes -0.5 * l2_regularize * l2_norm(diag(scale) * x + offset - y)^2,
+  // where x is cross-entropy output and y is chain output.
+  if (xent_output) {
+    // compute offset and scale
+    // The objective is to minimize L w.r.t. scale_i, offset_i,
+    // L = -0.5 * l2_regularize *
+    //     \sum_{j=1}^{minibatch_size} (\sum_i (nnet_output_ji - target_ji)^2),
+    // where target_ji = scale_i * xent_output_ji + offset_i.
+    // scale_i = \sum_j (nnet_output_ji * xent_output_ji) / \sum_j (xent_output_ji^2)
i'm deleting your relevant jobs till this is fixed.
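For comparison, the branch taken when no xent_output is supplied applies a plain l2 penalty: l2_term = -0.5 * scale * sum of squared outputs, with derivative -scale * output. A scalar sketch of that computation (assumptions: a plain std::vector stands in for Kaldi's matrix type, TraceMatMat(Y, Y, kTrans) equals the sum of squared entries, and the name `L2TermAndDeriv` is hypothetical):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Plain l2 penalty on the network output y:
//   l2_term = -0.5 * scale * sum_i y_i^2
//   d(l2_term)/d(y_i) = -scale * y_i
double L2TermAndDeriv(double scale, const std::vector<double> &y,
                      std::vector<double> *deriv) {
  double sumsq = 0.0;
  deriv->resize(y.size());
  for (size_t i = 0; i < y.size(); i++) {
    sumsq += y[i] * y[i];
    // mirrors nnet_output_deriv->AddMat(-1.0 * scale, nnet_output)
    // when the derivative buffer starts at zero
    (*deriv)[i] = -scale * y[i];
  }
  return -0.5 * scale * sumsq;
}
```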
Something doesn't seem right about the equation: it doesn't seem like it has the correct shift invariance. I think you missed a term involving scale * offset when you computed the derivatives and solved.
Dan