Add epsilon and ddof (delta degrees of freedom) arguments to Normalize. #1964
60dafbf
to
34f5f10
Compare
!build
a1bfcd1
to
57bfed8
Compare
CI MESSAGE: [1322135]: BUILD STARTED
Improve normalization precision on CPU. Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
57bfed8
to
18dc925
Compare
var = ((x - mean).astype(np.float)**2).mean(axis = axes, keepdims = True)
stddev = np.sqrt(var)

if stddev is None:
Maybe you can just use numpy.std, at least for some cases?
It's going to be a bit artificial, but I can give it a shot.
I'm asking in order to validate that our definition of ddof matches the one from numpy. But it is just a suggestion.
Done.
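For readers following the ddof question: numpy.std divides the sum of squared deviations by N - ddof. The sketch below shows that convention using only the Python standard library. The helper name `stddev_with_ddof` is made up for illustration and is not part of the PR; `statistics.pstdev` and `statistics.stdev` act as references for ddof = 0 and ddof = 1.

```python
import math
import statistics

def stddev_with_ddof(values, ddof=0):
    """Standard deviation with the divisor N - ddof (hypothetical helper)."""
    n = len(values)
    mean = sum(values) / n
    sum_sq = sum((v - mean) ** 2 for v in values)
    return math.sqrt(sum_sq / (n - ddof))

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
population = stddev_with_ddof(data, ddof=0)  # same divisor as statistics.pstdev
sample = stddev_with_ddof(data, ddof=1)      # same divisor as statistics.stdev
```

Comparing `population` and `sample` against the two `statistics` functions is a quick way to confirm that the divisor convention is the expected one.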
CI MESSAGE: [1322135]: BUILD FAILED
CI MESSAGE: [1322135]: BUILD PASSED
Apply epsilon to explicit scalar stddev. Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
ffff42f
to
7da97ae
Compare
!build
CI MESSAGE: [1322839]: BUILD STARTED
CI MESSAGE: [1322839]: BUILD PASSED
scalar_inv_stddev = scale_ / spec_.GetArgument<float>("stddev");
float scalar_stddev = spec_.GetArgument<float>("stddev");
if (epsilon_)
  scalar_inv_stddev = scale_ * rsqrt(scalar_stddev*scalar_stddev + epsilon_);
It took me too long to connect the fact that we're adding epsilon to the variance while handling the standard deviation here, hence the square and the square root. Could you add a comment that makes this more obvious here? The docs for epsilon are quite distant.
Done.
int64_t v = volume(inv.shape.tensor_shape_span(i));
if (epsilon) {
  for (int64_t j = 0; j < v; j++) {
    inv.data[i][j] = scale * rsqrt(stddev.data[i][j] * stddev.data[i][j] + epsilon);
  }
} else {
  for (int64_t j = 0; j < v; j++) {
    inv.data[i][j] = stddev.data[i][j] ? scale / stddev.data[i][j] : 0;
  }
}
Same here.
Added Doxygen.
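The relation between epsilon and the standard deviation raised in these comments can be sketched in plain Python. This is an illustration, not the operator's code: `inv_stddev` is a hypothetical helper that mirrors the two branches of the snippet above. With a non-zero epsilon it adds epsilon to the variance (the squared stddev); otherwise it maps a zero stddev to 0.

```python
import math

def inv_stddev(stddev, scale=1.0, epsilon=0.0):
    """Return scale / sqrt(stddev^2 + epsilon) per element (hypothetical helper)."""
    if epsilon:
        # epsilon is added to the variance, so square the stddev first
        return [scale / math.sqrt(s * s + epsilon) for s in stddev]
    # without epsilon, guard against division by zero
    return [scale / s if s else 0.0 for s in stddev]

plain = inv_stddev([2.0, 0.0])
regularized = inv_stddev([2.0, 0.0], epsilon=1e-4)
```

Note the difference for a zero stddev: without epsilon the factor is 0, while with epsilon it becomes scale / sqrt(epsilon), which can be large.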
float scale = scale_;
if (v > degrees_of_freedom_) {
  rdiv = static_cast<float>(1.0 / (v - degrees_of_freedom_));
} else {
Shouldn't we error in such cases?
Numpy will give you infinite stddev - so dividing by it will produce zero - I'm sort-of emulating that.
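A minimal sketch of the behaviour described in this reply, assuming the fallback treats the standard deviation as infinite whenever the number of elements does not exceed ddof. The function name `normalization_factor` and its arguments are hypothetical and only serve the illustration.

```python
import math

def normalization_factor(sum_sq, count, ddof=0, scale=1.0):
    """scale / stddev, where stddev = sqrt(sum_sq / (count - ddof))."""
    if count > ddof:
        stddev = math.sqrt(sum_sq / (count - ddof))
    else:
        # assumption: not enough elements, so stddev is treated as infinite
        stddev = math.inf
    return scale / stddev if stddev else 0.0

regular = normalization_factor(8.0, count=3, ddof=1)     # stddev = 2
degenerate = normalization_factor(8.0, count=2, ddof=2)  # stddev = inf
```

In the degenerate case the factor is zero, so the normalized output collapses to zero rather than raising an error.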
//
// rsqrt needs an extra step of Newton-Raphson refinement:
// rough = approx_rsqrt(x)
// precise = rough * (3 + x*y*y) * 0.5
Sorry, but what is y?
While I'm already nitpicking: the comment says 3 + x*y*y, but the code actually uses 3 - x*y*y.
I'll fix that.
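For reference, one Newton-Raphson step for the reciprocal square root, with y denoting the rough estimate of 1/sqrt(x), is y * (3 - x*y*y) * 0.5. The sketch below is plain Python; the perturbed starting value stands in for a hardware approximation, and `refine_rsqrt` is a hypothetical name.

```python
import math

def refine_rsqrt(x, y):
    """One Newton-Raphson step improving y as an estimate of 1/sqrt(x)."""
    return y * (3.0 - x * y * y) * 0.5

x = 4.0
exact = 1.0 / math.sqrt(x)        # 0.5
rough = 0.49                      # stand-in for an approximate rsqrt result
precise = refine_rsqrt(x, rough)  # closer to 0.5 than the rough estimate
```

With the plus sign from the original comment the step would move away from the exact value, which is why the sign matters.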
// Vectorized version of the loop below

// We calculate the following:
// mul * rsqrt(data[i] + eps)
- // mul * rsqrt(data[i] + eps)
+ // mul * rsqrt(data[i] * rdiv + eps)
Maybe that should go into the @brief section of this function?
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
!build
CI MESSAGE: [1328752]: BUILD STARTED
CI MESSAGE: [1328752]: BUILD PASSED
Signed-off-by: Michał Zientkiewicz mzient@gmail.com
Why we need this PR?
Pick one, remove the rest
What happened in this PR?
Fill relevant points, put NA otherwise. Replace anything inside []
- epsilon: added to variance
- ddof (delta degrees of freedom): subtracted from variance's denominator
JIRA TASK: N/A