Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add epsilon and ddof (delta degrees of freedom) arguments to Normalize. #1964

Merged
merged 3 commits into from
May 18, 2020

Conversation

mzient
Copy link
Contributor

@mzient mzient commented May 14, 2020

Signed-off-by: Michał Zientkiewicz mzient@gmail.com

Why we need this PR?

Pick one, remove the rest

  • It adds new feature (degrees of freedom) needed to maintain compatibility with parameters of stddev in PyTorch and in numpy
  • It adds regularizing term to normalization (added to variance)

What happened in this PR?

Fill relevant points, put NA otherwise. Replace anything inside []

  • What solution was applied:
    • Add epsilon added to variance
    • And ddof (delta degrees of freedom), subtracted from variance's denominator
    • Improve normalization precision on CPU (Newton-Raphson step)
    • Add ddof and epsilon to tests
  • Affected modules and functionalities:
    • Normalize operator
  • Key points relevant for the review:
    • ?
  • Validation and testing:
    • End-to-end python test
  • Documentation (including examples):
    • New arguments documented in schema

JIRA TASK: N/A

@mzient mzient requested a review from a team May 14, 2020 17:46
@mzient mzient force-pushed the NormalizeRegularizingTerm branch from 60dafbf to 34f5f10 Compare May 14, 2020 17:51
@mzient
Copy link
Contributor Author

mzient commented May 14, 2020

!build

@mzient mzient force-pushed the NormalizeRegularizingTerm branch 2 times, most recently from a1bfcd1 to 57bfed8 Compare May 14, 2020 18:00
@dali-automaton
Copy link
Collaborator

CI MESSAGE: [1322135]: BUILD STARTED

Improve normalization precision on CPU.

Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
@mzient mzient force-pushed the NormalizeRegularizingTerm branch from 57bfed8 to 18dc925 Compare May 14, 2020 18:03
var = ((x - mean).astype(np.float)**2).mean(axis = axes, keepdims = True)
stddev = np.sqrt(var)

if stddev is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you can just use numpy.std at least for some case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's going to be a bit artificial, but I can give it a shot.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm asking to validate if or definition of ddof matches the one from numpy. But it is just a suggestion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [1322135]: BUILD FAILED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [1322135]: BUILD PASSED

Apply epsilon to explicit scalar stddev.

Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
@mzient mzient force-pushed the NormalizeRegularizingTerm branch from ffff42f to 7da97ae Compare May 14, 2020 21:32
@mzient
Copy link
Contributor Author

mzient commented May 14, 2020

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [1322839]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [1322839]: BUILD PASSED

scalar_inv_stddev = scale_ / spec_.GetArgument<float>("stddev");
float scalar_stddev = spec_.GetArgument<float>("stddev");
if (epsilon_)
scalar_inv_stddev = scale_ * rsqrt(scalar_stddev*scalar_stddev + epsilon_);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It took me too long to connect the fact that we're adding epsilon to Variance and we're handling the Standard deviation here, hence the square and square root. Can you maybe add a comment that would make it more obvious here, the docs for epsilon is quite distant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines +94 to 103
int64_t v = volume(inv.shape.tensor_shape_span(i));
if (epsilon) {
for (int64_t j = 0; j < v; j++) {
inv.data[i][j] = scale * rsqrt(stddev.data[i][j] * stddev.data[i][j] + epsilon);
}
} else {
for (int64_t j = 0; j < v; j++) {
inv.data[i][j] = stddev.data[i][j] ? scale / stddev.data[i][j] : 0;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added Doxygen.

float scale = scale_;
if (v > degrees_of_freedom_) {
rdiv = static_cast<float>(1.0 / (v - degrees_of_freedom_));
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we error in such cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Numpy will give you infinite stddev - so dividing by it will produce zero - I'm sort-of emulating that.

//
// rsqrt needs an extra step of Newton-Raphson refinement:
// rough = approx_rsqrt(x)
// precise = rough * (3 + x*y*y) * 0.5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, but what is y?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I'm already nitpicking there is 3 + xyy and in the code it's actually 3-xyy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll fix that.

// Vectorized version of the loop below

// We calculate the following:
// mul * rsqrt(data[i] + eps)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// mul * rsqrt(data[i] + eps)
// mul * rsqrt(data[i] * rdiv + eps)

Maybe that should go into @brief section of this function?

Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
@mzient
Copy link
Contributor Author

mzient commented May 18, 2020

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [1328752]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [1328752]: BUILD PASSED

@mzient mzient merged commit 08cfe99 into NVIDIA:master May 18, 2020
@mzient mzient deleted the NormalizeRegularizingTerm branch May 18, 2020 11:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants