Throw when PCA generates invalid eigenvectors #5349
@@ -461,6 +461,9 @@ internal PcaModelParameters(IHostEnvironment env, int rank, float[][] eigenVecto
{
_eigenVectors[i] = new VBuffer<float>(eigenVectors[i].Length, eigenVectors[i]);
_meanProjected[i] = VectorUtils.DotProduct(in _eigenVectors[i], in mean);
Host.CheckParam(_eigenVectors[i].GetValues().All(FloatUtils.IsFinite),
The same check is done when deserializing, so it's ok to also add it in the constructor:
machinelearning/src/Microsoft.ML.PCA/PcaTrainer.cs
Lines 506 to 509 in a769365
var vi = ctx.Reader.ReadFloatArray(_dimension);
Host.CheckDecode(vi.All(FloatUtils.IsFinite));
_eigenVectors[i] = new VBuffer<float>(_dimension, vi);
_meanProjected[i] = VectorUtils.DotProduct(in _eigenVectors[i], in _mean);
The custom message was requested by the customer that opened the original issue on NimbusML.
Codecov Report
@@ Coverage Diff @@
## master #5349 +/- ##
==========================================
- Coverage 74.04% 73.97% -0.08%
==========================================
Files 1019 1019
Lines 189949 189975 +26
Branches 20429 20429
==========================================
- Hits 140651 140531 -120
- Misses 43786 43918 +132
- Partials 5512 5526 +14
Fixes microsoft/NimbusML#497
As discussed there, when PCA generates eigenvectors containing NaN values, NimbusML throws a cryptic exception during prediction rather than during training. The exception surfaces at prediction time because NimbusML saves the model to disk and loads it back before predicting, and deserialization includes a check that rejects eigenvectors containing NaNs.
This PR adds the same check to the constructor of PcaModelParameters, so that a more readable exception is thrown during training in both NimbusML and ML.NET, instead of surfacing only at prediction time in NimbusML.
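To illustrate the fail-fast idea outside of ML.NET, here is a minimal, self-contained sketch. It is not the actual PcaModelParameters code: the class name `PcaModelSketch`, the `Rank` property, the `IsFinite` helper, and the exception message are all hypothetical and stand in for `Host.CheckParam` and `FloatUtils.IsFinite`.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for PcaModelParameters; NOT the ML.NET type.
public sealed class PcaModelSketch
{
    private readonly float[][] _eigenVectors;

    public PcaModelSketch(float[][] eigenVectors)
    {
        if (eigenVectors == null)
            throw new ArgumentNullException(nameof(eigenVectors));

        _eigenVectors = new float[eigenVectors.Length][];
        for (int i = 0; i < eigenVectors.Length; i++)
        {
            // Validate at construction (training) time, so the failure
            // does not wait until the model is saved and reloaded.
            if (!eigenVectors[i].All(IsFinite))
                throw new ArgumentException(
                    $"Eigenvector {i} contains NaN or infinite values.",
                    nameof(eigenVectors));

            _eigenVectors[i] = (float[])eigenVectors[i].Clone();
        }
    }

    // Number of eigenvectors held by the sketch (illustrative only).
    public int Rank => _eigenVectors.Length;

    // Assumed equivalent of a finiteness check: rejects NaN and +/- infinity.
    private static bool IsFinite(float value) =>
        !float.IsNaN(value) && !float.IsInfinity(value);
}
```

With this shape, constructing the object from an array that contains `float.NaN` throws an `ArgumentException` immediately, while finite input constructs normally.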