Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Throw when PCA generates invalid eigenvectors #5349

Merged
merged 1 commit into from
Aug 17, 2020

Conversation

antoniovs1029
Copy link
Member

@antoniovs1029 antoniovs1029 commented Aug 15, 2020

Fixes microsoft/NimbusML#497

As discussed there, the problem is that when PCA generates eigenvectors with NaN values, a cryptic exception is thrown on NimbusML during prediction and not during training. It's thrown during prediction because to do prediction NimbusML saves the model to disk and loads it back, and during deserialization there's a check that prevents loading eigenvectors that contain NaNs.

In this PR I'm adding an exception to the constructor of PcaModelParameters so that a more readable exception is thrown during training of either NimbusML or ML.NET, so there's no need to wait until prediction for NimbusML to throw it.

@antoniovs1029 antoniovs1029 requested a review from a team as a code owner August 15, 2020 05:57
@@ -461,6 +461,9 @@ internal PcaModelParameters(IHostEnvironment env, int rank, float[][] eigenVecto
{
_eigenVectors[i] = new VBuffer<float>(eigenVectors[i].Length, eigenVectors[i]);
_meanProjected[i] = VectorUtils.DotProduct(in _eigenVectors[i], in mean);
Host.CheckParam(_eigenVectors[i].GetValues().All(FloatUtils.IsFinite),
Copy link
Member Author

@antoniovs1029 antoniovs1029 Aug 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same check is done when deserializing, so it's ok to also add it in the constructor:

var vi = ctx.Reader.ReadFloatArray(_dimension);
Host.CheckDecode(vi.All(FloatUtils.IsFinite));
_eigenVectors[i] = new VBuffer<float>(_dimension, vi);
_meanProjected[i] = VectorUtils.DotProduct(in _eigenVectors[i], in _mean);

The custom message was requested by the customer that opened the original issue on NimbusML.

@codecov
Copy link

codecov bot commented Aug 15, 2020

Codecov Report

Merging #5349 into master will decrease coverage by 0.07%.
The diff coverage is 96.15%.

@@            Coverage Diff             @@
##           master    #5349      +/-   ##
==========================================
- Coverage   74.04%   73.97%   -0.08%     
==========================================
  Files        1019     1019              
  Lines      189949   189975      +26     
  Branches    20429    20429              
==========================================
- Hits       140651   140531     -120     
- Misses      43786    43918     +132     
- Partials     5512     5526      +14     
Flag Coverage Δ
#Debug 73.97% <96.15%> (-0.08%) ⬇️
#production 69.74% <100.00%> (-0.10%) ⬇️
#test 87.68% <95.65%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
test/Microsoft.ML.Tests/AnomalyDetectionTests.cs 99.29% <95.65%> (-0.71%) ⬇️
src/Microsoft.ML.PCA/PcaTrainer.cs 81.87% <100.00%> (+0.16%) ⬆️
...c/Microsoft.ML.FastTree/Utils/ThreadTaskManager.cs 79.48% <0.00%> (-20.52%) ⬇️
...osoft.ML.KMeansClustering/KMeansPlusPlusTrainer.cs 83.60% <0.00%> (-7.27%) ⬇️
src/Microsoft.ML.Data/Training/TrainerUtils.cs 66.86% <0.00%> (-3.82%) ⬇️
....ML.AutoML/PipelineSuggesters/PipelineSuggester.cs 83.19% <0.00%> (-3.37%) ⬇️
...crosoft.ML.StandardTrainers/Standard/SdcaBinary.cs 85.23% <0.00%> (-3.33%) ⬇️
...L.AutoML/TrainerExtensions/TrainerExtensionUtil.cs 84.71% <0.00%> (-1.66%) ⬇️
src/Microsoft.ML.Sweeper/AsyncSweeper.cs 71.42% <0.00%> (-1.37%) ⬇️
...crosoft.ML.StandardTrainers/Optimizer/Optimizer.cs 71.96% <0.00%> (-1.16%) ⬇️
... and 10 more

@antoniovs1029 antoniovs1029 merged commit 1b1872f into dotnet:master Aug 17, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Mar 18, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
2 participants