Ignore hidden columns in AutoML schema checks of validation data #4490

Merged
merged 2 commits into from Nov 21, 2019

Contributor

daholste commented Nov 20, 2019

Closes #4491

When the AutoML API consumes data, it validates schema consistency between the train and validation data.

There are two bugs in this logic:

  1. The API asserts that the train and validation data have the same number of columns. This throws an exception if the two data views have the same number of active columns but a different number of hidden columns. This PR changes the check to assert that the number of active (not hidden) columns in the train and validation data is equal.

  2. If either the train or validation data has a hidden column with a type that differs from an active column of the same name, an exception is thrown. This PR restricts type consistency checks to active columns only.
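
As a rough illustration of the intended behavior (not the code in this PR), the sketch below compares only active columns. `Col` is a hypothetical stand-in for `DataViewSchema.Column` that carries just a name, a type name, and a hidden flag; `ValidateActiveColumns` and its error messages are likewise made up for the example. It assumes C# 9 or later (for `record`) and that active column names are unique within a schema.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for ML.NET's DataViewSchema.Column: only the
// members this sketch needs (name, type name, hidden flag).
public sealed record Col(string Name, string Type, bool IsHidden = false);

public static class SchemaCheckSketch
{
    // Throws ArgumentException when the active (non-hidden) columns of the
    // two schemas differ in count, name, or type. Hidden columns are ignored.
    public static void ValidateActiveColumns(
        IReadOnlyList<Col> train, IReadOnlyList<Col> validation)
    {
        var trainActive = train.Where(c => !c.IsHidden).ToList();

        // Assumption: active column names are unique, so they can key a dictionary.
        var validActive = validation.Where(c => !c.IsHidden)
                                    .ToDictionary(c => c.Name);

        if (trainActive.Count != validActive.Count)
        {
            throw new ArgumentException(
                $"Train data has {trainActive.Count} active columns, " +
                $"validation data has {validActive.Count}.", nameof(validation));
        }

        foreach (var trainCol in trainActive)
        {
            if (!validActive.TryGetValue(trainCol.Name, out var validCol))
            {
                throw new ArgumentException(
                    $"Column '{trainCol.Name}' is missing from validation data.",
                    nameof(validation));
            }

            if (validCol.Type != trainCol.Type)
            {
                throw new ArgumentException(
                    $"Column '{trainCol.Name}' has type {trainCol.Type} in train data " +
                    $"and {validCol.Type} in validation data.", nameof(validation));
            }
        }
    }
}
```

With this sketch, a validation schema that has an extra hidden `Features` column of a different type passes, while one that is missing an active column still throws.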

@daholste daholste added the AutoML label Nov 20, 2019
@daholste daholste requested a review from dotnet/mlnet-automl as a code owner Nov 20, 2019
Member

justinormont left a comment

LGTM.

@@ -183,14 +183,19 @@ private static void ValidateValidationData(IDataView trainData, IDataView valida

         const string schemaMismatchError = "Training data and validation data schemas do not match.";

-        if (trainData.Schema.Count != validationData.Schema.Count)
+        if (trainData.Schema.Count(c => !c.IsHidden) != validationData.Schema.Count(c => !c.IsHidden))
         {
             throw new ArgumentException($"{schemaMismatchError} Train data has '{trainData.Schema.Count}' columns," +
                 $"and validation data has '{validationData.Schema.Count}' columns.", nameof(validationData));
         }

         foreach (var trainCol in trainData.Schema)

justinormont Nov 21, 2019

Member

Could use a comment...

Suggested change
- foreach (var trainCol in trainData.Schema)
+ // Also indirectly checks for new columns in the validation datasets as we above enforce the column counts are equal
+ foreach (var trainCol in trainData.Schema)

I was otherwise going to suggest we check the reverse direction. A comment saves future readers from having to think it through.
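
For readers who want to see why the reverse direction is redundant, here is a minimal sketch of the argument, assuming column names are distinct within each schema (which I believe holds for active columns, but treat that as an assumption). If the two name lists have equal counts and every train name appears in the validation names, the validation data cannot contain a name the train data lacks. `OneDirectionCheckSketch` and its method names are hypothetical and not part of the PR.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OneDirectionCheckSketch
{
    // One-directional check: counts are equal and every train name is
    // present among the validation names.
    // Assumption: names are distinct within each collection.
    public static bool CountsEqualAndTrainSubset(
        IReadOnlyCollection<string> train, IReadOnlyCollection<string> validation)
    {
        return train.Count == validation.Count
            && train.All(name => validation.Contains(name));
    }

    // Explicit two-directional check, for comparison.
    public static bool SameNameSets(
        IEnumerable<string> train, IEnumerable<string> validation)
    {
        return new HashSet<string>(train).SetEquals(validation);
    }
}
```

Under the distinct-names assumption the two methods agree: a new column in the validation data either changes the count or displaces a train column, and either case fails the one-directional check.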

codecov bot commented Nov 21, 2019

Codecov Report

❗️ No coverage uploaded for pull request base (master@b7db4fa).
The diff coverage is 100%.

@@            Coverage Diff            @@
##             master    #4490   +/-   ##
=========================================
  Coverage          ?   74.85%           
=========================================
  Files             ?      908           
  Lines             ?   159880           
  Branches          ?    17215           
=========================================
  Hits              ?   119678           
  Misses            ?    35384           
  Partials          ?     4818

Flag          Coverage Δ
#Debug        74.85% <100%> (?)
#production   70.2% <100%> (?)
#test         90.19% <100%> (?)

Impacted Files                                          Coverage Δ
...rosoft.ML.AutoML.Tests/UserInputValidationTests.cs   100% <100%> (ø)
...crosoft.ML.AutoML/Utils/UserInputValidationUtil.cs   91.81% <100%> (ø)

@daholste daholste merged commit 71e97ea into dotnet:master Nov 21, 2019
17 of 19 checks passed
MachineLearning-CodeCoverage #20191121.5 failed
MachineLearning-CodeCoverage (Windows_x64 Build_Debug) was canceled
MachineLearning-CI #20191121.5 succeeded
MachineLearning-CI (Centos_x64_NetCoreApp30 Debug_Build) succeeded
MachineLearning-CI (Centos_x64_NetCoreApp30 Release_Build) succeeded
MachineLearning-CI (MacOS_x64_NetCoreApp21 Debug_Build) succeeded
MachineLearning-CI (MacOS_x64_NetCoreApp21 Release_Build) succeeded
MachineLearning-CI (Ubuntu_x64_NetCoreApp21 Debug_Build) succeeded
MachineLearning-CI (Ubuntu_x64_NetCoreApp21 Release_Build) succeeded
MachineLearning-CI (Windows_x64_NetCoreApp21 Debug_Build) succeeded
MachineLearning-CI (Windows_x64_NetCoreApp21 Release_Build) succeeded
MachineLearning-CI (Windows_x64_NetCoreApp30 Debug_Build) succeeded
MachineLearning-CI (Windows_x64_NetCoreApp30 Release_Build) succeeded
MachineLearning-CI (Windows_x64_NetFx461 Debug_Build) succeeded
MachineLearning-CI (Windows_x64_NetFx461 Release_Build) succeeded
MachineLearning-CI (Windows_x86_NetCoreApp21 Debug_Build) succeeded
MachineLearning-CI (Windows_x86_NetCoreApp21 Release_Build) succeeded
WIP Ready for review
license/cla All CLA requirements met.