Ignore hidden columns in AutoML schema checks of validation data #4490
LGTM.
@@ -183,14 +183,19 @@ private static void ValidateValidationData(IDataView trainData, IDataView valida

 const string schemaMismatchError = "Training data and validation data schemas do not match.";

-if (trainData.Schema.Count != validationData.Schema.Count)
+if (trainData.Schema.Count(c => !c.IsHidden) != validationData.Schema.Count(c => !c.IsHidden))
 {
     throw new ArgumentException($"{schemaMismatchError} Train data has '{trainData.Schema.Count}' columns," +
         $"and validation data has '{validationData.Schema.Count}' columns.", nameof(validationData));
 }

 foreach (var trainCol in trainData.Schema)
justinormont (Member), Nov 21, 2019
Could use a comment...
Suggested change
-foreach (var trainCol in trainData.Schema)
+// Also indirectly checks for new columns in the validation datasets as we above enforce the column counts are equal
+foreach (var trainCol in trainData.Schema)
I was otherwise going to suggest we check the reverse direction. A comment helps future readers to not have to think through it.
codecov (bot) commented Nov 21, 2019
Codecov Report
@@ Coverage Diff @@
## master #4490 +/- ##
=========================================
Coverage ? 74.85%
=========================================
Files ? 908
Lines ? 159880
Branches ? 17215
=========================================
Hits ? 119678
Misses ? 35384
Partials ? 4818
71e97ea into dotnet:master
17 of 19 checks passed
MachineLearning-CI (Centos_x64_NetCoreApp30 Debug_Build): succeeded
MachineLearning-CI (Centos_x64_NetCoreApp30 Release_Build): succeeded
MachineLearning-CI (MacOS_x64_NetCoreApp21 Debug_Build): succeeded
MachineLearning-CI (MacOS_x64_NetCoreApp21 Release_Build): succeeded
MachineLearning-CI (Ubuntu_x64_NetCoreApp21 Debug_Build): succeeded
MachineLearning-CI (Ubuntu_x64_NetCoreApp21 Release_Build): succeeded
MachineLearning-CI (Windows_x64_NetCoreApp21 Debug_Build): succeeded
MachineLearning-CI (Windows_x64_NetCoreApp21 Release_Build): succeeded
MachineLearning-CI (Windows_x64_NetCoreApp30 Debug_Build): succeeded
MachineLearning-CI (Windows_x64_NetCoreApp30 Release_Build): succeeded
MachineLearning-CI (Windows_x64_NetFx461 Debug_Build): succeeded
MachineLearning-CI (Windows_x64_NetFx461 Release_Build): succeeded
MachineLearning-CI (Windows_x86_NetCoreApp21 Debug_Build): succeeded
MachineLearning-CI (Windows_x86_NetCoreApp21 Release_Build): succeeded
daholste commented Nov 20, 2019 (edited)
Closes #4491
When the AutoML API consumes data, it validates schema consistency between the train and validation data.
There are two bugs in this logic:
1. The API asserts that the train and validation data have the same total number of columns. This throws an exception when the two data views have the same number of active columns but a different number of hidden columns. This PR updates the check to assert that the number of active (not hidden) columns in the train and validation data is equal.
2. If either the train or validation data has a hidden column whose type differs from that of an active column with the same name, an exception is thrown. This PR restricts type consistency checks to active columns only.
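
The two fixes described above can be sketched in isolation as follows. This is a minimal illustration, not the code merged in this PR: `Col` is a hypothetical stand-in for ML.NET's `DataViewSchema.Column`, column types are simplified to strings, the `SchemaCheck.Validate` name and the last two error messages are invented for the example, and active column names are assumed to be unique within a schema.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a schema column (ML.NET's real type is DataViewSchema.Column).
public sealed class Col
{
    public Col(string name, string type, bool isHidden = false)
    {
        Name = name;
        Type = type;
        IsHidden = isHidden;
    }

    public string Name { get; }
    public string Type { get; }      // simplified: the type is represented by its name
    public bool IsHidden { get; }
}

public static class SchemaCheck
{
    // Throws ArgumentException when the active (non-hidden) columns of the two schemas differ.
    public static void Validate(IReadOnlyList<Col> train, IReadOnlyList<Col> validation)
    {
        const string mismatch = "Training data and validation data schemas do not match.";

        var trainActive = train.Where(c => !c.IsHidden).ToList();
        var validActive = validation.Where(c => !c.IsHidden).ToList();

        // Fix 1: compare the number of active columns, ignoring hidden ones.
        if (trainActive.Count != validActive.Count)
        {
            throw new ArgumentException(
                $"{mismatch} Train data has '{trainActive.Count}' active columns, " +
                $"and validation data has '{validActive.Count}' active columns.",
                nameof(validation));
        }

        // Fix 2: match names and compare types among active columns only.
        // With equal counts and unique names, this also rules out extra validation columns.
        foreach (var trainCol in trainActive)
        {
            var validCol = validActive.FirstOrDefault(c => c.Name == trainCol.Name);
            if (validCol == null)
            {
                // Illustrative message, not taken from the PR.
                throw new ArgumentException(
                    $"{mismatch} Column '{trainCol.Name}' is missing from validation data.",
                    nameof(validation));
            }

            if (validCol.Type != trainCol.Type)
            {
                // Illustrative message, not taken from the PR.
                throw new ArgumentException(
                    $"{mismatch} Column '{trainCol.Name}' has type '{trainCol.Type}' in train data " +
                    $"and '{validCol.Type}' in validation data.",
                    nameof(validation));
            }
        }
    }
}
```

Under these assumptions, a validation schema that carries an extra hidden `Label` column of a different type alongside an active `Label` column passes the check against a train schema with a single `Label` column, whereas a comparison of total column counts would reject it.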