Changing the database cursor to return default for DBNull #4070

Merged: 3 commits on Aug 7, 2019

Conversation

@tannergooding
Member

commented Aug 6, 2019

This updates the DatabaseLoaderCursor to support nullable columns by treating them as default.
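
As a minimal sketch of what "treating them as default" means at the reader level, the snippet below reads a nullable boolean column through an in-memory `DataTableReader` rather than a real database. The `ReadBooleanOrDefault` helper and the `IsClicked` column are hypothetical names used only for illustration; `IsDBNull` and `GetBoolean` are the standard `DbDataReader` members that the change relies on.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.Common;

public static class DbNullDefaultSketch
{
    // Hypothetical helper: returns default(bool), i.e. false, when the column
    // holds DBNull; otherwise returns the stored value.
    public static bool ReadBooleanOrDefault(DbDataReader reader, int columnIndex)
        => reader.IsDBNull(columnIndex) ? default : reader.GetBoolean(columnIndex);

    // Builds a tiny in-memory table with one nullable boolean column and reads it back.
    public static List<bool> ReadAll()
    {
        var table = new DataTable();
        table.Columns.Add("IsClicked", typeof(bool)); // columns allow DBNull by default
        table.Rows.Add(true);
        table.Rows.Add(DBNull.Value);
        table.Rows.Add(false);

        var values = new List<bool>();
        using (DbDataReader reader = table.CreateDataReader())
        {
            while (reader.Read())
            {
                values.Add(ReadBooleanOrDefault(reader, 0));
            }
        }
        return values;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", ReadAll()));
    }
}
```

With this approach the null row and the genuine `false` row produce the same value, so downstream transforms cannot tell them apart.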

@tannergooding

Member Author

commented Aug 6, 2019

Peek data in DataView: Showing 2 rows with the columns
######################################################
Row--> | Label:False| Label:False| Feat01:32| Feat02:3| Feat03:5| Feat04:| Feat05:1| Feat06:0| Feat07:0| Feat08:61| Feat09:5| Feat10:0| Feat11:1| Feat12:3157| Feat13:5| Cat14:e5f3fd8d| Cat15:a0aaffa6| Cat16:6faa15d5| Cat17:da8a3421| Cat18:3cd69f23| Cat19:6fcd6dcb| Cat20:ab16ed81| Cat21:43426c29| Cat22:1df5e154| Cat23:7de9c0a9| Cat24:6652dc64| Cat25:99eb4e27| Cat26:00c5ffb7| Cat27:be4ee537| Cat28:f3bbfe99| Cat29:4cdc3efa| Cat30:d20856aa| Cat31:a1eb1511| Cat32:9512c20b| Cat33:febfd863| Cat34:a3323ca1| Cat35:c8e1ee56| Cat36:1752e9e8| Cat37:75350c8a| Cat38:991321ea| Cat39:b757e957| Cat14Encoded:1| Cat14Encoded:Sparse vector of size 7, 0 explicit values| Cat15Encoded:1| Cat15Encoded:Sparse vector of size 8, 0 explicit values| Cat16Encoded:1| Cat16Encoded:Sparse vector of size 8, 0 explicit values| Cat17Encoded:1| Cat17Encoded:Sparse vector of size 7, 0 explicit values| Cat18Encoded:1| Cat18Encoded:Sparse vector of size 7, 0 explicit values| Cat19Encoded:1| Cat19Encoded:Sparse vector of size 3, 0 explicit values| Cat20Encoded:1| Cat20Encoded:Sparse vector of size 8, 0 explicit values| Cat21Encoded:1| Cat21Encoded:Sparse vector of size 7, 0 explicit values| Cat22Encoded:1| Cat22Encoded:Sparse vector of size 5, 0 explicit values| Cat23Encoded:1| Cat23Encoded:Sparse vector of size 7, 0 explicit values| Cat24Encoded:1| Cat24Encoded:Sparse vector of size 7, 0 explicit values| Cat25Encoded:1| Cat25Encoded:Sparse vector of size 8, 0 explicit values| Cat26Encoded:1| Cat26Encoded:Sparse vector of size 5, 0 explicit values| Cat27Encoded:1| Cat27Encoded:Sparse vector of size 5, 0 explicit values| Cat28Encoded:1| Cat28Encoded:Sparse vector of size 7, 0 explicit values| Cat29Encoded:1| Cat29Encoded:Sparse vector of size 4, 0 explicit values| Cat30Encoded:1| Cat30Encoded:Sparse vector of size 3, 0 explicit values| Cat31Encoded:1| Cat31Encoded:Sparse vector of size 5, 0 explicit values| Cat32Encoded:1| Cat32Encoded:Sparse vector of size 5, 0 explicit values| Cat33Encoded:1| 
Cat33Encoded:Sparse vector of size 7, 0 explicit values| Cat34Encoded:1| Cat34Encoded:Sparse vector of size 7, 0 explicit values| Cat35Encoded:1| Cat35Encoded:Sparse vector of size 7, 0 explicit values| Cat36Encoded:1| Cat36Encoded:Sparse vector of size 6, 0 explicit values| Cat37Encoded:1| Cat37Encoded:Sparse vector of size 8, 0 explicit values| Cat38Encoded:1| Cat38Encoded:Sparse vector of size 5, 0 explicit values| Cat39Encoded:1| Cat39Encoded:Sparse vector of size 6, 0 explicit values| Feat01Featurized:Sparse vector of size 100, 3 explicit values| Feat02Featurized:Sparse vector of size 248, 2 explicit values| Feat03Featurized:Sparse vector of size 65, 2 explicit values| Feat04Featurized:Sparse vector of size 104, 0 explicit values| Feat05Featurized:Sparse vector of size 57, 2 explicit values| Feat06Featurized:Sparse vector of size 19, 2 explicit values| Feat07Featurized:Dense vector of size 7| Feat08Featurized:Sparse vector of size 153, 3 explicit values| Feat09Featurized:Sparse vector of size 80, 2 explicit values| Feat10Featurized:Dense vector of size 6| Feat11Featurized:Sparse vector of size 25, 2 explicit values| Feat12Featurized:Sparse vector of size 366, 5 explicit values| Feat13Featurized:Sparse vector of size 63, 2 explicit values| Features:Sparse vector of size 1758, 41 explicit values

Row--> | Label:False| Label:False| Feat01:| Feat02:233| Feat03:1| Feat04:146| Feat05:1| Feat06:0| Feat07:0| Feat08:99| Feat09:7| Feat10:0| Feat11:1| Feat12:3101| Feat13:1| Cat14:62770d79| Cat15:ad984203| Cat16:62bec60d| Cat17:386c49ee| Cat18:e755064d| Cat19:6fcd6dcb| Cat20:b5f5eb62| Cat21:d1f2cc8b| Cat22:2e4e821f| Cat23:2e027dc1| Cat24:0c7c4231| Cat25:12716184| Cat26:00c5ffb7| Cat27:be4ee537| Cat28:f70f0d0b| Cat29:4cdc3efa| Cat30:d20856aa| Cat31:628f1b8d| Cat32:9512c20b| Cat33:c38e2f28| Cat34:14f65a5d| Cat35:25b1b089| Cat36:d7c1fc0b| Cat37:34a9b905| Cat38:ff654802| Cat39:ed10571d| Cat14Encoded:2| Cat14Encoded:Sparse vector of size 7, 1 explicit values| Cat15Encoded:2| Cat15Encoded:Sparse vector of size 8, 1 explicit values| Cat16Encoded:2| Cat16Encoded:Sparse vector of size 8, 1 explicit values| Cat17Encoded:2| Cat17Encoded:Sparse vector of size 7, 1 explicit values| Cat18Encoded:2| Cat18Encoded:Sparse vector of size 7, 1 explicit values| Cat19Encoded:1| Cat19Encoded:Sparse vector of size 3, 0 explicit values| Cat20Encoded:2| Cat20Encoded:Sparse vector of size 8, 1 explicit values| Cat21Encoded:2| Cat21Encoded:Sparse vector of size 7, 1 explicit values| Cat22Encoded:2| Cat22Encoded:Sparse vector of size 5, 1 explicit values| Cat23Encoded:2| Cat23Encoded:Sparse vector of size 7, 1 explicit values| Cat24Encoded:2| Cat24Encoded:Sparse vector of size 7, 1 explicit values| Cat25Encoded:2| Cat25Encoded:Sparse vector of size 8, 1 explicit values| Cat26Encoded:1| Cat26Encoded:Sparse vector of size 5, 0 explicit values| Cat27Encoded:1| Cat27Encoded:Sparse vector of size 5, 0 explicit values| Cat28Encoded:2| Cat28Encoded:Sparse vector of size 7, 1 explicit values| Cat29Encoded:1| Cat29Encoded:Sparse vector of size 4, 0 explicit values| Cat30Encoded:1| Cat30Encoded:Sparse vector of size 3, 0 explicit values| Cat31Encoded:2| Cat31Encoded:Sparse vector of size 5, 1 explicit values| Cat32Encoded:1| Cat32Encoded:Sparse vector of size 5, 0 explicit values| Cat33Encoded:2| 
Cat33Encoded:Sparse vector of size 7, 1 explicit values| Cat34Encoded:2| Cat34Encoded:Sparse vector of size 7, 1 explicit values| Cat35Encoded:2| Cat35Encoded:Sparse vector of size 7, 1 explicit values| Cat36Encoded:2| Cat36Encoded:Sparse vector of size 6, 1 explicit values| Cat37Encoded:2| Cat37Encoded:Sparse vector of size 8, 1 explicit values| Cat38Encoded:2| Cat38Encoded:Sparse vector of size 5, 1 explicit values| Cat39Encoded:2| Cat39Encoded:Sparse vector of size 6, 1 explicit values| Feat01Featurized:Sparse vector of size 100, 0 explicit values| Feat02Featurized:Sparse vector of size 248, 4 explicit values| Feat03Featurized:Sparse vector of size 65, 2 explicit values| Feat04Featurized:Sparse vector of size 104, 4 explicit values| Feat05Featurized:Sparse vector of size 57, 2 explicit values| Feat06Featurized:Sparse vector of size 19, 2 explicit values| Feat07Featurized:Dense vector of size 7| Feat08Featurized:Sparse vector of size 153, 3 explicit values| Feat09Featurized:Sparse vector of size 80, 2 explicit values| Feat10Featurized:Dense vector of size 6| Feat11Featurized:Sparse vector of size 25, 2 explicit values| Feat12Featurized:Sparse vector of size 366, 5 explicit values| Feat13Featurized:Sparse vector of size 63, 2 explicit values| Features:Sparse vector of size 1758, 64 explicit values

Training model...
elapsed time for training the model = 665277
Evaluating the model...
elapsed time for evaluating the model = 686433
************************************************************
*       Metrics for ====Evaluation Metrics for Large datasets stored in Database==== binary classification model
*-----------------------------------------------------------
*       Accuracy: 97.08%
*       Area Under Curve:      71.04%
*       Area under Precision recall Curve:  7.39%
*       F1Score:  NaN
*       LogLoss:  .18
*       LogLossReduction:  .06
*       PositivePrecision:
*       PositiveRecall:
*       NegativePrecision:  .97
*       NegativeRecall:  100.00%
************************************************************
=============== Press any key ===============
@@ -236,79 +236,79 @@ private Delegate CreateGetterDelegate<TValue>(int col)
 private ValueGetter<bool> CreateBooleanGetterDelegate(ColInfo colInfo)
 {
     int columnIndex = GetColumnIndex(colInfo);
-    return (ref bool value) => value = DataReader.GetBoolean(columnIndex);
+    return (ref bool value) => value = DataReader.IsDBNull(columnIndex) ? default : DataReader.GetBoolean(columnIndex);

@eerhardt

eerhardt Aug 6, 2019

Member

I don't think blindly turning null into default is really the correct thing to do here.

I wonder if we should have optional behaviors that the user can opt into. Potentially something like:

  1. (default behavior) throw on nulls so the user knows they have to make some decision.
  2. Turn null into default.
  3. Convert nullable integer columns into float/double, and use NaN to designate null values. They can then use the Replace N/A transforms available to them in the rest of the pipeline.
  4. The user can always change their schema to get around the exception as well, either by inserting the data into a different table and replacing nulls as appropriate, or by creating a special stored proc or SELECT statement to do the null conversion.
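
The first three options above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `NullHandling` enum and the `ReadInt32AsSingle` and `ReadNullCell` helpers are hypothetical names, not an existing ML.NET API, and an in-memory `DataTableReader` stands in for a real database.

```csharp
using System;
using System.Data;
using System.Data.Common;

// Hypothetical option type; nothing like this is confirmed to exist in ML.NET.
public enum NullHandling
{
    Throw,       // option 1: surface nulls so the user has to make a decision
    UseDefault,  // option 2: null becomes default(T)
    UseNaN       // option 3: read integers as float, null becomes NaN
}

public static class NullHandlingSketch
{
    // Reads an Int32 column as a float, applying the chosen policy to DBNull.
    public static float ReadInt32AsSingle(DbDataReader reader, int columnIndex, NullHandling handling)
    {
        if (!reader.IsDBNull(columnIndex))
        {
            return reader.GetInt32(columnIndex);
        }

        switch (handling)
        {
            case NullHandling.UseDefault:
                return default;
            case NullHandling.UseNaN:
                return float.NaN;
            default:
                throw new InvalidOperationException(
                    $"Column {columnIndex} contains a null value; choose a null-handling option.");
        }
    }

    // Reads the single null cell of an in-memory table under the given policy.
    public static float ReadNullCell(NullHandling handling)
    {
        var table = new DataTable();
        table.Columns.Add("Feat01", typeof(int)); // columns allow DBNull by default
        table.Rows.Add(DBNull.Value);

        using (DbDataReader reader = table.CreateDataReader())
        {
            reader.Read();
            return ReadInt32AsSingle(reader, 0, handling);
        }
    }

    public static void Main()
    {
        Console.WriteLine(ReadNullCell(NullHandling.UseDefault));
        Console.WriteLine(float.IsNaN(ReadNullCell(NullHandling.UseNaN)));
    }
}
```

Under the `UseNaN` policy the missing value stays visible to the rest of the pipeline, which is what would let the Replace N/A transforms mentioned in option 3 act on it.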

See @TomFinley's comments at #673 (comment) for more thoughts here.

@codemzs @ebarsoumMS - any thoughts on the appropriate behavior here?

@eerhardt eerhardt merged commit bbb6b15 into dotnet:master Aug 7, 2019

17 checks passed

MachineLearning-CI Build #20190806.19 succeeded
MachineLearning-CI (Centos_x64_NetCoreApp30 Debug_Build) succeeded
MachineLearning-CI (Centos_x64_NetCoreApp30 Release_Build) succeeded
MachineLearning-CI (MacOS_x64_NetCoreApp21 Debug_Build) succeeded
MachineLearning-CI (MacOS_x64_NetCoreApp21 Release_Build) succeeded
MachineLearning-CI (Ubuntu_x64_NetCoreApp21 Debug_Build) succeeded
MachineLearning-CI (Ubuntu_x64_NetCoreApp21 Release_Build) succeeded
MachineLearning-CI (Windows_x64_NetCoreApp21 Debug_Build) succeeded
MachineLearning-CI (Windows_x64_NetCoreApp21 Release_Build) succeeded
MachineLearning-CI (Windows_x64_NetCoreApp30 Debug_Build) succeeded
MachineLearning-CI (Windows_x64_NetCoreApp30 Release_Build) succeeded
MachineLearning-CI (Windows_x64_NetFx461 Debug_Build) succeeded
MachineLearning-CI (Windows_x64_NetFx461 Release_Build) succeeded
MachineLearning-CI (Windows_x86_NetCoreApp21 Debug_Build) succeeded
MachineLearning-CI (Windows_x86_NetCoreApp21 Release_Build) succeeded
WIP Ready for review
license/cla All CLA requirements met.
2 participants