Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added SQLite database to test loading of datasets in non-Windows builds #5080

Merged
merged 5 commits into from May 5, 2020

Conversation

mstfbl
Copy link
Contributor

@mstfbl mstfbl commented May 1, 2020

Fix #4156 and related to #4175

MSSQL databases are not supported on Mac and Linux builds. As such, I've generated an equivalent SQLite database, and re-activated these disabled Iris tests so that these unit tests can run on non-Windows builds and produce equivalent test results.

Reactivated tests on Linux and MacOS list:

  • IrisLightGbm()
  • IrisLightGbmWithLoadColumnName()
  • IrisVectorLightGbm()
  • IrisVectorLightGbmWithLoadColumnName()
  • IrisSdcaMaximumEntropy()

@mstfbl mstfbl marked this pull request as ready for review May 1, 2020 22:25
@mstfbl mstfbl requested a review from a team as a code owner May 1, 2020 22:25
@mstfbl mstfbl self-assigned this May 1, 2020
@codecov
Copy link

codecov bot commented May 1, 2020

Codecov Report

Merging #5080 into master will decrease coverage by 0.01%.
The diff coverage is 47.69%.

@@            Coverage Diff             @@
##           master    #5080      +/-   ##
==========================================
- Coverage   75.66%   75.64%   -0.02%     
==========================================
  Files         993      993              
  Lines      178157   178192      +35     
  Branches    19125    19125              
==========================================
- Hits       134800   134799       -1     
- Misses      38136    38164      +28     
- Partials     5221     5229       +8     
Flag Coverage Δ
#Debug 75.64% <47.69%> (-0.02%) ⬇️
#production 71.62% <ø> (-0.02%) ⬇️
#test 88.64% <47.69%> (-0.04%) ⬇️
Impacted Files Coverage Δ
test/Microsoft.ML.Tests/DatabaseLoaderTests.cs 82.91% <42.37%> (-8.27%) ⬇️
test/Microsoft.ML.TestFrameworkCommon/Datasets.cs 100.00% <100.00%> (ø)
...c/Microsoft.ML.FastTree/Utils/ThreadTaskManager.cs 79.48% <0.00%> (-20.52%) ⬇️
....ML.AutoML/PipelineSuggesters/PipelineSuggester.cs 80.67% <0.00%> (-6.73%) ⬇️
src/Microsoft.ML.Maml/MAML.cs 24.75% <0.00%> (-1.46%) ⬇️
src/Microsoft.ML.AutoML/Sweepers/Parameters.cs 84.32% <0.00%> (-0.85%) ⬇️
...icrosoft.ML.AutoML/Experiment/SuggestedPipeline.cs 92.78% <0.00%> (ø)
...L.AutoML/TrainerExtensions/TrainerExtensionUtil.cs 86.89% <0.00%> (ø)
...rosoft.ML.AutoML/ColumnInference/TextFileSample.cs 62.25% <0.00%> (+2.64%) ⬆️
#Resolved

@codecov
Copy link

codecov bot commented May 2, 2020

Codecov Report

Merging #5080 into master will decrease coverage by 0.01%.
The diff coverage is 47.69%.

@@            Coverage Diff             @@
##           master    #5080      +/-   ##
==========================================
- Coverage   75.66%   75.64%   -0.02%     
==========================================
  Files         993      993              
  Lines      178157   178192      +35     
  Branches    19125    19125              
==========================================
- Hits       134800   134799       -1     
- Misses      38136    38164      +28     
- Partials     5221     5229       +8     
Flag Coverage Δ
#Debug 75.64% <47.69%> (-0.02%) ⬇️
#production 71.62% <ø> (-0.02%) ⬇️
#test 88.64% <47.69%> (-0.04%) ⬇️
Impacted Files Coverage Δ
test/Microsoft.ML.Tests/DatabaseLoaderTests.cs 82.91% <42.37%> (-8.27%) ⬇️
test/Microsoft.ML.TestFrameworkCommon/Datasets.cs 100.00% <100.00%> (ø)
...c/Microsoft.ML.FastTree/Utils/ThreadTaskManager.cs 79.48% <0.00%> (-20.52%) ⬇️
....ML.AutoML/PipelineSuggesters/PipelineSuggester.cs 80.67% <0.00%> (-6.73%) ⬇️
src/Microsoft.ML.Maml/MAML.cs 24.75% <0.00%> (-1.46%) ⬇️
src/Microsoft.ML.AutoML/Sweepers/Parameters.cs 84.32% <0.00%> (-0.85%) ⬇️
...icrosoft.ML.AutoML/Experiment/SuggestedPipeline.cs 92.78% <0.00%> (ø)
...L.AutoML/TrainerExtensions/TrainerExtensionUtil.cs 86.89% <0.00%> (ø)
...rosoft.ML.AutoML/ColumnInference/TextFileSample.cs 62.25% <0.00%> (+2.64%) ⬆️
#Resolved

@mstfbl
Copy link
Contributor Author

mstfbl commented May 2, 2020

Currently tests on MacOS and Linux builds are failing as the version of ML.TestDatabase is not yet updated. This PR in the testdata repo, once merged, will fix this.
Update: An update NuGet package for the machinelearning-testdata repo in ML .NET is also needed to obtain iris.sqlite.

@mstfbl
Copy link
Contributor Author

mstfbl commented May 2, 2020

Marking this PR as draft for now, until the NuGet package for the up-to-date NuGet package for the machinelearning-testdata repo is available, where the necessary iris.sqlite is in its TestDatasets directory with version 0.0.6-test.

@mstfbl mstfbl marked this pull request as draft May 2, 2020 22:10
@mstfbl mstfbl marked this pull request as ready for review May 4, 2020 22:01
@@ -20,6 +20,7 @@
https://dotnet.myget.org/F/dotnet-core/api/v3/index.json;
https://dotnet.myget.org/F/roslyn-analyzers/api/v3/index.json;
https://pkgs.dev.azure.com/dnceng/public/_packaging/MachineLearning/nuget/v3/index.json;
https://pkgs.dev.azure.com/dnceng/public/_packaging/machinelearning-testdata/nuget/v3/index.json;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is necessary to obtain the most up-to-date NuGet package for TestDatasets.

return new DatabaseSource(
SqlClientFactory.Instance,
GetMSSQLConnectionString(TestDatasets.irisDb.name),
String.Format(command, $@"""{TestDatasets.irisDb.trainFilename}"""));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passing TestDatasets.irisDb.trainFilename as $@"""{TestDatasets.irisDb.trainFilename}""" is necessary to correctly pass the table name as \"iris-train\" in MSSQL.

@mstfbl mstfbl requested a review from harishsk May 4, 2020 23:47
Copy link
Contributor

@harishsk harishsk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@mstfbl mstfbl merged commit 401928a into dotnet:master May 5, 2020
@dotnet dotnet locked as resolved and limited conversation to collaborators Mar 18, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create a tests database for the DatabaseLoader tests on macOS and Linux
4 participants