fix race condition for test MulticlassTreeFeaturizedLRTest #4950

Merged

@frank-dong-ms
Member

frank-dong-ms commented Mar 18, 2020

A method in TreeEnsembleFeaturizerTransform is called from multiple threads. Make the variable "temp" a local variable of that method to avoid a race condition.
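To illustrate the kind of race described above, here is a minimal, self-contained sketch. It is not the ML.NET code: the `ValueMapper` delegate declared here is only a stand-in with the same shape as the lambda in the diff, and `Compute`, `MakeSharedMapper`, `MakeLocalMapper` and `CountMismatches` are hypothetical names made up for the example.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Stand-in for the mapper delegate shape seen in the diff (assumption, not the ML.NET type).
delegate void ValueMapper<TSrc, TDst>(in TSrc src, ref TDst dst);

static class CapturedTempDemo
{
    // Hypothetical work: writes a value derived from src into temp,
    // then spins briefly to simulate further work before temp is read.
    static void Compute(int src, ref ulong temp)
    {
        temp = (ulong)src * 2;
        Thread.SpinWait(20);
    }

    // Pre-fix shape: temp lives outside the lambda, so all calls share one captured variable.
    public static ValueMapper<int, float> MakeSharedMapper()
    {
        ulong temp = 0;
        return (in int src, ref float dst) =>
        {
            Compute(src, ref temp);
            dst = temp; // another thread may have overwritten temp by now
        };
    }

    // Post-fix shape: temp is local to the lambda, so each call has its own copy.
    public static ValueMapper<int, float> MakeLocalMapper()
    {
        return (in int src, ref float dst) =>
        {
            ulong temp = 0;
            Compute(src, ref temp);
            dst = temp;
        };
    }

    // Calls the mapper from many threads and counts results that differ from the expected value.
    public static int CountMismatches(ValueMapper<int, float> mapper, int rows)
    {
        int mismatches = 0;
        Parallel.For(0, rows, i =>
        {
            float dst = 0;
            mapper(in i, ref dst);
            if (dst != i * 2f) Interlocked.Increment(ref mismatches);
        });
        return mismatches;
    }

    public static void Main()
    {
        Console.WriteLine("shared temp mismatches: " + CountMismatches(MakeSharedMapper(), 100000));
        Console.WriteLine("local temp mismatches:  " + CountMismatches(MakeLocalMapper(), 100000));
    }
}
```

The local-variable mapper always reports 0 mismatches. The shared-variable mapper reports a count that varies from run to run and can be 0 on a given run, which is what makes this kind of bug show up as an intermittent test failure.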

@frank-dong-ms frank-dong-ms requested a review from dotnet/mlnet-core as a code owner Mar 18, 2020
@frank-dong-ms frank-dong-ms requested review from harishsk, sharwell and mstfbl Mar 18, 2020

ValueMapper<TInput, Single> mapper;
if (seed == 0)
{
    mapper =
        (in TInput src, ref Single dst) =>
        {
            ulong temp = 0;

@harishsk

harishsk Mar 18, 2020

Member

Very nice find! :-) Can you please add some comments about this change? Otherwise some future developer might want to "optimize" this and move it back.

Also, this pattern deserves more investigation. Do we have other mappers that are using variables set outside the mapper function definition? #Resolved

@frank-dong-ms

frank-dong-ms Mar 18, 2020

Author Member

Yes, sure, I will add comments on this.


In reply to: 394130751 [](ancestors = 394130751)

@frank-dong-ms

frank-dong-ms Mar 18, 2020

Author Member

@sharwell do you have any tool recommendations for detecting similar race conditions, either through static code analysis or runtime detection?


In reply to: 394131983 [](ancestors = 394131983,394130751)

@frank-dong-ms

frank-dong-ms Mar 18, 2020

Author Member

I found no similar issue in the other mappers, but mappers are not the only functions called from multiple threads, and it is not feasible to review the code for every similar issue.


In reply to: 394130751 [](ancestors = 394130751)

@sharwell

sharwell Mar 18, 2020

Member

I'm not aware of such a tool. It would be easier to write a more targeted tool that identified variables captured by a lambda so they could be reviewed for cases like this. #Resolved

@frank-dong-ms

frank-dong-ms Mar 18, 2020

Author Member

Got it, thanks


In reply to: 394334398 [](ancestors = 394334398)

@@ -742,6 +742,8 @@ public static IDataTransform CreateForEntryPoint(IHostEnvironment env, Arguments
mapper =
(in TInput src, ref Single dst) =>
{
// Attention: this method will be used in multipe threading,
// don't put temp variable outside of this method to avoid race condition

@harishsk

harishsk Mar 18, 2020

Member

Nitpick: The comment is slightly confusing. Can you please rephrase it? Say, something like:

This method is called from multiple threads. Do not move the temp variable outside this method. If you do, the variable is shared between the threads and results in a race condition. #Resolved

@harishsk

harishsk Mar 18, 2020

Member

Isn't there a test that can be re-enabled because of this fix? #Resolved

@frank-dong-ms

frank-dong-ms Mar 18, 2020

Author Member

No, the test (MulticlassTreeFeaturizedLRTest) is not disabled, but I have seen it fail several times in the full machine learning test run.
The test machine has only 2 cores, so the failure rate there is not that high, but when we run the test locally the failure rate is more than 10 percent.
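As a rough illustration of why the number of cores affects how often such a race shows up, the sketch below repeats a racy parallel loop at different degrees of parallelism and reports the fraction of trials in which the race was observed. It is a toy under stated assumptions, not the ML.NET test: `RaceRateDemo`, `TrialFails` and `FailureRate` are hypothetical names, and the spin wait only simulates work between writing and reading the shared variable.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class RaceRateDemo
{
    // Runs one trial of a parallel loop whose body shares a captured variable.
    // Returns true if any iteration read back a value written by another iteration.
    public static bool TrialFails(int degree, int rows)
    {
        ulong temp = 0; // shared captured variable (the pre-fix shape)
        int bad = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = degree };
        Parallel.For(0, rows, options, i =>
        {
            temp = (ulong)i;
            Thread.SpinWait(20); // simulated work between the write and the read
            if (temp != (ulong)i) Interlocked.Increment(ref bad);
        });
        return bad > 0;
    }

    // Fraction of trials in which the race was observed.
    public static double FailureRate(int degree, int trials, int rows)
    {
        int failures = 0;
        for (int t = 0; t < trials; t++)
            if (TrialFails(degree, rows)) failures++;
        return (double)failures / trials;
    }

    public static void Main()
    {
        foreach (int degree in new[] { 1, 2, Environment.ProcessorCount })
            Console.WriteLine($"degree={degree}: failure rate {FailureRate(degree, 50, 20000):P0}");
    }
}
```

With a degree of parallelism of 1 the iterations never overlap, so the rate is 0. At higher degrees the measured rate depends on the machine and varies between runs, so a test that passes most of the time on a small CI agent can still fail often on a developer machine with more cores.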


In reply to: 394607002 [](ancestors = 394607002)

@frank-dong-ms

frank-dong-ms Mar 18, 2020

Author Member

Sure, thanks


In reply to: 394606675 [](ancestors = 394606675)

@frank-dong-ms frank-dong-ms merged commit 26ffb3f into dotnet:master Mar 18, 2020
17 checks passed
MachineLearning-CI: Build #20200318.7 succeeded
MachineLearning-CI (Centos_x64_NetCoreApp30 Debug_Build): succeeded
MachineLearning-CI (Centos_x64_NetCoreApp30 Release_Build): succeeded
MachineLearning-CI (MacOS_x64_NetCoreApp21 Debug_Build): succeeded
MachineLearning-CI (MacOS_x64_NetCoreApp21 Release_Build): succeeded
MachineLearning-CI (Ubuntu_x64_NetCoreApp21 Debug_Build): succeeded
MachineLearning-CI (Ubuntu_x64_NetCoreApp21 Release_Build): succeeded
MachineLearning-CI (Windows_x64_NetCoreApp21 Debug_Build): succeeded
MachineLearning-CI (Windows_x64_NetCoreApp21 Release_Build): succeeded
MachineLearning-CI (Windows_x64_NetCoreApp30 Debug_Build): succeeded
MachineLearning-CI (Windows_x64_NetCoreApp30 Release_Build): succeeded
MachineLearning-CI (Windows_x64_NetFx461 Debug_Build): succeeded
MachineLearning-CI (Windows_x64_NetFx461 Release_Build): succeeded
MachineLearning-CI (Windows_x86_NetCoreApp21 Debug_Build): succeeded
MachineLearning-CI (Windows_x86_NetCoreApp21 Release_Build): succeeded
WIP: Ready for review
license/cla: All CLA requirements met.
3 participants