Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve test stability #4503

Open
wants to merge 1 commit into
base: master
from

Conversation

@frank-dong-ms
Copy link
Member

frank-dong-ms commented Nov 26, 2019

  1. add datatime when start/finish tests
  2. add timeout for thread waiting to prevent infinite wait
  3. fix cancel token cancel twice issue
@frank-dong-ms frank-dong-ms requested a review from dotnet/mlnet-core as a code owner Nov 26, 2019
@@ -90,7 +90,7 @@ public void Wait(long position, CancellationToken token = default(CancellationTo
ev = new WaitStats(position);
_waiters.Add(ev);
}
ev.Event.Wait(token);
ev.Event.Wait(5*60*1000, token);
if (_ex != null)

This comment has been minimized.

Copy link
@harishsk

harishsk Nov 26, 2019

Member

This is a five minute wait. Why five minutes? Can you please give more background behind this choice?

This comment has been minimized.

Copy link
@frank-dong-ms

frank-dong-ms Nov 26, 2019

Author Member

This caused most of our test hanging according to my local test result.
Some processor is waiting for signal to start but seems in some case it never can get the signal to run and the wait hangs the test process. This happens in multiple test cases so I just putting a pretty large timeout here, maybe we need discuss more to understand the root cause


In reply to: 350531489 [](ancestors = 350531489)

@@ -90,7 +90,7 @@ public void Wait(long position, CancellationToken token = default(CancellationTo
ev = new WaitStats(position);
_waiters.Add(ev);
}
ev.Event.Wait(token);
ev.Event.Wait(5*60*1000, token);
if (_ex != null)

This comment has been minimized.

Copy link
@harishsk

harishsk Nov 26, 2019

Member

Can this be a constant that will apply to all use cases? Or does this need to be a parameter to the function? If it is a constant, can you please give it a name and use that instead?

This comment has been minimized.

Copy link
@frank-dong-ms

frank-dong-ms Nov 26, 2019

Author Member

Yes, this should, this number is not reliable now so just put the number directly, let's discuss more in person.


In reply to: 350531730 [](ancestors = 350531730)

@@ -402,7 +402,7 @@ private static DataViewRowCursor ConsolidateCore(IChannelProvider provider, Data
// The waiter event should never be null since this is only
// called after a point where waiter.Register has been called.
ch.AssertValue(waiterEvent);
waiterEvent.Wait();
waiterEvent.Wait(5*60*1000);
waiterEvent = null;

This comment has been minimized.

Copy link
@harishsk

harishsk Nov 26, 2019

Member

Similar comment as the one in OrderedWaiter.cs. If the same constant applies everywhere we should use a name.

public void OnReload()
{
if (!_cts.IsCancellationRequested)
_cts.Cancel();

This comment has been minimized.

Copy link
@harishsk

harishsk Nov 26, 2019

Member

The code seems to be saying if cancellation is not requested, call Cancel(). Am I reading that correctly? Can you please describe the root cause and the fix a bit more thoroughly in the pull request?

This comment has been minimized.

Copy link
@frank-dong-ms

frank-dong-ms Nov 26, 2019

Author Member

This means if the cancellation has not already been requested, then cancel the token.
we should always add this check as calling cancel twice will crash the process. This is one of the reason that test process is crashing.


In reply to: 350532410 [](ancestors = 350532410)

This comment has been minimized.

Copy link
@eerhardt

eerhardt Nov 26, 2019

Member

This is one of the reason that test process is crashing.

Can you provide a test log that shows this is where the test process is crashing? I don't see .Cancel() causing a process crash.

I made a console app that calls Cancel multiple times, and it doesn't even throw an exception.

This comment has been minimized.

Copy link
@sharwell

sharwell Dec 4, 2019

Member

we should always add this check as calling cancel twice will crash the process. This is one of the reason that test process is crashing.

This is not correct. Cancellation a second time is a NOP, and Cancel itself is guarded by a check of IsCancellationRequested.

_cts.Cancel();
else
Console.WriteLine("ModelReloadToken has already been Canceled.");
}

This comment has been minimized.

Copy link
@harishsk

harishsk Nov 26, 2019

Member

ML.NET is a library that can be used in environments where there is no console. I don't think we should be using Console.WriteLine here.
What is the desired behavior? Is this for information only to the user or is this an error condition?

This comment has been minimized.

Copy link
@frank-dong-ms

frank-dong-ms Nov 26, 2019

Author Member

This is just to output some info into test command, this is not for user but rather a error info, I just output some info to see whether this is common test crash case.


In reply to: 350532971 [](ancestors = 350532971)

This comment has been minimized.

Copy link
@eerhardt

eerhardt Nov 26, 2019

Member

We should not be writing to the console inside of our libraries.

Copy link
Member

harishsk left a comment

🕐

public void OnReload() => _cts.Cancel();
public void OnReload()
{
if (!_cts.IsCancellationRequested)

This comment has been minimized.

Copy link
@sharwell
@@ -90,7 +90,7 @@ public void Wait(long position, CancellationToken token = default(CancellationTo
ev = new WaitStats(position);
_waiters.Add(ev);
}
ev.Event.Wait(token);
ev.Event.Wait(5*60*1000, token);

This comment has been minimized.

Copy link
@sharwell

sharwell Dec 4, 2019

Member

📝 This change should be reverted. This is production code, not test code. If you want to add a delay, update the tests to pass a CancellationToken that will cancel after a period of time.

@@ -402,7 +402,7 @@ private static DataViewRowCursor ConsolidateCore(IChannelProvider provider, Data
// The waiter event should never be null since this is only
// called after a point where waiter.Register has been called.
ch.AssertValue(waiterEvent);
waiterEvent.Wait();
waiterEvent.Wait(5*60*1000);

This comment has been minimized.

Copy link
@sharwell

sharwell Dec 4, 2019

Member

📝 This change should be reverted for the same reason I mentioned for OrderedWaiter.cs.

Copy link
Member

sharwell left a comment

All changes to production code in this pull request should be reverted before merge. Explanation given in individual comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.