FileSystemWatcher tests fail intermittently on Unix due to too many open inotify instances #5660

Closed
tmat opened this Issue Jan 24, 2016 · 16 comments

Member

tmat commented Jan 24, 2016

http://dotnet-ci.cloudapp.net/job/dotnet_corefx/job/centos7.1_release_tst_prtest/1094/console

19:32:38    FileSystemWatcherTests.FileSystemWatcher_EnableRaisingEvents [FAIL]
19:32:38       System.IO.IOException : The configured user limit (128) on the number of inotify instances has been reached.
19:32:38       Stack Trace:
19:32:38             at System.IO.FileSystemWatcher.StartRaisingEvents()
19:32:38             at FileSystemWatcherTests.FileSystemWatcher_EnableRaisingEvents()
19:32:38 Finished:    System.IO.FileSystem.Watcher.Tests

Member

stephentoub commented Jan 25, 2016

By default the number of inotify instances allowed per user is relatively low (128), and we use one to back each FileSystemWatcher. If the machine on which the tests are running happens to be doing other things at the same time that create inotify instances, we could bump up against this limit.

@mmitche, can we up the default limits on the machines used in CI? I believe this can be done by adding a line like the following to the end of the /etc/sysctl.conf file:

fs.inotify.max_user_instances=1024
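
To confirm that a machine actually picked up the raised limit, the value can be read back from procfs. The following is a minimal sketch, assuming a Linux host that exposes the setting at `/proc/sys/fs/inotify/max_user_instances`; the function names are hypothetical and are not part of corefx or the CI tooling.

```python
from pathlib import Path

# Standard procfs location of the per-user inotify instance limit on Linux.
DEFAULT_LIMIT_PATH = "/proc/sys/fs/inotify/max_user_instances"


def read_inotify_instance_limit(path=DEFAULT_LIMIT_PATH):
    """Return the configured per-user inotify instance limit as an int."""
    return int(Path(path).read_text().strip())


def limit_is_at_least(expected, path=DEFAULT_LIMIT_PATH):
    """Return True if the configured limit is at least `expected`."""
    return read_inotify_instance_limit(path) >= expected
```

Calling `limit_is_at_least(1024)` on a CI machine would then indicate whether the new setting is in effect. Note that editing /etc/sysctl.conf alone does not change the running value; it takes effect after `sysctl -p` or a reboot.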

Member

mmitche commented Jan 25, 2016

@stephentoub Yeah... does that need to be updated on all of the *nix machines?

Member

stephentoub commented Jan 25, 2016

Thanks, Matt. Only the Linux ones.

Member

mmitche commented Jan 26, 2016

@stephentoub This is being done on the new dynamically allocated image.

Member

stephentoub commented Jan 27, 2016

Thanks.

Collaborator

justinvp commented Mar 16, 2016

Ran into this again here:

http://dotnet-ci.cloudapp.net/job/dotnet_corefx/job/ubuntu_debug_tst_prtest/2902/consoleFull#-21371504001f1a4601-6aec-4fd5-b678-78d4389fd5e8

02:18:22    FileSystemWatcherTests.FileSystemWatcher_WatchingAliasedFolderResolvesToRealPathWhenWatching [FAIL]
02:18:22       System.IO.IOException : The configured user limit (1024) on the number of inotify instances has been reached.
02:18:22       Stack Trace:
02:18:22             at System.IO.FileSystemWatcher.StartRaisingEvents()
02:18:22             at System.IO.FileSystemWatcher.StartRaisingEventsIfNotDisposed()
02:18:22             at System.IO.FileSystemWatcher.set_EnableRaisingEvents(Boolean value)
02:18:22             at FileSystemWatcherTests.FileSystemWatcher_WatchingAliasedFolderResolvesToRealPathWhenWatching()

And here:

http://dotnet-ci.cloudapp.net/job/dotnet_corefx/job/ubuntu_release_tst_prtest/2920/consoleFull#-21371504001f1a4601-6aec-4fd5-b678-78d4389fd5e8

01:52:13    FileSystemWatcherTests.FileSystemWatcher_StopCalledOnBackgroundThreadDoesNotDeadlock [FAIL]
01:52:13       System.IO.IOException : The configured user limit (1024) on the number of inotify instances has been reached.
01:52:13       Stack Trace:
01:52:13             at System.IO.FileSystemWatcher.StartRaisingEvents()
01:52:13             at FileSystemWatcherTests.FileSystemWatcher_StopCalledOnBackgroundThreadDoesNotDeadlock()

Member

stephentoub commented Mar 16, 2016

@mmitche, is it possible that multiple corefx test runs might be running on the same CI machine at the same time? The error message highlights that the larger limit is being used, but we're still hitting it, which implies to me either a) we're not properly disposing some of these, b) multiple jobs are running concurrently on the same machine and some of our tests that push these values to the limit are causing problems for other test processes, or c) something else on the machine is using a ton of inotify instances.
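
One way to tell these cases apart is to look at which processes hold inotify instances at the moment of failure. The sketch below assumes a Linux procfs, where an inotify file descriptor shows up as a symlink in `/proc/<pid>/fd` whose target reads `anon_inode:inotify`; the function name is hypothetical. Descriptors of processes owned by other users are only visible when running with sufficient privileges, and the limit itself is accounted per user, not per process.

```python
import os
from collections import Counter

# Link target that procfs reports for an inotify file descriptor.
INOTIFY_LINK_TARGET = "anon_inode:inotify"


def count_inotify_instances(proc_root="/proc"):
    """Return a Counter mapping pid -> number of open inotify instances."""
    counts = Counter()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited or access denied
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor closed while scanning
            if target == INOTIFY_LINK_TARGET:
                counts[int(entry)] += 1
    return counts
```

Sorting the result with `count_inotify_instances().most_common()` lists the heaviest users first, which would show whether the test process itself or something else on the machine is holding the instances.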

cc: @sokket

stephentoub reopened this Mar 16, 2016

Member

mmitche commented Mar 17, 2016

@stephentoub Except for OSX, no.

stephentoub changed the title from "FileSystemWatcher_EnableRaisingEvents fails intermittently on CentOS" to "FileSystemWatcher_EnableRaisingEvents fails intermittently on Unix" Mar 20, 2016

stephentoub changed the title from "FileSystemWatcher_EnableRaisingEvents fails intermittently on Unix" to "FileSystemWatcher tests fail intermittently on Unix due to too many open inotify instances" Mar 20, 2016

joshfree assigned Priya91 and unassigned mmitche Apr 1, 2016

joshfree assigned ianhays and unassigned Priya91 May 9, 2016

Member

joshfree commented May 9, 2016

@ianhays can you take a look, since you're already working on FSW-related test issues?

Member

ianhays commented May 9, 2016

In #8231 I added code to guard against failing when a test hits the inotify limit, so the FSW tests should be fine. It's still possible to hit the limit, but it's significantly less likely now, especially in an innerloop run.

I'm closing this as (presumably) resolved for now.
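
For readers unfamiliar with this kind of guard, here is a minimal sketch of one possible shape for it. It is not the code from #8231. It assumes the limit surfaces as an `OSError` with `errno.EMFILE`, which is what `inotify_init` reports when the per-user instance limit is reached; in the .NET tests the same condition appears as the `IOException` shown in the logs above. The helper name and its parameters are hypothetical.

```python
import errno
import time


def retry_on_instance_limit(action, attempts=3, delay_seconds=0.5):
    """Call action(); retry only when it fails with EMFILE.

    Any other error, or EMFILE on the final attempt, is re-raised.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return action()
        except OSError as exc:
            is_last_attempt = attempt == attempts - 1
            if exc.errno != errno.EMFILE or is_last_attempt:
                raise
            # Give other tests a chance to release their instances.
            time.sleep(delay_seconds)
```

A guard like this only lowers the odds of a spurious failure. If another test holds the instances for longer than the total retry window, the final attempt still raises.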

ianhays closed this May 9, 2016

ericeil reopened this Oct 3, 2016

Member

ianhays commented Oct 3, 2016

It makes sense that we're hitting this limit again on Outerloop runs. If we're hitting it on innerloop, then something else is wrong.

We're hitting it on Outerloop runs because the CreateManyConcurrent test creates new watchers up to the inotify limit to verify the exception that we throw in that case. The problem is that we run tests in parallel, so if another test tries to create a new FSW during or shortly after the CreateManyConcurrent test, it will unexpectedly fail through no fault of its own.

This will be resolved by #11721.

Contributor

ericeil commented Oct 3, 2016

Yes, this was an Outerloop run.
