FileSystemWatcher tests fail intermittently on Unix due to too many open inotify instances #16208

Closed
tmat opened this issue Jan 24, 2016 · 16 comments · Fixed by dotnet/corefx#11721

@tmat (Member) commented Jan 24, 2016

http://dotnet-ci.cloudapp.net/job/dotnet_corefx/job/centos7.1_release_tst_prtest/1094/console

19:32:38    FileSystemWatcherTests.FileSystemWatcher_EnableRaisingEvents [FAIL]
19:32:38       System.IO.IOException : The configured user limit (128) on the number of inotify instances has been reached.
19:32:38       Stack Trace:
19:32:38             at System.IO.FileSystemWatcher.StartRaisingEvents()
19:32:38             at FileSystemWatcherTests.FileSystemWatcher_EnableRaisingEvents()
19:32:38 Finished:    System.IO.FileSystem.Watcher.Tests
@stephentoub (Member)

By default the number of inotify instances allowed is relatively low (128), and we use one to back FileSystemWatcher. If the machine on which the tests are running happens to be doing other things at the same time that creates inotify instances, we could bump up against this.

@mmitche, can we raise the default limit on the machines used in CI? I believe this can be done by adding a line like the following to the end of the /etc/sysctl.conf file:

fs.inotify.max_user_instances=1024
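
For reference, the change can also be checked and applied without a reboot using standard sysctl commands (nothing here is specific to this repo):

    cat /proc/sys/fs/inotify/max_user_instances     # show the active per-user limit
    sudo sysctl fs.inotify.max_user_instances=1024  # apply the new value immediately
    sudo sysctl -p                                  # re-read /etc/sysctl.conf so the persisted value takes effect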

@mmitche (Member) commented Jan 25, 2016

@stephentoub Yeah... does that need to be updated on all of the *nix machines?

@stephentoub (Member)

Thanks, Matt. Only the Linux ones.

@mmitche (Member) commented Jan 26, 2016

@stephentoub This is being done on the new dynamically allocated image.

@stephentoub (Member)

Thanks.

@justinvp (Contributor)

Ran into this again here:

http://dotnet-ci.cloudapp.net/job/dotnet_corefx/job/ubuntu_debug_tst_prtest/2902/consoleFull#-21371504001f1a4601-6aec-4fd5-b678-78d4389fd5e8

02:18:22    FileSystemWatcherTests.FileSystemWatcher_WatchingAliasedFolderResolvesToRealPathWhenWatching [FAIL]
02:18:22       System.IO.IOException : The configured user limit (1024) on the number of inotify instances has been reached.
02:18:22       Stack Trace:
02:18:22             at System.IO.FileSystemWatcher.StartRaisingEvents()
02:18:22             at System.IO.FileSystemWatcher.StartRaisingEventsIfNotDisposed()
02:18:22             at System.IO.FileSystemWatcher.set_EnableRaisingEvents(Boolean value)
02:18:22             at FileSystemWatcherTests.FileSystemWatcher_WatchingAliasedFolderResolvesToRealPathWhenWatching()

And here:

http://dotnet-ci.cloudapp.net/job/dotnet_corefx/job/ubuntu_release_tst_prtest/2920/consoleFull#-21371504001f1a4601-6aec-4fd5-b678-78d4389fd5e8

01:52:13    FileSystemWatcherTests.FileSystemWatcher_StopCalledOnBackgroundThreadDoesNotDeadlock [FAIL]
01:52:13       System.IO.IOException : The configured user limit (1024) on the number of inotify instances has been reached.
01:52:13       Stack Trace:
01:52:13             at System.IO.FileSystemWatcher.StartRaisingEvents()
01:52:13             at FileSystemWatcherTests.FileSystemWatcher_StopCalledOnBackgroundThreadDoesNotDeadlock()

@stephentoub (Member)

@mmitche, is it possible that multiple corefx test runs might be running on the same CI machine at the same time? The error message highlights that the larger limit is being used, but we're still hitting it, which implies to me one of the following:

a) we're not properly disposing some of these watchers,
b) multiple jobs are running concurrently on the same machine, and some of our tests that push these values to the limit are causing problems for other test processes, or
c) something else on the machine is using a ton of inotify instances.

cc: @sokket
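
On possibility (a): on Linux, each FileSystemWatcher that has EnableRaisingEvents set to true holds one inotify instance until it is disposed, so an undisposed watcher leaks an instance for the life of the process. A minimal sketch of the safe pattern (hypothetical test code, not taken from the corefx suite):

    using System.IO;

    // Wrapping the watcher in `using` guarantees the inotify instance is
    // released even if the body throws.
    using (var watcher = new FileSystemWatcher("/tmp"))
    {
        watcher.EnableRaisingEvents = true; // acquires one inotify instance
        // ... exercise the watcher ...
    } // Dispose() stops the watcher and closes the underlying inotify handle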

@stephentoub stephentoub reopened this Mar 16, 2016
@mmitche (Member) commented Mar 17, 2016

@stephentoub Except for OSX, no.

@stephentoub stephentoub changed the title FileSystemWatcher_EnableRaisingEvents fails intermittently on CentOS FileSystemWatcher_EnableRaisingEvents fails intermittently on Unix Mar 20, 2016
@stephentoub stephentoub changed the title FileSystemWatcher_EnableRaisingEvents fails intermittently on Unix FileSystemWatcher tests fail intermittently on Unix due to too many open inotify instances Mar 20, 2016
@joshfree joshfree assigned Priya91 and unassigned mmitche Apr 1, 2016
@joshfree joshfree assigned ianhays and unassigned Priya91 May 9, 2016
@joshfree (Member) commented May 9, 2016

@ianhays can you take a look since you're already working on FSW-related test issues?

@ianhays (Contributor) commented May 9, 2016

In dotnet/corefx#8231 I added code to guard against failing when a test hits the inotify limit, so the FSW tests should be fine. It's still possible to hit the limit but it's significantly less likely now, especially in an innerloop run.
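
The exact shape of that guard isn't shown here, but conceptually a test helper can treat the limit error as "couldn't start" rather than a test failure. A hedged sketch (the TryStartWatcher helper is made up for illustration; it is not the actual dotnet/corefx#8231 code):

    using System;
    using System.IO;

    static bool TryStartWatcher(FileSystemWatcher watcher)
    {
        try
        {
            watcher.EnableRaisingEvents = true;
            return true;
        }
        catch (IOException e) when (e.Message.Contains("inotify instances"))
        {
            // The per-user inotify limit was exhausted by something outside
            // this test's control; report "couldn't start" instead of failing.
            return false;
        }
    }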

I'm closing this as (presumably) resolved for now.

@ianhays ianhays closed this as completed May 9, 2016
@ericeil (Contributor) commented Oct 3, 2016

@ericeil ericeil reopened this Oct 3, 2016
@ianhays (Contributor) commented Oct 3, 2016

It makes sense that we're hitting this limit again on Outerloop runs. If we're hitting it on innerloop then something else is wrong.

We're hitting it on Outerloop runs because the CreateManyConcurrent test creates new watchers up to the inotify limit to verify the exception we throw in that case. The problem is that we run tests in parallel, so if another test tries to create a new FSW during or shortly after the CreateManyConcurrent test, it will unexpectedly fail through no fault of its own.
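
For context, the pattern described above looks roughly like this (a paraphrase of the idea, not the actual test source):

    using System.Collections.Generic;
    using System.IO;

    // Start watchers until the inotify limit is hit, to verify that the
    // IOException surfaces. While this list is alive, every other watcher
    // started by a concurrently running test also fails to start.
    var watchers = new List<FileSystemWatcher>();
    try
    {
        while (true)
        {
            var w = new FileSystemWatcher("/tmp");
            w.EnableRaisingEvents = true; // consumes one inotify instance
            watchers.Add(w);
        }
    }
    catch (IOException)
    {
        // Expected once the configured limit is reached.
    }
    finally
    {
        watchers.ForEach(w => w.Dispose()); // release every instance promptly
    }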

This will be resolved by dotnet/corefx#11721

@ericeil (Contributor) commented Oct 3, 2016

Yes, this was an Outerloop run.

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 2.0.0 milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Jan 3, 2021