add change test to FS_event #55

Merged
aantron merged 10 commits into aantron:master from glennsl:test/fs_event/change
Apr 2, 2020

Conversation

@glennsl
Contributor

@glennsl glennsl commented Apr 1, 2020

While trying to troubleshoot a test failure in Onivim relating to fs_event, I added this change test to Luv's test suite. Our test only fails on macOS, and while this test passes on both Linux and macOS, it still times out only on macOS (or runs indefinitely without the timeout).

I'm having issues with other tests hanging on both platforms as well, so I'm wondering if this specific issue happens on the CI or for others too, or if it's just me doing something wrong.
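For context, a change test of this kind has roughly the following shape. This is a minimal sketch with the Luv API approximated from its documentation, not the exact test added in this PR; "watched-file" is a placeholder path.

```ocaml
(* Sketch of an FS_event change test, assuming the Luv 0.5-era API
   (signatures approximated; the real test lives in test/FS_event.ml). *)
let () =
  let event = Luv.FS_event.init () |> Result.get_ok in
  Luv.FS_event.start event "watched-file" begin function
    | Ok (_filename, events) ->
      (* Expect a change event for the watched file. *)
      assert (List.mem `CHANGE events);
      (* Stop watching so the loop has no active handles and can exit. *)
      ignore (Luv.FS_event.stop event)
    | Error _ ->
      assert false
  end;
  (* Something elsewhere must modify the file; then run the loop until
     there are no active handles left. *)
  ignore (Luv.Loop.run () : bool)
```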

@glennsl
Contributor Author

glennsl commented Apr 1, 2020

Actually, that seems to be an issue with the timeout on all platforms. Perhaps I just don't understand how to use it.

The test is pretty slow at picking up the change event on macOS though. It takes about a second or so.

@glennsl glennsl changed the title from [Experiment] add change test to FS_event to add change test to FS_event on Apr 1, 2020
@aantron
Owner

aantron commented Apr 1, 2020

I will take a look locally tomorrow morning.

As for CI, the main CI in this repo is Travis. It is running here. It hasn't been reporting status for several of my repos lately, ever since some maintenance Travis did about a week ago.

@aantron
Owner

aantron commented Apr 1, 2020

One notable thing about the Travis CI is that it should be much faster than the GH Actions setup in this repo, because of caching. So, you can use it for faster iteration.

@aantron
Owner

aantron commented Apr 2, 2020

The test...

Actually, that seems to be an issue with the timeout on all platforms.

What does this refer to? At least on WSL, the test appears to have finished immediately.

I wonder if the issue in the test on macOS is a race condition. What if you insert, say, a 100ms delay before this?

You can also sample the current time after the delay, and sample it again in the callback, and check that the delta is less than some small value.
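One way to combine the two suggestions above, assuming the Luv Timer API roughly as documented; change_file is a placeholder for whatever modifies the watched file, and the 0.1s tolerance is illustrative:

```ocaml
(* Sketch: delay ~100ms with a Luv timer before triggering the change,
   sample the clock, then check in the FS_event callback that the event
   arrived promptly. Luv API approximated from its documentation. *)
let () =
  let started = ref 0. in
  let event = Luv.FS_event.init () |> Result.get_ok in
  Luv.FS_event.start event "watched-file" (fun _result ->
    (* Sample the clock again when the event fires... *)
    let delta = Unix.gettimeofday () -. !started in
    (* ...and require the delta to be small. *)
    assert (delta < 0.1));
  let timer = Luv.Timer.init () |> Result.get_ok in
  ignore (Luv.Timer.start timer 100 (fun () ->
    started := Unix.gettimeofday ();
    change_file ()));        (* placeholder: modify the watched file *)
  ignore (Luv.Loop.run () : bool)
```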

@glennsl
Contributor Author

glennsl commented Apr 2, 2020

What does this refer to? At least on WSL, the test appears to have finished immediately.

With run ~with_timeout:true (), on both Arch and macOS, the test will not finish until timing out. It will still succeed, since all conditions have been met, but it seems like run won't stop until all pending operations have completed.

I wonder if the issue in the test on macOS is a race condition. What if you insert, say, a 100ms delay before this?

It seems to make no difference on my Macbook. I'll try it out on the CI too, but want to do a run with just the timings first, to get a baseline.

You can also sample the current time after the delay, and sample it again in the callback, and check that the delta is less than some small value.

Good idea. I used float to compare against 0. with a tolerance of 0.1 so that the actual time will get printed if it fails. I'm not very familiar with Alcotest, so not sure if this is the most elegant way of doing it, but it seems to do the job.

On my Macbook it fails with a delay between 0.9 and 1.3 seconds, and around 1.2 being typical.
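The tolerance comparison described above is the same idea as Alcotest's float testable with an epsilon. Here is a dependency-free sketch of that idea; check_float is a hypothetical helper, not an Alcotest API:

```ocaml
(* A float check with tolerance: passes when |actual - expected| is
   within [tolerance], and reports the measured value on failure, so a
   failing run prints the actual delay. *)
let check_float ~tolerance ~expected actual =
  if Float.abs (actual -. expected) > tolerance then
    failwith
      (Printf.sprintf "expected %g +/- %g, got %g"
         expected tolerance actual)

let () =
  (* A delta of 0.05s is within the 0.1 tolerance. *)
  check_float ~tolerance:0.1 ~expected:0. 0.05;
  print_endline "ok";
  (* A delay of 1.2s, as observed on macOS, fails the check. *)
  match check_float ~tolerance:0.1 ~expected:0. 1.2 with
  | () -> print_endline "unexpected"
  | exception Failure _ -> print_endline "caught failure"
```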

@glennsl
Contributor Author

glennsl commented Apr 2, 2020

Well, we would've gotten a baseline if it didn't hang on macOS and if it had run tests on the GitHub Actions VMs, but alas... Let's try the 100ms delay then.

@aantron
Owner

aantron commented Apr 2, 2020

You can run the tests in GH Actions as well; just edit this line

- run: opam exec -- dune build -p luv

to be dune runtest instead of dune build.

I had them disabled because of some compatibility issue during the Windows port. I will later go back and see if they work now (if you don't already try it in this PR).

@aantron
Owner

aantron commented Apr 2, 2020

run won't stop until all pending operations have completed.

That's correct. The default behavior of run is to run until there are no active handles, such as FS_event handles, or until stop is called. Try closing the handle in the callback with Luv.Handle.close event. I wonder if that's a difference on macOS, though I wouldn't expect so, since it wasn't needed in other tests.
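Closing the handle in the callback, sketched below under the assumption that Handle.close takes the handle and a close callback (in some Luv versions the callback may be optional, hence the hedging); ignore simply discards the close notification:

```ocaml
(* Sketch: close the watcher inside its own callback so that
   [Luv.Loop.run] has no active handles left and returns without a
   timeout. Luv API approximated from its documentation. *)
let () =
  let event = Luv.FS_event.init () |> Result.get_ok in
  Luv.FS_event.start event "watched-file" begin fun _result ->
    (* Once the event has been observed, close the handle. *)
    Luv.Handle.close event ignore
  end;
  ignore (Luv.Loop.run () : bool)
```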

It's also possible some other test is interfering with this test's run. Try commenting out all the other test suites in test/tester.ml and all other tests in test/FS_event.ml.

@aantron
Owner

aantron commented Apr 2, 2020

If you need to investigate what's happening at the C level, you can directly edit the code of libuv in your recursively cloned submodule. Of course, this is only good for local testing. I've previously inserted fprintfs into it to see what was happening.

The Luv build system should pick up the change, so you can just run the tests again. If not, make clean will force a full build. It will take longer each time, though, because the libuv build is opaque to Luv and so is not incremental.

@aantron
Owner

aantron commented Apr 2, 2020

Ah yeah, the install line above should also have --with-test added.

@glennsl
Contributor Author

glennsl commented Apr 2, 2020

Adding a delay with Luv.Timer seems to have done the trick! And the fact that Unix.sleep didn't seems to suggest that the race condition is in libuv, not in macOS.

I've tried the same with the test that was failing for us, and the result of running it locally was promising at least.

I'm unsure if it's worth pursuing this further, as the conditions that trigger it seem to be pretty artificial. It's definitely useful to know when writing tests, but the workaround is pretty straightforward. I should at least add a comment to this test explaining why it's needed, but it might be an idea to document it elsewhere as well.

The default behavior of run is to run until there are no active handles, such as FS_event handles, or until stop is called.

Oh, of course. For some reason I thought Luv_FS_event.stop was Luv.Loop.stop 🤷‍♂️ That does seem to make the usefulness of ~with_timeout a bit limited though, since if you use it you have to change the test to manually cancel the entire loop on success. Or just let it run out I guess, which perhaps isn't so bad if it's just used in a few places.
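For reference, the two stops being confused here do different things. A sketch, with signatures approximated from the Luv docs:

```ocaml
(* Luv.FS_event.stop stops one watcher (the loop keeps running while any
   other handle is active); Luv.Loop.stop halts the loop itself, making
   Luv.Loop.run return regardless of remaining handles. API approximated. *)
let () =
  let event = Luv.FS_event.init () |> Result.get_ok in
  Luv.FS_event.start event "watched-file" begin fun _result ->
    ignore (Luv.FS_event.stop event);    (* stop just this watcher... *)
    Luv.Loop.stop (Luv.Loop.default ())  (* ...or cancel the whole loop *)
  end;
  ignore (Luv.Loop.run () : bool)
```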

@aantron
Owner

aantron commented Apr 2, 2020

Have you tried running the test case in isolation? It's possible that the race condition is in the test suite, and is due to some kind of interaction between lingering handles from different tests.

with_timeout is a holdover from an earlier approach I used for testing, a long time ago. If you don't need it after this PR, I should delete it.

@glennsl
Contributor Author

glennsl commented Apr 2, 2020

I'm not sure how to run a single test in isolation. Other than commenting out the other tests in fs_event I guess. It doesn't really matter now though, as I don't actually need it anymore, so feel free to remove it!

@aantron aantron merged commit a67ac24 into aantron:master Apr 2, 2020
@aantron
Owner

aantron commented Apr 2, 2020

Thanks!
