add change test to FS_event #55

Merged
aantron merged 10 commits into aantron:master from glennsl:test/fs_event/change
Apr 2, 2020

Conversation

@glennsl
Contributor

@glennsl glennsl commented Apr 1, 2020

While trying to troubleshoot a test failure in Onivim relating to fs_event, I added this change test to Luv's test suite. Our test only fails on macOS, and while this test passes on both Linux and macOS, it still times out only on macOS (or runs indefinitely without the timeout).

I'm having issues with other tests hanging on both platforms as well, so I'm wondering if this specific issue happens on the CI or for others too, or if it's just me doing something wrong.
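For context, a change test of this kind has roughly the following shape. This is a minimal sketch with the Luv API approximated from its documentation, not the exact test added in this PR; "watched-file" is a placeholder path.

```ocaml
(* Sketch of an FS_event change test, assuming the Luv 0.5-era API
   (signatures approximated; the real test lives in test/FS_event.ml). *)
let () =
  let event = Luv.FS_event.init () |> Result.get_ok in
  Luv.FS_event.start event "watched-file" begin function
    | Ok (_filename, events) ->
      (* Expect a change event for the watched file. *)
      assert (List.mem `CHANGE events);
      (* Stop watching so the loop has no active handles and can exit. *)
      ignore (Luv.FS_event.stop event)
    | Error _ ->
      assert false
  end;
  (* Something elsewhere must modify the file; then run the loop until
     there are no active handles left. *)
  ignore (Luv.Loop.run () : bool)
```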

@glennsl
Contributor Author

glennsl commented Apr 1, 2020

Actually, that seems to be an issue with the timeout on all platforms. Perhaps I just don't understand how to use it.

The test is pretty slow at picking up the change event on macOS though. It takes about a second or so.

@glennsl glennsl changed the title from [Experiment] add change test to FS_event to add change test to FS_event on Apr 1, 2020
@aantron
Owner

aantron commented Apr 1, 2020

I will take a look locally tomorrow morning.

As for CI, the main CI in this repo is Travis. It is running here. It hasn't been reporting status for several of my repos lately, ever since some maintenance Travis did about a week ago.

@aantron
Owner

aantron commented Apr 1, 2020

One notable thing about the Travis CI is that it should be much faster than the GH Actions setup in this repo, because of caching. So, you can use it for faster iteration.

@aantron
Owner

aantron commented Apr 2, 2020

The test...

Actually, that seems to be an issue with the timeout on all platforms.

What does this refer to? At least on WSL, the test appears to have finished immediately.

I wonder if the issue in the test on macOS is a race condition. What if you insert, say, a 100ms delay before this?

You can also sample the current time after the delay, and sample it again in the callback, and check that the delta is less than some small value.
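One way to combine the two suggestions above, assuming the Luv Timer API roughly as documented; change_file is a placeholder for whatever modifies the watched file, and the 0.1s tolerance is illustrative:

```ocaml
(* Sketch: delay ~100ms with a Luv timer before triggering the change,
   sample the clock, then check in the FS_event callback that the event
   arrived promptly. Luv API approximated from its documentation. *)
let () =
  let started = ref 0. in
  let event = Luv.FS_event.init () |> Result.get_ok in
  Luv.FS_event.start event "watched-file" (fun _result ->
    (* Sample the clock again when the event fires... *)
    let delta = Unix.gettimeofday () -. !started in
    (* ...and require the delta to be small. *)
    assert (delta < 0.1));
  let timer = Luv.Timer.init () |> Result.get_ok in
  ignore (Luv.Timer.start timer 100 (fun () ->
    started := Unix.gettimeofday ();
    change_file ()));        (* placeholder: modify the watched file *)
  ignore (Luv.Loop.run () : bool)
```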

@glennsl
Contributor Author

glennsl commented Apr 2, 2020

What does this refer to? At least on WSL, the test appears to have finished immediately.

With run ~with_timeout:true (), on both Arch and macOS, the test will not finish until timing out. It will still succeed, since all conditions have been met, but it seems like run won't stop until all pending operations have completed.

I wonder if the issue in the test on macOS is a race condition. What if you insert, say, a 100ms delay before this?

It seems to make no difference on my Macbook. I'll try it out on the CI too, but want to do a run with just the timings first, to get a baseline.

You can also sample the current time after the delay, and sample it again in the callback, and check that the delta is less than some small value.

Good idea. I used float to compare against 0. with a tolerance of 0.1 so that the actual time will get printed if it fails. I'm not very familiar with Alcotest, so not sure if this is the most elegant way of doing it, but it seems to do the job.

On my Macbook it fails with a delay between 0.9 and 1.3 seconds, and around 1.2 being typical.
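The tolerance comparison described above is the same idea as Alcotest's float testable with an epsilon. Here is a dependency-free sketch of that idea; check_float is a hypothetical helper, not an Alcotest API:

```ocaml
(* A float check with tolerance: passes when |actual - expected| is
   within [tolerance], and reports the measured value on failure, so a
   failing run prints the actual delay. *)
let check_float ~tolerance ~expected actual =
  if Float.abs (actual -. expected) > tolerance then
    failwith
      (Printf.sprintf "expected %g +/- %g, got %g"
         expected tolerance actual)

let () =
  (* A delta of 0.05s is within the 0.1 tolerance. *)
  check_float ~tolerance:0.1 ~expected:0. 0.05;
  print_endline "ok";
  (* A delay of 1.2s, as observed on macOS, fails the check. *)
  match check_float ~tolerance:0.1 ~expected:0. 1.2 with
  | () -> print_endline "unexpected"
  | exception Failure _ -> print_endline "caught failure"
```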

@glennsl
Contributor Author

glennsl commented Apr 2, 2020

Well, we would've gotten a baseline if it didn't hang on macOS and if it had run tests on the GitHub Actions VMs, but alas... Let's try the 100ms delay then.

@aantron
Owner

aantron commented Apr 2, 2020

You can run the tests in GH Actions as well; just edit this line

- run: opam exec -- dune build -p luv

to be dune runtest instead of dune build.

I had them disabled because of some compatibility issue during the Windows port. I will later go back and see if they work now (if you don't already try it in this PR).

@aantron
Owner

aantron commented Apr 2, 2020

run won't stop until all pending operations have completed.

That's correct. The default behavior of run is to run until there are no active handles, such as FS_event handles, or until stop is called. Try closing the handle in the callback with Luv.Handle.close event. I wonder if that's a difference on macOS, though I wouldn't expect so, since it wasn't needed in other tests.
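Closing the handle in the callback, sketched below under the assumption that Handle.close takes the handle and a close callback (in some Luv versions the callback may be optional, hence the hedging); ignore simply discards the close notification:

```ocaml
(* Sketch: close the watcher inside its own callback so that
   [Luv.Loop.run] has no active handles left and returns without a
   timeout. Luv API approximated from its documentation. *)
let () =
  let event = Luv.FS_event.init () |> Result.get_ok in
  Luv.FS_event.start event "watched-file" begin fun _result ->
    (* Once the event has been observed, close the handle. *)
    Luv.Handle.close event ignore
  end;
  ignore (Luv.Loop.run () : bool)
```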

It's also possible some other test is interfering with this test's run. Try commenting out all the other test suites in test/tester.ml and all other tests in test/FS_event.ml.

@aantron
Owner

aantron commented Apr 2, 2020

If you need to investigate what's happening at the C level, you can directly edit the code of libuv in your recursively cloned submodule. Of course, this is only good for local testing. I've previously inserted fprintfs into it to see what was happening.

The Luv build system should pick up the change, so you can just run the tests again. If not, make clean will force a full build. It will take longer each time, though, because the libuv build is opaque to Luv and so is not incremental.

@aantron
Owner

aantron commented Apr 2, 2020

Ah yeah, the install line above should also have --with-test added.

@glennsl
Contributor Author

glennsl commented Apr 2, 2020

Adding a delay with Luv.Timer seems to have done the trick! And the fact that Unix.sleep didn't seems to suggest that the race condition is in libuv, not in macOS.

I've tried the same with the test that was failing for us, and the result of running it locally was promising at least.

I'm unsure if it's worth pursuing this further, as the conditions that trigger it seem to be pretty artificial. It's definitely useful to know when writing tests, but the workaround is pretty straightforward. I should at least add a comment to this test explaining why it's needed, but it might be an idea to document it elsewhere as well.

The default behavior of run is to run until there are no active handles, such as FS_event handles, or until stop is called.

Oh, of course. For some reason I thought Luv_FS_event.stop was Luv.Loop.stop 🤷‍♂️ That does seem to make the usefulness of ~with_timeout a bit limited though, since if you use it you have to change the test to manually cancel the entire loop on success. Or just let it run out I guess, which perhaps isn't so bad if it's just used in a few places.
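For reference, the two stops being confused here do different things. A sketch, with signatures approximated from the Luv docs:

```ocaml
(* Luv.FS_event.stop stops one watcher (the loop keeps running while any
   other handle is active); Luv.Loop.stop halts the loop itself, making
   Luv.Loop.run return regardless of remaining handles. API approximated. *)
let () =
  let event = Luv.FS_event.init () |> Result.get_ok in
  Luv.FS_event.start event "watched-file" begin fun _result ->
    ignore (Luv.FS_event.stop event);    (* stop just this watcher... *)
    Luv.Loop.stop (Luv.Loop.default ())  (* ...or cancel the whole loop *)
  end;
  ignore (Luv.Loop.run () : bool)
```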

@aantron
Owner

aantron commented Apr 2, 2020

Have you tried running the test case in isolation? It's possible that the race condition is in the test suite, and is due to some kind of interaction between lingering handles from different tests.

with_timeout is a holdover from an earlier approach I used for testing, a long time ago. If you don't need it after this PR, I should delete it.

@glennsl
Contributor Author

glennsl commented Apr 2, 2020

I'm not sure how to run a single test in isolation. Other than commenting out the other tests in fs_event I guess. It doesn't really matter now though, as I don't actually need it anymore, so feel free to remove it!

@aantron aantron merged commit a67ac24 into aantron:master Apr 2, 2020
@aantron
Owner

aantron commented Apr 2, 2020

Thanks!
