test_io fails in our CI (poll revents=POLLHUP on stdin)

I'm investigating a weird issue, the `test_io` tests started failing deterministicaly within our CI environment. A previously-working combination of libyang+libnetconf2 started failing. The only change which failed is an update of the underlying OS (Fedora 29).

```
 9/11 Test  #1: test_io ..........................***Failed    0.06 sec
[==========] Running 4 test(s).
[ RUN      ] test_write_rpc_10
[ERR]: Session 0: communication socket unexpectedly closed.
[  ERROR   ] --- 0 != 0x5
[   LINE   ] --- /home/ci/src/cesnet-gerrit-czechlight/CzechLight/dependencies/libnetconf2/tests/test_io.c:107: error: Failure!
[  FAILED  ] test_write_rpc_10
[ RUN      ] test_write_rpc_10_bad
[ERR]: Session 0: invalid session to send RPCs.
[       OK ] test_write_rpc_10_bad
[ RUN      ] test_write_rpc_11
[ERR]: Session 0: communication socket unexpectedly closed.
[  ERROR   ] --- 0 != 0x5
[   LINE   ] --- /home/ci/src/cesnet-gerrit-czechlight/CzechLight/dependencies/libnetconf2/tests/test_io.c:107: error: Failure!
[  FAILED  ] test_write_rpc_11
[ RUN      ] test_write_rpc_11_bad
[ERR]: Session 0: invalid session to send RPCs.
[       OK ] test_write_rpc_11_bad
[==========] 4 test(s) run.
[  PASSED  ] 2 test(s).
[  FAILED  ] 2 test(s), listed below:
[  FAILED  ] test_write_rpc_10
[  FAILED  ] test_write_rpc_11

 2 FAILED TEST(S)
```

That didn't make much sense to me, so `strace` to the rescue (see the [full output](https://ci-logs.gerrit.cesnet.cz/t/public/logs//11/1711/10/check/f29-gcc/5889969/job-output.txt.gz), available over IPv6 only). Here's the relevant part:
```
poll([{fd=0, events=POLLIN}], 1, 0)     = 1 ([{fd=0, revents=POLLHUP}])
write(2, "[ERR]: Session 0: communication "..., 60[ERR]: Session 0: communication socket unexpectedly closed.
) = 60
```
I.e, `poll()` returns with `POLLHUP` for `stdin`.

I tried to reproduce this locally (**outside** of the CI system, running the same code within a Fedora 29 VM, but over SSH). An easy way of getting the same behavior as our failure is this:
```shell-session
$ true | ./tests/test_io
[==========] Running 4 test(s).
[ RUN      ] test_write_rpc_10
[ERR]: Session 0: communication socket unexpectedly closed.
[  ERROR   ] --- 0 != 0x5
[   LINE   ] --- /home/ci/dependencies/libnetconf2/tests/test_io.c:107: error: Failure!
[  FAILED  ] test_write_rpc_10
[ RUN      ] test_write_rpc_10_bad
[ERR]: Session 0: invalid session to send RPCs.
```
And as shown under `strace`:
```
$ true | strace -e trace=poll ./tests/test_io
[==========] Running 4 test(s).
[ RUN      ] test_write_rpc_10
poll([{fd=0, events=POLLIN}], 1, 0)     = 1 ([{fd=0, revents=POLLHUP}])
[ERR]: Session 0: communication socket unexpectedly closed.
[  ERROR   ] --- 0 != 0x5
[   LINE   ] --- /home/ci/dependencies/libnetconf2/tests/test_io.c:107: error: Failure!
[  FAILED  ] test_write_rpc_10
```
Now, there has been no change to the way how our CI launches its tests, which is something that puzzles me. It's done via [Zuul](https://zuul-ci.org/docs/zuul/), and the actual system that executes stuff on the build node is Ansible. I have no idea how it deals with `stdin` of the launched processes. I also cannot answer why these tests passed with the same version libnetconf2. The only change that I am aware of is a reinstallation of the build VM (+container), but it is once again done automatically and driven by the same command...

- Would it make sense to operate `test_io` on a pipe created for that purpose? That would make the test more robust to random changes in its environment.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test_io fails in our CI (poll revents=POLLHUP on stdin) #153

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

test_io fails in our CI (poll revents=POLLHUP on stdin) #153

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions