unit tests sometimes have sporadic failures #7

brarcher · 2015-12-23T06:30:01Z

It has been observed, at least on OSX, that some of the unit tests fail sporadically. The following are some example failures, as printed by tests/run:

-n o tests/ts1.sh: 
sort: string comparison failed: Illegal byte sequence
sort: Set LC_ALL='C' to work around the problem.
sort: The strings compared were `\'v\217\347' and `\376"i\3317.\345\250V\333w>\346\311\203\034\316=\337~\233n\320\325\005\371\320Sp\301|\247"\036\024\221\247\016\213\222;\256=<c&\3224'.

-n  o tests/tr2.sh: 
sort: string comparison failed: Illegal byte sequence
sort: Set LC_ALL='C' to work around the problem.
sort: The strings compared were `6\030v\317\3133о\366|\263htv9\0173\340\2421\275F\a\232\360DJ\017\233\037:)\241\023\375\350.\272\r<6\201\002\330\203+\005\221#\355$\343\321F\357T\036\264g[>]\344\200Ę\265s\236\031E\302-\220ǰܺob<\210\004\32415\246\300{ˏ\030\270xrژ\335/\243_ވ8\255y\a\177\362\234!\251N\336\322\371\325p\024\f\241\353&#6\371\204\313\020V\031\311\210V\302\004\\\237\374\316\215!i\357s\231,P\373+\346\303\310tX\300\355\177\247R\347u:3bA-\2148\03114\361\271k\241\376\247/\033\271S\\|,\a#\200w\237\374\002\232!\024\316\346\371C\017\370\354˕\343\241\301\244\025\2763\000iÜP\340\021.\001\301\246\304\363\233\266\022!\030\232L\024\204\311K\030\340\3249.\310\354\a_\t\374{j.$0\021q\267\252<\021\023\260\301Z\235m\005\330H\342~\016\t\242\310\303Oڏ\210S\311\177\275\240\345AwQ g\334\370\302\336\021\207\r}`;8\326Ҵ\270.\363q6\325J,\234(\253QƼ\226V\310\301$W\231A\273\033\000\251\274ѥ\321\322\027\320\000뚦{@\277~-\205\343݅E\200\341\032\203\240\027\3338\366z\351CM6\177C\201\312(N\273\346\201d\200\032\177\371*\177\sort: string comparison failed: Illegal byte sequence
sort: Set LC_ALL='C' to work around the problem.
sort: The strings compared were `\rX\266\204(a) (b)' and `\rX\266\204'.

When some of the unit tests fail, they emit no output that would help diagnose the failure. Here is an example of invoking tests/ab.sh directly:

```
$ rc=0
$ attempt=1
$ while [ $rc -eq 0 ]; do tests/ab.sh bin/radamsa ; rc=$?; echo $attempt; attempt=$((attempt+1)); done
1
2
3
4
5
6
7
8
9
10
11
12
$
```

A failure occurred on the 12th attempt, but no reason for the failure was printed.

Presumably the unit test results are expected to be consistent. If the current revision in git is still under development and the sporadic failures are expected, or some cleanup is still underway, please disregard this. I was unable to determine whether the unit tests of release v0.4 also fail sporadically, because issue #5 affects the v0.4 release.

For comparison, the unit tests of release v0.3 passed consistently.

(As a side note, at least on OSX the built-in echo command of sh does not support the -n option, which is why "-n" is printed before every test in the tests/run output above. If relevant, consider reworking how echo is used in that script so that the -n option is unnecessary.)
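
A portable replacement is `printf`, which never appends a newline unless the format asks for one and treats a leading `-n` in its arguments as ordinary text. A minimal sketch; the helper name `print_no_newline` is hypothetical and not part of tests/run:

```shell
#!/bin/sh
# Portable stand-in for `echo -n`: POSIX leaves echo's -n option unspecified,
# and some /bin/sh builds (e.g. on OSX) print "-n" literally.
# printf writes its arguments exactly as given, with no trailing newline.
print_no_newline() {
    printf '%s' "$*"
}

# Hypothetical usage in a test runner: print the label first, then the
# result on the same line once the test has finished.
print_no_newline "tests/ts1.sh: "
echo "ok"
```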


aoh · 2015-12-23T08:51:53Z

Hi,

Great! Build issue reports are very welcome. I'll fix this soonish when I have spare time. My *BSD buildbots are currently offline, so I may not have noticed some issues on BSD-like platforms.

There have been many issues with the unit tests on OSX due to minor differences in command-line arguments etc. It might make sense to do the sorting and echoing in the tests with the same owl that is used for building.

aoh · 2015-12-23T18:22:54Z

So the behavior is intended: if the input is very short, radamsa is supposed to pad it with random data, with low probability. This is done to improve coverage of tests with tiny fixed inputs. The only issue is with OSX sort getting confused by non-textual data. This does not matter, because the tests are probabilistic and are expected to fail on occasion, in which case they are tried again many times before tests/run considers them to fail. Sort's stderr is now piped to /dev/null, so it doesn't get in the way.
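
The sort error messages in the report also name the underlying fix: in the C locale, sort compares plain bytes instead of decoding characters, so it does not reject input as an illegal byte sequence. A minimal sketch combining that with the stderr redirection described here; `sort_bytes` is a hypothetical helper and not necessarily what commit b2002d4 does:

```shell
#!/bin/sh
# Sort lines that may contain bytes that are invalid in the current locale,
# such as the random padding radamsa may add to very short inputs.
# LC_ALL=C makes sort compare plain bytes, which avoids the
# "Illegal byte sequence" error; 2>/dev/null hides any other diagnostics.
sort_bytes() {
    LC_ALL=C sort "$@" 2>/dev/null
}

# Example: two lines that start with ASCII letters and end in high bytes
# (\376, \377) that are not valid UTF-8. od -c displays the sorted bytes safely.
printf 'b\377\na\376\n' | sort_bytes | od -c
```

Setting the locale keeps sort's exit status meaningful, whereas discarding stderr alone only hides the message.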

Does the build work otherwise on OSX?

brarcher · 2015-12-23T19:22:01Z

> The only issue is with OSX sort getting confused by non-textual data.

Oh, I did not realize that the output was not representative of a failure. Sorry for the false alarm on those tests.

I've modified ab.sh as follows to determine why it is sometimes failing:

```shell
# check bad string insertion happens as intended (more likely within quoted area)
mkdir -p tmp

echo '-----------------------------------------------------------------""---------------------------------------------------------------------------' \
   | "$@" -m ab -p od -n 20 > tmp/ab.sh.tmp
# at least one output line must contain a % between the two quotes
grep -q '^-*".*%.*"-*$' tmp/ab.sh.tmp
rc=$?
if [ $rc -ne 0 ]; then
   echo "Unexpected output:"
   cat tmp/ab.sh.tmp
   exit 1
fi
```

Attached is the output from one such test run that failed:

ab.sh.tmp

> This does not matter, because the tests are probabilistic and are expected to fail on occasion, in which case they are tried again many times before tests/run considers them to fail.
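
The retry policy described in this quote can be sketched as a small wrapper: the test is re-run until it passes, and it only counts as failed once every attempt has failed. A minimal sketch; `run_with_retries` and the attempt limit are hypothetical and not taken from tests/run:

```shell
#!/bin/sh
# Re-run a probabilistic test until it passes or the attempt limit is reached.
# Usage: run_with_retries MAX_ATTEMPTS COMMAND [ARGS...]
run_with_retries() {
    max=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        if "$@"; then
            return 0              # one passing run is enough
        fi
        attempt=$((attempt + 1))
    done
    echo "failed $max times: $*" >&2
    return 1                      # every attempt failed
}

# Hypothetical usage with an arbitrary limit of 10 attempts:
# run_with_retries 10 tests/ab.sh bin/radamsa
```

Retrying trades a little run time for far fewer spurious failures: if a single run fails independently with probability p, all k attempts fail with probability p^k.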

The unit tests run automatically when building Radamsa. Since they may fail sporadically, could they be moved into their own make target (e.g. "make check") so that they do not run automatically? Otherwise, one may build Radamsa, see a failure, and wonder whether it indicates a problem with Radamsa on one's platform.

Sure, I understand that Radamsa, being probabilistic, may not always fuzz as expected. However, could the unit tests that fail with some probability be modified to be more deterministic?
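
One way to make such a test deterministic is to fix the random seed, so that a given radamsa build produces the same output on every run, and to print the seed and the output on failure, which also addresses the missing diagnostics mentioned earlier. A minimal sketch of the ab.sh check along those lines, assuming the radamsa under test supports the `--seed` option; `check_ab` is a hypothetical name:

```shell
#!/bin/sh
# Deterministic variant of the ab.sh check (a sketch, not the project's test).
# Usage: check_ab SEED FUZZER [ARGS...], e.g. check_ab 42 bin/radamsa
# With a fixed --seed, a given radamsa build should mutate the input the same
# way on every run, so the check either always passes or always fails.
check_ab() {
    seed=$1
    shift
    tmp=$(mktemp) || return 1
    printf '%s\n' '-----------------------------------------------------------------""---------------------------------------------------------------------------' |
        "$@" --seed "$seed" -m ab -p od -n 20 > "$tmp"
    status=$?
    # Same check as ab.sh: some line must have a % between the two quotes.
    if [ "$status" -ne 0 ] || ! grep -q '^-*".*%.*"-*$' "$tmp"; then
        # Report the seed so the exact failing run can be replayed.
        echo "ab check failed (seed $seed, fuzzer exit status $status); output:" >&2
        cat "$tmp" >&2
        rm -f "$tmp"
        return 1
    fi
    rm -f "$tmp"
    return 0
}
```

The trade-off is that a fixed seed exercises only one sequence of mutations. A runner could instead pick a fresh seed each time and print it, so that any failure can be replayed with `check_ab SEED bin/radamsa`.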

> Does the build work otherwise on OSX?

Seems to work so far. I've not tried the TCP stuff yet, which looks interesting.

aoh · 2015-12-24T07:35:00Z

Makes sense. There is now a separate test build target, which runs the tests if necessary. I also added a thanks section to readme.md.

aoh added a commit that referenced this issue Dec 23, 2015

ignoring sort errors in tests (related to issue #7)

b2002d4

aoh closed this as completed Dec 24, 2015
