Disable tests_rng on KVM #1275

Closed
bmwiedemann opened this issue Dec 20, 2019 · 15 comments

@bmwiedemann bmwiedemann commented Dec 20, 2019

While working on reproducible builds for openSUSE, I found that
building our python-autobahn package in a 1-core VM with

osc checkout openSUSE:Factory/python-autobahn && cd $_
osc build --noservice -j1 --vm-type=kvm

gets stuck forever in test_rng.py

[   73s] autobahn/rawsocket/test/test_rawsocket_url.py::TestParseWsUrl::test_parse_url16 PASSED [  8%]
[   73s] autobahn/rawsocket/test/test_rawsocket_url.py::TestParseWsUrl::test_parse_url17 PASSED [  8%]
[   73s] autobahn/rawsocket/test/test_rawsocket_url.py::TestParseWsUrl::test_parse_url18 PASSED [  8%]
[  238s] autobahn/test/test_rng.py::TestEntropy::test_depleting
[  238s]

Member

@om26er om26er commented Dec 20, 2019

I wonder if it's possible to reproduce that on an Ubuntu-based system. Shall I try an LXD container that is allowed only a single core?

Author

@bmwiedemann bmwiedemann commented Dec 20, 2019

Another option that sometimes reproduces such issues is running taskset 1 COMMAND to force the kernel to schedule on a single core only. I didn't try it.
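For readers who would rather pin the process from inside Python than wrap it in taskset, here is a minimal sketch. It assumes Linux (os.sched_setaffinity is not available on every platform), and the helper name pin_to_single_cpu is made up for illustration. It picks the lowest CPU the process is already allowed to use, which matches taskset's mask 1 (CPU 0) only when CPU 0 is in the allowed set.

```python
import os


def pin_to_single_cpu():
    """Restrict the current process to one CPU (hypothetical helper, Linux only)."""
    allowed = os.sched_getaffinity(0)   # CPUs this process may currently run on
    cpu = min(allowed)                  # lowest allowed CPU; CPU 0 on an unrestricted host
    os.sched_setaffinity(0, {cpu})      # pid 0 means "the calling process"
    return cpu


if __name__ == "__main__":
    print("now restricted to CPU", pin_to_single_cpu())
```

Child processes inherit the affinity, so a test runner started afterwards stays on the same core.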

Author

@bmwiedemann bmwiedemann commented Dec 20, 2019

The other option is to do apt install osc obs-build qemu-kvm on Ubuntu or Debian. It then requires a free openSUSE account.

Member

@om26er om26er commented Dec 20, 2019

I just took a peek at the test code and it seems to be performance-based, so my guess is that it is not really getting stuck 'forever'; rather, it is slow to run and depends on the available entropy:

def test_depleting():
    res = {}
    with open('/dev/random', 'rb') as rng:
        for i in range(10000):
            # direct procfs access to "real" RNG
            d = rng.read(1000)  # noqa
            # check available entropy
            with open('/proc/sys/kernel/random/entropy_avail', 'r') as ent:
                ea = int(ent.read()) // 100
                if ea not in res:
                    res[ea] = 0
                res[ea] += 1
    skeys = sorted(res.keys())
    print('\nsystem entropy depletion stats:')
    for k in skeys:
        print('{}: {}'.format(k, res[k]))

Member

@om26er om26er commented Dec 20, 2019

OK, I even came up with a simple test case:

def simple_case():
    with open('/dev/random', 'rb') as rng:
        print("trying to read...")
        d = rng.read(1000)
        print("read...")

I believe the test is bogus and should be disabled.
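To check whether a read from /dev/random would block without actually hanging the process, the device can be opened in non-blocking mode. This is only a sketch: the helper name try_read is hypothetical, and whether /dev/random blocks at all depends on the kernel version and the state of the entropy pool.

```python
import os


def try_read(path, n):
    """Return up to n bytes from path, or None if the read would block."""
    fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
    try:
        return os.read(fd, n)           # may return fewer than n bytes
    except BlockingIOError:
        return None                     # nothing available right now
    finally:
        os.close(fd)


if __name__ == "__main__":
    data = try_read("/dev/random", 1000)
    if data is None:
        print("read would block")
    else:
        print("got {} bytes without blocking".format(len(data)))
```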

Author

@bmwiedemann bmwiedemann commented Dec 20, 2019

"Forever" was several hours in some cases. Maybe that is because nothing runs in the build VM besides the build job, so the kernel has no sources of entropy to replenish the entropy pool.

urandom might avoid such trouble, but it probably does not reflect what the test is supposed to prove.

As the code is written atm, it tries to read 10MByte (10000 * 1000) of real entropy while the typical storage on Linux is only ~3000 bits.
Lowering the read to 100 * 10 had the same problem.
Lowering to 2 * 10 made the depleting test pass, but then test_non_depleting failed.
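The sizes mentioned above can be put side by side with a little arithmetic. The sketch below assumes that "N * M" means N reads of M bytes each, and it takes the ~3000-bit pool size from the comment as a rough figure rather than a measured value; the function name is made up.

```python
POOL_BITS = 3000   # approximate pool size quoted in the comment above


def bits_requested(reads, chunk_bytes):
    """Total number of bits a loop of `reads` reads of `chunk_bytes` asks for."""
    return reads * chunk_bytes * 8


if __name__ == "__main__":
    for reads, chunk in [(10000, 1000), (100, 10), (2, 10)]:
        bits = bits_requested(reads, chunk)
        relation = "more" if bits > POOL_BITS else "less"
        print("{} * {}: {} bits, {} than the pool".format(reads, chunk, bits, relation))
```

Under these assumptions, both 10000 * 1000 and 100 * 10 ask for more than the pool holds, while 2 * 10 asks for far less, which is consistent with the behaviour reported above.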

Member

@meejah meejah commented Dec 21, 2019

I don't understand what that's even supposed to be testing? (It doesn't assert anything, in any case).

For now, it could be disabled via a "skip" annotation or renaming it (in case, e.g., someone wants to actually run it some time). @oberstet can you comment on the purpose of this test?

"Depleting entropy" isn't really a real thing anyway, AFAIK (according to the cryptographers I respect, anyway), except for the (still interesting!) edge case of "there's actually zero randomness on startup because we're a fresh VM and have zero hardware interrupts happening".
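The "skip" annotation mentioned above could look roughly like this with the standard unittest module. The class and test names follow the log output at the top of the thread; the test body is a placeholder, not the real test, and run_suite is a hypothetical helper for demonstration.

```python
import unittest


class TestEntropy(unittest.TestCase):

    @unittest.skip("reads /dev/random in a tight loop; can hang on idle VMs")
    def test_depleting(self):
        # Placeholder body: the decorator prevents this from ever running.
        self.fail("should not run while skipped")


def run_suite():
    """Run the test case and return the collected result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestEntropy)
    result = unittest.TestResult()
    suite.run(result)
    return result


if __name__ == "__main__":
    print("skipped:", len(run_suite().skipped))
```

Removing the decorator re-enables the test, so the code stays in the tree for anyone who wants to run it later.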

Member

@oberstet oberstet commented Dec 21, 2019

The tests assert on the number of times the Linux kernel ran into an "entropy depleted" situation due to how exactly the user code accesses entropy:

self.assertTrue(skeys[0] == 0)

or

self.assertTrue(skeys[0] > 10)

"Depleting entropy": we want real entropy .. and to be sure that we use the right code, we test that doing it wrong does not deplete entropy - and hence the entropy is not the one we want. ~ "proof by contradiction."

if the machine cannot give us real entropy at some point in time, we want to know that, and wait ourselves until there is real entropy again.

background: crossbar(fx) wants to use real entropy at certain places - not entropy which is pseudorandomly derived from real entropy (as when reading random bytes via c-lib or via a language run-time). to ensure that we do use the right code in crossbarfx, we first test how we do that from python .. and the helper and test for that have been placed in autobahn.

Member

@oberstet oberstet commented Dec 21, 2019

I believe the test is bogus and should be disabled.

no, the test is not bogus;) the itches are likely related to specifics of the virtualization in use and/or the way or throughput with which entropy sources (devices) are exposed within a "guest" ..

in a sense, one might call the test "bogus": it tests "depletion of entropy" events by forcing that via very quickly reading big chunks of randomness. if the machine (virtual or not) struggles to create real entropy and is honest about that, it has to block the force-reading process - and that will lead to excessive run-time of the test. we could also add a test guard tied to test walltime ..
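A guard tied to test walltime, as floated above, might be sketched like this. It is one possible approach, not what the project does: the helper name run_with_deadline is hypothetical, and the worker thread is merely abandoned when the deadline passes, not cancelled.

```python
import threading
import time


def run_with_deadline(fn, seconds):
    """Run fn() in a daemon thread and wait at most `seconds` for it.

    Returns (finished, result). The result is None if fn did not finish
    in time or raised an exception.
    """
    box = {}

    def worker():
        box["result"] = fn()

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(seconds)                # returns early if the worker is done
    return (not thread.is_alive()), box.get("result")


if __name__ == "__main__":
    finished, _ = run_with_deadline(lambda: time.sleep(2), 0.1)
    print("finished in time:", finished)
```

A test could call self.skipTest(...) when finished is False, so a machine that is honest about lacking entropy yields a skipped test instead of a multi-hour hang.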

Member

@oberstet oberstet commented Dec 21, 2019

@bmwiedemann not sure if that helps or is what you'd welcome: we could add a guard in the tests and disable this one when running under KVM as long as we keep it running in our Travis CI here

I'd want to keep the test for us, eg so that we get a heads-up should the python cpy/pypy run-time / c-lib behavior change
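One way such a KVM guard could detect the hypervisor is to ask systemd-detect-virt, which prints the virtualization type (for example "kvm") on systemd-based systems. This is an assumption about the environment, not something the thread prescribes; both helper names are made up, and the sketch falls back to "not KVM" when the tool is missing.

```python
import subprocess


def detect_virt():
    """Return the name printed by systemd-detect-virt, or None if unavailable."""
    try:
        proc = subprocess.run(
            ["systemd-detect-virt"],
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            universal_newlines=True,
            check=False,                # exits non-zero when nothing is detected
        )
    except OSError:
        return None                     # tool not installed or not executable
    return proc.stdout.strip() or None


def running_under_kvm(virt=None):
    """True if the (given or detected) virtualization type is KVM."""
    if virt is None:
        virt = detect_virt()
    return virt == "kvm"


if __name__ == "__main__":
    print("KVM guest:", running_under_kvm())
```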

Author

@bmwiedemann bmwiedemann commented Dec 21, 2019

That could work.
Or this: instead of always getting 10000000 bytes, couldn't you change the loop to finish as soon as the entropy was depleted once? Still needs testing if that does not block forever.

And it would probably be a good idea to run the non-depleting test before the depleting one, to still have something left in the pool.
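The "finish as soon as the entropy was depleted once" idea can be sketched with the reading and measuring steps passed in as callables, which also makes the loop testable without touching /dev/random. The names and the low-water mark of 100 (chosen to match the `// 100` bucketing in the quoted test) are assumptions, and as noted above, a single read may still block.

```python
def deplete_once(read_chunk, entropy_avail, max_reads=10000, low_water=100):
    """Call read_chunk() until entropy_avail() drops below low_water.

    Returns the number of reads performed, or None if the pool never
    dropped below the mark within max_reads reads.
    """
    for count in range(1, max_reads + 1):
        read_chunk()
        if entropy_avail() < low_water:
            return count                # depleted once: stop early
    return None


def read_entropy_avail(path="/proc/sys/kernel/random/entropy_avail"):
    """Current kernel entropy estimate in bits (Linux only)."""
    with open(path) as f:
        return int(f.read())


if __name__ == "__main__":
    with open("/dev/random", "rb") as rng:
        print(deplete_once(lambda: rng.read(1000), read_entropy_avail, max_reads=100))
```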


@yan12125 yan12125 commented Jan 4, 2020

I got a similar issue on an 8-core physical server. test_depleting ran for more than 4 minutes (before I used Ctrl+C to interrupt it). Fortunately I found a solution(?): after installing and starting haveged, test_depleting finishes in less than 4 seconds.

As stated on the linked Arch wiki page, haveged might not be appropriate for virtual machines, though.

felixonmars-bot pushed a commit to felixonmars/archlinux-community that referenced this issue Jan 4, 2020
Ref: crossbario/autobahn-python#1278
Ref: crossbario/autobahn-python#1275



@oberstet oberstet changed the title from "tests_rng gets stuck in 1-core VM" to "Disable tests_rng on KVM" Jan 7, 2020

Member

@oberstet oberstet commented Jan 7, 2020

Regarding haveged: hmm .. it also says

Warning: The quality of the generated entropy is not guaranteed and sometimes contested (see LCE: Do not play dice with random numbers and Is it appropriate to use haveged as a source of entropy on virtual machines?). Use it at your own risk or use it with a hardware based random number generator with the rng-tools (see #Alternative section)

I wouldn't use it in production. but we are talking about a CI test anyways .. probably we should do 2 things:

  • disable on KVM
  • disable when AUTOBAHN_CI_SKIP_RNG_DEPLETION_TEST is set, so that users can at least manually disable it easily

@yan12125 yan12125 commented Jan 7, 2020

Could there be an environment variable that disables both depleting and non-depleting tests? If I understand #1275 (comment) correctly, the non-depleting test works correctly only if the depleting test is run first.

Use it at your own risk or use it with a hardware based random number generator with the rng-tools

Yep I switched to rng-tools later. It also makes the depleting test finish in a few seconds on my physical server. Not sure if it works on virtual machines or not, though, as it uses hardware random number generators.

Member

@oberstet oberstet commented Jan 14, 2020

fixed via #1292

rng depletion tests are now skipped unless an env var AUTOBAHN_CI_ENABLE_RNG_DEPLETION_TESTS is set (to an arbitrary value).

in our CI, we do set this env var in tox: https://github.com/crossbario/autobahn-python/blob/master/tox.ini#L80 - that is, the RNG tests are still run on Travis and locally
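An opt-in gate of the kind described above could be written as follows. This is a sketch of the idea only, not the code from #1292: the variable name comes from the comment, while the placeholder test body and the run_suite helper are hypothetical.

```python
import os
import unittest

ENV_VAR = "AUTOBAHN_CI_ENABLE_RNG_DEPLETION_TESTS"


class TestEntropy(unittest.TestCase):

    def setUp(self):
        # Checked at run time; any value (even an empty one) enables the tests.
        if ENV_VAR not in os.environ:
            self.skipTest("set {} to run RNG depletion tests".format(ENV_VAR))

    def test_depleting(self):
        # Placeholder body; the real test reads /dev/random in a loop.
        self.assertTrue(True)


def run_suite():
    """Run the test case and return the collected result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestEntropy)
    result = unittest.TestResult()
    suite.run(result)
    return result


if __name__ == "__main__":
    print("skipped:", len(run_suite().skipped))
```

Doing the check in setUp instead of a class-level decorator means the decision is made when the test runs, so exporting the variable in tox or on the command line takes effect without editing the test module.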

@oberstet oberstet closed this Jan 14, 2020