Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python ClientWrapperTest fails occasionally #1714

Open
danielskeenan opened this issue Feb 18, 2021 · 9 comments · Fixed by #1716
Open

Python ClientWrapperTest fails occasionally #1714

danielskeenan opened this issue Feb 18, 2021 · 9 comments · Fixed by #1716

Comments

@danielskeenan
Copy link
Contributor

This has apparently been a known issue for a while, but I've narrowed down the circumstances it can happen in while working on OpenLightingProject/website#14. Warning - it's a bit involved.

When protobuf is installed from pip (i.e. pip3 install protobuf), ClientWrapperTest fails. When protobuf is installed from the system package manager (i.e. apt-get install python3-protobuf), the test passes. Checked with the latest commit in master (a95a6f7 currently) with Ubuntu 20.10 and Ubuntu 20.04. I haven't properly checked Fedora yet (only a cursory test while writing docs), but I suspect it will have the same issue for reasons detailed below.

Some speculation:

My first thought was an issue between Python2 packages and Python3 packages. These test systems only have Python3, so that's not it.

Further research reveals that GNOME distros (or, at least, Ubuntu and Fedora) come with gnome-online-accounts, which itself depends on python3-protobuf. This means that GNOME users will likely have protobuf for python already installed on their systems. If they install deps from pip (e.g. with pip3 install protobuf numpy), pip won't install another protobuf as it sees the system version already. This also explains why the issue occurs on headless systems (i.e. Travis), as those systems won't have the preinstalled python protobuf and will install it from pip if you ask.

But what's the difference between the two sources? My instinct is that something changed in newer Protobuf releases (I know this is a regular problem here). The latest release on pip is 3.14.0. Ubuntu 20.10 will get you 3.12.3. Ubuntu 20.04 provides 3.6.1. Fedora 33 has 3.12.4. Fedora 32 has 3.11.2.

To confirm this theory, I installed protobuf 3.12.4 from pip and ran tests, which all passed. protobuf 3.13.0 also passes. This leads me to conclude something about protobuf 3.14 is causing problems. Caveat - I havn't tested any of this with Python2.

Test log:

==================================
   OLA 0.10.8: ./test-suite.log
==================================

# TOTAL: 95
# PASS:  94
# SKIP:  0
# XFAIL: 0
# FAIL:  1
# XPASS: 0
# ERROR: 0

.. contents:: :depth: 2

FAIL: python/ola/ClientWrapperTest.sh
=====================================

...F/usr/lib/python3.8/unittest/case.py:704: ResourceWarning: unclosed <socket.socket fd=4, family=AddressFamily.AF_UNIX, type=SocketKind.SOCK_STREAM, proto=0>
  outcome.errors.clear()
ResourceWarning: Enable tracemalloc to get the object allocation traceback
.
======================================================================
FAIL: testEventLoop (__main__.ClientWrapperTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./python/ola/ClientWrapperTest.py", line 195, in testEventLoop
    self.assertTrue(results.d_called - self.start >=
AssertionError: False is not true

----------------------------------------------------------------------
Ran 5 tests in 0.039s

FAILED (failures=1)
FAIL python/ola/ClientWrapperTest.sh (exit status: 1)
@kripton
Copy link
Member

kripton commented Feb 18, 2021

Well investigated. However, it might be related to something else. On my machine (Gentoo Linux, protobuf-3.14.0 installed via system package manager), the test passes:

PASS: python/ola/ClientWrapperTest.sh

Protobuf (and its Python bindings) are already "stable" on Gentoo Linux (at least for amd64): https://packages.gentoo.org/packages/dev-libs/protobuf and https://packages.gentoo.org/packages/dev-python/protobuf-python. I do have pip installed but I actually only use it if I have to. Python versions installed are:

2.7.18_p7(2.7)(09:40:44 02/17/21)(bluetooth examples gdbm ipv6 ncurses readline sqlite ssl threads tk wide-unicode xml -berkdb -build -hardened -libressl -verify-sig -wininst ELIBC="-uclibc")
3.8.8_rc1(3.8)^t(14:23:18 02/18/21)(bluetooth examples gdbm ipv6 ncurses readline sqlite ssl tk xml -build -hardened -libressl -test -verify-sig -wininst)
3.9.2_rc1(3.9)^t(14:26:52 02/18/21)(bluetooth examples gdbm ipv6 ncurses readline sqlite ssl tk xml -build -hardened -libressl -test -verify-sig -wininst)

But basically no package (including protobuf) install bindings for Python 2 anymore.

Configuration of ola-0.10.8 goes like that:

./configure --prefix=/usr --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --mandir=/usr/share/man --infodir=/usr/share/info --datadir=/usr/share --sysconfdir=/etc --localstatedir=/var/lib --disable-dependency-tracking --disable-silent-rules --docdir=/usr/share/doc/ola-0.10.8 --htmldir=/usr/share/doc/ola-0.10.8/html --with-sysroot=/ --libdir=/usr/lib64 --disable-fatal-warnings --disable-osc --enable-examples --enable-libftdi --enable-ftdidmx --enable-http --enable-python-libs --enable-libusb
...
checking whether /usr/bin/python3.8 version is >= 2.6... yes
checking for /usr/bin/python3.8 version... 3.8
checking for /usr/bin/python3.8 platform... linux
checking for /usr/bin/python3.8 script directory... ${prefix}/lib/python3.8/site-packages
checking for /usr/bin/python3.8 extension module directory... ${exec_prefix}/lib/python3.8/site-packages
checking for python3.8 module: google.protobuf... yes
checking for flake8... no
checking for libprotobuf... yes
checking for protoc... /usr/bin/protoc
checking protoc version... 3.14.0
...

Gentoo Linux is a bit exotic for people not used to it. However, if you want I can provide you a Dockerfile so you can reproduce and/or continue investigating with that if you'd like to.

@kripton
Copy link
Member

kripton commented Feb 18, 2021

Just tested also with the latest master (commit a95a6f7) and the tests are also all PASSing. However, my system runs fewer tests than yours:

# TOTAL: 94
# PASS:  94

@danielskeenan danielskeenan changed the title Python ClientWrapperTest fails with protobuf 3.14.0 Python ClientWrapperTest fails on Ubuntu with protobuf 3.14.0 Feb 18, 2021
@danielskeenan
Copy link
Contributor Author

That matches my experience on Fedora with protobuf 3.14, so it appears the issue is confined to Ubuntu. Going to start digging through changelogs.

@danielskeenan
Copy link
Contributor Author

I tweaked the failing test a bit to get better info:

diff --git a/python/ola/ClientWrapperTest.py b/python/ola/ClientWrapperTest.py
index cd1e5cf2d..a24041d15 100644
--- a/python/ola/ClientWrapperTest.py
+++ b/python/ola/ClientWrapperTest.py
@@ -186,13 +186,13 @@ class ClientWrapperTest(unittest.TestCase):
     self.assertIsNotNone(results.c_called)
     self.assertIsNotNone(results.d_called)
 
-    self.assertTrue(results.a_called - self.start <
+    self.assertLess(results.a_called - self.start,
                     datetime.timedelta(milliseconds=5))
-    self.assertTrue(results.b_called - self.start >=
+    self.assertGreaterEqual(results.b_called - self.start,
                     datetime.timedelta(milliseconds=5))
-    self.assertTrue(results.c_called - self.start >=
+    self.assertGreaterEqual(results.c_called - self.start,
                     datetime.timedelta(milliseconds=10))
-    self.assertTrue(results.d_called - self.start >=
+    self.assertGreaterEqual(results.d_called - self.start,
                     datetime.timedelta(milliseconds=15))
 
     sockets[0].close()

With the new test that says what the difference is, it fails with:

AssertionError: datetime.timedelta(microseconds=14612) not greater than or equal to datetime.timedelta(microseconds=15000)

However, I ran the same test a couple more times, without changing anything, and it passed. Are we just dealing with system scheduler variances?

@brucelowekamp
Copy link
Contributor

brucelowekamp commented Feb 18, 2021 via email

@danielskeenan danielskeenan changed the title Python ClientWrapperTest fails on Ubuntu with protobuf 3.14.0 Python ClientWrapperTest fails occasionally Feb 18, 2021
@brucelowekamp
Copy link
Contributor

brucelowekamp commented Feb 18, 2021 via email

@danielskeenan
Copy link
Contributor Author

I've got a VM rebuilding the project on a single core to see what happens (it was difficult to get it to fail locally on my 12-core i7). Will work on a PR that relaxes the timing slightly.

@peternewman
Copy link
Member

Oops, I thought I'd raised this elsewhere and tagged @brucelowekamp for some input but clearly not.

I hit the same thing on the Arm builds on Travis, but only confusingly at some point later the commits to get them merged worked fine, so I guess after some system change or similar, or package upgrade. I get (from https://travis-ci.org/github/OpenLightingProject/ola/jobs/755992758 which is a build of #1687):


FAIL: testEventLoop (__main__.ClientWrapperTest)

----------------------------------------------------------------------

Traceback (most recent call last):

  File "../../python/ola/ClientWrapperTest.py", line 193, in testEventLoop

    datetime.timedelta(milliseconds=5))

AssertionError: datetime.timedelta(0, 0, 4923) not greater than or equal to datetime.timedelta(0, 0, 5000)

peternewman added a commit that referenced this issue Feb 19, 2021
Relax timing on ClientWrapperTest
@danielskeenan
Copy link
Contributor Author

As @peternewman on #1716 (comment), tests are still failing occasionally (at least on ARM) even after #1716 was merged.

Given what DMX timings can be in popular products, it's probably safe to raise the allowed timing delta from 500 µs to 1000 µs (1 ms). I suppose the real question is what resolution we want to guarantee callbacks will run with? Trying to avoid changing tests when perhaps those tests are saying there's a problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants