nixosTests: re-enable networking tests #86486

Closed
wants to merge 2 commits into from

Conversation

@flokli
Contributor

flokli commented May 1, 2020

5150378 fixed the long-broken
nixosTests.networking.virtual.

With all test failures fixed, and #79328 making debugging much easier,
let's re-add it to the tested jobset.

Motivation for this change
Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS linux)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.
@andir
Member

andir commented May 2, 2020

@ofborg test networking.scripted.link networking.scripted.privacy networking.scripted.routes networking.scripted.virtual networking.networkd.bond networking.networkd.bridge networking.networkd.dhcpOneIf networking.networkd.dhcpSimple networking.networkd.link networking.networkd.loopback networking.networkd.macvlan networking.networkd.privacy networking.networkd.routes networking.networkd.sit networking.networkd.static networking.networkd.virtual networking.networkd.vlan

@andir
Member

andir commented May 2, 2020

Some of the tests seem to be flaky. See the aarch64 build results.

@flokli
Contributor Author

flokli commented May 4, 2020

Hm, I tried squinting at the logs to spot the error; the only parts I found were failures specific to the NixOS test driver:

  File "/nix/store/qv7jilbizwv4cz2rbh46hhkyzavwp4bv-nixos-test-driver/bin/.nixos-test-driver-wrapped", line 748 in process_serial_output
Current thread 0x0000fffff7ff5780 (most recent call first):
/nix/store/rsjp3g6hny6b7m3m6kv753lps0pvdqak-stdenv-linux/setup: line 1271:     6 Aborted                 (core dumped) LOGFILE=$out/log.xml tests='exec(os.environ["testScript"])' /nix/store/5gzzr77gg756pgxdk76wyhjshyfkvcl6-nixos-test-driver-vlan-Networking-Networkd/bin/nixos-test-driver
  File "/nix/store/qv7jilbizwv4cz2rbh46hhkyzavwp4bv-nixos-test-driver/bin/.nixos-test-driver-wrapped", line 748 in process_serial_output
Current thread 0x0000fffff7ff5780 (most recent call first):
/nix/store/rsjp3g6hny6b7m3m6kv753lps0pvdqak-stdenv-linux/setup: line 1271:     6 Aborted                 (core dumped) LOGFILE=$out/log.xml tests='exec(os.environ["testScript"])' /nix/store/5gzzr77gg756pgxdk76wyhjshyfkvcl6-nixos-test-driver-vlan-Networking-Networkd/bin/nixos-test-driver

I couldn't spot any networking-specific flakiness in there. It looks more like generic test-driver flakiness, maybe only happening under high load?

@tfc, any ideas?

@flokli flokli force-pushed the flokli:networking-tests-add branch from 6c8867c to c9aed19 May 9, 2020
@flokli
Contributor Author

flokli commented May 9, 2020

I rebased on top of 78f2a83, assuming this should fix the observed flakiness.

@flokli
Contributor Author

flokli commented May 9, 2020

@ofborg test networking.scripted.link networking.scripted.privacy networking.scripted.routes networking.scripted.virtual networking.networkd.bond networking.networkd.bridge networking.networkd.dhcpOneIf networking.networkd.dhcpSimple networking.networkd.link networking.networkd.loopback networking.networkd.macvlan networking.networkd.privacy networking.networkd.routes networking.networkd.sit networking.networkd.static networking.networkd.virtual networking.networkd.vlan

@flokli
Contributor Author

flokli commented May 9, 2020

Hrm, this still fails:

Fatal Python error: could not acquire lock for <_io.BufferedWriter name='<stderr>'> at interpreter shutdown, possibly due to daemon threads
Thread 0x0000fffff52101e0 (most recent call first):
  File "/nix/store/kb785q5ry1s7s9fcvxyba1c6c52w0zha-nixos-test-driver/bin/.nixos-test-driver-wrapped", line 93 in eprint
  File "/nix/store/kb785q5ry1s7s9fcvxyba1c6c52w0zha-nixos-test-driver/bin/.nixos-test-driver-wrapped", line 748 in process_serial_output
Thread 0x0000fffff5a111e0 (most recent call first):
  File "/nix/store/kb785q5ry1s7s9fcvxyba1c6c52w0zha-nixos-test-driver/bin/.nixos-test-driver-wrapped", line 93 in eprint
  File "/nix/store/kb785q5ry1s7s9fcvxyba1c6c52w0zha-nixos-test-driver/bin/.nixos-test-driver-wrapped", line 748 in process_serial_output
Thread 0x0000fffff62121e0 (most recent call first):
  File "/nix/store/kb785q5ry1s7s9fcvxyba1c6c52w0zha-nixos-test-driver/bin/.nixos-test-driver-wrapped", line 93 in eprint
  File "/nix/store/kb785q5ry1s7s9fcvxyba1c6c52w0zha-nixos-test-driver/bin/.nixos-test-driver-wrapped", line 748 in process_serial_output
Current thread 0x0000fffff7ff5780 (most recent call first):
/nix/store/yaw7vxl73i6ii08yqid69mli216b9p28-stdenv-linux/setup: line 1271:     6 Aborted                 (core dumped) LOGFILE=/dev/null tests='exec(os.environ["testScript"])' /nix/store/xmi7nwsf3fidj6pqhkpfnd8bvjrlbskn-nixos-test-driver-Privacy-Networking-Networkd/bin/nixos-test-driver
builder for '/nix/store/vyz7ylzi3iqwmphlzv9nfchax35dlr8f-vm-test-run-Privacy-Networking-Scripted.drv' failed with exit code 134
flokli referenced this pull request May 9, 2020
If a program (e.g. nixos-install) writes more than 1000 lines to
stderr during execute(), then process_serial_output() deadlocks
waiting for the queue to be processed. So use an unbounded queue
instead.

We should probably get rid of the structured log output (log.xml),
since then we don't need the log queue anymore.
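The failure mode described in this commit message can be illustrated with a small standalone sketch. It is not the test driver's code: the helper `enqueue_lines`, the 1500-line input, and the timeout are assumptions made for the demo, and only the 1000-line bound comes from the message above. With nothing draining the queue, a bounded queue stops accepting lines once it is full (the real driver would block there indefinitely), while an unbounded one keeps accepting them.

```python
import queue


def enqueue_lines(log_queue, lines, timeout=0.1):
    """Put lines on the queue; return how many were accepted.

    A timeout stands in for the indefinite block that a plain put()
    would hit when the queue is full and nobody is consuming it.
    """
    accepted = 0
    for line in lines:
        try:
            log_queue.put(line, timeout=timeout)
        except queue.Full:
            # In the real scenario this is where the producer would hang.
            break
        accepted += 1
    return accepted


# Hypothetical workload: more lines than the bounded queue can hold.
lines = [f"line {i}" for i in range(1500)]

bounded = queue.Queue(maxsize=1000)  # bound taken from the commit message
unbounded = queue.Queue()            # maxsize=0 means no limit

bounded_accepted = enqueue_lines(bounded, lines)
unbounded_accepted = enqueue_lines(unbounded, lines)

print(bounded_accepted, unbounded_accepted)
```

An unbounded queue trades the deadlock for unbounded memory growth if the consumer never catches up, which may be why the message also suggests dropping the log queue altogether.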
flokli added 2 commits May 1, 2020
5150378 fixed the long-broken
nixosTests.networking.virtual.

With all test failures fixed, and #79328 making debugging much easier,
let's re-add it to the tested jobset.
@flokli flokli force-pushed the flokli:networking-tests-add branch from c9aed19 to 897d574 May 9, 2020
@flokli flokli requested a review from tfc as a code owner May 9, 2020
@flokli
Contributor Author

flokli commented May 9, 2020

@ofborg test networking.scripted.link networking.scripted.privacy networking.scripted.routes networking.scripted.virtual networking.networkd.bond networking.networkd.bridge networking.networkd.dhcpOneIf networking.networkd.dhcpSimple networking.networkd.link networking.networkd.loopback networking.networkd.macvlan networking.networkd.privacy networking.networkd.routes networking.networkd.sit networking.networkd.static networking.networkd.virtual networking.networkd.vlan

@flokli
Contributor Author

flokli commented May 9, 2020

Even when removing all the queue stuff, I still got the

Fatal Python error: could not acquire lock for <_io.BufferedWriter name='<stderr>'> at interpreter shutdown, possibly due to daemon threads

So this seems to come from process_serial_output being in a separate thread, and the underlying print to stderr.

With all the xml/html log output gone since #87191, I'll try reworking this to use Python's native logging framework, which is supposed to be thread-safe.
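A rough sketch of that direction, not the rework itself: a worker thread forwards serial lines through the standard `logging` module, whose handlers serialize access with an internal lock, and the thread is joined before the interpreter exits rather than left running as a daemon. The function body, the logger name `machine`, the log format, and the in-memory streams are all assumptions for illustration; only the name `process_serial_output` is borrowed from the traceback.

```python
import io
import logging
import threading


def process_serial_output(stream, logger):
    """Forward each line of the (hypothetical) serial stream to the logger."""
    for line in stream:
        logger.info(line.rstrip())


# Log into an in-memory buffer so the result can be inspected;
# a real driver would more likely attach a handler writing to stderr.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(name)s # %(message)s"))

logger = logging.getLogger("machine")  # assumed logger name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # keep the demo output out of the root logger

# Stand-in for the VM's serial console.
serial = io.StringIO("booting\nlogin prompt\n")

# Non-daemon thread, joined explicitly so nothing is still writing
# when the interpreter shuts down.
worker = threading.Thread(target=process_serial_output, args=(serial, logger))
worker.start()
worker.join()

captured = buffer.getvalue()
print(captured, end="")
```

Joining the thread is what avoids writes during interpreter shutdown; the logging module only guarantees that concurrent log calls do not interleave.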

@flokli
Contributor Author

flokli commented May 22, 2020

Converting to draft until the discussion following #87191 (comment) has been done.

Member

Ma27 left a comment

After having read the threads in this PR, #86889 and #87191, I think it's preferable to have a stable test-driver rather than waiting until we've reached consensus about what $out and logging should look like.

IMHO we can still re-add lost features later on. I may be rather unlucky, but I regularly experience frozen VM-tests (that get "fixed" by restarting them). Right now I have a simple VM using grafana and loki (based on the test-driver) that reproducibly breaks when trying to shut it down (with the error demonstrated in #86889), and I think it's more important to get those kinds of (known) issues under control.

@flokli
Copy link
Contributor Author

flokli commented Jul 11, 2020

Yeah, I agree. Feel free to take over this PR - I can't currently pursue this.

@flokli flokli closed this Jul 11, 2020