
Internet "sometimes" freezes until connecting from a domU based on a different template, or reconnecting Wi-Fi altogether #4077

Open
Aekez opened this Issue Jul 13, 2018 · 2 comments

Aekez commented Jul 13, 2018

Qubes OS version:

Qubes 4.0 (final release).

Note: This issue may turn out to be specific to my device (driver, firmware, hardware, or OS settings), because nobody else appears to have reported it yet and it does not occur on my two other Qubes systems.

Affected component(s):

Judging by the nature of the problem, my best guess is one of the following:

  • Maybe a network driver issue, though it is currently not clear whether it is.
  • Maybe specific to sys-net.
    • General non-Qubes network tools: IP routing, Wi-Fi tools, or similar packages.
  • Maybe Qubes inter-VM networking.
    • General non-Qubes network tools: IP routing, Wi-Fi tools, or similar packages.
    • Qubes-specific networking tools.
  • I cannot test Ethernet vs. Wi-Fi connections.
    • No Ethernet port is available, and hiding USB from dom0 results in a kernel panic (I assume because sound and other devices are internally connected via USB). So I cannot test whether this is a Wi-Fi-related issue. Maybe there is a specific log that can clarify whether Wi-Fi is causing the connectivity problems? Then again, it's odd that different templates make a difference (see more about templates and connectivity below).

Feel free to ask for any logs or details you think might help narrow down the exact issue; I'll gladly provide them.

  • Especially because this does not seem to happen on all Qubes systems (two of my three Qubes systems don't have this issue). Since no one else has reported it yet, it may be an uncommon or rare case.


Steps to reproduce the behavior:

Patterns

  • It happens across AppVMs based on similar templates: when Firefox in one fedora-26/28 AppVM loses its internet connection, Firefox in the other fedora-26/28 AppVM loses it at exactly the same time.

    • This also happened back when Fedora-26 was not yet EOL.
    • I have not yet checked whether other apps also lose internet access. I only now realize this would be a good idea, so I'll keep track of it from now on.
    • However, Firefox in AppVMs based on other templates also loses its connection at exactly the same time, so checking apps other than Firefox may not matter much: the issue spreads across templates and is not isolated to a single one.
    • It is currently unconfirmed whether it happens when only one template-based AppVM is running. This seems relevant, so I'll try to keep an eye out for it from now on.
  • It occurs suddenly and apparently at random, typically with days in between: sometimes once a day, sometimes not for several days, sometimes not for weeks.

    • The sometimes long periods without it suggest that it depends on updates changing something, on me (the user) doing something that triggers it (which I haven't noticed so far), or on some other unaccounted-for variable that triggers it irregularly.
  • This happens only on one of my three Qubes systems.

    • It might go away if I reinstall Qubes on this system, or it might not (in which case it's probably driver-, firmware-, or hardware-related).
    • Right now I don't have the time to reinstall Qubes to confirm, but I'm opening this issue regardless, since either outcome points to a problem.

Expected behavior:

No sudden, seemingly random loss of connectivity across AppVMs based on similar templates.

Actual behavior:

Sudden, seemingly random loss of connectivity across AppVMs based on similar templates.

  • As mentioned, if two or more fedora-26/28 AppVMs (Firefox) are running, they occasionally and randomly lose connectivity, but always at the same time.
    • In this scenario, I estimate that around 50% of the time, connecting with Whonix immediately fixes the problem in the non-Tor AppVMs, at the very moment Whonix reaches out to the Tor network.
      • Could this imply that something is going wrong in sys-firewall or sys-net?

I found three relatively simple ways to work around this issue when it happens:

  • (Works around 50% of the time, apparently at random) Making internet connections from e.g. Whonix fixes the fedora-28 AppVM's loss of connectivity the moment Whonix reaches out to the Tor network.
    • I'm currently unsure whether starting a fresh fedora-28 AppVM would behave similarly to starting Whonix.
      • For example, is there a difference between newly started domains and the domains that were already running when it occurred? Or is it the difference between Fedora and Whonix (Debian) domains that causes the roughly 50% fix rate?
      • To be clear about the ~50% figure: in the cases where it works, it fixes the problem completely at the moment the other template "connects", and in the remaining cases it does not help at all. There is no observed delay: at the moment the other template connects, it either works or it doesn't.
      • I'll keep track of this from now on, unless the cause is narrowed down before then.
  • (Works 100% of the time) Disconnecting the Wi-Fi connection and then immediately reconnecting.
    • Everything starts working again within the normal reconnect time.
  • (Works 100% of the time) Restarting sys-net, or restarting Qubes altogether.
    • This probably has the same effect as disconnecting and reconnecting Wi-Fi.

General notes:

Other uncertainties

  • This issue has been happening over a long time-scale, so without observation logs my memory of the details is naturally vague.
  • Unfortunately, I did not keep observation logs: I quickly found a workaround early on, so I kept pushing the issue aside to get on with what I was doing, and didn't pay much attention to it until today.
    • I have not yet had the chance to check whether it happens with Firefox on debian-9 (default template), as I typically use Fedora for browsing.
    • It either never happens or only rarely happens on Whonix. As mentioned, Whonix can sometimes fix connectivity issues in other non-Whonix AppVMs, but I don't recall seeing the issue happen in Whonix itself.
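One low-effort way to start keeping observation logs is a small script that prints a timestamped line every time connectivity drops or comes back. The sketch below is hypothetical and untested on this system: it assumes Python 3 in a Fedora-based AppVM, and the file name `netlog.py`, the probe target `1.1.1.1:443`, and the 10-second interval are arbitrary choices, not anything Qubes provides. It only tests plain TCP reachability, so it would not work as-is inside a Tor-only Whonix workstation.

```python
# netlog.py (hypothetical name): log connectivity changes with timestamps.
# Assumption: 1.1.1.1:443 is reachable through sys-firewall when the network works.
import socket
import time
from datetime import datetime
from typing import Iterable, Iterator, Tuple

PROBE_HOST = "1.1.1.1"   # assumed probe target; any reliable host:port will do
PROBE_PORT = 443
INTERVAL_S = 10          # seconds between probes (arbitrary)


def probe(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unreachable networks.
        return False


def samples(host: str, port: int, interval: float) -> Iterator[Tuple[str, bool]]:
    """Yield (timestamp, reachable) pairs forever, one per interval."""
    while True:
        stamp = datetime.now().isoformat(timespec="seconds")
        yield stamp, probe(host, port)
        time.sleep(interval)


def transitions(stream: Iterable[Tuple[str, bool]]) -> Iterator[Tuple[str, bool]]:
    """Yield only samples where reachability changed (the first one always counts)."""
    previous = None
    for stamp, ok in stream:
        if ok != previous:
            yield stamp, ok
            previous = ok


def main() -> None:
    """Print one UP/DOWN line per change; flush so redirected output stays current."""
    for stamp, ok in transitions(samples(PROBE_HOST, PROBE_PORT, INTERVAL_S)):
        print(f"{stamp} {'UP' if ok else 'DOWN'}", flush=True)
```

It could be started with e.g. `python3 -c 'import netlog; netlog.main()' >> ~/netlog.txt`. Running a copy in two fedora AppVMs at once, and if possible in sys-net, would show whether they all go DOWN at the same second, and whether sys-net itself loses its upstream connection at that moment or only the downstream AppVMs do.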

Related issues:

  • None that I could find; this does not seem similar to the other connectivity issues I found by searching.
    • I've had this issue for a good while (two months, maybe?), so I'm curious why no one else has reported it yet.
    • However, as mentioned, this only happens on one of my three Qubes systems, so it remains possible that something broke specifically in my software, settings, or hardware.

Aekez commented Jul 19, 2018

After paying closer attention to Whonix, it seems this also happens there. But it behaves a little differently due to the extra Tor bootstrapping process, and I believe time bugs like #4097 further obscure what is going on here. For example, when the clock goes out of sync, which as far as I can tell breaks Tor bootstrapping, internet access stops working for a different reason, and that makes this issue harder to debug.

In short, overlapping bugs complicate this issue, but it seems clear at this point that Whonix has the same issue as Fedora. For example, networking can sometimes be restored by connecting from a different domU over a non-Tor connection (like Firefox), or by reconnecting the Wi-Fi. But because Tor doesn't always recover after reconnecting Wi-Fi the way Fedora does, the picture is less clear for Whonix. This is presumably due to other bugs, like the time bug mentioned above, which sometimes seems to require a full Qubes restart (restarting VMs is not enough) to regain internet access, and which also makes reconnecting Wi-Fi less reliable than it is with Fedora.

This issue does not seem to depend on which template or Linux distribution is used.

  • Either it's a common bug, or it's something outside the templates.

Broadly, it seems like it can be narrowed down to one of the following:

  • Hardware driver/firmware problem.
  • sys-net and router Wi-Fi compatibility
    • I'll try a different router in the coming weeks. However, all other devices work fine with the current router, so this is only to rule out a router problem, not because the router is a suspect.
  • A combination of a bug and poor Wi-Fi signal strength
    • For example, sys-net loses the connection and is unable to recover from the hiccup, in which case it's a joint bug-and-environment problem.
    • Or similar joint bug-and-environment scenarios.
  • Something in the inter-VM Qubes networking.

Aekez closed this Jul 19, 2018

Aekez commented Jul 19, 2018

Apologies for closing the issue. I misclicked when deciding to postpone posting here: I was in a bit of a hurry to get out the door and misread the grey button as a normal cancel button.

The original reason for posting is that I found this bug also affects Whonix, but it's less clear there because other bugs also affect connectivity, such as the time bugs currently interfering with Tor bootstrapping, which sometimes even require a full restart to regain connectivity (when restarting VMs isn't enough). These are unrelated to this issue, but their symptoms overlap, which obscures the picture.

I decided to postpone posting (until I misclicked the close button...) partly because I might be able to find more information on my own before reporting the Whonix findings (e.g. by replacing my Wi-Fi router to narrow down whether this is a Qubes problem at all), and partly because I was on my way out the door. Apologies; I'll see if I can gather more information on this issue.

If you have any ideas for useful logs, testing methods, or similar, please feel free to suggest them.

Aekez reopened this Jul 19, 2018
