sys-whonix doesn't connect to Tor after system suspend #1764
Comments
marmarek
added
bug
C: templates
P: major
labels
Feb 19, 2016
marmarek
added this to the Release 3.1 updates milestone
Feb 19, 2016
marmarek
Feb 19, 2016
Member
Related: after setting the time manually, something (sdwdate?) kept setting the time constantly:
Feb 19 16:36:52 host systemd[1390]: Time has been changed
Feb 19 16:36:53 host systemd[1]: Time has been changed
Feb 19 16:36:53 host systemd[1390]: Time has been changed
Feb 19 16:36:54 host systemd[1]: Time has been changed
Feb 19 16:36:54 host systemd[1390]: Time has been changed
Feb 19 16:36:55 host systemd[1]: Time has been changed
Feb 19 16:36:55 host systemd[1390]: Time has been changed
Feb 19 16:36:56 host systemd[1]: Time has been changed
Feb 19 16:36:56 host systemd[1390]: Time has been changed
Feb 19 16:36:57 host systemd[1]: Time has been changed
Feb 19 16:36:57 host systemd[1390]: Time has been changed
Feb 19 16:36:58 host systemd[1]: Time has been changed
Feb 19 16:36:58 host systemd[1390]: Time has been changed
Feb 19 16:36:59 host systemd[1]: Time has been changed
Feb 19 16:36:59 host systemd[1390]: Time has been changed
Feb 19 16:37:00 host systemd[1390]: Time has been changed
Feb 19 16:37:00 host systemd[1]: Time has been changed
Feb 19 16:37:01 host systemd[1390]: Time has been changed
Process list:
user@host:~$ ps aux|grep sdwdate
sdwdate 814 0.0 0.6 14372 4220 ? Ss Feb18 0:34 /bin/bash /usr/bin/sdwdate
sdwdate 15460 0.0 0.4 14372 3184 ? S 16:30 0:00 /bin/bash /usr/bin/sdwdate
sdwdate 15461 0.0 0.3 14372 2232 ? S 16:30 0:00 /bin/bash /usr/bin/sdwdate
root 15470 0.0 0.5 51072 3660 ? S 16:30 0:00 sudo INLINEDIR=/var/cache/sdwdate/sclockadj /usr/lib/sdwdate/sclockadj --no-verbose --no-debug --no-first-wait --move-min 500000 --move-max 500000 --wait-min 1000000000 --wait-max 1000000000 --subtract 2992752664174
sdwdate 15472 0.0 0.0 5800 660 ? S 16:30 0:00 sleep 9120
root 15473 29.7 15.5 175864 108168 ? Sl 16:30 2:13 ruby /usr/lib/sdwdate/sclockadj --no-verbose --no-debug --no-first-wait --move-min 500000 --move-max 500000 --wait-min 1000000000 --wait-max 1000000000 --subtract 2992752664174
adrelanos
Feb 19, 2016
Member
The time changed messages are generated by sclockadj. Unrelated.
Requires $someone to finish / create sclockadj2. More info:
- Whonix/sdwdate#4
https://www.whonix.org/wiki/Dev/TimeSync#Adjusting_time_slowly_using_adjtimex.2Fntp_adjtime
It's a known issue.
https://www.whonix.org/wiki/Known_Issues#Suspend_.2F_Hibernate_Issues
Fun to fix.
How to fix... On resume... Inside Whonix.... To do....
- sudo service sdwdate stop
- have dom0 tell Qubes-Whonix a slightly randomized time [1] and set it using date
- sudo service sdwdate start
- might have to restart Tor at this point, depending on whether they have already fixed the bugs [that I reported already] requiring this
Minor: check if sdwdate is even installed beforehand.
Do we have a ticket for Qubes to implement dispatching a hook on resume?
Once that is done, the above is simple.
[1] code similar to this:
https://github.com/Whonix/bootclockrandomization/blob/master/usr/share/bootclockrandomization/start
marmarek
Feb 19, 2016
Member
On Fri, Feb 19, 2016 at 08:29:18AM -0800, Patrick Schleizer wrote:
> - sudo service sdwdate stop
I guess, should be start here :)
> Do we have a ticket for Qubes to implement dispatching a hook on resume?
> Once that is done, the above is simple.
Yes, #1663.
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
rootkovska
Feb 22, 2016
Member
BTW, I have also started observing problems with time syncing in my sys-whonix gw VM. While Tor connects fine, the sdwdate process starts consuming 100% CPU (luckily one core only). The only solution is to stop the service. The date is out of sync and far in the past (e.g. the previous day). Tested with all the latest patches.
adrelanos
Feb 22, 2016
Member
For reference, there is more information on sclockadj here:
https://groups.google.com/d/msg/qubes-users/QO4He5mZDzc/68iyt4-5BgAJ
marmarek
referenced this issue
Feb 28, 2016
Closed
VMs with Whonix-Tor networking timeout and do not recover #1779
adrelanos
Mar 11, 2016
Member
I might be able to come up with a workaround, using a clock-jump-detector.
Pros:
- If the clock is less than 1 hour in the past or less than 3 hours in the future
  - no 100% use of cpu issue
  - restored Tor connectivity
- If the clock is more than 1 hour in the past or more than 3 hours in the future
  - no 100% use of cpu issue
  - (((no restored Tor connectivity)))
Technical details:
A clock-jump-detector could be invented. A script that runs a loop, that stores unixtime in variable A, waits, stores another unixtime in variable B, then compares those.
When sdwdate sets the time using date (rather than sclockadj) [2], sdwdate would need to:
1. stop the clock-jump-detector systemd service.
2. set the time using date.
3. restart the clock-jump-detector systemd service.
Caveats:
- during 1) to 3) there is room for a race condition [4], probably happening very seldom
- when users change the time manually, it will trigger clock-jump-detector
- As a general Tor (non-Whonix!) issue, if the clock is more than 1 hour in the past or more than 3 hours in the future, Tor can't connect. So there is no sane way (speak: only a fingerprintable way) to recover Tor automatically in such situations. [1]
  - This goes for any Tor-inside-a-VM or Tor-on-the-host project.
  - To recover from such situations for a Tor-inside-a-VM (speak: Whonix) project, cooperation of the host (speak: dom0) is required. (That would be qubes.GetRandomizedTime [3]. [Or at least a qubes.GetTime dom0 qrexec service.])
- It would be a hack. #1663 and qubes.GetRandomizedTime [3] would be a cleaner solution providing better usability [5].
[1] https://www.whonix.org/wiki/Dev/TimeSync#Tor_Consensus_Method
[2] after boot or when manually instructed so
[3] https://groups.google.com/d/msg/qubes-devel/aN3IOv6JmKw/_XOwbV-EAgAJ
[4] meaning, that the clock-jump-detector mechanism would not work then
[5] recover connectivity no matter how long the computer was being suspended
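For illustration only, the loop described under "Technical details" could look roughly like the sketch below. It is not the actual Whonix implementation: the function names, the 10 second interval and the 5 second tolerance are all made up for this example. It compares the wall clock against a monotonic clock, which is one possible way to do the "store A, wait, store B, compare" step.

```python
import time
from typing import Callable, Optional


def detect_clock_jump(
    wall_a: float,
    wall_b: float,
    waited: float,
    tolerance: float = 5.0,  # hypothetical tolerance in seconds
) -> Optional[float]:
    """Return the jump in seconds if the wall clock moved noticeably more
    or less than the time actually waited, otherwise None."""
    drift = (wall_b - wall_a) - waited
    return drift if abs(drift) > tolerance else None


def run_detector(
    on_jump: Callable[[float], None],
    interval: float = 10.0,  # hypothetical polling interval in seconds
    tolerance: float = 5.0,
) -> None:
    """Loop forever: store unixtime A, wait, store unixtime B, compare."""
    while True:
        wall_a = time.time()
        mono_a = time.monotonic()
        time.sleep(interval)
        wall_b = time.time()
        # Elapsed time as seen by a clock that "date" cannot change.
        waited = time.monotonic() - mono_a
        jump = detect_clock_jump(wall_a, wall_b, waited, tolerance)
        if jump is not None:
            on_jump(jump)
```

A caller would pass something like `run_detector(lambda jump: print("clock jumped by", jump))`. As noted in the caveats, such a detector would also fire when the user (or sdwdate itself) sets the time with date, which is why the service would have to be stopped around that step.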
adrelanos
referenced this issue
Mar 11, 2016
Closed
suspend / resume scripts needed for NetVM and ProxyVM? #1663
marmarek
Mar 11, 2016
Member
I think we can add qubes.GetRandomizedTime, based on bootclockrandomization, to R3.1 and R3.0. Adding a new service is safe in terms of regressions. What is not so safe is changing the qubes.SuspendPre logic (to be called in all the VMs). That needs to be done very carefully.
adrelanos
Mar 11, 2016
Member
On second thought, unfortunately qubes.GetRandomizedTime is probably of lower priority for a long time. This is because Xen cannot provide us with a fully independent VM clock anyhow. (details 1 / details 2) So the defense against the clock correlation attack that it is supposed to provide cannot be implemented anyhow. (Sure, qubes.GetRandomizedTime would be nice to have.)
Would having a simpler qubes.GetTime service make sense in the meantime? For non-Whonix VMs? Or should it be skipped and qubes.GetRandomizedTime be implemented right away, to ease the next iteration far in the future? [to avoid having to restrict, in the far future, which types of VMs get access to qubes.GetTime]
added a commit
to adrelanos/qubes-core-admin-linux
that referenced
this issue
Mar 11, 2016
added a commit
to adrelanos/qubes-core-admin
that referenced
this issue
Mar 11, 2016
adrelanos
referenced this issue
in QubesOS/qubes-core-admin
Mar 11, 2016
Closed
implemented dom0 qubes.GetTime #22
Can you please check if QubesOS/qubes-core-admin#22 would make sense?
marmarek
Mar 11, 2016
Member
If going with the non-randomized time option, IMHO simply qubes.SetDateTime can be used, providing an appropriate Whonix-specific handler. This handler should:
- Check if the provided time is off by more than 180s - if so, a suspend probably happened. Or maybe check for +-1h, because that is the range really hurting Tor, right?
- Depending on the above check - either ignore the value, or randomize it slightly and set it (taking care not to conflict with sdwdate)
This would be even better than qubes.GetTime, because it would work without any dom0 modification (so just the Whonix template will be enough, no need to handle multiple cases).
That said, I have nothing against qubes.GetRandomizedTime. The fact that we have to deal with this very ticket proves that it isn't exactly trivial to get host time, even with clocksource=xen.
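A rough sketch of the decision such a handler would make, under stated assumptions: the 180 second threshold comes from the list above, while the function name, the 30 second randomization range and the use of a uniform offset are invented for this example and are not taken from bootclockrandomization or any existing Qubes/Whonix code.

```python
import random
from typing import Optional

SUSPEND_THRESHOLD = 180.0  # seconds, from the proposal above
MAX_RANDOM_OFFSET = 30.0   # seconds, hypothetical randomization range


def time_to_set(
    provided: float,
    local: float,
    threshold: float = SUSPEND_THRESHOLD,
    max_offset: float = MAX_RANDOM_OFFSET,
    rng: Optional[random.Random] = None,
) -> Optional[float]:
    """Decide what to do with a time value pushed by dom0.

    Returns None when the provided time should be ignored (the local
    clock is close enough, so no suspend is assumed), otherwise a
    slightly randomized unixtime that the caller could then set.
    """
    if abs(provided - local) <= threshold:
        return None
    # SystemRandom draws from the OS entropy source by default.
    rng = rng or random.SystemRandom()
    return provided + rng.uniform(-max_offset, max_offset)
```

For example, `time_to_set(provided=1455899802.0, local=1455813402.0)` returns a value within 30 seconds of the provided time, because the local clock is a full day behind. Actually setting the clock, and coordinating with sdwdate while doing so, is left out of the sketch on purpose.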
adrelanos
Mar 11, 2016
Member
Marek Marczykowski-Górecki:
> If going non-randomized time option, IMHO simply qubes.SetDateTime can be used - providing appropriate Whonix-specific handler.
qubes.SetDateTime is called too often, by qvm-sync-clock, and also at times where we would rather avoid it. qubes.GetTime could be used upon real resume only.
Alternatively, I was wondering if the [new] hook notifying the VM of resume should also notify the VM of the current unixtime as an extra parameter. (#1663) That would be fine until we get a fully independent VM clock in the far future.
> - Check if provided time is off more than 180s - if so, probably suspend happened. Or maybe check for +-1h, because that is the range really hurting tor, right?
> - Depending on above check - either ignore the value, or randomize it slightly and set (taking care to not conflict with sdwdate)
More certainty than "probably suspend happened" would be good, because then we could just act as on boot. Short term plan: use sdwdate, set time using date. Long term plan: block networking until sdwdate is done, set time using sdwdate and date. With sdwdate-alike security and accuracy. Better fingerprinting defense than "boot" (resume) clock randomization after suspend alone.
> The fact that we have to deal with this very ticket proves that it isn't exactly trivial to get host time, even with clocksource=xen.
Not trivial, but also not super hard. It would just require quite some time for research and development.
- requires a kernel module that is not exactly simple to set up on Qubes generally (following pvgrub instructions)
- writing a custom kernel module accessing clocksource xen
- [or some other C / kernel trickery I am not aware of to access clocksource xen]
- dealing with the (un)reliability of clocksource xen. It may work okay as long as suspend/resume is not involved (for clock correlation attacks) but not after actual suspend/resume.
marmarek
Mar 12, 2016
Member
- writing a custom kernel module accessing clocksource xen
Ok, you're right. It looks like a module is required. Really simple one:
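#include <linux/init.h>
#include <linux/module.h>
#include <linux/timekeeping.h>
#include <asm/x86_init.h>   /* declares x86_platform */

static int __init gettime(void) {
    /* struct timespec as used by kernels at the time; newer kernels take struct timespec64 here. */
    struct timespec ts;

    /* Read the platform wall clock (host/Xen time), not the VM's system time. */
    x86_platform.get_wallclock(&ts);
    /* %09ld zero-pads the nanoseconds, so 56864426 ns prints as .056864426. */
    printk(KERN_INFO "persistent_clock: %ld.%09ld\n", ts.tv_sec, ts.tv_nsec);
    /* Fail on purpose: the module only prints once and does not stay loaded. */
    return -1;
}
MODULE_LICENSE("GPL");
module_init(gettime);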
output:
[986617.296157] persistent_clock: 1457743780.493455504
[986629.957794] persistent_clock: 1457743793.155087310
[986630.458305] persistent_clock: 1457743793.655793983
[986630.859566] persistent_clock: 1457743794.56864426
[986631.218631] persistent_clock: 1457743794.415929105
Which is host (or Xen?) time, even after setting VM time to something totally different (with date -s).
marmarek
Mar 12, 2016
Member
And this time is properly loaded back as system time, when the VM is properly suspended before suspending the host. Which is the case for NetVM (or more precisely: VM with some PCI device).
This is something we consider to change in Qubes 4.0 - properly suspend all the VMs before suspending the host.
Anyway, I think it would be better to do qubes.GetRandomizedTime just now, to not rollback/limit qubes.GetTime in the future. Probably a simple copy&paste from bootclockrandomization, so not really much more work.
added a commit
to adrelanos/bootclockrandomization
that referenced
this issue
Mar 13, 2016
added a commit
to adrelanos/qubes-core-admin
that referenced
this issue
Mar 13, 2016
adrelanos
referenced this issue
in QubesOS/qubes-core-admin
Mar 13, 2016
Merged
implemented dom0 qubes.GetRandomizedTime #23
adrelanos
Mar 13, 2016
Member
Okay, agreed.
Please check if QubesOS/qubes-core-admin#23 makes sense.
added a commit
to adrelanos/sdwdate
that referenced
this issue
Mar 15, 2016
adrelanos
Mar 15, 2016
Member
We now have:
- dom0 qubes.GetRandomizedTime QubesOS/qubes-core-admin#23
- and suspend-pre, suspend-post and clock-fix handler scripts adrelanos/sdwdate@5dda984 (please have a glimpse if time allows)
- sclockadj2: https://forums.whonix.org/t/sclockadj2-slow-clock-adjuster-prevent-fingerprintable-clock-adjustments
Next thing I'll be working on is suspend / resume scripts (#1663). After that this ticket is trivial to solve. (Just use #1663 to call the sdwdate suspend handler scripts.)
marmarek
Mar 15, 2016
Member
- dom0 qubes.GetRandomizedTime QubesOS/qubes-core-admin#23
This is also in current-testing for R3.1.
adrelanos
referenced this issue
Mar 16, 2016
Open
generic qubes qrexec rpc '.d' "/etc/qubes-rpc.d" drop-in folder #1844
adrelanos
Mar 18, 2016
Member
Next thing I'll be working on is suspend / resume scripts (#1663).
Was done by Marek.
After that this ticket is trivial to solve. (Just use #1663 to call the sdwdate suspend handler scripts.)
Done:
Whonix/sdwdate@5c1aea7
added a commit
to Whonix/sdwdate
that referenced
this issue
Mar 18, 2016
adrelanos
Mar 19, 2016
Member
Done. Will be released with Whonix 13.
For testing purposes. Inside Whonix.
sudo sh -x /etc/qubes-rpc/qubes.SuspendPreAll
sudo sh -x /etc/qubes-rpc/qubes.SuspendPostAll
Might require real host system suspend / resume. Not working yet with suspend / resume VM in QVMM. Asked about that: #1663 (comment)
adrelanos
referenced this issue
Mar 19, 2016
Closed
Remove "pause VM" button from Qubes Manager #1855
adrelanos
Mar 24, 2016
Member
Marek Marczykowski-Górecki:
> (BTW @adrelanos, do you consider it as an update to Whonix 12? Or Whonix 13 is soon enough?).
No upgrade to Whonix 12. I release Whonix stable upgrades with the smallest possible diff and as seldom as possible to avoid a disaster. [defined as: broken package manager, breaking connectivity and ability to upgrade for all users at once] There are too many combinations of versions. [Whonix 12 vs 13; Qubes 3.0 vs 3.1; Whonix stable vs testers; Qubes stable vs testing] Very time consuming to test manually. And we don't have automated Q/A, CI builds, tests, release manager etc. in place. And this is a big change. [sdwdate was rewritten in the meantime and its dependencies changed]
andrewdavidwong
added
the
C: Whonix
label
Apr 7, 2016
adrelanos
May 17, 2016
Member
What's next in this ticket? It is fixed in Whonix 13 which is due to be released soon-ish. I just now verified that again.
Do we want to keep tickets open until an upgrade has been released to stable that fixes it?
Or do we want to close tickets as soon as the code to implement them is done?
andrewdavidwong
May 17, 2016
Member
Do we want to keep tickets open until an upgrade has been released to stable that fixes it?
Or do we want to close tickets as soon as the code to implement them is done?
It's Marek's call, of course, but from what I've observed, I think the current practice is to close them once the code is done. Then, we have qubes-builder-github for notifications once packages containing fixes/features are available in repos.
Please close.
marmarek commented Feb 19, 2016
After system suspend, sys-whonix cannot connect to Tor. When trying to reach any site, it logs:
I guess it's because of a desynchronized clock - now is Feb 19 16:36:42.
When I set the date manually (date -s 'Feb 19 16:30:00') to some approximately current value, it works again:
First of all, currently sys-whonix has no idea when the system was suspended. To have any solution for this problem, that probably should change. Is it possible to somehow force reconnection to Tor, even with such a large clock difference? IIUC it is required for sdwdate to sync the time.
Another solution would be to properly suspend the VM. This means the kernel would sync the time after resume (based on clocksource xen? not sure, but it works for sys-net).
/cc @adrelanos