Weird race condition that makes DNS ProxyVM rules disappear #2227

Closed
Rudd-O opened this Issue Aug 4, 2016 · 8 comments

Rudd-O commented Aug 4, 2016

Qubes 3.2. Fedora 23 minimal template.

I haven't traced the cause, but from time to time (usually after starting the second VM attached to a particular ProxyVM) the PR-QBS rules disappear.

Running /usr/lib/qubes/init/network-proxy-setup.sh in the ProxyVM restores them.

What could it be?

Is there any logging service where events that alter these rules get collected? Should there be? I think there should be.
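
Until such logging exists, the disappearance can at least be detected mechanically. The following is a minimal Python sketch, not part of Qubes. It assumes the DNS rules live in a chain named `PR-QBS` in the `nat` table and that the listing text comes from `iptables -t nat -S PR-QBS`. The function name `chain_rule_count` is hypothetical.

```python
def chain_rule_count(listing: str, chain: str = "PR-QBS") -> int:
    """Count the rules appended to `chain` in `iptables -S` style output.

    `iptables -S` prints one line per rule in the form "-A <chain> ...";
    chain declarations ("-N <chain>") are not counted as rules.
    """
    prefix = f"-A {chain} "
    return sum(
        1 for line in listing.splitlines() if line.strip().startswith(prefix)
    )


def rules_missing(listing: str, chain: str = "PR-QBS") -> bool:
    """True when the chain exists in the listing but holds no rules."""
    return chain_rule_count(listing, chain) == 0
```

In a ProxyVM the listing could be obtained as root with `subprocess.run(["iptables", "-t", "nat", "-S", "PR-QBS"], capture_output=True, text=True).stdout`, and a periodic check could log the time at which `rules_missing` first returns true.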

Member

marmarek commented Aug 4, 2016

I've seen this too, but haven't managed to track it down yet. It may be somehow related to #1067 or #2210.
I think running just /usr/lib/qubes/qubes-setup-dnat-to-ns should be enough to restore the rules.

Rudd-O commented Aug 4, 2016

It would be easy to trace if every event that altered networking posted
a log to dom0 about what it just did. The log entries should mark when
the event starts and when the event finishes, and the Qubes service that
receives the log should use a lock that enforces strict ordering of
incoming messages, so that we can discover race conditions.

Rudd-O
http://rudd-o.com/
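
The proposal above can be sketched in a few lines. This is a hypothetical illustration in Python, not an existing Qubes service: the names `OrderedEventLog`, `LogEntry`, `record`, and `overlapping` are invented here, and transport of the messages from VMs to dom0 is left out entirely. It shows only the two properties asked for: start/finish markers, and a single lock that assigns a strict order to incoming messages.

```python
import itertools
import threading
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    seq: int         # global order assigned by the collector
    source: str      # name of the VM or service reporting the event
    action: str      # what is being changed (free-form label)
    phase: str       # "start" or "finish"
    received: float  # wall-clock time at the collector


class OrderedEventLog:
    """Collects start/finish markers and serializes them with one lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._counter = itertools.count(1)
        self._entries = []

    def record(self, source: str, action: str, phase: str) -> LogEntry:
        if phase not in ("start", "finish"):
            raise ValueError("phase must be 'start' or 'finish'")
        # The lock makes sequence assignment and append one atomic step,
        # so seq reflects the order in which messages were accepted.
        with self._lock:
            entry = LogEntry(next(self._counter), source, action, phase, time.time())
            self._entries.append(entry)
            return entry

    def overlapping(self):
        """Return (running, newcomer) pairs where an action started
        while another action had started but not yet finished."""
        with self._lock:
            entries = list(self._entries)
        open_actions = {}
        overlaps = []
        for entry in entries:
            key = (entry.source, entry.action)
            if entry.phase == "start":
                overlaps.extend((running, entry) for running in open_actions.values())
                open_actions[key] = entry
            else:
                open_actions.pop(key, None)
        return overlaps
```

Any pair reported by `overlapping()` is a candidate for a race: two network-altering actions whose start/finish windows intersect.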

Member

marmarek commented Aug 4, 2016

Fedora enables auditing by default, and the audit log includes all iptables
modifications. For example:

[1521334.325092] audit: type=1325 audit(1470348234.636:43327): table=filter family=2 entries=14
[1521334.325198] audit: type=1300 audit(1470348234.636:43327): arch=c000003e syscall=54 success=yes exit=0 a0=4 a1=0 a2=40 a3=55b7c9ab5fc0 items=0 ppid=27222 pid=27256 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts8 ses=4294967295 comm="iptables" exe="/usr/sbin/xtables-multi" key=(null)
[1521334.325272] audit: type=1327 audit(1470348234.636:43327): proctitle=69707461626C6573002D41004F5554505554002D6A00414343455054

Service start and stop events are logged as well, so it should be possible
to trace it this way, at least in theory.

Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
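
The audit records quoted above are easier to use once decoded. The `proctitle` field is the command line in hexadecimal with NUL bytes between arguments, and the `audit(timestamp:serial)` stamp is shared by all records of one event. The Python sketch below decodes both; the helper names `decode_proctitle` and `audit_event_id` are hypothetical, and it handles only the hex form of `proctitle` shown in this thread.

```python
import re

# Matches the "audit(<epoch seconds>:<serial>)" stamp in a record.
_AUDIT_STAMP = re.compile(r"audit\((\d+\.\d+):(\d+)\)")


def audit_event_id(record: str):
    """Return (timestamp, serial) from one audit record.

    Records that share the same pair belong to the same event.
    """
    match = _AUDIT_STAMP.search(record)
    if match is None:
        raise ValueError("no audit(...) stamp found")
    return float(match.group(1)), int(match.group(2))


def decode_proctitle(hex_value: str) -> str:
    """Turn a hex-encoded proctitle into a readable command line."""
    raw = bytes.fromhex(hex_value)
    args = [arg.decode("utf-8", errors="replace") for arg in raw.split(b"\x00") if arg]
    return " ".join(args)


print(decode_proctitle("69707461626C6573002D41004F5554505554002D6A00414343455054"))
# prints: iptables -A OUTPUT -j ACCEPT
```

Grouping records by the value of `audit_event_id` and decoding the `type=1327` record of each group gives a timeline of which iptables command ran when. On systems with the audit userspace tools installed, `ausearch -i` performs a similar interpretation.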

Rudd-O commented Aug 4, 2016

On 08/04/2016 10:06 PM, Marek Marczykowski-Górecki wrote:

> Fedora by default enables auditing, which include all iptables
> modifications. Like this:

That's very much not what I was suggesting. I was suggesting a
centralized logging service so we can correlate events actuated by the
Qubes subsystem. Correlating these events visually between VMs is
guaranteed to discourage me and others from doing so in order to trace
bugs like these.

Rudd-O
http://rudd-o.com/

Member

marmarek commented Aug 4, 2016

Are you talking about #830 ?

Anyway, for this problem everything you need to know is in a single VM. That includes the new interface that results from starting another VM: it is a new interface in the ProxyVM, so it doesn't matter what the other VM does in the meantime.

Rudd-O commented Aug 4, 2016

No, that is not enough to diagnose the problem effectively. The problem
appears to be connected to starting a second VM connected to the same
ProxyVM.

Without a centralized log of entire system state changes including what
the VMs are doing, bugs like these will remain exceedingly hard to fix.

Rudd-O
http://rudd-o.com/

@andrewdavidwong andrewdavidwong added this to the Release 3.2 milestone Aug 5, 2016

Rudd-O commented Oct 12, 2016

Haha! Found the bug! And fixed it!

QubesOS/qubes-core-agent-linux#20

Contributor

jpouellet commented Nov 23, 2016

This appears resolved to me.

I have not experienced this issue since applying QubesOS/qubes-core-agent-linux#20 (and it's been several times the previous average interval of occurrence), and the patch has since been merged to -current.

Close?

@marmarek marmarek closed this Nov 23, 2016
