quickly restarting VM leads to race condition, VM startup and keyboard issues #1241

Closed
adrelanos opened this Issue Sep 25, 2015 · 20 comments

3 participants

adrelanos commented Sep 25, 2015

By switching the Qubes VM Manager Firewall tab settings back and forth, I managed to end up with an unstartable VM.

qvm-start my-whonix-ws
--> Creating volatile image: /var/lib/qubes/appvms/my-whonix-ws/volatile.img...
--> Loading the VM (type = AppVM)...
--> Starting Qubes DB...
--> Setting Qubes DB info for the VM...
--> Updating firewall rules...
Traceback (most recent call last):
  File "/usr/bin/qvm-start", line 125, in <module>
    main()
  File "/usr/bin/qvm-start", line 109, in main
    xid = vm.start(verbose=options.verbose, preparing_dvm=options.preparing_dvm, start_guid=not options.noguid, notify_function=tray_notify_generic if options.tray else None)
  File "/usr/lib64/python2.7/site-packages/qubes/modules/000QubesVm.py", line 1788, in start
    netvm.write_iptables_qubesdb_entry()
  File "/usr/lib64/python2.7/site-packages/qubes/modules/006QubesProxyVm.py", line 117, in write_iptables_qubesdb_entry
    self.qdb.rm("/qubes-iptables-domainrules/")
  File "/usr/lib64/python2.7/site-packages/qubes/modules/000QubesVm.py", line 689, in qdb
    self._qdb_connection = QubesDB(self.name)
qubes.qdb.Error: (2, 'No such file or directory')

Can you make head or tail of this?

adrelanos commented Sep 25, 2015

Toggling the firewall settings of another AppVM behind sys-firewall (not sys-whonix) works as a workaround.

adrelanos commented Sep 25, 2015

Or not. It only makes the "No such file or directory" error in Qubes VM Manager go away temporarily. Starting the VM from Qubes VM Manager or from the command line still fails.

marmarek commented Sep 25, 2015

Apparently the QubesDB daemon for sys-firewall (or whatever proxyvm you
have there) has died. Check /var/log/qubes/qubesdb.sys-firewall.log.
Also check the process list: you should have one qubesdb-daemon process for
each running VM (including dom0).

Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
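The process-list check suggested above (one qubesdb-daemon per running VM) can be scripted. This is a minimal sketch, and the function names are made up for illustration. It assumes, hypothetically, that each dom0 daemon's command line ends with the VM name, so compare against `ps` output on your system before relying on it. The caller supplies the list of running VMs.

```python
import os


def count_qubesdb_daemons(cmdlines):
    """Map VM name -> number of qubesdb-daemon processes for it.

    ASSUMPTION (hypothetical): the VM name is the last argument of each
    daemon's command line; adjust the parsing if your system differs.
    """
    counts = {}
    for argv in cmdlines:
        if len(argv) > 1 and os.path.basename(argv[0]) == "qubesdb-daemon":
            vm = argv[-1]
            counts[vm] = counts.get(vm, 0) + 1
    return counts


def read_cmdlines(proc="/proc"):
    """Return the argv list of every readable process (Linux /proc)."""
    result = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-PID entries such as /proc/self
        try:
            with open(os.path.join(proc, entry, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:
            continue  # process exited in the meantime, or no permission
        # Arguments in /proc/<pid>/cmdline are NUL-separated.
        argv = [arg.decode(errors="replace") for arg in raw.split(b"\0") if arg]
        if argv:
            result.append(argv)
    return result


def check(running_vms, counts):
    """Return (VMs without a daemon, VMs with more than one daemon)."""
    missing = sorted(vm for vm in running_vms if counts.get(vm, 0) == 0)
    duplicated = sorted(vm for vm, n in counts.items() if n > 1)
    return missing, duplicated
```

A VM reported as missing means its daemon died, which was the first hypothesis here. A duplicate points to a restart window in which the old daemon is still alive. The failure eventually diagnosed in this thread passes this check, because the daemon was running but its socket had been removed. For that case, `qubesdb-list -d <vm>` failing from dom0 is the telling sign.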

adrelanos commented Sep 25, 2015

/var/log/qubes/qubesdb.sys-firewall.log:

vchan closed
vchan reconnecting
vchan closed

sys-firewall has a qubesdb-daemon running. dom0 also has a qubesdb-daemon running for sys-firewall.

marmarek commented Sep 25, 2015

And the sys-firewall is the netvm of that AppVM, right? Can you access
its QubesDB from dom0 (qubesdb-list -d sys-firewall /)?

adrelanos commented Sep 25, 2015

And the sys-firewall is the netvm of that AppVM, right?

Yes.

Can you access its QubesDB from dom0 (qubesdb-list -d sys-firewall /)?

Yes, that works. It lists everything.

adrelanos commented Sep 25, 2015

And the sys-firewall is the netvm of that AppVM, right?

Ehrm. No. Sorry. my-whonix-ws's netvm is sys-whonix.

adrelanos commented Sep 25, 2015

sys-whonix also has a qubesdb-daemon running.

adrelanos commented Sep 25, 2015

There is another issue in sys-whonix at the moment, but it's not related to Whonix: it has also happened to me in Debian VMs before. The keyboard layout is messed up: QWERTY, even though I usually use QWERTZ. I have had this before. It seems to be a race condition; after a reboot it works. So the bug I am reporting here is perhaps just a follow-up of the one I am describing now.

adrelanos commented Sep 25, 2015

qubesdb-read /qubes-keyboard currently fails in sys-whonix. qubesdb-list / works in general, but does not list the /qubes-keyboard entry.

marmarek commented Sep 25, 2015

The keyboard layout is also passed to the VM through QubesDB, so if there is
a problem with it, the keyboard layout will not be loaded either.
Can you access the QubesDB of sys-whonix from dom0?

adrelanos commented Sep 25, 2015

No.

In dom0.

qubesdb-list -d sys-whonix
Failed to connect to sys-whonix daemon

marmarek commented Sep 25, 2015

OK, so here is the problem. Anything interesting in
/var/log/qubes/qubesdb.sys-whonix.log? Do you remember when
sys-whonix was started? Does it match the start time of its
qubesdb-daemon process in dom0? Did you start sys-whonix just
after shutting it down (aka restart)?

marmarek commented Sep 25, 2015

I guess it may be a race condition when starting qubesdb-daemon in
dom0 while the previous one is still running for the same VM (which was
just stopped). I've seen this with Windows VMs, but maybe it also happens
with Linux ones...

adrelanos commented Sep 25, 2015

Anything interesting in /var/log/qubes/qubesdb.sys-whonix.log?

/var/log/qubes/qubesdb.sys-whonix.log:

vchan closed
vchan reconnecting
vchan closed

Do you remember when sys-whonix was started?

No.

Did you start sys-whonix just after shutting it down (aka restart)?

Possibly. I often shut it down and restart it right afterwards for testing purposes.

adrelanos commented Sep 25, 2015

Do you have enough info to make this ticket actionable, or to mark it as a duplicate of something? Do you want more debug info? Or should I leave the system in this state for a while longer for your consideration?

Just asking, because otherwise I would just restart sys-whonix or the whole system as a workaround.

marmarek commented Sep 25, 2015

OK, I think I have all the information needed to fix this. For now
you can simply restart sys-whonix to get it working again (make sure
that no qubesdb-daemon for it is running before starting it again).
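This workaround amounts to "wait until the old daemon is gone, then start the VM". Below is a hypothetical polling helper for that. The name is made up. The defaults are arbitrary: the 30 s timeout is simply chosen to exceed the 10 s alive-check interval mentioned in the commit message. `clock` and `sleep` are injectable so the helper can be tested without real waiting.

```python
import time


def wait_until_stopped(is_running, timeout=30.0, interval=0.5,
                       clock=time.monotonic, sleep=time.sleep):
    """Poll is_running() until it returns False or `timeout` seconds pass.

    Returns True if the old daemon went away in time, False otherwise.
    """
    deadline = clock() + timeout
    while is_running():
        if clock() >= deadline:
            return False  # still running after the timeout: give up
        sleep(interval)
    return True
```

For example, `is_running` could be `lambda: subprocess.run(["pgrep", "-f", "qubesdb-daemon.*sys-whonix"]).returncode == 0`, since pgrep exits with status 0 when at least one process matches. Once the helper returns True, run `qvm-start sys-whonix`. The pattern is only a rough match and would also catch, say, a VM named sys-whonix-2.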

adrelanos commented Sep 25, 2015

Alright, it works again now. And by the way, I was just able to reproduce the qubes-keyboard issue with a Debian VM that I shut down and restarted quickly, which strengthens your hypothesis (#1241 (comment)).

adrelanos changed the title from 000QubesVm.py: qubes.qdb.Error: (2, 'No such file or directory') to quickly restarting VM leads to race condition, VM startup and keyboard issues Sep 25, 2015

marmarek added this to the Release 3.0 milestone Sep 25, 2015

marmarek added a commit to marmarek/old-qubes-core-admin that referenced this issue Oct 2, 2015

core/start: ensure that the previous QubesDB daemon isn't running
When restarting a VM (starting it just after it was shut down), it may
happen that the previous `qubesdb-daemon` instance is still running: if the VM
doesn't properly terminate the connection, the dom0 part will not terminate
immediately, but only at the next alive check (every 10s). Such a `qubesdb-daemon`,
when terminating, removes the pid file and socket file. If the new
daemon is already running, those are the new daemon's files, making the
whole QubesDB of this VM inaccessible to dom0 (`qubesdb-daemon` is
running, but its socket is removed).

To prevent this race, ensure that the previous instance is terminated before
starting the new one.
There is no need to remove the socket file manually, because if a stale
socket exists, it will be replaced by the new one when the new
`qubesdb-daemon` starts up.

QubesOS/qubes-issues#1241

(cherry picked from commit dd1bea9)
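The race described in this commit message can be reproduced in a toy model. Everything below is hypothetical and is not Qubes code: an in-memory dict stands in for the pid and socket files, and no real processes are involved. It shows why an exiting daemon that removes its files by path clobbers a newer instance, and why terminating the old instance first avoids that.

```python
# Toy model of the restart race (hypothetical; not Qubes code).
# A dict stands in for the run directory: path -> id of the owning daemon.


class Daemon:
    def __init__(self, files, vm, daemon_id):
        self.files = files
        self.daemon_id = daemon_id
        self.pidfile = "qubesdb.%s.pid" % vm
        self.socket = "qubesdb.%s.sock" % vm
        self.running = False

    def start(self):
        # Creating the files replaces any stale ones left behind.
        self.files[self.pidfile] = self.daemon_id
        self.files[self.socket] = self.daemon_id
        self.running = True

    def terminate(self):
        # Cleanup by path only: it does not check who owns the files now.
        self.files.pop(self.pidfile, None)
        self.files.pop(self.socket, None)
        self.running = False


def restart_racy(files, old, vm):
    """New daemon starts while the old one still waits for its alive check."""
    new = Daemon(files, vm, old.daemon_id + 1)
    new.start()
    old.terminate()  # old instance exits later and deletes the new files
    return new


def restart_fixed(files, old, vm):
    """Terminate the previous instance before starting the new one."""
    if old.running:
        old.terminate()
    new = Daemon(files, vm, old.daemon_id + 1)
    new.start()
    return new


if __name__ == "__main__":
    files = {}
    old = Daemon(files, "sys-whonix", 1)
    old.start()
    new = restart_racy(files, old, "sys-whonix")
    print(new.running, files)  # True {} : daemon up, socket gone
```

Another conceivable design would have each daemon delete only the files that still carry its own id. Terminating the previous instance first, as the commit does, also ensures two daemons never serve the same VM at once.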

marmarek added a commit to marmarek/old-qubes-core-admin that referenced this issue Oct 2, 2015

core/start: ensure that the previous QubesDB daemon isn't running
QubesOS/qubes-issues#1241

schnaser commented Jan 3, 2016

Has a fix been released for this? It also appears to happen when there are "a bunch" of entries in firewall.xml. I can currently reproduce this 100% of the time by:

  1. Create a new VM. Do not start it.
  2. Paste the attached firewall.txt file into /var/lib/qubes/appvms/whatever/firewall.xml
  3. Attempt to start the VM.
  4. Get error

The firewall is nothing special IMO, just a list of the top 50 most popular sites. My banking VM, where I first ran into this, has 45 entries.

firewall.txt

I notice that the error is slightly different, however, so this may well be a separate issue.

error.txt

marmarek commented Jan 4, 2016

I think the problem with such a "large" firewall is rather #1570.
