qubesadmin/exc.py throws TypeError while initialising QubesException #3809

Open
techgeeknz opened this Issue Apr 9, 2018 · 4 comments

Qubes OS version:

Qubes release 4.0 (R4.0)

Affected component(s):

qvm-start, probably also other qubesadmin tools


Steps to reproduce the behavior:

  1. Fresh-install Qubes R4.0 with an LVM thin-provisioning metadata volume that is undersized. Expand the thin-provisioning pool to fill the LVM volume group.
  2. Create or start Qubes VMs until the metadata volume reaches its low-water mark.
  3. When the metadata volume is sufficiently full, qvm-start will crash:
Traceback (most recent call last):
  File "/usr/bin/qvm-start", line 5, in <module>
    sys.exit(main())
  File "/usr/lib/python3.5/site-packages/qubesadmin/tools/qvm_start.py", line 179, in main
    domain.start()
  File "/usr/lib/python3.5/site-packages/qubesadmin/vm/__init__.py", line 100, in start
    self.qubesd_call(self._method_dest, 'admin.vm.Start')
  File "/usr/lib/python3.5/site-packages/qubesadmin/base.py", line 68, in qubesd_call
    payload_stream)
  File "/usr/lib/python3.5/site-packages/qubesadmin/app.py", line 483, in qubesd_call
    return self._parse_qubesd_response(return_data)
  File "/usr/lib/python3.5/site-packages/qubesadmin/base.py", line 102, in _parse_qubesd_response
    raise exc_class(format_string, *args)
  File "/usr/lib/python3.5/site-packages/qubesadmin/exc.py", line 29, in __init__
    message_format % tuple(int(d) if d.isdigit() else d for d in args),
TypeError: not enough arguments for format string

Expected behavior:

qvm-start raises the appropriate exception and displays a meaningful error message.

Actual behavior:

exc.py raises a TypeError because the message text contains literal percent (%) signs, which Python's %-formatting interprets as conversion specifiers (see the traceback above).
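
The failure can be reproduced with plain Python: the % operator treats every unescaped % in the format string as a conversion specifier, so a message quoting LVM's percentage figures cannot be formatted against an empty argument tuple. A minimal sketch of the underlying behaviour (plain Python, not Qubes code):

# The LVM warning contains literal percent signs, which %-formatting
# tries to interpret as conversion specifiers.
message = "Remaining free space in metadata of thin pool is too low (83.30% >= 80.00%)."

try:
    message % ()    # roughly what exc.py does when args is empty
except TypeError as err:
    print(err)      # -> not enough arguments for format string

# Doubling the percent signs makes the text safe to %-format:
print(message.replace('%', '%%') % ())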

General notes:

I wrapped line 29 of exc.py within a try..except block:

        try:
            super(QubesException, self).__init__(
                message_format % tuple(int(d) if d.isdigit() else d for d in args),
                **kwargs)
        except TypeError:
            super(QubesException, self).__init__(
                repr((message_format, args)), **kwargs)

This resulted in the following message:

("b'  WARNING: Remaining free space in metadata of thin pool qubes_crypt/pool00 is too low (83.30% >= 80.00%). Resize is recommended.\\n  Cannot create new thin volume, free space in thin pool qubes_crypt/pool00 reached threshold.\\n'", ())

Related issues:

When 100% of the volume group's space is being provisioned, the installer should take steps to ensure the thin pool's metadata volume is appropriately sized. If the intention is to allow the thin pool data and metadata volumes to auto-expand as required, this should also be made clear in the interactive partitioner.

pdinoto commented Apr 11, 2018

I'm having a similar issue, described in this original post on the qubes-users mailing list. Will check.

pdinoto commented Apr 12, 2018

I can confirm that my issue was also related to the metadata of the default pool00 thin pool being full (>96%). In my case the result was a broken system: the volumes of all service VMs became unavailable, even after lvextend qubes_dom0/pool00_tmeta +L+256M and thin_repair qubes_dom0/pool00.

I suspect that exhausting the thin pool metadata while performing a qube restore/clone/move may corrupt the metadata. The volumes that became inaccessible are exactly those that were in active use when the issue appeared: all volumes of the qubes being restored, the -back and -snap volumes of running qubes, and for some reason all volumes of the sys-* qubes.

(Edited to correct a typo pointed out ⬇️ by @marmarek.)

marmarek commented Apr 12, 2018

even after lvextend qubes_dom0/pool00_tdata +L+256M and thin_repair qubes_dom0/pool00.

If that was about metadata, the right volume to extend is the "tmeta" one.

marmarek added a commit to marmarek/qubes-core-admin that referenced this issue Apr 13, 2018

storage/lvm: fix reporting lvm command error
Escape '%' in error message, as required by Admin API.

Fixes QubesOS/qubes-issues#3809
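
The gist of that change (doubling literal percent signs in free-form text before it is passed through the Admin API's printf-style error format) can be sketched as follows; this is illustrative and not the actual qubes-core-admin patch:

def escape_for_admin_api(text):
    # Hypothetical helper name. The Admin API error protocol treats the
    # message as a %-format string, so raw LVM output must have its
    # percent signs doubled before being reported.
    return text.replace('%', '%%')

lvm_warning = "free space in thin pool reached threshold (83.30% >= 80.00%)"
assert escape_for_admin_api(lvm_warning) % () == lvm_warning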

techgeeknz commented Apr 17, 2018

I expect the consequences of exhausting the tmeta volume are almost as severe as exhausting the tdata volume: some (but not all) write requests to the thin volumes stacked on top of it will be denied, leading to data loss and filesystem corruption. From the guest OS's perspective, it will look as if a disk media or controller failure has occurred.

It should also be noted, however, that the thin-provisioning issue is a separate bug, only superficially related to this one, which is about the software not returning a meaningful error message.
