Storage Subsystem Design R4 #1842

Closed
kalkin opened this Issue Mar 16, 2016 · 14 comments

kalkin commented Mar 16, 2016

Design

This is a design proposal for the storage subsystem for QubesOS R4.0. For previous discussions, see also the PR: Allow defining and using custom Storage types.

Requirements

  • Multiple storage implementations, e.g. file based (the current storage method), LVM thin pools, Btrfs, ZFS ...
  • 3rd parties can develop custom storage implementations
  • Use entry points for discovery of available storage backends (#1238); see the sketch after this list
  • Be independent enough to work on its own in a separate storage domain
  • The AppVM needs to be able to access a read-only root image of the TemplateVM, so it can verify it.
  • Import/export storage volumes between pools
  • Domains can use different pools for different StorageVolumes
  • Differentiate between permanently and temporarily attached devices
  • Handle removable devices (or should we instead only persist the permanently attached ones?)
  • A Pool should know all the devices it handles.
  • Add Volume.verify()
  • Update the interfaces documented here
  • Revert, and list revert points
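
A sketch of how the entry-point discovery could work; the group name 'qubes.storage' and the example registration are illustrative assumptions, not part of this proposal:

import pkg_resources

def discover_pool_classes():
    ''' Return a mapping of backend name -> Pool class for every storage
        backend installed on the system. A third party would register its
        backend in setup.py, e.g.:
        entry_points={'qubes.storage': ['zfs = qubes_storage_zfs:ZfsPool']} '''
    return {entry.name: entry.load()
            for entry in pkg_resources.iter_entry_points('qubes.storage')}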

Volume

Encapsulates all data about a volume for serialization to qubes.xml and libvirt config.

Interface

class Volume:
    devtype = 'disk'  # used for libvirt's <disk device="devtype">
    domain = None  # None currently means the same as dom0
    name = 'root'  # (root|private|volatile|linux-kernel|foo)
    pool = 'qubes_dom0'  # pool name
    rw = True
    script = None  # libvirt's <disk><script /></disk>
    size = 0
    usage = 0
    vid = None  # some kind of id; must be unique within a pool. XenStorage uses the path for that
    volume_type = 'read-write'  # (read-write|read-only|origin|snapshot|volatile)

    @property
    def config(self):
        ''' return config data for serialization to qubes.xml as (<volume_config>) '''
        return {'name': self.name,
                'pool': self.pool,
                'volume_type': self.volume_type}

    def block_device(self):
        ''' Return :py:class:`qubes.devices.BlockDevice` for serialization in
            the libvirt XML template as <disk>.
        '''
        return BlockDevice(self.path, self.name, self.script, self.rw,
                           self.domain, self.devtype)
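
For illustration, a minimal pool-specific implementation of this interface could look as follows; FileVolume, the path layout and the rw derivation are assumptions, not part of the proposal:

class FileVolume(Volume):
    def __init__(self, name, pool, volume_type, size=0):
        self.name = name
        self.pool = pool
        self.volume_type = volume_type
        self.size = size
        self.rw = volume_type in ('read-write', 'origin', 'volatile')
        # XenStorage-style: the image path doubles as the pool-unique vid
        self.path = '/var/lib/qubes/{}/{}.img'.format(pool, name)
        self.vid = self.path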

Pool

A Pool is used to manage different kinds of volumes (file based/LVM/Btrfs/...).

Interface

3rd parties providing their own storage implementations need to implement the following interface.

class Pool:
    def __init__(self, vm, name, **kwargs)  # TODO: refactor vm out
    def init_volume(self, volume_config)  # creates a Volume object from the given volume_config
    def create(self, volume)  # creates the volume on disk
    def commit_template_changes(self, volume)
    def clone(self, source_volume, target_volume)
    def rename(self, volume, old_name, new_name)
    def start(self, volume)
    def stop(self, volume)

    @property
    def volumes(self):  # all volumes managed by this pool, TODO: implement this
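
For illustration, a skeletal third-party pool implementing this interface (DummyPool and its internals are hypothetical; FileVolume is the sketch from the Volume section above):

class DummyPool(Pool):
    def __init__(self, vm, name, **kwargs):
        self.name = name
        self._volumes = []

    def init_volume(self, volume_config):
        volume = FileVolume(**volume_config)
        self._volumes.append(volume)
        return volume

    def create(self, volume):
        pass  # allocate the backing image/LV/dataset on disk here

    @property
    def volumes(self):
        return self._volumes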

Storage

The Storage class provides management methods for a domain's volumes. The methods are called at the appropriate points in the domain's life cycle. Currently it lives in qubes/storage/__init__.py, but I'm considering moving it somewhere else, or making it a part of QubesVM, because most of the methods just iterate over self.vm.volumes and execute a method. See also my current Storage version.

Interface

class Storage:
    def __init__(self, vm):
        for conf in vm.volumes_config:  # simplified!
            vm.volumes[conf['name']] = pool.init_volume(conf)

    def kernels_dir(self)  # backward compatibility, will be removed

    def create_on_disk(self):  # create volumes on disk, TODO: rename to create()
        old_umask = os.umask(0o002)  # assumed: the save matching the restore below was missing
        os.makedirs(self.vm.dir_path)
        for volume in self.vm.volumes.values():
            self.get_pool(volume).create(volume)

        os.umask(old_umask)

    def commit_template_changes(self):
        for v in self.vm.volumes.values():
            if v.volume_type == 'origin':
                self.get_pool(v).commit_template_changes(v)

    def clone(self, src_vm)
    def rename(self, old_name, new_name)
    def start(self)
    def stop(self)
    def verify_files(self)  # currently just does some sanity checks

    def get_pool(self, volume):  # helper method
        return qubes.storage.get_pool(volume, self.vm)
        # The above will be replaced, once vm is refactored out of the pool, with:
        # return qubes.storage.get_pool(volume, vm.pool_config[volume.pool])
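
For orientation, the intended call sequence could look roughly like this (add_new_vm and its arguments are assumptions about the surrounding core3 API, not part of this design):

vm = app.add_new_vm(qubes.vm.appvm.AppVM, name='work', template=template)
vm.storage.create_on_disk()  # volumes get created in their pools
vm.storage.start()           # volumes prepared before domain startup
vm.storage.stop()            # volumes torn down after domain shutdown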

Further Details

  • QubesVM.volume_config would contain a dict {'volume_name': {config}} from the XML configuration for the current domain; an illustrative shape is sketched after this list.
  • QubesVM.volumes is a dict containing {'root_img': Volume, 'private_img': Volume, ...}
  • Qubes.pool_config contains the pool config parsed from qubes.xml. Qubes.pool_config will be replaced with Qubes.pools, containing *Pool objects.
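
An illustrative shape of QubesVM.volume_config (keys and values mirror the <volume/> attributes in qubes.xml):

volume_config = {
    'root':    {'name': 'root',    'pool': 'default', 'volume_type': 'snapshot'},
    'private': {'name': 'private', 'pool': 'mypool',  'volume_type': 'read-write'},
}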

Open Questions

  • What should be the parameter to Volume.resize()? There might be volume implementations which have strategies for shrinking. Should resize also accept sizes smaller than the current one? Or should we even have separate extend() and shrink() methods? One possible shape is sketched below.
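
A purely illustrative sketch of one answer to the question above:

class Volume:
    ...  # attributes as documented above

    def resize(self, size):
        ''' Grow this volume to `size` bytes; refuse to shrink unless the
            implementation has an explicit shrinking strategy. '''
        if size < self.size:
            raise NotImplementedError('this volume type cannot shrink')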

EDITS:

  • 2016-03-17 Change signature of Storage.import()
  • 2016-04-10 Document the current API

@kalkin kalkin changed the title Storage Subsystem Design R4 WIP: Storage Subsystem Design R4 Mar 16, 2016

@kalkin kalkin changed the title WIP: Storage Subsystem Design R4 Storage Subsystem Design R4 Mar 17, 2016

@andrewdavidwong andrewdavidwong added this to the Release 4.0 milestone Apr 6, 2016

marmarek commented Apr 10, 2016

Could you remind me why Storage.get_pool(volume), and not simply Volume.pool (an object reference, not its name)?
What does the volume life cycle look like? For example qvm-create + qvm-start:

  1. During qvm-create - at the first instantiation of the QubesVM object - what state is in vm.storage? And what is in vm.volume_config (is there such a thing)?
  2. After vm.create_on_disk - the same question.
  3. After app.save() - how will storage-related information look in qubes.xml?
  4. After loading of qubes.xml - how does each of those objects get loaded? (I assume the final state should be exactly the same as in step 2.)
  5. During domain startup - what functions get called and by whom?
kalkin commented Apr 10, 2016

Could you remind me why Storage.get_pool(volume), and not simply Volume.pool (an object reference, not its name)?

Yes object reference is better. I should switch to that.

What does the volume life cycle look like?

  • AppVM and TemplateVM have a default volume_config set
  • QubesVM.__init__(self, app, xml, volume_config={}, **kwargs) updates the volume config:
    • If XML is passed, parse XPath(domain/volume-config/volume) and update self.volume_config
    • If a volume_config parameter was passed, update self.volume_config
  • On QubesVM.on_domain_init_loaded() the storage is initialized. Storage.__init__(vm) initializes the volumes from the volume_config:
        if hasattr(self.vm, 'volume_config'):
            for name, conf in self.vm.volume_config.items():
                pool = self.get_pool(conf['pool'])
                self.vm.volumes[name] = pool.init_volume(conf)

For example qvm-create

  • QubesVM.create_on_disk(source_template) calls Storage.create_on_disk(source_template) (see the design above). At this point the image files are written to disk.

Example qvm-start

  • QubesVM.start() calls Storage.start() which iterates through all volumes of the vm and calls *Pool.start():
        for volume in self.vm.volumes.values():
            self.get_pool(volume).start(volume)
  1. During qvm-create - at the first instantiation of the QubesVM object - what state is in vm.storage?

No state at all. It's just nicer not to have all this logic in QubesVM. As I already mentioned, maybe Storage should be merged with QubesVM.

And what is in vm.volume_config (is there such a thing)?

Yes, there is such a thing. See the default values for AppVM and for TemplateVM. These are updated as I explained above.

2. After vm.create_on_disk - the same question.

Nothing changes.

3. After app.save() - how will storage-related information look in qubes.xml?

<domain>
...
  <volume-config>
    <volume name="root" pool="default" volume_type="snapshot"/>
    <volume name="volatile" pool="default" volume_type="volatile"/>
    <volume name="private" pool="mypool" volume_type="read-write"/>
    <volume name="kernel" pool="linux-kernel" volume_type="read-only"/>
  </volume-config>
...
</domain>
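
For illustration, loading this back into volume_config could look roughly like this (an lxml-based sketch, not the actual loader; domain_xml stands for the parsed <domain> element):

volume_config = {}
for node in domain_xml.xpath('./volume-config/volume'):
    conf = dict(node.attrib)  # name, pool, volume_type, ...
    volume_config[conf['name']] = conf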

4. After loading of qubes.xml - how does each of those objects get loaded? (I assume the final state should be exactly the same as in step 2.)

  1. QubesVM.__init__ parses the volume_config from the XML and updates the default volume_config. This is also how an AppVM could have additional volumes which were assigned to it, e.g. via qvm-block.
  2. QubesVM.on_domain_init_loaded() initializes the storage and the pools (the latter only currently, because of the need to pass a vm), and initializes the volumes.

5. During domain startup - what functions get called and by whom?

  1. QubesVM.start() calls Storage.start(), which calls *Pool.start(volume) for each volume. At this point the volatile volume is reset, the LVM pool would take a snapshot of the origin for an AppVM's root, or any other pool-implementation-specific thing happens.
  2. The QubesVM.block_devices() property is called by the jinja2 template. It returns [v.block_device() for v in self.volumes.values()]. BlockDevice contains all the data that needs to be written to the libvirt XML.
marmarek commented Apr 10, 2016

Ok, now it is much clearer to me. A few more questions:

  1. How does Volume.vid get initialized and then saved? Is it simply missing in the above example XML? Or maybe it is calculated dynamically by vm.volumes[name] = pool.init_volume(conf)?
  2. How does the root volume of an AppVM know which template's root image should be used? Is that done by pool.init_volume(conf) too (how?), or maybe by some other method? The same question for the kernel volume (how does it get vm.kernel)?
  3. Will vm1.volumes['root'] be different from vm2.volumes['root'] (having the same template)? Will it be different from template.volumes['root'] (I guess so)? How about its vid? Will it be the same, but differ in the volume_type attribute?
  4. I assume that for a template switch, vm.volumes['root'] will be replaced by an appropriate event handler (property-set:template event).

kalkin commented Apr 10, 2016

How does Volume.vid get initialized and then saved? Is it simply missing in the above example XML?

This depends on the pool implementation. XenPool does not need it: it only needs the vm type, the vm name and the volume name to locate the volume. Maybe in the future, but currently we don't need it.

Or maybe it is calculated dynamically by vm.volumes[name] = pool.init_volume(conf)?

Yes, it is. In the Xen pool implementation Volume.vid is set to Volume.path, but it is not saved because it is not needed.

How does the root volume of an AppVM know which template's root image should be used?

This is implementation specific. Here is how it looks for XenPool: it instantiates a XenPool for the TemplateVM of the AppVM to get the directory where the original image is. Now that I've explained it, this sounds really hacky 😟

The same question for the kernel volume (how does it get vm.kernel)?

Currently, because we always pass the vm object to the pool, it just picks it up from self.vm.kernel 😊. I can change the kernel pool so that it takes the default kernel version on creation and saves it as the vid if no other vid is provided.

Will vm1.volumes['root'] be different from vm2.volumes['root'] (having the same template)?

Will it be the same Volume instance? - No.
Will it be the same device/file? - That is implementation specific.

Will it be different from template.volumes['root'] (I guess so)? How about its vid? Will it be the same, but differ in the volume_type attribute?

Yes, an AppVM would have a volume named root of volume_type snapshot, while the template would have a volume named root of volume_type origin.

Generally a vid would be something unique for the pool. As I explained above, you rarely need the vid if you get a vm object passed.

I assume that for a template switch, vm.volumes['root'] will be replaced by an appropriate event handler (property-set:template event).

I must admit I have not thought this through, because if you get passed the vm you can always grab the template and do "autodiscovery", but if we make Pool independent of the vm, we can use an event handler.

kalkin commented Apr 10, 2016

Generally a vid would be something unique for the pool

I'm sorry, not unique! A vid is just something that makes sense for the pool. E.g. an LvmPoolSnapshotVolume would have its vid set to the name of the LvmOriginalVolume. So multiple AppVMs would have a root config with the same vid. This doesn't matter, because the pool logic will make sure that a 'snapshot' volume_type is always mounted read-only and will not change.
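
For illustration, that vid behaviour could look like this (LvmPool, LvmVolume and the helper method are hypothetical names, not actual code):

class LvmPool(Pool):
    def init_volume(self, volume_config):
        volume = LvmVolume(**volume_config)
        if volume.volume_type == 'snapshot':
            # all snapshots of one template share the origin's vid; the pool
            # mounts 'snapshot' volumes read-only, so the shared vid is safe
            volume.vid = self._origin_lv_name(volume)  # hypothetical helper
        return volume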

marmarek commented Apr 10, 2016

This is implementation specific. Here is how it looks for XenPool: it instantiates a XenPool for the TemplateVM of the AppVM to get the directory where the original image is. Now that I've explained it, this sounds really hacky 😟

IMO Pool should not have a vm reference; instead, the vm should be passed to Pool.init_volume. Then, depending on the implementation, the Volume instance may keep the vm reference, or may drop it.
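
A sketch of the proposed change (illustrative only):

class Pool:
    def __init__(self, name, **kwargs):  # no vm reference anymore
        self.name = name

    def init_volume(self, vm, volume_config):
        ''' vm is available while the Volume is constructed; the resulting
            Volume may keep the reference or drop it, per implementation. '''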

kalkin commented Apr 10, 2016

IMO Pool should not have a vm reference; instead, the vm should be passed to Pool.init_volume. Then, depending on the implementation, the Volume instance may keep the vm reference, or may drop it.

Sounds reasonable.

marmarek added a commit to marmarek/old-qubes-core-admin that referenced this issue May 20, 2016

qubes/storage: fix getting origin pool
volume_config parameter is about target volume, not origin.

QubesOS/qubes-issues#1842

marmarek added a commit to marmarek/old-qubes-core-admin that referenced this issue May 21, 2016

storage/file: simplify search for origin volume of snapshot/read-only
Check directly vm.template, throwing AttributeError when not found.
There may be some value in converting it to more descriptive error, but
since that's mostly for internal users (not user facing actions) don't
bother for now.

QubesOS/qubes-issues#1842
marmarek commented May 21, 2016

Design update proposal:

Add Volume.verify, then call it from Storage.verify_files (a sketch follows below). And do not check the presence of files/images/whatever anywhere else (for example on volume instantiation).

Rationale:

It should be possible to have a full QubesVM object without the actual files being in place. Of course such a VM can't be started etc. The use case I care about most here is restoring from backup: it consists of loading qubes.xml from the backup, then picking some of the VMs, and only then restoring the files. Possibly changing some properties in the meantime (for example kernel). So the full qubes.xml loading must succeed without the VM files being in place.

The other use case is a crash (or even a user error) resulting in missing VM files. A missing file of one VM should not block all of them (which is the case when an exception is raised during qubes.xml loading).

Right now I've tripped over the kernel pool: https://github.com/woju/qubes-core-admin/blob/core3-devel/qubes/storage/kernels.py#L61-L63
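
A minimal sketch of the proposal (the placement of the methods is assumed from the design above):

class Storage:
    def verify_files(self):
        ''' Called explicitly (e.g. before domain startup), never during
            qubes.xml loading, so a missing image cannot break loading. '''
        for volume in self.vm.volumes.values():
            volume.verify()  # each Volume checks its own backing storage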

kalkin commented May 22, 2016

@marmarek Sounds reasonable

kalkin commented May 22, 2016

marmarek commented May 22, 2016

Yes, exactly :)


andrewdavidwong added a commit that referenced this issue May 31, 2016

woju added a commit to woju/qubes-core-admin that referenced this issue Jun 2, 2016

qubes/storage: fix getting origin pool
volume_config parameter is about target volume, not origin.

QubesOS/qubes-issues#1842

woju added a commit to woju/qubes-core-admin that referenced this issue Jun 2, 2016

storage/file: simplify search for origin volume of snapshot/read-only
Check directly vm.template, throwing AttributeError when not found.
There may be some value in converting it to more descriptive error, but
since that's mostly for internal users (not user facing actions) don't
bother for now.

QubesOS/qubes-issues#1842
andrewdavidwong commented Jun 28, 2016

Does per-VM encryption fall under this issue? (QubesOS/qubes-issues#1293 (comment))

kalkin commented Jun 30, 2016

@andrewdavidwong Theoretically it should be possible with a custom storage implementation, but the possibility to encrypt the volatile image could even be added to the current storage implementations.

marmarek commented Mar 30, 2018

@marmarek marmarek closed this Mar 30, 2018

andrewdavidwong added a commit that referenced this issue Aug 10, 2018
