switch ZFS volume backend to pyzfs / libzfs_core #1781

Open
wants to merge 27 commits into base: master

Conversation

avg-I
Contributor

@avg-I avg-I commented Jul 17, 2015

Please DO NOT MERGE yet.
At the moment this pull request and branch are for review only.

Some special steps are needed to build / test this branch (so please ignore failing builds).
The ZFS kernel module and userland components should be built and installed using the source code in the following branch: https://github.com/ClusterHQ/zfs/tree/libzfs_core-wip
pyzfs / libzfs_core has to be installed from the following branch: https://github.com/ClusterHQ/pyzfs/tree/pyzfs-0.2

See: https://clusterhq.atlassian.net/browse/FLOC-2683


Andriy Gapon added 14 commits July 17, 2015 19:07
This is a start of the conversion of the ZFS backend to libzfs_core /
pyzfs.
The helper wraps lzc functions but also provides a method to schedule
any function without creating a wrapper.

Currently the reactor's thread pool is used, but a dedicated pool could
be used instead.  Test reactors without a thread pool are also supported.
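
For illustration, a minimal sketch of such a wrapper (the class and method
names here are hypothetical, not necessarily the ones in this branch):

    import libzfs_core
    from twisted.internet import threads

    class _AsyncLZC(object):
        """Run blocking libzfs_core calls on the reactor's thread pool."""

        def __init__(self, reactor):
            self._reactor = reactor

        def call(self, func, *args, **kwargs):
            # Schedule any function on the thread pool; the returned
            # Deferred fires in the reactor thread with func's result.
            return threads.deferToThreadPool(
                self._reactor, self._reactor.getThreadPool(),
                func, *args, **kwargs)

        def lzc_snapshot(self, *args, **kwargs):
            # Example of a per-function wrapper over the generic call().
            return self.call(libzfs_core.lzc_snapshot, *args, **kwargs)

Usage would then look like _AsyncLZC(reactor).lzc_snapshot([b'hpool/fs@snap']).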
Also, remove all unit tests for snapshots.
The unit tests were based on the fact that the operations were performed via
the zfs command, so they checked that the command was spawned with the correct
arguments and that its output was correctly parsed.

To do: devise a way to test the new implementation
Note that lzc_create() never mounts the filesystem.
'zfs create' used to mount the filesystem, so now we have to do that
explicitly; at the moment that is done via 'zfs mount'.
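
Roughly, the sequence becomes (a sketch; 'hpool/fs' is a placeholder
dataset name):

    import subprocess
    import libzfs_core

    # lzc_create(), unlike 'zfs create', leaves the new filesystem
    # unmounted, so mount it explicitly afterwards.
    libzfs_core.lzc_create(b'hpool/fs')
    subprocess.check_call([b'zfs', b'mount', b'hpool/fs'])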
Note: lzc_send()/lzc_receive() do not close their end of the pipe
(the file descriptor, in general) when they are done writing to it.

Note: this API is synchronous at the moment, including a requirement
for synchronous cleanup of the file descriptors.
There are functional tests to verify that.
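
For example, a sketch of a sender that accounts for both points (the
function name is hypothetical):

    import os
    import threading
    import libzfs_core

    def send_snapshot(snapshot, out_file):
        read_fd, write_fd = os.pipe()

        def _send():
            try:
                # Blocks until the whole stream has been written.
                libzfs_core.lzc_send(snapshot, None, write_fd)
            finally:
                # lzc_send() never closes its end of the pipe; close it
                # ourselves so the read loop below sees EOF.
                os.close(write_fd)

        sender = threading.Thread(target=_send)
        sender.start()
        # Pump the stream data from the kernel to the destination.
        with os.fdopen(read_fd, 'rb') as stream:
            for chunk in iter(lambda: stream.read(65536), b''):
                out_file.write(chunk)
        sender.join()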
The properties involved are the readonly and mountpoint properties.
Note that one thing that all these operations have in common is that
they may require a dataset to be unmounted and/or mounted.
In theory two snapshots can have the same 'creation' time because its
resolution is one second.  In fact, I've seen a test fail because of that.
createtxg is always unique, because ZFS does not allow more than one
snapshot of the same filesystem to be created within the same transaction
group.
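
A sketch of ordering snapshots that way via the zfs CLI (-p prints
createtxg as a plain number, -H drops the header; the function name is
illustrative):

    import subprocess

    def snapshots_in_creation_order(filesystem):
        output = subprocess.check_output(
            [b'zfs', b'list', b'-t', b'snapshot', b'-H', b'-p',
             b'-o', b'name,createtxg', b'-r', filesystem])
        rows = [line.split(b'\t') for line in output.splitlines() if line]
        # createtxg is unique per filesystem, unlike one-second 'creation'.
        return [name for name, txg in sorted(rows, key=lambda r: int(r[1]))]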
Also, the internal _list_filesystems() method is changed to return an
iterator rather than a Deferred that produces an iterator.
It's not clear why we need the iterator at all, because the single user
of this method consumes the whole iterator and builds a set.
Note that we need to explicitly unmount the filesystem before
destroying it.
…core

_sync_command_error_squashed() became obsolete and was removed.
Its unit tests are removed as well.
'zfs umount' refuses to work on a ZFS filesystem if its mountpoint
property is set to legacy or none.  On the other hand, 'umount'
can handle all property settings.  So, prefer using the latter.

The ZFS backend itself never sets mountpoint to those values, but
there is at least one functional test case that sets up the required
scenario by directly fiddling with the property.

To implement the change, the zfs_command() function is split into
ext_command(), which can handle any external command, and a
zfs_command() convenience facade.
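
A sketch of that split (assuming Twisted's getProcessOutput; error
handling omitted):

    import os
    from twisted.internet.utils import getProcessOutput

    def ext_command(reactor, command, arguments):
        # Runs any external command; the Deferred fires with its stdout.
        return getProcessOutput(command, arguments, env=os.environ,
                                reactor=reactor)

    def zfs_command(reactor, arguments):
        # Convenience facade for the common case.
        return ext_command(reactor, b"zfs", arguments)

    # Unmounting now goes through umount(8), which copes with
    # mountpoint=legacy and mountpoint=none:
    #     d = ext_command(reactor, b"umount", [mount_path])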
@@ -7,11 +7,13 @@
from __future__ import absolute_import

import os
import libzfs_core
Contributor

I guess we'll need to think about how to specify this dependency.
The simplest thing would be to add it to requirements.txt, but first we'll have to make sure that libzfs_core is available from PyPI.
Perhaps it's better to have separate requirements.txt files, one for each of the Flocker dataset backend plugins.

Contributor

We currently pull in the whole requirements.txt in our "other" Linux flocker-cli instructions, which means that installing flocker-ca involves downloading all the AWS boto and OpenStack Nova / Cinder / Keystone libraries, and now libzfs_core and its dependencies... which seems excessive. I wonder how other projects deal with this?

Contributor

We could perhaps have a zfs-requirements.txt and corresponding separate OS packages you need to install, so e.g. clusterhq-flocker-zfs if you want the ZFS backend.

Contributor Author

Publishing libzfs_core on PyPI is the plan.
I think that we should do that after its review is completed.

@wallrj
Contributor

wallrj commented Aug 3, 2015

@avg-I I only got halfway through, but I left some comments on what I read.

I know that libzfs_core is intended to be a low-level wrapper. Perhaps this branch can contain a rough first version of higher-level Sync and Async APIs with just enough Zope Interface methods to support Flocker, along with unit and functional tests that exercise that API. Later that can be moved back into libzfs or into a separate external package.

@avg-I
Contributor Author

avg-I commented Aug 3, 2015

@wallrj your suggestion about having a higher level wrapper makes sense.
I think that I need some help designing it, so that it is sufficiently convenient both in general and for our ZFS backend specifically.

@itamarst
Contributor

itamarst commented Aug 5, 2015

IFilesystem was basically "what we can do as quickly as possible", yeah, not ideal. There's no particular reason it needs to stay this way.

@itamarst
Contributor

itamarst commented Aug 5, 2015

My general thought: if this consistently passes tests and basic comments are addressed, I don't think this needs to be perfect to be merged. The current code, after all, is not exactly a shining example of quality code. We do need to make sure we file issues in the appropriate epic for all the necessary improvements (error handling, the IFilesystem rewrite, etc.) to make this workable.

That being said, we do use this for our demo machines, so we need passing tests (unit, functional, and acceptance) for this to be merged. The acceptance builder is currently in Expected Failures, so you'll have to check it manually on Buildbot.

@itamarst
Contributor

itamarst commented Aug 5, 2015

Since we can't actually proceed with this (even given minimal quality requirements) until we've e.g. done the libzfs_core packaging, I'm going to move this back to Coding state.

except Exception as e:
    message = ZFS_ERROR(zfs_command="lzc_send " + snapshot,
                        output=str(e), status=e.errno)
    message.write(self.logger)
Contributor

I think Eliot's write_traceback probably produces better logging results in this case. "except Exception" basically means "I dunno what went wrong", so gathering as much information in the logs as possible makes sense. write_traceback should do a better job of that than this ZFS_ERROR.

Contributor Author

Actually only ZFSError is expected here (Exception is caught just in case, so as not to miss the clean-up) and that exception should have enough information. Adding the traceback may be a good idea anyway.

Generally, it might be a good idea to add Eliot action logging in the same universal wrapper that turns synchronous pyzfs calls into asynchronous ones. That way, all pyzfs calls and their results would get logged.
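
Something along these lines, say (just a sketch, assuming the generic wrapper method described earlier and Eliot's start_action; the action type name is made up):

    from eliot import start_action
    from twisted.internet import threads

    def call(self, func, *args, **kwargs):
        def _logged():
            # Runs in the worker thread; Eliot records the call and its
            # success or failure (with traceback) automatically.
            with start_action(action_type=u"flocker:volume:zfs:lzc_call",
                              function=func.__name__):
                return func(*args, **kwargs)
        return threads.deferToThreadPool(
            self._reactor, self._reactor.getThreadPool(), _logged)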

Contributor Author

Although, in this particular piece of code the synchronous API is used directly.

Contributor

Actually only ZFSError is expected here (Exception is caught just in case, so as not to miss the clean-up) and that exception should have enough information. Adding the traceback may be a good idea anyway.

I guess I'm not entirely sure what information it will have. I know that the exception that used to be handled like this included the output of the zfs child process, and that was effectively all of the useful information (even though it excluded some Python stack frames having to do with the implementation of running child processes). Now the exception comes from some cffi-based code and I'm less sure the traceback is irrelevant. It still could be, but I'd like to err on the side of seeing it until we're more confident about that.

Also, mostly separately, on this re-read of the code, I think that this function is being changed so that it no longer actually fails if the zfs send operation has failed.

Prior to the change, if the zfs child process exited with an error code, I think process.wait() would have raised an exception and that would have caused this reader function to raise an exception.

With the new version of the code, if lzc_send raises an exception then it gets logged and reader returns normally, indicating to the caller that the necessary data has been generated. I think the right thing to do might be to just delete all of this exception handling code and let the exception propagate to the caller of reader, as it did before?

Contributor Author

My understanding is that Popen.wait() returns a status code and does not raise an exception when the status is not success. In fact, I experimented with letting the lzc_send exceptions through, and that caused many ZFS functional tests to fail.

Contributor

Yep, you're right. My mistake. So this code was terribly broken before. It'll still be terribly broken in just the same way, now. But I guess the point of this PR isn't to make the code work, just to make it use pyzfs. 😄

So I guess you can ignore the comments about this exception handling and someone will need to come fix this (and the other issues) later.

Contributor Author

And we probably want to change the IFilesystem reader and writer interfaces as well...

Andriy Gapon added 7 commits August 14, 2015 13:21
deferLater(), unlike maybeDeferred(), ensures that the reactor is
actually running.  This is required by the new ZFS backend, because
it uses the reactor's thread pool to run lzc_send() and lzc_receive()
in separate threads.  That has to be done because those calls lend
their thread's context to the kernel for doing the actual send / receive
work.  The main thread pumps the stream data between userspace
and the kernel.
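
A sketch of the pattern (run_send is a hypothetical stand-in):

    from twisted.internet import reactor
    from twisted.internet.task import deferLater

    def run_send():
        # Hypothetical stand-in for the code driving lzc_send() on the
        # reactor's thread pool.
        pass

    # maybeDeferred(run_send) would call run_send() immediately, even if
    # the reactor is not running yet; deferLater() schedules the call
    # through the reactor, so the thread pool is guaranteed to be up.
    d = deferLater(reactor, 0, run_send)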
@ci-jenkins-clusterhq

Can someone verify this PR?

Andriy Gapon added 2 commits October 30, 2015 13:34
The latter cannot automatically derive a snapshot name from
the given stream.
See ZFS-30 for more details.