Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Variant unpacker #27

Open
LEW21 opened this issue Sep 27, 2016 · 18 comments
Open

Fix Variant unpacker #27

LEW21 opened this issue Sep 27, 2016 · 18 comments

Comments

@LEW21
Copy link
Owner

LEW21 commented Sep 27, 2016

GLib.Variant.unpack: https://github.com/GNOME/pygobject/blob/master/gi/overrides/GLib.py#L284

Possibly a patch against pygi, but can also be a pydbus's own function.

@LEW21
Copy link
Owner Author

LEW21 commented Sep 27, 2016

  • Unpack 's' to unicodes on Python 2? No idea. I guess it's simpler to have a single codebase when assuming that unicodes don't exist.

@LEW21
Copy link
Owner Author

LEW21 commented Sep 27, 2016

Note: Patch against pygi is not enough, I can't depend on the latest version of pygi as it's not on pypi.

@LEW21
Copy link
Owner Author

LEW21 commented Sep 27, 2016

Note: If I decide to auto-remove trailing \0 from ay when unpacking, I'll need to "repack" all GLib.Variants received from the user - to add trailing \0 (as changing the behavior of the constructor seems evil, and monkeypatching this change is even more evil)

@smcv
Copy link

smcv commented Sep 28, 2016

For context, I am the maintainer of dbus-python, or possibly the unmaintainer. This mostly involves rejecting changes that would make it lose its single most important feature, namely "is compatible with dbus-python". I try to encourage all dbus-python users to use something better instead - perhaps g-i or pydbus.

If I was writing a Python D-Bus binding now, with no compatibility constraints, I think I'd be making all D-Bus types (or at least all types without a perfect match in Python) unmarshal into GVariant-like objects rather than the Python primitives that they superficially resemble, and just giving them the appropriate methods (including __magic__ operator-overloading if required) to make them behave more Pythonically. Indeed, perhaps there's scope for adding __magic__ to the g-i overrides for GVariant to make GVariants behave more Pythonically, like making arrays nicely iterable if they aren't already.

currently [ay are] unpacked to a list of ints, which is always wrong

dbus-python also unpacks ay to a dbus.Array of dbus.Byte (subtypes of: a list of ints). This is consistent with the handling of all other arrays and the handling of lone bytes, but it is never what anyone wants in practice. I think a special case returning either bytes or a pydbus-specific object would be the pragmatic thing to do, similar to how a{xy} becomes a dict, not a list of dict-entry objects.

dbus-python also has a byte_arrays=True flag you can set on various APIs to make them return dbus.ByteArray objects instead (they are a subtype of bytes, if I remember correctly).

Unpack 's' to unicodes on Python 2?

dbus-python does that (its dbus.String is a subtype of unicode in both Python 2 and Python 3). In Python 2, it has a flag utf8_strings=True which you can set on various APIs to make them return dbus.UTF8String, a subtype of str. In Python 3 that flag makes no sense and I don't think it's available any more.

D-Bus type s is guaranteed to be Unicode (on the wire: UTF-8) and not contain any \0.

If I decide to auto-remove trailing \0 from ay when unpacking

You can't. When a generic D-Bus library receives an ay from D-Bus, there is no way to know whether it is intended to be a bytestring (in which case removing the trailing \0 is helpful), or whether it is an opaque blob of data like a JPEG or something (in which case removing a trailing \0 would be harmful, even if the opaque blob of data happens to contain exactly one \0, at the end).

Higher-level things that actually understand the semantics of what they receive - for instance something that uses Flatpak - can make the decision to call `.rstrip(b'\0'); but a generic D-Bus library like pydbus only understands the syntax of the message content, not the semantics, so it can't do that.

In a Python 3 way of thinking, I would personally be inclined to return a bytes with the trailing \0, or possibly a subtype of bytes with a to_bytestring() method that does the same thing as g_variant_get_bytestring():

class ByteArray(bytes):
    def to_bytestring(self):
        if len(self) > 0 and self[-1] == b'\0':
            bytestring = self[:-1]
            if b'\0' not in bytestring:
                return b''
        return b''

@LEW21
Copy link
Owner Author

LEW21 commented Sep 28, 2016

pydbus is a Pythonic dbus library. This means, that its primary goal is to be as intuitive, and standard-python-like, as possible. To hide as much of DBus's implementation details as possible, and present native Python interface - familiar to web developers and other Python users, who do not necessarily have any C / DBus experience.

In terms of features, this means:

  • pydbus uses standard Python types everywhere where it's possible without guessing. ("In the face of ambiguity, refuse the temptation to guess.") That's why all the types we know from introspection data are translated automatically, but if a method has a variant argument - we force the user to provide GLib.Variant. And that's also why we shouldn't unpack nested variants - to be symmetrical.
  • I don't really want millions of switches that modify the behavior; looks like it is possible to pick defaults fitting 98% of cases, and "Special cases aren't special enough to break the rules."
  • Forcing people to read docs and append "\0" in some cases is a no-go. It might be natural in C, but it's possibly the most unintuitive thing in the world in Python.

pydbus is designed to be the highest-level interface to DBus services; for lower level access there is PyGI. Implementing higher-level, per-interface wrappers should not be necessary. It can be used in code, but it's also awesome (and probably the best) for interacting with DBus thru the interactive console.

Looks like the DBus has standarized on '\0'-terminated binary strings. Do you know any APIs that provide non-'\0'-terminated "ay"s? If not, then I think it's time to standarize that "ay"s shall ALWAYS have trailing '\0', and ALWAYS ignore the last character (and throw an exception if it's not \0). This way, the Flatpak case works perfectly, and people can still send true binary data (containing multiple \0) - it'll simply get another \0 appended at the transport layer, and then stripped by the bindings on the other side.

I guess I should collaborate on it with authors of bindings for other languages.

@smcv
Copy link

smcv commented Sep 28, 2016

Looks like the DBus has standarized on '\0'-terminated binary strings. Do you know any APIs that provide non-'\0'-terminated "ay"s?

Yes: every byte array that is not "basically a string", and some byte arrays that are strings too.

For an example of the former, the Capabilities property in BlueZ takes an arbitrary binary blob, presumably as defined by some Bluetooth specification. I doubt you can unilaterally put an extra zero on the end and expect it to still work.

# bluez-5.36/test/simple-endpoint

#Channel Modes: Mono DualChannel Stereo JointStereo
#Frequencies: 16Khz 32Khz 44.1Khz 48Khz
#Subbands: 4 8
#Blocks: 4 8 12 16
#Bitpool Range: 2-64
SBC_CAPABILITIES = dbus.Array([dbus.Byte(0xff), dbus.Byte(0xff), dbus.Byte(2), dbus.Byte(64)])
# JointStereo 44.1Khz Subbands: Blocks: 16 Bitpool Range: 2-32
SBC_CONFIGURATION = dbus.Array([dbus.Byte(0x21), dbus.Byte(0x15), dbus.Byte(2), dbus.Byte(32)])

...

#Channel Modes: Mono DualChannel Stereo JointStereo
#Frequencies: 32Khz 44.1Khz 48Khz
#CRC: YES
#Layer: 3
#Bit Rate: All except Free format
#VBR: Yes
#Payload Format: RFC-2250
MP3_CAPABILITIES = dbus.Array([dbus.Byte(0x3f), dbus.Byte(0x07), dbus.Byte(0xff), dbus.Byte(0xfe)])
# JointStereo 44.1Khz Layer: 3 Bit Rate: VBR Format: RFC-2250
MP3_CONFIGURATION = dbus.Array([dbus.Byte(0x21), dbus.Byte(0x02), dbus.Byte(0x00), dbus.Byte(0x80)])

For an example of the latter, org.freedesktop.DBus.GetConnectionSELinuxSecurityContext() returns a byte-array with no trailing '\0'.

@smcv
Copy link

smcv commented Sep 28, 2016

If you want to coordinate among binding and library authors, please (encourage them all to) contact dbus@lists.freedesktop.org.

Implementing higher-level, per-interface wrappers should not be necessary.

I can understand the appeal of thinking this, because I used to think the same: who needs domain-specific C/Python/whatever APIs when you can just provide a D-Bus API and use each language's generic bindings?

In my experience, the answer turns out to be that this results in a lot of sharp edges, and an API that can't be used in practice without at least some understanding of D-Bus. You can have a convenient high-level API for some particular domain/problem that can be used without understanding D-Bus, or you can have generic bindings for D-Bus, but I have yet to encounter anything that achieves both.

Looks like the DBus has standarized on '\0'-terminated binary strings

I think the way to think about this is: it's a common convention, and ideally that common convention would be easy, but it can't be done at the expense of the other possible uses for byte-arrays. Not every byte-array is a bytestring.

Saying that D-Bus has standardized on the ay type containing '\0'-terminated strings is like saying that Unix has standardized on files containing lines of text with \n at the end of each line, or saying that Linux has standardized on filenames being UTF-8. We can encourage that, we can make it the default in new designs, and we can optimize APIs for it; but if Python didn't allow reading binary files, or didn't let you distinguish between a text file that has \n as its last byte and a text file that doesn't, or didn't let you delete a file whose name isn't UTF-8, you'd be right to complain and consider it to be a bug.

In fact I'm fairly sure the reason Flatpak is using ay instead of s is because of exactly the invalid assumptions that I mentioned. If we could assume that Unix filenames were UTF-8 and streams were text, then we could have APIs like RunCommand(s: command, s: stdin) → i: exit_status, s: stdout, s: stderr and everything would be simple: but filenames (and hence commands) aren't necessarily UTF-8, and streams aren't necessarily text, so a fully-general API to run commands is more like RunCommand(ay: command_plus_0, ay: stdin) → i: exit_status, ay: stdout, ay: stderr.

@LEW21
Copy link
Owner Author

LEW21 commented Sep 28, 2016

Hm... Maybe we could create a new DBus annotation saying something like "treat all arguments of this method as \0-terminated bytestrings"? This wouldn't break anything, and high-level libraries will be happy.

However, I still think that there is no reason to send \0-terminated strings over the wire, and this \0 could be added transparently by gdbus in g_variant_get_bytestring() and removed in g_variant_new_bytestring(). Of course there might be compatibility problems here, but we could deprecate those functions and add new ones.

Hm... Annotation looks like less work, but it'll have to be implemented in all the high-level libraries (or at least those that want to be nice).

@smcv
Copy link

smcv commented Sep 29, 2016

I still think that there is no reason to send \0-terminated strings over the wire

Whether you consider the reasons to be sufficiently good or not, you're too late to have this not happen in existing APIs.

this \0 could be added transparently by gdbus in g_variant_get_bytestring() and removed in g_variant_new_bytestring()

GVariant cannot add '\0' to bytestrings without copying those bytestrings into newly allocated memory, which a caller would then have to arrange to free at an appropriate time. There is no space in which to put the '\0': if alignment constraints allow it, the byte array is immediately followed by the next item in the message.

I know you don't care about this for pydbus, because Pythonic bindings for C code are always going to have to copy strings into str or bytes objects anyway; but as far as I'm aware, never needing to copy data out of the GVariant (for C/C++) was an explicit design goal. One of the design issues in the D-Bus 1.0 protocol and reference implementation is that they result in memcpy()ing all content several times, which is a significant performance constraint. The people behind things like kdbus and gvdb, which also use GVariant, specifically wanted to avoid that.

@mvidner
Copy link

mvidner commented Sep 29, 2016

ruby-dbus here. As Simon has already said, an ay in general is an "anything goes" byte blob. An annotation seems like a reasonable way to customize bindings for special byte arrays.

@diwic
Copy link

diwic commented Oct 5, 2016

FWIW, Rust does not use zero-terminated strings and thus has no use for the extra zero anyway. Also, for now, we only know of two places where this happens so I doubt it's commonly used.

If the D-Bus specification grows some kind of annotation about autoremoval/autoadd of \0 then I'll consider supporting that in my bindings - until then, it's trivial for whoever uses dbus-rs to do that add/removal manually. It's even easier than it would be for me, considering that these zero-terminated arrays may be inside an arbitrary level of containers.

@jeanparpaillon
Copy link

My raw opinion: given that D-Bus already describes array-length, I don't see any reason for using \0-terminated strings for serializing strings.
Has a bug already been opened on GDbus project ?

@smcv
Copy link

smcv commented Oct 21, 2016

Has a bug already been opened on GDbus project ?

If it was, it would be closed NOTABUG. This is not a bug in GDBus. It is a convention encouraged (but not required) by GVariant, that has subsequently been adopted by the designers of several D-Bus APIs.

I don't see any reason for using \0-terminated strings for serializing strings

That would be because you don't write in C, or you write in C but you don't care about the performance impact of copying every bytestring one more time. Those are valid arguments in your own D-Bus designs, but you don't get to change other people's D-Bus API designs when their position on a particular trade-off doesn't match yours.

@LEW21
Copy link
Owner Author

LEW21 commented Oct 21, 2016

I've looked at GDBus, and it looks like they're copying the values while converting between messages and GVariants.

https://github.com/GNOME/glib/blob/master/gio/gdbusmessage.c#L1558 (g_variant_new_string copies the data, as it uses g_bytes_new internally)

@LEW21
Copy link
Owner Author

LEW21 commented Oct 21, 2016

(That was for strings, for 'ay's (gdbusmessage.c#L1687) g_variant_new_fixed_array is used, which also copies the data)

@smcv
Copy link

smcv commented Oct 21, 2016

I've looked at GDBus, and it looks like they're copying the values while converting between messages and GVariants

Indeed, GDBus isn't zero-copy in this way (but GVariant is; and in libdbus, access to strings that use the GVariant bytestring convention from a DBusMessage can be).

At one point there was a plan for GVariant to be D-Bus message serialization 2.0, although I'm not sure what happened to that.

@LEW21
Copy link
Owner Author

LEW21 commented Oct 21, 2016

AFAIR kdbus used GVariant's serialization, but it's dead now.

Looks like we can safely change GVariant's in-memory representation to always include trailing \0 in "ay"s, and possibly create GVariant serialization format 2.0. GDBus apps won't get slower, and we won't need the bytestring convention on the bus.

@limejet
Copy link

limejet commented May 24, 2018

What's the status on this issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants