-
Notifications
You must be signed in to change notification settings - Fork 58
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Combine cboar and cbor2, close #47 #51
Conversation
Also remove shareable_index argument
Move the bulk of the encoding logic into the encoding class alongside the other encoding methods
The C-module is written against the 3.3+ API. We could skip testing this and just declare it's 3.4+ (it'd certainly be easier given all the hoops one has to jump through to get tox working on 3.3) but I like to support as many versions as I can where possible. Still - do feel free to reject this commit - I can always strip it out easily enough.
Favouring the C variant where it successfully imports. This commit also handles generating the encoding dictionaries for the C variant from those defined for the Python variant (this is much simpler than doing this in C).
Just doc-string and comment changes, nothing behavioural. These changes are mostly to make some future changes a bit easier (particularly the use of context managers for exception testing which makes the tested call syntactically similar to other normal calls).
Various methods in the _cbor2 implementations had slightly different names or were missing from the public API; fix these so _cbor2 matches the cbor2 public API precisely
Previously, the C version was actively importing deferred types. The Python algorithm was better, only checking if the specified module was *already* imported (which, if we're encoding the specified type it must be or the specified type couldn't exist yet). This commit also removes the ugly reference counting in the deferred logic, using the standard "borrowed references on entry, new references on return" idiom throughout instead.
The C implementation of CBORSimpleValue had missed the __eq__ implementation from the Python version. Both were also missing a __ne__ method which led to some odd test results with inequalities.
This is a modification of the Python version to make it more like the C version: We don't call tell() in the C version in the event of an exception and arguably we shouldn't anyway as there's no guarantee that all file-like objects implement (e.g. sockets and pipes). This might make debugging a little harder for users but they can always call it themselves if they need it, and this makes the decoder applicable to literally any readable file-like object.
Minor changes to bring absolute parity between the implementations: ensuring exception messages are exactly equal, that tests equivalently in both (and don't rely on poking internal implementation details which is harder with the _cbor2 implementation).
The C variant of the module accepted out of ranges values like 256 without complaining (forgot that unsigned values aren't checked for overflow by ParseTuple)
Nice to see the test suite for the Python module picking up bugs in the C implementation!
This isn't a translation of FrozenDict to C, rather this simply imports the Python FrozenDict implementation into the _cbor2 module's __dict__
Use some py.test fixture magic to provide an "impl" parameter representing the C or the pure Python modules. Some further py.test magic is used to ensure the C variant is only tested on supported platforms and skipped otherwise.
The C variant supports semantic tags 260 and 261 for encoding IP addresses and networks; this commit adds a pure Python implementation for python 3.3 and above (the built-in ipaddress module was added in 3.3) along with the tests for these.
The C variant permits just about any type of value to be shared within the stream (though the encoder won't produce such shared values). This commit updates the Python variant to be similarly generous toward shared values.
The C implementation already correctly handles timestamp microseconds; ensure the Python variant does too
The C variant has a "str_errors" parameter, permitting configuration of the error-handling mode used when decoding UTF-8 strings. This commit adds this attribute to the Python variant, and also updates the decoding of indefinite-length byte and unicode strings to match the algorithm in the C variant. Specifically, this disallows split-unicode characters in indefinite length unicode strings, and also rejects non-string elements in indefinite length byte and unicode strings.
The C variant (by its nature) imposes more restrictions on certain attributes like object_hook, tag_hook, default, etc. It ensures the types assigned are valid (callable or None). This commit ensures the Python variant does the same. It also fixes some of the attribute names on the C side (which didn't match the Python side).
The C variant raises CBOREncodeError in the case that a deferred encoder has an invalid type (not a 2-tuple, etc.); this commit ensures the Python side does the same
This commit brings the Python CBORTag implementation in line with the C implementation. The repr of the class now handles recursive constructs correctly, and instances are now orderable as if they were simple 2-tuples
This commit brings the Python implementations of break_marker and undefined in line with the C variants. They are now true singletons, with the same reprs and boolean results.
In expanding coverage of the C variant, several segfaults were uncovered when calling load(s) and dump(s) with either no args (erroneously) or keyword-args only.
The immutable property was missing from the C variant, and a test was added for coverage
Doh! Should've run tox before that last commit...
Looks good to me. |
This reads well, thanks for structuring the commits so clearly. I'm going to run some tests and merge hopefully today and tomorrow. |
It doesnt build with Microsofts toolchain which is used by Anaconda on windows. Can you add another filter to setup.py so it doesn't try? I will open another issue for it. |
At least until we can figure out why it's not working...
Sure - I must admit I haven't tried this at all under Windows (I don't have a Windows development environment at the moment, but I would like to include the platform if it's not too much trouble). Any obvious error messages? |
Don't worry about a new issue - I had a shot at setting up a Windows environment for Python extension building this weekend and I think I've got a fix that allows everything to build and work. The issue is mostly the use of the "endian" routines which aren't standard (but they're easy to replace with byte-swapping routines given all the platforms Windows runs on are little-endian). I should add I was absolutely dreading setting up a Windows environment for this - last time I tried it was a major pain, but it looks like things have improved significantly both on the Python and the VS side of things (what with Py3.5 and the "universal" CRT), so this time a straight-forward install of VS2019 Community Edition seemed to include everything I needed (including the Python and Git bits!). I've just got to tidy up some bits and make sure the test suite still runs properly in both Linux & Windows then I'll push the commits to this branch (should only be a couple more). |
Only build under CPython 3.5+ under Windows as that's when the universal CRT support was added (and we really don't want to deal with Windows builds prior to that).
Most of these were the VS compiler being overly picky about types (or just rubbish at static analysis), but it did make me notice a subtle and nasty little bug in handling datetimes with negative TZ offsets so worthwhile overall!
Okay, I confess I'm utterly stumped on this one. Took ages to find as despite my best efforts I've no idea how to interactively debug stuff under VS (after installing the lib via pip), so this was back to prints and a certain amount of guess-work. Anyway... Turns out the "s#" format (for string+length) in Py_BuildValue (or its derivatives like PyObject_CallMethod, etc.) doesn't work on Windows. It just results in a MemoryError whenever it's used! However, the simple "s" format (for NULL-terminated string) works fine. It's not exactly a huge deal to have it calculate the length of "big" so this is a worthwhile trade-off to get stuff working on Windows. At this point all the tests pass in both the C and Python variants.
Right, there's a few more commits now that get things working on Windows too. Nothing too major, but that last commit took me ages to figure out! :) |
everything's working beautifully on Linux, Windows is another story. 🤷♂ I must have my python dev environment setup all wrong on Windows, it's taken me ages to set it up again from scratch and I'm still getting the same errors.
Any idea if I'm just doing something wrong? I've tried both Anaconda and the Python install through Visual Studio. |
My setup was pretty vanilla, but I'm not sure if it's the "right way" of doing things on Windows (I used to code on Windows quite extensively back in my Delphi days, but back then XP was shiny and new, and most of my Windows knowledge has bit-rotted!). Anyway, here's the process I followed:
Here's where I'm not sure what I'm doing is the "right thing" on Windows, simply because it's what I normally do on Linux (ish):
I'm guessing that's not the right way of doing things because I couldn't debug interactively within VS to figure out the issue I mentioned above. Still, I figured it didn't matter that much because it seemed to build happily, so I assumed Windows compat was done. I'll have a dig into the error you saw a bit later and see if I can reproduce it. |
Ah! Just had a quick google - Python issue 34657 looks relevant. At a guess I just need to rename the "timezone" entry in the |
I followed the same steps for VS on a laptop instead of a vm, and then running Your googling was much more successful than mine! |
Okay, hopefully I've got that patched. I've renamed timezone to tz in the CBOREncoderObject struct (but left the attribute named "timezone" for compatibility - that should be fine as the attribute name doesn't get declared in a C struct). I haven't tested this on Windows yet as my setup has Python 3.7 and it looks like the issue (34657 is a duplicate of 24643) was fixed in 3.7. I could try and setup a down-level version in another VM, but I figured it might be quicker if you tested it with your already-setup laptop :) |
This sounds like it warrants the use of Azure Pipelines to check for Windows compilation issues. |
Interesting - I'll have a look at this (I was wondering whether it was time to dig into appveyor as well which I think supports Windows too). |
Appveyor supports only Windows. Travis currently offers support for Linux, macOS and Windows but their Python support for the latter two is nonexistent. It's particularly awful on macOS. Azure Pipelines has none of these issues and its design makes working across all three operating systems much less of a hassle. |
Actually Appveyor claims to support Linux too now: . However, having said that I spent a few hours playing with Azure Pipelines and Appveyor yesterday and couldn't get a thing working on the latter! Even their own recipe for tox based setups didn't work (on either Linux or Windows) with issues mostly around permission to install stuff (like tox) or upgrade stuff (like setuptools, necessary for recent versions of tox). By contrast, Azure Pipelines was indeed ridiculously simple: had it up and testing within a few minutes. My one concern with it, is that it demands a fairly scary level of permissions on any repository it's associated with (ability to create commits, branches, etc. etc.) in stark contrast to Appveyor and Travis which both have fairly minimal access requirements. I understand why Pipelines does this - it's trying to provide a very "friendly" interface in which it can set up your repository by creating and committing the necessary configuration - but I must admit I prefer Travis and Appveyor's more minimal approach in this regard. Anyway, it seems Pipelines is the way to go. There's a couple of branches in my cloned repo that you may want to have a look at: The PIpelines configuration: https://github.com/waveform80/cbor2/blob/azure-pipelines/azure-pipelines.yml And my failed Appveyor experiments (in case there's anything blindingly obviously stupid in there): https://github.com/waveform80/cbor2/blob/appveyor/.appveyor.yml Both of these branch off the Azure Pipelines: https://dev.azure.com/waveform88/cbor2 Appveyor: https://ci.appveyor.com/project/DaveJones/cbor2 Happy to take any suggestions for edits to the config; once we've got something reasonably comprehensive across Linux + Windows (+macOS?) I can merge it into the |
Seems like Python.h didn't always include stdint.h historically
Apparently Mac OS has machine/endian.h ... but it doesn't define any of the standard routines. I've no idea what it actually *does* define but who cares - the consensus according to Google seems to be that everyone uses routines from the libkern/OSByteOrder.h header.
Got Mac OS tests working on Azure Pipelines (I've pretty much given up on Appveyor) and it picked up an issue (surprise surprise) with the endian header again. That's fixed along with a minor issue on 3.5 under Windows (which slightly bizarrely only occurred with one version of VS). Anyway, everything's now passing with CPython 2.7, 3.5, 3.6, 3.7, PyPy, and PyPy3 under Linux, Windows, and Mac OS X (CPython 3.4 is only tested on Linux). |
It works! That was a huge quantity of work, thanks so much! |
Wow! Thanks very much for reviewing it! Do you want me to push the azure pipelines config as another PR? I didn't get around to merging it into |
Here we go! Turned out to be a few more than 15 commits... ahem...
As mentioned in the associated ticket (#47) the overall diff is ridiculously huge. I've done a few rebases now to break everything down into (hopefully) manageable / reviewable chunks. A rough overview is as follows:
tag_hook
prototype to remove shareable_index, etc).Most of the commits have a bit more commentary in their commit messages, hopefully enough for some context. If you want anything split up further (or combined, re-ordered, removed, etc.) do let me know. I'm happy rebasing and editing stuff for clarity.
Good luck!