Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python API: Make the API Python 2 & 3 compatible #324

Merged
merged 2 commits into from Jan 10, 2017

Conversation

mtusnio
Copy link
Contributor

@mtusnio mtusnio commented Jan 5, 2017

Based on #305

Removes the requirement for lxml, while also adding a requirement for the future package (only for iteritems for now). ElementTree is significantly slower than lxml, however it makes for an easier use out of the box on the Ci40. Future package can be easily installed via 'pip install future'

@DavidAntliff
Copy link
Collaborator

This could be abstracted rather than omitting lxml.etree entirely. It is relatively simply to hide both lxml.etree and xml.etree.ElementTree behind an XML abstraction layer, at least for most typical operations such as building a tree. However lxml.etree provides much better xpath support, amongst other things, and as you say it is faster, and is most likely the future of the etree API.

Can you elaborate on how installing the 'future' set might help here?

@mtusnio
Copy link
Contributor Author

mtusnio commented Jan 6, 2017

@DavidAntliff I was originally unsure if introducing two different libraries in this case makes sense, I imagine the parsed/created XMLs are small enough for the differences to not matter, but I can do something along the lines of:

try:
    import lxml.etree as ElementTree
except ImportError:
    import xml.etree.ElementTree as ElementTree

The only issue that it introduces additional complexity of using two different libraries, but the APIs are compatible so this would work. However, that keeps lxml and gives a fallback in case it's unavailable.

About the future package, that's needed for abstracting Python 2/3 differences. At the moment it's for only one case - iteritems is renamed to items in Python 3, however items in Python 2 behave differently than in Python 3. I imagine this will only grow as the API grows, hence the package.

@DavidAntliff
Copy link
Collaborator

@mtusnio import of future.utils is failing - see travis-ci log. Does this need a change to ci/requirements.txt?

@mtusnio
Copy link
Contributor Author

mtusnio commented Jan 9, 2017

@DavidAntliff Updated the PR with the ci/requirements.txt changes (as well as the import changes), I imagine the docker containers need to be rebuilt with this change though?

@DavidAntliff
Copy link
Collaborator

@mtusnio the docker containers are rebuilt automatically by Jenkin's jobs. Travis, however, isn't happy because (perhaps by accidental omission) it's not using requirements.txt. I'll submit a PR to fix that. Once that's accepted, you can update this PR and the travis job should work better.

@DavidAntliff
Copy link
Collaborator

Please wait for #326 to be merged.

@yi-liang
Copy link

@mtusnio seems the "try,except" importing will introduce an issue of different api from the two libraries, such as in "etree.tostring(e_root, pretty_print=True)" , where xml.etree.ElementTree doesn't have a pretty_print ?

@DavidAntliff
Copy link
Collaborator

This is a valid concern and why I suggested hiding them behind a wrapper. Then a consistent API can be presented that can gracefully handle missing functionality in the smaller library.

@mtusnio
Copy link
Contributor Author

mtusnio commented Feb 14, 2017

The pretty print is gone in this version, it was just one call. I think in this case doing an abstraction layer seemed like a bit of an overkill, I wouldn't mind doing that if more discrepancies show up.

@DavidAntliff
Copy link
Collaborator

I don't think it's overkill when you consider that it's intended to provide some discipline to what is an open-source project with potentially many contributors, as it will help ensure that contributors consider the implications of adding new functionality that may not be supported by one of those libraries.

The wrapper can be really, really simple (just a delegation layer in most cases) and it really just ensures that all code using XML functions goes through a common layer, where any discrepancies, incompatibilities and adjustments can be made, now or later. If it's not done early, and used throughout, it can be a nightmare to add later as client code may rely on library-specific nuances. Better to even the playing field early and work with a consistent interface.

Use case: someone wants to add a new feature that uses something from lxml (e.g. enhanced xpath). If this layer doesn't exist, they naturally implement their feature with lxml directly, discover that etree doesn't support it, and then write some code in-place to deal with the incompatibility in etree. Hopefully they realise that someone else may have already done this elsewhere, and go looking for it first. Hopefully they find it and don't rewrite such a translation. Hopefully. Otherwise maybe someone else will do that work for them later. Maybe.

OR they see that all the XML code in the project is using a consistent wrapper, and look through there for an existing compatibility wrapper for "xpath", and just use that, knowing that it's tested ((non-trivial wrapper/compatibility code should definitely have tests).

@mtusnio
Copy link
Contributor Author

mtusnio commented Feb 14, 2017

@DavidAntliff I definitely don't disagree with any of that, my point was only that it's a major piece of work to do and adding an abstraction layer at that point would be an overkill in the sense of work required to accommodate the pretty_print change. For future changes it definitely isn't, however there's in general work that can be done around the API to add responses/requests, clean up bits and pieces and add functionality. I'm hoping to do some of that, see about adding the abstraction layer, however, as always, time is the limit. :) In this case I wouldn't have been able to replace the whole thing with an abstraction layer and the question is whether it's better to then not do any changes, or enable Py2/Py3 compatibility as well as LXML/ElementTree support while keeping all of this in mind. To sum up, I don't disagree at all with what you are proposing, I just think that it was worth adding this functionality in this state and, if any PRs come expanding the API, start thinking about a switch.

@DavidAntliff
Copy link
Collaborator

@mtusnio you could add the layer "as-you-go" - from memory at the moment there's only a handful of XML-related functions, so they are easily moved behind a wrapper, and then if you find yourself wanting to use something from lxml that isn't in etree, you can then write a more complex wrapper at that point. So you only need to wrap what is actually used - no need to analyse both libraries up-front and write a full wrapper. Thank you for taking the time to work on the project :)

@mtusnio
Copy link
Contributor Author

mtusnio commented Feb 14, 2017

That sounds good, I'll give it a go once I start pushing my project forward again. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants