python-dwca-reader in Jython #43
Comments
Oops, it looks like lxml is the only XML parser BeautifulSoup can use: "Right now, the only supported XML parser is lxml. If you don’t have lxml installed, asking for an XML parser won’t give you one, and asking for “lxml” won’t work either." (http://www.crummy.com/software/BeautifulSoup/bs4/doc/#specifying-the-parser-to-use) So the new question becomes: would it be possible to have the reader not depend on BeautifulSoup?
Hi John, Indeed, you've nailed it: python-dwca-reader depends on BeautifulSoup, and BeautifulSoup needs lxml for XML parsing. I've been uncomfortable for a long time with such a heavy dependency for relatively "peripheral" features, so one of my medium-term plans was to replace BeautifulSoup with something lighter, or at least make it optional. Do you urgently need to use python-dwca-reader? In the next few days (let's say a week) I can find time to evaluate whether I can publish a new version that doesn't depend on BeautifulSoup. If it's not too hard and it's useful for you, I'd definitely go for it. It's also a good opportunity to test it (and fix it if necessary) on Jython; I don't think that has been done before! Best, Nico
I am using python-dwca-reader actively, but the Jython context does not have the same urgency as just using the Readers. I thought about forking the repository and making a version in which BeautifulSoup is optional, but it would probably take me longer than next week to get around to it. If you can do it in that same time frame, that is better. I will gladly test it as soon as it is ready.
Cool, I didn't know you were already using it; I'm happy that my work is useful to others. I had a quick look, and it does seem possible to make a version of python-dwca-reader that replaces BeautifulSoup/lxml with ElementTree from the standard library. If I'm not mistaken, ElementTree is also available in Jython, so we shouldn't be too far from having Jython compatibility. What do you think?
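The two options discussed above (making the heavy parser optional, or falling back to the standard library) can be sketched as follows. This is only an illustration and not python-dwca-reader's actual code; the `parse_xml` helper and the sample XML string are hypothetical.

```python
# Hypothetical sketch: prefer lxml when it is installed, otherwise fall back
# to the pure standard-library ElementTree (which does not need C extensions).
try:
    from lxml import etree as ET  # optional dependency
except ImportError:
    import xml.etree.ElementTree as ET  # standard-library fallback


def parse_xml(text):
    """Parse an XML string and return its root element."""
    return ET.fromstring(text)


root = parse_xml("<archive><core rowType='Occurrence'/></archive>")
print(root.tag)                          # archive
print(root.find("core").get("rowType"))  # Occurrence
```

Both libraries expose `fromstring`, `find` and `get` with the same basic behaviour, which is what makes this kind of fallback workable for simple documents.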
I think, "Excellent, go for it." Waiting anxiously.
Hi John, I just released a new version (0.7.0) that completely drops the dependency on BeautifulSoup and lxml. All the APIs that used to return BeautifulSoup objects now return xml.etree.ElementTree.Element (from the standard library). Could you have a look? I only checked very briefly, but it seems to work under Jython!
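For readers used to BeautifulSoup objects, here is a minimal sketch of how an `xml.etree.ElementTree.Element` can be queried. It parses a small, made-up Darwin Core Archive descriptor directly with the standard library; it does not call python-dwca-reader itself, and the `META_XML` content and variable names are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up descriptor in the style of a Darwin Core Archive meta.xml.
META_XML = """<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core rowType="http://rs.tdwg.org/dwc/terms/Occurrence">
    <files><location>occurrence.txt</location></files>
    <field index="0" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
  </core>
</archive>"""

# Prefix-to-URI mapping so the paths below stay readable.
NS = {"dwc": "http://rs.tdwg.org/dwc/text/"}

root = ET.fromstring(META_XML)  # returns an Element
core = root.find("dwc:core", NS)
location = core.find("dwc:files/dwc:location", NS).text
terms = [field.get("term") for field in core.findall("dwc:field", NS)]

print(location)  # occurrence.txt
print(terms)     # ['http://rs.tdwg.org/dwc/terms/scientificName']
```

Attributes are read with `get`, text content with `.text`, and child lookups with `find`/`findall`, so code that previously navigated BeautifulSoup objects needs to be adapted along these lines.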
Confirmed that this works great under Jython and completely solves the issue for me. Closing. Thank you very much.
Currently python-dwca-reader has lxml as a requirement. Is there a reason for this? I do not see where it is actually used. The reason I ask is that I would very much like to use python-dwca-reader with Jython, but the dependency on lxml (which has no implementation that works with Jython, since it is based on C and has not been ported to date) makes this impossible. BeautifulSoup can use other parsers, so I wonder if it is possible to select the parser rather than require lxml.