python-dwca-reader in Jython #43

Closed
tucotuco opened this issue Aug 8, 2015 · 7 comments

Comments

@tucotuco

tucotuco commented Aug 8, 2015

Currently python-dwca-reader lists lxml as a requirement. Is there a reason for this? I do not see where it is actually used. The reason I ask is that I would very much like to use python-dwca-reader with Jython, but the dependency on lxml (which has no implementation that works with Jython, since it is based on C and has not been ported to date) makes this impossible. BeautifulSoup can use other parsers, so I wonder if it is possible to select the parser rather than require lxml.

@tucotuco
Author

tucotuco commented Aug 8, 2015

Oops, it looks like lxml is the only XML parser BeautifulSoup can use:

"Right now, the only supported XML parser is lxml. If you don’t have lxml installed, asking for an XML parser won’t give you one, and asking for “lxml” won’t work either."

http://www.crummy.com/software/BeautifulSoup/bs4/doc/#specifying-the-parser-to-use

So the new question becomes, "Would it be possible to have the reader not depend on BeautifulSoup?"

@niconoe
Member

niconoe commented Aug 10, 2015

Hi John,

Indeed, you've nailed it: python-dwca-reader depends on BeautifulSoup, and BeautifulSoup needs lxml. I have long been uncomfortable with such a heavy dependency for relatively "peripheral" features.

So one of my medium-term plans was to replace BeautifulSoup with something lighter, or at least make it optional. Do you urgently need to use python-dwca-reader? In the next few days (let's say a week) I can find time to evaluate whether I can publish a new version that doesn't depend on BeautifulSoup. If it is not too hard and is useful for you, I'd definitely go for it. It's also a good opportunity to test it (and fix it if necessary) on Jython; I don't think that has been done before!

Best,

Nico

@tucotuco
Author

I am using python-dwca-reader actively, but the Jython context does not have the same urgency as just using the Readers. I thought about forking the repository and making a version in which BeautifulSoup is optional, but it would probably take me longer than next week to get around to it. If you can do it in that same time frame, that is better. I will gladly test it as soon as it is ready.

@niconoe
Member

niconoe commented Aug 14, 2015

Cool, I didn't know you were already using it. I'm happy that my work is useful to others.

I had a quick look, and it does seem possible to make a version of python-dwca-reader that replaces BeautifulSoup/lxml with ElementTree from the standard library... If I'm not mistaken, it is also available in Jython, so we shouldn't be too far from having Jython compatibility... What do you think?
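For readers following along, here is a minimal sketch of what parsing an archive descriptor with only the standard library's ElementTree might look like. The `META_XML` sample and the variable names are invented for illustration; this is not the code python-dwca-reader actually uses.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified meta.xml descriptor (illustrative only).
META_XML = """<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core rowType="http://rs.tdwg.org/dwc/terms/Occurrence" fieldsTerminatedBy=",">
    <files><location>occurrence.txt</location></files>
    <id index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/country"/>
  </core>
</archive>"""

# ElementTree addresses namespaced tags as "{namespace-uri}localname".
NS = "{http://rs.tdwg.org/dwc/text/}"

root = ET.fromstring(META_XML)
core = root.find(NS + "core")

# Attributes are read with .get(); element text with .text.
row_type = core.get("rowType")
data_file = core.find(NS + "files/" + NS + "location").text
terms = [field.get("term") for field in core.findall(NS + "field")]

print(row_type)
print(data_file)
print(terms)
```

Nothing here requires a C extension, which is why this approach is a plausible fit for Jython.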

@tucotuco
Author

I think, "Excellent, go for it." Waiting anxiously.


@niconoe
Member

niconoe commented Aug 20, 2015

Hi John,

I just released a new version (0.7.0) that completely drops the dependency on BeautifulSoup and lxml. All the APIs that used to return BeautifulSoup objects now return xml.etree.ElementTree.Element (from the standard library). Could you have a look?
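For anyone adapting code that navigated BeautifulSoup objects, the following sketch shows the equivalent lookups on a standard-library `Element`. The `EML` sample is invented and stands in for whatever element the reader hands back; no python-dwca-reader call is shown or assumed here.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata document standing in for an Element returned by the library.
EML = """<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.1.1">
  <dataset>
    <title>Sample occurrence dataset</title>
    <creator><organizationName>Example Museum</organizationName></creator>
  </dataset>
</eml:eml>"""

metadata = ET.fromstring(EML)

# find() takes a path relative to the element and returns the first match (or None).
title = metadata.find("dataset/title").text

# iter() walks all descendants with the given tag, at any depth.
organizations = [el.text for el in metadata.iter("organizationName")]

# findtext() avoids a None check when an element may be absent.
abstract = metadata.findtext("dataset/abstract/para", default="")

print(title)
print(organizations)
print(repr(abstract))
```

Note that `find()` returns `None` for a missing element, so code that chained lookups on BeautifulSoup objects may need an explicit check or `findtext()` with a default.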

I only checked very briefly, but it seems to work under Jython!

@tucotuco
Author

tucotuco commented Sep 3, 2015

Confirmed that this works great under Jython and completely solves the issue for me. Closing. Thank you very much.

@tucotuco tucotuco closed this as completed Sep 3, 2015