nt / nquads serializer has poor workaround for lack of xmlcharrefreplace in py2.3 #680

Closed
chmod007 opened this issue Jan 6, 2017 · 3 comments · Fixed by #681

chmod007 commented Jan 6, 2017

rdflib.plugins.serializers.nt contains a function for encoding Unicode characters in XML. It was sourced from http://code.activestate.com/recipes/303668. The code was only ever intended as a fallback for Python 2.3. Unfortunately, this version of the recipe omits the attempt to use the built-in functionality and always falls back on the workaround. The workaround builds the output string by concatenating one character at a time, which is extremely slow on certain inputs (long literals in the input graph).
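For illustration, the per-character fallback pattern described above looks roughly like the sketch below. It is a hypothetical reconstruction written for Python 3, not rdflib's actual source: the name `charwise_escape` is made up, and the `\uXXXX` / `\UXXXXXXXX` output format is assumed from the escape discussed later in this thread.

```python
def charwise_escape(unicode_data, encoding="ascii"):
    """Hypothetical reconstruction of a per-character fallback encoder.

    Every character is test-encoded on its own and the result string is
    grown one piece at a time in Python-level code, which is slow for
    long literals.
    """
    res = ""
    for char in unicode_data:
        try:
            char.encode(encoding, "strict")
        except UnicodeEncodeError:
            code = ord(char)
            # Basic Multilingual Plane characters get \uXXXX,
            # everything above U+FFFF gets \UXXXXXXXX.
            if code <= 0xFFFF:
                res += "\\u%04X" % code
            else:
                res += "\\U%08X" % code
        else:
            res += char
    return res


print(charwise_escape("caf\u00e9 \U0001F600"))  # caf\u00E9 \U0001F600
```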

The body of the function may be trivially replaced with:

return unicode_data.encode(encoding, 'xmlcharrefreplace')

gromgull (Member) commented

I tried this in #681, but some tests fail :(

gromgull added this to the rdflib 4.2.2 milestone Jan 12, 2017
gromgull (Member) commented

Actually, the fix suggested here doesn't work. The xmlcharrefreplace error handler does this:

In [122]: u'å'.encode('ascii', 'xmlcharrefreplace')
Out[122]: '&#229;'

but we need: \u00E5
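The mismatch is easy to reproduce. This short Python 3 sketch (the session above is Python 2, where `encode` returns `str` rather than `bytes`) compares two built-in error handlers on 'å', code point 229 (0xE5), with the escape the serializer needs. Note that `backslashreplace` is not a drop-in replacement either, since it emits `\xe5` for characters below U+0100.

```python
# Compare built-in codec error handlers on a single non-ASCII character.
text = "\u00e5"  # 'å', code point 229 (0xE5)

xml_ref = text.encode("ascii", "xmlcharrefreplace")
backslash = text.encode("ascii", "backslashreplace")

print(xml_ref)    # b'&#229;'  (an XML character reference)
print(backslash)  # b'\\xe5'   (a Python-style \x escape)

# What the serializer needs instead: a \uXXXX escape with four hex digits.
wanted = "\\u%04X" % ord(text)
print(wanted)     # \u00E5
```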

I'll come up with another fix, though.

gromgull added a commit that referenced this issue Jan 23, 2017
fixes #680

We replace a loop over every character in a string with a single
call to encode with a custom error-handler.
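A minimal sketch of the technique this commit message describes: register a custom error handler with `codecs.register_error` so that a single `str.encode` call handles every unencodable character. The handler name `nt_escape` and the function `_nt_escape_handler` are hypothetical choices for this example, not necessarily what rdflib uses, and the code targets Python 3.

```python
import codecs


def _nt_escape_handler(err):
    """Codec error handler that emits N-Triples-style escapes.

    `err` is the UnicodeEncodeError raised by the codec; the handler
    returns the replacement text and the position to resume encoding at.
    """
    if not isinstance(err, UnicodeEncodeError):
        raise err
    pieces = []
    # The failing range may cover several consecutive unencodable characters.
    for char in err.object[err.start:err.end]:
        code = ord(char)
        if code <= 0xFFFF:
            pieces.append("\\u%04X" % code)
        else:
            pieces.append("\\U%08X" % code)
    return "".join(pieces), err.end


# Hypothetical handler name, registered once at import time.
codecs.register_error("nt_escape", _nt_escape_handler)

print("caf\u00e9 \U0001F600".encode("ascii", "nt_escape"))
# b'caf\\u00E9 \\U0001F600'
```

The per-character work now happens inside the codec machinery, and Python code only runs for the characters that actually fail to encode.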

chmod007 (Author) commented

Thanks for spotting the issue and fixing the bug! I was fooled by the comments.

gromgull added a commit that referenced this issue Jan 24, 2017
fixes #680

We replace a loop over every character in a string with a single
call to encode with a custom error-handler.

We move the call to the top-level, and only do it once we encode the
entire output.
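The second change, encoding once at the top level, can be sketched like this: build every serialized line as text, join them, and make one `encode` call with the custom handler instead of encoding each literal separately. The triple format is heavily simplified (no escaping of quotes, newlines, or IRIs), the names `serialize_nt` and `nt_escape` are hypothetical, and the handler is repeated so the example is self-contained.

```python
import codecs


def _nt_escape_handler(err):
    # Replace each unencodable character with \uXXXX or \UXXXXXXXX.
    if not isinstance(err, UnicodeEncodeError):
        raise err
    chars = err.object[err.start:err.end]
    repl = "".join(
        "\\u%04X" % ord(c) if ord(c) <= 0xFFFF else "\\U%08X" % ord(c)
        for c in chars
    )
    return repl, err.end


codecs.register_error("nt_escape", _nt_escape_handler)


def serialize_nt(triples, encoding="ascii"):
    """Hypothetical serializer for (subject IRI, predicate IRI, literal) tuples."""
    lines = ['<%s> <%s> "%s" .\n' % (s, p, o) for s, p, o in triples]
    # One encode call over the whole document instead of one per literal.
    return "".join(lines).encode(encoding, "nt_escape")


data = [
    ("http://example.org/a", "http://example.org/name", "\u00c5sa"),
    ("http://example.org/b", "http://example.org/name", "Bob"),
]
print(serialize_nt(data).decode("ascii"))
```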