Replaced 'utf8' in ET.tostring with 'utf-8', invalid XML (400) otherwise #538

Closed
wants to merge 1 commit into
from

Conversation

Projects
None yet
2 participants
@jeverling

The XML that is generated by the Azure driver contains encoding='utf8', which doesn't work, the Azure API returns a 400 response. If this is changed to utf-8, it does work. Adjusting the encoding argument passed to ET.tostring solved the 400 responses I was getting when using ex_create_cloud_service.

@Kami

This comment has been minimized.

Show comment
Hide comment
@Kami

Kami Jul 2, 2015

Member

Hm, I thought utf8 and utf-8 are interchangeable.

In any case, thanks, The change looks good to me. I will test it first and if everything looks OK, merge it into trunk.

Member

Kami commented Jul 2, 2015

Hm, I thought utf8 and utf-8 are interchangeable.

In any case, thanks, The change looks good to me. I will test it first and if everything looks OK, merge it into trunk.

@jeverling

This comment has been minimized.

Show comment
Hide comment
@jeverling

jeverling Jul 2, 2015

I thought that as well. But apparently, when using utf8 with from xml.etree import ElementTree as ET it produces XML that is considered invalid by azure.
This

from xml.etree import ElementTree as ET
ele = ET.Element('test')
print('ElementTree:')
print(ET.tostring(ele, encoding='utf8', method='xml'))
print(ET.tostring(ele, encoding='utf-8', method='xml'))

produces the following output:

ElementTree:
b"<?xml version='1.0' encoding='utf8'?>\n<test />"
b'<test />'

Whereas this

from lxml import etree as ET
ele = ET.Element('test')
print('etree:')
print(ET.tostring(ele, encoding='utf8', method='xml'))
print(ET.tostring(ele, encoding='utf-8', method='xml'))

produces this:

etree:
b'<test/>'
b'<test/>'

So it only seems to make a difference when from xml.etree import ElementTree as ET is used. Interesting.

I thought that as well. But apparently, when using utf8 with from xml.etree import ElementTree as ET it produces XML that is considered invalid by azure.
This

from xml.etree import ElementTree as ET
ele = ET.Element('test')
print('ElementTree:')
print(ET.tostring(ele, encoding='utf8', method='xml'))
print(ET.tostring(ele, encoding='utf-8', method='xml'))

produces the following output:

ElementTree:
b"<?xml version='1.0' encoding='utf8'?>\n<test />"
b'<test />'

Whereas this

from lxml import etree as ET
ele = ET.Element('test')
print('etree:')
print(ET.tostring(ele, encoding='utf8', method='xml'))
print(ET.tostring(ele, encoding='utf-8', method='xml'))

produces this:

etree:
b'<test/>'
b'<test/>'

So it only seems to make a difference when from xml.etree import ElementTree as ET is used. Interesting.

@jeverling

This comment has been minimized.

Show comment
Hide comment
@jeverling

jeverling Jul 2, 2015

This actually seems as it may be a bug in lxml. The documentation states this:


    Serialize an element to an encoded string representation of its XML
    tree.
    Defaults to ASCII encoding without XML declaration.  This
    behaviour can be configured with the keyword arguments 'encoding'
    (string) and 'xml_declaration' (bool).  Note that changing the
    encoding to a non UTF-8 compatible encoding will enable a
    declaration by default.

And the code is this:

    elif xml_declaration is None:
        # by default, write an XML declaration only for non-standard encodings
        write_declaration = encoding is not None and encoding.upper() not in \
                            (u'ASCII', u'UTF-8', u'UTF8', u'US-ASCII')

I'm not familiar enough with lxml to see what goes wrong, but it should not write a XML declaration for encoding='utf8'

I will open a bug report on the lxml tracker, just in case.

Update: here is the report https://bugs.launchpad.net/lxml/+bug/1470809

This actually seems as it may be a bug in lxml. The documentation states this:


    Serialize an element to an encoded string representation of its XML
    tree.
    Defaults to ASCII encoding without XML declaration.  This
    behaviour can be configured with the keyword arguments 'encoding'
    (string) and 'xml_declaration' (bool).  Note that changing the
    encoding to a non UTF-8 compatible encoding will enable a
    declaration by default.

And the code is this:

    elif xml_declaration is None:
        # by default, write an XML declaration only for non-standard encodings
        write_declaration = encoding is not None and encoding.upper() not in \
                            (u'ASCII', u'UTF-8', u'UTF8', u'US-ASCII')

I'm not familiar enough with lxml to see what goes wrong, but it should not write a XML declaration for encoding='utf8'

I will open a bug report on the lxml tracker, just in case.

Update: here is the report https://bugs.launchpad.net/lxml/+bug/1470809

@Kami

This comment has been minimized.

Show comment
Hide comment
@Kami

Kami Jul 2, 2015

Member

@jeverling Thanks for the additional context / details.

I'm a bit busy right now, but I will try to have a look asap (probably on Sunday).

Member

Kami commented Jul 2, 2015

@jeverling Thanks for the additional context / details.

I'm a bit busy right now, but I will try to have a look asap (probably on Sunday).

@asfgit asfgit closed this in 961df98 Jul 5, 2015

@Kami

This comment has been minimized.

Show comment
Hide comment
@Kami

Kami Jul 5, 2015

Member

@jeverling Thanks, I've merged changes into trunk.

On a related note - when I was originally working and testing the changes, it mostly worked, but occasionally got 400 status code back. I was digging into it, but I never could consistently reproduce the issue or find the root cause (I thought there was something weird going on, on the Azure side).

I wonder if there there is something "weird" going on, on the Azure side - e.g. some servers accept XML with non-standard encoding declaration and others don't. Although, I think it would be unlikely they have different versions / libraries running on different servers...

Member

Kami commented Jul 5, 2015

@jeverling Thanks, I've merged changes into trunk.

On a related note - when I was originally working and testing the changes, it mostly worked, but occasionally got 400 status code back. I was digging into it, but I never could consistently reproduce the issue or find the root cause (I thought there was something weird going on, on the Azure side).

I wonder if there there is something "weird" going on, on the Azure side - e.g. some servers accept XML with non-standard encoding declaration and others don't. Although, I think it would be unlikely they have different versions / libraries running on different servers...

wido added a commit to wido/libcloud that referenced this pull request Jul 22, 2015

Replaced 'utf8' in ET.tostring with 'utf-8', invalid XML (400) otherwise
Closes #538

Signed-off-by: Tomaz Muraus <tomaz@tomaz.me>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment