XML has utf-16 fails to parse #41
Comments
Are you sure the source file is encoded as UTF-16? It definitely won't
On Tue, Jul 2, 2013 at 1:49 PM, Nick Portelli notifications@github.com wrote:
|
I'm not sure. I think it is whatever the default .NET serializer we are using produces. All I need to do is change 16 to 8 and your plugin works great, so in reality this is not the plugin's issue. I should figure out how to make the serializer save in UTF-8. Go ahead and close this. |
I had the same issue. As a temporary workaround, I replace the XML declaration (the one containing utf-16) with a plain one before parsing, and add the original back before returning the formatted string. Not sure this counts as a proper fix. Tested on Sublime Text 3. Look for "fix:" in the following code:

    import re
    from xml.dom.minidom import parseString

    import sublime


    class IndentXmlCommand(BaseIndentCommand):
        def indent(self, s):
            # convert to utf-8 bytes
            s = s.encode("utf-8")
            # non-greedy, so only the leading declaration is matched
            xmlheader = re.compile(rb"<\?.*?\?>").match(s)
            # fix: replace header (s is bytes here, so the replacement must be bytes too)
            if xmlheader:
                s = s.replace(xmlheader.group(), b'<?xml version="1.0"?>')
            # convert to plain string without indents and spaces
            s = re.compile(rb'>\s+([^\s])', re.DOTALL).sub(rb'>\g<1>', s)
            # replace tags to convince minidom to process cdata as text
            s = s.replace(b'<![CDATA[', b'%CDATAESTART%').replace(b']]>', b'%CDATAEEND%')
            try:
                s = parseString(s).toprettyxml()
            except Exception:
                sublime.active_window().run_command("show_panel", {"panel": "console", "toggle": True})
                raise
            # remove line breaks (toprettyxml returns str, so str patterns from here on)
            s = re.compile(r'>\n\s+([^<>\s].*?)\n\s+</', re.DOTALL).sub(r'>\g<1></', s)
            # restore cdata
            s = s.replace('%CDATAESTART%', '<![CDATA[').replace('%CDATAEEND%', ']]>')
            # remove the xml header added by toprettyxml
            s = s.replace('<?xml version="1.0" ?>', "").strip()
            # fix: put the original header back
            if xmlheader:
                s = xmlheader.group().decode("utf-8") + "\n" + s
            return s
|
I'm a bit unfamiliar with UTF, but is there a reason why it won't parse if it is UTF-16?
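One plausible explanation, going by the plugin code quoted in this thread: the text is encoded with `s.encode("utf-8")` before it is handed to minidom, so the parser receives UTF-8 bytes whose declaration still claims utf-16, and it rejects the mismatch. The standalone sketch below tries to reproduce that outside Sublime. The sample document and the `can_parse` helper are made up for illustration and are not part of the plugin.

```python
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError

# Hypothetical sample document; its declaration claims UTF-16.
DOC_UTF16_DECL = '<?xml version="1.0" encoding="utf-16"?><root><item>x</item></root>'
# Same document with the declaration changed from 16 to 8.
DOC_UTF8_DECL = DOC_UTF16_DECL.replace("utf-16", "utf-8")


def can_parse(data):
    """Return True if minidom accepts the given bytes, False on a parse error."""
    try:
        parseString(data)
        return True
    except ExpatError:
        return False


# UTF-8 bytes, but the declaration says utf-16: bytes and declaration disagree.
mismatched = can_parse(DOC_UTF16_DECL.encode("utf-8"))
# UTF-8 bytes with a utf-8 declaration (the "change 16 to 8" case).
matched_utf8 = can_parse(DOC_UTF8_DECL.encode("utf-8"))
# Real UTF-16 bytes (Python prepends a byte order mark) with the utf-16 declaration.
matched_utf16 = can_parse(DOC_UTF16_DECL.encode("utf-16"))

print(mismatched, matched_utf8, matched_utf16)
```

If this reading is right, UTF-16 itself is not the problem. The parse fails only when the declared encoding and the actual bytes disagree, which is why both changing the declaration to utf-8 and temporarily swapping the header work around it.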