fatalError SAXParseException #14

Closed
maltokyo opened this Issue · 26 comments

4 participants

maltokyo Derrick Childers Brian Morearty Mark Nottingham
maltokyo

I'm on the latest version of iPhoto '11 (ver. 9.2.1) and am having a problem very similar to the one reported in #8. Unfortunately, that issue has no resolution.

Below is the output I am getting. I have tried repairing the iPhoto database, permissions, etc., and no errors were detected.

Any ideas would be greatly appreciated! I will try anything; I am desperate to export my photos into folders named after their events, and there are far too many to do manually.
Thank you for the work done on this already; it is exactly what I need. If I get it working, I will report back here. So far I have spent hours trying different things, to no avail.

Output from the script:


python ./exportiphoto.py -d /Volumes/HDD/iPhotoLibrary/ /Users/mt/tmp2/

  • Parsing iPhoto Library data...
Traceback (most recent call last):
  File "./exportiphoto.py", line 579, in <module>
    ignore_time_delta=options.ignore_time_delta
  File "./exportiphoto.py", line 69, in __init__
    self.parseAlbumData(albumDataXml)
  File "./exportiphoto.py", line 137, in parseAlbumData
    doc.expandNode(node)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/dom/pulldom.py", line 253, in expandNode
    event = self.getEvent()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/dom/pulldom.py", line 265, in getEvent
    self.parser.feed(buf)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/sax/expatreader.py", line 211, in feed
    self._err_handler.fatalError(exc)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/sax/handler.py", line 38, in fatalError
    raise exception
xml.sax._exceptions.SAXParseException: <unknown>:359226:31: not well-formed (invalid token)
Derrick Childers
Collaborator
maltokyo

Hi Derrick
Thank you very much for your response.
I tried your version just now, but unfortunately the same error... Please see below for output.
Mal


python ./exportiphotoderrick.py -d /Volumes/HDD/iPhotoLibrary/ /Users/mt/tmp2//Volumes/vVer2000/zExportPhotosForSM/

  • Parsing iPhoto Library data...
Traceback (most recent call last):
  File "./exportiphotoderick.py", line 572, in <module>
    ignore_time_delta=options.ignore_time_delta
  File "./exportiphotoderick.py", line 67, in __init__
    self.parseAlbumData(albumDataXml)
  File "./exportiphotoderick.py", line 135, in parseAlbumData
    doc.expandNode(node)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/dom/pulldom.py", line 253, in expandNode
    event = self.getEvent()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/dom/pulldom.py", line 265, in getEvent
    self.parser.feed(buf)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/sax/expatreader.py", line 211, in feed
    self._err_handler.fatalError(exc)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/sax/handler.py", line 38, in fatalError
    raise exception
xml.sax._exceptions.SAXParseException: <unknown>:359226:31: not well-formed (invalid token)
Derrick Childers
Collaborator
maltokyo

Hi Derrick
Thank you again for the hint. Yes, it is on an external drive. Python v2.7. Latest OS X Lion (fully updated).
I tried a test iPhoto Lib on the internal drive, and it worked!!

Unfortunately, I also tried copying the test iPhoto Lib to the external drive, and it still worked, even when reading from and writing to the external drive.

So, it seems this is an issue with my main iPhoto Library XML file(?). Hmmm... I thought that could be the case earlier this morning, so I started iPhoto with the option+command keys held down and rebuilt the library, remade all the thumbnails, checked permissions, etc. I tried all of those options. It took almost all day to rebuild it all.

I have logmein.com installed. If you have a few minutes, would you like to log in and take a look? I am in Japan, so timezone may not work out so well, but I am happy to set my alarm etc to suit you.

Derrick Childers
Collaborator
Brian Morearty
Owner

@derrickchilders Yeah, back in 2010 @mnot changed exportiphoto to use SAX because his AlbumData.xml was 24MB and he was quickly running out of memory.

@maltokyo Wow, this seems like it's becoming a real problem since you're the second person to run across it. If you can put your AlbumData.xml file somewhere accessible, I'll look at it and see if I can figure out how to fix the app. You could make a gist if you're willing to share the file publicly.

maltokyo

Hi BMorearty
Thank you for the offer to take a look! I emailed the zipped XML file to you, to the email address in your profile.
I hope that is acceptable. Raw text was far too big to make a gist from (just crashed the browser when I tried).
Mal

Brian Morearty
Owner

Hi @maltokyo,

I spent a little time analyzing the file you sent me and I can see why it's failing. That's the good news. The bad news is I'm not sure what the right way is to fix it.

The last line of your stack trace above was:

xml.sax._exceptions.SAXParseException: <unknown>:359226:31: not well-formed (invalid token)

Those numbers are the line number and character number where it failed to parse the XML file. When I ran a little test app to parse the file I got an error on a different line number (359570, probably because you added another few photos before sending it to me), and in the XML file, line 359570 has null bytes beginning at character 31.

I ran "less" on the file and jumped to that line with the G command, and here's what I saw:

<key>OriginalType</key><string>^@^@^@^@</string>

Those ^@ characters are null bytes. I guess Python's XML parser doesn't know what to do with them so it just fails.

The file has some other lines that also contain null bytes.
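
For anyone who wants to check their own file the same way without paging through it in "less", here is a minimal sketch. It is written for Python 3 and is not part of exportiphoto; the function name `find_null_bytes` and the `limit` parameter are made up for illustration. It reports the line number and the 0-based byte offset of the first null byte on each affected line.

```python
def find_null_bytes(path, limit=10):
    """Return up to `limit` (line_number, byte_offset) pairs for null bytes.

    Line numbers start at 1; byte_offset is the 0-based position of the
    first null byte on that line.
    """
    hits = []
    # Read in binary mode so the null bytes are seen exactly as stored.
    with open(path, "rb") as f:
        for line_no, line in enumerate(f, start=1):
            offset = line.find(b"\x00")
            if offset != -1:
                hits.append((line_no, offset))
                if len(hits) >= limit:
                    break
    return hits
```

Calling `find_null_bytes("AlbumData.xml")` on an affected library should list the same lines the parser complains about, although the offsets may not match the parser's column count exactly.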

At least now we know the problem. As for fixing it, I've got too much on my plate right now to attempt that. If you know Python or know any Python programmers, maybe someone can lend a hand.

Brian Morearty
Owner

If anyone wants to take a look at fixing this, here is a tiny sample data file and sample code to reproduce the problem.

In the sample data file, replace the contents of the <string> element with a few null bytes.
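
The linked sample is not reproduced here, but a rough stand-in is easy to write. The sketch below assumes Python 3 and uses an invented one-line document (`SAMPLE`) and a hypothetical helper (`try_parse`); it only demonstrates that `xml.dom.pulldom` rejects a `<string>` element containing null bytes and accepts the same document once they are removed.

```python
import io
from xml.dom import pulldom
from xml.sax import SAXParseException

# Invented sample: a <string> element whose content is four null bytes.
SAMPLE = (b"<dict><key>OriginalType</key>"
          b"<string>\x00\x00\x00\x00</string></dict>")


def try_parse(data):
    """Parse XML bytes with pulldom; return the parse error, or None if OK."""
    try:
        for _event, _node in pulldom.parse(io.BytesIO(data)):
            pass  # just drive the parser to the end of the input
    except SAXParseException as exc:
        return exc
    return None


if __name__ == "__main__":
    print("with nulls:   ", try_parse(SAMPLE))
    print("without nulls:", try_parse(SAMPLE.replace(b"\x00", b"")))
```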

Mark Nottingham

Huh, looks like Apple has started generating non-well-formed XML; naughty.

To fix it, it's necessary to monkey patch DOMEventStream.getEvent() in xml.dom.pulldom to handle these errors (probably just ignore them).

I'll try to give it a stab; work is a bit busy now, tho.

Mark Nottingham

Try #15.

Brian Morearty
Owner

@mnot I get this when pasting your fix into my little test app:

Traceback (most recent call last):
  File "parse.py", line 35, in <module>
    for event, node in doc:
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/xml/dom/pulldom.py", line 232, in next
    rc = self.getEvent()
  File "parse.py", line 23, in getEvent
    sys.stderr.write("Warning: %s\n", why[1])
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/xml/sax/_exceptions.py", line 41, in __getitem__
    raise AttributeError("__getitem__")
AttributeError: __getitem__

I tried commenting out the sys.stderr.write call but then it's back to:

xml.sax._exceptions.SAXParseException: <unknown>:9:12: not well-formed (invalid token)
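
The AttributeError in the first traceback comes from the warning line itself: a SAXParseException cannot be indexed with `why[1]`, and `sys.stderr.write` takes a single string rather than a format plus arguments. A minimal sketch of a warning that does work, assuming Python 3 and a hypothetical helper name `parse_or_warn`, uses the exception's accessor methods instead. It only fixes the reporting; the parse still fails, as noted above.

```python
import sys
import xml.sax


def parse_or_warn(data):
    """Parse XML bytes; on a parse error, print a warning and return False."""
    try:
        xml.sax.parseString(data, xml.sax.ContentHandler())
        return True
    except xml.sax.SAXParseException as why:
        # Build one string with % formatting and read the details through
        # the exception's accessor methods rather than by indexing it.
        sys.stderr.write("Warning: %s (line %d, column %d)\n" % (
            why.getMessage(), why.getLineNumber(), why.getColumnNumber()))
        return False
```
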
maltokyo

Hi All
Wow, ok, now at least I know where the issue is. I am sorry that I can't help with the coding.
I will try removing the offending photos (if I can find them), and see what happens. Hopefully this removes the lines in the XML file.
When I was googling for this issue, I do now remember somebody else having found "null characters" in their file. They said that they just deleted those lines in the XML file, but I don't like the sound of that...
Mal

Mark Nottingham

Been playing around with this a bit; while I can easily ignore the error, it won't properly process entries after the null characters, so that's of very limited use.

Decreasing the buffer size doesn't seem to have any effect.

I tried iterating through the buffer to find the "good" characters after the non-well-formed ones, but that seems to always lead to a crash (not just a stack trace), at least on my machine.

I think the only viable solution at this point would be to pre-process the AlbumData.xml file to remove the null characters.

Brian Morearty
Owner

Thanks for looking into it, @mnot. At least pre-processing it should be easy and can be done line by line or in chunks to a tempfile, to avoid eating too much memory (which I think was the reason you changed it to SAX in the first place).

If someone gets around to it before me, I certainly won't mind. Just send a pull request.
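
The chunked pre-processing idea described above could look roughly like this. It is a sketch under assumptions, not the code that ended up in exportiphoto: it targets Python 3, and the function name `strip_nulls_to_tempfile` and the 64 KB chunk size are arbitrary choices for illustration.

```python
import tempfile


def strip_nulls_to_tempfile(src_path, chunk_size=64 * 1024):
    """Copy src_path to a temp file with null bytes removed; return its path.

    The input is read in fixed-size chunks, so memory use stays small no
    matter how large the XML file is. The caller should delete the temp
    file when done with it.
    """
    with open(src_path, "rb") as src, \
            tempfile.NamedTemporaryFile("wb", suffix=".xml",
                                        delete=False) as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            # Removing a single byte value is safe across chunk boundaries.
            dst.write(chunk.replace(b"\x00", b""))
        return dst.name
```

The returned path can then be handed to the parser in place of the original AlbumData.xml.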

Mark Nottingham

It's amazing what a good lunch will do. In #15, I've explicitly removed the null characters as they're fed to the parser; shouldn't be much overhead to that.

Over time, if we need to do more cleanup on the input (please no, Apple), we can make that more complex, but for now I think it's OK as just a plain string.replace.
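
As an illustration of removing null characters while the data is being fed to the parser, here is one possible shape for it. This is not the code from #15; it is a Python 3 sketch in which the wrapper class `NullStrippingReader` and the helper `collect_strings` are hypothetical names. The wrapper sits between the file and `xml.dom.pulldom` and cleans each buffer with a plain `replace`.

```python
import io
from xml.dom import pulldom


class NullStrippingReader:
    """File-like wrapper whose read() drops null bytes (hypothetical helper)."""

    def __init__(self, stream):
        self._stream = stream

    def read(self, size=-1):
        while True:
            chunk = self._stream.read(size)
            cleaned = chunk.replace(b"\x00", b"")
            # A chunk made only of nulls would look like end-of-file to the
            # parser, so keep reading until there is data or a real EOF.
            if cleaned or not chunk:
                return cleaned


def collect_strings(data):
    """Return the text of every <string> element in the XML bytes."""
    doc = pulldom.parse(NullStrippingReader(io.BytesIO(data)))
    texts = []
    for event, node in doc:
        if event == pulldom.START_ELEMENT and node.tagName == "string":
            doc.expandNode(node)
            texts.append("".join(child.data for child in node.childNodes
                                 if child.nodeType == child.TEXT_NODE))
    return texts
```

With this wrapper, elements that follow the damaged one are still parsed, because the parser never sees the null bytes at all.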

Brian Morearty
Owner

Great!

I'm about to watch a movie and don't have time to review the code right now, but @maltokyo if you would like to try @mnot's version, you can download it from https://github.com/mnot/exportiphoto/. Please let us know if it worked.

maltokyo

Great! I will try this when I get home from work today... in about 8 hours from now.
Wow, this is quite a strong community. It is at times like this that I regret not being much help with the code, but when it comes to testing, no problem! Will reply again once done.

maltokyo

Very pleased to report back that the mnot version above worked very nicely to export 432 events, with 192GB of photos and movies. The first step "Parsing iPhoto Library data..." took about 2 minutes on a Core 2 Duo iMac with 4GB RAM, but after that it was smooth sailing.

Thank you all for your help. You saved me naming 432 folders and dragging bunches of photos in there (and many more in future!).

Apple really needs to include this feature in iPhoto.

maltokyo

One interesting thing I noticed is that while it is processing, for most of the albums it outputs a period for the progress, but for some it outputs hyphens, as below.
Are these the photos where it skips the null characters?

  • Processing 162 of 432: Album1 (332 images)... Created /Volumes/vVer2000/zExportPhotosForSM/Album1 .-...........-...-.-.-.-.-..........-.-.-...-.-..-.-..-.....-...............-.-.-.-.-.-.-.-.-.-.-.-...-....-..-.-.-.-.-.....................-.-...-.-.-.-.......................-.-.-....-.-............-......-.-.-.-.-.-.-......-.-............-.-.-.-....................................................................................
  • Processing 163 of 432: Album2 (39 images)... Created /Volumes/vVer2000/zExportPhotosForSM/Album2 .......................................
Brian Morearty
Owner

@maltokyo I'm so glad to hear the good news.

The hyphens seem to indicate that the file already existed and either had been written in the last 10 seconds or had the exact same size, so exportiphoto did not bother to copy it again. I didn't add that option but I assume it was added to make it faster when people use exportiphoto over and over again to the same directory, i.e. for a backup.
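
To make that description concrete, here is a sketch of what such a skip check might look like. It is a guess at the logic, not exportiphoto's actual code: the function name `copy_marker` is invented, and reading "written in the last 10 seconds" as "modification times within 10 seconds of each other" is an assumption.

```python
import os


def copy_marker(src, dst, time_delta=10):
    """Return '-' if dst looks up to date (copy skipped), otherwise '.'.

    Assumed rule: skip when dst exists and either has the same size as src
    or a modification time within `time_delta` seconds of src's.
    """
    if os.path.exists(dst):
        same_size = os.path.getsize(src) == os.path.getsize(dst)
        close_mtime = abs(os.path.getmtime(src)
                          - os.path.getmtime(dst)) < time_delta
        if same_size or close_mtime:
            return "-"
    return "."
```

Under a rule like this, an export into an empty directory should print only periods, since no target file exists yet.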

Were you exporting to a directory that was already there?

It seems odd to me that in a single Event, it reported that some files existed already and some did not.

maltokyo

BMorearty: No, actually I was exporting the whole iPhoto library for the first time, to a completely empty directory. So if what you say is the case, then there should be no hyphens at all!

Should I be checking to see if all the pics were actually output? Hmm. Hope I didn't find another issue :)

Brian Morearty
Owner

I have implemented a different mechanism to strip null bytes from the file before processing. It's in #16. Both mechanisms should work but I'm hoping this new way will be easier to maintain.

@maltokyo, could you please confirm that it works? (I pulled it into the master branch so you can download it from https://github.com/BMorearty/exportiphoto). I tested it on your file and it seems to work, but since I don't have your photos I thought you should double-check it.

If you export to the same directory as before, it should go very quickly and you should see a bunch of hyphens because the files definitely do exist now. I wasn't attempting to fix the hyphen problem, just making sure your file can be parsed.

Derrick Childers
Collaborator
maltokyo

@BMorearty: Yes, the new version works fine on my file, thank you for adding it in! I ran on the same directory, just to check the hyphens again.. as you say, it did finish quickly, but still seems to copy some of the photos.

@derrickchilders: I ran the new version on the same directory, and this time, I had mostly hyphens (90%) and about 10% periods. It still seemed to copy some, but skip most. Different timestamps?

Derrick Childers
Collaborator