Dies on certain images #52

jasontitus · 2021-01-24T18:06:09Z

After running for 20+ hours, the script dies on a specific image even though it parses and displays fine. I have reproduced it with a directory of just the image. This is running release 2.0 on Ubuntu Linux 20.10

The log looks like this -

~/.local/bin/google-photos-takeout-helper -i brokenimages -o testout
Heeeere we go!
=====================
Fixing files metadata and creation dates...
=====================
brokenimages/IMG_4661(3).jpg
Traceback (most recent call last):
  File "/home/jasontitus/.local/bin/google-photos-takeout-helper", line 8, in <module>
    sys.exit(main())
  File "/home/jasontitus/.local/lib/python3.8/site-packages/google_photos_takeout_helper/__main__.py", line 570, in main
    for_all_files_recursive(
  File "/home/jasontitus/.local/lib/python3.8/site-packages/google_photos_takeout_helper/__main__.py", line 114, in for_all_files_recursive
    file_function(file)
  File "/home/jasontitus/.local/lib/python3.8/site-packages/google_photos_takeout_helper/__main__.py", line 494, in fix_metadata
    set_creation_date_from_exif(file)
  File "/home/jasontitus/.local/lib/python3.8/site-packages/google_photos_takeout_helper/__main__.py", line 343, in set_creation_date_from_exif
    exif_dict = _piexif.load(str(file))
  File "/home/jasontitus/.local/lib/python3.8/site-packages/piexif/_load.py", line 43, in load
    exif_dict["Exif"] = exifReader.get_ifd_dict(pointer, "Exif")
  File "/home/jasontitus/.local/lib/python3.8/site-packages/piexif/_load.py", line 118, in get_ifd_dict
    tag = struct.unpack(self.endian_mark + "H",
struct.error: unpack requires a buffer of 2 bytes

The jpeginfo output for the file is -
jpeginfo -c brokenimages/IMG_4661\(3\).jpg 
brokenimages/IMG_4661(3).jpg 2592 x 1936 24bit Exif  N 2167440  [OK]

Here is a link to the file


sionicion · 2021-01-28T00:16:16Z

I think I'm getting the same error here. I read somewhere that you can just delete the last picture it was working on, but it still crashes.

Traceback (most recent call last):
  File "/usr/local/bin/google-photos-takeout-helper", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/google_photos_takeout_helper/__main__.py", line 593, in main
    for_all_files_recursive(
  File "/usr/local/lib/python3.8/dist-packages/google_photos_takeout_helper/__main__.py", line 117, in for_all_files_recursive
    file_function(file)
  File "/usr/local/lib/python3.8/dist-packages/google_photos_takeout_helper/__main__.py", line 517, in fix_metadata
    set_creation_date_from_exif(file)
  File "/usr/local/lib/python3.8/dist-packages/google_photos_takeout_helper/__main__.py", line 366, in set_creation_date_from_exif
    exif_dict = _piexif.load(str(file))
  File "/usr/local/lib/python3.8/dist-packages/piexif/_load.py", line 46, in load
    exif_dict["GPS"] = exifReader.get_ifd_dict(pointer, "GPS")
  File "/usr/local/lib/python3.8/dist-packages/piexif/_load.py", line 108, in get_ifd_dict
    tag_count = struct.unpack(self.endian_mark + "H",
struct.error: unpack requires a buffer of 2 bytes

TheLastGimbus · 2021-01-28T09:56:01Z

Ugh... piexif is super buggy 😕

Tho I don't know if I can replace it... I need to just wrap it in try-catch...

jasontitus · 2021-01-28T14:35:11Z

Even if it just moved the image to a special folder for now. It would be so much better than just dying after hours of running.
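
A minimal sketch of that idea, not the script's actual code: wrap the EXIF read in a broad try/except and move any file that fails into a separate folder so the run continues. The function name, the `loader` parameter and the quarantine folder are hypothetical; in the script, `loader` would be `piexif.load`, which is what raised `struct.error` in the tracebacks above.

```python
import shutil
from pathlib import Path


def load_exif_or_quarantine(file, loader, quarantine_dir):
    """Return loader(str(file)); on any failure move the file aside and return None."""
    file = Path(file)
    try:
        return loader(str(file))
    except Exception as e:  # e.g. struct.error from a truncated EXIF block
        quarantine_dir = Path(quarantine_dir)
        quarantine_dir.mkdir(parents=True, exist_ok=True)
        shutil.move(str(file), str(quarantine_dir / file.name))
        print(f"Can't read exif for {file}: {e!r}; moved to {quarantine_dir}")
        return None
```

Catching `Exception` is deliberately broad here: after hours of processing, one unreadable file is cheaper to set aside than to crash on.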


TheLastGimbus · 2021-01-28T19:42:40Z

I know... nevertheless, it's weird - I thought all exif operations were wrapped in try-catch 🤔

I will improve this when I have some time 👍

sionicion · 2021-01-29T19:00:50Z

This is a little off-topic but I'll be adopting PhotoPrism and I see a setting for not creating ExifTool JSON files, which means it does produce these same files. Does that mean I could import a takeout directly into PhotoPrism? Regardless, I was going to wait until this issue is fixed and then add the pictures into PhotoPrism, I was just curious how it would work with existing ExifTool files or if it would ignore them.

sionicion · 2021-01-29T19:08:13Z

Ah I see they have their own page on it. I also looked to see if they had any issues open for the new Google Takeout years format but I don't see anything, I'm going to give their import process a shot.

TheLastGimbus · 2021-01-29T19:12:10Z

ExifTool JSON files

Google's JSONs are not from ExifTool :/

they have their own page on it

Can you link it here? I'm curious what they have

I'm going to give their import process a shot.

Let us know how well it works 👍

TheLastGimbus · 2021-01-29T21:00:43Z

pip install -U google-photos-takeout-helper==2.1.0b2

As always, try it out and let me know if it works 👍

jasontitus · 2021-01-30T00:16:19Z

Nope. Still fails on the images it doesn't like:

/usr/local/bin/google-photos-takeout-helper -i brokenimages/ -o testout/
Heeeere we go!
=====================
Fixing files metadata and creation dates...
=====================
brokenimages/IMG_7322(1).jpg
Traceback (most recent call last):
  File "/usr/local/bin/google-photos-takeout-helper", line 8, in <module>
    sys.exit(main())
  File "/home/jasontitus/.local/lib/python3.8/site-packages/google_photos_takeout_helper/__main__.py", line 593, in main
    for_all_files_recursive(
  File "/home/jasontitus/.local/lib/python3.8/site-packages/google_photos_takeout_helper/__main__.py", line 117, in for_all_files_recursive
    file_function(file)
  File "/home/jasontitus/.local/lib/python3.8/site-packages/google_photos_takeout_helper/__main__.py", line 517, in fix_metadata
    set_creation_date_from_exif(file)
  File "/home/jasontitus/.local/lib/python3.8/site-packages/google_photos_takeout_helper/__main__.py", line 366, in set_creation_date_from_exif
    exif_dict = _piexif.load(str(file))
  File "/home/jasontitus/.local/lib/python3.8/site-packages/piexif/_load.py", line 43, in load
    exif_dict["Exif"] = exifReader.get_ifd_dict(pointer, "Exif")
  File "/home/jasontitus/.local/lib/python3.8/site-packages/piexif/_load.py", line 122, in get_ifd_dict
    value_num = struct.unpack(self.endian_mark + "L",
struct.error: unpack requires a buffer of 4 bytes

TheLastGimbus · 2021-01-30T12:25:06Z

Um, did you even update the script?

pip install -U google-photos-takeout-helper==2.1.0b2

You should at least get a "oh-oh, script crashed" message I introduced in #56

For me, the output for your problematic image is like this:

Heeeere we go!
=====================
Fixing files metadata and creation dates...
=====================
test/IMG_4661(3).jpg
Can't read file's exif!
No exif for test/IMG_4661(3).jpg
Couldn't find json for file 
Last chance, coping folder meta as date...
Couldn't pull datetime from album meta
ERROR! There was literally no option to set date!!!
TODO: We should do something about this - move it to some separate folder, or write it down in another .txt file...
=====================
Coping all files to one folder...
(If you want, you can get them organized in folders based on year and month. Run with --divide-to-dates to do this)
=====================
=====================
Removing duplicates...
=====================

DONE! FREEEEEDOOOOM!!!

Final statistics:
Files copied to target folder: 1
Removed duplicates: 0
Files for which we couldn't find json: 1

sionicion · 2021-01-30T14:58:32Z

ExifTool JSON files

Google's JSONs are not from ExifTool :/

they have their own page on it

Can you link it here? I'm curious what they have

I'm going to give their import process a shot.

Let us know how well it works 👍

I'm referring to this help topic: https://docs.photoprism.org/user-guide/use-cases/google/

And ah ok, I just assumed regarding ExifTool. Anyway, I did what the help topic suggested and it seems it does import the data, but it's still a mess of course (maybe that's because of the new structure). I'm thinking of trying this out again with your patch. Just curious though, what happens to files that can't find JSON? I'm assuming they get left in the original directory? Would I be left with a folder full of the original Takeout, and a folder that has most of the pictures but not the ones that failed?

jasontitus · 2021-01-30T15:31:29Z

OK, I somehow had two versions installed (in /usr/local and ~/.local). I uninstalled everything, re-installed and it got through my test broken images. I will try on the full set again today. Thanks!
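
A quick way to spot this kind of duplicate install is to ask Python which copy it would actually import. The traceback above shows the `/usr/local/bin` launcher loading the module from `~/.local`. This is only a sketch: `describe_install` is a hypothetical helper, and `importlib.metadata` needs Python 3.8 or newer.

```python
import importlib.util
from importlib import metadata


def describe_install(module_name, dist_name):
    """Report where a module would be imported from and which version is installed."""
    spec = importlib.util.find_spec(module_name)
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None
    return {"origin": spec.origin if spec else None, "version": version}


if __name__ == "__main__":
    print(describe_install("google_photos_takeout_helper",
                           "google-photos-takeout-helper"))
```

If the reported origin is not where the latest `pip install -U` put the package, an older copy is shadowing the new one.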


TheLastGimbus · 2021-01-30T15:53:35Z

what happens to files that can't find JSON?

Script tries to find any other way to find their creation date, from exif or folder name, and if there is absolutely no way, it just copies it as-is

Although I want to change this behavior later so it copies it to separate folder
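
The fallback described here can be sketched as a chain of date sources tried in order. This is an illustration, not the script's real structure: the function name and the placeholder lambdas are hypothetical, and the order shown simply follows the log output earlier in the thread (exif, then json, then folder).

```python
from datetime import datetime


def first_available_date(file, sources):
    """Try each (name, function) source in order; return the first date found."""
    for name, source in sources:
        try:
            date = source(file)
        except Exception:
            date = None  # a failing source should not stop the run
        if date is not None:
            return name, date
    return None, None  # nothing worked: the caller copies the file as-is


# Placeholder sources; real ones would read exif, the json, or the folder name.
sources = [
    ("exif", lambda f: None),
    ("json", lambda f: None),
    ("folder", lambda f: datetime(2016, 12, 23)),
]
print(first_available_date("IMG_0001.jpg", sources))
```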

sionicion · 2021-01-30T18:37:03Z

Okay so I was able to run this successfully this time, but I noticed the output folder is 2GB less. 93 gigabytes vs 95 gigabytes. I don't think every file is in the output folder. What do you think of my results?

DONE! FREEEEEDOOOOM!!!

Final statistics:
Files copied to target folder: 20188
Removed duplicates: 0
Files for which we couldn't find json: 2761
Files where inserting correct exif failed: 2902
 - you have full list in takeout-combined2/failed_inserting_exif.txt
Files where date was set from name of the folder: 0
(you have full list in takeout-combined2/date_from_folder_name.txt)

TheLastGimbus · 2021-01-30T19:33:25Z

Removed duplicates: 0

HUH. This is either very lucky, or very weird... especially for 95GB 🤔

What do you think of my results?

Do you maybe have Linux/Mac? You can easily do:

cd your/takeout/folder
du -ch **/*.json
# This prints the total size of all json files (in bash, run `shopt -s globstar` first so that ** recurses)

For my sample 4.3GB it was 31MB...

(Please try to find out how to count it on Windoza, if you have the misfortune to have it)

I'm gonna be honest - I don't know, and have no good way to test, if this script works flawlessly and copies everything... all the workarounds around duplicates etc. made it complicated... but it should...

I just have an idea - can I replace the final copy command with move - just experimentally, to see if you maybe have some weird files in weird formats that are not included in the is_photo() or is_video() functions... This wouldn't test if de-duplicating works well, but it would always be something!
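
One platform-independent way to get that number is a few lines of Python. This is a sketch with a hypothetical helper name. Note that it sums file sizes, while `du` reports disk usage, so the two totals can differ slightly.

```python
from pathlib import Path


def total_size(root, suffix=".json"):
    """Sum the sizes in bytes of all files under root with the given suffix."""
    return sum(
        p.stat().st_size
        for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() == suffix
    )


if __name__ == "__main__":
    # Run from inside the takeout folder
    print(f"{total_size('.') / 1e6:.1f} MB of JSON")
```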

Files where inserting correct exif failed: 2902

Out of curiosity - can you tell me (just ctrl+f in notepad in .txt. file) how many of these files were jpg's and how many png's etc?
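
For a list that long, counting by extension is quicker with a short script than with ctrl+f. The sketch below assumes the .txt file holds one file path per line, which the thread does not confirm; `count_extensions` is a hypothetical helper.

```python
from collections import Counter
from pathlib import Path


def count_extensions(list_file):
    """Count lowercase file extensions in a text file with one path per line."""
    with open(list_file, encoding="utf-8") as fh:
        return Counter(
            Path(line.strip()).suffix.lower()
            for line in fh
            if line.strip()
        )
```

Usage would be something like `count_extensions("takeout-combined2/failed_inserting_exif.txt").most_common()`.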

sionicion · 2021-01-30T23:20:37Z

Total weight is at 123M. Failed PNGs at 910, failed JPGs at 447. Let me know if you want me to test anything else.

TheLastGimbus · 2021-01-31T00:53:27Z

Failed PNGs

Failed files should be moved too, hmmm

I think our remove_duplicates function could be somehow broken... either deleting something it shouldn't, or just not logging something it legitimately deleted 🤔

Try to find if you have any photos/videos that are not from this list:

photo_formats = ['.jpg', '.jpeg', '.png', '.webp', '.bmp', '.tif', '.tiff', '.svg', '.heic']
video_formats = ['.mp4', '.gif', '.mov', '.webm', '.avi', '.wmv', '.rm', '.mpg', '.mpe', '.mpeg', '.m4v']
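
A rough sketch of that check, to be run against the input folder: walk the tree and count every extension that is in neither list. The two lists are copied from above; `unknown_extensions` and the choice to ignore `.json` are assumptions made for illustration.

```python
import os
from collections import Counter

photo_formats = ['.jpg', '.jpeg', '.png', '.webp', '.bmp', '.tif', '.tiff', '.svg', '.heic']
video_formats = ['.mp4', '.gif', '.mov', '.webm', '.avi', '.wmv', '.rm', '.mpg', '.mpe', '.mpeg', '.m4v']
known = set(photo_formats + video_formats)


def unknown_extensions(root, ignore=('.json',)):
    """Count files under root whose lowercase extension is not in the known lists."""
    counts = Counter()
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()
            if ext not in known and ext not in ignore:
                counts[ext] += 1
    return counts
```

Anything this returns is a file type the script would presumably skip, which is one way output could end up smaller than input.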

sionicion · 2021-01-31T11:30:29Z

I sorted the files by extension, and the only slight deviation I see is that some files have capitalized extensions mixed in --> .HEIC .JPG .MOV .MP4

This is from the output folder.

TheLastGimbus · 2021-01-31T11:51:20Z

is_photo and is_video use .lower(), so that shouldn't be a problem...

sionicion · 2021-01-31T12:49:14Z

So even in Google Photos, I have a lot of pictures that lost their metadata. I had switched Google accounts at some point and uploaded all my pictures without the JSON, I don't even know if Google's upload tools take them into account? Anyway, a quarter of my library is under one day in Google Photos. Now as soon as I download or extract these pictures, they end up having a created date of today, but the filename is right.

Is it possible to add an option to count files that have creation dates as the current day as wrong and to get the date from the filename? This is an example - IMG_20161223_183024 1.jpg - this file has a date of June 10th, 2020 in Google Photos, but that was I believe the day I uploaded to the second Google account, when I download it, the date becomes today, and when I extract it from the Takeout archive it also becomes today.
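
A minimal sketch of what such an option could check, assuming filenames carry a `YYYYMMDD_HHMMSS` stamp as in the example above. Both function names are hypothetical, and this is not part of the script.

```python
import re
from datetime import datetime

# Eight digits, underscore, six digits: e.g. 20161223_183024
_PATTERN = re.compile(r"(\d{8})_(\d{6})")


def date_from_filename(name):
    """Return the datetime embedded in names like IMG_20161223_183024 1.jpg, or None."""
    match = _PATTERN.search(name)
    if not match:
        return None
    try:
        return datetime.strptime("".join(match.groups()), "%Y%m%d%H%M%S")
    except ValueError:
        return None  # the digits do not form a valid date and time


def dated_today(mtime, now=None):
    """True if a file's modification time falls on the current local day."""
    now = now or datetime.now()
    return datetime.fromtimestamp(mtime).date() == now.date()
```

A file would then be treated as suspect when `dated_today(os.path.getmtime(path))` is true and `date_from_filename` returns a date.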

sionicion · 2021-01-31T12:53:59Z

Well not quite a quarter of my pictures, 920 to be exact, but that's still a lot I need to fix somehow. A lot of them are saved snapchats though which have random filenames, besides them though I have a lot that have the date and time in the filename.

TheLastGimbus · 2021-01-31T17:14:57Z

Is it possible to add an option to count files that have creation dates as the current day as wrong and to get the date from the filename

Huh... that is doable...

Maybe I will do this in a separate branch, just for you, because it could mess up the script (and its performance) very much, and 99% of people won't use it

Then, you will just manually git pull it

sionicion · 2021-01-31T19:00:41Z

Ok, it's up to you, I was looking at using the divide-to-dates parameter to see what all pops up in today's folder, so I can see all the problem files I've accumulated from reuploads to Google Photos. I looked around and saw examples of using exiftool to do it, but I haven't tested the commands yet because I'm moving the archives to another system that has more storage so I don't have to keep getting low disk space warnings.

sionicion · 2021-01-31T19:03:05Z

That's not to say this will find the 2GB of data not in the output folder, but I could always use the tool on everything separate from the folder containing all the missing metadata and see if that still happens.

TheLastGimbus · 2021-01-31T19:59:59Z

using the divide-to-dates

That's a good idea! Then you can do:

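import os
from datetime import datetime

for f in os.listdir():
  if f[0:4] == 'IMG_':
    date = f[4:12]  # e.g. '20161223' from IMG_20161223_183024 1.jpg
    try:
      # os.utime() needs seconds since the epoch, not a datetime
      timestamp = datetime.strptime(date, '%Y%m%d').timestamp()
    except ValueError:
      continue  # no date in the name, e.g. IMG_4661(3).jpg
    os.utime(f, (timestamp, timestamp))

// This is just a reference script, I haven't run it. I can finish it if you don't know how to do it yourself 👍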

2GB of data

Perhaps #57 fixed your problem? Try searching for more weird files

Sorted through using extension

...but inside input folder

sionicion · 2021-01-31T23:17:58Z

Okay still missing 2GB, I'm on my Mac now though so I used HoudahSpot to do a more advanced search and these are the extensions my takeout has.

m4v, gif, heif, jpeg, mkv, mts, mp4, png, mov, bmp

I think it's the MKV files! They weren't even supposed to be in Google Photos, they accidentally got uploaded in lol.

TheLastGimbus · 2021-02-01T00:23:22Z

Yay! So my script isn't fundamentally broken (maybe) 🎉!

Updated it. Try to pip install -U ... and run again (good luck with that 95GB 😅 )

sionicion · 2021-02-05T11:25:46Z

Sorry was waiting for my weekend, ran the script and my input folder is at 101.82 gigabytes and my output folder is coming to 101.8 gigabytes, so it looks like those MKVs are transferring! Although I'm deleting them because they shouldn't even be in my library, at least the script now accounts for MKV though. I'm working on going through the 1,641 files with the wrong date and luckily I am getting somewhere, some are junk that I can delete, and the most important ones I can fix the dates by the filename. Anyways, thanks for your help!

TheLastGimbus mentioned this issue Jan 29, 2021

Broader try-catch in all exif operations #56

Merged

TheLastGimbus closed this as completed in #56 Jan 29, 2021
