Dies on certain images #52

Closed
jasontitus opened this issue Jan 24, 2021 · 28 comments · Fixed by #56

Comments

@jasontitus

jasontitus commented Jan 24, 2021

After running for 20+ hours, the script dies on a specific image even though it parses and displays fine. I have reproduced it with a directory containing just that image. This is running release 2.0 on Ubuntu Linux 20.10.

The log looks like this -

~/.local/bin/google-photos-takeout-helper -i brokenimages -o testout
Heeeere we go!
=====================
Fixing files metadata and creation dates...
=====================
brokenimages/IMG_4661(3).jpg
Traceback (most recent call last):
  File "/home/jasontitus/.local/bin/google-photos-takeout-helper", line 8, in <module>
    sys.exit(main())
  File "/home/jasontitus/.local/lib/python3.8/site-packages/google_photos_takeout_helper/__main__.py", line 570, in main
    for_all_files_recursive(
  File "/home/jasontitus/.local/lib/python3.8/site-packages/google_photos_takeout_helper/__main__.py", line 114, in for_all_files_recursive
    file_function(file)
  File "/home/jasontitus/.local/lib/python3.8/site-packages/google_photos_takeout_helper/__main__.py", line 494, in fix_metadata
    set_creation_date_from_exif(file)
  File "/home/jasontitus/.local/lib/python3.8/site-packages/google_photos_takeout_helper/__main__.py", line 343, in set_creation_date_from_exif
    exif_dict = _piexif.load(str(file))
  File "/home/jasontitus/.local/lib/python3.8/site-packages/piexif/_load.py", line 43, in load
    exif_dict["Exif"] = exifReader.get_ifd_dict(pointer, "Exif")
  File "/home/jasontitus/.local/lib/python3.8/site-packages/piexif/_load.py", line 118, in get_ifd_dict
    tag = struct.unpack(self.endian_mark + "H",
struct.error: unpack requires a buffer of 2 bytes

The jpeginfo output for the file is -
jpeginfo -c brokenimages/IMG_4661\(3\).jpg 
brokenimages/IMG_4661(3).jpg 2592 x 1936 24bit Exif  N 2167440  [OK]

Here is a link to the file

@sionicion

I think I'm getting the same error here. I read somewhere that you should just delete the last picture it was working on, but it still crashes.

Traceback (most recent call last):
  File "/usr/local/bin/google-photos-takeout-helper", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/google_photos_takeout_helper/__main__.py", line 593, in main
    for_all_files_recursive(
  File "/usr/local/lib/python3.8/dist-packages/google_photos_takeout_helper/__main__.py", line 117, in for_all_files_recursive
    file_function(file)
  File "/usr/local/lib/python3.8/dist-packages/google_photos_takeout_helper/__main__.py", line 517, in fix_metadata
    set_creation_date_from_exif(file)
  File "/usr/local/lib/python3.8/dist-packages/google_photos_takeout_helper/__main__.py", line 366, in set_creation_date_from_exif
    exif_dict = _piexif.load(str(file))
  File "/usr/local/lib/python3.8/dist-packages/piexif/_load.py", line 46, in load
    exif_dict["GPS"] = exifReader.get_ifd_dict(pointer, "GPS")
  File "/usr/local/lib/python3.8/dist-packages/piexif/_load.py", line 108, in get_ifd_dict
    tag_count = struct.unpack(self.endian_mark + "H",
struct.error: unpack requires a buffer of 2 bytes

@TheLastGimbus
Owner

Ugh... piexif is super buggy 😕

Though I don't know if I can replace it... I need to just wrap it in try-catch...
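A minimal sketch of the wrapping idea, not the project's actual code. `load_exif_or_none` and `fake_loader` are hypothetical names; in real use the loader passed in would be `piexif.load`. The fake loader reproduces the same `struct.error` seen in the tracebacks above by unpacking a 2-byte field from a 1-byte buffer.

```python
import struct


def load_exif_or_none(path, loader):
    """Call loader(path); return None instead of crashing on malformed EXIF.

    `loader` stands in for piexif.load (an assumption for illustration).
    """
    try:
        return loader(path)
    except (struct.error, ValueError, OSError) as err:
        # Truncated or corrupt EXIF: report it and let the caller fall back
        # to other date sources instead of aborting the whole run.
        print(f"Can't read exif for {path}: {err}")
        return None


def fake_loader(path):
    # Simulates a truncated IFD: a 2-byte tag is requested, 1 byte is left.
    return struct.unpack("<H", b"\x01")


result = load_exif_or_none("brokenimages/IMG_4661(3).jpg", fake_loader)
```

Returning `None` rather than re-raising means one unreadable image costs a log line instead of a 20-hour run.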

@jasontitus
Author

jasontitus commented Jan 28, 2021 via email

@TheLastGimbus
Owner

I know... nevertheless, it's weird - I thought all exif operations were wrapped in try-catch 🤔

I will improve this when I have some time 👍

@sionicion

This is a little off-topic but I'll be adopting PhotoPrism and I see a setting for not creating ExifTool JSON files, which means it does produce these same files. Does that mean I could import a takeout directly into PhotoPrism? Regardless, I was going to wait until this issue is fixed and then add the pictures into PhotoPrism, I was just curious how it would work with existing ExifTool files or if it would ignore them.

@sionicion

Ah I see they have their own page on it. I also looked to see if they had any issues open for the new Google Takeout years format but I don't see anything, I'm going to give their import process a shot.

@TheLastGimbus
Owner

ExifTool JSON files

Google's JSONs are not from ExifTool :/

they have their own page on it

Can you link it here? I'm curious what they have

I'm going to give their import process a shot.

Let us know how well it works 👍

@TheLastGimbus
Owner

pip install -U google-photos-takeout-helper==2.1.0b2

As always, try it out and let me know if it works 👍

@jasontitus
Author

jasontitus commented Jan 30, 2021

Nope. Still fails on the images it doesn't like:

/usr/local/bin/google-photos-takeout-helper -i brokenimages/ -o testout/
Heeeere we go!
=====================
Fixing files metadata and creation dates...
=====================
brokenimages/IMG_7322(1).jpg
Traceback (most recent call last):
  File "/usr/local/bin/google-photos-takeout-helper", line 8, in <module>
    sys.exit(main())
  File "/home/jasontitus/.local/lib/python3.8/site-packages/google_photos_takeout_helper/__main__.py", line 593, in main
    for_all_files_recursive(
  File "/home/jasontitus/.local/lib/python3.8/site-packages/google_photos_takeout_helper/__main__.py", line 117, in for_all_files_recursive
    file_function(file)
  File "/home/jasontitus/.local/lib/python3.8/site-packages/google_photos_takeout_helper/__main__.py", line 517, in fix_metadata
    set_creation_date_from_exif(file)
  File "/home/jasontitus/.local/lib/python3.8/site-packages/google_photos_takeout_helper/__main__.py", line 366, in set_creation_date_from_exif
    exif_dict = _piexif.load(str(file))
  File "/home/jasontitus/.local/lib/python3.8/site-packages/piexif/_load.py", line 43, in load
    exif_dict["Exif"] = exifReader.get_ifd_dict(pointer, "Exif")
  File "/home/jasontitus/.local/lib/python3.8/site-packages/piexif/_load.py", line 122, in get_ifd_dict
    value_num = struct.unpack(self.endian_mark + "L",
struct.error: unpack requires a buffer of 4 bytes

@TheLastGimbus
Owner

Um, did you even update the script?

pip install -U google-photos-takeout-helper==2.1.0b2

You should at least get the "oh-oh, script crashed" message I introduced in #56

For me, the output for your problematic image is like this:

Heeeere we go!
=====================
Fixing files metadata and creation dates...
=====================
test/IMG_4661(3).jpg
Can't read file's exif!
No exif for test/IMG_4661(3).jpg
Couldn't find json for file 
Last chance, coping folder meta as date...
Couldn't pull datetime from album meta
ERROR! There was literally no option to set date!!!
TODO: We should do something about this - move it to some separate folder, or write it down in another .txt file...
=====================
Coping all files to one folder...
(If you want, you can get them organized in folders based on year and month. Run with --divide-to-dates to do this)
=====================
=====================
Removing duplicates...
=====================

DONE! FREEEEEDOOOOM!!!

Final statistics:
Files copied to target folder: 1
Removed duplicates: 0
Files for which we couldn't find json: 1

@sionicion

ExifTool JSON files

Google's JSONs are not from ExifTool :/

they have their own page on it

Can you link it here? I'm curious what they have

I'm going to give their import process a shot.

Let us know how well it works 👍

I'm referring to this help topic.

And ah, OK, I just assumed that about ExifTool. Anyway, I did what the help topic suggested and it does seem to import the data, but it's still a mess of course (maybe that's because of the new structure). I'm thinking of trying this out again with your patch. Just curious though, what happens to files that can't find JSON? I'm assuming they get left in the original directory? Would I be left with a folder full of the original Takeout, and a folder that has most of the pictures but not the ones that failed?

@jasontitus
Author

jasontitus commented Jan 30, 2021 via email

@TheLastGimbus
Owner

what happens to files that can't find JSON?

The script tries any other way to find their creation date, from exif or the folder name, and if there is absolutely no way, it just copies them as-is

Although I want to change this behavior later so it copies them to a separate folder
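The fallback order described here can be sketched as a chain of date sources tried one after another. This is only an illustration under assumptions: `resolve_creation_date` and the source labels are hypothetical, and the real script's functions and ordering may differ.

```python
from datetime import datetime
from typing import Callable, Optional, Sequence, Tuple

# A source is a label plus a function that returns a date or None.
DateSource = Tuple[str, Callable[[str], Optional[datetime]]]


def resolve_creation_date(path: str, sources: Sequence[DateSource]):
    """Try each (label, getter) in order and return the first date found.

    Returns (label, date), or (None, None) when every source fails, in
    which case the caller would copy the file as-is.
    """
    for label, getter in sources:
        try:
            found = getter(path)
        except Exception:
            found = None  # a broken source must not stop the chain
        if found is not None:
            return label, found
    return None, None
```

A caller could pass something like `[("exif", read_exif_date), ("json", read_json_date), ("folder", read_folder_date)]`, where the three getters are placeholders for whatever the script actually uses.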

@sionicion

sionicion commented Jan 30, 2021

Okay so I was able to run this successfully this time, but I noticed the output folder is 2GB smaller: 93 gigabytes vs 95 gigabytes. I don't think every file is in the output folder. What do you think of my results?

DONE! FREEEEEDOOOOM!!!

Final statistics:
Files copied to target folder: 20188
Removed duplicates: 0
Files for which we couldn't find json: 2761
Files where inserting correct exif failed: 2902
 - you have full list in takeout-combined2/failed_inserting_exif.txt
Files where date was set from name of the folder: 0
(you have full list in takeout-combined2/date_from_folder_name.txt)

@TheLastGimbus
Owner

TheLastGimbus commented Jan 30, 2021

Removed duplicates: 0

HUH. This is either very lucky, or very weird... especially for 95GB 🤔

What do you think of my results?

Do you maybe have Linux/Mac? You can easily do:

cd your/takeout/folder
du -ch **/*.json
# This prints the total size of all json files
# (** needs recursive globbing: zsh, or bash 4+ after `shopt -s globstar`)

For my sample 4.3GB it was 31MB...

Please try to find how to count it on Windoza, if you have the misfortune to have it
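For a count that works the same on Windows, Linux and Mac, a small Python sketch can do the same job as the `du` command. `total_size_by_suffix` is a hypothetical helper, not part of the script.

```python
from pathlib import Path


def total_size_by_suffix(root, suffix=".json"):
    """Sum the sizes (in bytes) of all files under root with the given suffix.

    The comparison is case-insensitive, so '.JSON' counts too.
    """
    return sum(
        p.stat().st_size
        for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() == suffix
    )
```

Calling `total_size_by_suffix("your/takeout/folder") / 1e6` gives the total in megabytes, which can then be compared against the gap between the input and output folders.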

I'm gonna be honest - I don't know, and have no good way to test, if this script works flawlessly and copies everything... all the workarounds around duplicates etc. made it complicated... but it should...

I just had an idea - can I replace the final copy command with move - just experimentally, to see if you maybe have some weird files in weird formats that are not included in the is_photo() or is_video() function... This wouldn't test if de-duplicating works well, but it would be something!

Files where inserting correct exif failed: 2902

Out of curiosity - can you tell me (just ctrl+f in notepad in the .txt file) how many of these files were jpg's and how many png's etc?

@sionicion

Total weight is at 123M. Failed PNGs at 910, failed JPGs at 447. Let me know if you want me to test anything else.

@TheLastGimbus
Owner

TheLastGimbus commented Jan 31, 2021

Failed PNGs

Failed files should be moved too, hmmm

I think our remove_duplicates function could be somehow broken... either deleting something it shouldn't, or just not logging something it legitimately deleted 🤔

Try to find if you have any photos/videos that are not from this list:

photo_formats = ['.jpg', '.jpeg', '.png', '.webp', '.bmp', '.tif', '.tiff', '.svg', '.heic']
video_formats = ['.mp4', '.gif', '.mov', '.webm', '.avi', '.wmv', '.rm', '.mpg', '.mpe', '.mpeg', '.m4v']
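Checking a large takeout by hand is tedious, so here is a rough sketch that counts every extension in a folder that is not in the two lists above. `unknown_extensions` is a hypothetical helper, and skipping `.json` is an assumption, since those are Takeout metadata files rather than media.

```python
from collections import Counter
from pathlib import Path

# The two lists quoted in the comment above.
PHOTO_FORMATS = ['.jpg', '.jpeg', '.png', '.webp', '.bmp', '.tif', '.tiff', '.svg', '.heic']
VIDEO_FORMATS = ['.mp4', '.gif', '.mov', '.webm', '.avi', '.wmv', '.rm', '.mpg', '.mpe', '.mpeg', '.m4v']


def unknown_extensions(root, known=PHOTO_FORMATS + VIDEO_FORMATS, ignore=('.json',)):
    """Count files under root whose lowercased extension is not in `known`.

    Files without an extension are counted under the empty string ''.
    """
    counts = Counter()
    for p in Path(root).rglob('*'):
        if not p.is_file():
            continue
        ext = p.suffix.lower()
        if ext not in known and ext not in ignore:
            counts[ext] += 1
    return counts
```

Running it on the input folder, not the output folder, is what matters, because skipped files never reach the output.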

@sionicion

sionicion commented Jan 31, 2021

Sorted through using extension and the only slight deviations I see are that some files have capitalized extensions mixed in --> .HEIC .JPG .MOV .MP4

This is from the output folder.

@TheLastGimbus
Owner

is_photo and is_video use .lower(), so that shouldn't be a problem...

@sionicion

So even in Google Photos, I have a lot of pictures that lost their metadata. I switched Google accounts at some point and uploaded all my pictures without the JSON (I don't even know if Google's upload tools take them into account). Anyway, a quarter of my library is under one day in Google Photos. Now as soon as I download or extract these pictures, they end up having a created date of today, but the filename is right.

Is it possible to add an option to count files that have creation dates as the current day as wrong and to get the date from the filename? This is an example - IMG_20161223_183024 1.jpg - this file has a date of June 10th, 2020 in Google Photos, which I believe was the day I uploaded it to the second Google account. When I download it, the date becomes today, and when I extract it from the Takeout archive it also becomes today.

@sionicion

Well, not quite a quarter of my pictures, 920 to be exact, but that's still a lot I need to fix somehow. A lot of them are saved snapchats, which have random filenames, but besides those I have a lot that have the date and time in the filename.

@TheLastGimbus
Owner

TheLastGimbus commented Jan 31, 2021

Is it possible to add an option to count files that have creation dates as the current day as wrong and to get the date from the filename

Huh... that is doable...

Maybe I will do this in a separate branch, just for you, because it could mess up the script (and its performance) very much, and 99% of people won't use it

Then, you will just manually git pull it
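The requested option boils down to two checks, sketched here under assumptions: the filename pattern is taken from the single example name given above (`IMG_20161223_183024 1.jpg`), and `date_from_filename` and `looks_wrong` are hypothetical helpers, not functions from the script.

```python
import re
from datetime import date, datetime
from typing import Optional

# Matches names like "IMG_20161223_183024 1.jpg" (pattern is an assumption).
_NAME_RE = re.compile(r"IMG_(\d{8})_(\d{6})")


def date_from_filename(name: str) -> Optional[datetime]:
    """Return the date and time encoded in the filename, or None."""
    match = _NAME_RE.search(name)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1) + match.group(2), "%Y%m%d%H%M%S")
    except ValueError:
        return None  # digits are present but do not form a valid date/time


def looks_wrong(file_mtime: float, today: Optional[date] = None) -> bool:
    """Treat a modification date equal to today's date as suspicious."""
    today = today or date.today()
    return datetime.fromtimestamp(file_mtime).date() == today
```

Files with random names, such as the saved snapchats mentioned above, simply return `None` from `date_from_filename` and would still need manual handling.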

@sionicion

Ok, it's up to you, I was looking at using the divide-to-dates parameter to see what all pops up in today's folder, so I can see all the problem files I've accumulated from reuploads to Google Photos. I looked around and saw examples of using exiftool to do it, but I haven't tested the commands yet because I'm moving the archives to another system that has more storage so I don't have to keep getting low disk space warnings.

@sionicion

That's not to say this will find the 2GB of data not in the output folder, but I could always use the tool on everything separate from the folder containing all the missing metadata and see if that still happens.

@TheLastGimbus
Owner

TheLastGimbus commented Jan 31, 2021

using the divide-to-dates

That's a good idea! Then you can do:

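import os
from datetime import datetime

for f in os.listdir():
  # e.g. 'IMG_20161223_183024 1.jpg': the date is the 8 digits after 'IMG_'
  if f[0:4] == 'IMG_':
    try:
      date = datetime.strptime(f[4:12], '%Y%m%d')
    except ValueError:
      continue  # no date in the name, e.g. 'IMG_4661(3).jpg'
    # os.utime expects (access, modification) times as seconds since epoch
    timestamp = date.timestamp()
    os.utime(f, (timestamp, timestamp))

// This is just a reference script, so test it on a copy first and run it from inside the folder you want to fix. I can finish it if you don't know how to do it yourself 👍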

2GB of data

Perhaps #57 fixed your problem? Try searching for more weird files

Sorted through using extension

...but inside input folder

@sionicion

Okay still missing 2GB, I'm on my Mac now though so I used HoudahSpot to do a more advanced search and these are the extensions my takeout has.

m4v, gif, heif, jpeg, mkv, mts, mp4, png, mov, bmp

I think it's the MKV files! They weren't even supposed to be in Google Photos, they accidentally got uploaded in lol.

@TheLastGimbus
Owner

Yay! So my script isn't fundamentally broken (maybe) 🎉!

Updated it. Try to pip install -U ... and run again (good luck with that 95GB 😅 )

@sionicion

Sorry was waiting for my weekend, ran the script and my input folder is at 101.82 gigabytes and my output folder is coming to 101.8 gigabytes, so it looks like those MKVs are transferring! Although I'm deleting them because they shouldn't even be in my library, at least the script now accounts for MKV though. I'm working on going through the 1,641 files with the wrong date and luckily I am getting somewhere, some are junk that I can delete, and the most important ones I can fix the dates by the filename. Anyways, thanks for your help!
