As we all know, Google Photos is utterly pathetic when it comes to taking out media files.
After you've untarred/unzipped your takeout archives, the following issues are present:
- All media files have incorrect or absent Exif data (this data is in related .json files).
- Some media files have no companion json files.
- Media files with long names have cut-off companion json file names, e.g.:
Photo: IMG_123456790.jpg
Companion: IMG_123456790.j.json
- Some jpg files have companion files that use the jpeg extension, e.g.:
Photo: IMG_1234.jpg
Companion: IMG_1234.jpeg.json
- Some media files and companion files have inconsistent extension casing, e.g.:
Photo: IMG_1234.jpg
Companion: IMG_1234.JPG.json
python3 fixgptakeout.py [dir]
Where [dir] is the directory to recursively fix. Right after unarchiving your takeout, you can use "Takeout/Google Photos", for example.
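To illustrate how the naming problems listed above can be untangled, here is a minimal sketch of a matching heuristic. It is not the code of fixgptakeout.py; the names `normalise`, `match_media` and `EXT_ALIASES` are hypothetical, and the rules (ignore case, treat jpeg as jpg, accept a truncated companion name only when it matches exactly one media file) are assumptions based on the examples in the list.

```python
from typing import List, Optional

# Extensions treated as equivalent when matching (an assumption of this sketch).
EXT_ALIASES = {"jpeg": "jpg"}


def normalise(name: str) -> str:
    """Lowercase a file name and map alias extensions (jpeg -> jpg)."""
    stem, dot, ext = name.lower().rpartition(".")
    if not dot:
        return name.lower()
    return f"{stem}.{EXT_ALIASES.get(ext, ext)}"


def match_media(json_name: str, media_names: List[str]) -> Optional[str]:
    """Return the media file a companion json most plausibly belongs to."""
    if not json_name.lower().endswith(".json"):
        return None
    # Strip ".json", leaving e.g. "IMG_1234.JPG" or "IMG_123456790.j".
    base = json_name[: -len(".json")]
    wanted = normalise(base)
    # 1. Exact match once case and jpeg/jpg differences are ignored.
    for media in media_names:
        if normalise(media) == wanted:
            return media
    # 2. Truncated companion name: accept only an unambiguous prefix match.
    prefix = base.lower()
    candidates = [m for m in media_names if m.lower().startswith(prefix)]
    return candidates[0] if len(candidates) == 1 else None
```

Once a match is found, renaming the companion to the media file's full name plus .json gives the consistent naming that the later steps rely on. Ambiguous prefixes return None so that nothing gets renamed on a guess.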
The awesome ExifTool project can be used to automatically import the json data as exif data into the relevant media files, but only if the json and media files are named perfectly consistently.
For the sake of interest, here is the command that does the exif fix:
exiftool -r -d %s -tagsfromfile "%d/%F.json" "-GPSAltitude<GeoDataAltitude" "-GPSLatitude<GeoDataLatitude" "-GPSLatitudeRef<GeoDataLatitude" "-GPSLongitude<GeoDataLongitude" "-GPSLongitudeRef<GeoDataLongitude" "-Keywords<Tags" "-Subject<Tags" "-Caption-Abstract<Description" "-ImageDescription<Description" "-DateTimeOriginal<PhotoTakenTimeTimestamp" -ext '*' -overwrite_original --ext json [dir]
The %d/%F.json part specifies that the companion json files will be named exactly the same as the related media files (with a lowercase extension) and .json appended to the end.
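Before running exiftool, it can be handy to check which media files still lack a companion under that naming rule. The following is a rough sketch, not part of the scripts in this repo: `expected_companion` and `media_without_companion` are hypothetical names, and it assumes the simplest reading of the rule, namely the media file's full name with .json appended.

```python
from pathlib import Path
from typing import List


def expected_companion(media: Path) -> Path:
    """Companion path under the assumed rule: <media file name>.json."""
    return media.with_name(media.name + ".json")


def media_without_companion(root: Path) -> List[Path]:
    """Recursively list non-json files that have no companion next to them."""
    missing = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() != ".json":
            if not expected_companion(path).exists():
                missing.append(path)
    return missing
```

Calling `media_without_companion(Path("Takeout/Google Photos"))` would list the files that exiftool cannot pair with a json file.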
Chevereto is an open-source photo hosting app that has native support for importing Google Photos Takeout images and parsing the related json files, but obviously only if they are named consistently.
The takeout does a lot of unnecessary media duplication. Specifically, media items that exist in more than one album are fully copied to all relevant album folders, and this includes the generated by-year albums. So, if you have a 512 MB video in a road-trip album from 2016, then that video will also exist (as a full copy) in the album-folder Photos from 2016, thereby taking up 1 GB of space. It's even worse if you have media items in 3 or more albums each.
python3 dedup.py [dir]
Where [dir] is the parent directory that contains all the album-folders.
The following will happen:
- Media items in any user albums will be deleted from the relevant by-year album-folder.
Album-folder | Media files |
---|---|
Road trip '16 | img123.jpg img124.jpg |
Photos from 2016 |
- Media items that occur in multiple user albums will be moved to "multi-album" folders and deleted from their source album-folders so that there is only one copy of said media items.
Album-folder | Media file |
---|---|
Road trip '16 | |
Road trips mega-album | |
[New] Road trip '16 _, Road trips mega-album | img123.jpg |
- If a media item has a companion .json file, that file will be moved/deleted along with it. Thus, it's crucial to first run the above json fix script to get the naming right.
Album-folder | Files |
---|---|
Road trip '16 | img123.jpg img123.jpg.json |
Photos from 2016 |
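The rules above can be expressed as a small decision function. This is only a sketch of the logic as described here, not the code of dedup.py: `plan_dedup` and `BY_YEAR` are hypothetical names, the by-year folder pattern is inferred from the "Photos from 2016" example, and the joined multi-album folder name is illustrative (the real script may name it differently).

```python
import re
from typing import Dict, List, Tuple

# By-year album-folders, assumed to look like "Photos from 2016".
BY_YEAR = re.compile(r"^Photos from \d{4}$")


def plan_dedup(albums_by_hash: Dict[str, List[str]]) -> Dict[str, Tuple[str, List[str]]]:
    """Map each content hash to (folder that keeps the file, folders to delete from)."""
    plan = {}
    for digest, albums in albums_by_hash.items():
        user = sorted(a for a in albums if not BY_YEAR.match(a))
        by_year = sorted(a for a in albums if BY_YEAR.match(a))
        if not user:
            # Only in by-year folders: keep the first copy there.
            keep, drop = by_year[0], by_year[1:]
        elif len(user) == 1:
            # In one user album: that album wins, by-year copies go.
            keep, drop = user[0], by_year
        else:
            # In several user albums: one copy in a combined folder.
            keep, drop = ", ".join(user), user + by_year
        plan[digest] = (keep, drop)
    return plan
```

The function only decides what should happen; moving and deleting files (and their companion json files) would be a separate step, which makes the plan easy to inspect before anything is touched.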
Notes:
- Duplication checking happens by way of md5 hashing, because different photos may coincidentally have the same name.
- It's probably best to run the deduplication before you make exif changes to the images. You never know with Google: the same image could have a json file in one album-folder but not in another. The resulting exif difference would cause the two instances to hash differently, even though they are the same image.
- Reduces space usage
- Makes importing into a photo hosting app easier. Now you can simply select a folder and add all its contents to the album/s it's named after. Or even write a script to do it with some cheeky API calls.
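The md5-based duplicate check mentioned in the notes can be sketched with the standard library alone. This is an illustration, not the actual dedup.py; `md5_of` and `group_by_content` are hypothetical names, and skipping .json files is an assumption made here so that only media files are compared.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large videos are never loaded into memory whole."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_by_content(root: Path) -> Dict[str, List[Path]]:
    """Return {md5: [paths]} for media files under root that occur more than once."""
    groups = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() != ".json":
            groups[md5_of(path)].append(path)
    return {h: paths for h, paths in groups.items() if len(paths) > 1}
```

Grouping by content rather than by file name means that two different photos which happen to share a name are kept apart, while identical copies in different album-folders end up in the same group.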
Because Google often changes its APIs on a whim, I fully expect these scripts, and the related exiftool command, not to work at some point in the future. But as of January 2021 they work, so take out your photos and use them while you can!