Editing annotations and images in migrated datasets #672
Comments
A v2 of the object was created in the OCFL repository, but it has extra copies of the four original images, because the same image uploaded twice is not bitwise identical (the EXIF stripping problem). It shouldn't have let me upload five images. What I need to work out is how to get it to recognise that, if it already has four images and I give it a weedcoco.json with those four images plus an extra one, it should only ask for one. In the first test I didn't do this: I was using a weedcoco.json from my laptop, which had filenames like 'image1.jpg' rather than hashes.
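The "only ask for one" behaviour described above amounts to a set difference between the file names in the uploaded weedcoco.json and the files already stored for the dataset. A minimal sketch, assuming the COCO-style `images` list with a `file_name` field; the function `missing_images` and all file names below are hypothetical placeholders, not taken from the WeedAI codebase:

```python
def missing_images(weedcoco: dict, existing_files: set) -> list:
    """Return file names referenced by the weedcoco that the dataset lacks."""
    seen = set()
    missing = []
    for image in weedcoco.get("images", []):
        name = image["file_name"]
        # Skip anything already stored, and report each new name only once
        if name not in existing_files and name not in seen:
            seen.add(name)
            missing.append(name)
    return missing


# Placeholder names standing in for the hashed files already in the dataset
existing = {"hash_a.jpg", "hash_b.jpg", "hash_c.jpg", "hash_d.jpg"}

# Update based on the hashed names: only the new image is requested
hashed_update = {"images": [{"id": i, "file_name": n} for i, n in enumerate(
    ["hash_a.jpg", "hash_b.jpg", "hash_c.jpg", "hash_d.jpg", "new.jpg"])]}

# Update based on original names: nothing matches, so all five look missing
laptop_update = {"images": [{"id": i, "file_name": f"image{i + 1}.jpg"}
                            for i in range(5)]}

print(missing_images(hashed_update, existing))
print(len(missing_images(laptop_update, existing)))
```

With the hashed names only `new.jpg` is reported as missing; with the laptop-style names all five are, which matches the behaviour seen in the first test.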
Testing run 2:
This still complained about not remapping the images, but successfully indexed the new dataset with v2. The OCFL deduplication issue #665 seems to have come back: in the resulting OCFL object, there are nine image files, with the four images from the first dataset appearing twice. Currently trying to track down where and when the original four files are changed in the editing process. I may be working on a branch which doesn't have the bug fix for that.
Have now tested it with the OCFL dedupe bugfix in place, and it's not duplicating the original images.
Here is a diagram of the editing process where the original dataset was migrated from the pre-OCFL repository, and doesn't have mappings back to the original filenames.

My original diagram showed how it works if the original dataset was post-OCFL.

The problems I was having arise if the updated weedcoco.json refers to files by their original filenames. The system has no way to map these to the hashed image files, so it calculates the wrong number of missing images, and can hit missing key exceptions when it tries to thumbnail them.

A way to avoid this is to base the updated weedcoco.json on a zipfile which you download from WeedAI, because this version has the hashed image names. Here's how that looks.

* When the Redis map is built, it's going to be a mix of references to old pre-OCFL hashes and new unhashed image names.
One way to avoid all this complication would be to restrict whether existing datasets can be edited, or in what way we can edit them. That is:
If the user edits a dataset and uploads a weedcoco.json which doesn't have mappings in Redis, that means it's an old-style dataset. The upload stepper could then check whether the weedcoco.json image filenames (which will be old hashes) match the filenames in the dataset. If they do, the update should work OK. If they don't, it means they're using a weedcoco.json with the original filenames. We could give the user a message like "to update this dataset, use this weedcoco.json as a starting point" and provide a link to download the weedcoco.json which has hashed filenames.
Diagram of the three scenarios - the one which causes problems is the third, where the user uploads a weedcoco.json where none of the referenced images are in the existing dataset. The most likely cause of this is the error I hit - uploading an edited version of the weedcoco.json which has filenames from before they were hashed (and which are unrecoverable because there's no Redis mapping for them). It would be possible to trigger the same condition by uploading a dataset, then changing all of the images and weedcoco.json refs to different filenames, and uploading that - but I think that's a perverse case and one we can discount in this release.
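The check proposed for the upload stepper could look roughly like the sketch below. It assumes the stepper already knows whether the dataset has Redis mappings and which file names are stored for it. The function name, the return labels, and the idea of passing `has_redis_mappings` as a flag are all hypothetical; the real Redis lookup is not modelled here.

```python
def check_update(weedcoco: dict, existing_files: set,
                 has_redis_mappings: bool) -> str:
    """Decide whether an edit upload can proceed (illustrative labels only)."""
    referenced = {image["file_name"] for image in weedcoco.get("images", [])}

    if has_redis_mappings:
        # Post-OCFL dataset: original names can be mapped back to hashes,
        # so this sketch does not restrict the update.
        return "ok"

    if referenced & existing_files:
        # Old-style dataset, but the weedcoco uses the stored hashed names.
        return "ok"

    # Old-style dataset and none of the referenced images are in the dataset:
    # most likely a weedcoco.json that still uses the original filenames.
    return "use_downloaded_weedcoco"


stored = {"hash_a.jpg", "hash_b.jpg"}  # placeholder hashed names
from_download = {"images": [{"id": 0, "file_name": "hash_a.jpg"},
                            {"id": 1, "file_name": "new.jpg"}]}
from_laptop = {"images": [{"id": 0, "file_name": "image1.jpg"}]}

print(check_update(from_download, stored, has_redis_mappings=False))
print(check_update(from_laptop, stored, has_redis_mappings=False))
```

When the result is `use_downloaded_weedcoco`, the frontend could show the "use this weedcoco.json as a starting point" message with the download link. Treating any overlap as acceptable is a design choice in this sketch; a stricter rule could require every previously stored image to be referenced.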
Testing steps
On initialising the edit, got a set of warnings because the images in the original dataset did not have mappings from hashed filenames to original filenames in Redis. This is normal - none of the migrated datasets will have Redis mappings.
On uploading the new weedcoco.json, the frontend reports that there are five missing images (not one, as expected)
On uploading the five images, got the following error in creating thumbnails:
This means that the thumbnailer can't find the annotations for an image.
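One way to surface this before thumbnailing is to check each uploaded file name against the weedcoco.json first. The sketch below assumes the COCO-style layout (`images` entries with `id` and `file_name`, `annotations` entries with `image_id`); `unannotated_uploads` is a hypothetical helper, and how the actual thumbnailer looks up annotations is not shown in this issue.

```python
from collections import defaultdict


def unannotated_uploads(weedcoco: dict, uploaded_names: list) -> list:
    """Return uploaded file names with no image entry or no annotations."""
    # Group annotations by the image they belong to
    annotations_by_image = defaultdict(list)
    for annotation in weedcoco.get("annotations", []):
        annotations_by_image[annotation["image_id"]].append(annotation)

    image_id_by_name = {image["file_name"]: image["id"]
                        for image in weedcoco.get("images", [])}

    problems = []
    for name in uploaded_names:
        image_id = image_id_by_name.get(name)
        # Either the file is not referenced at all, or it has no annotations
        if image_id is None or image_id not in annotations_by_image:
            problems.append(name)
    return problems


example = {
    "images": [{"id": 1, "file_name": "a.jpg"}, {"id": 2, "file_name": "b.jpg"}],
    "annotations": [{"id": 10, "image_id": 1}],
}
print(unannotated_uploads(example, ["a.jpg", "b.jpg", "image1.jpg"]))
```

Reporting these names to the user up front would replace the missing key exception with an actionable message.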
Need to think through how the editing code will work in the situation where we have a dataset without original image files (i.e. all of the existing datasets), or whether we just rule out editing them somehow. I think that it should be possible to make it work for them.