Document format and workflow #78

Closed
quicklizard99 opened this issue Oct 8, 2014 · 4 comments

@quicklizard99
Member

inselect works on and/or generates data spread across several unconnected files:

  • high-res scanned image
  • optional low-res thumbnail
  • metadata
  • normalised coordinates of each specimen
  • crops of specimens

This is confusing, error-prone and does not integrate with workflows.

@quicklizard99
Member Author

Suggestions - please comment.

Document format

Using a fictitious scanned image - Scan-1001.tiff - as an example.

  • Scan-1001/Scan-1001.tiff - the high-res scanned image - might also be .jpg, .png or other image format
  • Scan-1001/Scan-1001-thumbnail.jpg - a low-res thumbnail - for display within inselect and used by the segmentation algorithm
  • Scan-1001/Scan-1001.inselect - a json document containing, for each specimen
    • normalised (i.e., between 0 and 1) coordinates within the scan
    • metadata, both user-entered and decoded from barcodes
  • Scan-1001/Scan-1001_specimen_crops/BMNHE_1246958.jpg - cropped from the high-res scanned image; one such file for each specimen within the scan
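
To make the layout concrete, here is a minimal Python sketch of the paths listed above and of one possible shape for the .inselect json. The helper names `document_paths` and `to_pixels`, the json keys `items`, `rect` and `fields`, and the `catalog_number` field are all assumptions made for illustration; nothing in this proposal fixes them.

```python
import json
from pathlib import Path


def document_paths(scan_name, image_suffix=".tiff"):
    """Return the proposed file layout for one scan (hypothetical helper)."""
    root = Path(scan_name)
    return {
        "scan": root / f"{scan_name}{image_suffix}",
        "thumbnail": root / f"{scan_name}-thumbnail.jpg",
        "document": root / f"{scan_name}.inselect",
        "crops_dir": root / f"{scan_name}_specimen_crops",
    }


def to_pixels(rect, image_width, image_height):
    """Convert a normalised (left, top, width, height) rect to whole pixels."""
    left, top, width, height = rect
    return (
        round(left * image_width),
        round(top * image_height),
        round(width * image_width),
        round(height * image_height),
    )


# Assumed document shape: one entry per specimen, holding its normalised
# rect and its metadata. The key names are illustrative only.
example_document = {
    "items": [
        {
            "rect": [0.25, 0.5, 0.125, 0.25],
            "fields": {"catalog_number": "BMNHE_1246958"},
        }
    ]
}

paths = document_paths("Scan-1001")
serialised = json.dumps(example_document, indent=2)
pixel_rect = to_pixels(example_document["items"][0]["rect"], 8000, 4000)
```

Because the coordinates are normalised, the same rect can be applied to the thumbnail for display and to the high-res scan for cropping; only the width and height passed to `to_pixels` change.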

Workflow

  1. The scanning software saves scanned images to the scanned_images folder:

    scanned_images/Scan-1001.tiff

  2. inselect's nightly processing - for each image in scanned_images:

    • Create a directory within inselect_docs, in this case inselect_docs/Scan-1001/
    • Move scanned_images/Scan-1001.tiff to inselect_docs/Scan-1001/Scan-1001.tiff
    • Create low-res thumbnail inselect_docs/Scan-1001/Scan-1001-thumbnail.jpg
    • Run coarse segmentation on inselect_docs/Scan-1001/Scan-1001-thumbnail.jpg
    • Save inselect_docs/Scan-1001/Scan-1001.inselect
  3. User actions in inselect

    • Open inselect_docs/Scan-1001/Scan-1001.inselect
    • Review and refine segments
    • Add metadata, including decoded barcodes
    • Click 'Save', at which point inselect will:
      • Write inselect_docs/Scan-1001/Scan-1001.inselect
      • Ask the user whether the specimen crop images should be written
        • If 'yes'
          • create inselect_docs/Scan-1001/Scan-1001_specimen_crops/ if it does not already exist
          • clear any existing image files in inselect_docs/Scan-1001/Scan-1001_specimen_crops/
          • crop each specimen image from inselect_docs/Scan-1001/Scan-1001.tiff and save it to inselect_docs/Scan-1001/Scan-1001_specimen_crops/ - filenames are computed from metadata
        • If 'no', remove inselect_docs/Scan-1001/Scan-1001_specimen_crops/, if it exists
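
The file handling in steps 2 and 3 could be sketched roughly as below. This is not inselect's implementation: `ingest`, `ingest_all` and `prepare_crops_dir` are hypothetical names, `make_thumbnail` and `segment` are caller-supplied stand-ins for the real image code, the `items` key is invented, and the sketch assumes the .inselect document is written next to the moved scan in inselect_docs. Refusing to ingest into an existing directory and treating every file in the crops directory as a crop are also assumptions.

```python
import json
import shutil
from pathlib import Path


def ingest(image_path, docs_root, make_thumbnail, segment):
    """Ingest one scanned image into its own document directory.

    make_thumbnail(src, dst) and segment(thumbnail) are hypothetical
    callables standing in for the real image processing.
    """
    image_path = Path(image_path)
    doc_dir = Path(docs_root) / image_path.stem
    # Assumption: an existing document directory is an error, not an update.
    doc_dir.mkdir(parents=True, exist_ok=False)

    scan = doc_dir / image_path.name
    shutil.move(str(image_path), str(scan))

    thumbnail = doc_dir / f"{image_path.stem}-thumbnail.jpg"
    make_thumbnail(scan, thumbnail)

    document = doc_dir / f"{image_path.stem}.inselect"
    document.write_text(json.dumps({"items": segment(thumbnail)}, indent=2))
    return document


def ingest_all(scanned_images, docs_root, make_thumbnail, segment):
    """Nightly processing: ingest every file found in scanned_images."""
    pending = sorted(p for p in Path(scanned_images).iterdir() if p.is_file())
    return [ingest(p, docs_root, make_thumbnail, segment) for p in pending]


def prepare_crops_dir(doc_dir, write_crops):
    """Apply the Save-time rules for the specimen crops directory."""
    doc_dir = Path(doc_dir)
    crops_dir = doc_dir / f"{doc_dir.name}_specimen_crops"
    if write_crops:
        crops_dir.mkdir(exist_ok=True)
        # Assumption: every file in this directory is a stale crop.
        for existing in crops_dir.iterdir():
            if existing.is_file():
                existing.unlink()
        return crops_dir
    if crops_dir.exists():
        shutil.rmtree(crops_dir)
    return None
```

The 'Ingest image' menu command would then amount to a single `ingest` call, and the nightly job to `ingest_all` over the configured folders.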

inselect UI changes

  • File menu:
    • 'Open'
    • 'Save'
    • 'Ingest image' (which runs step 2 - inselect nightly processing - on a single image)
    • 'Exit'
  • Remove the 'Open image', 'Save boxes', 'Import boxes' and 'Export images' commands
  • Add a 'document dirty' flag - if the user has modified a document in any way, inselect prompts them to save when it is closed
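
One way the 'document dirty' flag might behave, as a small sketch independent of any UI toolkit. The `Document` class and `should_prompt_on_close` are hypothetical names, not existing inselect code.

```python
class Document:
    """Hypothetical in-memory document that tracks unsaved changes."""

    def __init__(self, items=None):
        self._items = list(items or [])
        self.dirty = False  # a freshly opened document has nothing to save

    @property
    def items(self):
        # Hand out a copy so that every change goes through a method
        # that sets the dirty flag.
        return list(self._items)

    def add_item(self, item):
        self._items.append(item)
        self.dirty = True

    def remove_item(self, index):
        del self._items[index]
        self.dirty = True

    def mark_saved(self):
        """Call after the document has been written successfully."""
        self.dirty = False


def should_prompt_on_close(document):
    """The close handler asks the user to save only if something changed."""
    return document.dirty
```

Routing every modification through a method that sets the flag keeps the check on close to a single attribute read.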

Configuration

  • Location of scanned_images
  • Location of inselect_docs
  • Metadata fields, along with validation, e.g., mandatory, integer greater than zero
  • Format of the 'specimen image crop' filenames, e.g.:
    • Specimen number taken from barcode - BMNHE_1246959.jpg
    • Crop number - Scan-1001-crop-23.jpg
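
A sketch of how the filename format and the field validation might be expressed. The function names, the rule keys `mandatory` and `positive_integer`, and the `catalog_number` field are assumptions for illustration; falling back from the barcode format to the crop number is one possible policy, not something this proposal decides.

```python
def crop_filename(scan_name, index, fields, barcode_field="catalog_number"):
    """Name a crop after its specimen number if known, else its crop number."""
    specimen = str(fields.get(barcode_field) or "").strip()
    if specimen:
        return f"{specimen}.jpg"
    return f"{scan_name}-crop-{index}.jpg"


def is_positive_integer(value):
    """True if value parses as an integer greater than zero."""
    try:
        return int(str(value)) > 0
    except ValueError:
        return False


def validate(fields, rules):
    """Return a list of problems; an empty list means the fields are valid."""
    problems = []
    for name, rule in rules.items():
        value = fields.get(name)
        if value is None or value == "":
            if rule.get("mandatory"):
                problems.append(f"{name} is mandatory")
            continue  # nothing further to check on an empty value
        if rule.get("positive_integer") and not is_positive_integer(value):
            problems.append(f"{name} must be an integer greater than zero")
    return problems


# Hypothetical configuration: rules keyed by metadata field name.
example_rules = {
    "catalog_number": {"mandatory": True},
    "specimen_count": {"positive_integer": True},
}
```

With rules held as plain data like `example_rules`, they could live in the same configuration file as the two folder locations.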

@aliceh75
Contributor

aliceh75 commented Oct 8, 2014

A few notes:

  • Something implied in your workflow is that opening the .inselect file directly opens the associated image. I agree this is a good idea, but it should be made explicit;
  • How does inselect know about the low and high res images? The current approach (which is not ideal) is that the low res image is the base image, and when cropping inselect looks for an image of the same name with a .tiff extension. What I gather from your workflow is that it works the other way round: the base image is the .tiff file, and whenever inselect does work on this image, it looks for, or creates, a thumbnail version in the same folder. Is this correct?

@quicklizard99
Member Author

Thanks Alice. I agree with your first point. Yes, I propose changing inselect to generate the low-res image as part of its ingest process. For cases where the scan itself is a low-res jpeg, perhaps:

  • image ingest would not generate the low-res thumbnail
  • segmentation and cropping would operate directly on the scanned image
  • the .inselect document would contain a flag indicating that the thumbnail image is not necessary
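
As a rough sketch of that last point, the code that picks which image to display and segment could look like this. The flag name `thumbnail_not_required` and the function `working_image` are hypothetical; the real key would be whatever the .inselect format ends up using.

```python
from pathlib import Path


def working_image(doc_dir, document, scan_suffix=".jpg"):
    """Pick the image that display and segmentation should use.

    `document` is the parsed .inselect json. If it carries the
    (hypothetical) thumbnail_not_required flag, the scan itself is
    used; otherwise the thumbnail created at ingest is used.
    """
    doc_dir = Path(doc_dir)
    stem = doc_dir.name
    if document.get("thumbnail_not_required", False):
        return doc_dir / f"{stem}{scan_suffix}"
    return doc_dir / f"{stem}-thumbnail.jpg"
```

Defaulting the flag to false means documents for high-res scans would not need to mention it at all.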

@quicklizard99
Member Author

Now implemented.


3 participants