How to migrate existing image dataset managed with IMS to PIMS #19

Open
geektortoise opened this issue Nov 22, 2021 · 1 comment

@geektortoise
Contributor

Hello,

I see that some files need to be in a prefixed directory (the prefix is "upload"):
https://github.com/Cytomine-ULiege/pims/blob/master/pims/files/file.py#L27

Do you think it will be easy to migrate an existing dataset to follow the new PIMS convention? Or does a script already exist? :-)

Thanks for your work!
Have a nice day

@urubens
Member

urubens commented Nov 29, 2021

Hi,

The script does not exist yet, but it's on my to-do list ;).

For most cases, I have the feeling that the migration is not too complicated, but there will be some corner cases.

First, a technical difficulty is that the migration script has to move/rename files on disk and also update the data in the uploaded_file table. We cannot use the regular core migration system, as it does not have access to the disk. My idea was to have a special container in the bootstrap for that migration, which should be run after Cytomine is stopped and before restarting it with the new version. This 'migration' container could be linked to both the disk and the database, so that it can directly move files on disk and make bulk updates to the uploaded_file table.
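To make the disk-plus-database idea concrete, here is a minimal sketch, not the actual migration script (which does not exist yet). It assumes a hypothetical uploaded_file schema with `id` and `filename` columns, where `filename` holds a path relative to the storage root, and it uses `sqlite3` purely as a stand-in for the real database. The function name `migrate_paths` is also an assumption.

```python
import sqlite3
from pathlib import Path


def migrate_paths(conn, storage_root, prefix="upload"):
    """Rename top-level directories on disk, then bulk-update the table.

    Assumed (hypothetical) schema: uploaded_file(id, filename), with
    filename like "123/myfile.tif" relative to storage_root.
    Returns the number of rows updated.
    """
    storage_root = Path(storage_root)
    rows = conn.execute("SELECT id, filename FROM uploaded_file").fetchall()
    updates = []
    for file_id, filename in rows:
        top, sep, rest = filename.partition("/")
        # Skip rows without a directory part and rows already migrated
        if not sep or top.startswith(prefix):
            continue
        old_dir = storage_root / top
        new_dir = storage_root / (prefix + top)
        # Several rows (root and children) may share one directory:
        # rename it only once
        if old_dir.is_dir() and not new_dir.exists():
            old_dir.rename(new_dir)
        # Only touch the record if the directory really is in place
        if new_dir.is_dir():
            updates.append((f"{prefix}{top}/{rest}", file_id))
    # One transaction for the whole bulk update: commit on success,
    # roll back if any statement fails
    with conn:
        conn.executemany(
            "UPDATE uploaded_file SET filename = ? WHERE id = ?", updates
        )
    return len(updates)
```

Because already-prefixed rows are skipped, running the sketch a second time is a no-op, which is convenient if the migration container is interrupted and restarted. Note that it would also skip any pre-existing directory whose name happens to start with "upload".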

The general idea would be, for every uploaded_file root:

  1. There are no children: the file (e.g. 123/myfile.tif) can be read directly by IMS.
    • Rename 123 to upload123
    • Identify the format of upload123/myfile.tif and create the required symlinks: /upload123/processed/original.PYRTIFF and /upload123/processed/visualisation.PYRTIFF
    • Compute the histogram
    • Update the record in the database (new path, and set "content type" to "PYRTIFF")
  2. There is a child: a conversion took place. The idea is the same, but we need to identify the format of the converted file and also update its record in the DB. Many conversions are no longer needed, as PIMS is able to read small files natively, so some conversions could be removed from the DB and from disk to save space. Maybe we could have a migration option "keep conversions that are no longer needed"; it's a space/time trade-off.
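The on-disk part of case 1 can be sketched as follows. This is only an illustration under assumptions: the function name `migrate_root_without_children` is hypothetical, the format is passed in rather than detected, the histogram and database steps are left out, and the use of relative symlink targets is a guess rather than something PIMS is known to require.

```python
import os
from pathlib import Path

UPLOAD_PREFIX = "upload"  # prefix mentioned in the issue


def migrate_root_without_children(storage_root, dirname, filename, fmt="PYRTIFF"):
    """Rename <dirname> to upload<dirname> and create the processed/ symlinks.

    fmt is supplied by the caller here; real format identification is
    out of scope for this sketch. Returns the new path of the file,
    relative to storage_root.
    """
    storage_root = Path(storage_root)
    old_dir = storage_root / dirname
    new_dir = storage_root / f"{UPLOAD_PREFIX}{dirname}"
    if not (old_dir / filename).is_file():
        raise FileNotFoundError(old_dir / filename)
    if new_dir.exists():
        raise FileExistsError(new_dir)

    # Step 1: 123 -> upload123
    old_dir.rename(new_dir)

    # Step 2: processed/original.<fmt> and processed/visualisation.<fmt>
    processed = new_dir / "processed"
    processed.mkdir()
    for role in ("original", "visualisation"):
        # Relative target (an assumption), resolved from inside processed/
        os.symlink(os.path.join("..", filename), processed / f"{role}.{fmt}")

    # Steps 3 and 4 (histogram, database record) would follow here
    return (Path(new_dir.name) / filename).as_posix()
```

For example, `migrate_root_without_children("/data/images", "123", "myfile.tif")` would leave `/data/images/upload123/myfile.tif` plus the two symlinks under `upload123/processed/`; the `/data/images` path is made up for illustration.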

Other special treatments would be required for:

  • multi-file formats imported as archives,
  • archives containing a collection of distinct images,
  • experimental conversions of OME-TIFF and Zeiss CZI where each plane is in a single PYRTIFF.
