Make updates more efficient regardless of whether a dandiset contains zarrs #364

Open
yarikoptic opened this issue Nov 2, 2023 · 0 comments

ATM we have two cron jobs: tools/backups2datalad-update-cron (runs often) and tools/backups2datalad-update-cron-108 (runs long). They are separate because 108 contained zarrs, and backing those up is much more involved (see e.g. #363) than backing up regular files. But now other dandisets are also starting to contain zarrs, so we need to figure out a workflow that performs updates without any custom separation across dandisets.

I think overall we should start using a proper job system to orchestrate updates, maybe even full-blown Celery with Flower to monitor the status. Then the workflow could be:

  • given a dandiset with a changed timestamp and no update job already running for it, schedule an update_dandiset job which
    • for all zarr assets, checks that they exist, are not being uploaded, and are up to date (based on date).
      • if any are missing, schedule a job to have the zarr created/updated
      • if any are out of date, schedule a job to have the zarr updated
      • we might need a "registry" of jobs, since we can't query celery for ongoing/planned jobs, so that we skip the dandiset if any job is still running
      • in any of the above cases, skip updating the dandiset in this round
    • if there are no zarrs, or all zarrs are found up to date [*], proceeds with the update of the dandiset as we do now

[*] alert -- race condition, unless we collect specific commits for each zarr, so that we update them to those commits and are fine even if a zarr is being modified in the meantime
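
To make the decision logic above easier to discuss, here is a minimal sketch in plain Python, with no Celery involved. All names (`ZarrAsset`, `JobRegistry`, `plan_update`, the `"dandiset:"`/`"zarr:"` key prefixes) are hypothetical and not part of backups2datalad. The registry is just an in-memory set standing in for whatever persistent store we would end up using. Treating a zarr that is still being uploaded as "defer without scheduling anything" is an assumption on my part, and the sketch does not address the race condition from [*].

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, Set


@dataclass
class ZarrAsset:
    """Hypothetical view of one zarr asset as seen by the updater."""
    zarr_id: str
    remote_modified: datetime                   # timestamp reported by the archive
    backup_modified: Optional[datetime] = None  # None means no backup exists yet
    uploading: bool = False                     # upload still in progress

    def needs_job(self) -> bool:
        """True if the backup is missing or older than the remote zarr."""
        if self.backup_modified is None:
            return True
        return self.backup_modified < self.remote_modified


@dataclass
class JobRegistry:
    """In-memory stand-in for a registry of ongoing/planned jobs."""
    active: Set[str] = field(default_factory=set)

    def submit(self, key: str) -> bool:
        """Record a job; return False if an identical one is already pending."""
        if key in self.active:
            return False
        self.active.add(key)
        return True

    def finish(self, key: str) -> None:
        self.active.discard(key)


def plan_update(dandiset_id: str, zarrs: List[ZarrAsset],
                registry: JobRegistry) -> str:
    """Decide what to do with one dandiset whose timestamp changed."""
    dandiset_key = f"dandiset:{dandiset_id}"
    if dandiset_key in registry.active:
        return "skip: update already running"

    defer = False
    for zarr in zarrs:
        zarr_key = f"zarr:{zarr.zarr_id}"
        if zarr.uploading:
            # Assumption: nothing to schedule yet, just wait for the upload.
            defer = True
        elif zarr.needs_job():
            registry.submit(zarr_key)  # no-op if the job is already pending
            defer = True
        elif zarr_key in registry.active:
            # Looks up to date, but a job for it has not finished yet.
            defer = True

    if defer:
        return "skip: zarrs pending"

    # No zarrs, or all zarrs up to date: update the dandiset as we do now.
    registry.submit(dandiset_key)
    return "update"
```

With this shape the cron job (or a periodic task) would just call `plan_update` for every dandiset with a changed timestamp, and the workers would call `registry.finish(...)` when a job completes, so the same dandiset gets picked up again in a later round once its zarrs are done.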
