Performance Optimizations for Non-Bulk Indexing #26
Generally looks good.
  try:
      if is_processable_path(p):
          if os.path.isfile(p):
-             await process_file(p, manager, fc_rc, no_patch, dryrun)
+             await index_file(p, manager, fc_rc, patch, dryrun)
If you want even more performance, you can run multiple index_file operations in parallel, which may help if the indexer is waiting on an FC response. Example: https://github.com/WIPACrepo/iceprod/blob/master/iceprod/server/scheduled_tasks/update_task_priority.py#L66
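A rough sketch of what that could look like, not the code from the linked example. It assumes index_file is a coroutine and caps the number of calls in flight with a semaphore. The names index_concurrently, FakeIndexer, and max_concurrent are made up for illustration; FakeIndexer only stands in for index_file so the snippet runs on its own.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, Tuple


async def index_concurrently(
    paths: Iterable[str],
    index_one: Callable[[str], Awaitable[None]],
    max_concurrent: int = 4,
) -> Tuple[List[str], List[Tuple[str, BaseException]]]:
    """Call index_one on every path, with at most max_concurrent in flight."""
    path_list = list(paths)
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(path: str) -> None:
        # The semaphore bounds how many index_one calls run at once.
        async with semaphore:
            await index_one(path)

    # return_exceptions=True: one failing file does not cancel the others.
    results = await asyncio.gather(
        *(guarded(p) for p in path_list), return_exceptions=True
    )
    succeeded = [
        p for p, r in zip(path_list, results) if not isinstance(r, BaseException)
    ]
    failed = [
        (p, r) for p, r in zip(path_list, results) if isinstance(r, BaseException)
    ]
    return succeeded, failed


class FakeIndexer:
    """Hypothetical stand-in for index_file; records peak concurrency."""

    def __init__(self) -> None:
        self.in_flight = 0
        self.peak = 0

    async def __call__(self, path: str) -> None:
        self.in_flight += 1
        self.peak = max(self.peak, self.in_flight)
        try:
            await asyncio.sleep(0.01)  # pretend to wait on an FC response
            if path.endswith(".bad"):
                raise ValueError(f"cannot index {path}")
        finally:
            self.in_flight -= 1


if __name__ == "__main__":
    fake = FakeIndexer()
    ok, bad = asyncio.run(
        index_concurrently(["a.i3", "b.i3", "c.bad"], fake, max_concurrent=2)
    )
    print(ok, [p for p, _ in bad], fake.peak)
```

This only overlaps time spent awaiting; any CPU-bound work inside index_file still runs one call at a time on the event loop.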
I'll keep that in mind if it becomes a noticeable issue. The FC turnaround time usually isn't all that bad. The checksums and the I3Reader reads are the most expensive operations by far.
Plus, I think parallel I/O operations would actually slow things down as a consequence of multi-threading.
What would be nice is to redesign the multiprocessing: currently the parent process isn't doing much, which lowers CPU utilization.
depends on WIPACrepo/iceprod#294 for git pips
The indexer was designed for bulk indexing. However, it may become necessary to point the indexer at a single file (or a small batch). Until now, doing so incurred significant overhead.
--no-patch with --patch
[…] --patch, skip the file
[…] (--blacklist-file)
[…] (--processes 1 / the default case)
[…] (-n / --non-recursive)
--blacklist for shorter blacklists
closes #23
closes #24