File search gets too slow with large databases #60
Attached: results of a profiling session illustrating the problem.
In function bst_get/findFileInStudies, can't we check whether the file we're looking for exists in each protocol study, instead of going through every file of the study?
If we decide to go for a major refactoring, I would build a tree structure. I feel like a hierarchical structure is more appropriate than a hash table, since files are organized in protocols, studies, subjects, conditions, and so forth.
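For illustration, a hierarchical index could resolve a lookup by splitting the file path and descending one branch, rather than scanning a flat list. This is a minimal sketch with made-up field names, not the current Brainstorm database schema:

```matlab
% Hypothetical nested index: subject > condition > files.
DB.Subject(1).Name = 'Subject01';
DB.Subject(1).Condition(1).Name = 'Run01';
DB.Subject(1).Condition(1).Files = {'data_trial001.mat', 'data_trial002.mat'};

% Resolve 'Subject01/Run01/data_trial002.mat' by walking down the tree:
parts = strsplit('Subject01/Run01/data_trial002.mat', '/');
iSubj = find(strcmp({DB.Subject.Name}, parts{1}));
iCond = find(strcmp({DB.Subject(iSubj).Condition.Name}, parts{2}));
iFile = find(strcmp(DB.Subject(iSubj).Condition(iCond).Files, parts{3}));
% Only one condition's file list is searched, instead of every file in the protocol.
```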
You mean checking the actual file on the hard drive?
Indeed, this flat structure is quite bad; it dates back to well before I joined. Maybe we could do something relatively light on top of the existing structure: keep all the file names in the current database in a cache, so that we can find the corresponding study/subject/file indices faster. You could try benchmarking a few techniques along these lines:
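For example, here is what a light filename cache could look like in MATLAB. It is a minimal sketch with hypothetical function names (findFileCached, findFileInStudiesSlow are made up), not Brainstorm's actual code:

```matlab
function idx = findFileCached(FileName)
    % Hypothetical wrapper: memoize FileName -> database indices so that
    % repeated lookups are O(1) hash hits instead of full linear scans.
    persistent FileCache                        % survives across calls in the session
    if isempty(FileCache)
        FileCache = containers.Map('KeyType', 'char', 'ValueType', 'any');
    end
    if isKey(FileCache, FileName)
        idx = FileCache(FileName);              % fast path: cached indices
    else
        idx = findFileInStudiesSlow(FileName);  % stand-in for the existing full scan
        FileCache(FileName) = idx;              % memoize for subsequent calls
    end
end
```

The tricky part would be invalidation: the cache has to be cleared or updated whenever files are added, renamed, moved, or deleted, otherwise it would return stale indices.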
Does it make sense?
Tentative speedup implemented in 6d6a453.
The issue is that every file search compares the requested filename against all the filenames in the database, at every call. The function bst_get/findFileInStudies is a bottleneck for processes that perform many operations on small files.
Issue reported by Emily: http://neuroimage.usc.edu/forums/t/processes-slowing-way-down/5280/4
Possible solution: maintain a large hash table with all the filenames in the database? (Maybe this could be managed internally in bst_get.m.)