Alleviate HDF5 bottleneck #194

Merged
merged 2 commits into from
Jul 27, 2022

clementinboittiaux
Contributor

Hi @skydes
I noticed a speed bottleneck when working with HDF5 files. When a file contains a large number of groups, reading from and writing to it becomes much slower. For example, with more than 15,000 images, writing to the file takes much longer than computing the NetVLAD features. A solution was suggested in h5py/h5py#1055 and https://stackoverflow.com/questions/45023488/inserting-many-hdf5-datasets-very-slow. The fix is simply to pass the libver='latest' option when opening HDF5 files. I found that it greatly increases writing speed when creating many groups.
Sorry for the hiccup: I accidentally deleted the branch.

Use libver='latest' when opening HDF5 files. This greatly increases writing speed when creating a large number of groups, as suggested in h5py/h5py#1055 and https://stackoverflow.com/questions/45023488/inserting-many-hdf5-datasets-very-slow .
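A minimal sketch of the kind of benchmark described above, assuming h5py and NumPy are installed. The helper `write_many_groups` and the names `image_000000.jpg` / `keypoints` are hypothetical: they only mimic "one group per image" and are not hloc's actual writer. The only difference between the two runs is the `libver` argument of `h5py.File`; real timings depend on the HDF5 version and the file system, so no specific speed-up is claimed here.

```python
import os
import tempfile
import time

import h5py
import numpy as np


def write_many_groups(path, num_groups, libver=None):
    """Hypothetical benchmark helper: one group per fake image.

    Returns the elapsed wall-clock time in seconds.
    """
    start = time.perf_counter()
    # libver=None keeps h5py's default, letting HDF5 use its earliest
    # compatible file format; libver="latest" allows the newest format.
    with h5py.File(path, "w", libver=libver) as f:
        for i in range(num_groups):
            grp = f.create_group(f"image_{i:06d}.jpg")
            grp.create_dataset("keypoints", data=np.zeros((4, 2), dtype=np.float32))
    return time.perf_counter() - start


if __name__ == "__main__":
    # Increase n to approach the 15,000+ image regime mentioned above.
    n = 5_000
    with tempfile.TemporaryDirectory() as tmp:
        t_default = write_many_groups(os.path.join(tmp, "default.h5"), n)
        t_latest = write_many_groups(os.path.join(tmp, "latest.h5"), n, libver="latest")
        print(f"default: {t_default:.2f}s  latest: {t_latest:.2f}s")
```

The trade-off of `libver='latest'` is that files written with the newest format may not be readable by older HDF5 library versions.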
@sarlinpe
Member

sarlinpe commented Jun 3, 2022

Hi @clementinboittiaux,
Nice catch! Does this break backward compatibility, i.e. are you still able to read files created before this fix? HDF5 has indeed given us some trouble recently. Changing the pair separator to create subgroups (#159) helped with long names.
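To illustrate why changing the pair separator creates subgroups: h5py treats `/` in a key as a path separator, so a pair key such as `a.jpg/b.jpg` is stored as a group `a.jpg` containing a subgroup `b.jpg`. The helpers `pair_key` and `write_pair` below are hypothetical illustrations (not hloc's actual functions), and the dataset name `matches0` is an assumption. The sketch uses an in-memory file (`driver="core"`, `backing_store=False`), so nothing is written to disk.

```python
import h5py
import numpy as np


def pair_key(name0, name1, separator="/"):
    """Hypothetical helper: build the HDF5 key for an image pair.

    Slashes inside the image names are replaced by '-', so only the
    separator itself adds a level to the group hierarchy.
    """
    return separator.join((name0.replace("/", "-"), name1.replace("/", "-")))


def write_pair(f, name0, name1, matches):
    """Hypothetical writer: store matches under <name0>/<name1>/matches0."""
    # h5py creates any missing intermediate groups along the path.
    grp = f.require_group(pair_key(name0, name1))
    grp.create_dataset("matches0", data=matches)


if __name__ == "__main__":
    # In-memory HDF5 file: nothing is written to disk.
    with h5py.File("pairs.h5", "w", driver="core", backing_store=False) as f:
        write_pair(f, "db/img1.jpg", "query/q1.jpg", np.array([0, -1, 2]))
        f.visit(print)  # prints the path of every group and dataset
```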

@clementinboittiaux
Contributor Author

Hey,
I have not run into any incompatibility so far. I have been using this new branch with features computed both before and after the change, and everything worked!
The only trouble I had was reading hloc features with PixLoc, but that seems to be only because the HDF5 files used in PixLoc have a different data structure. (I made a quick-and-dirty fix on my fork: https://github.com/clementinboittiaux/pixloc/blob/0e1c5b1007ea37f1684a5158007ee79bbb4c7b01/pixloc/utils/io.py#L70)
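One way to check the backward-compatibility question is sketched below, assuming h5py and NumPy: write a file with the default format (as before the change), reopen it in append mode with `libver='latest'` (as after the change), add a group, and read both entries back. The function name and the `old.jpg` / `new.jpg` keys are hypothetical. This only checks that newer code can still read and extend older files; the reverse direction is a separate concern, since files written with `libver='latest'` may not open with older HDF5 library versions.

```python
import os
import tempfile

import h5py
import numpy as np


def check_backward_compat(path):
    """Hypothetical check: extend a default-format file using libver='latest'."""
    old = np.arange(6, dtype=np.float32).reshape(3, 2)
    # 1) File produced "before the fix": h5py's default format.
    with h5py.File(path, "w") as f:
        f.create_dataset("old.jpg/keypoints", data=old)
    # 2) Reopen "after the fix" in append mode and add a new group.
    with h5py.File(path, "a", libver="latest") as f:
        f.create_dataset("new.jpg/keypoints", data=old + 1)
    # 3) Read both entries back and compare them with what was written.
    with h5py.File(path, "r", libver="latest") as f:
        ok_old = np.array_equal(f["old.jpg/keypoints"][()], old)
        ok_new = np.array_equal(f["new.jpg/keypoints"][()], old + 1)
    return ok_old and ok_new


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(check_backward_compat(os.path.join(tmp, "compat.h5")))
```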

@sarlinpe
Member

Got it, thanks! I'll do some tests and merge later.

Indeed the structure of the HDF5 file is slightly different, I should mention this somewhere.
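Since the hloc and PixLoc files differ slightly in structure, one way to see exactly where they differ is to list every dataset in each file and compare the two lists. A sketch assuming h5py; `dataset_layout` and `diff_layouts` are hypothetical helpers built on `Group.visititems`, and they make no assumption about what either project actually stores.

```python
import h5py


def dataset_layout(path):
    """Hypothetical helper: map every dataset path in a file to (shape, dtype)."""
    layout = {}

    def visitor(name, obj):
        # Groups are skipped; only datasets are recorded.
        if isinstance(obj, h5py.Dataset):
            layout[name] = (obj.shape, str(obj.dtype))

    with h5py.File(path, "r") as f:
        f.visititems(visitor)
    return layout


def diff_layouts(path_a, path_b):
    """Hypothetical helper: dataset paths present in only one of two files."""
    a, b = dataset_layout(path_a), dataset_layout(path_b)
    return sorted(set(a) - set(b)), sorted(set(b) - set(a))
```

For example, `diff_layouts("features_hloc.h5", "features_pixloc.h5")` (placeholder paths) would return the dataset paths found only in the first file and only in the second.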

@sarlinpe sarlinpe merged commit 9c04a19 into cvg:master Jul 27, 2022