165 hdf5 in memory using core driver #173
…eading the buffer with getInMemoryFileContents
H5FD_CORE_INMEMORY.
- added support for CORE, STDIO, SEC2 drivers
- added additional checks and exceptions
Quick comment @michalslonina, the |
Hi @scopatz. I don't quite understand you. Looking at the code, I see a few examples like:
Also, in parameters.py every parameter is uppercase, like: BOUNDS_MAX_SLOTS = 4*1024. Should I convert all of them to lowercase? |
In parameters.py, these are all global 'constants', so they should be uppercase. However, keyword arguments to functions should always be lowercase with underscores.
# Good
def f(driver=""):
    pass
# Bad
def f(DRIVER=""):
    pass
I hope this clarifies things. |
@michalslonina, sorry for the late reply.
|
@michalslonina, it seems that some of the features you used are only available in HDF5 >= 1.8.9. Do you think it is possible to enable this feature conditionally (only if all dependencies are available)? |
Or, in response to the version issue, making what you have work with HDF5 v1.8.4+ would be even better! |
@avalentino, @scopatz thanks for the incredible feedback and the code review. I agree with you 100% on the mentioned issues. I'll update my branch this week. I'll try to write a better setup script that does a better job at library version detection, and write some fallback too. @avalentino Regarding the assert, you are right, this was a bad idea. Regarding the chunk cache, I found a small explanation of how it works here: ((hvl_t *) udata)->p = size; /* the author of this should be hanged drawn and quartered */ |
Thanks @michalslonina! I look forward to getting this merged in... Just let us know when you want us to look at it again. |
handling - fixed a dangerous memory initialization bug
Hi guys, I have some spare time, so I can get back and fix all the stuff. On the params issue, I would like to clarify a few things. I've added DRIVER, as well as H5FD_CORE_INMEMORY_IMAGE, to the processing of the File **kwargs. I will delete DRIVER from parameters.py and move it to file.py:File.__init__(..., driver=None). The H5FD_CORE_INMEMORY_IMAGE parameter should be lowercase as well, since it's not a constant, right? All the other parameters referenced, like CHUNK_CACHE_*, MAX_BLOSC_*, MAX_NUMEXPR_THREADS_*, stay uppercase, even though you can pass them as **kwargs to the File.__init__() function (as I understand it, this code copies the values from parameters.py to the params variable, which represents the union of **kwargs and the parameters.py variables). Considering the above, DRIVER and H5FD_CORE_INMEMORY_IMAGE will still be lowercase, right? |
memory_image function arguments
Yes, I think that is correct. driver and memory_image are supposed to be lowercase. |
While I agree with @scopatz that keywords should be lowercase, I would like to preserve consistency with the current convention used for tables.File keyword parameters (which are currently uppercase). Also, IMO at least the
My suggestion is to keep using the old convention (uppercase parameters) and then address the coding style question later, for all keyword parameters together. After all, we are planning a big API change to improve the coding style. The patch in this case would be trivial:
params = dict([(k.lower(), v) for k, v in parameters.__dict__.iteritems()])
instead of
params = dict([(k, v) for k, v in parameters.__dict__.iteritems()])
What do you think about it? |
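A minimal sketch of the merge being discussed, in modern Python rather than the Python 2 idiom quoted above. The `parameters` namespace and the `build_params` helper are hypothetical stand-ins for tables/parameters.py and the code in File.__init__; only the names BOUNDS_MAX_SLOTS and DRIVER come from this thread, and the DRIVER default shown is an assumption.

```python
import types

# Hypothetical stand-in for tables/parameters.py. Only the names
# BOUNDS_MAX_SLOTS and DRIVER come from this thread; the DRIVER default
# is an assumption made for illustration.
parameters = types.SimpleNamespace(BOUNDS_MAX_SLOTS=4 * 1024, DRIVER=None)


def build_params(lowercase_keys=False, **kwargs):
    """Merge module-level defaults with per-call keyword overrides."""
    defaults = vars(parameters)
    if lowercase_keys:
        # Variant with k.lower(): callers pass driver=..., not DRIVER=...
        params = {k.lower(): v for k, v in defaults.items()}
    else:
        # Variant that keeps the existing uppercase keys.
        params = dict(defaults)
    params.update(kwargs)  # caller-supplied values win over defaults
    return params
```

With uppercase keys, `build_params(DRIVER="H5FD_CORE")` overrides the default; with `lowercase_keys=True` the same override has to be spelled `driver="H5FD_CORE"`, which is the consistency trade-off the comments above are weighing.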
@avalentino The convention you are asking for was already implemented in 58e92fd. I changed it at your request, guys. I can revert the patch if you like. It would be wise to move DRIVER back to parameters.py. As for H5FD_CORE_INMEMORY_IMAGE, it doesn't matter. The parameter is used only to provide the string that holds the image of the file. IMHO keeping DRIVER in parameters.py is more consistent with the current implementation. It also provides one point in the code where you can change the default behavior, and the code looks cleaner, as the default values will not need to be passed explicitly between function calls. @avalentino @scopatz I will gladly accept any decision you make. |
I leave it up to @avalentino |
Oh it was just an idea :) |
@avalentino I'm a new contributor to the project, I don't think I should make such decisions. I'll take your last proposal as the official direction ;) |
…er and" This reverts commit fcc2db8. Conflicts: tables/hdf5Extension.pyx
…code path as the other drivers
- fixed the case with no DRIVER argument provided - fix compilation of H5PCORE-mem
Summary of the fixed problems in the recent commits:
|
@michalslonina Thanks a ton! Sorry for accidentally closing.... |
Thanks @michalslonina, I'll work on it ASAP. |
@avalentino @scopatz It's a pleasure working with you guys. PyTables is an amazing piece of software. Please let me know if there are any other issues I should fix. |
Thanks for pushing this feature through, @michalslonina. I'll let @avalentino merge this in. |
The pull request is now merged into develop with some modifications:
Writing a short tutorial about this would probably be a good idea. Thanks again @michalslonina. |
Hey all. This feature looks awesome, and is just what I was looking for to bypass writing to disk for persisting some raw data stored in compressed HDF5 to remote MongoDB. When can we expect to see it released with 3.0? :) |
PyTables 3.0 will be released with Python 3.x support, which @avalentino is leading. Any of the remaining issues for this milestone that you want to help out with would speed this up! ;) |
This patch adds the ability to use different libhdf5 file drivers (H5FD_SEC2, H5FD_CORE, H5FD_STDIO) and one special fake driver, H5FD_CORE_INMEMORY, that enables the user to read and write HDF5 documents in memory.
The driver is set by passing the DRIVER keyword argument to the openFile function.
To pass an in-memory document to libhdf5, pass the Python string as the H5FD_CORE_INMEMORY_IMAGE argument to openFile and set DRIVER to H5FD_CORE_INMEMORY.
To write an in-memory file, just set DRIVER to H5FD_CORE_INMEMORY and use PyTables as usual.
After closing the file object, your data will be available through file.getInMemoryFileContents().