-
Notifications
You must be signed in to change notification settings - Fork 143
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wrap a subset of the libhdf5 table interface methods #777
Conversation
src/api_helpers.jl
Outdated
|
||
function h5tb_get_field_info(loc_id, table_name) | ||
nfields, = h5tb_get_table_info(loc_id, table_name) | ||
field_names = Vector{UInt8}[fill(0x00,255) for i in 1:2] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if there is any way to determine the length of these.
In https://confluence.hdfgroup.org/display/HDF5/H5TB_GET_FIELD_INFO they just allocate 255 sized buffer.
src/api_helpers.jl
Outdated
field_sizes = Vector{Csize_t}(undef,nfields) | ||
field_offsets = Vector{Csize_t}(undef,nfields) | ||
type_size = Ref{Csize_t}() | ||
while true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To work around the issue of determing the string buffer size, here I just initially arbitrary set it to 64 then we keep doubling until it until the string fits (detected through presence of a null byte)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, I don't think this is valid/safe: https://bitbucket.hdfgroup.org/projects/HDFFV/repos/hdf5/browse/hl/src/H5TB.c?at=refs%2Fheads%2Fhdf5_1_10#3092 libhdf5 just does a strcpy
, so for a long name, it is overflowing the buffer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, which is why if it doesn't fit, we resize the buffer and then check if there's a 0x00
in the buffer which would indicate it was copied correctly. If it didn't fit it would overflow the buffer and no null byte would be detected, in which case we increase the buffer size.
My local testing, where I set the fieldname to be a long string, showed that this strategy worked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But overflowing means you're overwriting other bytes in memory — this is a classic buffer overflow attack vector.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It appears this is just a wrapper over datatype querying functions, so unfortunately the safe thing to do might be to just pass C_NULL
pointer for the names (which the function implementation shows should just skip over filling), and then to do the loop of h5t_get_member_name
over the table dataset's datatype ourselves.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, good point. I don't know what we can do other than perhaps loop through and call H5Tget_member_name
to manually determine the required buffer size? At which point we're just reimplementing this function?
Heck even their example is also susceptible to overflow:
https://confluence.hdfgroup.org/display/HDF5/H5TB_GET_FIELD_INFO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'm kind of disappointed with both the documentation and implementation of this function :-\
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I updated this PR to avoid the buffer attack, by computing the column names via h5t_get_member_name
This is now ready for review. |
Sans objections I plan to merge soon |
Co-authored-by: jmert <2965436+jmert@users.noreply.github.com>
Methods from #511