Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

safety improvement to Memory #167

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from
Draft

safety improvement to Memory #167

wants to merge 2 commits into from

Conversation

vtjnash
Copy link
Contributor

@vtjnash vtjnash commented Feb 2, 2024

While Memory was in theory a slightly safer wrapper for unsafe_read / unsafe_write operations, it doesn't appear to have every needed to be used that way, and ways just took in Vector{UInt8}. That seems safer to keep it that way therefore.

Closes #125

We could preserve the ptr field also, but seems unlikely to make a
difference in practice and adds some mildly annoying duplication of
storage bytes. In Julia v1.11, this field could be replaced in whole or
in part by a Memory{UInt8} object allocation, but currently
`jl_string_to_genericmemory` is not even exposed as an API in Base.
@nhz2
Copy link
Member

nhz2 commented Feb 2, 2024

If I want to use the "transcoding protocol" to decompress data directly into a Matrix{Float64} I would use unsafe_wrap to convert the matrix to a Vector{UInt8} in this new version of the API? I don't fully understand what the warning in the docstring of unsafe_wrap means and if this is safe to do:

the programmer is responsible also for ensuring that the
underlying data is not accessed through two arrays of different element
type, similar to the strict aliasing rule in C.

@vtjnash
Copy link
Contributor Author

vtjnash commented Feb 2, 2024

It is generally not a good idea to use unsafe_wrap on existing julia memory. It is really primarily for C memory only.

However, the IO protocol in Base is mostly based around calling unsafe_load/unsafe_store!/unsafe_copyto! because of the reasons you stated, about wanting to do IO directly to and from arbitrary types. The Memory representation might even be an okay way to represent doing that, as a means of creating a fat pointer for adding memory-safety to IO. But it didn't seem like this package was testing any of that.

@vtjnash vtjnash marked this pull request as draft February 2, 2024 23:35
@vtjnash
Copy link
Contributor Author

vtjnash commented Feb 3, 2024

But anyways, that documentation is sort of a lie there, since that is not really how anything seems to be implemented here. Nor would it be even be that sensible to implement that way. Everything is required to go through the Buffer object (a Vector) and only that object ever gets converted to Memory (basically, a slower, less-safe view of said Buffer). For example:

function readdata!(input::IO, output::Buffer)
if input isa TranscodingStream && input.state.buffer1 === output
# Delegate the operation to the underlying stream for shared buffers.
return fillbuffer(input)
end
nread::Int = 0
navail = bytesavailable(input)
if navail == 0 && marginsize(output) > 0 && !eof(input)
nread += writebyte!(output, read(input, UInt8))
navail = bytesavailable(input)
end
n = min(navail, marginsize(output))
GC.@preserve output Base.unsafe_read(input, marginptr(output), n)
supplied!(output, n)
nread += n
return nread
end
or c.f. sloweof implementation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Why implement Buffer and Memory?
2 participants