safety improvement to Memory #167

vtjnash · 2024-02-02T04:02:41Z

While Memory was in theory a slightly safer wrapper for unsafe_read / unsafe_write operations, it doesn't appear to have ever needed to be used that way, and always just took in a Vector{UInt8}. It therefore seems safer to keep it that way.

Closes #125

We could also preserve the ptr field, but that seems unlikely to make a difference in practice and adds some mildly annoying duplication of storage bytes. In Julia v1.11, this field could be replaced in whole or in part by a Memory{UInt8} object allocation, but currently `jl_string_to_genericmemory` is not even exposed as an API in Base.

nhz2 · 2024-02-02T20:28:32Z

If I want to use the "transcoding protocol" to decompress data directly into a Matrix{Float64}, would I use unsafe_wrap to convert the matrix to a Vector{UInt8} in this new version of the API? I don't fully understand what the warning in the docstring of unsafe_wrap means, or whether this is safe to do:

the programmer is responsible also for ensuring that the
underlying data is not accessed through two arrays of different element
type, similar to the strict aliasing rule in C.

vtjnash · 2024-02-02T23:34:55Z

It is generally not a good idea to use unsafe_wrap on existing Julia memory. It is really primarily for C memory only.
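As an illustration of the "C memory" case (a minimal sketch, not code from this PR or package), the snippet below wraps a buffer obtained from `Libc.malloc` rather than memory that Julia already owns. The names `n`, `p`, `v`, and `total` are made up for the example, and `own=false` is an assumption that the caller frees the buffer manually.

```julia
# Hypothetical example: wrap C-allocated memory, not an existing Julia array.
n = 4
p = Ptr{UInt8}(Libc.malloc(n))          # raw C allocation of n bytes
p == C_NULL && error("malloc failed")

# own=false: Julia will not free the buffer, so we must do it ourselves.
v = unsafe_wrap(Array, p, n; own=false)
fill!(v, 0x2a)                          # write through the wrapper
total = sum(Int, v)                     # read back before freeing

# After this call `v` must not be touched again.
Libc.free(p)
```

Here only one array ever refers to the allocation, so the aliasing concern from the unsafe_wrap docstring does not arise.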

However, the IO protocol in Base is mostly based around calling unsafe_load/unsafe_store!/unsafe_copyto! because of the reasons you stated, about wanting to do IO directly to and from arbitrary types. The Memory representation might even be an okay way to represent doing that, as a means of creating a fat pointer for adding memory-safety to IO. But it didn't seem like this package was testing any of that.
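A minimal sketch of that pointer-based style, assuming the goal is to fill a Matrix{Float64} straight from an IO without going through unsafe_wrap. The helper name `read_into!` is hypothetical and not part of Base or this package; it only uses `Base.unsafe_read`, `pointer`, and `GC.@preserve`.

```julia
# Hypothetical helper: read raw bytes from `io` directly into the storage of `A`.
function read_into!(io::IO, A::Array{T}) where {T}
    isbitstype(T) || throw(ArgumentError("element type must be a bits type"))
    nbytes = sizeof(A)
    # Keep `A` rooted while its raw pointer is in use.
    GC.@preserve A unsafe_read(io, Ptr{UInt8}(pointer(A)), UInt(nbytes))
    return A
end

# Usage: round-trip six Float64 values through an in-memory stream.
src = collect(1.0:6.0)
io = IOBuffer()
write(io, src)        # writes the raw bytes of the array
seekstart(io)

A = Matrix{Float64}(undef, 2, 3)
read_into!(io, A)
```

No second array of a different element type is created over the matrix's memory; the bytes are copied through a pointer while the matrix stays preserved.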

vtjnash · 2024-02-03T05:27:03Z

But anyways, that documentation is sort of a lie there, since that is not really how anything seems to be implemented here. Nor would it even be that sensible to implement it that way. Everything is required to go through the Buffer object (a Vector), and only that object ever gets converted to Memory (basically, a slower, less-safe view of said Buffer). For example:

TranscodingStreams.jl/src/stream.jl

Lines 681 to 697 in d7e8370

    
    function readdata!(input::IO, output::Buffer)
        if input isa TranscodingStream && input.state.buffer1 === output
            # Delegate the operation to the underlying stream for shared buffers.
            return fillbuffer(input)
        end
        nread::Int = 0
        navail = bytesavailable(input)
        if navail == 0 && marginsize(output) > 0 && !eof(input)
            nread += writebyte!(output, read(input, UInt8))
            navail = bytesavailable(input)
        end
        n = min(navail, marginsize(output))
        GC.@preserve output Base.unsafe_read(input, marginptr(output), n)
        supplied!(output, n)
        nread += n
        return nread
    end

Or cf. the sloweof implementation.

vtjnash added 2 commits February 1, 2024 22:36

make more optimized and avoid copying input Strings

4602903

vtjnash marked this pull request as draft February 2, 2024 23:35
