Is it expected that native code can modify Ruby Strings passed to it inplace? #1080
Comments
Interestingly, on JRuby the mutations to the native memory have no effect on the Ruby String for a :string argument:
$ chruby jruby-9.4.5.0
Using jruby-9.4.5.0: jruby 9.4.5.0 (3.1.4) 2023-11-02 1abae2700f OpenJDK 64-Bit Server VM 17.0.8+7 on 17.0.8+7 +jit [x86_64-linux]
$ irb -rffi
irb(main):001:0> module MyLib; extend FFI::Library; ffi_lib 'c'; attach_function :strtok, [ :string, :string ], :pointer; end
=> #<#<Class:0x2d5d001f> address=0x7fd087303680 size=0>
irb(main):002:0> s = "hello"
=> "hello"
irb(main):003:0> MyLib.strtok(s, "e")
=> #<FFI::Pointer address=0x7fd084001000>
irb(main):004:0> s
=> "hello"
$ irb -rffi
irb(main):001:0> module MyLib; extend FFI::Library; ffi_lib 'c'; attach_function :strtok, [ :pointer, :string ], :pointer; end
=> #<#<Class:0x7e514482> address=0x7fccb9e26680 size=0>
irb(main):002:0> s = "hello"
=> "hello"
irb(main):003:0> MyLib.strtok(s, "e")
=> #<FFI::Pointer address=0x7fccb87f90f0>
irb(main):004:0> s
=> "h\u0000llo"
IOW it comes down to whether the
So that resolves the question. There is still a bug in FFI on CRuby, though: it doesn't invalidate the code range around an FFI call when a String is passed to a :pointer argument, i.e. the 3rd example in the issue description. To fix that I think
…f a FFI call * See #3293 (comment) and ffi/ffi#1080 * As that could cause extra conversions to managed later on.
Nobody should be modifying a String passed through FFI and expecting the original to reflect those changes. The docs make that clear: it should be treated as a
A new type for FFI that passes strings by reference, or like a pointer, would be the way to support mutability (or just use a pointer type today).
Nothing is copied for pointer; it's just a pointer to a block of native memory that can be mutated in native code or from the managed runtime. It is expected that changes on one side are immediately reflected on the other, without copying. RubyString may have a native version soon, but it has not been a priority for us (and we would probably also dup when passing to FFI). Note that any string dup should not be done in code shared across runtimes; that would essentially result in double-copying for all strings on JRuby and for managed strings on TruffleRuby. The dup should be done in code that is specifically preparing the Ruby string for a call into native code.
@headius Makes sense. I don't understand this bit though:
How can JRuby pass a Ruby String to native code with a
Ah, I thought you meant a pointer allocated through FFI, not passing a String into a pointer. Yes, we copy out and in for that case.
I don't think this should be changed, given that the
Adding
Can we close this issue?
Given the comment for the
I read that multiple times as "if you want to mutate the string from C, use
Also I think I read the wiki back before the 18 July 2020 change: https://github.com/ffi/ffi/wiki/Types/_compare/dea572a4f062df5a3e296b7dd91bc0ae0ff16d22...2e2903e5f8af957e7fe59e83c073f65a2a633e56:
So maybe I remembered that part, which seems to clearly indicate a String +
I am worried there is code out there relying on mutations to the Strings being kept after returns from C to Ruby, because the docs used to basically say this is fine. I wonder if the FFI test suite might even test that.
If we want to go the road of
If instead we want to let
I was curious and used
Ruby 3.3.0 FFI 1.16.3
It happens both for the :string and :pointer types:
This also makes it possible to break the code range of Ruby Strings, for example:
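As a rough illustration of what the code range is (this is a pure-Ruby sketch, not the FFI reproduction from this report): CRuby caches whether a String is 7-bit, valid, or broken in its encoding, and clears that cache when the String is modified through Ruby. Here String#setbyte stands in for a native write; the point is that a write made directly to the String's memory by native code would skip the invalidation step shown below.

```ruby
# Pure-Ruby stand-in: String#setbyte plays the role of a native write,
# except that it goes through Ruby and therefore clears the cached code range.
s = +"hello"                     # unary plus ensures a mutable String

before_ascii = s.ascii_only?     # scans the bytes and caches the result
s.setbyte(1, 0xFF)               # overwrite "e" with a byte that is invalid in UTF-8
after_ascii = s.ascii_only?      # recomputed, because setbyte cleared the cache
after_valid = s.valid_encoding?  # 0xFF can never appear in valid UTF-8

puts before_ascii  # true
puts after_ascii   # false
puts after_valid   # false
```

If the same byte were written behind Ruby's back, the cached answer from the first ascii_only? call could still be returned afterwards, which is what a broken code range means here.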
I found this by looking at a performance issue in TruffleRuby: oracle/truffleruby#3293 (comment)
On TruffleRuby the Ruby String passed to an FFI call needs to be converted from a byte[] to a native String (char*), like RSTRING_PTR() in C extensions.
That works, and converting to native memory is necessary anyway.
But because of the semantics above, this conversion also needs to switch that Ruby String object over to the native memory; otherwise it wouldn't reflect changes like the ones above. That can have really bad performance effects, as in the linked issue: when the same String is then matched against a Regexp, which needs a byte[] again, big strings go back and forth between native and managed byte[] memory, which is really slow.
I think it would be better to change FFI to always .dup the String passed to native, so if that memory is changed it does not affect the original String.
OTOH using a String as some kind of native buffer can be useful too, but maybe this should only be done with a :buffer type. Maybe for :pointer too. But for :string it feels wrong.
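A minimal sketch of the semantics the .dup proposal above is after, in pure Ruby. The helper name with_private_copy is hypothetical and not part of the FFI gem, and String#setbyte again stands in for a native write (here mimicking strtok replacing "e" with a NUL byte).

```ruby
# Hypothetical helper, not part of the FFI gem: hands the block a private copy
# so that writes made by the callee cannot reach the caller's String.
def with_private_copy(str)
  copy = str.dup
  yield copy
  copy
end

original = +"hello"

# setbyte stands in for a native write such as strtok replacing "e" with NUL.
mutated = with_private_copy(original) { |buf| buf.setbyte(1, 0) }

p original  # "hello"
p mutated   # "h\u0000llo"
```

In a real binding the copy would also have to own its own memory before native code writes to it, since a duplicated String may share its buffer with the original until it is modified from Ruby; that detail is an assumption about the implementation and is not covered by this sketch.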