pointerof an instance variable in an initialiser does not return the final pointer #1685

philpax · 2015-10-04T03:06:46Z

struct Test
    def initialize()
      @test = StaticArray(UInt32, 4)
      p pointerof(@test)
    end

    def array_ptr()
      p pointerof(@test)
    end
end

test = Test.new
test.array_ptr

results in

Pointer(StaticArray(UInt32, 4):Class)@0xbfa6b750
Pointer(StaticArray(UInt32, 4):Class)@0xbfa6b8c8

From what I can see from the LL dump, this is a result of the struct being constructed in one memory location, and then relocated elsewhere. My current workaround for this is calling a function after the initialiser, which works as expected.

Fixing this may require changes in how variables are constructed, and thus may not be feasible. I suggest that a warning be emitted for the use of pointerof on an instance variable in a constructor; this would expose any latent issues.

The text was updated successfully, but these errors were encountered:

denisdefreyne · 2015-10-04T10:16:18Z

This also happens with self (just got bit by this bug):

class Foo
  def initialize
    self2 = self
    puts "pointerof(self) = #{pointerof(self2)}"
  end
end

foo = Foo.new
puts "pointerof(foo)  = #{pointerof(foo)}"

prints:

pointerof(self) = Pointer(Foo)@0x7fff5d3114c8
pointerof(foo)  = Pointer(Foo)@0x7fff5d311748

…14c8 ≠ …1748.

asterite · 2015-10-04T14:36:09Z

The first might be a bug, but it would be interesting to know your use case. Since a struct is passed by value, I wouldn't rely on pointers to it, because they can quickly become invalid.

The second case is not a bug: self and self2 are two different variables, their memory address is different.

philpax · 2015-10-04T14:43:43Z

My actual code looks something like

struct Test
    def initialize()
      @test = StaticArray(UInt32, 4)
      external_fn_taking_pointer(@test.to_unsafe)
    end
end

struct Test2
    def initialize()
      @test = Test.new
    end
end

test2 = Test2.new

which I would expect to produce a consistent memory address (as the array is a member variable of another member variable)

For comparison, the equivalent C++ works as expected:

#include <cstdio>
#include <cstdint>

struct Test
{
  uint32_t test[4];
  Test()
  {
    printf("%x\n", test);
  }
};

struct Test2
{
  Test test;
};

int main()
{
  Test2 test;
  printf("%x\n", test.test.test);
  return 0;
}

results in

bfab0e40
bfab0e40

ozra · 2015-10-04T20:51:52Z

Sounds a bit strange that structures of any kind should be "inited in one place" and then copied - if that happens for every type instantiation it would be a performance drag.

sdogruyol · 2018-04-13T13:09:58Z

As of 0.24.2 this use-case is not valid. We now enforce the type of the instance variable when initializing.

Error in test.cr:12: instantiating 'Test:Class#new()'

test = Test.new
            ^~~

in test.cr:3: Can't infer the type of instance variable '@test' of Test

The type of a instance variable, if not declared explicitly with
`@test : Type`, is inferred from assignments to it across
the whole program.

The assignments must look like this:

  1. `@test = 1` (or other literals), inferred to the literal's type
  2. `@test = Type.new`, type is inferred to be Type
  3. `@test = Type.method`, where `method` has a return type
     annotation, type is inferred from it
  4. `@test = arg`, with 'arg' being a method argument with a
     type restriction 'Type', type is inferred to be Type
  5. `@test = arg`, with 'arg' being a method argument with a
     default value, type is inferred using rules 1, 2 and 3 from it
  6. `@test = uninitialized Type`, type is inferred to be Type
  7. `@test = LibSome.func`, and `LibSome` is a `lib`, type
     is inferred from that fun.
  8. `LibSome.func(out @test)`, and `LibSome` is a `lib`, type
     is inferred from that fun argument.

Other assignments have no effect on its type.

Can't infer the type of instance variable '@test' of Test

      @test = StaticArray(UInt32, 4)
      ^~~~~

Sija · 2018-04-13T13:30:50Z

@sdogruyol Issue is still valid, see code below:

struct Test
  def initialize
    @test = StaticArray(UInt32, 4).new(10_u32)
    p pointerof(@test)
  end

  def array_ptr
    p pointerof(@test)
  end
end

test = Test.new
test.array_ptr

asterite · 2018-07-20T15:54:40Z

I would change this from bug to feature.

RX14 · 2018-07-21T10:20:46Z

agreed, this isn't entirely unexpected given struct semantics

lugia-kun · 2020-08-29T12:07:19Z

I don't understand why copy is required at returning from #new. I think it should be able to handle #new like following on generating code because #initialize guarantees that all instance variables are set in some way.

struct A
  @foo : Int32
  @bar : Pointer(Int32)

  def initialize(@foo)
    @bar = pointerof(@foo)
  end
end

a = uninitialized A
a.initialize(100)

Which extra process is there before or after calling #initialize?
Which kind of #initialize process does need a copy?
Is it just because there is =?

I'm worrying that a struct is (implicitly) copied.

I would like to have an option to disallow any copy operation to a defined struct, like C++ can do this with putting the specific constructor declaration into private.

I (and some people) expect that struct is fixed location and fixed duration of lifetime. This semantic is useful to generate efficient code and sometimes required to call underlying C library, without loosing the experience of the user of wrapped library. For example, considering that the author of C library suggests to their function foo to pass local resource:

void user_func(...)
{
  struct X data;
  Y *ptr;
  ptr = foo(&data, ...);
  
  /* play with `ptr`, but actually read or write to `data` */
  bar(ptr, ...);
  baz(ptr, ...);
}

Although the content of data is public, no C programmer wants to copy data because it does not make sense.

To implement this in Crystal we need to write:

lib LibFoo
  struct X
     # omit
  end
  alias Y = Void

  fun foo(X*, ...) : Y*
  fun bar(Y*, ...) : Void
  fun baz(Y*, ...) : Void
end

def user_func(...)
  data = uninitialized LibFoo::X
  ptr = LibFoo.foo(pointerof(data), ...)
  
  # play with `ptr`
  LibFoo.bar(ptr, ...)
  LibFoo.baz(ptr, ...)

  # Is it guaranteed to `data` exists until here?
end

Developers who use Crystal consider this is unsafe and unwant to do. So I have to put this safe without loosing flexibility on calling bar, baz and so on.

About preferable usage of `pointerof` to use a C library

Before putting data into a struct, considering this:

struct Y
  @ptr : Pointer(LibFoo::Y)

  def initialize(@ptr)
  end

  def bar(...)
    LibFoo.bar(@ptr, ...)
  end

  def baz(...)
    LibFoo.baz(@ptr, ...)
  end
end

def bind_to_Y(..., &)
  data = uninitialized LibFoo::X
  ptr = Y.new(LibFoo.foo(pointerof(data), ...)
  ret = yield ptr
  data
  ret
end

# with this, a user can:

def user_func(...)
   bind_to_Y(...) do |y|
     # play with `y`.
     y.bar(...)
     y.baz(...)
   end
end

This looks safe but actually not safe.

Operation	Safety
Passing to an argument of another method	Safe if the calling method does not do any of followings
Copying to another local variable in the block	Safe
Passing to nested block	Unsafe unless executed inside this "scope"
Copying to a captured variable	Can be Unsafe
Returning `y` from block	Can be Unsafe
Passing to `raise`	Unsafe (because one can implement this with `longjmp`)
...	...

Moreover, we have to nest to use multiple objects at once. This is inconvenient.

def user_func(...)
  bind_to_Y(...) do |y|
    bind_to_Y(...) do |z|
      # play with `y` and `z`, `y` gets copied (right?).
      y.bar(...)
      z.baz(...)
      y.baz(...)
    end
  end
end

Putting `data` to an instance variable of struct to manage in pair

Because data and ptr is paired together, I may put it to a single struct:

struct Y
  @data : LibFoo::X
  @ptr : Pointer(LibFoo::Y)
  @refdata : ...

  def initialize(@refdata)
    @data = uninitialized LibFoo::X
    @ptr = LibFoo.foo(pointerof(@data), @refdata)
  end

  def bar(...)
    LibFoo.bar(@ptr, ...)
  end

  def baz(...)
    LibFoo.baz(@ptr, ...)
  end
end

# with this, a user can:

def user_func(...)
   y = Y.new(...)
   # play with `y`.
   y.bar(...)
   y.baz(...)
end

This is not work as expected because it is copied when return from #new.

This is the best semantic of providing the experience of LibFoo, however, it is still unsafe:

Operation	Safety
Passing to an argument of another method	Safe but the passed value still points the original variable
Copying to another local variable in the block	Safe but it still points the original variable
Passing to nested block	Unsafe unless executed inside this "scope", also it still points the original variable
Copying to a captured variable	Can be Unsafe, also it still points the original variable
Returning `y` from method	Unsafe
Passing to `raise`	Unsafe (because one can implement this with `longjmp`)
...	...

Workaround of above one

Since above does not work, we can make it to do delayed initialization:

struct Y
  @data : LibFoo::X
  @ptr : Pointer(LibFoo::Y)
  @refdata : ...

  def initialize(@refdata)
    @data = uninitialized LibFoo::X
    @ptr = Pointer(LibFoo::Y).null
  end

  def bar(...)
    LibFoo.bar(self, ...)
  end

  def baz(...)
    LibFoo.baz(self, ...)
  end

  def to_unsafe
    if @ptr.null?
      @ptr = LibFoo.foo(pointerof(@data), @refdata)
    end
    @ptr
  end
end

# with this, a user can:

def user_func(...)
   y = Y.new(...)
   # play with `y`.
   y.bar(...)
   y.baz(...)
end

In this case, LibFoo.foo will be called multiple times, if it has been copied before the calling one of these method.

  y = Y.new(...)
  z = y
  z.bar(...) # LibFoo.foo will be called for `z`
  y.baz(...) # LibFoo.foo will be called for `y`

Any copying before calling its methods is safe, but copying after calling any of its methods is possibly unsafe in same condition described above. This is too complicated.

Conclusion

We need a temporary object to bridge the user of binding library and underlying C library, but we don't want to fill the sea of heap memory with the garbage of temporary object, which can be used many many times. However, currently we can not handle a pointer from C library with a struct gracefully. Because it is freely and implicitly copied one to another. Disallowing copy can let the user of binding library to use the struct only in safe way, without loosing efficiency.

If you want a real world use case, here it is.

I have ideas of the methods to disallow copying:

Having protected or private #dup
- Same to do with Reference. (see below)
- Value#dup becomes an "special method" which makes a copy
- a = b where b is a Value should be interpreted as a = b.dup
- a = A.new where A is a Struct class can be interpreted as a = A.new.dup
Having protected or private #initialize(another : self)
- Same as C++.
- a = b where b is a Value should be interpreted as a = b.class.new(b)
- a = A.new where A is a Struct class never involves copy.
Special annotation to a struct
- The user of an annotated struct can never "monkey patch" to reenable copy operation of the struct.

N.B.

To protect Reference from copying is easy. Because it requires #dup to make a copy, so just put #dup to protected or private:

class A
  protected def dup
    self
  end
end

a = A.new
b = a.dup

Showing last frame. Use --error-trace for full trace.

error in line 8
Error: protected method 'dup' called for A

@asterite suggested to disallow to use pointerof to instance variable of struct in #9700. The language reference says using pointerof is (always) unsafe. If so, just disallowing it does not make sense, I think. Any invalid usage of pointerof should not be responsible to Crystal's developers, however you should provide the detailed specification such as "when returned pointer goes invalid". This should be guaranteed by language in some way.

(ex.)

Source of pointer	Guaranteed Lifetime
Any `Reference` type	Until no variable references it
Any instance variables of `Value` type in `Reference` (`class`)	Until no variable references the container of it
Any instance variables of `Value` type in `#initialize` of `Value` (`struct`)	Until return from the method
Any instance variables of `Value` type in any other method of `Value` (`struct`)	Until return from the caller's scope
Any local variable of `Value` type	Until return from the current scope
Any captured variable of `Value` type	Until return from the block
Any constant variable of `Value` type	Forever
(Non Interpolated) String literal	Forever

(This table is not true. This depends how the language guarantees (i.e., decided by Crystal's developer))

asterite · 2020-08-29T12:23:34Z

It works like that because of how new is defined, and how structs are passed by value.

@waj was working on improving this situation but it's probably something hard to change

jhass added kind:bug topic:compiler labels Oct 4, 2015

sdogruyol closed this as completed Apr 13, 2018

RX14 reopened this Apr 13, 2018

RX14 added kind:feature and removed kind:bug labels Jul 21, 2018

asterite mentioned this issue Aug 26, 2020

Copying at return from Struct.new can cause different behavior between --debug and --release builds #9700

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

pointerof an instance variable in an initialiser does not return the final pointer #1685

pointerof an instance variable in an initialiser does not return the final pointer #1685

philpax commented Oct 4, 2015

denisdefreyne commented Oct 4, 2015

asterite commented Oct 4, 2015

philpax commented Oct 4, 2015

ozra commented Oct 4, 2015

sdogruyol commented Apr 13, 2018

Sija commented Apr 13, 2018

asterite commented Jul 20, 2018

RX14 commented Jul 21, 2018

lugia-kun commented Aug 29, 2020

asterite commented Aug 29, 2020

pointerof an instance variable in an initialiser does not return the final pointer #1685

pointerof an instance variable in an initialiser does not return the final pointer #1685

Comments

philpax commented Oct 4, 2015

denisdefreyne commented Oct 4, 2015

asterite commented Oct 4, 2015

philpax commented Oct 4, 2015

ozra commented Oct 4, 2015

sdogruyol commented Apr 13, 2018

Sija commented Apr 13, 2018

asterite commented Jul 20, 2018

RX14 commented Jul 21, 2018

lugia-kun commented Aug 29, 2020

I'm worrying that a struct is (implicitly) copied.

About preferable usage of pointerof to use a C library

Putting data to an instance variable of struct to manage in pair

Workaround of above one

Conclusion

N.B.

asterite commented Aug 29, 2020

About preferable usage of `pointerof` to use a C library

Putting `data` to an instance variable of struct to manage in pair