Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pointerof an instance variable in an initialiser does not return the final pointer #1685

Open
philpax opened this issue Oct 4, 2015 · 10 comments

Comments

@philpax
Copy link
Contributor

philpax commented Oct 4, 2015

struct Test
    def initialize()
      @test = StaticArray(UInt32, 4)
      p pointerof(@test)
    end

    def array_ptr()
      p pointerof(@test)
    end
end

test = Test.new
test.array_ptr

results in

Pointer(StaticArray(UInt32, 4):Class)@0xbfa6b750
Pointer(StaticArray(UInt32, 4):Class)@0xbfa6b8c8

From what I can see from the LL dump, this is a result of the struct being constructed in one memory location, and then relocated elsewhere. My current workaround for this is calling a function after the initialiser, which works as expected.

Fixing this may require changes in how variables are constructed, and thus may not be feasible. I suggest that a warning be emitted for the use of pointerof on an instance variable in a constructor; this would expose any latent issues.

@denisdefreyne
Copy link
Contributor

This also happens with self (just got bit by this bug):

class Foo
  def initialize
    self2 = self
    puts "pointerof(self) = #{pointerof(self2)}"
  end
end

foo = Foo.new
puts "pointerof(foo)  = #{pointerof(foo)}"

prints:

pointerof(self) = Pointer(Foo)@0x7fff5d3114c8
pointerof(foo)  = Pointer(Foo)@0x7fff5d311748

…14c8 ≠ …1748.

@asterite
Copy link
Member

asterite commented Oct 4, 2015

The first might be a bug, but it would be interesting to know your use case. Since a struct is passed by value, I wouldn't rely on pointers to it, because they can quickly become invalid.

The second case is not a bug: self and self2 are two different variables, their memory address is different.

@philpax
Copy link
Contributor Author

philpax commented Oct 4, 2015

My actual code looks something like

struct Test
    def initialize()
      @test = StaticArray(UInt32, 4)
      external_fn_taking_pointer(@test.to_unsafe)
    end
end

struct Test2
    def initialize()
      @test = Test.new
    end
end

test2 = Test2.new

which I would expect to produce a consistent memory address (as the array is a member variable of another member variable)

For comparison, the equivalent C++ works as expected:

#include <cstdio>
#include <cstdint>

struct Test
{
  uint32_t test[4];
  Test()
  {
    printf("%x\n", test);
  }
};

struct Test2
{
  Test test;
};

int main()
{
  Test2 test;
  printf("%x\n", test.test.test);
  return 0;
}

results in

bfab0e40
bfab0e40

@ozra
Copy link
Contributor

ozra commented Oct 4, 2015

Sounds a bit strange that structures of any kind should be "inited in one place" and then copied - if that happens for every type instantiation it would be a performance drag.

@sdogruyol
Copy link
Member

As of 0.24.2 this use-case is not valid. We now enforce the type of the instance variable when initializing.

Error in test.cr:12: instantiating 'Test:Class#new()'

test = Test.new
            ^~~

in test.cr:3: Can't infer the type of instance variable '@test' of Test

The type of a instance variable, if not declared explicitly with
`@test : Type`, is inferred from assignments to it across
the whole program.

The assignments must look like this:

  1. `@test = 1` (or other literals), inferred to the literal's type
  2. `@test = Type.new`, type is inferred to be Type
  3. `@test = Type.method`, where `method` has a return type
     annotation, type is inferred from it
  4. `@test = arg`, with 'arg' being a method argument with a
     type restriction 'Type', type is inferred to be Type
  5. `@test = arg`, with 'arg' being a method argument with a
     default value, type is inferred using rules 1, 2 and 3 from it
  6. `@test = uninitialized Type`, type is inferred to be Type
  7. `@test = LibSome.func`, and `LibSome` is a `lib`, type
     is inferred from that fun.
  8. `LibSome.func(out @test)`, and `LibSome` is a `lib`, type
     is inferred from that fun argument.

Other assignments have no effect on its type.

Can't infer the type of instance variable '@test' of Test

      @test = StaticArray(UInt32, 4)
      ^~~~~

@Sija
Copy link
Contributor

Sija commented Apr 13, 2018

@sdogruyol Issue is still valid, see code below:

struct Test
  def initialize
    @test = StaticArray(UInt32, 4).new(10_u32)
    p pointerof(@test)
  end

  def array_ptr
    p pointerof(@test)
  end
end

test = Test.new
test.array_ptr

@RX14 RX14 reopened this Apr 13, 2018
@asterite
Copy link
Member

I would change this from bug to feature.

@RX14
Copy link
Contributor

RX14 commented Jul 21, 2018

agreed, this isn't entirely unexpected given struct semantics

@lugia-kun
Copy link

I don't understand why copy is required at returning from #new. I think it should be able to handle #new like following on generating code because #initialize guarantees that all instance variables are set in some way.

struct A
  @foo : Int32
  @bar : Pointer(Int32)

  def initialize(@foo)
    @bar = pointerof(@foo)
  end
end

a = uninitialized A
a.initialize(100)

Which extra process is there before or after calling #initialize?
Which kind of #initialize process does need a copy?
Is it just because there is =?

I'm worrying that a struct is (implicitly) copied.

I would like to have an option to disallow any copy operation to a defined struct, like C++ can do this with putting the specific constructor declaration into private.

I (and some people) expect that struct is fixed location and fixed duration of lifetime. This semantic is useful to generate efficient code and sometimes required to call underlying C library, without loosing the experience of the user of wrapped library. For example, considering that the author of C library suggests to their function foo to pass local resource:

void user_func(...)
{
  struct X data;
  Y *ptr;
  ptr = foo(&data, ...);
  
  /* play with `ptr`, but actually read or write to `data` */
  bar(ptr, ...);
  baz(ptr, ...);
}

Although the content of data is public, no C programmer wants to copy data because it does not make sense.

To implement this in Crystal we need to write:

lib LibFoo
  struct X
     # omit
  end
  alias Y = Void

  fun foo(X*, ...) : Y*
  fun bar(Y*, ...) : Void
  fun baz(Y*, ...) : Void
end

def user_func(...)
  data = uninitialized LibFoo::X
  ptr = LibFoo.foo(pointerof(data), ...)
  
  # play with `ptr`
  LibFoo.bar(ptr, ...)
  LibFoo.baz(ptr, ...)

  # Is it guaranteed to `data` exists until here?
end

Developers who use Crystal consider this is unsafe and unwant to do. So I have to put this safe without loosing flexibility on calling bar, baz and so on.

About preferable usage of pointerof to use a C library

Before putting data into a struct, considering this:

struct Y
  @ptr : Pointer(LibFoo::Y)

  def initialize(@ptr)
  end

  def bar(...)
    LibFoo.bar(@ptr, ...)
  end

  def baz(...)
    LibFoo.baz(@ptr, ...)
  end
end

def bind_to_Y(..., &)
  data = uninitialized LibFoo::X
  ptr = Y.new(LibFoo.foo(pointerof(data), ...)
  ret = yield ptr
  data
  ret
end

# with this, a user can:

def user_func(...)
   bind_to_Y(...) do |y|
     # play with `y`.
     y.bar(...)
     y.baz(...)
   end
end

This looks safe but actually not safe.

Operation Safety
Passing to an argument of another method Safe if the calling method does not do any of followings
Copying to another local variable in the block Safe
Passing to nested block Unsafe unless executed inside this "scope"
Copying to a captured variable Can be Unsafe
Returning y from block Can be Unsafe
Passing to raise Unsafe (because one can implement this with longjmp)
... ...

Moreover, we have to nest to use multiple objects at once. This is inconvenient.

def user_func(...)
  bind_to_Y(...) do |y|
    bind_to_Y(...) do |z|
      # play with `y` and `z`, `y` gets copied (right?).
      y.bar(...)
      z.baz(...)
      y.baz(...)
    end
  end
end

Putting data to an instance variable of struct to manage in pair

Because data and ptr is paired together, I may put it to a single struct:

struct Y
  @data : LibFoo::X
  @ptr : Pointer(LibFoo::Y)
  @refdata : ...

  def initialize(@refdata)
    @data = uninitialized LibFoo::X
    @ptr = LibFoo.foo(pointerof(@data), @refdata)
  end

  def bar(...)
    LibFoo.bar(@ptr, ...)
  end

  def baz(...)
    LibFoo.baz(@ptr, ...)
  end
end

# with this, a user can:

def user_func(...)
   y = Y.new(...)
   # play with `y`.
   y.bar(...)
   y.baz(...)
end

This is not work as expected because it is copied when return from #new.

This is the best semantic of providing the experience of LibFoo, however, it is still unsafe:

Operation Safety
Passing to an argument of another method Safe but the passed value still points the original variable
Copying to another local variable in the block Safe but it still points the original variable
Passing to nested block Unsafe unless executed inside this "scope", also it still points the original variable
Copying to a captured variable Can be Unsafe, also it still points the original variable
Returning y from method Unsafe
Passing to raise Unsafe (because one can implement this with longjmp)
... ...

Workaround of above one

Since above does not work, we can make it to do delayed initialization:

struct Y
  @data : LibFoo::X
  @ptr : Pointer(LibFoo::Y)
  @refdata : ...

  def initialize(@refdata)
    @data = uninitialized LibFoo::X
    @ptr = Pointer(LibFoo::Y).null
  end

  def bar(...)
    LibFoo.bar(self, ...)
  end

  def baz(...)
    LibFoo.baz(self, ...)
  end

  def to_unsafe
    if @ptr.null?
      @ptr = LibFoo.foo(pointerof(@data), @refdata)
    end
    @ptr
  end
end

# with this, a user can:

def user_func(...)
   y = Y.new(...)
   # play with `y`.
   y.bar(...)
   y.baz(...)
end

In this case, LibFoo.foo will be called multiple times, if it has been copied before the calling one of these method.

  y = Y.new(...)
  z = y
  z.bar(...) # LibFoo.foo will be called for `z`
  y.baz(...) # LibFoo.foo will be called for `y`

Any copying before calling its methods is safe, but copying after calling any of its methods is possibly unsafe in same condition described above. This is too complicated.

Conclusion

We need a temporary object to bridge the user of binding library and underlying C library, but we don't want to fill the sea of heap memory with the garbage of temporary object, which can be used many many times. However, currently we can not handle a pointer from C library with a struct gracefully. Because it is freely and implicitly copied one to another. Disallowing copy can let the user of binding library to use the struct only in safe way, without loosing efficiency.

If you want a real world use case, here it is.

I have ideas of the methods to disallow copying:

  • Having protected or private #dup
    • Same to do with Reference. (see below)
    • Value#dup becomes an "special method" which makes a copy
    • a = b where b is a Value should be interpreted as a = b.dup
    • a = A.new where A is a Struct class can be interpreted as a = A.new.dup
  • Having protected or private #initialize(another : self)
    • Same as C++.
    • a = b where b is a Value should be interpreted as a = b.class.new(b)
    • a = A.new where A is a Struct class never involves copy.
  • Special annotation to a struct
    • The user of an annotated struct can never "monkey patch" to reenable copy operation of the struct.

N.B.

To protect Reference from copying is easy. Because it requires #dup to make a copy, so just put #dup to protected or private:

class A
  protected def dup
    self
  end
end

a = A.new
b = a.dup
Showing last frame. Use --error-trace for full trace.

error in line 8
Error: protected method 'dup' called for A

@asterite suggested to disallow to use pointerof to instance variable of struct in #9700. The language reference says using pointerof is (always) unsafe. If so, just disallowing it does not make sense, I think. Any invalid usage of pointerof should not be responsible to Crystal's developers, however you should provide the detailed specification such as "when returned pointer goes invalid". This should be guaranteed by language in some way.

(ex.)

Source of pointer Guaranteed Lifetime
Any Reference type Until no variable references it
Any instance variables of Value type in Reference (class) Until no variable references the container of it
Any instance variables of Value type in #initialize of Value (struct) Until return from the method
Any instance variables of Value type in any other method of Value (struct) Until return from the caller's scope
Any local variable of Value type Until return from the current scope
Any captured variable of Value type Until return from the block
Any constant variable of Value type Forever
(Non Interpolated) String literal Forever

(This table is not true. This depends how the language guarantees (i.e., decided by Crystal's developer))

@asterite
Copy link
Member

It works like that because of how new is defined, and how structs are passed by value.

@waj was working on improving this situation but it's probably something hard to change

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants