Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New offsetof() expression #7566

Closed
malte-v opened this issue Mar 20, 2019 · 31 comments

Comments

Projects
None yet
8 participants
@malte-v
Copy link
Contributor

commented Mar 20, 2019

I thought it would be nice to have a C-style offsetof() macro besides sizeof(), which could be practical when interfacing with C. My implementation currently looks like this:

macro offsetof(path, property_name)
  {%
    type = path.resolve
    properties = type.instance_vars
    offset_expr = "0"
  %}

  {% for current_property in properties %}
    {% if current_property.name == property_name %}
      ({{offset_expr.id}})
    {% end %}
    {% offset_expr = offset_expr + " + sizeof(" + "#{current_property.type.name}" + ")" %}
  {% end %}
end

This seems to work for me, although I don't know whether Crystal performs some sort of packing/alignment which would break this. If so, is there a different way to approach this?

@straight-shoota

This comment has been minimized.

Copy link
Member

commented Mar 20, 2019

It's not safe to calculate member offsets by hand. There is no guarantee that the binary data layout corresponds to the order and size in the type declaration. LLVM can and will move members around.

So only LLVM can tell the actual offset used of a specific ivar, but we can expose this as an intrinsic method like sizeof.

@malte-v

This comment has been minimized.

Copy link
Contributor Author

commented Mar 20, 2019

Thanks for the quick clarification! Probably, LLVMOffsetOfElement is what should be used here (it's already in LibLLVM). I would be happy to help implementing this if you point me in the right direction. This requires introducing a new AST node, right? And if so, what's the process of creating one?

@malte-v malte-v changed the title offsetof() macro New offsetof() expression Mar 20, 2019

@asterite

This comment has been minimized.

Copy link
Member

commented Mar 20, 2019

Before adding any new feature to the standard library or the language we should answer the question of why this is needed. Why do you need the offsetof function?

@malte-v

This comment has been minimized.

Copy link
Contributor Author

commented Mar 20, 2019

@asterite Concrete use case:
I'm currently getting into Vulkan (the graphics API) and I'm using Crystal since it's my programming language of choice. At some point you will need tell Vulkan how to pass data from the main program (written in Crystal) to some shader (written in GLSL/HLSL/...). In order to do that, you must define a data structure which contains all per-vertex data. A minimalistic example would be:

struct Vertex
  property pos : Vec2
  property color : Vec3

  def initialize(@pos = Vec2.zero, @color = Vec3.zero)
  end
end

Then, you fill in a structure containing the information on how this data is laid out:

struct Vertex
  # ...

  # this is just for completeness, see below for the relevant part
  def self.get_binding_description
    description = Vk::VertexInputBindingDescription.new
    description.binding = 0
    description.stride = sizeof(Vertex)
    description.input_rate = Vk::VertexInputRate::Vertex

    return description
  end

  def self.get_attribute_descriptions
    descriptions = uninitialized StaticArray(Vk::VertexInputAttributeDescription, 2)

    # this is for the position
    descriptions[0].binding = 0    # you can ignore
    descriptions[0].location = 0   # these two
    descriptions[0].format = Vk::Format::R32G32Sfloat # this basically means "two 32-bit floats"
    descriptions[0].offset = offsetof(Vertex, pos)  # uh-oh... how should I do this?

    # this is for the color
    descriptions[1].binding = 0
    descriptions[1].location = 1
    descriptions[1].format = Vk::Format::R32G32B32Sfloat # this means "three 32-bit floats"
    descriptions[1].offset = offsetof(Vertex, color)

    descriptions
  end
end

Without considering hard-coding the offsets an option, I don't think there is a different way to solve this problem. The use case might be rare, but when you're in a situation like above, you're pretty much stuck. I think seamless C interop is one of Crystal's major selling points, and sometimes you need to deal with numerical offsets. I can understand you don't want to "pollute" the language with features that are rarely used, but this simply cannot be implemented without modifying the compiler.

@asterite

This comment has been minimized.

Copy link
Member

commented Mar 20, 2019

I think the usecase is valid 👍

We can probably go with offsetof(Type, @var) and I guess it could work with both classes and structs, though for classes it might not be that useful. And error on primitive types and such.

@jhass

This comment has been minimized.

Copy link
Member

commented Mar 21, 2019

Maybe offsetof(Type.@var), leveraging property access syntax (var.@property) to reduce confusion whether @var refers to Type's or some in the current scope?

@asterite

This comment has been minimized.

Copy link
Member

commented Mar 21, 2019

@jhass It's a good idea. But since this is mostly to interact with C libraries I'd prefer to use the same syntax as C.

@malte-v

This comment has been minimized.

Copy link
Contributor Author

commented Mar 24, 2019

Ok, so I started working on this yesterday and it looks like we have a somehow working implementation to begin with: malte-v@384e26a

Mostly I just looked up how sizeof works and added something equivalent for offsetof. Here's how it looks like:

struct Foo
  @x = 0_i64
  @y = 34_i32
end

foo = Foo.new

puts offsetof(typeof(foo), @y) #=> 8

Currently, my implementation just expands all OffsetOf nodes to NumberLiterals in the main semantic visitor by calling LLVMOffsetOfElement. I'm not entirely sure whether this is good practice, because from my understanding Crystal has two different ways of handling sizeof expressions:

  • If the type given to sizeof is not a typeof expression, it's simply expanded to a NumberLiteral by calling LLVMGetSizeOfTypeInBits which returns a UInt64
  • Otherwise, the sizeof expression is translated into some kind of LLVM-IR-specific sizeof expression using LLVMSizeOf, which returns a ValueRef (I think so?)

Do we also need this distinction with offsetof? I'm wondering because it looks like my current implementation works fine with typeof. If we also need the second variant for offsetof, we could probably generate a getelementptr, right?

Anyways, I wanted to say thank you for keeping the code clean and easy to understand so even newcomers con contribute!

@RX14

This comment has been minimized.

Copy link
Member

commented Mar 24, 2019

LLVM cannot and will not reorder struct members. It doesn't even do padding. So this can probably be done with the exact same logic as your original macro.

@asterite

This comment has been minimized.

Copy link
Member

commented Mar 24, 2019

@malte-v If you have sizeof(typeof(something)) then you can't compute the type right away because the type of something might change. For example:

a = 1
loop do
  sizeof(typeof(a))
  a = "hello"
end

In the first pass the compiler will infer that a is an Int32 and so typeof(a) will be Int32. However once the loop is analyzed and after seeing a = "hello" the compiler will conclude that a is Int32 | String and so the previous type of typeof wasn't the true, final one.

That's why in offsetof if you have typeof you can't compute the size right away. You have to wait for the types to settle. I think we shouldn't be using LLVMSizeOf because once types settle we can compute it using LLVMGetSizeOfTypeInBits, but that would be an improvement.

@malte-v

This comment has been minimized.

Copy link
Contributor Author

commented Mar 24, 2019

@RX14 LLVM does padding:

struct Foo
  @x = 0_i64
  @y = 34_i8
  @z = 5_i32
end

puts "real size: " + sizeof(Foo).to_s
puts "real x offset: " + offsetof(Foo, @x).to_s
puts "real y offset: " + offsetof(Foo, @y).to_s
puts "real z offset: " + offsetof(Foo, @z).to_s

puts "wrong size: #{sizeof(Int64) + sizeof(Int8) + sizeof(Int32)}"
puts "wrong z offset: #{sizeof(Int64) + sizeof(Int8)}"

Output:

real size: 16
real x offset: 0
real y offset: 8
real z offset: 12
wrong size: 13
wrong z offset: 9

@asterite I see. Now I'm wondering if we should allow typeof in offsetof at all. Wouldn't it be weird if you don't know the type of an expression but you know the name of one of its instance variables?

@asterite

This comment has been minimized.

Copy link
Member

commented Mar 24, 2019

@malte-v I guess for now we can disallow it until someone needs it. But I can't see why we would disallow it. It can make for more DRY code. Like:

def name_offset(obj)
  offsetof(typeof(obj), @name)
end
@malte-v

This comment has been minimized.

Copy link
Contributor Author

commented Mar 24, 2019

@asterite I agree we shouldn't invest time into this right now. I also added an instance_offsetof expression which works for classes:

class Foo
  @x = 0_i64
  @y = 34_i8
  @z = 5_i32
end   

puts "class sizeof: " + instance_sizeof(Foo).to_s
puts "class offsetof z: " + instance_offsetof(Foo, @z).to_s   


struct Bar
  @x = 0_i64
  @y = 34_i8
  @z = 5_i32
end   

puts "struct sizeof: " + sizeof(Bar).to_s
puts "struct offsetof z: " + offsetof(Bar, @z).to_s   


bar = Bar.new
puts "z by offset: " + Pointer(Int32).new(pointerof(bar).address + offsetof(Bar, @z)).value.to_s

Output:

class sizeof: 24
class offsetof z: 16
struct sizeof: 16
struct offsetof z: 12
z by offset: 5

But why does the class instance take up more space? Another thing I encountered was a transform method in cleanup_transformer.cr which is only defined for InstanceSizeof:

def transform(node : InstanceSizeOf)
  exp_type = node.exp.type?

  if exp_type
    instance_type = exp_type.instance_type.devirtualize
    unless instance_type.class?
      node.exp.raise "#{instance_type} is not a class, it's a #{instance_type.type_desc}"
    end
  end

  if expanded = node.expanded
    return expanded
  end

  node
end

Is this only for checking that the type passed toinstance_sizeof is a class? If so, why isn't this included in MainVisitor?

@asterite

This comment has been minimized.

Copy link
Member

commented Mar 24, 2019

A class is a reference type. Memory is allocated for it and you get a pointer to it, and the pointer is passed around (unlike a struct where the entire struct is passed around, no pointer involved). The sizeof a class type is then the size of a pointer. That's why there's instance_sizeof: to know the size behind that pointer. And it doesn't make sense to ask this for a struct type.

For offsetof there's no point to define two things: it's always the offset of a field relative to the beginning (if it's for a class, that means behind the pointer).

So instance_offsetof can go away.

@asterite

This comment has been minimized.

Copy link
Member

commented Mar 24, 2019

Also a class type always hold a type ID as the first field. Then if you have a class Foo and a class Bar < Foo, Foo is still represented as a pointer and behind it you can know whether it's Foo or Bar at runtime.

@malte-v

This comment has been minimized.

Copy link
Contributor Author

commented Mar 24, 2019

@asterite The only difference between offsetof and instance_offsetof is that instance_offsetof uses llvm_struct_type insteaf of llvm_type:

def size_of(type)
  size = llvm_typer.size_of(llvm_typer.llvm_type(type))
  size = 1 if size == 0
  size
end

def instance_size_of(type)
  llvm_typer.size_of(llvm_typer.llvm_struct_type(type))
end

def offset_of(type, element_index)
  llvm_typer.offset_of(llvm_typer.llvm_type(type), element_index)
end

def instance_offset_of(type, element_index)
  llvm_typer.offset_of(llvm_typer.llvm_struct_type(type), element_index)
end

First I tried to use the original offset_of implementation with classes, which gave me weird results like 140112558951168. Then I tried using llvm_struct_type to construct the LLVM type like instance_sizeof does, and it worked fine. To be honest, I don't know the difference between the two. So what should be done to remove the instance_offsetof?
To me it seemed logical that we need a dedicated instance_offsetof expression since a pointer can't have instance variables and we want to know the offset of some instance variable of a class behind the pointer.

@asterite

This comment has been minimized.

Copy link
Member

commented Mar 24, 2019

You need to use llvm_type if it's a struct, llvm_struct_type if it's a class. You can check this by calling struct? on the type

@malte-v

This comment has been minimized.

Copy link
Contributor Author

commented Mar 25, 2019

I just added specs as following:

require "../../spec_helper"

describe "Semantic: offsetof" do
  it "types offsetof" do
    assert_type("offsetof(String, @length)") { int32 }
  end

  it "errors on undefined instance variable" do
    assert_error "struct Foo; @a = 0; end; offsetof(Foo, @b)", "type Foo doesn't have an instance variable called @b"
  end

  it "errors on typeof inside offsetof expression" do
    assert_error "struct Foo; @a = 0; end; foo = Foo.new; offsetof(typeof(foo), @a)", "can't use typeof inside offsetof expression"
  end

  it "gives error if using offsetof on something that can't have instance variables" do
    assert_error "offsetof(Int32, @foo)", "type Int32 can't have instance variables"
  end

  it "gives error if using offsetof on something that's neither a class nor a struct" do
    assert_error "module Foo; @a = 0; end; offsetof(Foo, @a)", "Foo is neither a class nor a struct, it's a module"
  end

  it "errors on offsetof uninstantiated generic type" do
    assert_error "struct Foo(T); @a = 0; end; offsetof(Foo, @a)", "can't take offsetof element @a of uninstantiated generic type Foo(T)"
  end
end

I'm happy with it right now and I'd say it's ready for review: #7589

@straight-shoota

This comment has been minimized.

Copy link
Member

commented Mar 26, 2019

Maybe offsetof(Type.@var), leveraging property access syntax (var.@Property) to reduce confusion whether @var refers to Type's or some in the current scope?

I tending more towards this syntax now. Type.@var makes it clear that this is about the @var ivar of Type. In Type, @var, it's easy to confuse @var with a ivar in the current scope.

@asterite

This comment has been minimized.

Copy link
Member

commented Mar 26, 2019

@straight-shoota How would you combine that with typeof? For example offsetof(typeof(exp), @var).

@konovod

This comment has been minimized.

Copy link
Contributor

commented Mar 26, 2019

I know that Symbols are possibly going to be dropped, but isn't offsetof(Type, :var) better?
Or even Type.offsetof(:var) to make it similar to responds_to??

@malte-v

This comment has been minimized.

Copy link
Contributor Author

commented Mar 26, 2019

@konovod @var, unlike :var, shows that offsetof only works with instance variables.

@straight-shoota

This comment has been minimized.

Copy link
Member

commented Mar 26, 2019

@asterite Anything wrong with offsetof(typeof(exp).@var)?

@straight-shoota

This comment has been minimized.

Copy link
Member

commented Mar 26, 2019

@konovod I wouldn't want to add such a rarely used method to every type scope.

@asterite

This comment has been minimized.

Copy link
Member

commented Mar 26, 2019

@straight-shoota The logic is a bit harder. You can then have offsetof(foo.bar.@baz) and then you need to separate the foo.bar from @baz and so on. I think making it look like C's syntax is fine.

@asterite

This comment has been minimized.

Copy link
Member

commented Mar 26, 2019

What I mean is, I don't think there's a need to invent new syntax for this if the only place this comes from is C and it works fine there, it will be familiar to C users and using a different syntax doesn't seem to give any benefit.

@straight-shoota

This comment has been minimized.

Copy link
Member

commented Mar 26, 2019

Yes, let's keep it simple.

@konovod

This comment has been minimized.

Copy link
Contributor

commented Mar 26, 2019

@malte-v well, compiler will say it anyway if anybody will try to use it with a method.
But @var looks like expression, while :var clearly points that it's a member name, not a value.

@ysbaddaden

This comment has been minimized.

Copy link
Member

commented Mar 26, 2019

Ruby uses :@var in some places, such as instance_var_get(:@foo). It emphasizes that it's not the actual ivar (because it's a symbol) but still refers to the ivar @foo.

@Blacksmoke16

This comment has been minimized.

Copy link
Contributor

commented Apr 15, 2019

Should this be closed since #7589 is merged?

@malte-v

This comment has been minimized.

Copy link
Contributor Author

commented Apr 15, 2019

Yes, you're right. I'll close this.

@malte-v malte-v closed this Apr 15, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.