Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug: a compiler compiled in release mode can no longer compile anything else #10359

Closed
elbywan opened this issue Feb 3, 2021 · 68 comments
Closed

Comments

@elbywan
Copy link

elbywan commented Feb 3, 2021

Issue

It is hard to summarize the issue clearly so I'll try my best to explain 😉.

Producing a binary with the --release flag that requires the crystal compiler (require "compiler/crystal/**") is unable to compile other programs (at least on my machine running macos & llvm 11).

Running the binary produces the following error:

Called macro defined in /usr/local/Cellar/crystal/0.36.1/src/enum.cr:139:5

 139 | {% if @type.annotation(Flags) %}

Which expanded to:

   35 |         end
   36 |
 > 37 |         if value == 4294967306_u32
                            ^-------------
Error: 4294967306 doesn't fit in an UInt32  from /usr/local/Cellar/crystal/0.36.1/src/compiler/crystal/semantic/ast.cr:13:9 in 'raise'
  from /usr/local/Cellar/crystal/0.36.1/src/compiler/crystal/semantic/ast.cr:12:5 in 'raise'
  from /usr/local/Cellar/crystal/0.36.1/src/compiler/crystal/semantic/bindings.cr:13:7 in 'instantiate'
  from /usr/local/Cellar/crystal/0.36.1/src/compiler/crystal/semantic/call.cr:311:5 in 'lookup_matches_in_type'
  from /usr/local/Cellar/crystal/0.36.1/src/compiler/crystal/semantic/call.cr:238:3 in 'lookup_matches_in:with_literals'
  from /usr/local/Cellar/crystal/0.36.1/src/compiler/crystal/semantic/call.cr:133:7 in 'lookup_matches:with_literals'
  from /usr/local/Cellar/crystal/0.36.1/src/compiler/crystal/semantic/call.cr:96:5 in 'recalculate'
  from /usr/local/Cellar/crystal/0.36.1/src/compiler/crystal/semantic/main_visitor.cr:1486:7 in 'visit'
  from /usr/local/Cellar/crystal/0.36.1/src/compiler/crystal/syntax/visitor.cr:27:12 in 'accept'
  from /usr/local/Cellar/crystal/0.36.1/src/compiler/crystal/semantic/instance_vars_initializer_visitor.cr:109:7 in 'semantic'
  from /usr/local/Cellar/crystal/0.36.1/src/compiler/crystal/compiler.cr:172:7 in '__crystal_main'
  from /usr/local/Cellar/crystal/0.36.1/src/crystal/main.cr:110:5 in 'main'
Full stacktrace

Unhandled exception: In /usr/local/Cellar/crystal/0.36.1/src/compiler/crystal/compiler.cr:98:38

 98 | property codegen_target = Config.host_target
                                       ^----------
Error: instantiating 'Crystal::Config:Module#host_target()'


In /usr/local/Cellar/crystal/0.36.1/src/compiler/crystal/config.cr:57:83

 57 | target = Crystal::Codegen::Target.new(target.to_s.sub(default_libc, "-#{linux_runtime_libc}"))
                                                                              ^-----------------
Error: instantiating 'linux_runtime_libc()'


In /usr/local/Cellar/crystal/0.36.1/src/compiler/crystal/config.cr:65:28

 65 | ldd_version = String.build do |io|
                           ^----
Error: instantiating 'String.class#build()'


In /usr/local/Cellar/crystal/0.36.1/src/string.cr:277:21

 277 | String::Builder.build(capacity) do |builder|
                       ^----
Error: instantiating 'String::Builder.class#build(Int32)'


In /usr/local/Cellar/crystal/0.36.1/src/string.cr:277:21

 277 | String::Builder.build(capacity) do |builder|
                       ^----
Error: instantiating 'String::Builder.class#build(Int32)'


In /usr/local/Cellar/crystal/0.36.1/src/compiler/crystal/config.cr:65:28

 65 | ldd_version = String.build do |io|
                           ^----
Error: instantiating 'String.class#build()'


In /usr/local/Cellar/crystal/0.36.1/src/compiler/crystal/config.cr:66:17

 66 | Process.run("ldd", {"--version"}, output: io, error: io)
              ^--
Error: instantiating 'Process.class#run(String, Tuple(String))'


In /usr/local/Cellar/crystal/0.36.1/src/process.cr:133:14

 133 | status = new(command, args, env, clear_env, shell, input, output, error, chdir).wait
                ^--
Error: instantiating 'new(String, Tuple(String), Nil, Bool, Bool, Process::Redirect, String::Builder, String::Builder, Nil)'


In /usr/local/Cellar/crystal/0.36.1/src/kernel.cr:8:28

 8 | STDIN = IO::FileDescriptor.from_stdio(0)
                                ^---------
Error: instantiating 'IO::FileDescriptor.class#from_stdio(Int32)'


In /usr/local/Cellar/crystal/0.36.1/src/io/file_descriptor.cr:35:37

 35 | Crystal::System::FileDescriptor.from_stdio(fd)
                                      ^---------
Error: instantiating 'Crystal::System::FileDescriptor:Module#from_stdio(Int32)'


In /usr/local/Cellar/crystal/0.36.1/src/crystal/system/unix/file_descriptor.cr:173:31

 173 | return IO::FileDescriptor.new(fd).tap(&.flush_on_newline=(true)) unless ret == 0
                                 ^--
Error: instantiating 'IO::FileDescriptor.class#new(Int32)'


In /usr/local/Cellar/crystal/0.36.1/src/io/file_descriptor.cr:14:3

 14 | def initialize(fd, blocking = nil)
      ^------------
Error: instantiating 'GC:Module#add_finalizer(IO::FileDescriptor)'


In /usr/local/Cellar/crystal/0.36.1/src/gc/boehm.cr:155:5

 155 | add_finalizer_impl(object)
       ^-----------------
Error: instantiating 'add_finalizer_impl(IO::FileDescriptor)'


In /usr/local/Cellar/crystal/0.36.1/src/gc/boehm.cr:164:33

 164 | ->(obj, data) { obj.as(T).finalize },
                                 ^-------
Error: instantiating 'IO::FileDescriptor+#finalize()'


In /usr/local/Cellar/crystal/0.36.1/src/io/file_descriptor.cr:193:5

 193 | close rescue nil
       ^----
Error: instantiating 'close()'


In /usr/local/Cellar/crystal/0.36.1/src/io/buffered.cr:225:5

 225 | flush if @out_count > 0
       ^----
Error: instantiating 'flush()'


In /usr/local/Cellar/crystal/0.36.1/src/io/buffered.cr:217:5

 217 | unbuffered_write(Slice.new(out_buffer, @out_count)) if @out_count > 0
       ^---------------
Error: instantiating 'unbuffered_write(Slice(UInt8))'


In /usr/local/Cellar/crystal/0.36.1/src/crystal/system/unix/file_descriptor.cr:21:5

 21 | evented_write(slice, "Error writing file") do |slice|
      ^------------
Error: instantiating 'evented_write(Slice(UInt8), String)'


In /usr/local/Cellar/crystal/0.36.1/src/io/evented.cr:72:7

 72 | loop do
      ^---
Error: instantiating 'loop()'


In /usr/local/Cellar/crystal/0.36.1/src/io/evented.cr:72:7

 72 | loop do
      ^---
Error: instantiating 'loop()'


In /usr/local/Cellar/crystal/0.36.1/src/io/evented.cr:80:13

 80 | wait_writable
      ^------------
Error: instantiating 'wait_writable()'


In /usr/local/Cellar/crystal/0.36.1/src/io/evented.cr:144:3

 144 | def wait_writable(timeout = @write_timeout)
       ^------------
Error: instantiating 'wait_writable((Time::Span | Nil))'


In /usr/local/Cellar/crystal/0.36.1/src/io/evented.cr:145:5

 145 | wait_writable(timeout: timeout) { raise TimeoutError.new("Write timed out") }
       ^------------
Error: instantiating 'wait_writable()'


In /usr/local/Cellar/crystal/0.36.1/src/io/evented.cr:153:24

 153 | Crystal::Scheduler.reschedule
                          ^---------
Error: instantiating 'Crystal::Scheduler.class#reschedule()'


In /usr/local/Cellar/crystal/0.36.1/src/crystal/scheduler.cr:44:30

 44 | Thread.current.scheduler.reschedule
                               ^---------
Error: instantiating 'Crystal::Scheduler#reschedule()'


In /usr/local/Cellar/crystal/0.36.1/src/crystal/scheduler.cr:144:5

 144 | loop do
       ^---
Error: instantiating 'loop()'


In /usr/local/Cellar/crystal/0.36.1/src/crystal/scheduler.cr:144:5

 144 | loop do
       ^---
Error: instantiating 'loop()'


In /usr/local/Cellar/crystal/0.36.1/src/crystal/scheduler.cr:147:20

 147 | runnable.resume
                ^-----
Error: instantiating 'Fiber#resume()'


In /usr/local/Cellar/crystal/0.36.1/src/fiber.cr:197:24

 197 | Crystal::Scheduler.resume(self)
                          ^-----
Error: instantiating 'Crystal::Scheduler.class#resume(Fiber)'


In /usr/local/Cellar/crystal/0.36.1/src/crystal/scheduler.cr:48:30

 48 | Thread.current.scheduler.resume(fiber)
                               ^-----
Error: instantiating 'Crystal::Scheduler#resume(Fiber)'


In /usr/local/Cellar/crystal/0.36.1/src/crystal/scheduler.cr:91:5

 91 | validate_resumable(fiber)
      ^-----------------
Error: instantiating 'validate_resumable(Fiber)'


In /usr/local/Cellar/crystal/0.36.1/src/crystal/scheduler.cr:111:7

 111 | fatal_resume_error(fiber, "tried to resume a dead fiber")
       ^-----------------
Error: instantiating 'fatal_resume_error(Fiber, String)'


In /usr/local/Cellar/crystal/0.36.1/src/crystal/scheduler.cr:123:5

 123 | {% unless flag?(:win32) %}
       ^
Error: expanding macro


There was a problem expanding macro 'macro_4635878608'

Called macro defined in /usr/local/Cellar/crystal/0.36.1/src/crystal/scheduler.cr:123:5

 123 | {% unless flag?(:win32) %}

Which expanded to:

   1 |
   2 |       # FIXME: Enable when caller is supported on win32
 > 3 |       caller.each { |line| Crystal::System.print_error "  from #{line}\n" }
   4 |
Error: instantiating 'caller()'


In /usr/local/Cellar/crystal/0.36.1/src/exception/call_stack.cr:17:28

 17 | Exception::CallStack.new.printable_backtrace
                               ^------------------
Error: instantiating 'Exception::CallStack#printable_backtrace()'


In /usr/local/Cellar/crystal/0.36.1/src/exception/call_stack.cr:46:20

 46 | @backtrace ||= decode_backtrace
                     ^---------------
Error: instantiating 'decode_backtrace()'


In /usr/local/Cellar/crystal/0.36.1/src/exception/call_stack.cr:169:16

 169 | @callstack.compact_map do |ip|
                  ^----------
Error: instantiating 'Array(Pointer(Void))#compact_map()'


In /usr/local/Cellar/crystal/0.36.1/src/exception/call_stack.cr:169:16

 169 | @callstack.compact_map do |ip|
                  ^----------
Error: instantiating 'Array(Pointer(Void))#compact_map()'


In /usr/local/Cellar/crystal/0.36.1/src/exception/call_stack.cr:172:38

 172 | file, line, column = CallStack.decode_line_number(pc)
                                      ^-----------------
Error: instantiating 'Exception::CallStack.class#decode_line_number(UInt64)'


In /usr/local/Cellar/crystal/0.36.1/src/exception/call_stack/dwarf.cr:29:5

 29 | load_dwarf
      ^---------
Error: instantiating 'load_dwarf()'


In /usr/local/Cellar/crystal/0.36.1/src/exception/call_stack/dwarf.cr:19:9

 19 | load_dwarf_impl
      ^--------------
Error: instantiating 'load_dwarf_impl()'


In /usr/local/Cellar/crystal/0.36.1/src/exception/call_stack/mach_o.cr:13:5

 13 | read_dwarf_sections
      ^------------------
Error: instantiating 'read_dwarf_sections()'


In /usr/local/Cellar/crystal/0.36.1/src/exception/call_stack/mach_o.cr:17:5

 17 | locate_dsym_bundle do |mach_o|
      ^-----------------
Error: instantiating 'locate_dsym_bundle()'


In /usr/local/Cellar/crystal/0.36.1/src/exception/call_stack/mach_o.cr:69:11

 69 | files.each do |dwarf|
            ^---
Error: instantiating 'Tuple(String, String)#each()'


In /usr/local/Cellar/crystal/0.36.1/src/exception/call_stack/mach_o.cr:69:11

 69 | files.each do |dwarf|
            ^---
Error: instantiating 'Tuple(String, String)#each()'


In /usr/local/Cellar/crystal/0.36.1/src/exception/call_stack/mach_o.cr:72:22

 72 | Crystal::MachO.open(program) do |mach_o|
                     ^---
Error: instantiating 'Crystal::MachO.class#open(String)'


In /usr/local/Cellar/crystal/0.36.1/src/crystal/mach_o.cr:94:12

 94 | File.open(path, "r") do |file|
           ^---
Error: instantiating 'File.class#open(String, String)'


In /usr/local/Cellar/crystal/0.36.1/src/crystal/mach_o.cr:94:12

 94 | File.open(path, "r") do |file|
           ^---
Error: instantiating 'File.class#open(String, String)'


In /usr/local/Cellar/crystal/0.36.1/src/exception/call_stack/mach_o.cr:72:22

 72 | Crystal::MachO.open(program) do |mach_o|
                     ^---
Error: instantiating 'Crystal::MachO.class#open(String)'


In /usr/local/Cellar/crystal/0.36.1/src/exception/call_stack/mach_o.cr:73:24

 73 | Crystal::MachO.open(dwarf) do |dsym|
                     ^---
Error: instantiating 'Crystal::MachO.class#open(String)'


In /usr/local/Cellar/crystal/0.36.1/src/crystal/mach_o.cr:94:12

 94 | File.open(path, "r") do |file|
           ^---
Error: instantiating 'File.class#open(String, String)'


In /usr/local/Cellar/crystal/0.36.1/src/crystal/mach_o.cr:94:12

 94 | File.open(path, "r") do |file|
           ^---
Error: instantiating 'File.class#open(String, String)'


In /usr/local/Cellar/crystal/0.36.1/src/exception/call_stack/mach_o.cr:73:24

 73 | Crystal::MachO.open(dwarf) do |dsym|
                     ^---
Error: instantiating 'Crystal::MachO.class#open(String)'


In /usr/local/Cellar/crystal/0.36.1/src/exception/call_stack/mach_o.cr:17:5

 17 | locate_dsym_bundle do |mach_o|
      ^-----------------
Error: instantiating 'locate_dsym_bundle()'


In /usr/local/Cellar/crystal/0.36.1/src/exception/call_stack/mach_o.cr:26:14

 26 | mach_o.read_section?("__debug_info") do |sh, io|
             ^------------
Error: instantiating 'Crystal::MachO#read_section?(String)'


In /usr/local/Cellar/crystal/0.36.1/src/crystal/mach_o.cr:503:13

 503 | @io.seek(sh.offset) do
           ^---
Error: instantiating 'IO::FileDescriptor+#seek(UInt32)'


In /usr/local/Cellar/crystal/0.36.1/src/crystal/mach_o.cr:503:13

 503 | @io.seek(sh.offset) do
           ^---
Error: instantiating 'IO::FileDescriptor+#seek(UInt32)'


In /usr/local/Cellar/crystal/0.36.1/src/exception/call_stack/mach_o.cr:26:14

 26 | mach_o.read_section?("__debug_info") do |sh, io|
             ^------------
Error: instantiating 'Crystal::MachO#read_section?(String)'


In /usr/local/Cellar/crystal/0.36.1/src/exception/call_stack/mach_o.cr:36:11

 36 | parse_function_names_from_dwarf(info, strings) do |low_pc, high_pc, name|
      ^------------------------------
Error: instantiating 'parse_function_names_from_dwarf(Crystal::DWARF::Info, (Crystal::DWARF::Strings | Nil))'


In /usr/local/Cellar/crystal/0.36.1/src/exception/call_stack/dwarf.cr:49:10

 49 | info.each do |code, abbrev, attributes|
           ^---
Error: instantiating 'Crystal::DWARF::Info#each()'


In /usr/local/Cellar/crystal/0.36.1/src/crystal/dwarf/info.cr:49:31

 49 | abbrev.attributes.each do |attr|
                        ^---
Error: instantiating 'Array(Crystal::DWARF::Abbrev::Attribute)#each()'


In /usr/local/Cellar/crystal/0.36.1/src/indexable.cr:187:5

 187 | each_index do |i|
       ^---------
Error: instantiating 'each_index()'


In /usr/local/Cellar/crystal/0.36.1/src/indexable.cr:187:5

 187 | each_index do |i|
       ^---------
Error: instantiating 'each_index()'


In /usr/local/Cellar/crystal/0.36.1/src/crystal/dwarf/info.cr:49:31

 49 | abbrev.attributes.each do |attr|
                        ^---
Error: instantiating 'Array(Crystal::DWARF::Abbrev::Attribute)#each()'


In /usr/local/Cellar/crystal/0.36.1/src/crystal/dwarf/info.cr:50:23

 50 | value = read_attribute_value(attr.form)
              ^-------------------
Error: instantiating 'read_attribute_value(Crystal::DWARF::FORM)'


In /usr/local/Cellar/crystal/0.36.1/src/crystal/dwarf/info.cr:134:41

 134 | raise "Unknown DW_FORM_#{form.to_s.underscore}"
                                     ^---
Error: instantiating 'Crystal::DWARF::FORM#to_s()'


There was a problem expanding macro 'macro_4649176576'

Code in /usr/local/Cellar/crystal/0.36.1/src/enum.cr:139:5

 139 | {% if @type.annotation(Flags) %}
       ^
Called macro defined in /usr/local/Cellar/crystal/0.36.1/src/enum.cr:139:5

 139 | {% if @type.annotation(Flags) %}

Which expanded to:

   35 |         end
   36 |
 > 37 |         if value == 4294967306_u32
                            ^-------------
Error: 4294967306 doesn't fit in an UInt32  from /usr/local/Cellar/crystal/0.36.1/src/compiler/crystal/semantic/ast.cr:13:9 in 'raise'
  from /usr/local/Cellar/crystal/0.36.1/src/compiler/crystal/semantic/ast.cr:12:5 in 'raise'
  from /usr/local/Cellar/crystal/0.36.1/src/compiler/crystal/semantic/bindings.cr:13:7 in 'instantiate'
  from /usr/local/Cellar/crystal/0.36.1/src/compiler/crystal/semantic/call.cr:311:5 in 'lookup_matches_in_type'
  from /usr/local/Cellar/crystal/0.36.1/src/compiler/crystal/semantic/call.cr:238:3 in 'lookup_matches_in:with_literals'
  from /usr/local/Cellar/crystal/0.36.1/src/compiler/crystal/semantic/call.cr:133:7 in 'lookup_matches:with_literals'
  from /usr/local/Cellar/crystal/0.36.1/src/compiler/crystal/semantic/call.cr:96:5 in 'recalculate'
  from /usr/local/Cellar/crystal/0.36.1/src/compiler/crystal/semantic/main_visitor.cr:1486:7 in 'visit'
  from /usr/local/Cellar/crystal/0.36.1/src/compiler/crystal/syntax/visitor.cr:27:12 in 'accept'
  from /usr/local/Cellar/crystal/0.36.1/src/compiler/crystal/semantic/instance_vars_initializer_visitor.cr:109:7 in 'semantic'
  from /usr/local/Cellar/crystal/0.36.1/src/compiler/crystal/compiler.cr:172:7 in '__crystal_main'
  from /usr/local/Cellar/crystal/0.36.1/src/crystal/main.cr:110:5 in 'main'

It seems like one value of the Crystal::DWARF::FORM enum (of type UInt32) is inlined by the compiler to 4294967306_u32 which is indeed out of bounds.

More generally every enum of type UInt32 seem to be messed up during compilation.

I have very limited expertise about this but I would guess that this is caused by LLVM 11 optimizations since this code used to run fine with LLVM 9.

How to reproduce

The following file compiles itself and prints the values of an UInt32 enum twice (first time during the compilation with the crystal compiler, then a second time when the program runs and compiles itself).

The enum part is not required to produce the stacktrace (compiling puts("hello world") is failing too) but is useful to demonstrate that something fishy is happening with UInt32 enums.

require "compiler/crystal/**"

# Machine dependent - change to suit the `crystal env` output.
ENV["CRYSTAL_LIBRARY_PATH"] = "/usr/local/Cellar/crystal/0.36.1/embedded/lib:/usr/local/lib"
ENV["CRYSTAL_CACHE_DIR"] = "/Users/elbywan/.cache/crystal"
ENV["CRYSTAL_PATH"] = "lib:/usr/local/Cellar/crystal/0.36.1/src"

enum Test : UInt32
  One   =  1
  Two   =  2
  Three =  3
  Four  =  4
  Five  =  5
  Six   =  6
  Seven =  7
  Eight =  8
  Nine  =  9
  Ten   = 10

  {% for member in @type.constants %}
    {% puts member, @type.constant(member) %}
  {% end %}
end

compiler = Crystal::Compiler.new
compiler.no_codegen = true
file = File.new(__FILE__)
sources = [
  Crystal::Compiler::Source.new(__FILE__, file.gets_to_end),
]
file.close
puts "Compiling…"
result = compiler.compile(sources, "")
puts "Done"
  • crystal run file.cr

Produces the expected correct output:

One
1_u32
Two
2_u32
Three
3_u32
Four
4_u32
Five
5_u32
Six
6_u32
Seven
7_u32
Eight
8_u32
Nine
9_u32
Ten
10_u32
Compiling…
One
1_u32
Two
2_u32
Three
3_u32
Four
4_u32
Five
5_u32
Six
6_u32
Seven
7_u32
Eight
8_u32
Nine
9_u32
Ten
10_u32
Done
  • crystal run --release file.cr

Produces the incorrect output:

One
1_u32
Two
2_u32
Three
3_u32
Four
4_u32
Five
5_u32
Six
6_u32
Seven
7_u32
Eight
8_u32
Nine
9_u32
Ten
10_u32
Compiling…
One
1_u32
Two
8_u32
Three
9_u32
Four
0_u32
Five
1_u32
Six
2_u32
Seven
3_u32
Eight
4_u32
Nine
5_u32
Ten
4294967306_u32

(+ the stacktrace at the end)

Crystal -v

Crystal 0.36.1 (2021-02-02)

LLVM: 11.0.1
Default target: x86_64-apple-macosx
@asterite
Copy link
Member

It seems this bugs prevents further development of crystalline, a language server protocol.

It would have been nice to know that, I would probably have fixed it by now.

I'll try looking at it this week if I have time.

@j8r
Copy link
Contributor

j8r commented Feb 17, 2021

It may be a LLVM 11.0 bug, please try with 11.1.0.

@asterite
Copy link
Member

@j8r How did you reach that conclusion?

@j8r
Copy link
Contributor

j8r commented Feb 17, 2021

It is not a conclusion, it is an hyphotesis, I don't know much about LLVM 😅
Such breakages already happened in the past.

Better try the patched LLVM versions, it is easy and can save time.

@j8r
Copy link
Contributor

j8r commented Feb 17, 2021

Also it is mentioned of "renumbering values", so I thought it might be this bug. I don't want you to waste your time @asterite 😉

@asterite
Copy link
Member

I don't want you to waste your time @asterite 😉

@j8r But that's the thing. Every time there's a misleading comment I have to waste time jumping in and saying "No, this is wrong, don't waste time looking into this direction".

I prefer if we would avoid misleading comments, or unfounded hypothesis. This is clearly a bug in the compiler where a number is outputted with the type suffix and that overflows.

@straight-shoota
Copy link
Member

straight-shoota commented Feb 17, 2021

Making helpful suggestions is still encouraged. But this is very clearly not related to LLVM at all because it happens in the semantic stage of the compiling process. So it's entirely internal to the compiler.
So the LLVM 11.0 hypothesis misses any solid basis.
And it could've easily been refuted by yoursel, trying to reproduce with LLVM 11.1 (or even with an older version which is even easier because you can just an existing container image for that). (ooops, I missed the OP already reported it worked with LLVM9)

@j8r
Copy link
Contributor

j8r commented Feb 17, 2021

There should be a reason why it worked before and not now.

@asterite
Copy link
Member

To whoever wants to reproduce this:

crystal run foo.cr --release

^ THAT'S WRONG

crystal run --release foo.cr

THAT'S CORRECT

Because... doesn't it feel good to pass an argument to crystal, and crystal totally ignoring it? ❤️

@elbywan
Copy link
Author

elbywan commented Feb 17, 2021

@asterite Yeah sorry, just edited the original post to fix this 😉.

@asterite
Copy link
Member

@elbywan No need to say sorry, that behavior is a mess and I'd like to change that

@j8r
Copy link
Contributor

j8r commented Feb 17, 2021

I reproduced the issue:
LLVM 10: not impacted.
LLVM 11.0.1: impacted
LLVM 11.1.0: impacted
LLVM 12.0: not supported by Crystal.

@asterite asterite changed the title Stacktrace when running a binary that requires the crystal compiler Bug: a compiler compiled in release mode can no longer compile anything else Feb 21, 2021
@asterite
Copy link
Member

@bcardiff If I'm not mistaken, the issue is pretty serious: if you compile a compiler in release mode, it can't compile anything else that requires the prelude. I don't know what's going on...

I think this is a blocker for 1.0

@straight-shoota
Copy link
Member

Yes, this looks like a major issue. On linux it sig faults:

$ LLVM_CONFIG=/home/linuxbrew/.linuxbrew/opt/llvm/bin/llvm-config bin/crystal run --release test-compiler-release.cr
Using compiled compiler at .build/crystal
One
1_u32
Two
2_u32
Three
3_u32
Four
4_u32
Five
5_u32
Six                                                                                                                                                                                                                                   
6_u32
Seven
7_u32
Eight
8_u32
Nine
9_u32
Ten
10_u32
Compiling…
Invalid memory access (signal 11) at address 0x40d8dede
[0xd12136] ???
[0xd08192] __crystal_sigfault_handler +514
[0xd38531] sigfault_handler +40
[0x7f059007d3c0] ???
[0x83b309] ???
...
$ bin/crystal --version
Using compiled compiler at .build/crystal
Crystal 1.0.0-dev [de9351f05] (2021-02-20)

LLVM: 11.1.0
Default target: x86_64-unknown-linux-gnu

@straight-shoota
Copy link
Member

We don't build macos nightlies, right? I suppose they would be affected, too. And any local built compiler with release mode.

On linux nightlies (and latest) are still linked against LLVM 10, so that's safe.

@straight-shoota straight-shoota added this to the 1.0.0 milestone Feb 21, 2021
@bcardiff
Copy link
Member

This is undetected in the CI becasue

  1. CircleCI / bin/ci although started to use LLVM 11 recently the compiler is not built on release mode
  2. GitHub Actions uses LLVM 10 due to shell.nix and same as before bin/ci does not build the compiler in release mode

In CircleCI test_dist_linux_on_docker test_dist_linux32_std_on_docker checks the progress of the compiler built in release mode, but there LLVM 10 is used. There is no equivalent for Darwin. And the Darwin .tar.gz built with distribution scripts is using LLVM 6.

The only Crystal compiler in release mode I am aware of comes from homebrew.

@asterite
Copy link
Member

The only Crystal compiler in release mode I am aware of comes from homebrew

I'm not sure that's the issue. The compiler build in release mode from homebrew works fine. The problem is building a new compiler in release mode from that one. But maybe the one in homebrew is already broken? I'm not sure.

My theroies:

  • Something between 0.36.1 and master broke something.
  • Something was broken before, and it's only noticeable in recent LLVM versions
  • Something else?

We could do a bisect too.

What I'll try as a first step is checking out v0.36.1 and running the repro steps to see if the error was there.

@asterite
Copy link
Member

So, going back to v0.36.1, compiling a compiler in release mode and running an empty program already fails:

# foo.cr (empty)
$ git co 0.36.1
$ make clean crystal release=1
$ bin/crystal foo.cr

Output:

There was a problem expanding macro 'macro_4470593616'

Code in src/enum.cr:139:5

 139 | {% if @type.annotation(Flags) %}
       ^
Called macro defined in src/enum.cr:139:5

 139 | {% if @type.annotation(Flags) %}

Which expanded to:

 > 35 |         end
 > 36 |
 > 37 |         if value == 4294967306_u32
                            ^-------------
Error: 4294967306 doesn't fit in an UInt32

I'm going to try now with LLVM 10 (I don't think it's an LLVM issue, we probably have a bug that is only triggered when LLVM 11+ is used.)

@asterite
Copy link
Member

Another "bad" thing: it seems llvm@10 no longer exists in homebrew... I'm going to try with llvm@9 to see what happens.

@asterite
Copy link
Member

The repro steps #10359 (comment) also fail for LLVM 9 for me.

It might be that the 0.36.1 is already buggy.

Next I'm going to try checking out 0.35.1 and do the repro steps. Hopefully it works fine and the issue is somewhere between 0.35.1 and 0.36.1! 🤞

@asterite
Copy link
Member

Well... it segfaults during compilation 😞

I think we'll have to go back to using an 0.35.1 compiler and start from there. But I ran out of time for today.

@asterite
Copy link
Member

Hopefully this bug has some relation to enums, uint32 and converting literals to them. We'll see.

@asterite
Copy link
Member

I'm thinking this bug is related to union of numbers and assigning/extracting values from them. This was fixed recently, but maybe there's still something going on there...

As a workaround maybe we can change those UInt32 enums to Int64 until we figure out what the problem is, to avoid blocking 1.0.

@straight-shoota
Copy link
Member

That workaround wouldn't fix user code with UInt enums, though.

@0xtmphey
Copy link

Not sure if it matters but adding else branch to case..when prevents the bug from happening. I guess this is because unions are not involved anymore (changing the return type from (UInt32 | Nil) to just UInt32

@kramerul
Copy link

It seems that the bug doesn't happen in foo. There is something wrong in puts or in passing the values from foo to puts. Replacing puts with p produces the correct result.

def foo(str)
  if 1.to_s == "1"
    str.to_u32
  end
end

p foo("1")
p foo("2")
p foo("3")

I also disassembled both variants to ensure that foo is the same in both cases.

Debugging also turned out that foo seems to return the correct values. Setting a breakpoint at the end of the function at retq showed the following value

Input register read eax (Seems to be the type of the union) register read edx x/1xw <rsp-0x10>
"1" 0x000000a8 0x00000001 0x00000001
"2" 0x000000a8 0x00000002 0x00000002
"3" 0x000000a8 0x00000003 0x00000003

Using puts "#{foo("1")} #{foo("2")} #{foo("3")}" also produces the wrong results.

@asterite
Copy link
Member

I was able to reproduce this bug without puts, just calling to_s and LibC.printf

I'll try to reduce the problem even further but commenting prelude code. At least that's how I found the cause of another bug.

@asterite
Copy link
Member

For instance, this reproduces the problem and there's no puts involved at all:

class NumberLiteral
  def initialize(@value : Int32, @kind : Int32)
  end

  def kind
    @kind
  end

  def value
    @value
  end
end

struct Int32
  def u32?
    case self
    when 0 then 0_u32
    when 1 then 1_u32
    when 2 then 2_u32
    when 3 then 3_u32
    when 4 then 4_u32
    when 5 then 5_u32
    when 6 then 6_u32
    when 7 then 7_u32
    when 8 then 8_u32
    else        nil
    end
  end
end

struct UInt32
  def to_s2 : UInt8
    return 48_u8 if self == 0

    ptr = Pointer(UInt8).malloc(2) + 1
    num = self

    while num != 0
      ptr = ptr - 1
      ptr.value = 48_u8 &+ num.unsafe_mod(10)
      num = num.unsafe_div(10)
    end

    ptr.value
  end
end

def foo(num)
  case num.kind
  when 0
    case 1
    when 0 then num.value.to_i8!
    when 1 then num.value.u32? || (Pointer(UInt8).malloc(2); LibC.exit 0)
    else
      LibC.exit 0
    end
  else
    Pointer(UInt8).malloc(2)
    LibC.exit 0
  end
end

class String
  def to_unsafe
    pointerof(@c)
  end
end

class Array2
  def initialize
    @buffer = Pointer(NumberLiteral).malloc(4)
    @buffer[0] = NumberLiteral.new(0, 0)
    @buffer[1] = NumberLiteral.new(1, 0)
  end

  def each
    yield @buffer[0]
    yield @buffer[1]
  end
end

members = Array2.new
members.each do |value|
  z = foo(value)
  if z.is_a?(UInt32)
    LibC.printf "%c\n", z.to_s2
  end
end

@HertzDevil
Copy link
Contributor

I compared the LLVM IR outputs of #10359 (comment) between these compilers:

Crystal 1.0.0-dev [b018f288c] (2021-02-28)

LLVM: 10.0.0
Default target: x86_64-unknown-linux-gnu
Crystal 1.0.0-dev [5878c876a] (2021-03-07)

LLVM: 11.0.1
Default target: x86_64-pc-linux-gnu

The LLVM 10 one is from Bintray and the LLVM 11 one is compiled in release mode; the latter produces a SIGSEGV after printing 0 and 1 (which are specially handled inside Int#to_s). The offending instruction is movq %r13, %rax while computing num.tdiv(base) inside Int#internal_to_s, whereas the actual crash occurs when trying to index Int::DIGITS_DOWNCASE with an out-of-bound value:

.LBB0_102:                              # %body.i24.i.i.i.i
                                        #   Parent Loop BB0_92 Depth=1
                                        # =>  This Inner Loop Header: Depth=2
	movq	%r13, %rax
	imulq	%rdi, %rax
	shrq	$35, %rax
	leal	(%rax,%rax), %edx
	leal	(%rdx,%rdx,4), %edx
	movl	%r13d, %esi
	subl	%edx, %esi
	movzbl	".L'0123456789abcdefghi...'.1"+12(%rsi), %edx

This line would be movl %r13, %eax in LLVM 10 instead, and indeed replacing this single instruction would make the compiled program run properly and not crash. This appears to come from the following IR fragment:

body.i24.i.i.i.i:                                 ; preds = %body.i24.i.i.i.i.preheader, %body.i24.i.i.i.i
  %num.042.i.i.i.i.i = phi i32 [ %463, %body.i24.i.i.i.i ], [ %obj1.sroa.2.8.extract.trunc.i.i.i.i, %body.i24.i.i.i.i.preheader ]
  %ptr20.041.i.i.i.i.i = phi i8* [ %462, %body.i24.i.i.i.i ], [ %355, %body.i24.i.i.i.i.preheader ]
  %462 = getelementptr inbounds i8, i8* %ptr20.041.i.i.i.i.i, i64 -1
  %scevgep = getelementptr i8, i8* %ptr20.041.i.i.i.i.i, i64 -1
  %num.042.i.i.i.i.i.frozen = freeze i32 %num.042.i.i.i.i.i
  %463 = udiv i32 %num.042.i.i.i.i.i.frozen, 10
  %464 = mul i32 %463, 10
  %.decomposed315 = sub i32 %num.042.i.i.i.i.i.frozen, %464
  %465 = zext i32 %.decomposed315 to i64
  %466 = getelementptr inbounds { i32, i32, i32, [37 x i8] }, { i32, i32, i32, [37 x i8] }* @"'0123456789abcdefghi...'.1", i64 0, i32 3, i64 %465
  %467 = load i8, i8* %466, align 1

Only LLVM 11 generates that freeze instruction; removing that also makes the offending movq go away. My guess is using freeze here allows LLVM to assume that the part of an Int64 value that goes unused in an UInt32 inside a union is defined, which then gives %num.042.i.i.i.i.i.frozen an alignment of 8 bytes and permits the use of movq.

The internal layout of the integer union is { i32, [1 x i64] }; maybe if we align the struct ourselves instead of using i64 then LLVM won't perform this bad optimization. So I tried the following:

module Crystal
  class LLVMTyper
    private def create_llvm_type(type : MixedUnionType, wants_size)
      @llvm_context.struct(llvm_name) do |a_struct|
        # ...
        padding = pointer_size - size_of(@llvm_context.int32)
        llvm_value_type = @llvm_context.int8.array(max_size * pointer_size)

        if padding > 0
          [@llvm_context.int32, @llvm_context.int8.array(padding), llvm_value_type]
        else
          [@llvm_context.int32, llvm_value_type]
        end
    end

    def union_value_type(type : MixedUnionType)
      llvm_type(type).struct_element_types[-1]
    end
  end
end

This works, and the union becomes { i32, [4 x i8], [8 x i8] }, but doing so produces some unsightly results as post-optimization code must address that UInt32 value using the individual bytes, like:

  %obj1.sroa.2.sroa.6.0.insert.ext.i.i.i.i = zext i8 %__temp_489.elt3.elt11.i.i to i32
  %obj1.sroa.2.sroa.6.0.insert.shift.i.i.i.i = shl nuw i32 %obj1.sroa.2.sroa.6.0.insert.ext.i.i.i.i, 24
  %obj1.sroa.2.sroa.5.0.insert.ext.i.i.i.i = zext i8 %__temp_489.elt3.elt9.i.i to i32
  %obj1.sroa.2.sroa.5.0.insert.shift.i.i.i.i = shl nuw nsw i32 %obj1.sroa.2.sroa.5.0.insert.ext.i.i.i.i, 16
  %obj1.sroa.2.sroa.4.0.insert.ext.i.i.i.i = zext i8 %__temp_489.elt3.elt7.i.i to i32
  %obj1.sroa.2.sroa.4.0.insert.shift.i.i.i.i = shl nuw nsw i32 %obj1.sroa.2.sroa.4.0.insert.ext.i.i.i.i, 8
  %obj1.sroa.2.sroa.0.0.insert.ext.i.i.i.i = zext i8 %__temp_489.elt3.elt.i.i to i32
  %obj1.sroa.2.sroa.5.0.insert.insert.i.i.i.i = or i32 %obj1.sroa.2.sroa.5.0.insert.shift.i.i.i.i, %obj1.sroa.2.sroa.0.0.insert.ext.i.i.i.i
  %obj1.sroa.2.sroa.4.0.insert.insert.i.i.i.i = or i32 %obj1.sroa.2.sroa.5.0.insert.insert.i.i.i.i, %obj1.sroa.2.sroa.6.0.insert.shift.i.i.i.i
  %obj1.sroa.2.sroa.0.0.insert.insert.i.i.i.i = or i32 %obj1.sroa.2.sroa.4.0.insert.insert.i.i.i.i, %obj1.sroa.2.sroa.4.0.insert.shift.i.i.i.i

This also doesn't explain why the compiler itself has to be built in release mode. However all this testing makes me believe the same issue can also happen with any union of Crystal structs, not just integers.

@bcardiff
Copy link
Member

bcardiff commented Mar 9, 2021

Since the official releases does not use llvm@11, and llvm@10 seems to have no issue. I see no need to delay 1.0 on this issue.

We can even fallback to llvm@9 in homebrew.

It is definitely needed to declare that Crystal supports llvm@11. Maybe 1.0.1. Last @HertzDevil comment means also that #10453 as a workaround might not even catch some occurrences of the underlying issue.

@bcardiff bcardiff removed this from the 1.0.0 milestone Mar 9, 2021
@straight-shoota
Copy link
Member

Should we remove 11.0 from supported LLVM versions then? It is partly supported, so maybe it should stay.

@bcardiff
Copy link
Member

bcardiff commented Mar 9, 2021

It can stay otherwise is a chicken-vs-egg problem to me. But on the following release we can highlight this as a known issue for llvm 11.

@bcardiff bcardiff added this to the post-1.0 milestone Mar 22, 2021
@anatol
Copy link
Contributor

anatol commented Mar 28, 2021

I wanted to move Arch Linux's package to LLVM 11 but @jhass pointed me to this thread.

What is the current recommendation? Stay with LLVM 10? Is there any workaround for LLVM11 configuration?

@straight-shoota
Copy link
Member

straight-shoota commented Mar 28, 2021

Just stay with LLVM 10 until LLVM 11 support is complete.

@beta-ziliani
Copy link
Member

It seems there's a bug in LLVM regarding the DivRemPairs optimization. This was caught by @ggiraldez and I posted the question in the llvm forum. There's a fix that might land in llvm 12.1, we'll see. With that fix, I was able to compile and run one of the examples successfully, but it also took a crazy amount of time to compile. I will investigate what's the issue there, and if everything else is working properly.

BTW, to compile with LLVM 11.1 in a Mac the problem with libxml persists. The solution is to make llvm with the following command:

cmake -G Ninja llvm "-DLLVM_ENABLE_LIBXML2=OFF"

@bcardiff
Copy link
Member

@beta-ziliani Maybe you did not compile llvm locally in release mode? This depends on the CMAKE_BUILD_TYPE env variable.

(I was checking if there were invalid alignment issues, but even lowering all alignments to 1 leads to the same invalid results)

If that patch is all that is needed the options to move forward are:

  1. Patch llvm@11 homebrew formula to include it and updated crystal brew formula to use that as dependency. No need to wait for a crystal release to do this.
  2. Keep track that llvm@11 is not supported unless the referenced patch is included.
  3. See if there is a way to emit llvm-ir in a different way to avoid falling in the llvm bug. This could be done only if the compiler was linked against llvm >= 11.

Of course other distribution packages will need to be addressed, but homebrew is the one that is pushing this more. The others can wait a little longer.

I will build llvm 11 locally and see how it goes... :-)

@bcardiff
Copy link
Member

Hombrew PR sent Homebrew/homebrew-core#79289 🤞

@bcardiff
Copy link
Member

homebrew pr has been merged and bottles updated

% brew install crystal
% crystal --version
Crystal 1.0.0 (2021-03-22)

LLVM: 11.1.0
Default target: x86_64-apple-macosx

🎉

I guess we can close this issue now.

llvm 11 support is conditioned to having this patch applied.

for other maintainers (@anatol) crystal 1.0 can be republished as is as long as llvm 11 is patched.

Whether that patch will be included or not in 12.0.1, it can be followed in https://bugs.llvm.org/show_bug.cgi?id=50695

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests