Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crystal i #10910

Closed
wants to merge 524 commits into from
Closed

crystal i #10910

wants to merge 524 commits into from

Conversation

@asterite
Copy link
Member

@asterite asterite commented Jul 8, 2021

The experimental crystal i (interactive) command is here!

What is crystal i

crystal i stands for "interactive crystal". It can be used in two modes right now:

  1. crystal i: opens an interactive crystal session (REPL) similar to irb from Ruby
  2. crystal i file.cr: runs a file in interpreted mode

In any of these two modes, you can use debugger in your code to debug it at that point. This is similar tu Ruby's pry. There you can use these commands (similar to pry too):

  • step: go to the next line/instruction, possibly going inside a method
  • next: go to the next line/instruction, doesn't enter into methods
  • finish: exit the current method
  • continue: resume execution
  • whereami: show where the debugger is

By the way, before this PR you could also use debugger and lldb or gdb would stop at that point, if compiled with -d (otherwise it seems the program just crashes, maybe we should just ignore debugger in that case: something to improve outside of this PR)

Overview and some technical details

When running in interpreted mode, semantic analysis is done as usual, but instead of then using LLVM to generate code, we compile code to bytecode (custom bytecode defined in this PR, totally unrelated to LLVM). Then there's an interpreter that understands this bytecode. All of this for a stack-based machine. If you know nothing about interpreters and you want to learn more, I recommend reading the excellent https://www.craftinginterpreters.com/a-bytecode-virtual-machine.html

The memory layout of things in interpreted mode matches the memory layout of compiled mode: a union of Int32 | String occupies 16 bytes in both modes, and there's an 8 bytes tag for the type ID. That said, the values for type IDs might not match between the two versions, and this makes the two worlds kind of incompatible (they can't interact with each other.)

C calls, and C callbacks, are implemented using libffi together with dlopen.

Status

I'm just opening this PR to give a chance for others to start looking and commenting at the code, and ask questions if things are not clear. That said, I'd still like to take some time to clean up and comment a lot of the code before merging this. A lot of this was rushed because I wanted to have a big chunk ready for the conference.

Some things that are missing to implement related to the language:

  • Raising and rescuing exceptions
  • Closures
  • Calling C functions that are outside of LibC (or outside the functions already included in the compiler, I think, because LibGC works)
  • Some other primitives (some atomic functions, some math functions) and I'm sure some others more

And then some things that I know don't work fine right now:

  • defining types and methods inside crystal i doesn't work super well, specially if you redefine methods or if you extend types (this is the real challenge, and we could maybe just postpone this one for how long as we want)
  • it seems that defining variables when in debugger mode (let's call it pry from now on) doesn't work. I think this one should be easy to fix but I didn't have time to look into it
  • free variables don't work well in pry don't work well (I'm not sure how to fix this, I think it requires a change also to the "real" Crystal)
  • if an error happens (exception, or maybe something doesn't compile), I think it gets stuck in that mode (this should be easy to fix too)
  • macro finished isn't executed yet
  • at_exit isn't executed yet (because main isn't executed at all)

Focus

For now, I'd like to focus on polishing the existing code and making sure it's 100% understandable and documented (some comments are already outdated, some method names are long and not nice, there might be some redundant code, and I'm sure there's already unclear code). After that we can merge this, put an "experimental" label next to the command, and then implement the missing things or fix existing bugs in separate PRs, which are going to be easier to review and merge like that.

I'm also open for suggestions on how to improve the code, be it for code clarity of for performance.

Also sorry beforehand for the commit (mis)organization and commit messages, this was all very experimental.

Should this be in a separate repository?

This PR has some changes to the standard library, mainly some methods and functions need to be emulated in interpreted mode, and there's a new flag?(:interpreted) thing you can use. These could be in a separate PR, and then we could maybe extract the tool to a separate repository. But I think that having interpreted mode in the main repository, like every other tool, is fine from now. I believe moving tools outside of this repository is a separate discussion. But for instance, eventually crystal play could start using interpreted mode under the hood, or maybe you will be able to do crystal spec -i ... to run in interpreted mode. So, again, having all of this in this repository is probably fine.

About FFI, dlopen and readline

Right now I put them in the top-level just for simplicity. We should probably move them under the Crystal namespace because we don't want to publicly support these.

There are now inside the Crystal namespace βœ…

More thoughts

  • It's clear I never ran the full spec suit locally in this branch 😊

Steps before merging this PR (will be updated as we work on it)

  • Install readline and libffi on CI
  • Document every nook and cranny of the interpreter
@asterite
Copy link
Member Author

@asterite asterite commented Aug 31, 2021

But as soon as the prelude is included, there's an error.

Can you try to pinpoint where that error happens? The easiest way would be to put a debugger line somewhere and do next until it crashes, then start again and try to do step where it crashed, etc.

@asterite
Copy link
Member Author

@asterite asterite commented Aug 31, 2021

@straight-shoota Also, you can try pulling and applying your patch again. I made a lot of fixes these past days.

@straight-shoota
Copy link
Member

@straight-shoota straight-shoota commented Aug 31, 2021

Okay, so the issue seems to be with calling fstat. I jerry-rigged something together because the symbol fstat could not be found. I don't know what's going on there, but aparently fstat is just a wrapper to an internal function and for some reason it doesn't work. Apparantly my workaround didn't work either. The symbol can be found but it errors when being called. 🀷

The first exception is raised when initializing STDOUT (because there's a call to fstat). But that doesn't break the program yet. I believe that exception causes initialization of STDERR and when that raises as well, the interpreter fails. Maybe because it recognizes that it's still in the process of rescuing an exception and suddenly, there's another exception from the same scope?

@straight-shoota
Copy link
Member

@straight-shoota straight-shoota commented Aug 31, 2021

My suspicion was correct, the following code triggers the EscapingException as described:

@[Primitive(:interpreter_raise_without_backtrace)]
def raise(exception : Exception) : NoReturn
end

def foo
  raise Exception.new
end

begin
  foo
rescue exc
  foo
end

@asterite
Copy link
Member Author

@asterite asterite commented Aug 31, 2021

I think the issue is that when an exception escapes then we print it using ex.inspect_with_backtrace(STDOUT), and because STDOUT didn't finish initializing that probably breaks in a bad way. We could change it to call ex.inspect_with_backtrace instead, then print that string using the STDOUT from the interpreter, not the interpreted program. At least to understand better what's going on. I'll push a commit for this soon.

@straight-shoota
Copy link
Member

@straight-shoota straight-shoota commented Aug 31, 2021

I don't think it's necessary tampering with exception printing. Knowing that that's the reason is enough. We should be able to investigate the actual error cause in isolation.

But on a more general perspective, we might want to make sure to make the kernel setup in general (not just for the interpreter) more resilient to issues with setting up the standard streams.

@asterite
Copy link
Member Author

@asterite asterite commented Aug 31, 2021

@straight-shoota Done! Can you try again?

That said, I think we should figure out what's the problem with fstat. Maybe try with a small program in isolation, with prelude=empty (there's no option to make that happen, but you can comment "prelude" and do require "empty" there)

@straight-shoota
Copy link
Member

@straight-shoota straight-shoota commented Aug 31, 2021

As I said in my previous comment, the last comment isn't necessary. It doesn't help me with debugging.

@asterite
Copy link
Member Author

@asterite asterite commented Aug 31, 2021

@straight-shoota Maybe we could pair on this one day to see why fstat is not working. Maybe the bindings to libffi aren't correct for linux.

@asterite
Copy link
Member Author

@asterite asterite commented Aug 31, 2021

@straight-shoota I think __fxstat64 fails because it takes 3 arguments, not 2: https://github.com/lattera/glibc/blob/master/sysdeps/unix/sysv/linux/fxstat64.c#L32

You could also try and see if fstat works in Ruby for you: https://carc.in/#/r/bumw

@straight-shoota
Copy link
Member

@straight-shoota straight-shoota commented Aug 31, 2021

Yeah, I've figured as much. But even with assumingly correct arguments, it still doesn't work.
It also reproduces with compiled Crystal, so this is unrelated to the interpreter.

The Ruby example fails for me as well. It appears that different versions of libc (I assume it's glibc because carc.in runs on Arch Linux) may have an fstat symbol or not.
However, I'm confused because when linking a compiled Crystal program there seems to be no issue with finding fstat.

@straight-shoota
Copy link
Member

@straight-shoota straight-shoota commented Aug 31, 2021

I've updated the gist with a patch that now works for me on x86_64-linux-gnu πŸš€ (https://gist.github.com/straight-shoota/6484f5b1fa5a5d0c46a932c3f9f8909f)

@asterite
Copy link
Member Author

@asterite asterite commented Sep 1, 2021

@straight-shoota Great! I just pushed that diff minus the change to out _ because I made that work.

Thinking about next steps, this is what comes to my mind:

  • Make CI pass
  • Allow distributing this (we probably need to ship libffi and libreadline)
  • Maybe add a flag to completely disable the interpreter in case people want to distribute it without it
  • Include a comment in the command's description that makes it very clear that this is still an experimental tool
  • Merge this PR

After that, there's still a lot more to do but then it will be easier because the PRs will be smaller and more focused. In particular:

  • Running std_specs still fail for the interpreter. On my side I get that out of the 247 std spec files we have, 31 are failing for various reasons: some things aren't implemented (Int128, casting between different tuple types inside a union, class variables for virtual types), some specs crash, probably because of bugs, and some specs work incorrectly, again probably because of bugs.
  • The interpreter has a lot of TODOs that need to be done. They probably aren't hit because the code to trigger them should be uncommon, but it should still be done
  • Used as a REPL, things like require "http/client" don't work out of the box because of how new types are added to the compiler. There's no precedent of adding new types to an existing program (they are all added once the entire program is known) so some work needs to be done there
  • When you do debugger some things don't work. For example free variables aren't found, instance vars inside initialize can't be inspected, and sometimes some vars aren't found when they should.
  • fork randomly crashes, and this affects being able to do things like Process.run

@asterite
Copy link
Member Author

@asterite asterite commented Sep 1, 2021

@straight-shoota Oh, maybe it's better if we make another branch so that any of us can push to it? Otherwise... how are you going to make changes to see if CI works? (if that's what you eventually intend to do)

@straight-shoota
Copy link
Member

@straight-shoota straight-shoota commented Sep 1, 2021

Yeah, it might be a good idea to develop this feature in a branch on this repo. We can even make pull requests against that. I think we have still some work for getting this in a shippable state.
Usually it's a maintenance burdon to keep a long-running feature branch up with master. But crystal i is mostly new additions and applies only very few minor changes to existing code. So it shouldn't be hard to keep it around for some time, at least until we have some more stability.
I wouldn't keep it like that forever though (until it's "ready"). Just until we can be sure that the majority of the language is supported and we can exactly define what is still missing.

@asterite asterite mentioned this pull request Sep 2, 2021
@asterite asterite closed this Sep 2, 2021
@asterite asterite deleted the repl branch Sep 2, 2021
@Fryguy
Copy link
Contributor

@Fryguy Fryguy commented Sep 2, 2021

FWIW, the default options on a PR allow maintainers to push into PR branches. If you, when creating the original PR, hadn't changed this checkbox, then anyone with commit access on crystal-lang:master should have been able to push into asterite:repl.

Once you've created a PR, you can change this in the bottom right of the PR:

@asterite
Copy link
Member Author

@asterite asterite commented Sep 2, 2021

@Fryguy Thanks!

I still think it's better to have a branch in this repo. Then for example you can crate a PR that targets that's branch, and letting others push to it is just simpler.

@asterite
Copy link
Member Author

@asterite asterite commented Sep 2, 2021

(also checking out the branch locally is simpler, no need to add remotes...)

@Sija
Copy link
Contributor

@Sija Sija commented Sep 2, 2021

There's one drawback - it duplicates the CI jobs.

@asterite
Copy link
Member Author

@asterite asterite commented Sep 2, 2021

@Sija How?

@Sija
Copy link
Contributor

@Sija Sija commented Sep 2, 2021

@asterite They're being run on both, push and pull_request events simultaneously.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet