
Improve JSON and YAML exception locations #4855

Merged
merged 2 commits into from Aug 19, 2017

@asterite
Contributor

asterite commented Aug 19, 2017

This PR improves the error location reported when parsing invalid or unexpected JSON/YAML sources. Many places in the current code were using {0, 0} as a location, which probably made debugging invalid sources a bit hard.

For example, in this snippet:

require "yaml"

class Foo
  YAML.mapping({
    foo: Int32,
  })
end

source = <<-YAML
  foo: hello
  YAML

Foo.from_yaml(source)

Before we got:

Invalid Int32: hello (YAML::ParseException)

Now we get:

Invalid Int32: hello at line 1, column 6 (YAML::ParseException)

Similarly, when an attribute is missing, or unknown (in strict mode), the reported error location will now be correct.
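As a rough illustration of the strict-mode case (a hypothetical snippet: the Bar type and input are made up, and it assumes YAML.mapping accepts a strict flag the way JSON.mapping does):

```crystal
require "yaml"

class Bar
  # strict: true makes unknown YAML keys raise a YAML::ParseException.
  YAML.mapping({
    foo: Int32,
  }, strict: true)
end

# "bar" is not a declared attribute, so this raises; with this PR the
# exception should point at the offending line and column instead of {0, 0}.
Bar.from_yaml("foo: 1\nbar: 2")
```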

RX14 approved these changes Aug 19, 2017

Contributor

asterite commented Aug 19, 2017

It seems the compiler can't compile the specs on 64-bit Linux due to lack of memory (or the compiler eating it all up).

One of the reasons memory usage grew recently is that the codegen now uses one LLVM::Context per LLVM::Module, with roughly one module per Crystal type. I could merge this and then send a PR to revert that change and use a single LLVM::Context like before, because we still don't need it (we'll need it once we have parallelism).

Or maybe we can run separately:

make crystal
make std_spec
make compiler_spec

That should consume a bit less memory...

Thoughts?

Contributor

kirbyfan64 commented Aug 19, 2017

I could merge this and then send a PR to revert that change and use a single LLVM::Context like before, because we still don't need it (we'll need it once we have parallelism).

Isn't that just delaying the inevitable, though? At some point it'll need to be resolved.

Also it's really neat to see a PR from you again!

Member

mverzilli commented Aug 19, 2017

Glad you took a look at that :). I was looking at it because I want to unblock #4707. Both solutions are temporary; I'd go with splitting the runnables for now, but no strong opinion. What I know is that we have to apply one of them now, because a long-term solution will not be available anytime soon.

Contributor

asterite commented Aug 19, 2017

@kirbyfan64 Yes, this is delaying the inevitable. This is why almost every day I think "It's fun to program in Crystal, but is it scalable?". I don't have an answer for that, and from time to time I lose hope and quit. Then I think "maybe someone some day will come up with a solution" and continue helping Crystal grow in areas where solutions exist. But I don't know; I personally can't see a way to make compilation modular (without dramatically changing the language) to allow compiling more and more code, and I've expressed this feeling many times now. That's why I think focusing on solving this first is the most important thing to do right now. Not Windows, not parallelism, not adding more features or fixing small bugs, but finding a way to make the language scale, and doing it, not just believing it can or will be done.

People at Manas told me there is a solution (they believe there is a solution) so in the meantime I'll continue programming stuff that I find interesting/fun without worrying about whether all of it makes sense. Much like playing a video game doesn't have a purpose other than having fun.

Contributor

akzhan commented Aug 19, 2017

@asterite what about designing a new language? Something more scalable, with union types and modules.

And thread-aware from the start.

Contributor

asterite commented Aug 19, 2017

@akzhan I have no idea what that language could look like without being one of the existing languages (Go, Rust, Pony, Nim, D). Crystal has unique features, but those features don't exist in other statically typed languages for a very good reason (or that's at least what I think now, though I have no proof for that): they don't scale to big programs.

Contributor

asterite commented Aug 19, 2017

On the other hand, the compiler is where this issue happens most. I haven't seen any other program present this problem. The compiler has a big AST hierarchy and many, many types and lines of code. I don't think the typical Crystal program has that many lines or such a complex type hierarchy, so the problem won't happen there. By running the specs in two halves, the problem will be mostly fixed.

Member

RX14 commented Aug 19, 2017

I don't think anyone has actually tried to take an optimiser to the compiler and perform non-architectural performance tweaks. Sure, having a compiler architecture which is inherently fast is great but I think there may be some performance low hanging fruit in the compiler, especially in memory usage, which will help scale the compiler some more.

Regarding scaling in general, the worst case possible solution is that we add a concept of "modules" which break the program into pieces, each of which can be compiled separately. That's the worst case, and to be perfectly honest, it wouldn't be the end of the world. I believe there's a solution which doesn't involve doing that, but I feel comfortable knowing there's an ugly solution which will allow us to scale. Also, microservices are cool, we could always tell people who hit compilation time problems to split up their monolith!

I also think that we can't bury our heads in the sand and give up because we don't have a solution to scaling. The truth is, the more popular Crystal gets, the more smart people are likely to work on the compiler and tackle the scaling problem for themselves. Perhaps there's some neat hack, or compiler theory, we haven't thought of to enable incremental compilation which is easy to implement. Who knows.

I've been thinking about this problem and Crystal's community problem for a while and I think that growing a larger community and getting 5x, 10x the number of people involved, and most importantly the coolness factor, is likely to solve at least some of our problems.

Member

mverzilli commented Aug 19, 2017

In situations like this I like to think of the worst case scenario. Worst case, we reach a point where we are convinced that Crystal doesn't scale beyond the size of its own compiler. So we end up with a great language which you can't use to write programs over 40kLOC, and which forces you to split your projects into services beyond that. I don't know about the rest of the community, but for a worst case, I'd be quite damn happy with that :).

Contributor

asterite commented Aug 19, 2017

@RX14 @mverzilli Well said!

By the way, I changed bin/ci to do make crystal std_spec compiler_spec doc instead of make crystal spec doc and now the specs pass 🎉

If approved, I'll merge this PR with that change.

Member

RX14 commented Aug 19, 2017

@mverzilli I think it's pretty clear that you can scale beyond that inside a single executable, as long as you can define separable parts which you know will have a consistent binary ABI. And I don't see any reason why explicitly defining those modules wouldn't be possible. So I like to think the worst case is slightly better than you think!

RX14 approved these changes Aug 19, 2017 (edited)

I don't like having to change this to reduce memory usage but it's been holding up PRs and being intermittent for a while so we should just use this workaround. @asterite you mentioned another possible memory optimization, it would be nice to get that merged in addition to this if that's sane.

@asterite asterite merged commit e811cf9 into master Aug 19, 2017

2 checks passed

ci/circleci: Your tests passed on CircleCI!
continuous-integration/travis-ci/pr: The Travis CI build passed

@asterite asterite deleted the feature/improve-json-yaml-error-location branch Aug 19, 2017

Contributor

asterite commented Aug 19, 2017

@RX14 Hmm, I tried using a single LLVM::Context instead of many and the memory usage, I was probably mistaken.

Contributor

straight-shoota commented Aug 20, 2017

@asterite It looks like you missed a significant verb... ;) What did the memory usage do?

Contributor

asterite commented Aug 20, 2017

Ah, the memory usage remained the same. Right now on my machine it takes about 2GB to compile the compiler. Brutal.

Contributor

kirbyfan64 commented Aug 20, 2017

A little off-topic, but would a custom GC help? I know of GCs that have options for being more memory efficient. Assuming the compiler is creating a crapton of objects, switching from Boehm would probably give either decent speed gains or memory gains or both.

Contributor

asterite commented Aug 20, 2017

@kirbyfan64 The problem is that the compiler is creating a crapton of objects that can't be released with the current compiler's design. It's not about the GC.

Contributor

kirbyfan64 commented Aug 20, 2017

It can be, though!

A custom GC allows a custom allocator and more intimate language knowledge. You can implement language-specific optimizations and adjust the internal memory management structures accordingly. Even a savings of a word per object can seriously add up when thousands of objects are allocated.

In addition, generational GCs work better in languages like Crystal, where objects are frequently allocated.

I mean, at some point Crystal's going to outgrow Boehm anyway. It might at least be worth some slight investigation.

Member

RX14 commented Aug 20, 2017

A different GC isn't going to change which objects can or cannot be GCed.

Member

RX14 commented Aug 20, 2017

@asterite 2GB is nothing compared to all_spec, which takes 8GB of RAM to compile.
