Improve JSON and YAML exception locations #4855
Conversation
asterite added the kind:feature, pr:needs-review, topic:stdlib labels Aug 19, 2017
asterite self-assigned this Aug 19, 2017
asterite
Aug 19, 2017
Contributor
It seems the compiler can't compile the specs on 64-bit Linux because it runs out of memory (or the compiler eats it all up).
One of the reasons memory usage grew recently is that codegen now uses one LLVM::Context per LLVM::Module, with roughly one module per Crystal type. I could merge this and then send a PR to revert that change and use a single LLVM::Context like before, because we still don't need it (we'll need it once we have parallelism).
Or maybe we can run separately:
```
make crystal
make std_spec
make compiler_spec
```
That should consume a bit less memory...
Thoughts?
kirbyfan64
Aug 19, 2017
Contributor
> I could merge this and then send a PR to revert that change and use a single LLVM::Context like before, because we still don't need it (we'll need it once we have parallelism).
Isn't that just delaying the inevitable, though? At some point it'll need to be resolved.
Also it's really neat to see a PR from you again!
mverzilli
Aug 19, 2017
Member
Glad you took a look at that :). I was looking at it because I want to unblock #4707. Both solutions are temporary; I'd go with splitting the runnables for now, but no strong opinion. What I do know is that we have to apply one of them now, because a long-term solution will not be available anytime soon.
asterite
Aug 19, 2017
Contributor
@kirbyfan64 Yes, this is delaying the inevitable. This is why almost every day I think "It's fun to program in Crystal, but is it scalable?". I don't have an answer for that, and from time to time I lose hope and quit. Then I think "maybe someone some day will come up with a solution" and continue helping Crystal grow in areas where solutions exist. But I don't know; I personally can't see a way to make compilation modular (without dramatically changing the language) to allow compiling more and more code, and I've expressed this feeling many times now. That's why I think focusing on solving this first is the most important thing to do right now. Not Windows, not parallelism, not adding more features or fixing small bugs, but finding a way to make the language scale, and doing it, not just believing it can or will be done.
People at Manas told me there is a solution (they believe there is a solution) so in the meantime I'll continue programming stuff that I find interesting/fun without worrying about whether all of it makes sense. Much like playing a video game doesn't have a purpose other than having fun.
akzhan
Aug 19, 2017
Contributor
@asterite what about designing a new language? Something more scalable, with union types and modules.
And thread-aware from the start.
asterite
Aug 19, 2017
Contributor
@akzhan I have no idea what that language could look like without being one of the existing languages (Go, Rust, Pony, Nim, D). Crystal has unique features, but those features don't exist in other statically typed languages for a very good reason (or that's at least what I think now, though I have no proof for that): they don't scale to big programs.
asterite
Aug 19, 2017
Contributor
On the other hand, the compiler is where this issue happens most. I haven't seen any other program present this problem. The compiler has a big AST hierarchy and many, many types and lines of code. I don't think the typical Crystal program has that many lines or such a complex type hierarchy, so the problem won't happen there. By running the specs in halves, the problem will be mostly fixed.
RX14
Aug 19, 2017
Member
I don't think anyone has actually tried to take an optimiser to the compiler and perform non-architectural performance tweaks. Sure, having a compiler architecture which is inherently fast is great but I think there may be some performance low hanging fruit in the compiler, especially in memory usage, which will help scale the compiler some more.
Regarding scaling in general, the worst case possible solution is that we add a concept of "modules" which break the program into pieces, each of which can be compiled separately. That's the worst case, and to be perfectly honest, it wouldn't be the end of the world. I believe there's a solution which doesn't involve doing that, but I feel comfortable knowing there's an ugly solution which will allow us to scale. Also, microservices are cool, we could always tell people who hit compilation time problems to split up their monolith!
I also think that we can't bury our heads in the sand and give up because we don't have a solution to scaling. The truth is, the more popular Crystal gets, the more smart people are likely to work on the compiler and tackle the scaling problem for themselves. Perhaps there's some neat hack, or compiler theory, we haven't thought of to enable incremental compilation which is easy to implement. Who knows.
I've been thinking about this problem and Crystal's community problem for a while and I think that growing a larger community and getting 5x, 10x the number of people involved, and most importantly the coolness factor, is likely to solve at least some of our problems.
mverzilli
Aug 19, 2017
Member
In situations like this I like to think of the worst case scenario. Worst case, we reach a point where we are convinced that Crystal doesn't scale beyond the size of its own compiler. So we end up with a great language which you can't use to write programs over 40kLOC, and which forces you to split your projects into services beyond that. I don't know about the rest of the community, but for a worst case, I'd be quite damn happy with that :).
asterite
Aug 19, 2017
Contributor
@RX14 @mverzilli Well said!
By the way, I changed `bin/ci` to do `make crystal std_spec compiler_spec doc` instead of `make crystal spec doc`, and now the specs pass.
If approved, I'll merge this PR with that change.
RX14
Aug 19, 2017
Member
@mverzilli I think it's pretty clear that you can scale beyond that inside a single executable as long as you can define separable parts which you know will have a consistent binary ABI. And I don't see any reason why explicitly defining those modules wouldn't be possible. So I like to think the worst case is slightly better than you think!
RX14 approved these changes Aug 19, 2017
I don't like having to change this to reduce memory usage but it's been holding up PRs and being intermittent for a while so we should just use this workaround. @asterite you mentioned another possible memory optimization, it would be nice to get that merged in addition to this if that's sane.
asterite merged commit e811cf9 into master Aug 19, 2017
asterite deleted the feature/improve-json-yaml-error-location branch Aug 19, 2017
asterite
Aug 19, 2017
Contributor
@RX14 Hmm, I tried using a single LLVM::Context instead of many and the memory usage, I was probably mistaken.
straight-shoota
Aug 20, 2017
Contributor
@asterite It looks like you missed a significant verb... ;) What did the memory usage do?
asterite
Aug 20, 2017
Contributor
Ah, the memory usage remained the same. Right now on my machine it takes about 2GB to compile the compiler. Brutal.
kirbyfan64
Aug 20, 2017
Contributor
A little off-topic, but would a custom GC help? I know of GCs that have options for being more memory-efficient. Assuming the compiler is creating a crapton of objects, switching from Boehm would probably give decent speed gains, memory gains, or both.
asterite
Aug 20, 2017
Contributor
@kirbyfan64 The problem is that the compiler is creating a crapton of objects that can't be released with the current compiler's design. It's not about the GC.
kirbyfan64
Aug 20, 2017
Contributor
It can be, though!
A custom GC allows a custom allocator and more intimate language knowledge. You can implement language-specific optimizations and adjust the internal memory management structures accordingly. Even a savings of a word per object can seriously add up when thousands of objects are allocated.
In addition, generational GCs work better in languages like Crystal where objects are frequently allocated.
I mean, at some point Crystal's going to outgrow Boehm anyway. It might at least be worth some slight investigation.
A different GC isn't going to change which objects can or cannot be GCed.
RX14
Aug 20, 2017
Member
@asterite 2GB is nothing compared to `all_spec`, which takes 8GB of RAM to compile.
asterite commented Aug 19, 2017
This PR improves the error location reported when parsing invalid or unexpected JSON/YAML sources. Many places in the current code were using `{0, 0}` as a location, which probably made debugging invalid sources a bit hard. For example, in this snippet:
Before we got:
Now we get:
Similarly, when an attribute is missing, or is unknown (in strict mode), the reported error location will now be correct.
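The original snippet and its before/after outputs weren't preserved above. As a rough sketch of the kind of behavior this PR targets, the following hypothetical example feeds malformed JSON to the parser and inspects the exception's position (the input string is made up; `line_number` and `column_number` are the position accessors on `JSON::ParseException` in Crystal's stdlib, but treat the exact values and messages as illustrative, not verbatim):

```crystal
require "json"

# Hypothetical malformed input: the value for "name" is missing.
input = %({"name": })

begin
  JSON.parse(input)
rescue ex : JSON::ParseException
  # With this PR, the exception should report the real position of the
  # offending token instead of the placeholder line 0, column 0.
  puts "error at #{ex.line_number}:#{ex.column_number}"
end
```

The same idea applies to typed deserialization: a wrong-typed or missing attribute should now raise with the location of the offending token rather than `{0, 0}`.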