Compiler takes too much time and memory to build project #12879

Open
mohd-akram opened this issue Dec 30, 2022 · 10 comments

@mohd-akram

Bug Report

While trying to build this project, I noticed that Crystal takes a long time and a lot of memory to finish compiling. Below are the stats from running time -l crystal build src/ktistec/server.cr (use -v instead of -l on Linux):

      176.38 real        54.45 user        36.00 sys
          1992302592  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
             4295388  page reclaims
               22119  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                1200  signals received
               11157  voluntary context switches
              453792  involuntary context switches
        146270321513  instructions retired
        180825112116  cycles elapsed
          2739666944  peak memory footprint

That's 2GB of memory and close to 3 minutes to compile. I noticed this issue on a server with limited RAM as the build was failing there. Running cloc --include-lang crystal src lib returns:

-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Crystal                        576           6159           6725          32559

2GB for 32k lines of code seems excessive. Is this expected?
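For scale, the figures above can be put through a quick back-of-the-envelope calculation. This is only a sketch using the numbers reported by time -l and cloc in this report; the bytes-per-line ratio is a rough illustration, not something the compiler itself reports, and the variable names are made up for the example.

```shell
#!/bin/sh
# Figures copied from the `time -l` and cloc output above.
max_rss_bytes=1992302592   # maximum resident set size
peak_bytes=2739666944      # peak memory footprint
code_lines=32559           # Crystal lines of code counted in src and lib

# Convert bytes to MiB (integer division) and relate memory to code size.
rss_mib=$((max_rss_bytes / 1024 / 1024))
peak_mib=$((peak_bytes / 1024 / 1024))
bytes_per_line=$((max_rss_bytes / code_lines))

echo "max RSS:        ${rss_mib} MiB"
echo "peak footprint: ${peak_mib} MiB"
echo "RSS per line:   ${bytes_per_line} bytes"
```

That works out to roughly 60 KB of resident memory per line of Crystal code, counting the shards in lib.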

@asterite
Member

Hi!

In general the answer is "Yes, we know it, the compiler is slow and there's nothing you can do about it".

I compiled that project on my machine:

$ time crystal build -s src/ktistec/server.cr
Parse:                             00:00:00.000062667 (   0.77MB)
Semantic (top level):              00:00:19.595361792 ( 397.56MB)
Semantic (new):                    00:00:00.002887875 ( 413.56MB)
Semantic (type declarations):      00:00:00.059406167 ( 413.56MB)
Semantic (abstract def check):     00:00:00.008872958 ( 429.56MB)
Semantic (restrictions augmenter): 00:00:00.011726958 ( 429.56MB)
Semantic (ivars initializers):     00:00:00.011378917 ( 429.56MB)
Semantic (cvars initializers):     00:00:00.027046083 ( 461.56MB)
Semantic (main):                   00:00:08.561638208 (1797.31MB)
Semantic (cleanup):                00:00:00.000362167 (1797.31MB)
Semantic (recursive struct check): 00:00:00.001437250 (1797.31MB)
Codegen (crystal):                 00:00:02.997802166 (1925.31MB)
Codegen (bc+obj):                  00:00:05.747257750 (1925.31MB)
Codegen (linking):                 00:00:00.541649917 (1925.31MB)
dsymutil:                          00:00:00.300245417 (1925.31MB)

Macro runs:
 - /Users/aryborenszweig/Sandbox/ktistec/lib/slang/src/slang/process.cr: 00:00:06.573074375
 - /opt/homebrew/Cellar/crystal/1.6.2/share/crystal/src/ecr/process.cr: 00:00:05.429221958

Codegen (bc+obj):
 - no previous .o files were reused
crystal build -s src/ktistec/server.cr  41.25s user 8.12s system 129% cpu 38.063 total

So about 40 seconds? But it's the first compilation. Subsequent compilations should take less time:

$ time crystal build -s src/ktistec/server.cr
Parse:                             00:00:00.000050375 (   0.77MB)
Semantic (top level):              00:00:04.025315084 ( 269.33MB)
Semantic (new):                    00:00:00.003109416 ( 269.33MB)
Semantic (type declarations):      00:00:00.052643666 ( 269.33MB)
Semantic (abstract def check):     00:00:00.008744125 ( 285.33MB)
Semantic (restrictions augmenter): 00:00:00.011646625 ( 285.33MB)
Semantic (ivars initializers):     00:00:00.009647083 ( 285.33MB)
Semantic (cvars initializers):     00:00:00.026696750 ( 317.33MB)
Semantic (main):                   00:00:08.528931208 (1605.23MB)
Semantic (cleanup):                00:00:00.000367458 (1605.23MB)
Semantic (recursive struct check): 00:00:00.001348750 (1605.23MB)
Codegen (crystal):                 00:00:02.932395917 (1749.23MB)
Codegen (bc+obj):                  00:00:03.202873083 (1749.23MB)
Codegen (linking):                 00:00:00.531656542 (1749.23MB)
dsymutil:                          00:00:00.286188458 (1749.23MB)

Macro runs:
 - /Users/aryborenszweig/Sandbox/ktistec/lib/slang/src/slang/process.cr: reused previous compilation (00:00:00.003257750)
 - /opt/homebrew/Cellar/crystal/1.6.2/share/crystal/src/ecr/process.cr: reused previous compilation (00:00:00.003397875)

Codegen (bc+obj):
 - 1554/1579 .o files were reused

These modules were not reused:
- ...
crystal build -s src/ktistec/server.cr  18.95s user 4.95s system 121% cpu 19.686 total

So about 20 seconds. It's still a lot: you'll always have to wait at least 20 seconds to compile the entire program. With the way Crystal is designed, there's no way around it.
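To see at a glance which stages dominate in stats like the ones above, the -s output can be filtered with a small script. This is a minimal sketch, assuming the line layout shown in this thread (stage name, HH:MM:SS timestamp, memory in parentheses); slow_stages is a hypothetical helper name, not part of Crystal, and the sample lines are copied from the second run.

```shell
#!/bin/sh
# Print stages that took at least MIN seconds (default 1), slowest first.
# Lines without a trailing "(...MB)" figure, such as macro runs, are ignored.
slow_stages() {
  awk -v min="${1:-1}" '
    /^[A-Za-z].*[0-9][0-9]:[0-9][0-9]:[0-9][0-9]\.[0-9]+ +\(.*MB\)/ {
      # Locate the timestamp; everything before it is the stage name.
      match($0, /[0-9][0-9]:[0-9][0-9]:[0-9][0-9]\.[0-9]+/)
      name = substr($0, 1, RSTART - 1)
      sub(/: *$/, "", name)
      # Convert HH:MM:SS.fraction to seconds.
      split(substr($0, RSTART, RLENGTH), t, ":")
      secs = t[1] * 3600 + t[2] * 60 + t[3]
      if (secs >= min) printf "%.2f\t%s\n", secs, name
    }
  ' | LC_ALL=C sort -rn
}

# Sample lines taken from the second build above.
sample='Parse:                             00:00:00.000050375 (   0.77MB)
Semantic (top level):              00:00:04.025315084 ( 269.33MB)
Semantic (main):                   00:00:08.528931208 (1605.23MB)
Codegen (crystal):                 00:00:02.932395917 (1749.23MB)
Codegen (bc+obj):                  00:00:03.202873083 (1749.23MB)
Codegen (linking):                 00:00:00.531656542 (1749.23MB)
 - /opt/homebrew/Cellar/crystal/1.6.2/share/crystal/src/ecr/process.cr: reused previous compilation (00:00:00.003397875)'

result=$(printf '%s\n' "$sample" | slow_stages)
printf '%s\n' "$result"
```

On a real build the same function could be fed directly, for example crystal build -s src/ktistec/server.cr 2>&1 | slow_stages. For the sample it lists Semantic (main) first, which is the stage that stays at about 8.5 seconds in both runs.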

My machine is a Mac M1. So one solution could be: get a better machine.

The other solution would be to introduce incremental or modular compilation, but that's impossible without greatly changing the language, and it also involves a lot of thinking and effort. It's unlikely to happen soon, or ever.

@asterite
Member

Also about the memory: the compiler will hold the entire program (your code but also libs) in memory, for every compilation. That's the way Crystal works. So it's bound to use lots and lots of memory. And, like before, there's nothing you can do about it.

@mohd-akram
Author

I was somewhat familiar with the time issue due to Crystal's type inference and the like, but perhaps the memory issue is more approachable? I tried doing a memory profile and, unsurprisingly I suppose, most allocations seem related to ASTNode. Perhaps that structure can be optimized. Even if the whole program is held in memory, I feel the memory usage should be lower than this.

@asterite
Member

I tried to do that in the past but it led to subtle bugs.

In any case, such optimizations will be good for a while, but then your program keeps growing and it will inevitably use more and more memory.

Without modular compilation, all such optimizations are, in the grand scheme of things, useless.

@mohd-akram
Author

Have any experiments been done with using a custom memory allocator such as mimalloc? From another compiler project, switching to it had a significant performance effect.

@asterite
Member

I think someone tried using jemalloc, but I don't know much about it. That said, the memory allocator is Boehm GC, and I don't know if that can be integrated with another allocator.

@bararchy
Contributor

Actually, I think @jwoertink did tests with different allocators for building Lucky, and I remember something about large speedups.

@jwoertink
Contributor

That was actually @wyhaines that I copied it from 🥳 but jemalloc gives you a small boost, and Hoard gives you a huge boost. Though, I believe there are some other tradeoffs that I can't remember at the moment. I think you just point your LD_PRELOAD to the allocator .so file, and then magic 🤷‍♂️
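As a rough illustration of the LD_PRELOAD approach mentioned above, here is a minimal sketch for Linux. The helper name with_allocator and the library path in the usage comment are assumptions for the example; the real path depends on how the allocator was installed. Whether preloading actually changes the compiler's allocation behaviour, given that it uses Boehm GC, is not established in this thread, so treat this as something to measure rather than a known fix.

```shell
#!/bin/sh
# Run a command with an alternative allocator preloaded by the dynamic loader.
# Usage: with_allocator /path/to/allocator.so command [args...]
with_allocator() {
  lib=$1
  shift
  # Fail early instead of letting the loader warn and silently carry on.
  if [ ! -f "$lib" ]; then
    echo "allocator library not found: $lib" >&2
    return 1
  fi
  # The assignment applies only to the environment of this one command.
  LD_PRELOAD="$lib" "$@"
}

# Hypothetical usage; adjust the .so path for your system:
#   with_allocator /usr/lib/x86_64-linux-gnu/libjemalloc.so.2 \
#     crystal build src/ktistec/server.cr
```

Comparing time -v output with and without the wrapper would show whether a given allocator helps for a particular project.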

@straight-shoota
Member

Citing @BlobCodes from #13060 (comment)

Other interesting metric: The GC was responsible for over 60% of the compile time. Maybe the compiler should be improved using data-oriented design.

Yes, that's definitely a route for performance improvements. There have also been some efforts toward switching the GC implementation, which would affect the runtime performance of all Crystal programs. But optimizing the compiler architecture would be a good thing.

@toddsundsted
Contributor

fwiw, i just made changes to the project above to address build times.

a huge contributing factor was the inlining of the functions taking blocks. using procs instead of blocks cut the executable size by about a third. i don't think this is a general criticism of the implementation of blocks—i just happened to be passing blocks to large functions in generated code. there's definitely more to criticize in my code.
