Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce startup time #135

laci37 opened this issue Jul 31, 2015 · 19 comments

Reduce startup time #135

laci37 opened this issue Jul 31, 2015 · 19 comments


Copy link

laci37 commented Jul 31, 2015

Starting a new REPL takes 5 to 10 seconds. This should be decreased, as this becomes annoying if you are using tmux and launch a new shell for each program you are working with.

Copy link

lihaoyi commented Aug 3, 2015

Mostly fixed by #138

@lihaoyi lihaoyi closed this as completed Aug 3, 2015
@lihaoyi lihaoyi reopened this Aug 3, 2015
Copy link

lihaoyi commented Aug 3, 2015

Here's the current breakdown of time spent, from a cold start (i.e. running the executable), totaling about ~3600ms:

Repl init printer: 84.54493
Interpreter init Preprocess: 30.836161
Interpreterinit eval: 60.138826
Interpreter init init 0: 0.124249
Interpreter init init compiler: 1604.931653
Interpreter init init pressy: 8.220424
Interpreter init init: 0.097012
processScript 0: 0.430406
processScript 0a: 1064.453571
processScript 0b: 4.182362
processScript 1: 77.239036
processScript 2: 2.145553
processScript loop 0: 1.507183
processScript loop 1: 0.225567
cachedCompileBlock 0: 0.644507
cachedCompileBlock 1: 20.326575
importBlock 0: 1.293807
importBlock 1: 1.857475
cachedCompileBlock 2: 0.288193
cachedCompileBlock 3: 179.775141
cachedCompileBlock 4: 0.851549
cachedCompileBlock: 3.291105
evalMain: 0.898124
evaluationResult: 1.203557
processScript loop 2: 0.500568
processScript loop 0: 1.375926
processScript loop 1: 0.082119
cachedCompileBlock 0: 0.057498
cachedCompileBlock 1: 5.546784
importBlock 0: 0.571201
importBlock 1: 87.550585
cachedCompileBlock 2: 0.432626
cachedCompileBlock 3: 11.731382
cachedCompileBlock 4: 0.677519
cachedCompileBlock: 0.664165
evalMain: 0.874076
evaluationResult: 0.209653
processScript loop 2: 0.08605
processScript 3: 0.262857
processScript 0: 0.093635
processScript 0a: 22.522637
processScript 0b: 1.340209
processScript 1: 11.727824
processScript 2: 0.079972
processScript loop 0: 0.047323
processScript loop 1: 0.033937
cachedCompileBlock 0: 0.030247
cachedCompileBlock 1: 1.499044
importBlock 0: 0.077171
importBlock 1: 0.078691
cachedCompileBlock 2: 0.076055
cachedCompileBlock 3: 8.645031
cachedCompileBlock 4: 0.087303
cachedCompileBlock: 0.079682
evalMain: 0.075286
evaluationResult: 0.136687
processScript loop 2: 0.050073
processScript loop 0: 0.150977
processScript loop 1: 0.056949
cachedCompileBlock 0: 0.046464
cachedCompileBlock 1: 2.239923
importBlock 0: 0.075145
importBlock 1: 5.431956
cachedCompileBlock 2: 0.10245
cachedCompileBlock 3: 8.266911
cachedCompileBlock 4: 0.079038
cachedCompileBlock: 0.604941
evalMain: 4.118145
evaluationResult: 0.240224
processScript loop 2: 0.05571
processScript 3: 0.191214
Interpreter init predef 0: 0.052892
Interpreter init init 0: 0.038423
Interpreter init init compiler: 265.805788
Interpreter init init pressy: 0.094988
Interpreter init predef 1: 0.200351
Repl init interpreter: 0.059778
FrontEnd.Ammonite.action start: 5.556636
FrontEnd.Ammonite.readLine start: 0.091351
FrontEnd.Ammonite.readLine 1: 25.840735

Here's the cost of a hot start: the cost of initializing everything a second time in the same process, totaling ~500ms:

Repl init printer: 0.688851
Interpreter init Preprocess: 0.28367
Interpreterinit eval: 3.713168
Interpreter init init 0: 0.048788
Interpreter init init compiler: 232.008386
Interpreter init init pressy: 0.073394
Interpreter init init: 0.023952
processScript 0: 0.0194
processScript 0a: 1.04189
processScript 0b: 1.644444
processScript 1: 3.342866
processScript 2: 0.255964
processScript loop 0: 0.141631
processScript loop 1: 0.183286
cachedCompileBlock 0: 0.0929
cachedCompileBlock 1: 0.568324
importBlock 0: 0.036594
importBlock 1: 0.05212
cachedCompileBlock 2: 0.029661
cachedCompileBlock 3: 6.491874
cachedCompileBlock 4: 0.09878
cachedCompileBlock: 1.146064
evalMain: 1.174745
evaluationResult: 0.278112
processScript loop 2: 0.043369
processScript loop 0: 0.324716
processScript loop 1: 0.033573
cachedCompileBlock 0: 0.019511
cachedCompileBlock 1: 1.108908
importBlock 0: 0.042798
importBlock 1: 1.276786
cachedCompileBlock 2: 0.0535
cachedCompileBlock 3: 7.41467
cachedCompileBlock 4: 0.345641
cachedCompileBlock: 1.010325
evalMain: 0.987365
evaluationResult: 0.082922
processScript loop 2: 0.154863
processScript 3: 0.278478
processScript 0: 0.078804
processScript 0a: 1.041454
processScript 0b: 0.765162
processScript 1: 3.003545
processScript 2: 0.103447
processScript loop 0: 0.048784
processScript loop 1: 0.028001
cachedCompileBlock 0: 0.022096
cachedCompileBlock 1: 0.831282
importBlock 0: 0.044502
importBlock 1: 0.038198
cachedCompileBlock 2: 0.031764
cachedCompileBlock 3: 3.477881
cachedCompileBlock 4: 0.032899
cachedCompileBlock: 0.077103
evalMain: 0.115308
evaluationResult: 0.037246
processScript loop 2: 0.015204
processScript loop 0: 0.036274
processScript loop 1: 0.013951
cachedCompileBlock 0: 0.011799
cachedCompileBlock 1: 1.081051
importBlock 0: 0.034951
importBlock 1: 1.251119
cachedCompileBlock 2: 0.04715
cachedCompileBlock 3: 5.4183
cachedCompileBlock 4: 0.047208
cachedCompileBlock: 1.184805
evalMain: 4.027924
evaluationResult: 0.078648
processScript loop 2: 0.054991
processScript 3: 0.205902
Interpreter init predef 0: 0.033241
Interpreter init init 0: 0.040679
Interpreter init init compiler: 220.017737
Interpreter init init pressy: 0.071013
Interpreter init predef 1: 0.024703
Repl init interpreter: 0.024638
FrontEnd.Ammonite.action start: 2.350002
FrontEnd.Ammonite.readLine start: 0.051353
FrontEnd.Ammonite.readLine 1: 0.496466

Not terrible now (used to be 10s!) but still could probably be sped up

@lihaoyi lihaoyi closed this as completed Aug 3, 2015
@lihaoyi lihaoyi reopened this Aug 3, 2015
Copy link

lihaoyi commented Aug 3, 2015

Looks like about 1600ms is spend initializing scalac the first time. We should be able to work around the usage of scalac in the case where everything is cached. That will speed it up considerably

Another 1000s is spend initializing Scalaparse for the first time. I'm not sure how much of that time is spent classloading and how much is spent in the static initializers, but it seems plausible we could speed it up

Copy link

lihaoyi commented Aug 3, 2015

@laci37 do you want to take a crack at this? I have an idea that should save us the 1600ms to initialize scalac:

  • Cache the result of compileClass(preprocess(...)) instead of just compileClass(...)
  • Initialize Interpreter#compiler only on demand
  • After showing the prompt, kick off compilation asynchronously for a simple string like 1 + 1 to try and get the compiler running before the user enters their first input

This would involve some refactoring to move the caching bits into the correct place, but I'm pretty sure it'll work. That'll bring our startup time into the 1-2s range which is not fast, but not bad =)

Copy link

lihaoyi commented Aug 3, 2015

We could also trivially cache the output of the Parsers.split call, which would allow us to defer that initialization to after we show the prompt, which can also be kicked of asynchronously

Copy link

I think one should allow for Ammonite-REPL to be a non-JVM program connecting to a daemon JVM running Ammonite sessions, though not all will want to use that. That can be done using NailGun ( The daemon JVM should try to keep one prespawned session. This would allow literally instant startup, since nailgun's client is written in C. I've used this in the past for Blaisorblade/git-filter-branch-msgs, so that I could use invoke a JVM program inside a bash loop.

Nailgun is a client, protocol, and server for running Java programs from the command line without incurring the JVM startup overhead.

Copy link

lihaoyi commented Aug 16, 2015

I think one should allow for Ammonite-REPL to be a non-JVM program connecting to a daemon JVM running Ammonite sessions

Yep! @laci37 actually wanted to do that and it was part of the original GSOC proposal. We've since been distracted with other slightly-more-high-pri things, but we could always revisit.

If anyone else wants to take a shot at daemonizing Ammonite's JVM at this feel free to take a crack at it and prove out what it'd take.

Copy link

geggo98 commented Nov 9, 2015

A persistent daemon comes with all kinds of problems when it runs for a long time. The most obvious is that when we import the same library in different versions, we can fill up the class cache. So I would advise against it. (At most, use a daemon only for coordination, like how "universal" environment variables are done in fish shell)

My suggestion: Use the approach (or even the code) from Drip: Spin up VMs in a pool and use these already initialized VMs from the pool. Fill up the pool in the background as needed.

In a simple test, I get "startup" times at about 0.126 seconds for Ammonite (with jdk1.8.0_25). Bash needs 0.063 seconds and Dash 0.032 seconds on the same machine. This looks good enough for me.

Copy link

Interesting point about the daemon. And Drip is already designed to be incorporated. Thanks for the pointer!

In a simple test, I get "startup" times at about 0.126 seconds for Ammonite (with jdk1.8.0_25). Bash needs 0.063 seconds and Dash 0.032 seconds on the same machine. This looks good enough for me.

I thought you meant from bash, failed to reproduce, and then I realized you might mean from Drip. I'll try the latter, seems easy enough.

Copy link

geggo98 commented Nov 9, 2015

@Blaisorblade I got the comparison times with time bash -c exit (and the same with dash instead of bash). I tested Ammonite mainly by replacing java with drip in the startup file. But I didn't get a full successful start: At one point Ammonite seems to play around with the file descriptors, which confuses Drip. But for a simple indication about what startup times to expect this should be good enough.

Copy link

geggo98 commented Nov 9, 2015

@Blaisorblade With the newest version of Ammonite I have no problems with file handles any more. But the effect of Drip is not as big as before.

$ time java -cp amm ammonite.repl.Repl -c exit
        4.66 real        12.09 user         0.51 sys
$ time drip -cp amm ammonite.repl.Repl -c exit
        5.11 real        13.11 user         0.55 sys
$ time drip -cp amm ammonite.repl.Repl -c exit
        4.81 real         0.05 user         0.07 sys

So I get startup time from 16 seconds down to under 5 seconds. Nice, but not as great as in the previous measurements. Probably with playing around with DRIP_INIT_CLASS (and modifying that class accordingly) I can get back to the tenth of a second from the previous test.

Copy link

Nice, but not as great as in the previous measurement,

I don't have time to contribute, but I'd like to point out how to interpret those data.
That doesn't mean that Drip is not going to help, just that there's some fiddling (or maybe refactoring) to do to get the benefits. From the description (and my experience with nailgun) I'd assume that by using Drip should still allow a small startup (like 0.1 s).

This snippet from Drip's docs sounds relevant:

By default, Drip only loads your main class at startup, but you can tell Drip to run additional code at startup.

This suggests that the new version of Ammonite moved some initialization code to later, so that the work is not done early enough (can't confirm without knowing the versions, but also doesn't necessarily matter).
Quite possibly, one would need to refactor initialization to be compatible with Drip, to separate what can be done before startup, from the code that can only be run when the user starts Ammonite (e.g. anything using the current directory, environment variables or startup parameters).

Copy link

Just for fun, I experimented with nailgun integration and got some positive results for running Scala scripts, see diff: olafurpg@381680e

To run it:

  • install nailgun client (for example on OSX, brew install nailgun)
  • run sbt amm/pack
  • start server ./amm/target/pack/bin/amm_ng
  • run ng ammonite.Main <args> instead of typical amm <args>

Scala scripts work like a charm. Rough numbers from compiling a new file on a warm sever:

  • single file: ~0.3s
  • import $file: ~0.5s
  • import $ivy: ~1.0s

With a file watcher like entr, I can run a script on every file save

ls | entr ng ammonite.Main

The repl doesn't work. Pressing up/down/ctrl-c is broken for some
reason. I think the input/output/error streams are injected correctly from nailgun. However, this is not an issue for me, since I rarely exit/restart the repl. Scala scripts is another story entirely.

I am sure @geggo98 raised some legitimate concerns about having a persistent daemon running. Regularly restarting the server might be a solution for that.

Copy link

The repl doesn't work. Pressing up/down/ctrl-c is broken for some
reason. I think the input/output/error streams are injected correctly from nailgun.

If anybody is interested in fast shell startup, the basic problem is that nailgun offers only a socket forwarding to a terminal, and TTY detection only sees a socket and not a terminal. In fact, nailgun's redirection is done at the Java level and might not even affect the native stdin/stdout/stderr file descriptors (

To see this forwarding problem, observe the perfect stream forwarder (cat) fail at this.

$ cat|stty -a
stty: stdin isn't a terminal

One hacky but relatively easy-ish fix would be that Ammonite's calls to stty/tput/whatnot should be called by the nailgun C server on the original terminal. (See e.g. This fix still requires hacking some C code, but the C nailgun code is small.

A cleaner fix is harder: nailgun (or a replacement) could offer a virtual terminal rather than a pipe to the process, and forward terminal commands. That would be cleaner but much harder (if it is feasible at all), and UNIX terminal APIs are a mess and non-portable. No clue on Windows. I suspect this would only work with some existing terminal emulator... You'd probably want some expect variant, but not in Tcl please. Luckily, there are a few in Java:

Finally, here's a testcase showing that expect does better forwarding than cat. I'm not sure if those "pure Java" implementations achieve the same.

$ expect -c "system stty"
speed 38400 baud;
lflags: echoe echok echoke echoctl pendin
iflags: -ixany iutf8 ignpar
oflags: onocr onlret
cflags: cs8 -parenb

Copy link

I am not convinced it's worth the effort to get the repl running via nailgun. In my experience, repl sessions usually stay open several minutes if not hours. Shaving 2-3 seconds off the startup time won't have a big impact on my productivity.

I hacked together a script ss (short for scala script) to automatically manage the nailgun server.

set -e

command -v ng &> /dev/null
command -v amm_ng &> /dev/null
command -v entr &> /dev/null

startNailgun() {
  echo "Starting nailgun..."
  amm_ng &
  sleep 1
  ng ng-alias amm ammonite.Main

usage() {
  cat << EOF
ss [file]      # run scala script
ss loop [file] # continuously run script on file save
ss restart     # restart nailgun server
ss help        # print this help message

if [ $1 = "help" ]; then
elif [ "$1" = "restart" ]; then
  ng ng-stop || true
  sleep 1
  ng ng-stats &> /dev/null || startNailgun
  if [ "$1" = "loop" ]; then
    shift 1
    ls "$@" | entr time ng amm "$@"
    ng amm "$@"

I don't know if this should be bundled with ammonite, but it's already pretty useful for myself.

Copy link

lihaoyi commented Aug 14, 2016

My current inclination is to bundle the file-watching ability as part of the amm executable (#364); that way you'd run amm --watch or something on a script, and it would keep the JVM open and re-instantiate the Repl object every time it notices a change and re-run it on the script.

That should give us the benefits of a warm JVM without the complexity of dealing with nailgun and all that, as long as we're happy with just the --watch functionality and don't care about other uses of nailgun e.g. run a script ad-hoc and having it start fast. It would be far simpler to implement too, and would be familiar to users of the many build tools which have --watch-style functionality built in (conversely, very few build tools keep long-running daemons around in the background)

Copy link

A --watch flag would eliminate the need for nailgun in my case. I care most about having a tight edit-run-debug cycle. Once a script is ready, the JVM startup penalty becomes less of an issue.

I agree that the long-running daemon setup is error-prone and it would be best to get away without it, if possible.

Copy link

I've needed Scala scripts with fast startup—once (because most of my scripting is in bash). I also think @olafurpg's code is in fact simple enough for inclusion for those interested, especially to use Ammonite to replace bash scripting. But I understand the maintenance burden might matter.

But in my case since the nailgun integration was easy enough to do it on my own, I'm currently fine with leaving this use case open.

Copy link

lihaoyi commented May 18, 2017

The current state (~1s startup overhead for hello world) is we seem bottlenecked on classloading time:

screenshot 2017-05-19 00 38 08

Given that, I think this is about as good as it'll get for now, short of something really clever happening around scala-{native, js}

@lihaoyi lihaoyi closed this as completed May 18, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
None yet

No branches or pull requests

5 participants