
Performance #8

Closed
ocharles opened this issue Oct 31, 2018 · 48 comments

Comments

@ocharles
Contributor

The gcc branch gives us ./examples/gcc.dhall, which is used to build GCC. This has 3 dependencies: GMP, MPFR and MPC. However, just evaluating gcc.dhall takes minutes:

time ./bin/dhall +RTS -s <<< ./examples/gcc.dhall

285,234,734,472 bytes allocated in the heap
   2,149,148,384 bytes copied during GC
       8,560,952 bytes maximum residency (89 sample(s))
         341,888 bytes maximum slop
              26 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0     274811 colls,     0 par    4.488s   4.463s     0.0000s    0.0004s
  Gen  1        89 colls,     0 par    0.559s   0.559s     0.0063s    0.0123s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time   92.299s  ( 92.484s elapsed)
  GC      time    5.047s  (  5.022s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time   97.346s  ( 97.507s elapsed)

  %GC     time       5.2%  (5.2% elapsed)

  Alloc rate    3,090,343,422 bytes per MUT second

  Productivity  94.8% of total user, 94.8% of total elapsed

./bin/dhall +RTS -s <<< ./examples/gcc.dhall  96.82s user 0.54s system 99% cpu 1:37.51 total

Applying dhall freeze everywhere actually makes things worse:

find ./examples -name '*.dhall' -exec ./bin/dhall freeze --inplace '{}' ';'
time ./bin/dhall +RTS -s <<< ./examples/gcc.dhall

343,492,628,992 bytes allocated in the heap
   2,511,808,920 bytes copied during GC
       9,187,192 bytes maximum residency (94 sample(s))
          46,576 bytes maximum slop
              28 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0     330961 colls,     0 par    5.495s   5.461s     0.0000s    0.0008s
  Gen  1        94 colls,     0 par    0.520s   0.521s     0.0055s    0.0137s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time  113.356s  (113.629s elapsed)
  GC      time    6.015s  (  5.982s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time  119.372s  (119.611s elapsed)

  %GC     time       5.0%  (5.0% elapsed)

  Alloc rate    3,030,198,903 bytes per MUT second

  Productivity  95.0% of total user, 95.0% of total elapsed

./bin/dhall +RTS -s <<< ./examples/gcc.dhall  118.75s user 0.63s system 99% cpu 1:59.62 total
@ocharles
Contributor Author

CC @Gabriel439

@ocharles
Contributor Author

The (huge) normalized expression can be found at https://github.com/dhallix/nix-derivation/blob/gcc/nf.dhall

@ocharles
Contributor Author

Here is the derivation dependency graph, to give an idea of what nf.dhall represents.

[image: derivation dependency graph]

@Gabriella439
Contributor

@ocharles: I can think of two possible ways to solve this:

Approach 1: Use Lamping's algorithm for normalization in Dhall, so that it can internally represent the abstract syntax tree as a graph and never needs to materialize the entire result except to display it to the user. This is a significant amount of work.

Approach 2: Use the support for custom type-checking and custom normalization so that mkDerivation becomes a built-in that, when evaluated, actually builds the derivation and replaces it with a Text value pointing to the built path. In other words, exactly how Nix works.
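
To make Approach 2 concrete, here is a minimal self-contained sketch of the idea on a toy calculus (the AST, `normalize`, and `derivationHook` here are hypothetical stand-ins, not the actual Dhall API): a hook intercepts saturated calls to a builtin during normalization, so each call collapses to a small value as you go instead of materializing a huge tree.

```haskell
-- Toy sketch: evaluate a builtin *during* normalization so the
-- intermediate tree never grows large.  Everything here is a made-up
-- stand-in for the real Dhall machinery.
data Expr
  = Lit Int
  | Add Expr Expr
  | App String Expr        -- application of a named builtin
  deriving Show

-- A Normalizer-style hook: given a builtin name and its normalized
-- argument, optionally replace the whole application with a result.
type Hook = String -> Expr -> Maybe Expr

normalize :: Hook -> Expr -> Expr
normalize _    e@(Lit _) = e
normalize hook (Add a b) =
  case (normalize hook a, normalize hook b) of
    (Lit x, Lit y) -> Lit (x + y)
    (a', b')       -> Add a' b'
normalize hook (App f x) =
  let x' = normalize hook x
  in case hook f x' of
       Just r  -> r
       Nothing -> App f x'

-- "derivation" collapses its argument to a small literal (think: a
-- store path) instead of leaving a large subtree behind.
derivationHook :: Hook
derivationHook "derivation" (Lit n) = Just (Lit (n * 100))
derivationHook _ _ = Nothing

main :: IO ()
main = print (normalize derivationHook (Add (App "derivation" (Lit 1)) (Lit 2)))
-- prints: Lit 102
```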

@ocharles
Contributor Author

ocharles commented Nov 1, 2018

Approach 2 has the same problem that always comes up with this, namely siloing yourself from any other Dhall tool in the field. That does feel like a shame for something like dhallix. I'm also not convinced it really solves the problem; perhaps it just delays the point at which it actually becomes problematic. I was wondering how much normalisation would benefit from parallel reduction and memoisation - my hunch is that a lot of similar terms are repeatedly normalised. Parallel reduction looks promising because dhall-nix only uses a single core of my 8-core machine to normalise this term, yet I think a lot of the normalisation could happen in parallel. It doesn't change the asymptotics, but may still be helpful.

@Gabriella439
Contributor

@ocharles: Right, the graph representation is the way to avoid repeated work and Lamping's algorithm is the most efficient graph representation that I know of. I also believe it is an algorithm that's amenable to parallelization, too (cc: @MaiaVictor to confirm)

Given that this is quite a bit of work, I'll probably have to set up a Kickstarter or something similar to fund somebody to do it, because I know that I won't have the bandwidth to do this myself.

@ocharles
Contributor Author

ocharles commented Nov 1, 2018

While that might be optimal it is, as you say, a lot of work. My approach would be a pass over the input that replaces identical terms with shared terms, then using that sharing as memoisation during normalisation. It at least avoids duplicating work, even if it's not optimal. I'll have a play.
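
The proposed sharing pass could look something like this on a toy AST (not Dhall's - a minimal sketch of memoised normalization keyed on syntactically identical subterms, so a duplicated subterm is reduced only once):

```haskell
-- Memoised normalization sketch on a toy AST: the memo table maps
-- each subterm to its normal form, so identical (shared) subterms
-- are reduced only once.
import qualified Data.Map.Strict as Map
import Control.Monad.State (State, gets, modify, evalState)

data Expr = Lit Int | Add Expr Expr
  deriving (Show, Eq, Ord)

normMemo :: Expr -> State (Map.Map Expr Expr) Expr
normMemo e = do
  cached <- gets (Map.lookup e)
  case cached of
    Just v  -> return v          -- already normalized this exact term
    Nothing -> do
      v <- case e of
        Lit n   -> return (Lit n)
        Add a b -> do
          a' <- normMemo a
          b' <- normMemo b
          case (a', b') of
            (Lit x, Lit y) -> return (Lit (x + y))
            _              -> return (Add a' b')
      modify (Map.insert e v)    -- record the result for later reuse
      return v

main :: IO ()
main =
  let shared = Add (Lit 1) (Lit 2)   -- duplicated subterm
  in print (evalState (normMemo (Add shared shared)) Map.empty)
-- prints: Lit 6
```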

@ocharles
Contributor Author

ocharles commented Nov 3, 2018

Unfortunately even adding derivation as a built-in doesn't really seem to help. I forked dhall-nix and added an extra step in dhallToNix to convert derivation {...} into a Nix derivation call. Then, I changed everything to just use the new derivation : DerivationArgs -> Text function. Just trying to generate the Nix expression corresponding to gcc.dhall still takes 3 minutes on my laptop (Dhall 1.17). Dhall 1.18 takes 1min 30s on this laptop to normalize the original approach, which does suggest the 1.18 improvements might make a difference, but it still feels like it'll be too slow to be practical.

@Gabriella439
Contributor

@ocharles: That seems unusual. I think you'll have to profile it to see what the bottleneck is

@ocharles
Contributor Author

ocharles commented Nov 3, 2018

I'll have a look in the week. I took out the Nix specific stuff and switched to Dhall 1.18. Going from examples/gcc.dhall to a normal form (Dhall.Core.normalizeWith) still takes 1min 30s so I don't think it's a problem particularly unique to Dhallix.

@Gabriella439
Contributor

Yeah, what makes this weird is that from what I've skimmed so far the AST seems like it should be small, unless calls to derivation are being duplicated, causing it to be re-evaluated unnecessarily

@ocharles
Contributor Author

ocharles commented Nov 3, 2018

The normal form is still pretty huge - 6MB formatted: https://github.com/dhallix/nix-derivation/blob/builtin-derivation/normal-form.dhall

@ocharles
Contributor Author

ocharles commented Nov 3, 2018

I'm not sure what you mean about derivation calls being duplicated unnecessarily. Everything builds on top of the bootstrap stuff, so the bootstrapping derivation gets inlined into everything that needs it. I can't see how else Dhall could represent that sharing

@Gabriella439
Contributor

Ah, okay, now I see why you didn't get a performance speed up.

What I meant was to use normalizeWith to add a custom hook for evaluating calls of the derivation function, instead of evaluating the derivation function in a post-processing step after normalize. The reason for doing so is that if you evaluate derivation as you are normalizing the tree then it will never get an opportunity to grow so large.

@ocharles
Contributor Author

ocharles commented Nov 3, 2018

I actually don't think it's normalization that is the problem. If I take examples/gcc.dhall and just pass it through importing and type checking it still takes 1min 28s. I've gotta leave soon though so I can't do much more testing for now.

@ocharles
Contributor Author

ocharles commented Nov 3, 2018

Actually, I seem to be getting a normal form even if I only do

    expr' <-
      StateT.evalStateT
        (Dhall.Import.loadWith expr)
        (Dhall.Import.emptyStatus "."
          & Dhall.Import.startingContext .~ Dhallix.context)

    case Dhall.TypeCheck.typeWith Dhallix.context expr' of
        Left  err -> Control.Exception.throwIO err
        Right _   -> return ()

    print (Dhall.Pretty.prettyExpr expr')

Does importing now do normalization?

@Gabriella439
Contributor

@ocharles: Yes

@ocharles
Contributor Author

ocharles commented Nov 3, 2018

Ok, then this is going to be much harder - doable, but harder. That's because normalisation of a call to the builtin derivation function has to return the output path. That requires building a derivation and doing all of the hashing. It's doable, and I've done it before, but this project can no longer be about translating .dhall to .nix files and then using existing tools that work on .nix. That's fine, and was ultimately always going to be the case - I was just hoping to push it out as far as possible!

I was hoping that I could just plug the extra case above into dhallToNix (converting App (Var "derivation") (Record args) into a Fix NExpr value) by simply Embedding the Nix expression - but I can't do that, because the import normalizer only supports Embed X.

I'll proceed in the direction of directly building .drv files (which is pretty much exactly what https://github.com/ocharles/dhall-build does), unless you can see a way to go from .dhall to .nix.

@Gabriella439
Contributor

@ocharles: Actually, there is a trick we might be able to apply here. Detecting common subexpressions in general is going to be hard if we consider all possible subexpressions, but in this case we don't need to consider all possible subexpressions. We only need to factor out common subexpressions that are derivations (i.e. a saturated call to the derivation function), so if we restrict ourselves to just those expressions then we can do something like the following algorithm:

  • Find all saturated calls to derivation
  • Compute the semantic integrity check of each one
  • For any two calls that share the same hash, perform common subexpression elimination to factor them out into a top-level let binding
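
The first two steps, plus identifying which calls would be factored out, might be sketched like this on a toy AST (here the term itself stands in for its semantic integrity check, and the representation is hypothetical):

```haskell
-- Sketch of the proposed algorithm on a toy AST: count saturated
-- `derivation` calls keyed by the call itself (standing in for the
-- semantic integrity check); any call seen more than once is a
-- candidate for a top-level let binding.
import qualified Data.Map.Strict as Map

data Expr
  = Var String
  | App String Expr        -- builtin application, e.g. "derivation"
  | Pair Expr Expr
  deriving (Show, Eq, Ord)

-- Steps 1+2: find all saturated calls to `derivation` and key them.
derivationCounts :: Expr -> Map.Map Expr Int
derivationCounts e@(App "derivation" arg) =
  Map.insertWith (+) e 1 (derivationCounts arg)
derivationCounts (App _ a)  = derivationCounts a
derivationCounts (Pair a b) =
  Map.unionWith (+) (derivationCounts a) (derivationCounts b)
derivationCounts (Var _)    = Map.empty

-- Step 3 (selection): calls occurring at least twice get factored out.
sharedCalls :: Expr -> [Expr]
sharedCalls = Map.keys . Map.filter (> 1) . derivationCounts

main :: IO ()
main = do
  let boot = App "derivation" (Var "bootstrap-args")
  print (sharedCalls (Pair (App "derivation" boot)
                           (App "derivation" (Pair boot (Var "x")))))
-- prints: [App "derivation" (Var "bootstrap-args")]
```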

@ocharles
Contributor Author

ocharles commented Nov 3, 2018

This sounds interesting, but where would this be done? Is this something that would be plugged into dhall itself? I can't see how else it could be done, because the expression ./examples/gcc.dhall needs normalization. Unless there's some way this can already be done with normalizeWith?

@Gabriella439
Contributor

@ocharles: Yeah, you're right. This would instead require a performance improvement to the normalization engine. At best the trick could be used to minimize the rendered output, but wouldn't improve normalization speed.

@ocharles
Contributor Author

ocharles commented Nov 6, 2018

Ok, we now have normalizeWithM but I still don't think I have the machinery I need. normalizeWithM gives me a way to plug in normalization from the bottom up, but I believe I really need to be driving this top down. Let's look at a simple example:

derivation
{ environment =
    [ { name = "dep-1", value = derivation dep-1-args }
    , { name = "dep-2", value = derivation dep-2-args }
    ]
}

Here our expression is a derivation, and it has two derivations as children. Let's consider dep-1-args and dep-2-args disjoint - that is, they share no common derivations. Normalization with normalizeWithM would proceed from the bottom up - so we'd normalize both derivation dep-1-args and derivation dep-2-args into Text (Nix store paths). This is fine, but the problem is with the top-level derivation - how does it know what derivations are inputs (in Nix terminology - dependencies, to clarify for other readers)? That is, the top level .drv needs to mention the .drv for derivation dep-1-args, but at the point of normalizeWithM that has been replaced with just Text. Two options come to mind:

  1. If you used StateT [Derivation] you could just get the current list of derivations. This seems fine, but would actually mean that derivation dep-2-args depends on derivation dep-1-args: the [Derivation] would accumulate a list of all previously normalized derivations, so that's no good.
  2. Maintain a mapping of NixStorePath -> Deriver. This mapping can be maintained by the custom normalization step, but it means that whenever a derivation args term is encountered, I have to walk all of the term to find which NixStorePaths it refers to, which is going to be tedious.

The best option actually seems to be entering my custom normalizer top-down. Whenever I see a derivation args term, I use runStateT (normalizeWithM ...) args to find both a normal form of args and all of its input derivations. There would still be a bottom-up normalization step to actually reduce derivation args to a store path, but this requires args not to mention derivation (which is trivially true by being bottom up).

Before I go ahead and add an extra call in normalizeWithM I wanted to think out loud (rubber 🦆 debugging!) and also make sure no one can see any better options.
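
For other readers, a self-contained toy sketch of this top-down scheme (the AST and store-path naming are made up): normalizing a derivation call runs the normalizer on its arguments under a fresh state, so each call sees exactly its own input derivations rather than everything accumulated so far.

```haskell
-- Top-down sketch on a toy AST: `norm` on a derivation call recurses
-- into the arguments with a *fresh* state, so `inputs` is exactly the
-- set of derivations that one call depends on - siblings don't leak
-- into each other (the problem with a plain StateT [Derivation]).
import Control.Monad.State (State, modify, runState, evalState)

data Expr = Text String | List [Expr] | Deriv Expr
  deriving Show

norm :: Expr -> State [String] Expr
norm (Deriv args) = do
  let (_args', inputs) = runState (norm args) []   -- fresh state per call
  let path = "/nix/store/out-" ++ show (length inputs) ++ "-deps"
  modify (path :)              -- record this derivation for our parent
  return (Text path)
norm (List xs) = List <$> mapM norm xs
norm e = return e

main :: IO ()
main = print (evalState
  (norm (Deriv (List [Deriv (Text "a"), Deriv (Text "b")])))
  [])
-- prints: Text "/nix/store/out-2-deps"
-- (the top-level derivation sees both children as inputs; each child
-- correctly sees zero inputs of its own)
```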

@Gabriella439
Contributor

@ocharles: I believe what you want is to use WriterT instead of StateT. Then you could use listen to collect all of the derivations that you emitted within a sub-expression in such a way that it is not polluted by derivations outside of that subexpression

@ocharles
Contributor Author

ocharles commented Nov 6, 2018

@Gabriel439 That requires a top-down traversal rather than bottom up, unless I'm missing something.

@ocharles
Contributor Author

ocharles commented Nov 6, 2018

To clarify,

listen :: Monad m => WriterT w m a -> WriterT w m (a, w)

So listen is a handler for MonadWriter. Revisiting the above example: if I'm at the top-level derivation, the child derivation calls will already have been normalised due to the bottom-up algorithm. Thus there is nothing to listen to! If I try to normalize the arguments record, I still won't see any derivation calls, because it's too late. I'm not sure how listen could help.
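
For reference, here is what listen does in isolation (plain mtl, independent of Dhall): it returns the output emitted within the wrapped subcomputation only, while still propagating it to the surrounding log - exactly the scoping needed, but only if the traversal reaches the un-normalized children in the first place.

```haskell
-- Standalone illustration of `listen` scoping.  `emitDerivation` is a
-- made-up stand-in for "normalize a derivation call and record it".
import Control.Monad.Writer (Writer, tell, listen, runWriter)

emitDerivation :: String -> Writer [String] String
emitDerivation name = do
  tell [name ++ ".drv"]                -- record the derivation
  return ("/nix/store/" ++ name)       -- "normalize" to a store path

build :: Writer [String] (String, [String])
build = do
  _ <- emitDerivation "bootstrap"      -- outside the listened scope
  listen $ do                          -- collects only gcc's inputs
    _ <- emitDerivation "gmp"
    emitDerivation "gcc"

main :: IO ()
main = print (runWriter build)
-- prints:
-- (("/nix/store/gcc",["gmp.drv","gcc.drv"]),["bootstrap.drv","gmp.drv","gcc.drv"])
```

Note how the inner log from listen excludes bootstrap.drv, while the overall log still contains everything.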

@Gabriella439
Contributor

@ocharles: Oh yeah, you're right. listen would only work if you were reimplementing the normalization function yourself, instead of just providing a Normalizer. In that case then I agree that a nested call to normalizeWithM within the Normalizer would probably be the way to go

@ocharles
Contributor Author

Right, but normalizeWithM would need to be changed to have a top down recursive call before anything else. If you're ok with that, I'll add it

@Gabriella439
Contributor

@ocharles: What do you mean by a top-down recursive call?

@ocharles
Contributor Author

I mean that it needs to call the custom Expr -> m (Maybe Expr) function before doing any normalisation. This is because if I'm to use listen, I need to make the recursive call on the original AST, not the normalised one. Taking the example from here, the first thing I should see should be the whole AST - that way when I recurse I will see the sub-derivations. For example, when I pattern match on derivation <args>, then listen (normalizeWithM .. <args>) needs to still see actual derivation calls in the AST, rather than having them normalized to store paths.

The current algorithm only calls the context function at the leaves in a bottom-up fashion. There is no way for me to perform what I'm calling top-down recursion.

Does that make sense?

@Gabriella439
Contributor

@ocharles: Ah, yes, I get it now!

Yeah, I would be fine with this if it doesn't significantly degrade performance. My intuition is that if we only apply the custom Normalizer when we encounter an App constructor then performance probably shouldn't noticeably degrade.

@ocharles
Contributor Author

Alright, cool! I'll throw something up soon.

@ocharles
Contributor Author

ocharles commented Nov 17, 2018

I still don't think this is possible, because I'm unable to use listen (or fresh state) over import boundaries. Consider the following:

a.dhall:

derivation { builder = ./bash.dhall, ... }

bash.dhall

derivation { ... } -- No further imports or derivations

If I try and build the expression ./a.dhall, the importing process will first import and normalize bash.dhall. Then, once this import is done, when normalizing a.dhall with imports resolved, I will only be able to witness derivation { builder = "/nix/store/...." }, and not the fact that this came from an import.

The rough flow is thus:

  • Parse a.dhall
    • Load imports
      • Parse bash.dhall
      • Load imports
      • Normalize bash.dhall
    • Normalize a.dhall. The imported derivation { ... } will now just be a string

Does that make sense? You can witness the problem if you build http://github.com/ocharles/dhall-build and then create a Dhall file that points to, say, examples/gcc.dhall from this repository. It will print out a bunch of diagnostic information along the way, but ultimately you'll see that all derivations are built, but there is no dependency information between them. That's because this modify call is always hit, but this runStateT is never really useful, because the sub-expression never contains a derivation call (because all derivation calls are in separate imports).

I'll double check my reasoning tomorrow, but I'm pretty convinced this is correct.

@Gabriella439
Contributor

@ocharles: Yeah, now that you mention that I think you are going to hit a wall since we don't have a good way for effects to cross import boundaries.

Maybe the way to go here is to do what you originally proposed:

Maintain a mapping of NixStorePath -> Deriver

@ocharles
Contributor Author

That's actually going to be really messy. I would have to scan all arguments to derivation for Nix store paths in order to consult the mapping. Worse, those store paths occur in TextLit expressions, so I would essentially have to run a regular expression against Chunks to find Nix store paths.

This is doable but feels really unfortunate - the information is right there in a much more structured form! It's really the repeated normalisation that's causing problems, but I'm not sure whether that can be changed
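
For a sense of what that fallback entails, here is a toy version of the store-path scan (the real version would walk Dhall's Chunks structure rather than a flat String):

```haskell
-- Toy substring scan for Nix store paths in literal text - the messy
-- fallback discussed above, reduced to its essence.
import Data.List (isPrefixOf, tails)

-- Collect every substring that starts with "/nix/store/" and runs
-- until whitespace or a quote character.
storePaths :: String -> [String]
storePaths s =
  [ takeWhile (`notElem` " \"'") t
  | t <- tails s
  , "/nix/store/" `isPrefixOf` t
  ]

main :: IO ()
main = print (storePaths "gcc --with-gmp=/nix/store/abc-gmp --out=/nix/store/xyz-out")
-- prints: ["/nix/store/abc-gmp","/nix/store/xyz-out"]
```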

@ocharles
Contributor Author

ocharles commented Dec 1, 2018

@Gabriel439 did you see my last comment? I've been thinking about this a bit more and I still can't see a good way to do this. The only thing I can think of that wouldn't be awful would be to generalise loading from Expr Src Import -> IO (Expr Src X) to forall x. Expr Src Import -> IO (Expr Src x) and the same for normalizing. Then I can Embed the .drv path and later normalize that to the store output.

@ocharles
Contributor Author

ocharles commented Dec 1, 2018

It's that, or maybe we wait until standard normalization is better and the original approach can be used (which is certainly the most conceptually simple from my side).

@Gabriella439
Contributor

Yeah, I think making standard normalization faster is going to be the way to go here. If Dhall is going to replace Nix then it needs to be as fast as the Nix interpreter (and possibly as lazy)

@AndrasKovacs

AndrasKovacs commented Apr 22, 2019

@ocharles Hi, could you perhaps update the gcc.dhall file to the current language standard? I'm working on an optimized branch, and would like to benchmark, but your file seems out of date.

Preliminarily, on this file type checking and normalization are about 100 times faster on the new branch, and command time is now dominated by networking and expression rendering. If you check the branch out, I should note that basically only dhall and dhall type work currently; many things are missing, including most of the informative error messages.

@ocharles
Contributor Author

See the gcc-latest-dhall branch

@AndrasKovacs

@ocharles thanks. I'm happy to tell you that gcc.dhall typechecks in 0.02s and I can write the normal form to a file in 0.2s :)

@ocharles
Contributor Author

Woah, this project just became viable again!

@ocharles
Contributor Author

Can you share your branch?

@AndrasKovacs

AndrasKovacs commented Apr 23, 2019

Do you mean the link: https://github.com/dhall-lang/dhall-haskell/tree/nbe-elaboration, or is there some git sharing thing which I don't know about (being a git/github dummy)?

@ocharles
Contributor Author

Nope, that's what I was looking for! I had only seen your work that got merged into master, I wasn't sure where the latest and greatest work was. I'll have a play now - thanks!

@hyperfekt

For anyone wondering: on Dhall 1.30.0 this takes about 0.35s. I think the issue can be regarded as fixed.

@ocharles
Contributor Author

Ok, interesting! I thought we never got all the improvements merged. I will revisit before closing.

@hyperfekt

IIUC not all the improvements have been merged, indeed, but a bunch more of them were in the meantime - amongst them, apparently, those that fixed performance for this case.

@ocharles
Contributor Author

I agree, this is certainly better than minutes. I get 0.3s on Dhall 1.32. I think we can safely close this now.
