Intern package/module names in LF packages #1496

Closed
S11001001 opened this issue Jun 3, 2019 · 3 comments · Fixed by #1614

S11001001 (Contributor) commented Jun 3, 2019

Looking at a snippet of DAML-LF, @ndmitchell sees a huge number of repeated package and module names, because the stdlib is now in a different location. That's a major cause of really large packages. @bitonic has a solution, which should be easy for @hurryabit to implement:

Rough plan: add an int64 -> string table to each DAML-LF package, and add a variant to PackageRef, e.g.:

message PackageRef {
  oneof Sum {
    Unit self = 1;

    // An ascii7 lowercase hex-encoded package identifier. This refers
    // to the DAML-LF Archive Hash.
    string package_id = 2;
    int64 interned_package_id = 3;
  }
}

The change is backwards compatible with DAML-LF 1.x.

Moreover, we will not expose this change in the Haskell / Scala ASTs: interned references will be resolved on decode, so consumers of those libraries won't have to do anything new.
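For illustration, a minimal Scala sketch of what resolving an interned reference on decode could look like (this is not the actual daml-lf decoder code; RawPackageRef, resolvePackageRef and internedStrings are names made up for this example):

final case class RawPackageRef(inlineId: Option[String], internedId: Option[Long])

def resolvePackageRef(
    ref: RawPackageRef,
    internedStrings: IndexedSeq[String] // the int64 -> string table carried by the package
): String =
  ref match {
    case RawPackageRef(Some(pkgId), None) =>
      pkgId // old-style inline package ID, still allowed for LF 1.x compatibility
    case RawPackageRef(None, Some(ix)) =>
      // interned reference: look the string up once, so the Haskell / Scala
      // ASTs only ever see the resolved package ID
      internedStrings(ix.toInt)
    case _ =>
      sys.error("exactly one of package_id / interned_package_id must be set")
  }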

@ndmitchell suggests interning module names too; @bitonic suggests we measure this before deciding what the best compromise is.

Note that this not only makes the packages smaller, it should also significantly decrease GC pressure, since we'll allocate far fewer strings.

Speaking of making the archives' memory usage smaller, we can reduce GC pressure while loading an archive with something like SetAssociativeCache (usage).
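Sketching the idea only (this is not the actual SetAssociativeCache API), a decoder could deduplicate repeated strings with a small fixed-size cache along these lines:

// Illustrative only: a tiny direct-mapped string cache. Repeated decodes of
// equal strings return one shared instance, reducing allocation and GC pressure.
final class StringInternCache(size: Int = 4096) {
  require((size & (size - 1)) == 0, "size must be a power of two")
  private val slots = new Array[String](size)
  private val mask = size - 1

  def intern(s: String): String = {
    val ix = s.hashCode & mask
    val cached = slots(ix)
    if (cached != null && cached == s) cached // hit: reuse the existing instance
    else { slots(ix) = s; s }                 // miss: remember this one
  }
}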

(from DEL-7170, text by @bitonic & @ndmitchell)

raphael-speyer-da (Collaborator) commented:

Sounds like a simple form of compression that benefits the in-memory representation as well as the wire format. Also, having a central table of metadata such as package ids could be beneficial for other tools that want to introspect.

I'd just mention that the order in which package ids are added to this table could be an additional source of non-determinism in the output, which could frustrate caches, etc. Would it be worth having a rule/convention that the package ids be listed in sorted order, for example? That would also enable more efficient reverse lookup when needed.
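As a hedged sketch of what such a convention could look like on the encoding side (the follow-up comment below notes that no such rule was actually adopted; the helper name is hypothetical):

// Hypothetical encoder-side helper: building the interning table in sorted
// order would make the serialized output deterministic and allow reverse
// lookup by binary search or by the returned index map.
def buildInterningTable(referencedPackageIds: Set[String]): (Vector[String], Map[String, Int]) = {
  val table = referencedPackageIds.toVector.sorted
  (table, table.zipWithIndex.toMap)
}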

S11001001 self-assigned this Jun 6, 2019
S11001001 (Contributor, Author) commented:

Also having a central table of metadata like package ids could be beneficial for other tools wanting to introspect.

@raphael-speyer-da I don't plan to require that package IDs appear in the central table, given how the additional field must be defined to stay backward compatible with LF 1.x; technically I could add such a rule, but a tool relying on that invariant would have to limit its LF input compatibility accordingly. Perhaps this is merely a matter of mindset: the DAML Engine promises compatibility all the way back to LF 1.0 (a promise the compiler does not match), so no tool built with the Engine's principles in mind could take advantage of such a rule anyway.

I think the least confusing approach is to make no rules for 1.x and consider eliminating the inline package ID case in LF 2.0.

Would it be worth having a rule/convention around the package ids being listed in sorted order, for example?

I won't introduce such a rule or convention, but I have in mind a design that should address your concerns about nondeterminism with no additional bother or imposition; details to follow in the PR.

S11001001 (Contributor, Author) commented Jun 26, 2019

We got about 15% LF size savings from package ref interning; let's do module ref interning as well. #1882
