Intern package/module names in LF packages #1496

Closed
S11001001 opened this issue Jun 3, 2019 · 3 comments · Fixed by #1614

S11001001 (Contributor) commented Jun 3, 2019

Looking at a snippet of DAML-LF, @ndmitchell sees a huge number of repeated package and module names, because the stdlib is now in a different location. That's a major cause of really large packages. @bitonic has a solution, which should be easy for @hurryabit to implement:

Rough plan: add an int64 -> string table to each DAML-LF package, and add a variant to PackageRef, e.g.:

message PackageRef {
  oneof Sum {
    Unit self = 1;

    // An ascii7 lowercase hex-encoded package identifier. This refers
    // to the DAML-LF Archive Hash.
    string package_id = 2;
    int64 interned_package_id = 3;
  }
}

The change is backwards compatible with DAML-LF 1.x.

Moreover, we will not expose this change in the Haskell / Scala ASTs: interned references will be resolved on decode, so consumers of those libraries won't have to do anything new.
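For illustration, a minimal Scala sketch of what resolving an interned reference on decode could look like (this is not the actual daml-lf decoder code; RawPackageRef, resolvePackageRef and internedStrings are names made up for this example):

final case class RawPackageRef(inlineId: Option[String], internedId: Option[Long])

def resolvePackageRef(
    ref: RawPackageRef,
    internedStrings: IndexedSeq[String] // the int64 -> string table carried by the package
): String =
  ref match {
    case RawPackageRef(Some(pkgId), None) =>
      pkgId // old-style inline package ID, still allowed for LF 1.x compatibility
    case RawPackageRef(None, Some(ix)) =>
      // interned reference: look the string up once, so the Haskell / Scala
      // ASTs only ever see the resolved package ID
      internedStrings(ix.toInt)
    case _ =>
      sys.error("exactly one of package_id / interned_package_id must be set")
  }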

@ndmitchell suggests interning module names too; @bitonic suggests we measure this before deciding what the best compromise is.

Note that this not only makes the packages smaller, it should also significantly decrease GC pressure, since we'll allocate far fewer strings.

Speaking of making the archives' memory usage smaller, we can reduce GC pressure while loading an archive with something like SetAssociativeCache (usage).
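Sketching the idea only (this is not the actual SetAssociativeCache API), a decoder could deduplicate repeated strings with a small fixed-size cache along these lines:

// Illustrative only: a tiny direct-mapped string cache. Repeated decodes of
// equal strings return one shared instance, reducing allocation and GC pressure.
final class StringInternCache(size: Int = 4096) {
  require((size & (size - 1)) == 0, "size must be a power of two")
  private val slots = new Array[String](size)
  private val mask = size - 1

  def intern(s: String): String = {
    val ix = s.hashCode & mask
    val cached = slots(ix)
    if (cached != null && cached == s) cached // hit: reuse the existing instance
    else { slots(ix) = s; s }                 // miss: remember this one
  }
}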

(from DEL-7170, text by @bitonic & @ndmitchell)

raphael-speyer-da (Collaborator) commented:

Sounds like a simple form of compression that benefits the in-memory representation as well as the wire format. Also, having a central table of metadata such as package ids could be beneficial for other tools that want to introspect.

I'd just mention that the order in which package ids are added to this table could be an additional source of non-determinism in the output, which could frustrate caches, etc. Would it be worth having a rule/convention that the package ids be listed in sorted order, for example? That would also enable more efficient reverse lookup when needed.
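As a hedged sketch of what such a convention could look like on the encoding side (the follow-up comment below notes that no such rule was actually adopted; the helper name is hypothetical):

// Hypothetical encoder-side helper: building the interning table in sorted
// order would make the serialized output deterministic and allow reverse
// lookup by binary search or by the returned index map.
def buildInterningTable(referencedPackageIds: Set[String]): (Vector[String], Map[String, Int]) = {
  val table = referencedPackageIds.toVector.sorted
  (table, table.zipWithIndex.toMap)
}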

S11001001 self-assigned this Jun 6, 2019
S11001001 (Contributor, Author) commented:

Also having a central table of metadata like package ids could be beneficial for other tools wanting to introspect.

@raphael-speyer-da I don't plan to require that package IDs appear in the central table, given how the additional field must be defined to stay backward compatible with LF 1.x; technically I could add such a rule, but a tool relying on that invariant would have to limit its LF input compatibility accordingly. Perhaps this is merely a matter of mindset: the DAML Engine promises compatibility all the way back to LF 1.0 (a promise the compiler does not match), so no tool built with the Engine's principles in mind could take advantage of such a rule anyway.

I think the least confusing approach is to make no rules for 1.x and consider eliminating the inline package ID case in LF 2.0.

Would it be worth having a rule/convention around the package ids being listed in sorted order, for example?

I won't introduce such a rule or convention, but I have in mind a design that should address your concerns about nondeterminism with no additional bother or imposition; details to follow in the PR.

S11001001 (Contributor, Author) commented Jun 26, 2019

We got about 15% LF size savings from package ref interning; let's do module ref interning as well. #1882
