Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Module system for Catala #88

Closed
denismerigoux opened this issue Mar 10, 2021 · 3 comments
Closed

Module system for Catala #88

denismerigoux opened this issue Mar 10, 2021 · 3 comments
Assignees
Labels
🔧 compiler Issue concerns the compiler ✨ enhancement New feature or request 👪 help wanted Extra attention is needed
Milestone

Comments

@denismerigoux
Copy link
Contributor

denismerigoux commented Mar 10, 2021

The problem

Right now, the only abstraction mechanism in Catala are the scopes, which correspond roughly to functions. However, as programs get bigger and bigger, we will want to group functions into modules in order to split and compartmentalize implementations.

Even though it is possible to split an implementation across multiple files currently, this can only be done via a primitive textual include mechanism that should ultimately be complemented by a proper module system.

Design of a simple module system

Syntax

This proposal is heavily inspired from the F* and Rust module system. Each file is its own module. The name of the module is introduced at the top of the file with the syntax (taking #84 into account):

> Module Foo

The name of the file must correspond to the ASCII snake case version of the module name.

Scopes and types declared in a module can be accessed without prefix in the same module. To use types and scopes declared and defined in other module Bar, insert at the top of the file:

> Using Bar

Then simply refer to Bar's scopes and types with Bar.<name of type/scope>. Important: it is impossible to add rules and definitions to Bar inside Foo via a scope Bar: ....

Prefixes can be omitted for certain modules with the mention

> Using Bar implicit

For scopes or types with names defined in two distinct implicit modules, name resolution selects the one corresponding to the last implicit when traversing the file top to bottom.

Local aliases can be defined for modules with

> Using Bar as B

External module lookup.

When invoking the compiler with catala bar.catala_en, the compiler retrieves the list of the names external modules used in bar.catala_en and proceeds to find .catala_en files corresponding to those module names in the current directory. Additional lookup directories can be added with the -I option.

Compilation units

For the OCaml backend and future backends, each Catala module should be translated to a different target language file. Hence, the -o option should take an output directory instead of file and the generated files names should be the same as the source files, but with a different extension.

Literate programming

When invoking Catala to produce a literate programming output from a particular file, the output should only contain that single file and not the modules on which this file depends. To produce a literate programming output containing multiple files, one can uss the master file and dependencies list syntax like

@@Master file@@
@@Include: preamble.catala_en@@
@@Include: section_121.catala_en@@
@@Include: section_132.catala_en@@
@@Include: section_1015.catala_en@@

Implementation

The implementation of this module system is going to vastly affect the compiler code. First, programs in the different intermediate representations should be represented as a list of modules (each of them containing types and scopes). The list should be ordered according to the topological order of the dependency graph between modules, which will have to be constructed. Cyclic dependencies between modules are forbidden. The passage to a collections of modules to an ordered list can be implemented in the surface -> desugared translation.

When parsing expressions like <expression>.x.y that are transformed into Dotted

| Dotted of expression Pos.marked * constructor Pos.marked option * ident Pos.marked
(** Dotted is for both struct field projection and sub-scope variables *)

the name resolution of x and y should be resolved in that order:

  • first, lookup if the string is a module name, in which case we expect a following Dotted qualifier that can be a scope or a type belonging to that module
  • second, lookup if the string is a struct name, in which case we expect a following Dotted qualifier containing a field name belonging to that struct
  • thrid, lookup if the string is a struct field or a subscope.

Before starting this implementation, fixing #47 seems to be a good pre-requisite as we don't want to bother with ordering the types and scopes declaration correctly as long as they are cycle-free.

@denismerigoux denismerigoux added ✨ enhancement New feature or request 👪 help wanted Extra attention is needed 🔧 compiler Issue concerns the compiler labels Mar 10, 2021
@denismerigoux
Copy link
Contributor Author

denismerigoux commented Apr 29, 2021

A dump from a Zulip discussion :

@AltGr says:

The issue mentions that you can't add scopes or definitions to another module, which seems quite reasonable
but the structure in this case is a prologue that defines many scopes, then several other files that progressively fill them
which follows the structure of the law and the design of Catala
Am I led to conclude that modules aren't fit for that use ?

I reply:

Thanks for raising this problem, it relates actually to something quite deep with the way law is structured. The crux of this issue is that law is spaghetti code, and we're trying to bring some structure to it with scopes and now modules. Scopes are adapted to the spaghetti structure of the law, because they can be extended with new definitions and rules when a new piece of law comes along. Scopes are the right form of modularity for law. However, the module system is purely a computer science concern. The text of the law does not have the notion of modules in it, everything is interconnected in a non-modular way. But with modules, we want to split up things arbitrarily and divide things so that we can compile and share units of code separately.
Hence, I believe that the main use for modules will be very high-level, like you'll have one module for the whole of the French Social Security Code, another for the French Tax Code, etc. This may not even be possible without fine cyclic dependency analysis because the French Tax Code and the Social Security Code cross-reference themselves in both ways.
Another practical use for modules is that we want externally defined modules containing computation-heavy parts that we don't want to code up in Catala, but rather in the target language.
For all these reasons, I believe that the semantics of Catala's module should be aligned with the semantics of an OCaml module or a C compilation unit, or a Java class. That way, we offer in Catala a tool for programmers to organize their source Catala code in a way that lets them control the shape of the computation units they want to generate in the target language.
Since C compilation units are self-contained (except for externally-defined symbols) and that each scope is translated to a function, we can't allow adding definitions in a scope from outside its module, because that would entail changing the source code of a separate compilation unit.
(Of course you can always come up with complicated control-passing schemes to do that but that is not the spirit of Catala, which is to keep things simple on the backend side)
This limitation is quite strong and will restrict modules' usability. As you noted, it is likely that they won't be useful to refactor existing examples like the "allocations familiales". For these examples that follow the text of the law in a linear fashion with a prologue, I still believe that regular textual inclusion is the way to go since it mimics what the law is doing. However, what you could split up using modules in the "allocations familiales" example is the part where the value of the SMIC is defined. Indeed, the definition of the value of the SMIC is self-contained (it does not depend on other texts of law), and sits at the bottom of the dependency graph. Moreover, other types of law (tax lax, labor law) will have to use the value of the SMIC, hence sharing it as a module makes sense.

@AltGr
Copy link
Collaborator

AltGr commented Apr 30, 2021

Thanks for the details! One more thing pops to mind: does that mean that being in a module is actually optional, so you have separate files starting with > Module Foo for reusable parts, but you can keep things "toplevel" when writing a final law, without worrying about them ?

(it's just a detail actually, internally it makes no difference to have no module or an implicit "Toplevel" module ; but I feel that things get perceived a bit differently from an interface point of view)

@denismerigoux
Copy link
Contributor Author

Hum good point. Yes we can make module declaration optional, but the implementation in the compiler should have a special TopLevel module like you say rather than an option type everywhere for modules.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🔧 compiler Issue concerns the compiler ✨ enhancement New feature or request 👪 help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

2 participants