Module system for Catala #88

denismerigoux · 2021-03-10T21:24:37Z

The problem

Right now, the only abstraction mechanism in Catala are the scopes, which correspond roughly to functions. However, as programs get bigger and bigger, we will want to group functions into modules in order to split and compartmentalize implementations.

Even though it is possible to split an implementation across multiple files currently, this can only be done via a primitive textual include mechanism that should ultimately be complemented by a proper module system.

Design of a simple module system

Syntax

This proposal is heavily inspired from the F* and Rust module system. Each file is its own module. The name of the module is introduced at the top of the file with the syntax (taking #84 into account):

> Module Foo

The name of the file must correspond to the ASCII snake case version of the module name.

Scopes and types declared in a module can be accessed without prefix in the same module. To use types and scopes declared and defined in other module Bar, insert at the top of the file:

> Using Bar

Then simply refer to Bar's scopes and types with Bar.<name of type/scope>. Important: it is impossible to add rules and definitions to Bar inside Foo via a scope Bar: ....

Prefixes can be omitted for certain modules with the mention

> Using Bar implicit

For scopes or types with names defined in two distinct implicit modules, name resolution selects the one corresponding to the last implicit when traversing the file top to bottom.

Local aliases can be defined for modules with

> Using Bar as B

External module lookup.

When invoking the compiler with catala bar.catala_en, the compiler retrieves the list of the names external modules used in bar.catala_en and proceeds to find .catala_en files corresponding to those module names in the current directory. Additional lookup directories can be added with the -I option.

Compilation units

For the OCaml backend and future backends, each Catala module should be translated to a different target language file. Hence, the -o option should take an output directory instead of file and the generated files names should be the same as the source files, but with a different extension.

Literate programming

When invoking Catala to produce a literate programming output from a particular file, the output should only contain that single file and not the modules on which this file depends. To produce a literate programming output containing multiple files, one can uss the master file and dependencies list syntax like

catala/examples/us_tax_code/us_tax_code.catala_en

Lines 1 to 9 in 9282e5e

    
           @@Master file@@ 
        
           @@Include: preamble.catala_en@@ 
        
           @@Include: section_121.catala_en@@ 
        
           @@Include: section_132.catala_en@@ 
        
           @@Include: section_1015.catala_en@@

Implementation

The implementation of this module system is going to vastly affect the compiler code. First, programs in the different intermediate representations should be represented as a list of modules (each of them containing types and scopes). The list should be ordered according to the topological order of the dependency graph between modules, which will have to be constructed. Cyclic dependencies between modules are forbidden. The passage to a collections of modules to an ordered list can be implemented in the surface -> desugared translation.

When parsing expressions like <expression>.x.y that are transformed into Dotted

catala/src/catala/surface/ast.mli

Lines 151 to 152 in 9282e5e

    
           | Dotted of expression Pos.marked * constructor Pos.marked option * ident Pos.marked 
        
               (** Dotted is for both struct field projection and sub-scope variables *)

the name resolution of x and y should be resolved in that order:

first, lookup if the string is a module name, in which case we expect a following Dotted qualifier that can be a scope or a type belonging to that module
second, lookup if the string is a struct name, in which case we expect a following Dotted qualifier containing a field name belonging to that struct
thrid, lookup if the string is a struct field or a subscope.

Before starting this implementation, fixing #47 seems to be a good pre-requisite as we don't want to bother with ordering the types and scopes declaration correctly as long as they are cycle-free.

The text was updated successfully, but these errors were encountered:

denismerigoux · 2021-04-29T16:56:13Z

A dump from a Zulip discussion :

@AltGr says:

The issue mentions that you can't add scopes or definitions to another module, which seems quite reasonable
but the structure in this case is a prologue that defines many scopes, then several other files that progressively fill them
which follows the structure of the law and the design of Catala
Am I led to conclude that modules aren't fit for that use ?

I reply:

Thanks for raising this problem, it relates actually to something quite deep with the way law is structured. The crux of this issue is that law is spaghetti code, and we're trying to bring some structure to it with scopes and now modules. Scopes are adapted to the spaghetti structure of the law, because they can be extended with new definitions and rules when a new piece of law comes along. Scopes are the right form of modularity for law. However, the module system is purely a computer science concern. The text of the law does not have the notion of modules in it, everything is interconnected in a non-modular way. But with modules, we want to split up things arbitrarily and divide things so that we can compile and share units of code separately.
Hence, I believe that the main use for modules will be very high-level, like you'll have one module for the whole of the French Social Security Code, another for the French Tax Code, etc. This may not even be possible without fine cyclic dependency analysis because the French Tax Code and the Social Security Code cross-reference themselves in both ways.
Another practical use for modules is that we want externally defined modules containing computation-heavy parts that we don't want to code up in Catala, but rather in the target language.
For all these reasons, I believe that the semantics of Catala's module should be aligned with the semantics of an OCaml module or a C compilation unit, or a Java class. That way, we offer in Catala a tool for programmers to organize their source Catala code in a way that lets them control the shape of the computation units they want to generate in the target language.
Since C compilation units are self-contained (except for externally-defined symbols) and that each scope is translated to a function, we can't allow adding definitions in a scope from outside its module, because that would entail changing the source code of a separate compilation unit.
(Of course you can always come up with complicated control-passing schemes to do that but that is not the spirit of Catala, which is to keep things simple on the backend side)
This limitation is quite strong and will restrict modules' usability. As you noted, it is likely that they won't be useful to refactor existing examples like the "allocations familiales". For these examples that follow the text of the law in a linear fashion with a prologue, I still believe that regular textual inclusion is the way to go since it mimics what the law is doing. However, what you could split up using modules in the "allocations familiales" example is the part where the value of the SMIC is defined. Indeed, the definition of the value of the SMIC is self-contained (it does not depend on other texts of law), and sits at the bottom of the dependency graph. Moreover, other types of law (tax lax, labor law) will have to use the value of the SMIC, hence sharing it as a module makes sense.

AltGr · 2021-04-30T16:31:45Z

Thanks for the details! One more thing pops to mind: does that mean that being in a module is actually optional, so you have separate files starting with > Module Foo for reusable parts, but you can keep things "toplevel" when writing a final law, without worrying about them ?

(it's just a detail actually, internally it makes no difference to have no module or an implicit "Toplevel" module ; but I feel that things get perceived a bit differently from an interface point of view)

denismerigoux · 2021-04-30T17:21:01Z

Hum good point. Yes we can make module declaration optional, but the implementation in the compiler should have a special TopLevel module like you say rather than an option type everywhere for modules.

denismerigoux added ✨ enhancement New feature or request 👪 help wanted Extra attention is needed 🔧 compiler Issue concerns the compiler labels Mar 10, 2021

denismerigoux mentioned this issue Mar 10, 2021

Externally-implemented modules for Catala #89

Closed

denismerigoux added this to the Catala v1.0 milestone Feb 16, 2022

AltGr self-assigned this Oct 17, 2022

denismerigoux mentioned this issue Apr 5, 2023

Bug in formalizing US Section 121: Five-year periods #443

Open

AltGr closed this as completed May 31, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Module system for Catala #88

Module system for Catala #88

denismerigoux commented Mar 10, 2021 •

edited

Loading

denismerigoux commented Apr 29, 2021 •

edited

Loading

AltGr commented Apr 30, 2021

denismerigoux commented Apr 30, 2021

Module system for Catala #88

Module system for Catala #88

Comments

denismerigoux commented Mar 10, 2021 • edited Loading

The problem

Design of a simple module system

Syntax

External module lookup.

Compilation units

Literate programming

Implementation

denismerigoux commented Apr 29, 2021 • edited Loading

AltGr commented Apr 30, 2021

denismerigoux commented Apr 30, 2021

denismerigoux commented Mar 10, 2021 •

edited

Loading

denismerigoux commented Apr 29, 2021 •

edited

Loading