Semantics of INCLUDE (this has to do with namespaces) #1467

Open
stefjoosten opened this issue Feb 16, 2024 · 2 comments
@stefjoosten
Contributor

stefjoosten commented Feb 16, 2024

Problem

Namespaces require semantics that will prepare us to work with distributed systems and allow us to do data migrations. So far, we have generated information systems with one unified namespace. Until Ampersand version 5.0, the semantics of the INCLUDE statement has been set union. To support data migration, we need to support three systems, one of which has an INCLUDE relation with the two others.

Requirements

Proposed solution

In issue #850 we decided to borrow Haskell's module mechanism, with one file for each module. Each file starts with a MODULE statement, so let's replace the CONTEXT statement from Ampersand with the MODULE statement. Without any INCLUDE statements, Ampersand compiles the entire file into one information system containing a dataset, a schema, and a set of interfaces. So it compiles a module called ${\tt bar}$ to a triple $\langle D_{\tt bar}, S_{\tt bar}, F_{\tt bar}\rangle$. With an INCLUDE statement, we need to define that every identifier in the included module is known in the including module under the prefix " ${\tt bar.}$ ". To define renaming, we need an operator $\downarrow$, just for defining the semantics in the compiler:
${\tt x\downarrow y\ =\ x<>}$ "." ${\tt<>y}$
I will overload this operator to work for information systems, datasets, schemas, interface sets, and their constituent elements as well, meaning that $x\downarrow y$ prefixes the name $x$ together with a dot to every identifier in the namespace of $y$. For example, if $y$ contains the name client, then $x\downarrow y$ contains the name x.client on every qualifying occurrence of client in $y$.
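
As a minimal sketch of the $\downarrow$ operator on plain strings and on sets of identifiers (the function names `qualify` and `qualify_names` are hypothetical and only serve this illustration; the compiler itself may represent names differently):

```python
def qualify(prefix: str, name: str) -> str:
    """x ↓ y on a single identifier: x, then a dot, then y."""
    return prefix + "." + name


def qualify_names(prefix: str, names: set) -> set:
    """x ↓ y lifted to a set of identifiers (assumed representation of a namespace)."""
    return {qualify(prefix, n) for n in names}


print(qualify("x", "client"))  # x.client
```

Lifting the operator to datasets, schemas, and interface sets then amounts to applying `qualify` to every name they contain.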

Let ${\tt foo}$ and ${\tt bar}$ be information systems. Each has a dataset, a schema, and zero or more interfaces.
Let $D_{\tt foo}$ and $D_{\tt bar}$ be datasets. Let $S_{\tt foo}$ and $S_{\tt bar}$ be schemas. Let $F_{\tt foo}$ and $F_{\tt bar}$ be sets of interfaces. Now we can define the system ${\tt foo\ INCLUDES\ bar}$ as:

$D_{\tt foo\ INCLUDES\ bar}\ =\ D_{\tt foo}\cup {\tt bar}\downarrow D_{\tt bar}$

$S_{\tt foo\ INCLUDES\ bar}\ =\ S_{\tt foo}\cup {\tt bar}\downarrow S_{\tt bar}$

$F_{\tt foo\ INCLUDES\ bar}\ =\ F_{\tt foo}\cup {\tt bar}\downarrow F_{\tt bar}$

For the datasets, this means that all relation names and concept names in ${\tt bar}$ are prefixed with ${\tt bar}$. Atoms are left alone. In the schema of ${\tt bar}$, all rule names, relation names, concept names, pattern names, and view names are prefixed with ${\tt bar}$. All rule names, relation names, concept names, and interface names from $F_{\tt bar}$ are prefixed with ${\tt bar}$.
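
The three equations can be sketched on a toy model. Everything about the representation here is an assumption made for illustration: a dataset is a dict from relation name to a set of atom pairs, and a schema and an interface set are plain sets of names. The names `System`, `prefix_system`, and `includes` are hypothetical. Name clashes are deliberately not checked in this sketch.

```python
from dataclasses import dataclass


@dataclass
class System:
    name: str
    dataset: dict     # relation name -> set of (atom, atom) pairs (assumed model)
    schema: set       # names of rules, relations, concepts, patterns, views
    interfaces: set   # interface names


def prefix_system(p: str, s: System) -> System:
    """p ↓ s: prefix every name in s with p and a dot; atoms are left alone."""
    return System(
        name=s.name,
        dataset={p + "." + rel: pairs for rel, pairs in s.dataset.items()},
        schema={p + "." + n for n in s.schema},
        interfaces={p + "." + n for n in s.interfaces},
    )


def includes(foo: System, bar: System) -> System:
    """foo INCLUDES bar: union of foo with the prefixed copy of bar."""
    q = prefix_system(bar.name, bar)
    return System(
        name=foo.name,
        # No clash check here: a clashing key in q would silently overwrite foo's.
        dataset={**foo.dataset, **q.dataset},
        schema=foo.schema | q.schema,
        interfaces=foo.interfaces | q.interfaces,
    )


foo = System("foo", {"owns": {("alice", "car1")}}, {"owns", "Person"}, {"Overview"})
bar = System("bar", {"account": {("c1", "acc1")}}, {"account", "Client"}, {"Accounts"})
print(sorted(includes(foo, bar).dataset))  # ['bar.account', 'owns']
```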

Name clashes can occur. If, for example, system ${\tt foo}$ contains a name bar.account and ${\tt bar}$ contains a name account, the dataset $D_{\tt foo\ INCLUDES\ bar}$ has a name clash. We will forbid such clashes to ensure disjoint-union semantics.
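
A hedged sketch of how such a clash could be detected before taking the union, treating a namespace as a plain set of names (an assumption). The function names `name_clashes` and `disjoint_union` are hypothetical, not taken from the compiler:

```python
def name_clashes(foo_names: set, bar_module: str, bar_names: set) -> set:
    """Names of foo that collide with the prefixed names of the included module."""
    prefixed = {bar_module + "." + n for n in bar_names}
    return set(foo_names) & prefixed


def disjoint_union(foo_names: set, bar_module: str, bar_names: set) -> set:
    """Union of foo's names with bar's prefixed names; reject any clash."""
    clashes = name_clashes(foo_names, bar_module, bar_names)
    if clashes:
        raise ValueError("name clash in INCLUDE: " + ", ".join(sorted(clashes)))
    return set(foo_names) | {bar_module + "." + n for n in bar_names}


print(name_clashes({"bar.account", "owner"}, "bar", {"account"}))  # {'bar.account'}
```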

Alias

In the current implementation, two relation declarations with the same name, source, and target are treated as the same relation. I don't mind keeping this behavior, but it does not work across the INCLUDE mechanism (because we forbid name clashes). I propose to do this explicitly with an ALIAS statement, for example:

ALIAS client, bar.client

This statement presumes that aliases have the same type, or else we get type errors. Needless to say, the ALIAS statement can also work inside one namespace. It is not linked to the INCLUDE mechanism. Aliasing works for concepts and relations, but not for other named entities.
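
One possible reading of ALIAS, sketched under stated assumptions: aliases form an equivalence on names, and two relations may only be aliased when their (source, target) signatures agree after resolving concept aliases. The class `Aliases`, its methods, and the example signatures are hypothetical, and whether concept aliases should be resolved during the type check is an open design question, not something settled above.

```python
class Aliases:
    """Tracks which names have been declared equal by ALIAS (assumed model)."""

    def __init__(self):
        self.parent = {}  # name -> the name it was aliased to

    def canon(self, name: str) -> str:
        """Follow alias links to a representative name."""
        while self.parent.get(name, name) != name:
            name = self.parent[name]
        return name

    def alias_concepts(self, a: str, b: str) -> None:
        ra, rb = self.canon(a), self.canon(b)
        if ra != rb:
            self.parent[rb] = ra

    def alias_relations(self, a: str, b: str, signatures: dict) -> None:
        """signatures maps relation name -> (source concept, target concept)."""
        sig_a = tuple(self.canon(c) for c in signatures[a])
        sig_b = tuple(self.canon(c) for c in signatures[b])
        if sig_a != sig_b:
            raise TypeError(f"ALIAS {a}, {b}: {sig_a} differs from {sig_b}")
        ra, rb = self.canon(a), self.canon(b)
        if ra != rb:
            self.parent[rb] = ra


signatures = {
    "client": ("Account", "Client"),
    "bar.client": ("bar.Account", "bar.Client"),
}
al = Aliases()
al.alias_concepts("Account", "bar.Account")
al.alias_concepts("Client", "bar.Client")
al.alias_relations("client", "bar.client", signatures)
print(al.canon("bar.client"))  # client
```

In this reading, `ALIAS client, bar.client` is a type error until the concepts on both sides have been aliased as well.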

Consequences

This mechanism excludes cyclic INCLUDE-dependencies. I expect the proposed mechanism to meet the requirements of the migration mechanism, but I will leave that to @sjcjoosten to verify. I hope that this include-relation between information systems is transitive. If not, I would like to fix that, so we can draw an include-graph of the system.
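
To illustrate the include-graph idea, here is a small sketch that rejects cyclic INCLUDE-dependencies and computes everything a module includes transitively. The graph representation (a dict from module name to the set of modules it directly includes) and the function names are assumptions; `graphlib` is in the Python standard library from version 3.9.

```python
from graphlib import TopologicalSorter, CycleError


def include_order(includes: dict) -> list:
    """Return modules with included modules first; reject cyclic INCLUDEs."""
    try:
        return list(TopologicalSorter(includes).static_order())
    except CycleError as err:
        # err.args[1] holds the list of modules that form the cycle.
        raise ValueError(f"cyclic INCLUDE: {err.args[1]}") from err


def all_includes(includes: dict, module: str) -> set:
    """Every module reachable from `module` via one or more INCLUDE steps."""
    seen = set()
    stack = list(includes.get(module, ()))
    while stack:
        m = stack.pop()
        if m not in seen:
            seen.add(m)
            stack.extend(includes.get(m, ()))
    return seen


graph = {"foo": {"bar"}, "bar": {"baz"}, "baz": set()}
print(include_order(graph))  # ['baz', 'bar', 'foo']
```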

If module ${\tt foo}$ includes module ${\tt bar}$, we currently implement both ${\tt foo}$ and ${\tt bar}$ on the same database. For distributed systems, we will have to allow them to be implemented on different databases. I suggest we do that in another issue.

stefjoosten changed the title from "Multiple datasets" to "Semantics of INCLUDE (this has to do with namespaces)" on Feb 16, 2024
@hanjoosten
Member

I don't get this. What problem is there to be solved?

@stefjoosten
Contributor Author

stefjoosten commented Feb 18, 2024

The problem is that we have no agreed-upon semantics of the namespace stuff. So how are we going to build it first-time-right?
