Metadata on grammar rules #818

Open
svallory opened this issue Dec 10, 2022 · 9 comments
Labels
grammar Grammar language related issue proposal

Comments

@svallory

svallory commented Dec 10, 2022

I would like to be able to define metadata on rules that I can then programmatically access using the generated Grammar.

For now, my use case is automating the generation of .tmLanguage.json files. If I could write the code below...

/**
 * @scope: constant.character.escape.mylang
 */
terminal fragment ESCAPED_CHAR: '\\' ('n'|'t'|'r'|'\\');

/**
 * @scope: string.quoted.double.mylang
 */
terminal STRING: '"' ( ESCAPED_CHAR | !('\\'|'"') )* '"';

I could iterate over the rules in the Grammar object, derive the tmLanguage JSON structure, and even reuse the regexes.

In fact, I'm going to create a Map<RuleName, scope> and implement the tmLanguage generation algorithm. I won't have time to create a PR any time soon, but I'm happy to share my code here.
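A minimal sketch of that generation step, assuming each terminal rule has already been reduced to a name plus a regex source. The `TerminalInfo` shape and the `toTmLanguage` function are hypothetical, not part of Langium. The sketch emits one repository entry per scoped rule and ignores nesting such as the ESCAPED_CHAR fragment inside STRING, which a fuller generator would probably express with `begin`/`end` patterns.

```typescript
// Hypothetical per-rule input: a terminal rule's name and its regex source.
// In a real generator these would be read from the Langium Grammar object.
interface TerminalInfo {
  name: string;
  regex: string;
}

type RuleName = string;
type Scope = string;

// A TextMate rule that matches a single regex and assigns it a scope name.
interface TmPattern {
  name: Scope;
  match: string;
}

interface TmLanguage {
  name: string;
  scopeName: string;
  patterns: { include: string }[];
  repository: Record<string, TmPattern>;
}

// Builds a minimal tmLanguage object: one repository entry per rule that has
// a scope, included from the top-level patterns in rule order.
function toTmLanguage(
  languageName: string,
  terminals: TerminalInfo[],
  scopes: Map<RuleName, Scope>
): TmLanguage {
  const repository: Record<string, TmPattern> = {};
  const patterns: { include: string }[] = [];
  for (const terminal of terminals) {
    const scope = scopes.get(terminal.name);
    if (scope === undefined) continue; // unscoped rules are not highlighted
    repository[terminal.name] = { name: scope, match: terminal.regex };
    patterns.push({ include: `#${terminal.name}` });
  }
  return { name: languageName, scopeName: `source.${languageName}`, patterns, repository };
}

const scopes = new Map<RuleName, Scope>([["STRING", "string.quoted.double.mylang"]]);
const tmLanguage = toTmLanguage(
  "mylang",
  [
    { name: "STRING", regex: String.raw`"(?:\\[ntr\\]|[^\\"])*"` },
    { name: "ID", regex: "[_a-zA-Z][\\w_]*" },
  ],
  scopes
);
console.log(JSON.stringify(tmLanguage, null, 2));
```

Rules without an entry in the map are simply skipped, so the scope map doubles as the list of rules to highlight.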

@msujew msujew added grammar Grammar language related issue proposal labels Dec 10, 2022
@msujew
Member

msujew commented Dec 10, 2022

I believe this is related to #699. We spoke about using TypeScript-style in another discussion somewhere.

I like the idea, it keeps the grammar slim and makes services that use these kinds of annotations very flexible 👍

@Lotes
Contributor

Lotes commented Jun 13, 2023

I still like the idea (#699 + #818) of having annotations on my Langium grammar rules. That would be so sexy... So I thought about it and found some problems. It would be good to find answers to them.

Problem 1: Hidden rules

One thing that bothers me here is binary operators, which are one symptom of a more general problem.
Imagine you implemented a calculator:

/** a */
Expression: Additive;
/** b */
Additive returns Expression: Multiplicative ({BinaryOp.left = current} op='+' right=Multiplicative)*;
/** c */
Multiplicative returns Expression: Primary ({BinaryOp.left = current} op='*' right=Primary)*;
/** d */
Primary: number=NUMBER;

Where would the metadata end up? It cannot be the type definitions, because Additive and Multiplicative produce BinaryOp nodes, and BinaryOp has no rule of its own, only an interface.

The other option is the grammar JSON. Each rule could then get a metadata field, so that

/**
 * @scope table
 * @scope statement
 * @param out
 */

would become

const metadata: Record<string, string[]> = {
  scope: ["table", "statement"],
  param: ["out"]
}
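Deriving that record from the comment text could look roughly like the following. It is a hypothetical standalone helper (`parseMetadata`), not the JSDoc support that Langium ships. It only collects `@tag value` lines, and it also accepts the `@scope: ...` form with a colon used in the first post.

```typescript
// Collects `@tag value` lines from a JSDoc-style block comment into a record
// that maps each tag name to all of its values, in order of appearance.
function parseMetadata(comment: string): Record<string, string[]> {
  const metadata: Record<string, string[]> = {};
  // Strip the opening `/**` (or `/*`) and the closing `*/`.
  const body = comment.replace(/^\s*\/\*\*?/, "").replace(/\*\/\s*$/, "");
  for (const rawLine of body.split(/\r?\n/)) {
    // Drop the leading `*` decoration of each comment line.
    const line = rawLine.replace(/^\s*\*?/, "").trim();
    // `@tag value`, also accepting the `@tag: value` form.
    const match = /^@(\w+):?\s*(.*)$/.exec(line);
    if (match === null) continue;
    const tag = match[1];
    const value = match[2].trim();
    (metadata[tag] ??= []).push(value);
  }
  return metadata;
}

const metadata = parseMetadata(`/**
 * @scope table
 * @scope statement
 * @param out
 */`);
console.log(metadata); // { scope: [ 'table', 'statement' ], param: [ 'out' ] }
```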

Problem 2: Handling fragments

And what about fragments? We have these parser rule fragments...

FunctionSignature: 'fun' NameAndGenerics '(' ... ')';
VariableSignature: 'var' NameAndGenerics;
//@scope xyz
fragment NameAndGenerics: name=ID ('<' ... '>')?;

I think that, when resolving the metadata, the metadata of NameAndGenerics should be copied into the two rules that use it.
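One way to do that copying, sketched over a hypothetical simplified rule model (`RuleInfo`, listing the names of the fragments each rule calls) rather than Langium's real grammar AST. Each non-fragment rule inherits the metadata of every fragment it reaches, with duplicate values dropped.

```typescript
type Metadata = Record<string, string[]>;

// Hypothetical, simplified view of a grammar rule: its own metadata plus the
// names of the fragment rules it calls.
interface RuleInfo {
  name: string;
  isFragment: boolean;
  metadata: Metadata;
  calledFragments: string[];
}

// Appends the values of `source` to `target`, skipping values already present.
function mergeMetadata(target: Metadata, source: Metadata): Metadata {
  const result: Metadata = { ...target };
  for (const [tag, values] of Object.entries(source)) {
    const existing = result[tag] ?? [];
    result[tag] = [...existing, ...values.filter(value => !existing.includes(value))];
  }
  return result;
}

// Gives every non-fragment rule its own metadata plus that of all fragments it
// reaches, directly or through other fragments.
function resolveMetadata(rules: RuleInfo[]): Map<string, Metadata> {
  const byName = new Map(rules.map(rule => [rule.name, rule] as const));
  const collect = (rule: RuleInfo, visited: Set<string>): Metadata => {
    let metadata = rule.metadata;
    for (const fragmentName of rule.calledFragments) {
      const fragment = byName.get(fragmentName);
      // Skip unknown fragments and ones already visited (also guards against cycles).
      if (fragment === undefined || visited.has(fragmentName)) continue;
      visited.add(fragmentName);
      metadata = mergeMetadata(metadata, collect(fragment, visited));
    }
    return metadata;
  };
  const resolved = new Map<string, Metadata>();
  for (const rule of rules) {
    if (!rule.isFragment) resolved.set(rule.name, collect(rule, new Set([rule.name])));
  }
  return resolved;
}

const resolved = resolveMetadata([
  { name: "FunctionSignature", isFragment: false, metadata: {}, calledFragments: ["NameAndGenerics"] },
  { name: "VariableSignature", isFragment: false, metadata: {}, calledFragments: ["NameAndGenerics"] },
  { name: "NameAndGenerics", isFragment: true, metadata: { scope: ["xyz"] }, calledFragments: [] },
]);
console.log(resolved.get("FunctionSignature"), resolved.get("VariableSignature"));
```

Fragments themselves get no entry, since only the rules that call them produce nodes.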

Problem 3: UX & metadata computation

How convenient would it be to look up a rule's metadata given an AstNode?
Let's say we have a calculator grammar and want to write a scope provider that needs to know in which scope a cross-reference can be looked up (e.g. class or function).

//@scope class
MemberCall: 'this' '.' member=[Member];

You will have a MemberCall object in your virtual hand. You can get the grammar from the services, but how do you determine the metadata? You have the node, not the rule... Also, think of the binary operation: do you get a, b, c, d, or all of them?
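The ambiguity can be made concrete with a hypothetical index from node type to the rules that create it (`RuleEntry` and `metadataForNodeType` are illustrative names only, not Langium API). Looking up `BinaryOp` yields both b and c, while `MemberCall` maps to a single rule.

```typescript
type Metadata = Record<string, string[]>;

// Hypothetical index entry: a grammar rule, the AST node types its own
// assignments and actions create, and the metadata written above it.
interface RuleEntry {
  ruleName: string;
  createdTypes: string[];
  metadata: Metadata;
}

// Returns the metadata of every rule that can create a node of `nodeType`.
function metadataForNodeType(rules: RuleEntry[], nodeType: string): Map<string, Metadata> {
  const result = new Map<string, Metadata>();
  for (const rule of rules) {
    if (rule.createdTypes.includes(nodeType)) result.set(rule.ruleName, rule.metadata);
  }
  return result;
}

// The calculator grammar above, plus the MemberCall rule.
const rules: RuleEntry[] = [
  { ruleName: "Expression", createdTypes: [], metadata: { doc: ["a"] } },
  { ruleName: "Additive", createdTypes: ["BinaryOp"], metadata: { doc: ["b"] } },
  { ruleName: "Multiplicative", createdTypes: ["BinaryOp"], metadata: { doc: ["c"] } },
  { ruleName: "Primary", createdTypes: ["Primary"], metadata: { doc: ["d"] } },
  { ruleName: "MemberCall", createdTypes: ["MemberCall"], metadata: { scope: ["class"] } },
];

console.log(Array.from(metadataForNodeType(rules, "BinaryOp").keys())); // [ 'Additive', 'Multiplicative' ]
console.log(metadataForNodeType(rules, "MemberCall").get("MemberCall")); // { scope: [ 'class' ] }
```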

@svallory
Author

Hey @Lotes, maybe I wasn't clear in the original post, but the metadata I proposed should be attached to the Langium grammar rules themselves (i.e. the grammar's own AstNodes), not to the nodes of the language that those rules parse. So I think those concerns don't apply here.

I don't see how metadata would be useful in that way, since you could simply add a property with the value you want to the node.

The benefit I see in adding annotations to grammar rules is that they would let you provide information to custom generators that operate on the grammar, for example to create syntax highlighting grammars for Sublime, TextMate/VS Code, Prism, highlights, etc.

@Lotes
Contributor

Lotes commented Jun 16, 2023

Sorry, I have the tendency to overcomplicate things ^^*…

We have that internal function findCommentNode. We could expose it through a new CommentService. I can take a closer look on Monday :)

@Lotes
Contributor

Lotes commented Jun 26, 2023

@svallory I made a PR draft here

The comment provider extracts the comment located before the AstNode. The PR also includes a test case that should match your use case.
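For intuition, here is a rough text-based approximation of "the comment located before a node". The `findPrecedingComment` function is hypothetical; Langium's provider works on the CST rather than on raw text. It returns the last block comment that ends before the given offset, provided only whitespace separates the two, and it does not account for `/*` sequences inside string or regex literals.

```typescript
// Returns the last block comment that ends at or before `offset`, provided that
// only whitespace separates it from `offset`; otherwise returns undefined.
// Text-based approximation: `/*` inside string or regex literals is not handled.
function findPrecedingComment(text: string, offset: number): string | undefined {
  const blockComment = /\/\*[\s\S]*?\*\//g;
  let candidate: string | undefined;
  let match: RegExpExecArray | null;
  while ((match = blockComment.exec(text)) !== null) {
    const end = match.index + match[0].length;
    if (end > offset) break;
    candidate = text.slice(end, offset).trim() === "" ? match[0] : undefined;
  }
  return candidate;
}

const source = [
  "/** @scope entity.name.mylang */",
  "Person: 'person' name=ID;",
  "Greeting: 'Hello' person=[Person:ID] '!';",
].join("\n");

console.log(findPrecedingComment(source, source.indexOf("Person:"))); // prints the @scope comment
console.log(findPrecedingComment(source, source.indexOf("Greeting"))); // undefined
```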

If you want to have parsed comments, we also have a documentation provider under services.documentation.DocumentationProvider.

@svallory
Copy link
Author

Hey, @Lotes! I'm sorry for the late response; I was sure I had answered this. I was finishing a project this past month, and with all the rush to launch, this slipped my mind.

I just checked your PR and saw the test case. This is precisely what I needed! Thank you so much! :)

Well, I'll still need to parse the comment into structured data, since I want to be able to easily add more information as needed.

What I want to do with this is automatically generate a .tmLanguage TextMate grammar for syntax highlighting, so I think I'll need at least @scope and @ruleName tags. Does Langium have any kind of JSDoc parser?

I'm not promising any delivery date, but would you guys be interested in adding that to Langium? If so, I would appreciate some guidance on how to plug it (btw, this could be the start of a generator plugin architecture). If not, I'll just build it as a separate CLI that takes a .langium grammar file and spits out a .tmLanguage file.

@msujew
Member

msujew commented Jul 20, 2023

@svallory We do have a JSDoc parser integrated in the framework, see here.

> I'm not promising any delivery date, but would you guys be interested in adding that to Langium? If so, I would appreciate some guidance on how to plug it (btw, this could be the start of a generator plugin architecture).

Having a plugin architecture would be pretty interesting. We're currently moving to ESM (see #1125), so dynamic imports in the CLI shouldn't be an issue.
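A sketch of what such plugin loading might look like, assuming a hypothetical `GeneratorPlugin` contract that each plugin module provides as its default export. Nothing here is an existing Langium API. The importer is a parameter that defaults to a native dynamic `import()`, so it can be swapped out in tests.

```typescript
// Hypothetical plugin contract: a generator that turns a grammar into files.
interface GeneratorPlugin {
  name: string;
  generate(languageId: string): Record<string, string>; // file name -> contents
}

// Anything that resolves a module specifier to a module namespace object.
type Importer = (specifier: string) => Promise<{ default?: unknown }>;

function isGeneratorPlugin(value: unknown): value is GeneratorPlugin {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as Partial<GeneratorPlugin>;
  return typeof candidate.name === "string" && typeof candidate.generate === "function";
}

// Loads each plugin module via dynamic import() and validates its default export.
async function loadPlugins(
  specifiers: string[],
  importer: Importer = specifier => import(specifier)
): Promise<GeneratorPlugin[]> {
  const plugins: GeneratorPlugin[] = [];
  for (const specifier of specifiers) {
    const loaded = await importer(specifier);
    if (!isGeneratorPlugin(loaded.default)) {
      throw new Error(`${specifier} does not default-export a generator plugin`);
    }
    plugins.push(loaded.default);
  }
  return plugins;
}
```

A CLI could pass the plugin specifiers from its command line or a config file; validating the default export up front turns a misconfigured plugin into a clear error instead of a failure halfway through generation.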

@svallory
Author

svallory commented Jul 25, 2023

@msujew Awesome! The integrated JSDoc parser will come in pretty handy. And ESM will make plugin loading really flexible, allowing a build to select at runtime which plugins to apply.

I'll look for the easiest way to run my generator for now, since I want to focus on the feature. I was thinking of an external CLI that imports Langium to use its grammar and JSDoc parsers, but I noticed there's no public API reference or documentation on programmatic use, so it would be hard to know what is exported and what isn't.

Is there a Langium API or internal architecture documentation I can read to understand Langium better?

@spoenemann
Copy link
Contributor

> Is there a Langium API or internal architecture documentation I can read to understand Langium better?

It depends on what aspects you'd like to understand. Basically the code is a collection of services that are plugged together using DI:
https://langium.org/docs/configuration-services/
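To illustrate the idea only, here is a toy container in which every service is a factory that receives the assembled container and is created lazily on first access. `injectToy`, `Services`, and both service names are hypothetical. Langium's own `inject` function and `Module` type are richer than this (for example, they merge several modules), so treat the sketch as a mental model rather than as Langium's implementation.

```typescript
// Each service is described by a factory that receives the whole container.
type ModuleDefinition<T> = { [K in keyof T]: (container: T) => T[K] };

// Toy container: services are created lazily on first access and then cached,
// so every consumer sees the same instance. No cycle detection.
function injectToy<T extends object>(definitions: ModuleDefinition<T>): T {
  const cache = new Map<PropertyKey, unknown>();
  const container: T = new Proxy({} as T, {
    get(_target, property) {
      if (!cache.has(property)) {
        const factory = definitions[property as keyof T];
        if (typeof factory !== "function") {
          throw new Error(`No service registered for ${String(property)}`);
        }
        cache.set(property, factory(container));
      }
      return cache.get(property);
    },
  });
  return container;
}

// Hypothetical services for a tmLanguage generator.
interface Services {
  comments: { getComment(ruleName: string): string | undefined };
  tmGenerator: { scopeOf(ruleName: string): string | undefined };
}

const services = injectToy<Services>({
  comments: () => ({
    getComment: ruleName => (ruleName === "STRING" ? "@scope string.quoted.double.mylang" : undefined),
  }),
  // Depends on another service through the container, not through an import.
  tmGenerator: container => ({
    scopeOf: ruleName => container.comments.getComment(ruleName)?.replace(/^@scope\s+/, ""),
  }),
});

console.log(services.tmGenerator.scopeOf("STRING")); // string.quoted.double.mylang
```

Because services only reach each other through the container, replacing one (say, a different comment provider) means supplying a different factory, without touching its consumers.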

The processing of text documents is explained here:
https://langium.org/docs/document-lifecycle/

To understand the LSP integration, the best thing is to look through this code:
https://github.com/eclipse-langium/langium/blob/main/packages/langium/src/lsp/language-server.ts

Otherwise, feel free to ask specific questions.


4 participants