Import/include other grammars #38

Closed
ceymard opened this issue Aug 15, 2011 · 32 comments

@ceymard

ceymard commented Aug 15, 2011

It could be extremely useful to have the ability to define grammars by importing rules from other grammars.

Several ideas:

@include "expression.pegjs"
(or @from "expression.pegjs" import expression)

tag_if
    = "if" space? expression space? { ... }

@import "expression.pegjs" as expr

tag_if
    = "if" space? expr.expression space?

Ideally, this would not regenerate the whole parser code in every .pegjs file that includes another. We might have to change the behaviour of parse() slightly, to something like the following.

Edit, following what you said in the options issue:

parse(input, startRule)
->
parse(input, { startRule: "...", startPos : 9000 })

At the end, if startPos != 0 && result !== null, we would not check that the parse reached input.length, but would instead return the result together with the endPos. (I don't really know how to do that elegantly; maybe simply by modifying the options parameter?)
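
To illustrate the proposal, here is a minimal sketch in plain JavaScript. It is not PEG.js's actual API: the `rules` table, the hand-written `integer` rule and the `{ result, endPos }` return shape are all assumptions made for the example. The only idea taken from the comment above is that a parse starting at a non-zero `startPos` skips the "consumed the whole input" check and reports where it stopped.

```javascript
// Hypothetical rule table: each rule takes (input, pos) and returns
// { value, endPos } on success or null on failure.
const rules = {
  integer(input, pos) {
    let end = pos;
    while (end < input.length && input[end] >= "0" && input[end] <= "9") end++;
    return end > pos
      ? { value: parseInt(input.slice(pos, end), 10), endPos: end }
      : null;
  },
};

// Sketch of parse(input, { startRule, startPos }) as proposed above.
function parse(input, { startRule = "integer", startPos = 0 } = {}) {
  const rule = rules[startRule];
  if (!rule) throw new Error(`Unknown start rule: ${startRule}`);

  const match = rule(input, startPos);
  if (match === null) throw new Error(`No match at position ${startPos}`);

  // Only a parse that starts at position 0 must consume the whole input.
  if (startPos === 0 && match.endPos !== input.length) {
    throw new Error(`Unexpected input at position ${match.endPos}`);
  }
  return { result: match.value, endPos: match.endPos };
}

console.log(parse("ab123cd", { startPos: 2 })); // { result: 123, endPos: 5 }
```

Returning `endPos` alongside the result is what would let one generated parser hand control back to another, which is the reuse scenario the comment is after.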

It would allow reusability of grammars and modularisation of the code, which I think are two extremely important aspects of coding in general.

@dmajda
Contributor

dmajda commented Aug 20, 2011

I agree that this is an important feature. I want to do this after version 1.0.

(BTW, I don't like the Python-like syntax you propose; something similar to Node.js's require would be better because it would be more familiar to JavaScript programmers. But this is a minor thing that can be ironed out later.)

@ceymard
Author

ceymard commented Aug 20, 2011

Would you consider it for inclusion before 1.0 if provided with a patch?

I agree with your remark about the Python syntax.

@s3u

s3u commented Oct 2, 2011

+1 for this feature

@dmajda
Contributor

dmajda commented Jan 10, 2012

@ceymard Yes, I would consider it.

@dignifiedquire

+1 for the feature and +1 for require style inclusion

@dignifiedquire

@dmajda @ceymard Do you already have any thoughts on how to implement this? I need this for a project at work and will try to implement it. The question is whether this should just be a way to split grammars into multiple files, or something like inheritance, so that one could, for example, inherit all rules and then override specific rules in the new grammar.

@dmajda
Contributor

dmajda commented Feb 23, 2013

@dignifiedquire I am currently thinking about syntax & semantics that can probably be best explained by an example:

static-languages.pegjs

languages = "C" / "C++" / "Java" / "C#"

dynamic-languages.pegjs

languages = "Ruby" / "Python" / "JavaScript"

all-languages.pegjs

static  = require("./static-languages")
dynamic = require("./dynamic-languages")

all = static.languages / dynamic.languages

Each .pegjs file would implicitly define a module that would export all the rules it contains. The <name> = require(<module>) construct would import such a module. Its rules would then be available inside a namespace.

This design is deliberately similar to Node.js. Using namespaces will avoid conflicts. There are two downsides I see:

  1. The <name> = require(<module>) construct is too similar to rule definitions and thus can be confusing (one might think that just one rule is imported).
  2. The . syntax conflicts with the current meaning of ., which is “any character”. This can be solved by ugly hacks (e.g. . surrounded by whitespace means “any character”, while . surrounded by identifiers separates a namespace name from a rule name) or by changing the syntax (e.g. using any keyword to represent “any character”).
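
The module semantics described above can be sketched in plain JavaScript. This only illustrates the idea (each file exports all of its rules and the importer reaches them through a namespace); it is not how PEG.js works. `requireGrammar`, `oneOf`, `choice` and the in-memory `modules` table are hypothetical names, and each rule here matches a whole input string rather than consuming a prefix.

```javascript
// A rule that succeeds when the input equals one of the given literals.
const oneOf = (...literals) => (input) =>
  literals.includes(input) ? input : null;

// Hypothetical stand-ins for compiled .pegjs files: every module exports
// all of its rules by name.
const modules = {
  "./static-languages": { languages: oneOf("C", "C++", "Java", "C#") },
  "./dynamic-languages": { languages: oneOf("Ruby", "Python", "JavaScript") },
};

// Hypothetical counterpart of `<name> = require(<module>)`.
function requireGrammar(path) {
  const mod = modules[path];
  if (!mod) throw new Error(`Cannot find grammar module: ${path}`);
  return mod;
}

// Ordered choice: the first alternative that matches wins.
const choice = (...alternatives) => (input) => {
  for (const rule of alternatives) {
    const result = rule(input);
    if (result !== null) return result;
  }
  return null;
};

// all = static.languages / dynamic.languages
const staticLangs = requireGrammar("./static-languages");
const dynamicLangs = requireGrammar("./dynamic-languages");
const all = choice(staticLangs.languages, dynamicLangs.languages);

console.log(all("Python")); // Python
console.log(all("COBOL")); // null
```

Because both modules define a rule called `languages`, the namespace is what keeps the two apart, which is the conflict-avoidance argument made above.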

@dignifiedquire

@dmajda As the <identifier> = <expression> pattern is already taken by the rule definitions, why not do something like this:

static := require("./static-languages")
dynamic := require("./dynamic-languages")

all = static::languages / dynamic::languages

The :: is not used anywhere in PEG.js that I know of, and it makes it easy to distinguish namespaces from other things. I'm not sure about the :=; it gets the point across but feels very foreign for JavaScript.

Also if you want to use namespaces, do you think there should be only one namespace per file or should there be a way of creating multiple namespaces in one file like this:

static := {
  languages  = "C" / "C++" / "Java" / "C#"
}

dynamic := {
  languages = "Ruby" / "Python" / "JavaScript"
}

@dmajda
Contributor

dmajda commented Feb 24, 2013

I'm not much of a fan of :: and :=; they look alien in the JavaScript/CoffeeScript world.

I'd also like to keep things simple and define namespaces implicitly only by requiring files. I don't see a big need for anything more complicated.

@otac0n

otac0n commented Mar 1, 2013

How about simply:

@require foo = "./foo"

bar = foo:languages

Colons are a compromise, but they are used to separate namespaces in many places: C++, C#, XML, etc.

@michaelficarra

: will always be associated with cons for many, many functional programmers. I suggest staying away from that operator. :: looks fine to me. Isn't that used for C++ namespaces? I'm not convinced yet that . is a bad choice, either.

@otac0n

otac0n commented Mar 1, 2013

. can't be used without a breaking change. It would be ambiguous in the language.

:: is used in C++ for namespaces, and in C# for namespace prefixes (global::System, for example).

@andreineculau
Contributor

I was thinking of a quick workaround for this, to solve simple inheritance only: glue .pegjs files together while keeping everything namespaced.

This might make grammars too verbose, and it involves a build step, but on the bright side it would force you to have granular DRY&OTW grammars.

As for the markup, I'm not saying this is a proper fit for this thread, just an option to consider: I was going for a simple __ separator.

languages = static__languages / dynamic__languages
<static-languages.pegjs>
<dynamic-languages.pegjs>
/* alternative */
languages = STATIC__languages / DYNAMIC__languages
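
A build step of this kind could look roughly like the following. This is a deliberately naive sketch, not the code in any published tool: `namespaceGrammar` is a hypothetical helper, it assumes every rule definition starts on its own line, and it renames by whole-word matching, so a rule name that also appears inside a string literal or an action block would be renamed by mistake.

```javascript
// Prefix every rule defined in `source` (and every reference to it)
// with `<namespace>__`. Naive: works on text, not on a parsed grammar.
function namespaceGrammar(source, namespace) {
  // Collect the names on the left-hand side of each `name = ...` definition.
  const names = [...source.matchAll(/^\s*([A-Za-z_]\w*)\s*=/gm)].map((m) => m[1]);
  if (names.length === 0) return source;

  // Rename whole-word occurrences of those names in a single pass.
  const pattern = new RegExp(`\\b(${names.join("|")})\\b`, "g");
  return source.replace(pattern, `${namespace}__$1`);
}

// Glue two namespaced grammars under a top-level rule.
const staticSource = 'languages = "C" / "C++" / "Java" / "C#"';
const dynamicSource = 'languages = "Ruby" / "Python" / "JavaScript"';

const combined = [
  "languages = static__languages / dynamic__languages",
  namespaceGrammar(staticSource, "static"),
  namespaceGrammar(dynamicSource, "dynamic"),
].join("\n");

console.log(combined);
// languages = static__languages / dynamic__languages
// static__languages = "C" / "C++" / "Java" / "C#"
// dynamic__languages = "Ruby" / "Python" / "JavaScript"
```

The combined text is an ordinary single grammar, so it could be passed to the parser generator unchanged; the cost is the extra build step mentioned above.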

@odonnell

@andreineculau I'm basically already doing this with a build step, so if you and others are just looking for something to generate useful parsers from a grammar with a dependency tree (where a single parser implementing the combined grammar is generated), I might clean what I have up and release it so the discussion can refocus on how to deal with this in a more permanent way.

Another thing: approaching this primarily by designing extensions to the grammar syntax misses something important, which is that one of the main reasons we all have the itch to pull in rules from other grammars (another being clarity) is the need to write parsers that share a lot of logic. So, while generated parsers might never be meaningfully re-composable at parse-time, it seems important that a tree of grammars generate a tree of parsers, rather than one monolithic parser. It's most important when a set of parsers will be part of a web UI, but it generally doesn't hurt to avoid unnecessary bloat in generated code.

@andreineculau
Contributor

@odonnell +1 for releasing anything - no matter if you have the time to clean it up

and +1 for the clarification. This should be treated as a quick workaround, not a long-term proper solution.

@andreineculau
Contributor

@odonnell my take on it is online at https://github.com/andreineculau/core-pegjs - please poke me if you have something better.

@cpettitt

cpettitt commented Sep 3, 2013

+1 for this feature

@ne-sachirou

👍

@adammichalik

👍

@goldibex

👍

@goldibex

I went and wrote a plugin/extension for PEG.js that does imports: https://github.com/casetext/pegjs-import.

@yinso

yinso commented Oct 23, 2014

+1 for this as well.

@Mingun
Contributor

Mingun commented Feb 3, 2015

I implemented this in #308 in a generic way: including a grammar is only one way to implement rule decomposition.

@dmajda changed the title from "Import/Include other grammars" to "Import/include other grammars" on Aug 14, 2015
Mingun added a commit to Mingun/pegjs that referenced this issue Dec 27, 2015
… grammar and compiler. Import clause syntax:

```
@<alias> = <string with path to .pegjs file>
```
Import clauses are expected before the initializer code block.

Usage of imported rules:
- @<alias> -- use the default parse rule (implementation-defined; for example, the first rule in the imported grammar).
- @<alias>:<rule> -- use the specified parse rule.

Implementation:
- All import clauses appear in the AST `grammar` node, in its `imports` property.
  This property contains an array of AST `import` nodes with the properties `alias` and `path`.
- All AST `rule_ref` nodes now have a `namespace` property holding the alias of the imported
  grammar, or `null` if the rule is defined in the grammar itself. Also, if the `name` property is
  `null` and the `namespace` property is not, the default rule of the imported grammar is used.
- Support in the compiler -- skip some checks for references to imported rules.
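
To make the AST description concrete, here is a hand-written sketch of what such a tree might look like, plus a reference check that treats imported rules differently from local ones. Only the `imports`, `alias`, `path`, `namespace` and `name` properties come from the commit message; the example grammar, the other node shapes and the `collectRuleRefs` and `checkReferences` helpers are assumptions made for illustration, not code from the commit.

```javascript
// Hypothetical AST for the grammar:
//   @expr = "./expression.pegjs"
//   start = @expr:expression / @expr / local
//   local = "x"
const ast = {
  type: "grammar",
  imports: [{ type: "import", alias: "expr", path: "./expression.pegjs" }],
  rules: [
    {
      type: "rule",
      name: "start",
      expression: {
        type: "choice",
        alternatives: [
          { type: "rule_ref", namespace: "expr", name: "expression" },
          { type: "rule_ref", namespace: "expr", name: null }, // default rule
          { type: "rule_ref", namespace: null, name: "local" },
        ],
      },
    },
    { type: "rule", name: "local", expression: { type: "literal", value: "x" } },
  ],
};

// Walk any node (or array of nodes) and collect every rule_ref.
function collectRuleRefs(node, out = []) {
  if (Array.isArray(node)) {
    node.forEach((child) => collectRuleRefs(child, out));
  } else if (node && typeof node === "object") {
    if (node.type === "rule_ref") out.push(node);
    Object.values(node).forEach((value) => collectRuleRefs(value, out));
  }
  return out;
}

// Local references must name a rule of this grammar. Imported references
// are only checked against the known aliases, since the imported grammar's
// rules are not available here.
function checkReferences(grammar) {
  const localRules = new Set(grammar.rules.map((rule) => rule.name));
  const aliases = new Set(grammar.imports.map((imp) => imp.alias));
  const errors = [];
  for (const ref of collectRuleRefs(grammar.rules)) {
    if (ref.namespace === null) {
      if (!localRules.has(ref.name)) errors.push(`Undefined rule "${ref.name}"`);
    } else if (!aliases.has(ref.namespace)) {
      errors.push(`Unknown import alias "${ref.namespace}"`);
    }
  }
  return errors;
}

console.log(checkReferences(ast)); // []
```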
Mingun added a commit to Mingun/pegjs that referenced this issue Jan 3, 2016
@dmsnell

dmsnell commented Dec 28, 2016

@dmajda I'm coming late to this party, but I wonder how often we need to import many rules from another library. I would love to be able to import things like Url and Email into my composed grammars but I don't care that Url may also have things like HierarchicalPart and AsciiLetter. Do you think something like Node's named exports would be a viable way forward, keeping the benefits of namespacing but allowing direct named imports?

import { SchemalessUrl, Url } from "./Urls.pegjs"

Token
  = PhoneNumber
  / Url
  / SchemalessUrl

Namespacing has been an issue for me as I try and explore writing otherwise-composable grammars. I'm stuck right now including files in files and naming things the way PHP functions were named before they introduced proper namespaces: UrlIpHost, HtmlQuotedString, etc…

@eraxillan

@dmajda @futagoza

Any progress on this issue? Or is the primary discussion now living in #473?
My grammar file is growing very fast :(
It would be nice to split it into several files.

@mikeaustin

I wouldn't mind being able to split grammars between files, simply for organization and composition. It would make them easier to test and reuse, and might also provide a way to swap grammars dynamically. Just some thoughts.

The JavaScript example that I used as a base is over 1,300 lines. It took a while to learn where everything was, and jump around and edit different sections.

@eraxillan

@mikeaustin I see this feature as some kind of Node.js require:

cat bash.pegjs
{
const _ = require("whitespace");
const LB = require("line_break");
const CodeBlock = require("code_block");
const BoolExpr = require("boolean_expression");
}
...
IfStatement = "if" _ "[" BoolExpr "]" _ ";" _ "then" LB? CodeBlock "fi"

@jodevsa

jodevsa commented Jun 5, 2017

I agree, splitting grammars and making them modular is a great feature. However, handling these cases would be a problem:
1. What if a sub-grammar relies on a global variable that was defined in the main grammar's code?
2. What about duplicate variable and rule names?

IMO, a convenient temporary approach would be to create a new add-on for PEG.js (independent of PEG.js) that defines a keyword for importing, for example @load(anotherGrammarFileLocation). The keyword should not be part of the JavaScript/PEG.js grammar.
Build a regexp or a PEG grammar to detect that keyword, substitute it with the content of the file at anotherGrammarFileLocation, and send the substituted code to PEG.js.

Example:

integers.pegjs

integers = [0-9]+ { return parseInt(text(), 10); }

main.pegjs
arrayOfInteger = "[" (integers ",")* integers "]"
@load("integers.pegjs")

Note that with this method, if someone does not define the start rule and places @load before arrayOfInteger, PEG.js will take the first rule as the start rule (here, integers).

One way to handle this is to give the start rule the same name as the file and let the add-on set the start rule from the file name, or to substitute all loaded content at the end of the file.

The user should be responsible for any duplication.
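
The substitution approach, with loaded content moved to the end of the file, might be sketched like this. No such add-on is being quoted here: `expandLoads` and the in-memory `files` table are hypothetical, and the table stands in for reading files from disk so that the example runs on its own.

```javascript
// Hypothetical in-memory "file system" holding the two grammars.
const files = {
  "integers.pegjs": "integers = [0-9]+ { return parseInt(text(), 10); }",
  "main.pegjs": [
    '@load("integers.pegjs")',
    'arrayOfInteger = "[" (integers ",")* integers "]"',
  ].join("\n"),
};

// Expand every @load("...") line in `name`. Loaded grammars are appended
// after the file's own rules, so the file's first rule stays first and
// remains the default start rule. Each file is included at most once,
// which also prevents infinite recursion on circular loads.
function expandLoads(name, seen = new Set()) {
  if (seen.has(name)) return "";
  if (!(name in files)) throw new Error(`Unknown grammar file: ${name}`);
  seen.add(name);

  const loaded = [];
  const ownLines = files[name].split("\n").filter((line) => {
    const match = line.match(/^\s*@load\("([^"]+)"\)\s*$/);
    if (match) loaded.push(expandLoads(match[1], seen));
    return !match; // drop the directive line itself
  });

  return [ownLines.join("\n"), ...loaded].filter(Boolean).join("\n");
}

console.log(expandLoads("main.pegjs"));
// arrayOfInteger = "[" (integers ",")* integers "]"
// integers = [0-9]+ { return parseInt(text(), 10); }
```

The expanded text is what would be handed to PEG.js. Duplicate rule names across files are not detected, in line with the point above that the user is responsible for them.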

@andreineculau
Contributor

I just want to highlight that this issue is primarily an optimization request, because composability/modularity is something that you can achieve on your own, especially when you control the full spectrum of the grammar.

If you're not comfortable with a grammar 1k-lines long, then split it up, and concatenate it back as you see fit before pumping it into pegjs.

Mingun added a commit to Mingun/pegjs that referenced this issue Jun 12, 2017
Mingun added a commit to Mingun/pegjs that referenced this issue Oct 10, 2017
Mingun added a commit to Mingun/pegjs that referenced this issue Oct 14, 2017
@futagoza removed this from the post-1.0.0 milestone on Nov 29, 2017
Mingun added a commit to Mingun/pegjs that referenced this issue Jan 22, 2018
Mingun added a commit to Mingun/pegjs that referenced this issue Jan 22, 2018
…mar to another.

Conflicts:
	lib/compiler/index.js
	package.json
Mingun added a commit to Mingun/pegjs that referenced this issue Nov 24, 2019
Mingun added a commit to Mingun/peggy that referenced this issue Jun 11, 2022
… in the compiler:

```
import { rule1, rule2 as alias2, ..., ruleN } from '<string with path to the .peggy file>';
```

Import clauses are expected before the (top) initializer code block.

All import clauses appear in the `grammar` AST node, in its `imports` property.
This property contains an array of AST `import` nodes with the `rules` and `path` properties.
`rules` contains an array of `imported_rule` AST nodes.
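
As an illustration of the node shape described here, the sketch below turns one import clause into an `import` node. It is a regular-expression approximation, not the grammar change made in the commit: `parseImportClause` is a hypothetical helper, and the `name` and `alias` fields on each `imported_rule` node are assumed, since the message does not list that node's properties.

```javascript
// Parse `import { a, b as c } from '<path>';` into an AST-like object.
function parseImportClause(clause) {
  const match = clause.match(
    /^\s*import\s*\{([^}]*)\}\s*from\s*(['"])(.*?)\2\s*;?\s*$/
  );
  if (!match) throw new SyntaxError(`Not an import clause: ${clause}`);

  const identifier = /^[A-Za-z_]\w*$/;
  const rules = match[1]
    .split(",")
    .map((specifier) => specifier.trim())
    .filter(Boolean)
    .map((specifier) => {
      // "rule2 as alias2" -> ["rule2", "alias2"]; "rule1" -> ["rule1"]
      const parts = specifier.split(/\s+as\s+/);
      if (parts.length > 2 || !parts.every((part) => identifier.test(part))) {
        throw new SyntaxError(`Bad import specifier: ${specifier}`);
      }
      const [name, alias = name] = parts;
      return { type: "imported_rule", name, alias };
    });

  return { type: "import", rules, path: match[3] };
}

const node = parseImportClause(
  "import { rule1, rule2 as alias2 } from './other.peggy';"
);
console.log(node.path); // ./other.peggy
console.log(node.rules.map((rule) => rule.alias)); // [ 'rule1', 'alias2' ]
```

The same helper accepts the named-import form suggested earlier in the thread, such as `import { SchemalessUrl, Url } from "./Urls.pegjs"`, since the trailing semicolon is optional in the pattern.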
Mingun added a commit to Mingun/peggy that referenced this issue Feb 22, 2023