Import support #50

ccleve · 2021-10-13T17:08:46Z

Antlr supports "import" statements, where you can import an external file and treat its contents as if they were part of the current file. I use this a lot to create multiple parsers with different capabilities that share common functionality.

See https://github.com/antlr/antlr4/blob/master/doc/grammars.md#grammar-imports

Does PackCC have similar functionality?

arithy · 2021-10-16T02:31:26Z

PackCC does not have functionality like "import".
Thanks for the suggestion.

I understand its usefulness, but for now I'm reluctant to implement it because it would compromise PackCC's simplicity.
Let me just add this issue to the wish list for the time being.
If several users strongly request it, I'll begin to implement it.

dolik-rce · 2021-10-16T07:05:11Z

I believe import functionality could be implemented quite easily with a custom PCC_GETCHAR. It would just have to check whether the next few characters are an import statement and, if so, start reading another file. At the same time, it would have to keep track of the "stack" in the *auxil structure, so it knows which file to return to when the current one has been parsed.

I don't think support for this should be directly in PackCC, as it only works with a stream of characters. It doesn't know anything about files. At best, there could be some function to help with the stack keeping and/or some examples of how to implement this.
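The stack keeping described above can be sketched in a few lines of C. This is only an illustration under stated assumptions: `source_stack_t`, `push_source`, and `next_char` are hypothetical names, the sources are in-memory strings rather than files, and detecting the import statement itself is left to the caller.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 8 /* hypothetical limit on nested sources */

/* Hypothetical structure that could live inside the user's auxil data. */
typedef struct {
    const char *text[MAX_DEPTH]; /* unread remainder of each open source */
    int depth;                   /* number of sources currently open */
} source_stack_t;

/* Opens a nested source; later reads come from it until it is exhausted.
 * Returns 0 on success, -1 if the stack is full. */
int push_source(source_stack_t *s, const char *text) {
    assert(s != NULL && text != NULL);
    if (s->depth >= MAX_DEPTH) return -1;
    s->text[s->depth++] = text;
    return 0;
}

/* Returns the next character from the innermost source. When a source is
 * exhausted it is popped and reading resumes in the source that opened it.
 * Returns -1 once every source has been consumed. */
int next_char(source_stack_t *s) {
    assert(s != NULL);
    while (s->depth > 0) {
        const char **top = &s->text[s->depth - 1];
        if (**top != '\0') return (unsigned char)*(*top)++;
        s->depth--; /* innermost source finished: return to its parent */
    }
    return -1;
}
```

Assuming the auxil pointer refers to such a stack, a parser could then define `PCC_GETCHAR(auxil)` as `next_char(auxil)` and call `push_source` whenever it recognizes an import statement in the input.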

ccleve · 2021-10-16T19:15:27Z

@dolik-rce This isn't a suggestion to import data being parsed. It's a suggestion to import specs into the parser itself while the parser is being generated. For example, imagine that we're parsing a query language that must handle standard keywords like AND, OR, and NOT, but must also have special handling for different languages like French or Arabic. In that case it would be really helpful to create a separate .peg file for every language, but then import a standard, common .peg file that recognizes keywords. It would save a lot of copying and pasting.

dolik-rce · 2021-10-17T17:52:41Z

@ccleve Oh, I see. My bad, I guess I wasn't paying enough attention when I read the issue 🙄

ethindp · 2024-04-01T17:47:38Z

Please please implement this. I'm writing a parser for a language that is case-insensitive and needs UCD categories. I've generated both the categories and permutations for keywords but this creates an unwieldy grammar file that's more than 15K LOC. (I'm uncertain how packcc is going to handle this but we'll see....) Also it might be a good idea to add a way of telling packcc "Hey this string literal should be matched case-insensitively" because I don't like generating thousands of word permutations even if it is fast.

dolik-rce · 2024-04-01T18:03:01Z

> Also it might be a good idea to add a way of telling packcc "Hey this string literal should be matched case-insensitively" because I don't like generating thousands of word permutations even if it is fast.

@ethindp: This is slightly off-topic, but did you know that you can use character classes to match keywords case-insensitively? E.g.: [kK] [eE] [yY] [wW] [oO] [rR] [dD] will match "keyword", "Keyword", "KEYWORD" as well as "kEyWoRd" and all other weird permutations.
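If the grammar is generated anyway, the character-class form can be produced mechanically instead of listing permutations. A minimal sketch in C, where `ci_pattern` is a hypothetical helper (not part of PackCC) that accepts only ASCII letters, digits, and underscores:

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Writes a pattern such as "[kK] [eE] [yY]" for `keyword` into `out`.
 * Letters become two-character classes; digits and '_' become
 * one-character classes. Returns the pattern length, or -1 if the keyword
 * contains any other character or the buffer is too small. */
int ci_pattern(const char *keyword, char *out, size_t out_size) {
    size_t used = 0;
    assert(keyword != NULL && out != NULL);
    if (out_size == 0) return -1;
    for (const char *p = keyword; *p != '\0'; p++) {
        unsigned char c = (unsigned char)*p;
        char item[8];
        int n;
        if (isalpha(c)) {
            n = snprintf(item, sizeof item, "[%c%c]", tolower(c), toupper(c));
        } else if (isdigit(c) || c == '_') {
            n = snprintf(item, sizeof item, "[%c]", c);
        } else {
            return -1; /* characters needing escapes are out of scope here */
        }
        /* One extra byte for the separating space, except before the first item. */
        size_t need = (size_t)n + (used > 0 ? 1 : 0);
        if (used + need + 1 > out_size) return -1;
        if (used > 0) out[used++] = ' ';
        memcpy(out + used, item, (size_t)n);
        used += (size_t)n;
    }
    out[used] = '\0';
    return (int)used;
}
```

Calling `ci_pattern("keyword", buf, sizeof buf)` with a sufficiently large `buf` fills it with the pattern shown above, which can then be written into the generated grammar.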

ethindp · 2024-04-01T20:47:51Z

I... Stupidly didn't think about that, thank you for the reminder!

arithy · 2024-04-21T14:16:44Z

@ccleve, @ethindp, I have introduced the import functionality. Please check it.

%import "import file name"

The content of the specified import file is expanded at the text location of %import (version 2.0.0 or later).
This can be used multiple times anywhere and can also be used in imported files.
The import file name can be a relative path to the current directory or an absolute path.
If it is a relative path, the directories listed below are searched for the import file in the listed order.

1. the directory where the file that imports the import file is located
2. the directories specified with -I options
   - They are prioritized in order of their appearance in the command line.
3. the directories specified by the environment variable PCC_IMPORT_PATH
   - They are prioritized in order of their appearance in the value of this variable.
   - The character used as a delimiter between directory names is the colon ':' if PackCC is built for a Unix-like platform such as Linux, macOS, and MinGW.
     The character is the semicolon ';' if PackCC is built as a native Windows executable.
     (This is exactly the same manner as the environment variable PATH.)
4. the per-user default directory
   - This is the subdirectory .packcc/import in the home directory if PackCC is built for a Unix-like platform,
     and in the user profile directory, "C:\Users\username" for example, if PackCC is built as a native Windows executable.
5. the system-wide default directory
   - This is the directory /usr/share/packcc/import if PackCC is built for a Unix-like platform,
     and is the subdirectory packcc/import in the common application data directory, "C:\ProgramData" for example, if PackCC is built as a native Windows executable.

Note that a file that has already been imported is silently ignored if an attempt is made to import it again.
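To make the search order for relative paths concrete, here is a minimal sketch in C that builds the ordered list of candidate directories. It is not PackCC's actual implementation: `search_list_t`, `push_dir`, and `build_search_list` are hypothetical names, the fixed limits are arbitrary, and skipping empty entries in the path variable is a simplification of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DIRS 16  /* arbitrary limits chosen for this sketch */
#define MAX_LEN 256

typedef struct {
    char dirs[MAX_DIRS][MAX_LEN]; /* candidate directories, highest priority first */
    int count;
} search_list_t;

/* Appends the first `len` bytes of `dir`; empty or oversized entries are skipped. */
static void push_dir(search_list_t *list, const char *dir, size_t len) {
    if (len == 0 || len >= MAX_LEN || list->count >= MAX_DIRS) return;
    memcpy(list->dirs[list->count], dir, len);
    list->dirs[list->count][len] = '\0';
    list->count++;
}

/* Builds the list in the documented order. `env_path` stands for the value
 * of PCC_IMPORT_PATH and `delim` for ':' or ';' depending on the platform. */
void build_search_list(search_list_t *list, const char *importer_dir,
                       const char *const *include_dirs, int include_count,
                       const char *env_path, char delim,
                       const char *user_dir, const char *system_dir) {
    assert(list != NULL && delim != '\0');
    list->count = 0;
    /* 1. directory of the file that contains the %import */
    if (importer_dir) push_dir(list, importer_dir, strlen(importer_dir));
    /* 2. -I directories, in command-line order */
    for (int i = 0; i < include_count; i++)
        push_dir(list, include_dirs[i], strlen(include_dirs[i]));
    /* 3. entries of the path variable, in order of appearance */
    if (env_path) {
        const char *start = env_path;
        for (;;) {
            const char *end = strchr(start, delim);
            size_t len = end ? (size_t)(end - start) : strlen(start);
            push_dir(list, start, len);
            if (!end) break;
            start = end + 1;
        }
    }
    /* 4. per-user default directory, 5. system-wide default directory */
    if (user_dir) push_dir(list, user_dir, strlen(user_dir));
    if (system_dir) push_dir(list, system_dir, strlen(system_dir));
}
```

A resolver would then try each entry of `dirs` in turn, joined with the relative import file name, and use the first file that exists.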

dolik-rce · 2024-04-21T17:13:40Z

I have just quickly tested the imports. It's very intuitive and seems to work very well. If I understand correctly, there is a slight difference in the behavior: unused rules in the imported files are ignored, while in the main parsed file they result in an error.

This is totally understandable and good if you wish to use a prepared library (as is the case with the bundled ascii and unicode classes), but it might be surprising if someone uses import just to split a single grammar into multiple smaller files for better readability. It should probably be documented in the README.

It might also make sense to let the user choose whether to check for unused rules in imported files (e.g. by using a different directive). Not sure if that is possible to implement easily. I didn't study the new code enough to understand it yet 🙂

EDIT: Oh, now I see. The warning for unused rules has been removed in 7b4aa25 for all rules, not just those from imported files. That is a bit surprising, I liked that feature - it usually made me realize that I had made some stupid error in my grammar 🙂

arithy · 2024-04-21T23:27:31Z

Thank you for your immediate feedback!

arithy added the enhancement (New feature or request) label Oct 16, 2021

arithy mentioned this issue Apr 14, 2024

Consider add support for UCD(Unicode Character Database) rule pattern #68

Closed

arithy closed this as completed May 5, 2024
