Import support #50

Closed
ccleve opened this issue Oct 13, 2021 · 10 comments
Labels
enhancement New feature or request

Comments

@ccleve

ccleve commented Oct 13, 2021

Antlr supports "import" statements, where you can import an external file and treat its contents as if they were part of the current file. I use this a lot to create multiple parsers with different capabilities that share common functionality.

See https://github.com/antlr/antlr4/blob/master/doc/grammars.md#grammar-imports

Does PackCC have similar functionality?

@arithy
Owner

arithy commented Oct 16, 2021

PackCC does not have a functionality like "import".
Thanks for the suggestion.

I understand its usefulness, but I'm currently reluctant to implement it, since it would add complexity to PackCC's simplicity.
Let me just add this issue to the wish list for the time being.
If several users strongly request it, I'll begin to implement it.

@arithy arithy added the enhancement New feature or request label Oct 16, 2021
@dolik-rce
Contributor

I believe import functionality could be implemented quite easily with a custom PCC_GETCHAR. It would just have to check whether the next few characters are an import statement and, if so, start reading another file. At the same time, it would have to keep track of the "stack" in the *auxil structure, so it knows which file to return to when the current one has been parsed.

I don't think support for this should be built directly into PackCC, as it only works with a stream of characters. It doesn't know anything about files. At best, there could be some function to help with keeping the stack and/or some examples of how to implement this.
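
The stack-keeping part of this idea might look roughly like the sketch below. It is only an illustration under stated assumptions: `source_t`, `input_stack_t`, `input_push` and `input_getchar` are hypothetical names, the sources are in-memory strings instead of files to keep the example self-contained, and the detection of the import statement itself is left out.

```c
#include <stddef.h>

#define MAX_DEPTH 8

/* One input source; here an in-memory string, where a real reader would wrap a FILE*. */
typedef struct {
    const char *text;
    size_t pos;
} source_t;

/* Hypothetical auxiliary structure holding the stack of active sources. */
typedef struct {
    source_t stack[MAX_DEPTH];
    int depth; /* number of sources currently on the stack */
} input_stack_t;

/* Push a new source on top of the stack; returns 0 on success, -1 if the stack is full. */
static int input_push(input_stack_t *in, const char *text) {
    if (in->depth >= MAX_DEPTH) return -1;
    in->stack[in->depth].text = text;
    in->stack[in->depth].pos = 0;
    in->depth++;
    return 0;
}

/* Return the next character from the topmost source.
   Exhausted sources are popped so reading resumes in the source below.
   Returns -1 once every source has been consumed. */
static int input_getchar(input_stack_t *in) {
    while (in->depth > 0) {
        source_t *top = &in->stack[in->depth - 1];
        char c = top->text[top->pos];
        if (c != '\0') {
            top->pos++;
            return (unsigned char)c;
        }
        in->depth--; /* current source finished: go back to the previous one */
    }
    return -1;
}
```

With an `input_stack_t` passed as the auxil value, a function like this could be wired in through `#define PCC_GETCHAR(auxil) input_getchar(auxil)`.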

@ccleve
Author

ccleve commented Oct 16, 2021

@dolik-rce This isn't a suggestion to import data being parsed. It's a suggestion to import specs into the parser itself while the parser is being generated. For example, imagine that we're parsing a query language that must handle standard keywords like AND, OR, and NOT, but must also have special handling for different languages like French or Arabic. In that case it would be really helpful to create a separate .peg file for every language, but then import a standard, common .peg file that recognizes keywords. It would save a lot of copying and pasting.

@dolik-rce
Contributor

@ccleve Oh, I see. My bad, I guess I wasn't paying enough attention when I read the issue 🙄

@ethindp

ethindp commented Apr 1, 2024

Please, please implement this. I'm writing a parser for a language that is case-insensitive and needs UCD categories. I've generated both the categories and the keyword permutations, but this creates an unwieldy grammar file of more than 15K LOC. (I'm uncertain how PackCC is going to handle this, but we'll see...) Also, it might be a good idea to add a way of telling PackCC "Hey, this string literal should be matched case-insensitively", because I don't like generating thousands of word permutations, even if it is fast.

@dolik-rce
Contributor

> Also it might be a good idea to add a way of telling PackCC "Hey this string literal should be matched case-insensitively" because I don't like generating thousands of word permutations even if it is fast.

@ethindp: This is slightly off-topic, but did you know that you can use character classes to match keywords case-insensitively? E.g.: [kK] [eE] [yY] [wW] [oO] [rR] [dD] will match "keyword", "Keyword", "KEYWORD" as well as "kEyWoRd" and all other weird permutations.
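
For a grammar with many keywords, patterns like this could be generated instead of typed by hand. The sketch below is only one possible helper, not part of PackCC: `ci_pattern` is a hypothetical name, and it assumes keywords made of ASCII letters only, rejecting anything else.

```c
#include <ctype.h>
#include <stdio.h>
#include <stddef.h>

/* Write a case-insensitive character-class pattern for `word` into `out`,
   e.g. "key" becomes "[kK] [eE] [yY]".
   Returns 0 on success, -1 if `word` contains a non-letter or `out` is too small. */
static int ci_pattern(const char *word, char *out, size_t out_size) {
    size_t used = 0;
    if (out_size == 0) return -1;
    out[0] = '\0';
    for (size_t i = 0; word[i] != '\0'; i++) {
        unsigned char c = (unsigned char)word[i];
        if (!isalpha(c)) return -1; /* assumption: letters only (default "C" locale) */
        /* Append "[xX]", separated from the previous class by a space. */
        int n = snprintf(out + used, out_size - used, "%s[%c%c]",
                         i > 0 ? " " : "", tolower(c), toupper(c));
        if (n < 0 || (size_t)n >= out_size - used) return -1; /* would not fit */
        used += (size_t)n;
    }
    return 0;
}
```

Calling `ci_pattern("keyword", buf, sizeof buf)` with a large enough buffer fills it with the same pattern as in the comment above.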

@ethindp

ethindp commented Apr 1, 2024

I... Stupidly didn't think about that, thank you for the reminder!

@arithy
Owner

arithy commented Apr 21, 2024

@ccleve, @ethindp, I have introduced the import functionality. Please check it.

%import "import file name"

The content of the specified import file is expanded at the text location of %import (version 2.0.0 or later).
This can be used multiple times anywhere and can be used also in imported files.
The import file name can be a relative path to the current directory or an absolute path.
If it is a relative path, the directories listed below are searched for the import file in the listed order.

  1. the directory where the file that imports the import file is located
  2. the directories specified with -I options
    • They are prioritized in order of their appearance in the command line.
  3. the directories specified by the environment variable PCC_IMPORT_PATH
    • They are prioritized in order of their appearance in the value of this variable.
    • The character used as a delimiter between directory names is the colon ':' if PackCC is built for a Unix-like platform such as Linux, macOS, and MinGW.
      The character is the semicolon ';' if PackCC is built as a native Windows executable.
      (This is exactly the same manner as the environment variable PATH.)
  4. the per-user default directory
    • This is the subdirectory .packcc/import in the home directory if PackCC is built for a Unix-like platform,
      and in the user profile directory, "C:\Users\username" for example, if PackCC is built as a native Windows executable.
  5. the system-wide default directory
    • This is the directory /usr/share/packcc/import if PackCC is built for a Unix-like platform,
      and is the subdirectory packcc/import in the common application data directory, "C:\ProgramData" for example, if PackCC is built as a native Windows executable.

Note that the file imported once is silently ignored when it is attempted to be imported again.
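
The first three steps of the search order above can be sketched as a function that lists candidate paths in priority order. This is not PackCC's actual implementation: `candidates_t`, `add_candidate` and `list_candidates` are hypothetical names, the per-user and system-wide defaults (steps 4 and 5) are omitted because they are platform-specific, and '/' is assumed as the path separator.

```c
#include <stdio.h>
#include <string.h>

#define MAX_CANDIDATES 16
#define MAX_PATH_LEN 256

/* Ordered list of paths to try for one relative import file name. */
typedef struct {
    char paths[MAX_CANDIDATES][MAX_PATH_LEN];
    int count;
} candidates_t;

/* Append "<dir>/<name>"; empty directories and overflow are skipped,
   and overlong paths are silently truncated by snprintf. */
static void add_candidate(candidates_t *c, const char *dir, size_t dir_len,
                          const char *name) {
    if (c->count >= MAX_CANDIDATES || dir_len == 0) return;
    snprintf(c->paths[c->count], MAX_PATH_LEN, "%.*s/%s", (int)dir_len, dir, name);
    c->count++;
}

/* Build the candidate list for steps 1 to 3 of the search order.
   `import_path` stands for the value of PCC_IMPORT_PATH (may be NULL) and
   `delim` for its delimiter, ':' on Unix-like builds and ';' on native Windows. */
static void list_candidates(candidates_t *c, const char *name,
                            const char *importer_dir,
                            const char *const *include_dirs, int n_include,
                            const char *import_path, char delim) {
    c->count = 0;
    /* 1. the directory of the importing file */
    add_candidate(c, importer_dir, strlen(importer_dir), name);
    /* 2. the -I directories, in command-line order */
    for (int i = 0; i < n_include; i++)
        add_candidate(c, include_dirs[i], strlen(include_dirs[i]), name);
    /* 3. the PCC_IMPORT_PATH entries, in order of appearance */
    if (import_path != NULL) {
        const char *p = import_path;
        for (;;) {
            const char *end = strchr(p, delim);
            size_t len = end ? (size_t)(end - p) : strlen(p);
            add_candidate(c, p, len, name);
            if (!end) break;
            p = end + 1;
        }
    }
}
```

A caller would then try each entry of `paths` in order and open the first file that exists.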

@dolik-rce
Contributor

dolik-rce commented Apr 21, 2024

I have just quickly tested the imports. It's very intuitive and seems to work very well. If I understand correctly, there is a slight difference in behavior: unused rules in imported files are ignored, while in the main parsed file they result in an error.

This is totally understandable and good if you wish to use a prepared library (as is the case with the bundled ascii and unicode classes), but it might be surprising if someone uses import just to break a single grammar into multiple smaller files for better readability. It should probably be documented in the README.

It might also make sense to let the user choose whether to check for unused rules in imported files (e.g. by using a different directive). Not sure if that is possible to implement easily. I didn't study the new code enough to understand it yet 🙂

EDIT: Oh, now I see. The warning for unused rules has been removed in 7b4aa25 for all rules, not just those from imported files. That is a bit surprising; I liked that feature, as it usually made me realize that I had made some stupid error in my grammar 🙂

@arithy
Owner

arithy commented Apr 21, 2024

Thank you for your immediate feedback!

@arithy arithy closed this as completed May 5, 2024
4 participants