
Adding support for operators #9

Open
Rua opened this issue Feb 24, 2021 · 5 comments

@Rua

Rua commented Feb 24, 2021

I am thinking of making a PR to add operator support to shlex. It would follow the same model as Python's shlex, where the caller specifies which characters should be treated as operators. The lexer then splits unquoted text so that each word consists either entirely of operator characters or entirely of non-operator characters (inside quotes the two can still be mixed). No interpretation of the operators is done; they simply form a separate set of words.
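A minimal Rust sketch of that model, for illustration only. `split_with_operators` is a hypothetical function, not part of the shlex crate, and it deliberately simplifies the real lexer: it handles whitespace plus single and double quotes but ignores backslash escapes and comments.

```rust
/// Hypothetical helper (not the shlex API): splits `input` into words, treating
/// every char in `operators` as an operator character. Returns None on an
/// unterminated quote. Backslash escapes are NOT handled in this sketch.
fn split_with_operators(input: &str, operators: &[char]) -> Option<Vec<String>> {
    let mut words = Vec::new();
    // The word being built, plus whether it is a run of operator characters.
    let mut current: Option<(String, bool)> = None;
    let mut chars = input.chars();

    while let Some(c) = chars.next() {
        if c.is_whitespace() {
            // Unquoted whitespace ends the current word.
            if let Some((word, _)) = current.take() {
                words.push(word);
            }
        } else if c == '"' || c == '\'' {
            // Quoted text always belongs to a non-operator word, so an
            // operator run in progress ends here.
            if matches!(current, Some((_, true))) {
                words.push(current.take().unwrap().0);
            }
            let word = &mut current.get_or_insert_with(|| (String::new(), false)).0;
            // Inside quotes, operator and non-operator chars mix freely.
            loop {
                match chars.next() {
                    Some(q) if q == c => break,
                    Some(other) => word.push(other),
                    None => return None, // unterminated quote
                }
            }
        } else {
            let is_op = operators.contains(&c);
            // Switching between an operator run and a non-operator run starts a new word.
            if matches!(&current, Some((_, kind)) if *kind != is_op) {
                words.push(current.take().unwrap().0);
            }
            current.get_or_insert_with(|| (String::new(), is_op)).0.push(c);
        }
    }
    if let Some((word, _)) = current {
        words.push(word);
    }
    Some(words)
}
```

With `&[';']` as the operator set, `foo;bar "foo;bar"` lexes to `foo`, `;`, `bar`, `foo;bar`, and `echo a&&b` with `&['&']` gives `echo`, `a`, `&&`, `b`, because consecutive operator characters stay together as one run.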

However, the current optimisation in shlex, which iterates over the input string byte by byte instead of codepoint by codepoint, gets in the way of this. To support any Unicode codepoint as an operator, the lexer has to receive each potentially multibyte character in one piece, not as individual UTF-8 high bytes. Alternatively, the caller could be asked to specify operator characters as bytes, but this brings its own safety problems: what if the user specifies a high byte as an operator character?
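To make the high-byte problem concrete, here is a small Rust sketch. `split_on_byte` is a hypothetical stand-in for a byte-oriented lexer loop, not shlex's actual code.

```rust
/// Hypothetical byte-oriented splitter (not shlex's real code): cuts `input`
/// at every occurrence of `op_byte`. Pieces are returned as raw bytes because
/// a cut may land in the middle of a multibyte character.
fn split_on_byte(input: &str, op_byte: u8) -> Vec<&[u8]> {
    input.as_bytes().split(|&b| b == op_byte).collect()
}

/// Returns true if every piece is still valid UTF-8.
fn all_valid_utf8(pieces: &[&[u8]]) -> bool {
    pieces.iter().all(|p| std::str::from_utf8(p).is_ok())
}
```

With an ASCII operator such as `b';'`, every piece stays valid. A character like `'→'` is three bytes in UTF-8 (`E2 86 92`), so it cannot be named by a single `u8` at all. If a caller picks its leading byte `0xE2` as the "operator", splitting `"a→b"` leaves a piece that starts with the continuation byte `0x86`, which is no longer valid UTF-8.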

@fenhl
Collaborator

fenhl commented Feb 25, 2021

Maybe the best approach here would be to use a separate implementation without that optimization, but only in the case where multibyte characters are relevant.

@Rua
Author

Rua commented Feb 25, 2021

So alongside the current Shlex type, a new ShlexOperators type? It's doable, but I fear that would duplicate a lot of code, and make maintenance harder.

@fenhl
Collaborator

fenhl commented Feb 26, 2021

Could you tell me what the use-case for this is? Maybe we can come up with a better API then.

@Rua
Author

Rua commented Feb 26, 2021

In my case, I want the ; operator to be emitted as a separate token during lexing. Right now it is absorbed into adjacent words, so, for example, both unquoted foo;bar and quoted "foo;bar" are returned by Shlex as the single word foo;bar. In the shell, only the latter would be lexed as one token; the former yields three: foo, ;, and bar. Adding operator support to Shlex would allow more of the shell language to be handled and bring its behaviour closer to its Python namesake.

@fenhl
Collaborator

fenhl commented Apr 23, 2022

Sorry, I forgot about this. I think it's a good idea, though obviously the question of how to implement it is a concern.

> So alongside the current Shlex type, a new ShlexOperators type? It's doable, but I fear that would duplicate a lot of code, and make maintenance harder.

I was thinking of keeping a single public API but internally switching to a different implementation if, and only if, at least one of the operators is multibyte.
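A rough Rust sketch of that idea, assuming "multibyte" means any non-ASCII char (equivalent to `len_utf8() > 1`). `Backend`, `choose_backend` and `split_runs` are hypothetical names; the real Shlex internals would look different. The split only covers cutting an unquoted word into operator and non-operator runs.

```rust
/// Hypothetical: which internal lexer path to use for a given operator set.
#[derive(Debug, PartialEq)]
enum Backend {
    /// Every operator is ASCII, so the existing byte-wise loop still works.
    Bytes(Vec<u8>),
    /// At least one operator is multibyte, so iterate by char instead.
    Chars(Vec<char>),
}

/// Picks the byte path if and only if no operator is multibyte.
fn choose_backend(operators: &[char]) -> Backend {
    if operators.iter().all(|c| c.is_ascii()) {
        // ASCII chars fit in one byte, so this cast is lossless.
        Backend::Bytes(operators.iter().map(|&c| c as u8).collect())
    } else {
        Backend::Chars(operators.to_vec())
    }
}

/// Cuts `word` wherever the operator/non-operator class changes.
/// `classes` yields (byte offset, is_operator) for each unit of `word`.
fn cut_runs<'a>(word: &'a str, classes: impl Iterator<Item = (usize, bool)>) -> Vec<&'a str> {
    let mut runs = Vec::new();
    let mut start = 0;
    let mut prev: Option<bool> = None;
    for (i, is_op) in classes {
        if prev.map_or(false, |p| p != is_op) {
            runs.push(&word[start..i]);
            start = i;
        }
        prev = Some(is_op);
    }
    if !word.is_empty() {
        runs.push(&word[start..]);
    }
    runs
}

/// Same public behaviour on both paths; only the iteration unit differs.
fn split_runs<'a>(word: &'a str, backend: &Backend) -> Vec<&'a str> {
    match backend {
        // Slicing at these byte offsets is safe: a class change always sits
        // next to an ASCII operator byte, and ASCII bytes only occur on char boundaries.
        Backend::Bytes(ops) => cut_runs(word, word.bytes().enumerate().map(|(i, b)| (i, ops.contains(&b)))),
        Backend::Chars(ops) => cut_runs(word, word.char_indices().map(|(i, c)| (i, ops.contains(&c)))),
    }
}
```

Callers only ever see `split_runs`. `&[';']` takes the byte path even when the word itself contains non-ASCII text such as `é;x`, while `&['→']` falls back to iterating by char. This avoids maintaining a second public `ShlexOperators` type.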
