Adding support for operators #9
Comments
Maybe the best approach here would be to use a separate implementation without that optimization, but only in the case where multibyte characters are relevant.
So alongside the current
Could you tell me what the use-case for this is? Maybe we can come up with a better API then.
In my case, I want to include the
Sorry, I forgot about this. I think it's a good idea, though obviously the question of how to implement it is a concern.
I was thinking of keeping a single public API but internally switching to a different implementation if and only if at least one of the operators is multibyte.
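To make the "switch only if an operator is multibyte" idea concrete, here is a rough Python sketch of the dispatch check. The function names and the `"byte-wise"` / `"char-wise"` labels are hypothetical placeholders, not anything that exists in the library; the only real fact used is that a character is multibyte exactly when its UTF-8 encoding is longer than one byte.

```python
def needs_char_lexer(operators):
    """Return True if any operator character needs more than one UTF-8 byte."""
    return any(len(ch.encode("utf-8")) > 1 for ch in operators)


def pick_lexer(operators):
    # Hypothetical dispatch: the returned strings only label which internal
    # implementation a single public API might choose.
    return "char-wise" if needs_char_lexer(operators) else "byte-wise"


print(pick_lexer("|&;"))  # all ASCII operators, the byte loop is still safe
print(pick_lexer("|→"))   # '→' is three bytes in UTF-8, so fall back
```

With this shape, callers who only use ASCII operators would keep the existing fast path, and the slower path would only be paid for when it is actually needed.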
I am thinking of making a PR to add support for operators to shlex. This would follow the same model as Python's shlex, where the caller specifies which characters to treat as operators. The lexer then ensures that each unquoted word consists either entirely of operator characters or entirely of non-operator characters (inside a quoted string the two can still be mixed). The operators are not interpreted in any way; they simply form a separate set of words.
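For reference, the Python behaviour can be tried with the standard library's `punctuation_chars` argument (available since Python 3.6). This is only an illustration of the model being proposed, not a suggested API for this library; `split_with_operators` is a hypothetical helper name.

```python
import shlex


def split_with_operators(text, operators):
    """Split `text` shell-style, treating each char in `operators` as an operator.

    Runs of operator characters become their own words; quoted text is
    left alone even if it contains operator characters.
    """
    lexer = shlex.shlex(text, posix=True, punctuation_chars=operators)
    return list(lexer)


words = split_with_operators("echo a&&b | 'c|d'", "|&")
print(words)  # ['echo', 'a', '&&', 'b', '|', 'c|d']
```

Note how `a&&b` is broken into three words even without whitespace, while the `|` inside the quoted string stays part of `c|d`. Nothing is done with `&&` or `|` beyond returning them as separate words.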
However, the current optimisation in shlex, where the input string is iterated by byte instead of by codepoint, gets in the way of this. To support any Unicode codepoint as an operator, the lexer has to receive each potentially multibyte character in one go, not as individual UTF-8 high bytes. Alternatively, the caller could be asked to specify operator characters as bytes, but this brings its own safety problems: what if the user specifies a high byte as an operator character?
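A small Python sketch of the high-byte problem, in case it helps. `split_on_byte` is a hypothetical stand-in for a byte-wise lexer step that cuts the input at an operator byte; it is not how the library works, it just shows what goes wrong if a caller names `0xE2` (the first byte of `→` in UTF-8) as an operator.

```python
def split_on_byte(data, op):
    """Hypothetical byte-wise lexer step: cut `data` at every `op` byte."""
    return data.split(bytes([op]))


def is_valid_utf8(data):
    """Return True if `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


encoded = "a→b".encode("utf-8")  # b'a\xe2\x86\x92b'
pieces = split_on_byte(encoded, 0xE2)

print(pieces)                              # [b'a', b'\x86\x92b']
print([is_valid_utf8(p) for p in pieces])  # [True, False]
print("←".encode("utf-8")[0] == 0xE2)      # True: '←' shares the same lead byte
```

The second piece begins with a continuation byte, so it is no longer valid UTF-8, and the same lead byte would also match unrelated characters such as `←`.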