Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

restrict symbol and keyword names #94

Merged
merged 10 commits into from
Nov 3, 2020
Merged

restrict symbol and keyword names #94

merged 10 commits into from
Nov 3, 2020

Conversation

carocad
Copy link
Owner

@carocad carocad commented Oct 10, 2020

  • fix: only allow a single / inside symbols not possible in order to maintain backward compatibility of Clojure code. See comments below
  • fix: : is a valid character inside a symbol
  • fix: keywords cannot end with /

@carocad carocad self-assigned this Oct 10, 2020
@sogaiu
Copy link

sogaiu commented Oct 10, 2020

I was surprised to find this case: https://github.com/VladimirMarkovic86/ocr-lib/blob/master/src/clj/ocr_lib/core.clj#L13

parcera.core=> (meta (ast (slurp "https://raw.githubusercontent.com/VladimirMarkovic86/ocr-lib/master/src/clj/ocr_lib/core.clj")))
#:parcera.core{:start {:row 1, :column 0}, :end {:row 1808, :column 5}, :errors ({:row 13, :column 6, :message "extraneous input 'Base64/Decoder/getDecoder' expecting {'(', ')', '[', '{', '^', '#^', '`', ''', '~', '~@', '@', '#', '#{', '#'', '#_', '#?', '#?@', '##', '#=', OCTAL, HEXADECIMAL, RADIX, RATIO, LONG, DOUBLE, STRING, WHITESPACE, COMMENT, NAMED_CHAR, UNICODE_CHAR, UNICODE, OCTAL_CHAR, MACRO_KEYWORD, KEYWORD, SYMBOL}", :type :parser, :symbol "[@88,425:449='Base64/Decoder/getDecoder',<39>,13:6]", :stack (:code :input :form :collection :list :input :form :collection :list)})}

Had no idea this would work:

$ clj
Clojure 1.10.1
user=> (import '[java.util Base64])
java.util.Base64
user=> (Base64/Decoder/getDecoder)
#object[java.util.Base64$Decoder 0x4b9df8a "java.util.Base64$Decoder@4b9df8a"]
user=> 

@carocad
Copy link
Owner Author

carocad commented Oct 10, 2020

wow, that was unexpected. I wasnt sure if this was really "supported" or if this was a bug so I asked on the #clojure-dev channel and it seems to be a bug on Clojure side. The prove is that:

  • Decoder class doesnt contain any getDecoder static method
  • (Base64/flibbetygibbit/getDecoder) this also works, which makes no sense. So it seems that Clojure is just ignoring the second part entirely.

I will wait for more feedback on that and keep this PR as it is until then. Thanks again for the report and let me know if you find any other cases worth looking into :)

@sogaiu
Copy link

sogaiu commented Oct 10, 2020

Thanks for the inquiry -- I didn't expect multiple slashes to work (apart from clojure.core//) in a symbol and didn't realize there may be a bug in clojure.

Here are some other examples:

In order to get clj to not error, this is what I tried:

user=> '(export SQ_DIR=/usr/lib/squeak/$VM_VERSION)
(export SQ_DIR=/usr/lib/squeak/$VM_VERSION)
user=> (type (second '(export SQ_DIR=/usr/lib/squeak/$VM_VERSION)))
clojure.lang.Symbol

Trying to feed in just SQ_DIR=/usr/lib/squeak/$VM_VERSION gave me an error:

user=> SQ_DIR=/usr/lib/squeak/$VM_VERSION
Syntax error compiling at (REPL:0:0).
No such namespace: SQ_DIR=

In any case, this looks like part of data that is consumed by stevedore. I don't know anything about that really, but it looks like it is something that is designed to "Embed shell scripts in Clojure".

via: https://github.com/pallet/pallet/blob/d76241aeba43fef90a8f10208b7abd0c88f17d35/src/pallet/stevedore.clj#L2


I realize you have this:

// multiple : and / are allowed inside keywords for backward compatibility

but FWIW, the current commit seems to not like some keywords with multiple slashes:

parcera.core=> (meta (ast ":/path/from/request/to/a/file/or/folder"))
#:parcera.core{:start {:row 1, :column 0}, :end {:row 1, :column 44}, :errors ({:row 1, :column 0, :message "extraneous input ':/path/from/request/to/a/file/or/folder' expecting {<EOF>, '(', '[', '{', '^', '#^', '`', ''', '~', '~@', '@', '#', '#{', '#'', '#_', '#?', '#?@', '##', '#=', OCTAL, HEXADECIMAL, RADIX, RATIO, LONG, DOUBLE, STRING, WHITESPACE, COMMENT, NAMED_CHAR, UNICODE_CHAR, UNICODE, OCTAL_CHAR, MACRO_KEYWORD, KEYWORD, SYMBOL}", :type :parser, :symbol "[@0,0:38=':/path/from/request/to/a/file/or/folder',<39>,1:0]", :stack (:code)})}

The source for that is (sorry got this wrong initially, edited to fix):

parcera.core=> (meta (ast (slurp "https://raw.githubusercontent.com/cyverse-de/common-swagger-api/master/src/common_swagger_api/schema/data/exists.clj")))#:parcera.core{:start {:row 1, :column 0}, :end {:row 51, :column 5}, :errors ({:row 28, :column 3, :message "extraneous input ':/path/from/request/to/a/file/or/folder' expecting {'(', '[', '{', '}', '^', '#^', '`', ''', '~', '~@', '@', '#', '#{', '#'', '#_', '#?', '#?@', '##', '#=', OCTAL, HEXADECIMAL, RADIX, RATIO, LONG, DOUBLE, STRING, WHITESPACE, COMMENT, NAMED_CHAR, UNICODE_CHAR, UNICODE, OCTAL_CHAR, MACRO_KEYWORD, KEYWORD, SYMBOL}", :type :parser, :symbol "[@130,947:985=':/path/from/request/to/a/file/or/folder',<39>,28:3]", :stack (:code :input :form :collection :list :input :form :collection :map)})}

via: https://github.com/cyverse-de/common-swagger-api/blob/master/src/common_swagger_api/schema/data/exists.clj#L28

@sogaiu
Copy link

sogaiu commented Oct 22, 2020

I made an errror in one of the code snippets above and have corrected it. Sorry about that.

For reference, I believe the correct code and response is:

parcera.core=> (meta (ast (slurp "https://raw.githubusercontent.com/cyverse-de/common-swagger-api/master/src/common_swagger_api/schema/data/exists.clj")))
#:parcera.core{:start {:row 1, :column 0}, :end {:row 51, :column 5}, :errors ({:row 28, :column 3, :message "extraneous input ':/path/from/request/to/a/file/or/folder' expecting {'(', '[', '{', '}', '^', '#^', '`', ''', '~', '~@', '@', '#', '#{', '#'', '#_', '#?', '#?@', '##', '#=', OCTAL, HEXADECIMAL, RADIX, RATIO, LONG, DOUBLE, STRING, WHITESPACE, COMMENT, NAMED_CHAR, UNICODE_CHAR, UNICODE, OCTAL_CHAR, MACRO_KEYWORD, KEYWORD, SYMBOL}", :type :parser, :symbol "[@130,947:985=':/path/from/request/to/a/file/or/folder',<39>,28:3]", :stack (:code :input :form :collection :list :input :form :collection :map)})}

The original version was missing the slurp-related portion and consequently had an incorrect response as well.

@carocad carocad added the hammock needed More time is required to solve this label Oct 22, 2020
@carocad
Copy link
Owner Author

carocad commented Nov 1, 2020

Although the current Clojure reader implementation does "support" multiple / on literal keywords and symbols this is clearly not a feature; see CLJ-1530.

Some projects out there currently rely on that but that still leaves the door open to consumers of this library as to what to do with those symbols/keywords at runtime. What is the name, what is the ns? without being able to answer those questions and without a clear position from the Clojure team, I am unwilling to push this responsability to the consumers and just "leave them on their own".

Assuming that the current Clojure reader documentation is wrong; there would be no clear rules as to what a valid symbol/keyword is therefore parcera would just have to "accept anything" which not useful either.

@sogaiu
Copy link

sogaiu commented Nov 2, 2020

Thanks for mentioning CLJ-1530.

FWIW, my impression is that use in symbols is less frequent than for keywords (though I don't have numbers).

I guess this is another reason to work on querying code for certain patterns to get some measurements...I wonder if grape or grasp might be usefully applied.

Not a particularly great idea may be, but in the case of keywords, I saw the portion after the colon as being URLs on more than a few occasions. If that is the main use case, perhaps that can be treated as a special case...This is just speculation though, I would imagine one would want some concrete data to back up this kind of thing.

@carocad
Copy link
Owner Author

carocad commented Nov 2, 2020

I saw the portion after the colon as being URLs on more than a few occasions.

yeah, I have seen those. In general, I know that the Clojure team supports those through runtime keywords (keyword "my.domain/url/path") but not as keywords literals. However, then again those wouldn't support clj -> end -> clj roundtrips so we really need clarification there 🤷

This is just speculation though, I would imagine one would want some concrete data to back up this kind of thing.

For the time being Clojure follows the benevolent dictator for life workflow so even with data to back it up, the decision is still up to them.

@carocad carocad merged commit 718de49 into master Nov 3, 2020
@carocad carocad deleted the fix/names branch November 3, 2020 21:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hammock needed More time is required to solve this
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants