Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keywords mode broken for complex keys that happen to contain / #64

Closed
bshepherdson opened this issue Sep 28, 2022 · 6 comments · Fixed by #74
Closed

Keywords mode broken for complex keys that happen to contain / #64

bshepherdson opened this issue Sep 28, 2022 · 6 comments · Fixed by #74

Comments

@bshepherdson
Copy link

This logic https://github.com/clj-commons/clj-yaml/blob/master/src/clojure/clj_yaml/core.clj#L144 uses (or (keyword k) k) to try to parse map keys when the keywords setting is true.

Unfortunately clojure.core/keyword is a bit too forgiving, and parses some things that it probably shouldn't. My map has JSON strings as keys, and some of the parts of those keys have / in them, which results in this:

(def k "[\"ref\",\"type/BigInteger\"]")
(keyword k)              ; => :["ref","type/BigInteger"]
(namespace (keyword k))  ; => "[\"ref\",\"type"
(name (keyword k))       ; => "BigInteger\"]"

Since there are unmatched quotes and so on in there, the resulting EDN is mangled - doesn't (read-string (pr-str x)) properly anymore.

(yaml.core/parse-string
  "column_settings: {'[\"ref\",\"type/BigInteger\"]': {column_title: Invoices}}"
  :keywords true)
; => {:column_settings #:["ref","type{:BigInteger"] {:column_title "Invoices"}}}

which has unbalanced {}s because the second { is quoted but there are three real }}} at the end.

A stricter condition for what makes a valid keyword is needed. I've worked around this with the regex #"^[0-9a-zA-Z_/\.\-]+$" for my own purposes; that may not be quite general enough.

@lread
Copy link
Collaborator

lread commented Sep 28, 2022

Interesting! Thanks for reporting @bshepherdson!

@lread
Copy link
Collaborator

lread commented Sep 28, 2022

Research

I had a look around at what other libraries do here.

  • Clojure's data.json is a bit smarter by leaving things up to the caller. It lets you provide a :key-fn - you decide exactly how you want to convert.
  • Cheshire has both options, effectively a boolean like us and the ability to specify a fn like data.json. It looks like it just uses Clojure's keyword like we do when not using a fn.

Here are some initial ideas:

Option 1: Do nothing.

Let the people suffer! 😈

Option 2: Silently do not convert a YAML key to a Clojure keyword when the resulting keyword would be unreadable.

We do this for some cases now, anything that returns nil for keyword is left as is.
Ex.

user=> (keyword 45)
nil

Option 3: Throw when the conversion from YAML key to Clojure keyword results in an unreadable keyword.

This is a variant of Option 2.
Would this be considered a breaking change?
Is it ever desirable behaviour?

Option 4: Introduce a :keywords-fn option ala data.json and Cheshire libs.

This would allow the caller to do whatever conversion they like.

Musings

I don't like option 1. Far too cruel.

I don't know that option 3 would be desirable. Maybe it would be good to know that what you asked for could not happen, then you'd switch to Option 4?

Option 4 seems like a clear winner to me, it gives the user full control to do whatever they like.

We could also maybe do some work around option 2 - or alternatively even deprecate the :keywords option in favour of a new :keywords-fn option.

Thoughts, alternatives, ideas anyone?

@borkdude
Copy link
Collaborator

I'm away today but will take a look tomorrow

@borkdude
Copy link
Collaborator

I prefer option 4. Roundtripping JSON <-> EDN with keywords is a known problem when keys contain spaces, etc. I would not put any efforts in trying to "do the right thing" but leave this right thing to the user. I prefer the name :key-fn (subjective, I know, but it's the same as clojure.data.json).

@lread
Copy link
Collaborator

lread commented Sep 30, 2022

Thanks @borkdude, sounds good to me.

I'll follow up with a PR that also updates docs to promote usage of :key-fn over existing :keywords.

Note clj-yaml docs state:

By default, keys are converted to clojure keywords. To prevent this, add :keywords false parameters to the parse-string function

I feel this is a bit of an unfortunate inherited design choice. I would have preferred that this behaviour be off by default.

@lread
Copy link
Collaborator

lread commented Oct 4, 2022

I like :key-fn but I will drift from clojure.data.json in function args.
I'll have it accept a map for future extensibility.
Initially, the only recognized key will be :key.

lread added a commit to lread/clj-yaml that referenced this issue Oct 4, 2022
The parse-string and parse-stream function now accept new :key-fn
option.

See docs and docstrings for details.

Closes clj-commons#64
@lread lread closed this as completed in #74 Oct 4, 2022
lread added a commit that referenced this issue Oct 4, 2022
The parse-string and parse-stream function now accept new :key-fn
option.

See docs and docstrings for details.

Also:
- docs: move the note on keyword args outside of parsing

Closes #64
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants