Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider adding a yaml library #263

Closed
jwhitlark opened this issue Feb 10, 2020 · 31 comments
Closed

Consider adding a yaml library #263

jwhitlark opened this issue Feb 10, 2020 · 31 comments
Assignees

Comments

@jwhitlark
Copy link

Is your feature request related to a problem? Please describe.
I'd like to be able to handle yaml as well as json and edn

Describe the solution you'd like
A yaml library included and exposed in the same way the csv and json are.

@borkdude
Copy link
Collaborator

Do you have a specific Clojure YAML library in mind that we should add? I'm not that familiar with using YAML from Clojure myself.

@dharrigan
Copy link
Contributor

dharrigan commented Feb 11, 2020

SnakeYAML is probably the defacto on the JVM.

https://bitbucket.org/asomov/snakeyaml

@mk
Copy link

mk commented Feb 11, 2020

This sounds useful to me as well. Can only say that we're using https://github.com/owainlewis/yaml which is also based on snakeyaml.

@borkdude
Copy link
Collaborator

borkdude commented Feb 11, 2020

https://github.com/retrogradeorbit/bootleg also uses the lib that @mk mentioned. These are notes from a chat with @retrogradeorbit, the author. bootleg is also based on sci and compiled with GraalVM.

In summary, I think it would be good to add the YAML lib that @mk mentioned in the same way as bootleg does it. A PR is welcome. I'm interested in how much this would add to the binary size.

9:15 AM
Crispin yeah its working fine, but there was some dancing to get it working with graal
9:15 AM
check my project.clj
9:15 AM
and someone opened a ticket with an alternative yaml way
9:16 AM
but I havent had time to try it. And the lib it comes from is bigger than just yaml and wanted to keep my deps fairly small.
9:16 AM
(or as small as I could get away with)
9:16 AM
retrogradeorbit/bootleg#46
cloojurecloojure
#46 YAML upgrade to SnakeYAML 2.0
Hi - I also had a problem with the stale io.forward/yaml and this issue:
owainlewis/yaml#28

So, I wrote my own YAML wrapper to SnakeYAML 2.0. You can see it here:
https://github.com/cloojure/tupelo/blob/master/src/clj/tupelo/parse/yaml.clj

Feel free to either reference this or copy it into your own lib.
Alan
https://github.com/retrogradeorbit/bootleg|retrogradeorbit/bootlegretrogradeorbit/bootleg | Dec 17th, 2019 | Added by GitHub
9:17 AM
wasnt just going to use this without some testing. when I get the time...
9:17 AM
https://github.com/retrogradeorbit/bootleg/blob/master/project.clj#L12-L15
project.clj:12-15
;; owainlewis/yaml#35
[io.forward/yaml "1.0.9"
:exclusions [[org.yaml/snakeyaml]]]
[org.yaml/snakeyaml "1.25"]
https://github.com/retrogradeorbit/bootleg|retrogradeorbit/bootlegretrogradeorbit/bootleg | Added by GitHub
9:17 AM
waiting for io.forward/yaml to do a new release

@jeroenvandijk
Copy link
Contributor

I've also used https://github.com/owainlewis/yaml Works fine for simple use cases. More complicated yaml files (e.g. with custom tags) are more troublesome if i remember correctly. I've never been able to convert cloudformation yaml to json and edn. I still use online converters for this 🙈 owainlewis/yaml#27 (comment) (edited)

@agilecreativity
Copy link
Contributor

I am personally using the clj-yaml with GraalVM with out any issue. The library is maintained by the community and I think that is a big plus.

@borkdude
Copy link
Collaborator

borkdude commented Feb 14, 2020

From a conversation on Slack with @slipset:

borkdude Hi Erik. Small question. What is your experience with clj-yaml, is it a quality lib?
slipset Not using it myself, but it’s what circle is using for their yaml stuff. 
slipset They’re also the main maintainers. 
borkdude main maintainers on clj-commons?
slipset Yeah, Marc and Stig both work at Circle (if you look at the latest commits to clj-yaml)
So I guess if clj-yaml breaks, a lot of stuff breaks. 
borkdude cool, good to know. thanks a lot.

@borkdude
Copy link
Collaborator

I'm not reading or writing YAML that much myself, but if there is a library that is well maintained, I can defer any issues with it to that repo and update to the newest version in the future. Since clj-yaml seems to have a solid foundation with CircleCI backing it, I'll consider it as well.

@borkdude
Copy link
Collaborator

Conversation on Slack:

souenzzo:fulcro: Yesterday at 9:37 PM
Do anyone know how do I parse AmazonYAML in clojure/JVM?
clj-yaml can't understand that !Ref  things
viesti:portkey:  2 hours ago
A while ago dis some experiments on parsing those local tags in Cloudformation yaml: https://github.com/portkey-cloud/cfn-yaml/blob/master/src/cfn_yaml/tags.clj
viesti:portkey:  2 hours ago
the tag support in snakeyaml is quite Java class inheritance heavy, if I recall correctly
viesti:portkey:  2 hours ago
would need some Clojure interop foo to make it feel more mike data

I'm not sure how this would work with babashka?

@borkdude borkdude self-assigned this Feb 18, 2020
@borkdude borkdude added this to To do in Global board via automation Feb 18, 2020
@borkdude
Copy link
Collaborator

Just for reference, here's a project which can deal with Cloudformation yaml:

https://github.com/portkey-cloud/cfn-yaml

@borkdude
Copy link
Collaborator

Interesting comment here about the differences between owainlewis/yaml and clj-commons/clj-yaml:

lancepantz/clj-yaml#28 (comment)

One useful feature of owainlewis/yaml is the ability to ignore unknown YAML tags using the passthrough-constructor. I've had to use this in the past to parse yaml generated by other tools/languages. Arguably, this feature should be easy to port to clj-commons/yaml ;)

@borkdude borkdude moved this from To do to Needs hammock time in Global board Mar 4, 2020
@jeroenvandijk
Copy link
Contributor

FYI, I'm trying to include cfn-yaml in a native library. While trying this, I noticed that at least the cfn-yaml code relies on duck typing (i.e. two different classes both have a .getValue method which is not defined in a common interface). This is troublesome for type hints and basically requires reflection. So now I'm trying to work around the warnings by using the reflection.json file. I'm not sure if this holds for the parent yaml libraries as well, but this is probably something to take into account

@borkdude
Copy link
Collaborator

What yaml CLI tools are available in this space? I can image there are some tools that allow you to convert yaml into JSON? This JSON could then be processed by bb.

@jeroenvandijk
Copy link
Contributor

After working around the issue owainlewis/yaml#35 and ploughing through the type warnings I got an integration with cfn-yaml to compile and to run. Whenever coming across a custom tag though, it fails. I'm not sure what button I need to press to get this to work.

For regular YAML it seems to work fine though

@bn-darindouglass-zz
Copy link

To add my $0.02 from Slack:

My primary use-case for needing bb to be compatible with a YAML library is to convert simple EDN to YAML (for kube manifests). As such, really any library would work.

In the interim I wrote up a very quick edn->yaml parser that could work for some simple cases. It's moderately opinionated but doesn't include any extra java classes that bb doesn't already include (it's just pure clojure).

@viesti
Copy link
Contributor

viesti commented Mar 29, 2020

To my understanding, the parser local tags (used by CloudFormation for example) always require custom implementation from the parser. The snakeyaml documentation puts this nicely:

https://bitbucket.org/asomov/snakeyaml/wiki/Documentation#markdown-header-tags

Since every parser must provide more information at runtime to create instances, local tags help a YAML document to be exchanged with other languages (every parser is free to define what to do with !car)

cfn-yaml is just a very specific parser for CloudFormation YAML. So I think we'd be better off by using a generic snakeyaml wrapper.

@borkdude
Copy link
Collaborator

borkdude commented Mar 29, 2020

I'll try going ahead with the community maintained https://github.com/clj-commons/clj-yaml, unless there are very good reasons to choose a different library. Any features that are missing in clj-yaml can then be requested at the community maintained repo and will eventually end up in babashka. Good idea?

@viesti
Copy link
Contributor

viesti commented Mar 29, 2020

Sounds good to me :)

@borkdude
Copy link
Collaborator

borkdude commented Mar 30, 2020

Why does clj-yaml pull in all these dependencies?

Run lein deps
Retrieving clj-commons/clj-yaml/0.7.0/clj-yaml-0.7.0.pom from clojars
Retrieving org/yaml/snakeyaml/1.24/snakeyaml-1.24.pom from central
Retrieving org/flatland/ordered/1.5.7/ordered-1.5.7.pom from clojars
Retrieving org/flatland/useful/0.11.6/useful-0.11.6.pom from clojars
Retrieving org/clojure/tools.macro/0.1.1/tools.macro-0.1.1.pom from central
Retrieving org/clojure/pom.contrib/0.0.20/pom.contrib-0.0.20.pom from central
Retrieving org/clojure/clojure/1.3.0-alpha5/clojure-1.3.0-alpha5.pom from central
Retrieving org/clojure/tools.reader/0.7.2/tools.reader-0.7.2.pom from central
Retrieving org/clojure/pom.contrib/0.0.26/pom.contrib-0.0.26.pom from central
Retrieving org/yaml/snakeyaml/1.24/snakeyaml-1.24.jar from central
Retrieving org/clojure/tools.macro/0.1.1/tools.macro-0.1.1.jar from central
Retrieving org/flatland/ordered/1.5.7/ordered-1.5.7.jar from clojars
Retrieving org/flatland/useful/0.11.6/useful-0.11.6.jar from clojars
Retrieving clj-commons/clj-yaml/0.7.0/clj-yaml-0.7.0.jar from clojars

That seems quite a lot for some simple back and forth transformation library?
I also noticed adding this library adds 2.5mb to the binary size, which is quite a lot. I'm starting to have second thoughts.

@borkdude borkdude moved this from Needs hammock time to Paused in Global board Mar 30, 2020
@viesti
Copy link
Contributor

viesti commented Mar 30, 2020

Hmm, seems that org.flatland/ordered pulls org.flatland/useful, which then pulls a couple more.

I wonder if we could have a version that uses https://github.com/frankiesardo/linked, which seems to have no dependencies. clj-yaml uses ordered set/hashmap for preserving the order of items read from yaml, when read into Clojure data.

@borkdude
Copy link
Collaborator

@viesti Tried swapping it out with frankiesardo/linked and that worked: https://github.com/borkdude/clj-yaml, as in: the test worked. Also roundtripped .github/workflows/build.yml with the old version and the one with linked and they yield the same result.

At the same time, @slipset is looking into making flatland/ordered not depend on flatland/useful:
clj-commons/ordered#48

When that is in place, we can have a look at including it in bb again.

@borkdude
Copy link
Collaborator

borkdude commented Apr 14, 2020

I have tried with the newest clj-yaml (0.7.1).

  • I still see 2.5MB added to the size of the binary. Which may not be the worst thing in the world, but if using yaml from bb is only used by a minority of people it may not be worth the extra size. Any thoughts?

  • One other issue I found when roundtripping a yaml file from the babashka repository:

$ cat .github/workflows/build.yml | head
name: build
on: [push
    , pull_request
    ]
$ ./bb '(println (yaml/generate-string (yaml/parse-string (slurp ".github/workflows/build.yml"))))'
name: build
true: [push, pull_request]

It seems it cannot roundtrip that yaml file: on is translated into true?

@viesti
Copy link
Contributor

viesti commented Apr 14, 2020

Re: roundtrip, might this be a YAML thing, Python's PyYAML behaves the same way:

0% python3
Python 3.7.5 (default, Nov 28 2019, 11:58:10)
[Clang 11.0.0 (clang-1100.0.33.12)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import yaml
>>> document = """
... name: build
... on: [push
...     , pull_request
...     ]"""
>>> yaml.load(document)
__main__:1: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
{'name': 'build', True: ['push', 'pull_request']}

@viesti
Copy link
Contributor

viesti commented Apr 14, 2020

😅 https://yaml.org/type/bool.html

@borkdude
Copy link
Collaborator

So how does one write a Github-actions like YAML file using this library?

@slipset
Copy link

slipset commented Apr 14, 2020

Not helping here, but try roundtripping with "Norway" or "NO" :)

@viesti
Copy link
Contributor

viesti commented Apr 14, 2020

:flag-no: So I guess roundtripping the built-in boolean type as string would require dropping into snakeyaml Java interop, in similar style as to pyyaml, with a custom constructor & representer.

Re: ordered vs clj-yaml patched with linked, I was thinking what the implications would be. They have different tagged literals, but this might not be a user-visible thing?

@jcooley
Copy link

jcooley commented Apr 14, 2020

Yaml is messy but it would be nice to have it included if possible. It look like it puts snakeyaml puts on: in single quotes 'on': and that makes it legal. Funny how the first thing that was tried looked broken :)

@borkdude
Copy link
Collaborator

borkdude commented Apr 14, 2020

Here are links to binaries with clj-yaml included:

Linux:
https://6687-201467090-gh.circle-artifacts.com/0/release/babashka-0.0.85-SNAPSHOT-linux-amd64.zip

macOS:
https://6686-201467090-gh.circle-artifacts.com/0/release/babashka-0.0.85-SNAPSHOT-macos-amd64.zip

I propose folks interested in yaml support test these binaries and report back whether this works as expected in their scripts.

@justone
Copy link
Contributor

justone commented Apr 14, 2020

I wrote a script to assist in migrating my blog from octopress to hugo. Had to run it under clojure because bb didn't have yaml at the time. I just re-ran my migration with bb instead of clojure and it worked the first time.

Code I ran: https://github.com/justone/endot-hugo/blob/master/src/cleanup.clj

borkdude added a commit that referenced this issue Apr 15, 2020
@borkdude
Copy link
Collaborator

Thanks y'all. I merged clj-commons/yaml now into master. When something is missing from that library that you would like to see in bb, go contribute over there.

Global board automation moved this from Paused to Done Apr 15, 2020
Clojurists Together automation moved this from To do to Done Apr 15, 2020
@borkdude borkdude removed this from Done in Global board May 6, 2020
@borkdude borkdude removed this from Done in Clojurists Together May 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests