Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Print multiline strings #95

Merged
merged 2 commits into from
Dec 11, 2020
Merged

Print multiline strings #95

merged 2 commits into from
Dec 11, 2020

Conversation

christopher-dG
Copy link
Contributor

As discussed in #91, there are some cases where multiline strings are beneficial.
With this implementation, strings are printed on multiple lines if:

  • they have newlines which are not leading or trailing
  • they contain double quotes

I'm using YAML primarily for BrokenRecord.jl where we store HTTP response bodies, which are often JSON. So both cases listed above are very common, and they look much much better as a multiline string that doesn't require escaping.

This is very much open for discussion, if people prefer the simplicity of repr then we can, then we can stick with that. But I'd argue that most people use YAML because it's meant to be human-readable :)

@christopher-dG
Copy link
Contributor Author

The one thing I do not like about using multiline strings is what happens when they are the last key in a file.

For example:

a: true
b: |
  hello








This file has a bunch of trailing newlines, many people (like me) have their editors set up to remove trailing whitespace from their files. I think you can mark the end of a YAML document with a --- line, perhaps it's a good idea to always put that at the end?

Copy link
Contributor

@mirkobunse mirkobunse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution! Looks good to me.

Only thing I'm curious to hear about is why you don't use repr when " occurs in str (see my comment in line 82 of src/writer.jl).

I guess Kevin needs to decide whether multi-line handling should become the one-and-only type of writing yaml, or whether we need a keyword argument multiline to make the user choose between a pure repr-based printing (like before) and your multiline-solution.

@@ -79,7 +79,22 @@ end

# _print a single string
_print(io::IO, str::AbstractString, level::Int=0, ignore_level::Bool=false) =
println(io, repr(MIME("text/plain"), str)) # quote and escape
if occursin('\n', strip(str)) || occursin('"', str)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't you use repr when " occurs in str?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider the following string:

{"a":"b","c":"d","e":"f"}

repr on this is "{\"a\":\"b\",\"c\":\"d\",\"e\":\"f\"}", which is much less readable than multiline version.

indent = repeat(" ", level)
for line in split(str, "\n")
println(io, indent, line)
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like how you print each line separately, thus effectively calling repr for each separate line.

@test YAML.load(YAML.yaml(Dict("a" => "many\nnls\n\n\n")))["a"] == "many\nnls\n\n\n"
@test YAML.load(YAML.yaml(Dict("a" => "no\ntrailing\nnls")))["a"] == "no\ntrailing\nnls"
@test YAML.load(YAML.yaml(Dict("a" => "foo\n\"bar\\'")))["a"] == "foo\n\"bar\\'"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good you test that the strings are identical. My first tought about src/writer.jl was, why are newlines handled differently depending on their number? Now I see that this different handling is only to ensure that input and output match.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the format is very finnicky 😅

@mirkobunse
Copy link
Contributor

mirkobunse commented Dec 8, 2020

The one thing I do not like about using multiline strings is what happens when they are the last key in a file.

Actually, I do like what happens in this case, as long as the roundtrip julia/yaml/julia is producing identical results :)

EDIT: Just to be clear, do the newlines actually come from b or are they an artifact of the writer? If the latter is the case, we should remove them, but not add a boilerplate --- at the end.

@christopher-dG
Copy link
Contributor Author

The trailing newlines come from b. The only problem I see with it is it's easy to accidentally strip them if your editor is configured to do so.

@DilumAluthge
Copy link

The trailing newlines come from b. The only problem I see with it is it's easy to accidentally strip them if your editor is configured to do so.

Yeah e.g. my editor (Atom) will remove the trailing newlines.

Any reason why we shouldn't just put the --- at the end?

@christopher-dG
Copy link
Contributor Author

I support always adding the ---. I think it'll in those edge cases both to prevent accidentally editing of the file, and to make it more clear to readers that such a string has trailing newlines.

@@ -1,6 +1,6 @@
name = "YAML"
uuid = "ddb6d928-2868-570f-bddf-ab3f9cf99eb6"
version = "0.4.3"
version = "0.4.4"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to have version bumps occur in PRs? In my own repos, I tend to make separate commits where the only thing that happens is a version bump.

Doesn't make a huge difference to me, except that, in a case where multiple PRs are happening at the same time, there may be conflicts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I usually do it. It doesn't conflict if other PRs bump to the same version. But I can totally remove it if you want!

@kescobo
Copy link
Collaborator

kescobo commented Dec 9, 2020

I guess Kevin needs to decide whether multi-line handling should become the one-and-only type of writing yaml or whether we need a keyword argument multiline to make the user choose between a pure repr-based printing (like before) and your multiline-solution.

Oh no! Why is this on me?!? 🤣

I think having a multiline keyword would be beneficial, the question is what the best default would be - true seems the most sensible. I also wonder if having the trailing --- could be an option as well. Ideally, we'd have a situation where any (valid) input could be round-tripped and produce identical output, though of course this isn't always feasible given the flexibility of the format.

Making these options doesn't have to be part of this PR in my opinion, I'm fine with multi-line and trailing --- being the defaults.

Though I should say in all seriousness, it probably doesn't make sense for me to be the one making these decisions, since I rarely use this package - my stewardship is a somewhat strange accident. In the absence of any objection, I'm happy to let y'all have final say.

@kescobo
Copy link
Collaborator

kescobo commented Dec 10, 2020

The error in CI is

test = spec-02-13: Error During Test at /home/travis/build/JuliaData/YAML.jl/test/runtests.jl:139

  Got exception outside of a @test

  ArgumentError: `nothing` should not be printed; use `show`, `repr`, or custom output instead.

  Stacktrace:

   [1] print(::IOContext{Base.GenericIOBuffer{Array{UInt8,1}}}, ::Nothing) at ./show.jl:566

   [2] print(::IOContext{Base.GenericIOBuffer{Array{UInt8,1}}}, ::String, ::String, ::Vararg{Any,N} where N) at ./strings/io.jl:42

   [3] show(::IOContext{Base.GenericIOBuffer{Array{UInt8,1}}}, ::YAML.ParserError) at /home/travis/build/JuliaData/YAML.jl/src/parser.jl:25
# ...

@christopher-dG
Copy link
Contributor Author

Looks like a valid failure, I didn't even realize that you could have strings at the top level...

@christopher-dG
Copy link
Contributor Author

Also, I read the spec and we should actually be using ... to end a document if we decide to go that route.

@codecov
Copy link

codecov bot commented Dec 10, 2020

Codecov Report

Merging #95 (ad4052b) into master (6813b9f) will increase coverage by 0.46%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #95      +/-   ##
==========================================
+ Coverage   80.85%   81.31%   +0.46%     
==========================================
  Files          12       12              
  Lines        1452     1461       +9     
==========================================
+ Hits         1174     1188      +14     
+ Misses        278      273       -5     
Impacted Files Coverage Δ
src/writer.jl 98.03% <100.00%> (+0.42%) ⬆️
src/scanner.jl 82.97% <0.00%> (+0.71%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6813b9f...ad4052b. Read the comment docs.

@kescobo
Copy link
Collaborator

kescobo commented Dec 11, 2020

Good to know! But this PR is ready to merge? Maybe worth opening issues for optional single-line printing and including ... at the end so we can keep track of it?

@christopher-dG
Copy link
Contributor Author

Yeah, this is good to go.

@kescobo kescobo merged commit 9748e1b into JuliaData:master Dec 11, 2020
@christopher-dG christopher-dG deleted the cdg/strings branch December 11, 2020 19:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants