Handling of special characters (line breaks etc..) #26

Closed
vh-d opened this Issue Oct 23, 2018 · 12 comments

2 participants

vh-d (Contributor) commented Oct 23, 2018

I am trying to understand why parsing a TOML file with a multi-line string such as this

value = '''
Hellow
world!
'''

yields escaped special characters:

List of 1
 $ value: chr "Hellow\\nworld!\\n"

Can't we get this?

List of 1
 $ value: chr "Hellow\nworld!\n"

Am I missing something?
Thanks

vh-d changed the title from "Handling of special characters (end of line)" to "Handling of special characters (line breaks etc..)" on Oct 23, 2018

eddelbuettel (Owner) commented Oct 23, 2018

What does the TOML spec say?

eddelbuettel (Owner) commented Oct 23, 2018

You may have a point. When I use the upstream code as a standalone command-line parser, I get:

edd@rob:~/git/cpptoml/examples(master)$ ./parse_stdin_2018-10-22 < /tmp/multiline.toml
{"value":{"type":"string","value":"Hellow\nworld!\n"}}
edd@rob:~/git/cpptoml/examples(master)$

I'll look into it. On my commute this morning I already added another missing element: plain 'time' input (i.e. HH:MM:SS), for which R has no native type, will now be returned as a string rather than dropped.

vh-d (Contributor) commented Oct 23, 2018

To me, it looks like escaping special characters was intended only for printing verbose (debugging) output in printValue(), but it somehow made its way into getValue() as well.

This easy change solves the issue for me, but it is of course a breaking change.
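To make the distinction concrete, here is a minimal R sketch of display-style escaping, assuming it covers the three cases identified later in this thread (backslash, double quote, newline). `escape_for_display()` is a hypothetical name, not part of RcppTOML or cpptoml. The backslash has to be escaped first; otherwise the backslashes added for `\"` and `\n` would be doubled again.

```r
# Hypothetical helper mirroring display-only escaping
# (assumed cases: backslash, double quote, newline).
escape_for_display <- function(x) {
  x <- gsub("\\", "\\\\", x, fixed = TRUE)  # backslash -> \\ (must run first)
  x <- gsub("\"", "\\\"", x, fixed = TRUE)  # double quote -> \"
  gsub("\n", "\\n", x, fixed = TRUE)        # real newline -> the two characters \n
}

value <- "Hellow\nworld!\n"         # what the parser should hand back to R
shown <- escape_for_display(value)  # only appropriate for a debugging printout
cat(shown, sep = "\n")              # prints: Hellow\nworld!\n
```

Output like this is fine for a verbose printout, but it is not the value the TOML file actually holds.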

eddelbuettel (Owner) commented Oct 23, 2018

Yes, it is likely something like that. I have to think about whether there are other cases of escaped characters we would need to protect, and that protection might be better done in the R accessor ...

Can you take a peek at the TOML spec to see whether it says anything about special characters and escapes?

eddelbuettel (Owner) commented Oct 23, 2018

It seems this escaping is all over cpptoml as well, which is probably where I took it from.

Could you possibly deal with it at the R level?

R> RcppTOML::parseTOML("/tmp/twolines.toml")
List of 1
 $ value: chr "hello\\nworld!\\n"
R> txt <- RcppTOML::parseTOML("/tmp/twolines.toml")
R> cat( gsub('\\\\n', '\n', txt[[1]]) )
hello
world!

vh-d (Contributor) commented Oct 23, 2018

Sure, that's what I was doing until I stopped to wonder why anyone would prefer the current way :-)

I was worried that I would have to handle various cases besides "\\n" -> "\n". And when my generic gsub('\\\\', '\\', value), which obviously cannot work, failed, I came here :-)

But it seems there are only three cases ('\n', '"' and '\'), so it's a relatively small issue.
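Chaining one gsub() per case is fragile: if the original value contained a literal backslash followed by `n`, its escaped form can end up as a real newline whichever order the calls run in. A minimal R sketch that avoids this, assuming exactly those three escapes, matches each escape as one two-character token in a single left-to-right pass; `unescape_toml()` is a hypothetical helper name, not an RcppTOML function.

```r
# Hypothetical helper: undo the three assumed escapes (\\, \" and \n)
# in one left-to-right pass, so escapes never interact with each other.
unescape_toml <- function(x) {
  # Each match is a two-character token: backslash + (backslash | quote | n)
  m <- gregexpr('\\\\[\\\\"n]', x, perl = TRUE)
  regmatches(x, m) <- lapply(regmatches(x, m), function(tok) {
    out <- substr(tok, 2, 2)   # \\ -> \   and   \" -> "
    out[tok == "\\n"] <- "\n"  # \n -> real newline
    out
  })
  x
}

unescape_toml("Hellow\\nworld!\\n")  # "Hellow\nworld!\n"
unescape_toml("C:\\\\new")           # "C:\\new": a literal backslash, not a newline
```

Because the regex engine consumes matches without overlap, an escaped backslash is taken as a whole token before the following `n` is ever considered.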

vh-d (Contributor) commented Oct 23, 2018

Still, it may be worth having something like RcppTOML::parseTOML2() without the escaping.

vh-d (Contributor) commented Oct 23, 2018

Doing it at the R level basically means walking the whole list tree and applying gsub() to every character value. Maybe I will come back if I come up with an elegant solution.
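Base R's rapply() can do that tree walk in one call. The sketch below assumes parseTOML() returns nested R lists (as the `List of 1` output above suggests); `fix_strings()` and the sample list are hypothetical stand-ins. The per-string function can be the newline-only gsub() from earlier in this thread or any fuller unescaper; non-character leaves such as integers are left untouched.

```r
# Hypothetical helper: apply a string fixer to every character leaf
# of a nested list, keeping the list structure and names intact.
fix_strings <- function(tree, f) {
  rapply(tree, f, classes = "character", how = "replace")
}

# Hypothetical stand-in for parseTOML() output containing escaped strings
parsed <- list(
  value  = "Hellow\\nworld!\\n",
  server = list(motd = "line1\\nline2", port = 8080L)
)

# Newline-only fix, same idea as the gsub() workaround above
fixed <- fix_strings(parsed, function(s) gsub("\\n", "\n", s, fixed = TRUE))
str(fixed)
```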

eddelbuettel (Owner) commented Oct 23, 2018

I think a better solution may be to pass an option down to the parse function which, if set, will skip the escaping.

vh-d (Contributor) commented Oct 23, 2018

Exactly... and it would be set to FALSE by default for backward compatibility.

eddelbuettel (Owner) commented Oct 23, 2018

Yup. I see you forked already. Care to try a small and focused pull request? ;-)

eddelbuettel (Owner) commented Oct 26, 2018

Fixed in #28
