Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handling of special characters (line breaks etc..) #26

Closed
vh-d opened this issue Oct 23, 2018 · 28 comments
Closed

Handling of special characters (line breaks etc..) #26

vh-d opened this issue Oct 23, 2018 · 28 comments

Comments

@vh-d
Copy link
Contributor

@vh-d vh-d commented Oct 23, 2018

I am trying to understand why parsing of TOML files with multi-line strings such as this

value = '''
Hellow
world!
'''

yields escaped special characters:

List of 1
 $ value: chr "Hellow\\nworld!\\n"

Can't we get this?

List of 1
 $ value: chr "Hellow\nworld!\n"

Am I missing something?
Thanks

@vh-d vh-d changed the title Handling of special characters (end of line) Handling of special characters (line breaks etc..) Oct 23, 2018
@eddelbuettel
Copy link
Owner

@eddelbuettel eddelbuettel commented Oct 23, 2018

What does the TOML spec say?

@eddelbuettel
Copy link
Owner

@eddelbuettel eddelbuettel commented Oct 23, 2018

You may have a point. When I use the upstream code as a standalone command-line parser, I get:

edd@rob:~/git/cpptoml/examples(master)$ ./parse_stdin_2018-10-22 < /tmp/multiline.toml
{"value":{"type":"string","value":"Hellow\nworld!\n"}}
edd@rob:~/git/cpptoml/examples(master)$

I'll look into it. I already added another missing element on the commute in this morning: plain 'time' input (ie HH:MM:SS) for R has no native support will now be returned (rather than dropped) as a string.

@vh-d
Copy link
Contributor Author

@vh-d vh-d commented Oct 23, 2018

To me, it looks like escaping special characters was supposed only for printing verbose output (for debugging) in printValue() but it somehow made its way to getValue() as well.

This easy change solves the issue for me. But it is a breaking change of course.

@eddelbuettel
Copy link
Owner

@eddelbuettel eddelbuettel commented Oct 23, 2018

Yes, it is likely something like that. I have to think if there are other cases of escaped chars we would need to protect. And that protect might be better done at the R accessor ...

Can you take a peek at the TOML spec if it says something about special chars and escapes?

@eddelbuettel
Copy link
Owner

@eddelbuettel eddelbuettel commented Oct 23, 2018

It seems it is all over cpptoml as well from where I probably took it.

Could you possibly deal with it at the R level?

> RcppTOML::parseTOML("/tmp/twolines.toml")
List of 1
 $ value: chr "hello\\nworld!\\n"
R>
R> txt <- RcppTOML::parseTOML("/tmp/twolines.toml")
R>
R> cat( gsub('\\\\n', '\n', txt[[1]]) )
hello
world!
R>
@vh-d
Copy link
Contributor Author

@vh-d vh-d commented Oct 23, 2018

Sure that's what I was doing until I stopped wondering why would anyone prefer the current way :-)

I was worried that I would have to handle various cases besides "\n" -> "\n". And when I failed with generic gsub('\\\\', '\\', value) which obviously cannot work I came here :-)

But it seems there are only 3 cases ('\n', '"' and '\') so its relatively small issue.

@vh-d
Copy link
Contributor Author

@vh-d vh-d commented Oct 23, 2018

Still it may be worth it to have some RcppToml::parseTOML2() without the escaping.

@vh-d
Copy link
Contributor Author

@vh-d vh-d commented Oct 23, 2018

Doing it on the R level basically means climbing the whole list tree and applying gsub() on any character value. Maybe I will come back if I come up with some elegant solution.

@eddelbuettel
Copy link
Owner

@eddelbuettel eddelbuettel commented Oct 23, 2018

I think a better solution may be to pass an option down to the parse function which, if set, will skip the escaping.

@vh-d
Copy link
Contributor Author

@vh-d vh-d commented Oct 23, 2018

exactly... and it would be set to FALSE by default for backward compatibility

@eddelbuettel
Copy link
Owner

@eddelbuettel eddelbuettel commented Oct 23, 2018

Yup. I see you forked already. Care to try a small and focused pull request? ;-)

@eddelbuettel
Copy link
Owner

@eddelbuettel eddelbuettel commented Oct 26, 2018

Fixed in #28

@tkschmidt
Copy link

@tkschmidt tkschmidt commented Mar 11, 2019

I think the issue still exists. I'm using version 0.1.5 and multiline strings will still break it.

e.g.

query = """  SELECT * FROM 
TABLE
WHERE 
property IS NOT NULL;

after loading

toml <- parseTOML(file.path(PATH,"config.toml"))
toml$query: SELECT *FROM \\n (...)

This will break the SQL parser and my dirty solution is dbGetQuery(con, gsub("\\\\n", "", toml$query))

Am I holding it wrongly?

@eddelbuettel
Copy link
Owner

@eddelbuettel eddelbuettel commented Mar 11, 2019

I am not sure I read the TOML spec and examples the same way you do -- in short, I do not think that looks like valid TOML. Can you point to a TOML example or different TOML parser doing this?

@vh-d
Copy link
Contributor Author

@vh-d vh-d commented Mar 11, 2019

You need to explicitly set the parameter escape=FALSE. So in your example

toml <- parseTOML(file.path(PATH,"config.toml"), escape=FALSE)

The default behaviour escape=TRUE was kept due to backward compatibility.

@tkschmidt
Copy link

@tkschmidt tkschmidt commented Mar 11, 2019

I just checked again and I forget to add an important line in the top example

[PT]
query = """  SELECT * FROM 
TABLE
WHERE 
property IS NOT NULL;
"""

This results in

> toml <- parseTOML(file.path("/tmp","config.toml"), escape=F)
> toml
List of 1
 $ PT:List of 1
  ..$ query: chr "  SELECT * FROM TABLE\\nWHERE \\nproperty IS NOT NULL;\\n"

If I now delete the section header

query = """  SELECT * FROM 
TABLE
WHERE 
property IS NOT NULL;
"""

it results in


> toml <- parseTOML(file.path("/tmp","config_no_section.toml"), escape=F)
> toml
List of 1
 $ query: chr "  SELECT * FROM TABLE\nWHERE \nproperty IS NOT NULL;\n"

The toml parser of python eats both versions, but I've never checked if it is correct behaviour.

@vh-d
Copy link
Contributor Author

@vh-d vh-d commented Mar 11, 2019

OK, that's strange. Are you up to date with master? I believe CRAN version may be outdated.

@tkschmidt
Copy link

@tkschmidt tkschmidt commented Mar 11, 2019

I updated to master and now it's fine.
I didn't check in which state CRAN was because the merge event was 4 days older than the CRAN version.
I assumed that CRAN was fine ^^

Could we get git tags that resemble the version in the description file?

Thanks for all your fast replies

@eddelbuettel
Copy link
Owner

@eddelbuettel eddelbuettel commented Mar 11, 2019

CRAN has versions.

Github has a commit stream. Master may at times be ahead of CRAN.

You have a money-back guarantee, but not much more. Remember we're volunteers here.

@tkschmidt
Copy link

@tkschmidt tkschmidt commented Mar 11, 2019

Sry, wasn't meant to be that harsh ^^
It's weird to be at the other end of the issue/request stream ;)

@salim-b
Copy link

@salim-b salim-b commented Jun 18, 2019

CRAN has versions.

Github has a commit stream. Master may at times be ahead of CRAN.

You have a money-back guarantee, but not much more. Remember we're volunteers here.

I don't wanna overstrain your volunteering here, but I think it would be really cool to update the CRAN version because it doesn't include @vh-d 's second PR #32 which fixes the escape switch in case of TOML tables and arrays.

@eddelbuettel
Copy link
Owner

@eddelbuettel eddelbuettel commented Jun 18, 2019

Yes, looking at my git there where #32 and the lesser #35 (doc only). Looks like cpptoml has not changed so it should be quick. Just sent something else to CRAN so the timing may be right to do a quick release here too.

@vh-d
Copy link
Contributor Author

@vh-d vh-d commented Jun 18, 2019

I would have a small PR fixing unicode chars in arrays. I will try to send it ASAP so it would make it to this new CRAN version

@eddelbuettel
Copy link
Owner

@eddelbuettel eddelbuettel commented Jun 22, 2019

Thanks everybody for the help, particularly @vh-d for the different PRs, @heavywatal for catching an issue with a stale statement in the README.md and @salim-b for the reminder.

I put some work into switching to the (awesome, trust me) and still new-ish tinytest package to have proper tests, also of the installed package (!!) which is really important (and under-appreciated) feature (that testthat lacks in the default settings).

So now with the version in the master branch:

image

As per f22c0aa I renamed it 0.1.5.1 as a release candidate. If you have a moment, give it a test and let me know (here) it is ok. It should be fine. I also added something to ChangeLog and NEWS to reflect the PRs. Thanks again for those.

@vh-d
Copy link
Contributor Author

@vh-d vh-d commented Jun 24, 2019

Nice. All tests pass on my Linux machine. There is one issue with encoding on Windows (which you have already spotted as well as I see in later commits). See my PR for a possible solution.

@eddelbuettel
Copy link
Owner

@eddelbuettel eddelbuettel commented Jun 24, 2019

Yes -- the machines at rhub and win-builder are good for such tests.

And thanks for the PR which I just merged.

@salim-b
Copy link

@salim-b salim-b commented Jun 25, 2019

Awesome! All tests performed by tinytest::test_package("RcppTOML") run fine here, too (Ubuntu 16.04; R 3.6.0).

@eddelbuettel
Copy link
Owner

@eddelbuettel eddelbuettel commented Jun 25, 2019

Yes it should as I develop on Ubuntu (albeit "current", now 19.04) and CRAN etc test on it too. Thanks for your patience, last few months were busy but a release was clearly a good idea.

And it does of course make my day that you already groked that you too can now run tests locally! Paging @markvanderloo who is the magician behind the curtain for that one...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

4 participants
You can’t perform that action at this time.