Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve TOML library #7104

Open
5 of 19 tasks
Spartee opened this issue Aug 24, 2017 · 18 comments
Open
5 of 19 tasks

Improve TOML library #7104

Spartee opened this issue Aug 24, 2017 · 18 comments

Comments

@Spartee
Copy link
Contributor

Spartee commented Aug 24, 2017

To compare against TOML spec -> TOML Spec

TOML Spec to add support for

Test Suites

  • test/library/packages/TOML/BurntSushi/
  • test/library/packages/TOML/toml-spec/

Known bugs

User errors currently not caught

  • Table name repeated (overwrites table)
  • key name repeated in same table (overwrites value)
  • Arrays of mixed data types e.g. [2, "string", 4.5]

Testing

  • More futures with user errors like key or table name repeated
  • Tests for handled errors (once error handling is put in)

Other

  • plug some of the remaining memory leaks
  • error handling instead of halt()
@Spartee Spartee mentioned this issue Aug 24, 2017
69 tasks
ben-albrecht added a commit that referenced this issue Nov 28, 2017
Split up TOML tests based on pass/fail

Added a `Passing.chpl` test for the passing tests of the BurntSushi suite. (#7104)
This will remove any passing futures from testing.

As tests pass, they should be migrated over to `Passing.execopts`

Also removed a unintended copy of the future (`TestToml.future`)


**Testing**
- [x] `start_test test/library/packages/TOML/BurntSushi`

[eagerly reviewed by @lydia-duncan]
ben-albrecht added a commit that referenced this issue Apr 22, 2019
[TOML Parser] Updated fieldTag and added date support

Renamed `fieldDate` to `fieldDateTime` and added date support in TOML Parser
solved #12801 and added ``date`` support as stated in #7104 

[Contributed by @sanket143]
[Reviewed by @ben-albrecht]
@sanket143
Copy link
Contributor

Added date support in #12802 ⬆️

ben-albrecht added a commit that referenced this issue Jun 10, 2019
TOML Parser: Added 'time' support

Added ``time`` support in TOML Parser as stated in #7104 

[Contributed by @sanket143]
[Reviewed by @ben-albrecht]
@sanket143
Copy link
Contributor

Known Bug: #12843 (Inline comment treated as if it's in written in a different line)

@ankushbhardwxj
Copy link
Member

ankushbhardwxj commented Mar 27, 2020

@ben-albrecht

Quoted keys with periods such as "this.key" = value

Is this related to #13332 ?

Comment on the last line of the file (easy)

Could you elaborate this issue ?

@ben-albrecht
Copy link
Member

Quoted keys with periods such as "this.key" = value

Is this related to #13332 ?

No, it is supporting quoted keys, such as described in the TOML spec.

Comment on the last line of the file (easy)

Could you elaborate this issue ?

I'm actually not sure about this one. Does having a comment on the last line of the file currently produce an error? @Spartee may have a better idea.

@Spartee
Copy link
Contributor Author

Spartee commented Apr 10, 2020

So there was a issue(which could be resolved now I haven't tested it recently) where having a comment on the last line of the TOML file would produce a rather unhelpful TomlError.

@ankushbhardwxj
Copy link
Member

ankushbhardwxj commented Apr 22, 2020

@ben-albrecht I have started to work on Datetime with offset. Should the regex validate for leap years too ?
Also there's a case mentioned in TOML Spec for replacing T in 1979-05-27T00:32:00-07:00 with a space. Should I consider that case?
If yes, our regex would fail there ! Additionally, the TOML module would also consider 1979-05-27 and 00:32:00-07:00 as different tokens if a space exists.

This is the regex I plan to use for this feature:

\d{4}-[01]{1}\d{1}-[0-3]{1}\d{1}T[0-2]{1}\d{1}:[0-6]{1}\d{1}:[0-6]{1}\d{1}[+|-][0-1][0-9]:[0-5][0-9]$

@ben-albrecht
Copy link
Member

ben-albrecht commented Apr 22, 2020

Should the regex validate for leap years too ?

I am not sure if this needs to be validated within the regex itself, but it should be validated elsewhere if not.

Also there's a case mentioned in TOML Spec for replacing T in 1979-05-27T00:32:00-07:00 with a space. Should I consider that case?

Ideally, yes.

Additionally, the TOML module would also consider 1979-05-27 and 00:32:00-07:00 as different tokens if a space exists.

Would we need to update our tokenizer then?

This is the regex I plan to use for this feature:

Thanks. I have found a good way to share regexes is to link to a online regex evaluator, such as https://regex101.com, with your regex expression and test cases plugged in. That makes it easier for a reviewer to catch an edge case you may have missed in your regular expression.

Feel free to split off a separate issue for this, if you think it will warrant more discussion.

@ankushbhardwxj
Copy link
Member

Would we need to update our tokenizer then?

@ben-albrecht Modifying the tokenizer wouldn't be wise because it might break already implemented features. I'll try to handle it with the regex.

Thanks. I have found a good way to share regexes is to link to a online regex evaluator,

This is great ! Thanks.

@ankushbhardwxj
Copy link
Member

@ben-albrecht @Spartee Would like to report an issue that showed up in the development of #16141 (comment).
There should be some differentiation between . found in table name which are used to form subtables. Currently, . encountered within "" are also considered to create subtables.
eg:

[LocalAtomics."0.1.0"]
score = "4"

creates a roottable

[LocalAtomics]
[LocalAtomics."0]
[LocalAtomics."0.1]
[LocalAtomics."0.1.0"]
score = "4"

instead of

[LocalAtomics]
[LocalAtomics."0.1.0"]
score = "4"

@ben-albrecht
Copy link
Member

Thanks @ankingcodes - Do you understand the source of this issue? Are we blindly splitting on . somewhere in the TOML module?

@ankushbhardwxj
Copy link
Member

@ben-albrecht Yes, that's exactly the issue. We need to find a way to ignore . inside of quotes. I'm trying to fix it but I wish the documentation of the module was better.

@ben-albrecht
Copy link
Member

Yes, that's exactly the issue. We need to find a way to ignore . inside of quotes

OK, great. Let us know if you run into issues with this.

While on the topic, I am wondering - does the regex handle escaped quotes? I haven't checked the TOML spec to know what is expected there myself...

I wish the documentation of the module was better.

Be the change you want to see in the world TOML module :)

@redhatturtle
Copy link

Is the status of this library in the OP up to date? I was hoping to use arrays of tables on a project but might have to use a clunky workaround if it's not available yet.

@bradcray
Copy link
Member

@redhatturtle : I saw this question yesterday, but didn't know the answer so waited to see whether someone else would chime in. Browsing the sources of the TOML module and its commit history, I'm not seeing any indication that arrays of tables have been supported. Tagging @ben-albrecht here to see if he can confirm.

My impression is that we've essentially implemented the minimum in TOML necessary to support our mason package manager, which is why this has languished. You could certainly advocate for making this more of a priority if it would be a big win for you. I don't know TOML or the structure of our TOML module to know whether it would be a heavy lift or not, nor what the challenges were / whether they've become easier since this issue was filed. Tagging @Spartee to see whether he can characterize what challenges he was alluding to in the OP.

@ben-albrecht
Copy link
Member

I can confirm arrays of tables are not yet supported. I believe it is a bigger effort because it will require some higher level changes to the TOML parser in order to support it, based on the last time I went over it with @Spartee -- he may be able to shed more light on the specifics.

@Spartee
Copy link
Contributor Author

Spartee commented May 13, 2021

I believe the problem with the TOML parser for array of tables was that the regular expressions we used for tokenization did not cover array of tables. They were not added when we did this because (I believe) we could not find regex strategy that would include tokens for array of tables without effecting the other tokens. Someone who is better with regex than I am could most likely figure this out.

This is the section that holds the regex the TOML module uses for parsing.

  pragma "no doc"
  class Parser {

    var source: Source;
    var rootTable: unmanaged Toml;
    var curTable: string;

    const doubleQuotes = '".*?"',
      singleQuotes = "'.*?'",
      bracket = '\\[|\\]',
      digit = "\\d+",
      keys = "^\\w+";
    const Str = compile(doubleQuotes + '|' + singleQuotes),
      kv = compile('|'.join(doubleQuotes, singleQuotes, digit, keys)),
      dt = compile('^\\d{4}-\\d{2}-\\d{2}[ T]\\d{2}:\\d{2}:\\d{2}$'),
      realNum = compile("\\+\\d*\\.\\d+|\\-\\d*\\.\\d+|\\d*\\.\\d+"),
      ld = compile('^\\d{4}-\\d{2}-\\d{2}$'),
      ti = compile('^\\d{2}:\\d{2}:\\d{2}(.\\d{6,})?$'),
      ints = compile("(\\d+|\\+\\d+|\\-\\d+)"),
      inBrackets = compile("(\\[.*?\\])"),
      corner = compile("(\\[.+\\])"),
      brackets = compile('\\[|\\]'),
      whitespace = compile("\\s"),
      comment = compile("(\\#)"),
      comma = compile("(\\,)");

@ben-albrecht may remember a conversation we had with @ankingcodes about this. I can't find where I outlined the suggestions.

As I said, someone who is more knowledgable about regex than I am may be able to figure this out pretty easily. The harder suggestion I was alluding to was to refactor the parseLoop itself to support arrays of tables. I don't think either is unreasonable.

@bradcray
Copy link
Member

Thanks Sam and Ben. Do you recall: Was there any sort of challenge relating to Chapel's arrays needing to have a consistent element type (as a statically typed language) and TOML arrays of tables potentially having a mix of logical types in different slots of the array? Or is that not an issue through other abstractions that are already present in the TOML module implementation that paper over those type differences?

@ben-albrecht
Copy link
Member

Do you recall: Was there any sort of challenge relating to Chapel's arrays needing to have a consistent element type (as a statically typed language) and TOML arrays of tables potentially having a mix of logical types in different slots of the array? Or is that not an issue through other abstractions that are already present in the TOML module implementation that paper over those type differences?

It is the latter (not an issue). The Toml type stores any possible Toml value such that an array of tables can just be represented as an array of Toml type.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants