Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Go implementation #25

Closed
torresjrjr opened this issue Apr 30, 2021 · 25 comments
Closed

New Go implementation #25

torresjrjr opened this issue Apr 30, 2021 · 25 comments

Comments

@torresjrjr
Copy link

I have created an implementation of NestedText in Go.

https://git.sr.ht/~torresjrjr/go-nestedtext

It's in early development, and provides only an executable which converts NestedText to JSON for now. I wish to create a library too.

Comments and critiques welcome.

@torresjrjr torresjrjr changed the title Go implementation - dicussion New Go implementation Apr 30, 2021
@KenKundert
Copy link
Owner

Please let us know when it reaches maturity.

@torresjrjr
Copy link
Author

I have recently seen the issue on the new release 2.0. It's much too late to state my opinion now, but I disgree with the direction of NestedText. It's primary value was it's extreme simplicity, and I see two of the three stated changes as massive regressions, namely:

  • Multilined keys.
  • Single list lines and dictionaries.

The other change I do welcome as a much needed improvement:

  • Remove quoted keys.

As such, I feel at an impass. I am not willing enough to implement the new official release of "NestedText", but rather a fork of it, including only the quoteless keys feature.

I mention this to inform you and others in the interest of the spreading of NestedText or it's derivatives, such as the derivative I described. Perhaps it might have it's own name to distinguish itself. I don't know.

Anyway, thank you for creating this format and for you new release. I'll be sure to comment when go-nestedtext has developments.

@LewisGaul
Copy link

Hi @torresjrjr, just thought I'd say that my instinct was the same with respect to these new features deviating from the simplicity of NestedText. However, having implemented a Zig parser, I consider this to be a simplification in some sense in that it is now possible to represent any JSON datastructure in NestedText format, whereas before empty lists/objects could not be represented and neither could certain object keys.

I don't feel great about the syntax of multiline keys, but on balance I think allowing them in some form is better than not.

There are other practicality arguments for allowing 'flow-style' lists and objects mentioned on some of the other recent issues, and on the whole I think they're a positive addition that don't break the fundamentals of NestedText (such as each line having a clear type, independent of context on surrounding lines).

Anyway, that's just my perspective. There are certainly some elements of the recent changes I'm not 100% happy with (primarily that [,] means a list containing one empty string), but I'm thinking I might tackle this by defining a slightly more restrictive parse_strict() function alongside the main parse() function supporting the full spec. I can relate to your sentiments, but there's definitely value in working together in a unified direction on a single spec. Of course, in the end we're all free to do whatever we choose! :)

Good luck with your implementation.

@KenKundert
Copy link
Owner

There seems to a tension in NestedText. Some people are drawn to it by the simplicity of the format and its implementation. Others are drawn to it because it can hold any text without the need for quoting or escaping. For me, I was looking for a format that could hold code snippets in a form that was simple for the end user, so the simplicity of format and the lack of quoting and escaping are what drove me. I was never concerned about the simplicity of the implementation. Hence, NestedText 2.0. It makes the language complete, in that it can hold any hierarchical collection of dictionaries, lists, and strings using a simple and consistent set of rules (with the proviso that dictionary keys must be strings) at the expense of a more complicated format and implementation. I was okay making the format more complicated because most users would never need or see the new features. Only the power users would ever use, or even know about, the multiline keys and inline lists and dictionaries.

Having said that, I still understand the appeal of NestedText in its simplest form. It is tempting to define a subset that would be very easy to implement and would be sufficient for almost all applications. It seems to me that subset would be NestedText 2.0 with the multiline keys and inline dictionaries and lists removed. Then the question becomes: would this bifurcation of NestedText be good or bad for NestedText adoption? I kind of think it would be good for adoption as long as it remained a pure subset. And it would certainly be better than having people creating variants on a ad hoc basis.

@KenKundert KenKundert reopened this May 29, 2021
@KenKundert
Copy link
Owner

We have decided not to explicitly declare a NestedText subset at this time. If you do choose to implement a subset, please do not allow keys to start with [ or {, which could cause upward compatibility issues for those wishing to use both your subset and the full version of NestedText.

@torresjrjr
Copy link
Author

OK, understood.

The current go-nestedtext repo is set at version 1 of NestedText, so there should be no foreseable changes.

I'm not extremely invested in the immediate development of go-nestedtext, nor do I wish to "hijack" the spec, so please don't hesitate to let me know what you'd like to see from go-nestedtext if anything.

@LewisGaul
Copy link

It is tempting to define a subset that would be very easy to implement and would be sufficient for almost all applications. It seems to me that subset would be NestedText 2.0 with the multiline keys and inline dictionaries and lists removed.

This is an interesting idea, but I'm not sure how well it would tackle the problem. I personally think flow-style containers improve the format quite significantly for reading/writing (although do make the implementation noticeably more complex), and it's other parts of the spec I'm uneasy about. I think a simplified version of the spec would have to go further, also removing multiline keys. The problem is that this removes the ability to specify any valid JSON - certain characters not allowed in object keys, and empty lists/objects not expressible.

I find multiline object keys the strangest part of the format, but I have come around to realising their value, and while I think the use of colons can be confusing I can see some merit in using a familiar character for it.

The one thing in the spec I'm very reluctant to implement (and have no plans to - this is the only deviation from the 2.0 spec in zig-nestedtext) is accepting empty keys/values in flow-style containers. My main objection to this is the strange workaround of making [,] represent a single empty value (as mentioned above and in previous comments), but flow-style containing empty values is just generally not very clear to read and unnecessary in my opinion (given flow-style is just a shorthand). My biggest concern with this choice is that if people consider it a wart in the format (which it certainly is from my perspective) then removing it later would be much more painful than adding it later would have been.

@kalekundert
Copy link
Contributor

I probably should've weighed in on this in the previous thread, but for what it's worth I think that forbidding empty values in flow-style containers is a mistake. Perhaps this is subjective, but my feeling is that users would be more surprised to find that empty values can't be specified than to find that empty values following trailing commas don't count:

  • The syntax for empty values in the middle of flow-style containers is so intuitive that forbidding it would break expectations in a significant way.
  • In contrast, the rules for trailing commas have so much precedent in other programming/data languages that they break expectations in a much less significant way (even if the rules do make more sense in the context of multiline containers).
  • Empty values are short, which makes them good candidates for flow-style containers.

My perspective on this is also influenced by the fact that I use nestedtext extensively for writing unit tests, where empty values are very common. So I don't see empty values as an edge-case that can be excluded in the name of avoiding a minor clarity issue.

I do see where you're coming from with the arguments that (i) flow-style containers are a shorthand syntax and (ii) it's easier to add features than to remove them. But I think the empty-value syntax is well-justified.

@LewisGaul
Copy link

I can't imagine why users would be surprised about empty values not being allowed - I don't know of any other languages/formats that allow empty values inside bracketed structures without the use of quotes (e.g. this appears to be disallowed in yaml).

I simply disagree that the syntax is intuitive, it's a shame that we don't see eye to eye on this. Maybe I have more of a mind to users who aren't super familiar with programming. E.g. I know of a user who was confused by the need for a space after the '-' for a list in yaml, hence my previously-voiced concern about allowing a leading hyphen in object keys.

The precedent for trailing commas comes exclusively from cases where empty values are not allowed (without quotes), as far as I can tell, which is the case I'm saying is confusing.

Interesting to hear that empty values are common for you, could you link to an example NestedText file? I'd be interested to compare the flow-style to non-flow-style equivalent. With enough persuasive examples I could start to be more convinced. Without, I have no intention of adding this to the Zig implementation, to discourage people from using it.

@KenKundert
Copy link
Owner

I can't imagine why users would be surprised about empty values not being allowed - I don't know of any other languages/formats that allow empty values inside bracketed structures without the use of quotes (e.g. this appears to be disallowed in yaml).

Well, it is not a programming language, but Verilog is the primary language used for representing electronic hardware designs, and it allows empty values in bracketed structures. Specifically, in argument lists you can pass values by name or by order. If you are passing by order you may simply skip a value to indicate that it should take its default value. For example: func(5, , 9). In this case the second argument takes its default value.

The precedent for trailing commas comes exclusively from cases where empty values are not allowed (without quotes), as far as I can tell, which is the case I'm saying is confusing.

Other than NestedText, I can think of no other language that does not support quoting or escaping, and this is a primary feature of NestedText. Certainly you are not suggesting that quoting be added back in? And popular languages such as Python ignore trailing commas. Once you accept the idea that trailing commas are ignored and that quoting is not available, you end up with the current behavior of NestedText. As for whether it is intuitive, that is personal judgement. The current approach is simple and self consistent and largely familiar. I would also argue that it is obvious once you remember that trailing commas are ignored. Consider a few cases:
[]
I think most people would expect this to represent an empty list. Both JSON and Python take it as such.
[a]
I think most people would expect this to be a list that contains a single value a.
[a,]
I think most people would also expect this to be a list that contains a single value a.
[,]
I think most people would expect this to be a list that contains empty values. The question is, does it contain one empty value or two. Once you remind them that [a,] was one value, then they would agree that [,] is a list that contains one empty value.

So this whole debate comes down to whether we ignore trailing commas. As you pointed out earlier, the case for trailing commas is not strong in NestedText as our commas only act as separators on lists and dictionaries that are contained on a single line. However, ignoring trailing commas allows us to distinguish between empty lists and lists that contain an empty value. If you use NestedText to hold code snippets, something that NestedText is particularly good at, those two case are very important. You need to be able to support both, and you need to distinguish them. But say we simply disallow trailing commas. Then consider three cases:

JSON        NestedText Inline Style
[]          []
[""]        not allowed
["",""]     [,]

Now, again, this time allowing trailing commas:

JSON        NestedText Inline Style
[]          []
[""]        [,]
["",""]     [,,]

To me, the second case is simply better. The first has a big hole in it. The only other alternative I can think of is to disallow empty values all together in inline lists, but that seems like too big a restriction.

I am somewhat surprised that this particular aspect of NestedText is causing consternation. The vast majority of users will never encounter inline dictionaries and lists, and those that do will be the more sophisticated users who will not be troubled by this aspect of NestedText. Even among those, few will encounter lists of empty values. But those that do will likely really appreciate the fact that they are available.

I am much more concerned with fragmentation of the spec. NestedText is still very young, and its adoption will be harmed if we bicker over the definition and put out versions that are mutually incompatible. I am okay with minimal implementations of NestedText, like the Go implementation by Byron, because they are substantially simpler and are sufficient to handle the primary use case for NestedText, that of being a simple and clean configuration or data file format for casual users. I only ask that such implementations make it clear that they are subsets, that they implement the base set of features, and that they remain upward compatible. But I think it is damaging to put out implementations that add, remove, or modify language features based on personal preference.

If you do not like this aspect of the language, I encourage you to simply implement your dump function so that it does not generate it while implementing your load function to support it so your version is compatible with all legal NestedTest files.

@kalekundert
Copy link
Contributor

Here's one of my tests that uses a lot of empty values: test_fields.nt. This is for some code that's meant to parse strings like a b c=d into data structures like ['a', 'b'], {'c': 'd'}. Importantly, quotes can be used to encode empty values/values with otherwise illegal characters like space, equals, etc.

This file doesn't quite have any good examples of where empty values would be used with flow style. The early tests (test_word, test_word_err) use empty values, but are too flat to benefit from flow style. The later tests (test_fields, test_fields_list) are nested enough that they probably would benefit from flow-style, but by then I'm no longer testing empty values. But I think it's fair to consider some "hybrid" test cases that could've reasonably appeared in this file:

test_fields:
  -
    given: ""
    expected:
      by_index:
        - 
      by_name:
  -
    given: ""=""
    expected:
      by_index:
      by_name:
        :
  -
    given: "" ""=""
    expected:
      by_index:
        -
      by_name:
        :
test_fields_list:
  -
    given: ""; ""
    expected:
      -
        by_index:
          -
        by_name:
      -
        by_index:
          -
        by_name:
  -
    given: ""=""; ""=""
    expected:
      -
        by_index:
        by_name:
          :
      -
        by_index:
        by_name:
          :

With flow-style:

test_fields:
  -
    given: ""
    expected:
      {by_index: [,], by_name: {}}
  -
    given: ""=""
    expected:
      {by_index: [], by_name: {:}},
  -
    given: "" ""=""
    expected:
      {by_index: [,], by_name: {:}},

test_fields_list:
  -
    given: ""; ""
    expected:
      -
        {by_index: [,], by_name: {}},
        {by_index: [,], by_name: {}},
  -
    given: ""=""; ""=""
    expected:
      -
        {by_index: [], by_name: {:}},
        {by_index: [], by_name: {:}},

I actually think this is a good example of how flow-style can improve a document (the original file was written pre-flow-style).

@LewisGaul
Copy link

Note: maybe we should move this discussion to a separate issue if it's going to drag out?

Well, it is not a programming language, but Verilog is the primary language used for representing electronic hardware designs, and it allows empty values in bracketed structures. Specifically, in argument lists you can pass values by name or by order. If you are passing by order you may simply skip a value to indicate that it should take its default value. For example: func(5, , 9). In this case the second argument takes its default value.

Out of interest, does Verilog allow trailing commas? Does it treat this as an empty value after the comma? I had a quick search and found this, which seems to suggest it treats it as an empty value after the comma (which seems sensible to me), but it also doesn't surprise me that this has caused confusion with the error message!

Other than NestedText, I can think of no other language that does not support quoting or escaping, and this is a primary feature of NestedText.

That's exactly my point - there is no precedent we're leaning on when making the decision here.

Certainly you are not suggesting that quoting be added back in?

I assume it's clear I'm not :)

And popular languages such as Python ignore trailing commas. Once you accept the idea that trailing commas are ignored and that quoting is not available, you end up with the current behavior of NestedText.

This is a reach as far as I'm concerned. I accept that there's no quoting in NestedText. I accept that languages with quoting accept trailing commas in general. I reject that languages without quoting accepting empty values should allow trailing commas (especially when the usual benefit to allowing them - diffs when spanning multiple lines - does not apply!).

However, ignoring trailing commas allows us to distinguish between empty lists and lists that contain an empty value. If you use NestedText to hold code snippets, something that NestedText is particularly good at, those two case are very important. You need to be able to support both, and you need to distinguish them.

To me, the second case is simply better. The first has a big hole in it. The only other alternative I can think of is to disallow empty values all together in inline lists, but that seems like too big a restriction.

I can see the logic here - I understood it the first time you explained it in the other issue. Do you understand that I'm seeing it from the other direction - from the user's perspective?

The current spec says that [,] is a list containing a single empty string. This is despite the comma acting as a separator, but not actually separating anything. When this was being discussed originally I asked many people (programmers and non-programmers, both of which I understand NestedText intends to target) the following question:
"If you accept that a list can contain empty values, would you say this [,] contains a single empty value or two empty values?"
Every single person said it looked like two values. This is why I consider it a fundamental flaw, since it goes against the 'Zen of NestedText':

  • "... easily understood and used by both programmers and non-programmers"
  • "Ideally people can understand it by looking at a few examples and they can use it without without needing to remember any arcane rules"

I then think about how to tackle this, and as per your examples, it seems the only consistent option would be to remove the ability to specify empty values in flow-style. I'm yet to be convinced that empty values in flow-style provides sufficient value to outweigh this entirely non-obvious aspect of the language (and this is a pretty high bar as far as I'm concerned).

If you use NestedText to hold code snippets, something that NestedText is particularly good at, those two case are very important. You need to be able to support both, and you need to distinguish them.

I'm not sure how this snippet was relevant, perhaps you could elaborate?

The vast majority of users will never encounter inline dictionaries and lists, and those that do will be the more sophisticated users who will not be troubled by this aspect of NestedText. Even among those, few will encounter lists of empty values.

I don't see this as a reasonable assumption. When a language feature exists, any user may encounter it (e.g. reading someone else's data file), and there will be users that get confused (even by the most carefully, well-designed features - that's just life 😃). We should seek to minimise this confusion.

I am much more concerned with fragmentation of the spec. NestedText is still very young, and its adoption will be harmed if we bicker over the definition and put out versions that are mutually incompatible.

I completely agree with this - this is why I'm continuing to try so hard to put my case forward! If I didn't care about fragmentation I'd just live with the difference in the zig implementation and move on :)

I only ask that such implementations make it clear that they are subsets, that they implement the base set of features, and that they remain upward compatible.

Yes, I've been meaning to make this clear, and have created LewisGaul/zig-nestedtext#16.

But I think it is damaging to put out implementations that add, remove, or modify language features based on personal preference.

I hope you understand that this is not driven by personal preference. This is driven by what I believe to be the best choice for the language in the long run. I am always open to being persuaded otherwise, but I haven't been so far on this point.

If you do not like this aspect of the language, I encourage you to simply implement your dump function so that it does not generate it while implementing your load function to support it so your version is compatible with all legal NestedTest files.

As I stated in my previous message, I have no plans to implement support for empty values in flow-style. Ideally we would be able to come to an agreement on whether or not there should be language support for them. Otherwise, I would certainly consider adding support if users of my library came to me asking for the feature with convincing use-cases. Currently, I feel there's not enough people involved in the discussion for any of us to really make a fully informed decision. Some research into what users want and what they find confusing could be very helpful in resolving this.

@KenKundert
Copy link
Owner

I hope you understand that this is not driven by personal preference.

Sorry, that was poorly worded. What I meant was 'personal beliefs' rather than 'personal preference'. In particular, our personal beliefs as to what is best for users, developers, and the long term success of NestedText. I think we are all considering each of these things, and nobody knows for sure what is best in each case. What is clear, is that we disagree.

I did misunderstand your proposal. I thought you wanted to eliminate empty lists rather than eliminating empty values in lists. I'm sure that was due to me not reading your responses carefully. Sorry about that.

So to summarize, you feel that allowing empty values in inline lists is confusing and think they should not be allowed, at least in the short term until a larger consensus forms, whereas I think that disallowing empty values in inline lists is too big a restriction. A good example supporting your view is [,], which you say that most people believe holds two empty values whereas in NestedText it only holds one. And I would offer up [-9, -8, -7, -6, -5, -4, -3,- 2, -1, , 1, 2, 3, 4, 5, 6, 7, 8 9], where I think it is obvious that 0 is represented as an empty string, and by disallowing it this compact representation must expand to 19 lines. One reason I don't want to give up the empty values in inline lists is that inline lists are really only suitable when they contain very short values, and the shortest values are the empty values. Another way of saying this is that if you confine yourself to short values, you are more likely to encounter empty values. In Kale's example where he is using NestedText to hold test cases, empty values come up a lot because they are often important special cases that must be tested.

At this point I still believe the current approach as documented in NestedText 2.0 is the best choice, and I assume that you remain unconvinced. So, I'm not sure as to how to proceed.

@LewisGaul
Copy link

Yes, I have more against the trailing comma than empty values in flow-style, but we've established we can't come up with a way to allow all combinations of empty values without them, hence I'm against empty values. Or perhaps I would prefer the wart to be that it's impossible to represent a list containing a single empty string in flow-style than for this confusing syntax of using trailing commas - a list of a single empty value can be represented on a single line even if not using flow-style, and as you say this is expected to be an edge case anyway.

The example of nested flow-style is admittedly fairly convincing (and also isn't solved by my alternative suggestion above). I'm not sure why 0 would be represented by an empty string though.

My general hypothesis is that wanting to represent empty strings should be quite rare, and that it may make more sense to use an alternative representation for some of these cases anyway. If this is the case then having slightly ugly syntax for it doesn't seem like the end of the world. But this is currently no more than a hypothesis, and one that you seem to disagree with.

As for how to proceed, how about this:

  • I will post on the Zig forum seeking some independent feedback on this issue (framed around the implementation of my Zig package).
  • By default I will not change my implementation, but will make this clear in the readme, and ask users to come forward with opinions either way. This can be viewed more as a 'known bug' in my implementation, at least until we have more feedback. I don't expect this to cause significant fragmentation issues, as my implementation is a strict subset.

@KenKundert
Copy link
Owner

Sounds like a good plan.

@LewisGaul
Copy link

The zigforum post is at https://zigforum.org/t/zig-nestedtext-release-0-1-0/383/5

@khs26
Copy link

khs26 commented Jun 14, 2021

Full disclosure: I'm a colleague of Lewis, and mainly have the context he has provided on this issue. Having said that, being his colleague would hopefully make it easier for me to disagree with him if anything!

I'd like to offer another voice in support of not allowing lists containing a single empty value (i.e. [,] -> ["", ""] rather than [,] -> [""]). The primary basis I'd have on this is just the quote from the NestedText docs:

It is easily created, modified, or viewed with a text editor and easily understood and used by both programmers and non-programmers.

In my view, this means that where there are easy analogies to draw with natural languages, they should be used. Natural language uses commas as separators and a trailing comma looks incorrect/incomplete. That means I imagine that non-programmers (and programmers unfamiliar with trailing commas) would infer that a single comma separated two empty values.

@KenKundert
Copy link
Owner

I think everyone is in agreement that using trailing commas to signal empty strings in an inline list is not desirable. But the alternative of simply not supporting empty strings is not desirable either. The disagreement is which is more undesirable. There are now three alternatives, perhaps you can weigh in on what you think is the best approach:

  1. trailing commas signify empty values (current approach):
    • [][]
    • [,][""]
    • [,,]["",""]
    • [a]["a"]
    • [a,]["a"]
    • [a,,]["a",""]
  2. do not support empty strings (Lewis's proposal):
    • [][]
    • [,]unsupported
    • [,,]unsupported
    • [a]["a"]
    • [a,]unsupported
  3. space represents value (new proposal):
    • [][]
    • [ ][""]
    • [,]["",""]
    • [a]["a"]
    • [a,]["a",""]

The new proposal matches what people expect in all cases except perhaps in the case of a list containing a single empty string. In this case, it is distinguished from an empty list by the fact that there is one or more spaces between the brackets. These spaces are taken as a value, and then, as with all values, the leading and trailing spaces are removed, leaving an empty string.

@LewisGaul
Copy link

LewisGaul commented Jun 17, 2021

Just wanted to explicitly add my other suggestion - trailing commas are disallowed but empty values are allowed (i.e. a comma at the end of a flow-style list indicate an empty value follows) , and we just accept that there's no way to represent a list containing a single empty string in flow-style (noting it can be expressed with a single line anyway when not embedded in another flow-style structure).

@KenKundert
Copy link
Owner

So, it is like #3, except that [ ] represents an empty list and [""]` is not representable in an inline list. Is that correct? Let's call this option 4.

Which is your preference?

@LewisGaul
Copy link

Yes, that's correct. I'm actually not sure which I prefer, but personally feel any of the alternatives are better than status quo.

@khs26
Copy link

khs26 commented Jun 18, 2021

I think 3 offers a good compromise between ease-of-understanding and what can be represented. I think that a special case of [ ] -> [""] is worth keeping if it means that all values can be represented.

@kalekundert
Copy link
Contributor

kalekundert commented Jun 18, 2021

I don't like 3. It's very counter-intuitive for [] and [ ] to be different, especially when whitespace is generally ignored in these structures. I think this is a much bigger problem than trailing commas, because:

  • Empty lists are much more common than lists that end with empty strings.
  • There's no precedent for [] and [ ] being different.

I don't like 4 either. Whether or not a value can be represented shouldn't depend on what other values are in the container.

I still think that 1 is clearly the best syntax. It allows empty values to be represented and uses standard, well-known syntax rules. It allows unabiguous cases like [a,,b] or {:}. It allows authors to choose whether it's more clear to use trailing commas or to just not use flow style.

@KenKundert
Copy link
Owner

I have decided to deprecate trailing commas and instead go with the idea of having [] represent an empty list, [ ] represent a list that contains a single empty string, [,] or [ , ] represent a list with two empty values, etc. For consistency reasons this has some impact on dictionaries. Specifically, { } is now an error, as is {a:A,}.

I have updated the documentation, the tests, and the python implementation. I will give some time to let things settle before releasing version 3.0. Hopefully this will be the last enhancement that is not backward compatible.

@KenKundert
Copy link
Owner

The changes have been incorporated into version 3.0, which is now available.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants