Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow colon before target rule names #18

Closed
anewton1998 opened this issue May 5, 2016 · 22 comments
Closed

allow colon before target rule names #18

anewton1998 opened this issue May 5, 2016 · 22 comments

Comments

@anewton1998
Copy link
Contributor

anewton1998 commented May 5, 2016

A colleague who has started using JCR for a server testing framework found this issue. Our new relaxed syntax rules allow a colon (':') character before objects and arrays, but not before target rule names. This caused a bit of confusion.

This looks like it should work, but doesn't.

    {
     "net" : {
       "handle"        : arin_string,
       "name"          : arin_string,
       "startAddress"  : arin_string,
       "version"       : arin_string,
       "netBlocks"     : {
         "netBlock"       : {
           "cidrLength"       : arin_string,
           "description"      : arin_string,
           "startAddress"     : arin_string,
           "type"             : arin_string
         }
       }
     }
   }

   arin_string : { "$" : string }

This is what works:

    {
     "net" : {
       "handle"         arin_string,
       "name"           arin_string,
       "startAddress"   arin_string,
       "version"        arin_string,
       "netBlocks"    : {
         "netBlock"      : {
           "cidrLength"        arin_string,
           "description"       arin_string,
           "startAddress"      arin_string,
           "type"              arin_string
         }
       }
     }
   }

   arin_string : { "$" : string }
@codalogic
Copy link
Contributor

My feeling is that the presence or absence of the : tells you whether the type is built-in (like 'string') or is user-defined (like arin_string). So I think we should keep it as is.

Essentially the user can look at it as macro substitution, and the colon is pulled in from where 'arin_string' is defined. If you think of it that way, then after user-defined type substitution you'd end up with:

"handle"        : : { "$" : string }

which is clearly wrong. I think this is something someone has to learn about JCR. Maybe we could make it clearer in the draft.

@anewton1998
Copy link
Contributor Author

But haven't we already bent to the "be more JSON like" by allowing colon before objects and arrays? That's what has caused the issue.

Looking at the base cause of the confusion, it's the hunting for the missing : character, which visually is not easy to find sometimes. Another solution to this problem is more syntactic sugar along the lines of our new = sign:

    {
     "net" : {
       "handle"         -> arin_string,
       "name"           -> arin_string,
       "startAddress"   -> arin_string,
       "version"        -> arin_string,
       "netBlocks"    : {
         "netBlock"      : {
           "cidrLength"        -> arin_string,
           "description"       -> arin_string,
           "startAddress"      -> arin_string,
           "type"              -> arin_string
         }
       }
     }
   }

   arin_string : { "$" : string }

Here target rules are easy to spot as they are preceeded by ->.

@codalogic
Copy link
Contributor

I'm afraid I'm still not feeling the love for it! Allowing : before objects and arrays allows us to start out with a JSON message (mostly) and work from there with minimal editing. Once we're into defining named rules, we're moving well away from pure JSON so it seems reasonable to me that it no longer looks like JSON.

If we did introduce a marker, presumably we'd enable something like:

{
 "net" : {
   -> handle,
   -> name,
   "startAddress"   -> arin_string,
   "version"        -> arin_string,
   "netBlocks"    : {
     "netBlock"      : {
       "cidrLength"        -> arin_string,
       "description"       -> arin_string,
       "startAddress"      -> arin_string,
       "type"              -> arin_string
     }
   }
 }
}

handle = "handle" -> arin_string
name = "name" -> arin_string
arin_string = : { "$" : string }

To me "syntactic sugar" allows me to do something in a more concise way that I can already do in a more formal way. To me, the additional characters just add noise and extra typing.

@anewton1998
Copy link
Contributor Author

We seem to be retreading this ground. :)
My feeling is that now is the time to make any non-compatible syntax changes.

I agree the -> is more typing, but not much more. This is a trade-off between readability and composition. We added = for this reason. Going back to another idea mentioned in email, another potential solution is to require rule names to start with ..

{
 "net" : {
   .handle,
   .name,
   "startAddress"   .arin_string,
   "version"        .arin_string,
   "netBlocks"    : {
     "netBlock"      : {
       "cidrLength"        .arin_string,
       "description"       .arin_string,
       "startAddress"      .arin_string,
       "type"              .arin_string
     }
   }
 }
}

.handle = "handle" .arin_string
.name = "name" .arin_string
.arin_string = : { "$" : string }

@anewton1998
Copy link
Contributor Author

Between the two, I think -> conveys more meaning.

@anewton1998
Copy link
Contributor Author

Hmm... since this issue arose from the confusion of a programmer, perhaps I should run the variations by him and a couple of other programmers to see which works. By variations, I mean the current syntax, and the various proposals here. I'll attempt to see if others are confused by the current syntax first.

@codalogic
Copy link
Contributor

Thinking over night, the following might work:

{
 "net" : {
   $handle,
   $name,
   "startAddress"   : $arin_string,
   "version"        : $arin_string,
   "netBlocks"    : {
     "netBlock"      : {
       "cidrLength"        : $arin_string,
       "description"       : $arin_string,
       "startAddress"      : $arin_string,
       "type"              : $arin_string
     }
   }
 }
}

$handle = "handle" : $arin_string
$name = "name" : $arin_string
$arin_string = : { "$" : string }

The colon would signal the start of a type (rather than it's JSON purpose of separating name and type), but it could be omitted in situations where it was obvious what followed was a type, such as the start of an object / array, or within an array. So you'd have the following kinds of rule:

$a_member_rule = "foo" : "bar"
$a_fixed_string_type = : "bar"   ; Note preceding colon to say this is a type

For arrays you'd have:

$array_1 = : [ : "foo", : "bar" ]   ; Full form
$array_2 = [ : "foo", : "bar" ]   ; Colon preceding [ inferred on finding [
$array_3 = : [ "foo", "bar" ]  ; Colons implied due to being in an array
$array_3 = [ "foo", "bar" ]  ; 2 and 3 combined

That looks like it might work. The idea that the colon is to mark the start of a type definition as opposed to being a separator is quite subtle and might confuse some people. That would come into play when defining something like $a_fixed_string_type. Would it lead to more confusion than we already have? Putting all those $ in would be a lot of work to update the test code! Tweaking the ABNF to allow colon in some contexts but not others might be a tough mental challenge!

@anewton1998
Copy link
Contributor Author

Ok. I just presented these ideas (except my idea with periods) to a room full of programmers, and got some valuable feedback.

The primary thing they said was "valid JSON should be valid JCR".

The liked the idea of '$' over '->', except they weren't wild about it on the left hand side.

So:

$array_1 = : [ : "foo", : "bar" ]   ; confusing
$array_2 = [ : "foo", : "bar" ]   ; confusing
$array_3 = : [ "foo", "bar" ]  ;confusing
$array_4 = [ "foo", "bar" ]  ;not confusing

Getting back to the primitive of the rules, maybe we should do this:

; array rule
a = [ $foo, "foo" ]

;object rule
o = { "foo" : $bar, /fu.*/ : string }

;member rules
m1 = "fiz" : 0..1
m2 = /bi.*/ : string

;value rule
v = float

In other words, we remove the colon from value rules and it appears as require only in member rules.

Of course, that brings us back to this conflict:

m1 = "fiz" : 0..1
v1 = "fiz"

@anewton1998
Copy link
Contributor Author

BTW, there was universal support for adding the equal sign.

@anewton1998
Copy link
Contributor Author

Thinking about this some more, regardless of the $ in rule names, the issue is the colons. And the colons are an issue because of named rules for values.

a = [ "foo" ] ; this is fine
b = "bar" : "baz" ;this is also fine
v = "fiz" ; this is the issue

In other words, our root cause is definition of named values. So perhaps we borrow from Perl and Ruby and use % for defining named values:

v1 = %"fiz"
v2 = %/fu.*/

The % only succeeds the = on value rules.

It makes the definitions of values different from other types of rules, but it is better to define an inconsistency with this rule definition than the placement of colons.

@codalogic
Copy link
Contributor

codalogic commented May 10, 2016

I think if we go with $ in names, then we should have $ both where it's used and where it is defined. If we have:

{ "foo" : $foo }
foo = string

then it raises the question, "are foo and $foo the same? Are they related in some special way?" Whereas, if you have:

{ "foo" : $foo }
$foo = string

there's no doubt that the 2 relate to the same thing. The equals sign makes it clear which role is which as well.

It can be argued that the second one is a bit uglier, but I think clarity is more important.

@codalogic
Copy link
Contributor

codalogic commented May 10, 2016

I think we should make colon the secret ingredient for defining stand-alone string values. So:

$a = [ "foo" ]
$b = "bar" : "baz"
$vi = 12
$vs = : "fiz"

At least you don't have to remember a special random character for what might be quite a rare case. Once you've remembered that the special character is the same as the separator you're good to go. This topic looks like being JCR's Stack Overflow question!

@codalogic
Copy link
Contributor

Maybe we should just accept that we have to put a bit of smarts into the parser to do the right thing for the user? If a string in a definition is followed by a : then it's a member rule, if not it's a regular value:

$b = "bar" : "baz" ;string followed by : -> member rule
$v = "fiz" ; string _not_ followed by : -> string value
$vi = 12

Parslet maybe able to give you that functionality out-the-box. In my C++ code I think I can assume that it's a type until I seem a following : and then move the string value into the member name field.

@codalogic
Copy link
Contributor

Conceivably we could even allow:

{ $name : $value }
$name = "foo"
$value = "bar"

But I think that would be quite a lot of grief for the code, which may reduce the number of implementations that get out there.

@anewton1998
Copy link
Contributor Author

Thanks for suffering through this, Pete.

I think we do have to be considerate of what parsers can and cannot do. A lot of the ABNF parsers generate gobs of code, and good luck figuring out which part of that to contort.

Therefore I think your idea of the colon before the string is our best option:

$vs = :"fiz"
$vr = :/b.*/

I don't know if we should allow a space between the colon and the quote or slash characters. My sense is no.

As for the dollar sign, I think you are right. If it is to be required in one place, it is to be required everywhere. To be honest, I've never liked the bash approach.

@codalogic
Copy link
Contributor

I hadn't connected that bash did it that way.

One detail to check is what happens when we include aliases. We really need to do (so the $ sign indicates where the parsing is about to go):

{ $an_alias.my_rule }

That might suggest that $ is a marker, as opposed to part of the name, which could favor vr = .... But I think aliases are an advanced concept and such users will cope with it, whereas novices might feel uneasy about $vr and vr =, finding $vr and $vr = more intuitive.

As for space after the colon in :/b.*/, I think we should always show it as no space, but be a bit Postel and allow the grammar to tolerate space.

@anewton1998
Copy link
Contributor Author

I agree on both of these issues.

@anewton1998
Copy link
Contributor Author

I've branched master and I'm working on putting in the dollar signs. I'll let you know how far I get.

@codalogic
Copy link
Contributor

Based on my previous attempt, and in the hope it saves you going down the detour I did, you'll need something like:

rule(:rule) { str('$') >> ( rule_name >> spcCmnt? >> str('=') >> spcCmnt? >> rule_def ).as(:rule) }
    #! rule = "$" rule_name spcCmnt? "=" spcCmnt? rule_def
    #!

rule(:rule_name)         { name.as(:rule_name) }
    #! rule_name = name
rule(:target_rule_name)  { (annotations.as(:annotations) >> str('$') >> (ruleset_id_alias >> str('.')).maybe >> rule_name).as(:target_rule_name) }
    #! target_rule_name  = annotations "$" [ ruleset_id_alias "." ] rule_name

i.e. rule and target_rule_name have the $, not rule_name.

@anewton1998
Copy link
Contributor Author

Yes, I think I did just that. The branch is now there. Let me know what you think.

@anewton1998
Copy link
Contributor Author

The $ and = are in the code for jcrvalidator here: https://github.com/arineng/jcrvalidator/tree/dollars_in_rule_names

The tests tell us the Parslet rules are working. I can't see why the matching ABNF wouuldn't.
Will start another issue on the colon, of which there is good news and bad news.

@anewton1998
Copy link
Contributor Author

Now in the checklist in #22.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants