
Avoiding =: in $rule =: integer etc #112

Closed
codalogic opened this issue Jan 12, 2018 · 10 comments

Comments

@codalogic
Contributor

To capture what was sent by email...

I've been trying to think of how we can avoid the ugliness of the : in =: when a rule is defined as a type.

If I remember rightly, the issue is that if we have:

$rule = "name" ...

or:

$rule = /p\d+/ ...

Without the extra token, it's not clear whether the right-hand side is a member name or a type.

After some thought, I'd like to propose the following to address the issue:

1 - Remove the string-value type. If you need an explicit string as a type, rely on the string-range regex type. That is, instead of:

"name" : "Bob"

do:

"name" : /^Bob$/

That way, if you see a quoted string, you know it's a name. (So far, though, a regular expression could still be either a name or a type.)

So, 2 - Treat the string in a quoted string name as an implicitly anchored regular expression. We could call it a q-re-string, as a variation of a q-string.

For example, instead of:

$o2 = { "p1" : integer, /^p\d+$/ : integer * }

we do:

$o2 = { "p1" : integer, "p\d+" : integer * }

Because the special regular expression characters ("(){}[]*+?|") rarely appear in object names, this is unlikely to cause a problem. Where it does, they can just be escaped.
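To make the implicit anchoring concrete, here is a minimal Python sketch. The function names are hypothetical and not part of JCR or any validator; it assumes q-re-string matching behaves like Python's `re.fullmatch`, so `"p\d+"` accepts the same member names as `/^p\d+$/`, while a name containing a metacharacter such as `(` has to be escaped to match literally.

```python
import re


def q_re_string_matches(pattern: str, name: str) -> bool:
    """Hypothetical q-re-string semantics: the quoted text is a regular
    expression implicitly anchored at both ends of the member name."""
    return re.fullmatch(pattern, name) is not None


def explicit_regex_matches(pattern: str, name: str) -> bool:
    """Current form: an explicitly anchored regex searched within the name."""
    return re.search(pattern, name) is not None


# The q-re-string "p\d+" behaves like the explicit /^p\d+$/ on these names.
for member_name in ["p1", "p42", "xp1", "p1x", "p"]:
    assert q_re_string_matches(r"p\d+", member_name) == explicit_regex_matches(
        r"^p\d+$", member_name
    )

# A plain name still matches only itself.
print(q_re_string_matches("p1", "p1"))    # True
print(q_re_string_matches("p1", "p10"))   # False

# A name containing regex metacharacters must be escaped to match literally.
print(q_re_string_matches("a(b)", "a(b)"))             # False: (b) is a group
print(q_re_string_matches(re.escape("a(b)"), "a(b)"))  # True
```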

With this arrangement, a quoted (regular expression) string is always a name, and a regular expression is always a type.

I think if we do this, the ambiguity between a name and a type is removed. So instead of having to do:

$rule =: /^channel\d+$/

We can just do:

$rule = /^channel\d+$/

@codalogic
Contributor Author

codalogic commented Jan 12, 2018

What the above doesn't help with is the following situation:

$foo = ( int8 | int16 )   ; Group
$foo =: ( int8 | int16 )  ; type-choice

I'll look closer at this.

I THINK what we can do is change:

rule-def         = member-rule / type-designator rule-def-type-rule /
                array-rule / object-rule / group-rule /
                target-rule-name
rule-def-type-rule = value-rule / type-choice

to:

rule-def         = member-rule / value-rule / explicit-type-choice /
                group-rule / target-rule-name
               ; N.B. array-rule & object-rule already in value-rule
; removed - rule-def-type-rule = value-rule / type-choice

FYI, rule-def-type-rule and value-rule are:

rule-def-type-rule = value-rule / type-choice
value-rule       = primitive-rule / array-rule / object-rule

We'd want to change rule-def anyway to get rid of the :.

@anewton1998
Contributor

This proposal means that JCR will no longer be a proper superset of JSON. I also think that people would find writing strings as regular expressions, even simple ones, to be an irritant.

This is also not backwards compatible with the JCR already out there.

I don't think there is a lot we can do here to avoid the issue. Here are some suggestions:

  • Change the ABNF so the =: is only required on primitive rule assignments for strings and regexes. This reduces some of the friction.
  • Allow for alternate quoting, such as $foo = 'bar' and $foo = #^.*#. In combination with the above, this offers the lowest friction IMO.

@codalogic
Contributor Author

I'd missed the JSON superset bit. Yes, we don't want to lose that.

I like your suggestions. If we call 'bar', for example, a quoted regex (i.e. a q-regex to twin q-string), then we could have:

$o2 = { "p1" : integer, 'p\d+' : integer * }

In that case, something like /^channel\d+$/ is never ambiguous. We can unambiguously do:

$r = /^channel\d+$/

That gets us halfway there.

Then we say that, if you want a named rule that corresponds only to a string-value, you have to use the =: notation.

So we have:

$r1 = "name" : "value" ; a member-rule
$r2 = : "value" ; a string-value rule
$r3 = type "value" ; also a string-value rule
$r4 = /^value$/ ; a string-range constrained to a literal value for
                ; those that don't know about :s!

Other primitive-rules don't need the colon, but could include it optionally for backwards compatibility. So we can have either:

$r5 = integer
$r6 =: integer
$r7 = type integer

I realise you're deep in other stuff, and this idea may benefit from stewing a little bit. But to me that looks pretty good, and fixes the biggest annoyance I have with JCR at the moment.

@anewton1998
Contributor

anewton1998 commented Jan 12, 2018

I'm still not following the reason for doing a quoted regex. Perhaps my proposal was not clear, and I realize now that re-use of the # (octothorpe) may be a little problematic, given we use that for directives.

My proposal is for string values to be quoted with either " or '. And as some regular expression libraries allow regular expressions to be delimited by several different characters, perhaps we could allow a regular expression to be denoted by either the / character or the \ or | characters. (See https://stackoverflow.com/questions/2892749/php-regex-delimiters-vs-vs-what-are-the-differences.)

So string and regex values could use different quoting, but member names MUST use double quotes or slashes.

Therefore

$r1 = "name" : "value"
$r2 = "name" : 'value' ; same as $r1 above
$r3 = 'value' ; a rule assignment for a string value
$r4 = "name" : /^foo.*/
$r5 = "name" : |^foo.*| ; same as $r4
$r6 = |^foo.*| ; a rule assignment for a regex value

I hope I expressed it right this time.

@codalogic
Contributor Author

codalogic commented Jan 13, 2018

Allowing "value" and 'value' looks promising.

I think I'd get confused between /^foo.*/ and |^foo.*| and which is used where.

I'd like to avoid having two ways to represent the same thing, if possible. Allowing different quoting of strings has precedent in other languages, so won't be difficult for people to take on board. I'm not so sure allowing both /^foo.*/ and |^foo.*| would seem so natural though.

Looks like we can use any non-alphanumeric, non-backslash, non-whitespace character (http://us.php.net/manual/en/regexp.reference.delimiters.php). What about using backticks for member names that use regular expressions? That looks more 'member name-like' to me, and has a 'processed' connotation. We then get:

$r1 = "name" : "value"
$r2 = "name" : 'value' ; same as $r1 above
$r3 = 'value' ; a rule assignment for a string value
$r4 = "name" : /^foo.*/
$r5 = `p\d+` : 'value'
$r6 = /^foo.*/ ; a rule assignment for a regex value

and

$o2 = { "p1" : integer, `p\d+` : integer * }

Or are there too many wildcarded JCR rules out in the wild now?
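A hypothetical Python sketch of why this scheme removes the ambiguity: the opening delimiter of a rule definition's right-hand side is enough to decide what kind of rule it is, with no need to look further ahead. The token names (`q-string`, `alt-string`, `regex`, `regex-name`) and `classify_rule_def` are illustrative only, not the JCR ABNF or the validator's code, and anything not starting with one of the four delimiters is lumped into `"other"`.

```python
# Hypothetical classifier for the delimiter scheme proposed above; the token
# names are illustrative, not taken from the JCR ABNF or the validator.
DELIMITERS = {
    '"': "q-string",     # member name at the start of a rule-def
    "'": "alt-string",   # string value only
    "/": "regex",        # regex value (string-range type)
    "`": "regex-name",   # regex member name
}


def scan_delimited(text: str, start: int) -> int:
    """Return the index just past the token opening at text[start],
    treating a backslash as escaping the following character."""
    delim = text[start]
    i = start + 1
    while i < len(text):
        if text[i] == "\\":
            i += 2  # skip the escaped character
        elif text[i] == delim:
            return i + 1
        else:
            i += 1
    raise ValueError(f"unterminated {delim}-delimited token")


def classify_rule_def(rhs: str) -> str:
    """Classify the right-hand side of `$name = ...` from its first token."""
    text = rhs.strip()
    kind = DELIMITERS.get(text[:1])
    if kind is None:
        return "other"  # type name, group, array, object, ...
    rest = text[scan_delimited(text, 0):].lstrip()
    if kind in ("q-string", "regex-name"):
        # The opening delimiter alone says "member name", so ':' must follow.
        if not rest.startswith(":"):
            raise ValueError("member name must be followed by ':'")
        return "member-rule"
    return "string-value rule" if kind == "alt-string" else "regex rule"


for rhs in ['"name" : "value"', "'value'", r"`p\d+` : 'value'", "/^foo.*/"]:
    print(f"{rhs:20} -> {classify_rule_def(rhs)}")
```

The backslash handling means a regex such as `/a\/b/` is scanned as one token, as it would be in most regex-delimiter conventions.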

@anewton1998
Contributor

I don't know how many existing projects use a regex for member names. Checking my projects, only one does it in one spot. My suspicion is that it is not a heavily used feature. So let's proceed with the above.

@codalogic
Contributor Author

Cool. I'll try coding it up in the validator.

@anewton1998
Contributor

In answering Daniel's question on the JSON mailing list, it just occurred to me that another solution to this problem is to simply say that either a primitive assignment or member assignment requires the group syntax.

$r1 = ( "foo" )
;or
$r2 = ( "foo" : "bar" )

In fact, both work today. Just an observation.

@codalogic
Contributor Author

I think if we did that, it would require another entry in the tips-and-tricks section, which I think would best be avoided.

Also, looking at it, we have the same ambiguity in:

$r1 = ( "foo" )
$r2 = ( "foo" : "bar" )

as:

$r1 = "foo"
$r2 = "foo" : "bar"

(Parslet is doing a lot of backtracking to cover this up.)
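To illustrate that backtracking, here is a tiny PEG-style sketch in Python. It is not Parslet and not the validator's grammar, and member values are simplified to q-strings. With only double-quoted strings, the parser has to try the member-rule alternative, scan past the whole string, and fall back to a string-value rule when no `:` follows. Wrapping the body in `( ... )` doesn't avoid this, because the same choice is then made inside the group.

```python
import re

# Simplified token patterns; real JCR q-strings and rule bodies are richer.
Q_STRING = re.compile(r'"(?:[^"\\]|\\.)*"')
SPACES = re.compile(r"\s*")


def skip_spaces(text: str, pos: int) -> int:
    return SPACES.match(text, pos).end()


def member_rule(text: str, pos: int):
    """Try `"name" : "value"`; return (node, end) or None on failure."""
    name = Q_STRING.match(text, pos)
    if name is None:
        return None
    pos = skip_spaces(text, name.end())
    if not text.startswith(":", pos):
        return None  # no ':' after the string, so not a member rule
    value = Q_STRING.match(text, skip_spaces(text, pos + 1))
    if value is None:
        return None
    return ("member-rule", name.group(), value.group()), value.end()


def string_value_rule(text: str, pos: int):
    """Try a lone `"value"`; return (node, end) or None on failure."""
    value = Q_STRING.match(text, pos)
    if value is None:
        return None
    return ("string-value", value.group()), value.end()


def parse_rule_def(text: str):
    """PEG-style ordered choice: each alternative restarts at the same
    position, so a failed member_rule attempt is backtracked over."""
    start = skip_spaces(text, 0)
    for alternative in (member_rule, string_value_rule):
        result = alternative(text, start)
        if result is not None and skip_spaces(text, result[1]) == len(text):
            return result[0]
    raise ValueError(f"cannot parse rule body: {text!r}")


print(parse_rule_def('"foo"'))          # ('string-value', '"foo"')
print(parse_rule_def('"foo" : "bar"'))  # ('member-rule', '"foo"', '"bar"')
```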

With this issue resolved as above, we'd disambiguate the two by doing:

$r1 = ( 'foo' )
$r2 = ( "foo" : "bar" )

I'd like to think about it some more. I might email you about it, and/or start a new issue to keep this one from diverging.

@codalogic
Contributor Author

Resolved by requiring the parser to disambiguate various scenarios. See PR #118.


2 participants