RFC: allow operator suffixes — combining characters and primes #22089

stevengj · 2017-05-26T18:58:13Z

This PR implements something that I had been hoping to do for a while (see #6929 (comment)): custom operators can be defined by appending Unicode combining characters, primes, and sub/superscripts to other operators.

For example, +̂ and +″ are now parsed as binary operators with the same precedence as +.

Rationale: this allows you to define an operator that is clearly a "modified +" (etc.) without having to dig through Unicode for some vaguely appropriate symbol, and without overriding + itself. Also, it is pretty inconceivable that +̂ could be anything other than an infix operator, so it is a choice between supporting it or giving an error, and I don't see why an error would be useful.

(Note: combining characters with operators, e.g. +̂, don't show up properly in some fonts. It should look like a + with a circumflex accent centered directly above it.)
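As a rough illustration of what this enables, here is a minimal sketch written against Julia 1.x, where the thread's `parse` on strings is spelled `Meta.parse`. The operators `+′` and `*ₐ` and their definitions are hypothetical, chosen only to show that a suffixed operator is an ordinary function with the precedence of its base operator.

```julia
# Hypothetical "modified" operators, for illustration only (not defined in Base).
+′(a, b) = a + b + 1      # `+` followed by U+2032 PRIME
*ₐ(a, b) = 10 * a * b     # `*` followed by U+2090 LATIN SUBSCRIPT SMALL LETTER A

@assert 1 +′ 2 == 4
@assert 2 *ₐ 3 == 60

# Precedence follows the base operator, so `*ₐ` binds tighter than `+′`:
# 1 +′ (2 *ₐ 3) == 1 +′ 60 == 62
@assert 1 +′ 2 *ₐ 3 == 62

# `+` followed by U+0302 COMBINING CIRCUMFLEX ACCENT, written with an escape
# so that the example does not depend on font rendering.
ex = Meta.parse("3 +\u0302 4")
@assert ex == Expr(:call, Symbol("+\u0302"), 3, 4)
@assert Base.operator_precedence(Symbol("+\u0302")) == Base.operator_precedence(:+)
```

The last two checks mirror the tests added in this PR, adapted to the Julia 1.x spelling.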

stevengj · 2017-05-26T19:04:24Z

It would also be possible to support a whitelist of superscripts and subscripts, for example if we wanted +⁽¹⁾ or *ₐ to parse as operators.

Unicode already has a couple of oddball examples of such operators, e.g. U+2a27 is ⨧, which we already parse as a +-like operator.

dpsanders · 2017-05-27T02:28:10Z

Great! ⁻¹ would be very nice.

tkelman · 2017-05-31T18:44:48Z

NEWS.md

@@ -4,6 +4,14 @@ Julia v0.7.0 Release Notes
 New language features
 ---------------------

+* `getpeername` on a `TCPSocket` returns the address and port of the remote
+    endpoint of the TCP connection ([#21825]).


bad merge resolution

stevengj · 2017-05-31T19:49:57Z

(Still working on some fixes to this PR. In particular, I'm updating it to blacklist many syntactic "operators" like ? and ' and : from suffixes, so that basically only ordinary binary operators are allowed to have suffixes.)

stevengj · 2017-06-01T12:33:48Z

(The annoyance with supporting superscripts and subscripts is that these codepoints are scattered all over unicode; the only way to detect them is to make a manual table.)

stevengj · 2017-06-01T12:46:29Z

This seems to be the list of the 93 Latin/Greek/math super/subscripts in Unicode, sorted by codepoint: ²³¹ʰʲʳʷʸˡˢˣᴬᴮᴰᴱᴳᴴᴵᴶᴷᴸᴹᴺᴼᴾᴿᵀᵁᵂᵃᵇᵈᵉᵍᵏᵐᵒᵖᵗᵘᵛᵝᵞᵟᵠᵡᵢᵣᵤᵥᵦᵧᵨᵩᵪᶜᶠᶥᶦᶫᶰᶸᶻᶿ ⁰ⁱ⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾ⁿₐₑₒₓₕₖₗₘₙₚₛₜⱼⱽ. Correction: whoops, forgot the numeric subscripts.
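To make the "manual table" idea concrete, here is a minimal sketch in Julia. It is not the PR's implementation (which lives in C and Scheme); the name `strip_script_suffix` is hypothetical, and the table holds only a small illustrative subset of the characters listed above.

```julia
# Illustrative subset of the super/subscript table; the real table is much longer.
const SCRIPT_SUFFIXES = Set("²³¹⁰⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾ⁿₐₑₒₓ")

# Hypothetical helper: drop trailing characters that appear in the table,
# but never strip the first character of the operator.
function strip_script_suffix(op::AbstractString)
    i = lastindex(op)
    while i > firstindex(op) && op[i] in SCRIPT_SUFFIXES
        i = prevind(op, i)
    end
    return op[firstindex(op):i]
end

@assert strip_script_suffix("+⁽¹⁾") == "+"
@assert strip_script_suffix("*ₐ") == "*"
@assert strip_script_suffix("+") == "+"
```

A set lookup per trailing character keeps the common case (no suffix at all) down to a single membership test.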

stevengj · 2017-06-01T19:29:41Z

Rebased. Tests were green before the rebase, so it should be good to squash+merge once others approve.

stevengj · 2017-06-05T18:02:12Z

@StefanKarpinski, any chance of a decision on this?

stevengj · 2017-06-10T11:40:50Z

Seems like everyone agrees with this change in principle, and it is just a matter of approving the implementation. @JeffBezanson?

stevengj · 2017-06-13T18:38:18Z

Rebased and fixed whitespace problem introduced by last merge.

JeffBezanson · 2017-06-13T20:16:15Z

src/julia-parser.scm

@@ -54,9 +58,25 @@
           (lambda (x)
             (has? t x))))))

+; only allow/strip suffixes for some operators
+(define no-suffix? (Set (append prec-assignment prec-conditional prec-lazy-or prec-lazy-and


Could this be a whitelist instead?

It was a bit easier as a blacklist because otherwise I'd have to split e.g. prec-arrow into a couple of separate lists rather than just explicitly listing -- --> -> here.

JeffBezanson · 2017-06-13T20:18:25Z

src/julia-parser.scm

@@ -68,7 +88,9 @@
                             (pushprec (cdr L) (+ prec 1)))))
                     (pushprec (map eval prec-names) 1)
                     t))
-(define (operator-precedence op) (get prec-table op 0))
+(define (operator-precedence op) (get prec-table
+                                      (maybe-strip-op-suffix op)


Testing for operators can be pretty important for performance. It would be nice to only call maybe-strip-op-suffix for precedence levels that support it.

I was hoping that maybe-strip-op-suffix would be fast enough, because in the common case strip-op-suffix (implemented in C) is a no-op and no-suffix? isn't even called.

Is there any way to benchmark the impact of this?

Try parsing e.g. string(:[$((:(a+b) for i=1:10000)...)])

I tried benchmarking parse(s) for s = string(:[$((:(a+b) for i=1:10000)...)]), and removing the maybe-strip-op-suffix call from operator-precedence makes no detectable difference on my machine, so operator-precedence doesn't seem to be a problem.

However, there seems to be about an 8% slowdown overall in that benchmark compared to before this PR, so there must be something else in this PR that is the culprit.
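For anyone wanting to repeat the measurement, a sketch of the artificial benchmark in Julia 1.x syntax follows (again with `Meta.parse` in place of `parse`). The timing approach is an assumption, not what was used in this thread, and absolute numbers will differ by machine and Julia version. Recent Julia versions no longer use the flisp parser by default, so this may not exercise the code changed here.

```julia
# Build the source text of a 10000-element array literal: "[a + b, a + b, ...]"
s = string(:[$((:(a+b) for i=1:10000)...)])

# Parse once to warm up and sanity-check the result.
ex = Meta.parse(s)
@assert ex.head == :vect
@assert length(ex.args) == 10000

# Take the best of several runs to reduce noise.
best = minimum(@elapsed(Meta.parse(s)) for _ in 1:10)
println("best parse time: ", best, " seconds")
```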

JeffBezanson · 2017-06-13T20:21:46Z

test/parse.jl

+@test parse("3 +⁽¹⁾ 4") == Expr(:call, :+⁽¹⁾, 3, 4)
+@test parse("3 +₍₀₎ 4") == Expr(:call, :+₍₀₎, 3, 4)
+@test Base.operator_precedence(:+̂) == Base.operator_precedence(:+)
+


Should add some cases of suffixes on operators that don't allow them.

~~Will do.~~ Done.

stevengj · 2017-06-15T17:31:58Z

This PR causes a slight slowdown in the parser. In particular, replacing Set with SuffSet for the various is-prec-foo? functions causes about an 8% slowdown when parsing Jeff's artificial benchmark from above, and about a 5% slowdown for a more realistic benchmark (parsing about 20000 lines from base).

Is this a concern? Any suggestions?

JeffBezanson · 2017-06-15T17:40:33Z

Yes, I think we should try using SuffSet only for precedence levels that need it.

JeffBezanson · 2017-06-15T17:41:57Z

Ah, just saw your optimization. That looks good. How much does it help?

stevengj · 2017-06-15T17:53:46Z

@JeffBezanson, unfortunately, that optimization hardly makes a difference (7% slowdown instead of 8%).

stevengj · 2017-06-15T18:05:43Z

I don't really have a good handle on performance optimization for flisp.

stevengj · 2017-07-06T19:47:40Z

Any thoughts on how I can further speed up parsing? Or whether we should just swallow the 5% parsing slowdown on realistic code and worry about parser optimization later?

StefanKarpinski · 2017-07-11T15:02:43Z

I'm all for swallowing the 5% slowdown for this.

stevengj · 2017-07-11T16:05:35Z

Should be ready to merge if we decide we want it.

JeffBezanson · 2017-07-18T15:34:31Z

src/julia-parser.scm

+(define (maybe-strip-op-suffix op)
+  (if (symbol? op)
+      (let ((op_ (strip-op-suffix op)))
+        (if (or (eqv? op op_) (no-suffix? op_))


Use eq? here?

Sure. (Makes < 1% difference in the benchmark.)

JeffBezanson · 2017-07-18T15:40:35Z

src/julia-parser.scm

+  (let ((S (Set l)))
+    (if (every no-suffix? l)
+        S ; suffixes not allowed for anything in l
+        (lambda (op) (S (maybe-strip-op-suffix op))))))


Maybe try splitting l into operators that do/don't support suffixes, and testing (or (no-suff-set op) (suff-set (maybe-strip-op-suffix op))) (depending on which of the sets are non-empty of course).

Tried this; it makes < 1% difference in the benchmark.
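The split suggested above can be modeled roughly as follows. This is a toy re-creation in Julia, not the flisp code: the function names merely echo the Scheme identifiers, and the suffix table and blacklist are small hypothetical subsets.

```julia
# Toy model of the parser logic; all tables here are illustrative subsets.
const SUFFIX_CHARS = Set("′″¹⁽⁾ₐ")
const NO_SUFFIX    = Set(["=", "?", "--", "-->"])

# Drop trailing suffix characters, never touching the first character.
function strip_op_suffix(op::String)
    i = lastindex(op)
    while i > firstindex(op) && op[i] in SUFFIX_CHARS
        i = prevind(op, i)
    end
    return op[firstindex(op):i]
end

# Strip only when a suffix is present and the base operator permits one.
function maybe_strip_op_suffix(op::String)
    base = strip_op_suffix(op)
    return (base == op || base in NO_SUFFIX) ? op : base
end

# Membership test for one precedence level, split into operators that
# cannot take suffixes (exact match) and those that can (match after stripping).
function suffset(ops::Vector{String})
    plain = Set{String}(filter(o -> o in NO_SUFFIX, ops))
    suff  = Set{String}(filter(o -> !(o in NO_SUFFIX), ops))
    isempty(suff)  && return op -> op in plain
    isempty(plain) && return op -> maybe_strip_op_suffix(op) in suff
    return op -> op in plain || maybe_strip_op_suffix(op) in suff
end

is_prec_arrow = suffset(["--", "-->", "→", "←"])
@assert is_prec_arrow("-->")       # exact match, no suffix allowed
@assert is_prec_arrow("→′")        # suffix stripped, then matched
@assert !is_prec_arrow("-->′")     # suffix on a blacklisted operator is rejected
```

In this model an unsuffixed, non-suffixable operator is resolved by the first set lookup alone, which is the saving the suggestion was after.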

StefanKarpinski · 2017-07-18T17:26:36Z

Whoever merges, please remember to squash!

stevengj · 2017-07-19T18:04:59Z

Looks like an unrelated stalled build on Travis.

tkelman · 2017-07-20T03:55:14Z

doc/src/manual/variables.md

@@ -96,7 +96,7 @@ Operators like `+` are also valid identifiers, but are parsed specially. In some
 can be used just like variables; for example `(+)` refers to the addition function, and `(+) = f`
 will reassign it. Most of the Unicode infix operators (in category Sm), such as `⊕`, are parsed
 as infix operators and are available for user-defined methods (e.g. you can use `const ⊗ = kron`
-to define `⊗` as an infix Kronecker product).
+to define `⊗` as an infix Kronecker product).  Operators can also be suffixed with modifying marks, primes, and sub/superscripts, e.g. `+̂ₐ″` is parsed as an infix operator with the same precedence as `+`.


would be good to conform to the line length convention of the rest of the file

stevengj · 2017-09-20T17:05:42Z

Bump. Okay to squash/merge?

stevengj added the parser (Language parsing and surface syntax) and domain:unicode (Related to unicode characters and encodings) labels May 26, 2017

tkelman reviewed May 31, 2017

stevengj force-pushed the opsuffix branch from 77269fe to 9eb2138 Compare June 1, 2017 12:30

stevengj force-pushed the opsuffix branch from 480c1f0 to 9e6ef2e Compare June 1, 2017 19:29

stevengj force-pushed the opsuffix branch from b56964d to 1d2da0e Compare June 13, 2017 18:37

JeffBezanson reviewed Jun 13, 2017

stevengj added the kind:potential benchmark (Could make a good benchmark in BaseBenchmarks) label Jun 15, 2017

stevengj mentioned this pull request Jul 11, 2017

added some julia parser etc benchmarks JuliaCI/BaseBenchmarks.jl#84

Merged

JeffBezanson reviewed Jul 18, 2017

stevengj and others added 9 commits July 19, 2017 09:57

allow operator suffixes: combining characters and primes

b29ff6d

blacklist key syntactic operators from suffixing

aaac3c1

fix incorrect pointer type

c5a32be

allow sub/superscript operator suffixes

cb131e0

add missing subscripts

9e53720

rm unnecessary assertion (only need for value, as in jl_charmap, not for key)

6a118ef

add tests for operators that are not supposed to be suffixable

052b0f2

slight optimization

07a2014

another slight optimization

e976448

stevengj force-pushed the opsuffix branch from 8ee877c to e976448 Compare July 19, 2017 14:47

tkelman reviewed Jul 20, 2017

line length

31b7edb

JeffBezanson merged commit d50eac6 into JuliaLang:master Sep 20, 2017

stevengj deleted the opsuffix branch September 20, 2017 17:16

KristofferC mentioned this pull request Sep 20, 2017

wip allow for op suffixes JuliaLang/Tokenize.jl#117

Closed

JeffBezanson mentioned this pull request Nov 10, 2017

new syntax for transpose #21037

Closed

davidagold mentioned this pull request May 5, 2018

Support non-vectorized syntax in @where JuliaData/DataFramesMeta.jl#39

Closed

HarrisonGrodin mentioned this pull request Jul 16, 2018

Parse suffixed operators with chains #28130

Open

This was referenced Aug 4, 2018

Ambiguous syntax with operator suffixes #28441

Open

Allow operator suffixes to the postfix apostrophe #28494

Closed

This was referenced Nov 26, 2018

Infix operators JuliaArrays/LazyArrays.jl#10

Closed

Do not allow THIN SPACE U+2009 as an operator suffix #30158

Merged

stevengj mentioned this pull request Jan 27, 2020

Unicode modifiers for adjoint operator #34507

Closed
