support many more Unicode infix operators #6929

stevengj · 2014-05-23T02:43:02Z

This patch adds support for a much larger number of infix operators, based on my comment in #6582: I went through category Sm manually and pulled out a list of code points that (a) seemed unambiguously infix and (b) had a clear analogy to existing operators, so that a reasonable precedence choice could be made.

I don't actually provide definitions for any of the new operators, but now they are available for the user to add methods if she wants to.

It also adds the synonym ∛ for cbrt, in analogy to √ for sqrt.

cc: @jiahao, @JeffBezanson

JeffBezanson · 2014-05-23T02:57:10Z

Feels kind of...profligate.

You have some duplicates in the operator lists, where we already had some Unicode operators. Won't really cause a problem, but untidy.

JeffBezanson · 2014-05-23T03:00:58Z

We might want full-width operators to be normalized, based on usage in Asian scripts.
I don't know if ∣ should be banned outright, but the confusion with | is certainly enough to keep it out of Base.

JeffBezanson · 2014-05-23T03:04:02Z

The n-ary big operators like ⋂ were recently added as identifier characters. I imagine them being used somewhat like ⋂(f(x) for x in y) and not as infix operators, in which case they can just be identifiers.

stevengj · 2014-05-23T03:13:07Z

Sure, I can try to remove some of the most similar entries if we don't want to support easily confusable infix operators (although we don't bother to do this for other identifiers like µ and μ). I guess since this is a whitelist, it makes more sense to be choosy here.

(And yes, there are a few harmless duplications that I was too lazy to remove; I wanted to get a general sense of whether we wanted to do this first.)

My feeling is that it is nice to make a rich set of operators available for users to add methods to, even though we won't use most of them in Base, and once you decide to do this it's hard to make a sensible criterion for a "non-profligate" set of operators.

jiahao · 2014-05-23T03:16:08Z

Syntax like {x+1 ∣ x ∈ S} would be sweet though.

JeffBezanson · 2014-05-23T03:18:35Z

I agree with the overall idea. These characters are definitely infix operators, and the only things you can do with them are disallow them or parse them sensibly.

stevengj · 2014-05-23T15:02:28Z

I've updated the patch to remove duplicates and near-duplicates (e.g. operators that differ only in size). I also changed the left/right arrows to prec-arrow rather than prec-assignment.

stevengj · 2014-05-23T15:20:47Z

We should also really modify bin_ops_by_prec in show.jl to somehow get its list from julia-parser.scm, but I feel like that should be a separate PR.

(Not sure what the best way to do that would be, maybe define bin_ops_by_prec in C?)

JeffBezanson · 2014-05-23T15:28:12Z

We can add a C API call to fetch the operator table via scm_to_julia.

stevengj · 2014-05-23T15:32:36Z

Hmm, random Travis error with clang but not gcc. Looks unrelated?

JeffBezanson · 2014-05-23T15:33:18Z

Yes, unrelated but very troubling :)

StefanKarpinski · 2014-05-23T15:41:49Z

I'm curious, @stevengj, how you decided which operators got plus-like precedence versus times-like precedence? Some are obvious – ± and ⋅ – but many are not. Since subsequent changes to precedence are likely to break code, these seem like they shouldn't be chosen too cavalierly.

stevengj · 2014-05-23T15:47:39Z

@StefanKarpinski, when it wasn't obvious from the shape, I just went with their documented meaning in the Unicode standard: any operator documented as a product, conjunction, intersection, or division of some kind (e.g. ⋋ is left semidirect product) got times precedence, while any operator documented as an addition/subtraction, logical-or, or union of some kind was given plus-like precedence.

I used this list of category-Sm code points, which helpfully gives the name of each code point.

Operators whose precedence seemed unclear I left out. Did I include any operators whose precedence you found unclear?
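
Editorial sketch, not part of the patch: the naming rule described in this comment can be approximated in a few lines of Python using the standard `unicodedata` module. The keyword sets and the function name `guess_precedence` are assumptions made for illustration; the actual list in the PR was assembled by hand, and shape-based choices (such as ⋅) are not captured here.

```python
import unicodedata

# Hypothetical keyword sets paraphrasing the rule in the comment above;
# they are not taken from the patch.
TIMES_WORDS = {"PRODUCT", "TIMES", "MULTIPLICATION", "AND", "INTERSECTION", "DIVISION"}
PLUS_WORDS = {"PLUS", "MINUS", "OR", "UNION"}


def guess_precedence(ch):
    """Guess 'times' or 'plus' from a character's Unicode name, else None."""
    if unicodedata.category(ch) != "Sm":
        return None  # only math symbols (category Sm) are considered
    # Split hyphenated names such as PLUS-MINUS SIGN into separate words.
    words = set(unicodedata.name(ch, "").replace("-", " ").split())
    if words & TIMES_WORDS:
        return "times"
    if words & PLUS_WORDS:
        return "plus"
    return None  # unclear from the name: leave the operator out
```

Under these assumptions, ⋋ (LEFT SEMIDIRECT PRODUCT) is classified as times-like and ± (PLUS-MINUS SIGN) as plus-like, while a character whose name contains none of the keywords is left unclassified.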

StefanKarpinski · 2014-05-23T16:09:18Z

Oh, no – they just weren't all obvious to me, but that seems like a very sane way to do it.

stevengj · 2014-05-23T21:02:03Z

Note that this basically fixes #552.

stevengj · 2014-05-24T20:44:43Z

Another thing that I was thinking of implementing, possibly in a separate PR, is:

Allow every operator (except for a small blacklist) to have a variant starting with a dot, e.g. allowing ≪ automatically gives you .≪.
Allow every single-character operator to take suffixes consisting of combining characters (categories Me and Mn), primes, and possibly a few other characters (sub/superscripts?), e.g. allowing ⊗ automatically gives you ⊗′ and ⊗̃.

Similar to Jeff's remark above, there is no question that e.g. +̂ is an infix operator, so the only things you can do are either to disallow it or to parse it sensibly, and there is no reason that I can see not to parse it sensibly (e.g. the precedence is obvious). Similarly, if we are going to allow < and .<, then it doesn't make sense to me to allow ≪ but not .≪ etcetera.

This should be pretty easy to implement: you simply strip off any . prefix and any allowed suffix before checking whether the operator is in the allowed Set. It still obeys the rule that every prefix of an operator is also an operator, and will simplify the operator list because we no longer need to list .==, .* etcetera explicitly.
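
Editorial sketch, not the parser code: the strip-then-look-up idea in the paragraph above might look like this in Python. `BASE_OPERATORS` is a tiny hypothetical stand-in for the real operator table, `PRIMES` is an assumed suffix set, and the dot blacklist mentioned in the proposal is omitted.

```python
import unicodedata

# Hypothetical stand-in for the parser's operator table (not the real list).
BASE_OPERATORS = {"+", "*", "<", "==", "\u226a", "\u2297"}  # includes ≪ and ⊗
# Assumed set of prime characters allowed as suffixes.
PRIMES = {"\u2032", "\u2033", "\u2034"}


def is_suffix_char(ch):
    """True for combining marks (categories Mn, Me) and primes."""
    return unicodedata.category(ch) in ("Mn", "Me") or ch in PRIMES


def is_operator(tok):
    """Strip an optional '.' prefix and allowed suffixes, then look up the base."""
    if len(tok) > 1 and tok.startswith("."):
        tok = tok[1:]
    end = len(tok)
    while end > 1 and is_suffix_char(tok[end - 1]):
        end -= 1
    base = tok[:end]
    # Per the proposal, suffixes are only allowed on single-character operators.
    if end < len(tok) and len(base) != 1:
        return False
    return base in BASE_OPERATORS
```

With this table, `.≪`, `⊗′`, and `.==` are accepted without being listed explicitly, while `==′` is rejected because its base is not a single character.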

~~I took a stab at implementing this in the parser, but I ran into trouble because of an apparent oddity in flisp's string processing:~~ Never mind, I see that string.char takes a byte index, not a character index, and I'm supposed to step through the string with string.inc.
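
Editorial sketch: the byte-index versus character-index distinction can be shown on a UTF-8 buffer in Python. This is only an analogy for stepping with string.inc; `next_char_index` and `chars` are hypothetical helpers, not flisp functions.

```python
def next_char_index(buf, i):
    """Return the byte index of the character that follows the one at byte i."""
    i += 1
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx; skip over them.
    while i < len(buf) and (buf[i] & 0xC0) == 0x80:
        i += 1
    return i


def chars(buf):
    """Split a UTF-8 byte string into characters by stepping byte indices."""
    out = []
    i = 0
    while i < len(buf):
        j = next_char_index(buf, i)
        out.append(buf[i:j].decode("utf-8"))
        i = j
    return out
```

For the UTF-8 encoding of `a≪b`, the characters start at byte indices 0, 1, and 4, so indexing by character count instead of byte offset would land in the middle of ≪.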

stevengj · 2014-05-25T02:01:56Z

Okay, I was able to put together a sample implementation of the above suggestion for operators+combining characters. It required a few more changes to the parser, though, so I'll leave it for a separate PR when(?) this one is merged.

JeffBezanson · 2014-05-27T17:57:12Z

I think the "big" N-ary operators should not be infix.

stevengj · 2014-05-27T18:02:59Z

@JeffBezanson, I thought I got rid of the big N-ary operators; which ones did I miss?

JeffBezanson · 2014-05-27T18:10:24Z

U+2A00 (⨀), the big circled operators.

stevengj · 2014-05-27T18:43:38Z

⨀ is not in the list. Looks like I left in ⨁ though; will fix.

JeffBezanson · 2014-05-27T18:47:59Z

Ah, I confused it with U+29BF CIRCLED BULLET. Gotta love Unicode...

stevengj · 2014-05-27T18:49:41Z

Does the operator? predicate need to be replaced with a hash table?

JeffBezanson · 2014-05-27T18:53:51Z

Yes, that's a very good idea.
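
Editorial sketch: the change suggested here, presumably motivated by the much longer operator list, amounts to replacing a linear scan with a hashed lookup. In Python terms, with a hypothetical `OPERATOR_LIST` standing in for the parser's table:

```python
# Hypothetical operator table; the real one lives in the parser.
OPERATOR_LIST = ["+", "-", "*", "/", "<", "==", "\u226a", "\u2297", "\u2295"]
# Built once, so each later lookup is a hash probe instead of a scan.
OPERATOR_SET = frozenset(OPERATOR_LIST)


def is_operator_linear(tok):
    """Predicate as a linear scan: cost grows with the number of operators."""
    return any(tok == op for op in OPERATOR_LIST)


def is_operator_hashed(tok):
    """Predicate as a hash-table lookup: roughly constant cost per query."""
    return tok in OPERATOR_SET
```

Both predicates give the same answers; only the lookup cost differs as the table grows.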

support many more Unicode infix operators

If there is no whitespace between the nearest `\` and the cursor, try to complete a latex symbol or its name *instead* of a Julian name. This allows for interactive discovery of latex names, but whitespace is required for completion of a Julia name. Note that if these completions were instead *appended* to the Julia options, they have to display without the leading \. I found that to be confusing when mixed in with the Julian names. If the word matches a latex name exactly, it replaces it with the symbol. Otherwise, it attempts to complete the latex names. While there are some names that are prefixes to other names, I don't find this to be too jarring. It does effectively "shadow" the longer names, making them harder to discover.

jiahao added the unicode label May 23, 2014

stevengj added the feature label May 23, 2014

support many more Unicode infix operators

97d46a8

eliminate a couple more confusables; multiset multiplication U+228D should have * precedence despite looking like a union; change NEWS table to only explicitly list operators that are predefined

285a10e

rm U+2a01 (N-ary circled plus) from infix list

d234b4f

JeffBezanson added a commit that referenced this pull request May 27, 2014

Merge pull request #6929 from stevengj/uni_ops

b78d9b4

support many more Unicode infix operators

JeffBezanson merged commit b78d9b4 into JuliaLang:master May 27, 2014

This was referenced Oct 31, 2014

infix notation for more functions #4498

Closed

Alternative syntax for map(func, x) #8450

Closed

stevengj mentioned this pull request Jan 8, 2015

inconsistent .op parsing for unicode operators #9684

Closed

stevengj mentioned this pull request Sep 28, 2015

RFC: Operators for tensor sum and tensor product in Julia 0.5 #13333

Closed

Ismael-VC mentioned this pull request Jan 3, 2016

Allow users to define "dot" vectorized operators. #14544

Closed

stevengj mentioned this pull request Feb 4, 2016

Operator precedence for unlisted operators & anonymous functions #14933

Closed

stevengj mentioned this pull request May 9, 2016

Vectorization Roadmap #16285

Closed

5 tasks

aaronsheldon mentioned this pull request Jul 28, 2016

Add relational algebra unicode characters #17677

Closed

stevengj mentioned this pull request May 26, 2017

RFC: allow operator suffixes — combining characters and primes #22089

Merged

stevengj deleted the uni_ops branch October 6, 2017 17:07

inkydragon mentioned this pull request Jul 8, 2022

Why is ⥺ not an operator? #45962

Closed

stevengj mentioned this pull request Jan 5, 2024

Adding \mid to the parser and setting its precedence higher than :: as an experimental feature #52756

Open
