Make tilde automatically quote its arguments #4882

Closed
johnmyleswhite opened this issue Nov 21, 2013 · 40 comments
Labels: kind:breaking (this change will break code), needs decision (a decision on this change is needed)

Comments

@johnmyleswhite
Member

In the past we've talked about making the tilde operator do something special to make statistical functions look nicer. One simple approach would take ex1 ~ ex2 and automatically wrap it in a quote call.

This would allow us to change clunky interfaces like glm(:(y ~ x)) into the nicer glm(y ~ x).

I'd love to see something like this happen for the 0.3 release, since it will allow us to provide a cleaner (and more familiar) interface for a lot of statistical functions. I suspect it wouldn't even be a badly breaking change, since I'm not aware of anyone using the tilde operator except people doing statistics in Julia.
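
For reference, a small sketch of what the manual quoting in glm(:(y ~ x)) captures. It assumes a current Julia release (1.x), where ~ parses as an ordinary infix call; the variable names are illustrative only.

```julia
# Quoting by hand, as in glm(:(y ~ x)): the formula is captured unevaluated.
ex = :(y ~ x + z)

# On Julia 1.x this is a plain call expression whose function is `~`.
head = ex.head      # :call
op   = ex.args[1]   # :~
lhs  = ex.args[2]   # :y
rhs  = ex.args[3]   # :(x + z)
```

The proposal in this issue amounts to having the parser do this quoting automatically whenever it sees ex1 ~ ex2.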

@StefanKarpinski
Sponsor Member

I'm on board with this. I think that what it should do is invoke the @tilde macro in the current scope, which can then construct a Formula object or whatever else.
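
A minimal sketch of that idea, written as an explicit macro call so it runs on a recent Julia. The Formula type here is hypothetical (not DataFrames' actual definition), and @tilde is the name proposed in this thread rather than an existing Base macro.

```julia
# Hypothetical container for the two quoted sides of a formula.
struct Formula
    lhs
    rhs
end

# A macro receives its arguments unevaluated, so it can store them as quoted
# expressions. QuoteNode keeps them from being evaluated in the expansion.
macro tilde(lhs, rhs)
    :(Formula($(QuoteNode(lhs)), $(QuoteNode(rhs))))
end

# Neither y, x nor z needs to be defined: they are never evaluated.
f = @tilde(y, x + z)
```

Under the proposal, writing y ~ x + z would expand to this macro call without the user spelling it out.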

@johnmyleswhite
Member Author

What's the current scope defined as? If I give a definition in DataFrames of @tilde, then someone else gives a definition in Package P, and I call glm(y ~ x) in Main, where does it go?

@Keno
Member

Keno commented Nov 21, 2013

Depends on the usual scoping rules I guess. Maybe rewrite y ~ x to ~(:(y), :(x)) so that y DataFrames.~ x could become DataFrames.~(:(y), :(x)), though now that I'm writing this out, I'm not sure it's a great idea.

@Keno
Member

Keno commented Nov 21, 2013

or rather @~

@StefanKarpinski
Sponsor Member

Oops. Sorry.

@toivoh
Contributor

toivoh commented Nov 21, 2013

I use ~ in PatternDispatch.jl to make two patterns match the same thing, e.g.

@pattern f(v ~ [x]) = (v,x)  # matches a single-element vector v, binding x to the element

But that usage is restricted to function signatures that are wrapped with the @pattern macro, so if x ~ y is replaced by @tilde(x, y) it shouldn't be a problem for me to adapt the parsing. I guess that the precedence would be the same?

When it comes to how the right @tilde macro would end up in the local scope, I see two possibilities:

  • DataFrames exports @tilde, so you get it with using DataFrames. Using x ~ y without DataFrames (or without defining @tilde by some other means) produces an error.
  • Base exports @tilde and implements it to call a tilde function that DataFrames can overload.

I would lean toward the former. The latter feels needlessly complex and too much like monkey patching.

@JeffBezanson
Sponsor Member

I could parse this as Expr(:~, x, y), and then lower that by default to a macro call to @tilde or @~. Or it could just be parsed as a macro call to @~.

@toivoh
Contributor

toivoh commented Nov 21, 2013

@JeffBezanson: When would it be lowered? After the AST has been converted from surface syntax?

@kmsquire
Member

I would vote for parsing as Expr(:~, x, y), so that (I assume) non-macro uses are possible.

@JeffBezanson
Sponsor Member

The macro expander would have to treat Expr(:~, x, y) as a macro call to @~ if it encountered such an expression (i.e. no macro transformed it first).

@toivoh
Contributor

toivoh commented Nov 21, 2013

@kmsquire: Non-macro uses should be possible by defining your @tilde macro like e.g.

macro tilde(x,y)
    esc(:( tilde($x, $y) ))
end

which makes @tilde(x,y) reduce to tilde(x, y).
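
A short usage sketch of this forwarding pattern. The tilde function below is a hypothetical stand-in for whatever a package would define; only the macro body comes from the comment above.

```julia
# Hypothetical ordinary function that the macro forwards to.
tilde(x, y) = (x, y)

# esc makes the whole call resolve in the caller's scope, so the arguments
# are evaluated there like in any normal function call.
macro tilde(x, y)
    esc(:( tilde($x, $y) ))
end

a, b = 1, 2
result = @tilde(a + 1, b)   # behaves like tilde(a + 1, b)
```

Because the macro only rewrites syntax, the arguments here are evaluated before tilde sees them, which is the non-macro behaviour being asked about.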

@kmsquire
Member

Thanks, Toivoh and Jeff.

@johnmyleswhite
Member Author

If people are happy with DataFrames being "in charge" of @tilde and then requiring that other tools like PatternDispatch.jl operate at macro time, that seems like an alright solution. I'm a little worried that someone will want to change how it gets interpreted and end up pulling the whole thing down, but that's probably just paranoia.

@toivoh
Copy link
Contributor

toivoh commented Nov 23, 2013

Thinking about this a little more, Debug.jl would need to be able to expand a tilde expression with macroexpand. I think the simplest would be if ~ were just replaced with a @tilde invocation, then the AST could be handled just like now (by Debug.jl and others). I'm not sure what advantage it would bring to introduce a new Expr(:~, ...) type.

@johnmyleswhite
Member Author

Sorry for being dense, but Debug.jl uses tilde internally, but doesn't export it? If not, how would I use both DataFrames and Debug?

@toivoh
Contributor

toivoh commented Nov 23, 2013

No problem, I realize that I wasn't very clear. Debug.jl doesn't use tilde at all, but I want it to be able to debug code that does. To do that, it has to expand all macros in instrumented code to make sure that it doesn't miss any variable declarations or possible trap points. I guess that the DataFrames definition of @tilde would not generate either, but if some other package defines @tilde differently, it might. It's a corner case, but I hate to leave gotchas in my code that I know about.

@johnmyleswhite
Member Author

Thanks for helping me understand. I guess my original concern that people may step on each other's toes still holds, but I'd rather we move forward with something like @tilde than do nothing. It really will make the stats code a lot more enjoyable to write.

@ghost ghost assigned JeffBezanson Nov 25, 2013
@JeffBezanson
Sponsor Member

I can add this easily.
What should the associativity be? Should x~y~z be ~(~(x,y),z), ~(x,~(y,z)), or ~(x,y,z)?
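
The three candidate shapes can be written out as expression trees. This sketch builds them by hand with Expr rather than relying on how any particular Julia version parses x~y~z; the nargs helper is hypothetical and only counts operands.

```julia
# ~(~(x,y),z): left-associative, nested on the left.
left_assoc  = Expr(:call, :~, Expr(:call, :~, :x, :y), :z)

# ~(x,~(y,z)): right-associative, nested on the right.
right_assoc = Expr(:call, :~, :x, Expr(:call, :~, :y, :z))

# ~(x,y,z): a single flat call taking all three operands.
varargs     = Expr(:call, :~, :x, :y, :z)

# Number of operands at the top level (args[1] is the operator itself).
nargs(ex::Expr) = length(ex.args) - 1
```

With the flat form a macro sees every operand at once; with either nested form it sees two operands, one of which is itself a tilde expression.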

@dmbates
Member

dmbates commented Jan 10, 2014

In model formulas there are very few cases of multiple tildes and I don't think the issue would arise. I found a use for a y ~ f(x) ~ A + b syntax in R once but I wouldn't design that code the same way in Julia.

If a decision is needed I would vote for x ~ y ~ z being equivalent to ~(x,y,z).

@Keno
Member

Keno commented Jan 10, 2014

What are we gonna do with the current use of ~ (boolean negation)?

@JeffBezanson
Sponsor Member

It will stay the same.

@StefanKarpinski
Sponsor Member

How about varargs parsing instead of associative binary?

@JeffBezanson
Sponsor Member

That's the third option above.

@johnmyleswhite
Member Author

Sweet! Thank you!

-- John

On Jan 10, 2014, at 5:25 PM, Jeff Bezanson notifications@github.com wrote:

Closed #4882 via a007350.



@StefanKarpinski
Sponsor Member

Oh, right. That then.

@cdsousa
Contributor

cdsousa commented Feb 7, 2014

Is this feature somehow reserved? Is it documented?
Is the "overloading" of this macro acceptable for uses other than to create Formula objects, when the DataFrames package is not used?
E.g.,

macro ~(d,k)
    :($d[$(Meta.quot(k))])
end

> mydict = [:x => 123, :y => 456]
> mydict~x
123

I guess the answer is that it must be reserved, but I would like to be sure.

@JeffBezanson
Sponsor Member

No, it's not reserved for DataFrames. You simply get whatever definition of @~ is visible.

@johnmyleswhite
Member Author

That said, if you use this in a different way, you should probably advertise that your package is not compatible with DataFrames, since it would break things like GLM.

Having worked with this for a while, I think it would be reasonable to have it always return a type that behaves like Formula, which is effectively just a pair of quoted expressions. Then you can easily allow different functions to use multiple dispatch to give different semantics to that Formula type.
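
A rough sketch of that design, assuming a hypothetical QuotedFormula type that just stores the two quoted sides. The variables and describe functions are made-up consumers, shown only to illustrate different functions attaching different meanings to the same object.

```julia
# Hypothetical stand-in for a Formula-like type: two quoted expressions.
struct QuotedFormula
    lhs
    rhs
end

# One consumer: collect the variable names mentioned in the formula.
variables(s::Symbol) = [s]
variables(::Any) = Symbol[]   # literals such as 1 contribute nothing
function variables(ex::Expr)
    ex.head == :call || return Symbol[]
    # args[1] is the function name (e.g. :+); the rest are its operands.
    return reduce(vcat, map(variables, ex.args[2:end]); init = Symbol[])
end
variables(f::QuotedFormula) = vcat(variables(f.lhs), variables(f.rhs))

# Another consumer: render the same formula as text.
describe(f::QuotedFormula) = "$(f.lhs) as a function of $(f.rhs)"

f = QuotedFormula(:y, :(x + z))
```

Each function dispatches on the shared type, so packages would not need to redefine what the tilde itself produces.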

@cdsousa
Contributor

cdsousa commented Feb 7, 2014

Thanks for the answers. I'm not planning to use it in any way, that was just to clarify my view of the language :) Thanks.

@tkelman
Contributor

tkelman commented Jun 6, 2015

Is there a good reason this couldn't have been done as @glm(y ~ x) from the beginning? Macro parsing of ~ is a pretty fishy special case to have hiding in the language IMO.

@ScottPJones
Contributor

@tkelman, I brought this up also... and was shot down... but I still think this deserves a breaking change to stick the ~ in a macro precisely as you described instead of it being a special case macro.

@johnmyleswhite
Member Author

@tkelman, I would support getting rid of the specialized parsing of ~ in a future Julia release.

@tkelman
Contributor

tkelman commented Jun 6, 2015

That is good to know, thanks. Is DataFrames the only package that currently has an implementation of the @~ macro? Looking into it a bit, it looks like you actually want something that creates a Formula type which various other functions like lm would operate on, so it may be better in the end to have a dedicated macro that outputs a formula object rather than changing all the fitting routines into macros. Would need to learn more about how it all works currently.
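
One way such a dedicated macro could look, as a sketch only. It assumes a current Julia release (1.x), where ~ inside a macro argument parses as an ordinary call; the name @fml and the use of a Pair as the result are placeholders, not an existing package API.

```julia
# Hypothetical dedicated macro: takes one `lhs ~ rhs` expression and returns
# the two sides as quoted expressions, leaving the fitting routines as functions.
macro fml(ex)
    is_tilde = ex isa Expr && ex.head == :call &&
               length(ex.args) == 3 && ex.args[1] == :~
    is_tilde || error("expected an expression of the form lhs ~ rhs")
    lhs, rhs = ex.args[2], ex.args[3]
    # QuoteNode keeps both sides unevaluated in the expansion.
    :( $(QuoteNode(lhs)) => $(QuoteNode(rhs)) )
end

f = @fml(y ~ x + z)
```

A call such as lm(@fml(y ~ x + z), data) would then receive an ordinary value, so no special parsing of ~ is needed outside the macro.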

@johnmyleswhite
Member Author

I think every package that does linear regression uses that notation, so it likely affects MixedModels and NLReg as well.

@andyferris
Member

I just today found out about this oddball feature. Have people thought about the future of this lately?

I wonder if we could have @~ defined in base to go to some overloadable function, so it can be shared between many packages?

@vtjnash
Sponsor Member

vtjnash commented Jul 1, 2016

@~ is an overloadable function, there's just not much useful for it to dispatch on, so realistically you can only import it from one package at a time.

@andyferris
Member

Hmm... OK I was just speculating.

For me - it would be nice to have something both aware of expressions and of the types of things around it. Or have general infix macros, or something.

Otherwise, as a single special case, this seems rather unlike the rest of the language.

@andyferris
Member

"For me - it would be nice to have something both aware of expressions and of the types of things around it."

E.g. This might be something like a generator: MyType(a ~ b) might do something while YourType(a ~ b) might mean something quite different.

@tkelman
Contributor

tkelman commented Jul 1, 2016

We're planning on getting rid of this for 0.6. It requires preparing a more Julian implementation of the formula DSL in JuliaStats packages.

@andyferris
Member

Ok thanks for the update, Tony!
