Formula should include : and * interactions #18
Comments
I'm curious about how to go about this. In the following it seems that
Doesn't this make it harder to use the |
Maybe we'll have to change operators. :: looks like it might work. So would & and %. Here's a list of operators ordered by precedence from julia-parser.scm:
(define ops-by-prec
  '#((= := += -= *= /= //= .//= .*= ./= |\\=| |.\\=| ^= .^= %= |\|=| &= $= => <<= >>= >>>= ~ |.+=| |.-=|)
     (?)
     (|\|\||)
     (&&)
     ; note: there are some strange-looking things in here because
     ; the way the lexer works, every prefix of an operator must also
     ; be an operator.
     (<- -- -->)
     (> < >= <= == === != |.>| |.<| |.>=| |.<=| |.==| |.!=| |.=| |.!| |<:| |>:|)
     (: |..|)
     (+ - |.+| |.-| |\|| $)
     (<< >> >>>)
     (* / |./| % & |.*| |\\| |.\\|)
     (// .//)
     (^ |.^|)
     (|::|)
     (|.|)))
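To make the precedence problem concrete, here is a rough Python sketch (not the Julia parser itself) of a tiny precedence-climbing parser. It assumes the rows of ops-by-prec quoted above run from loosest to tightest binding, numbers them by row position, and treats every operator as left-associative for simplicity. The names `PRECEDENCE`, `parse`, and `grouping` are made up for this illustration.

```python
import re

# Binding power taken from the row position in the ops-by-prec list quoted
# above (assumption: later rows bind tighter). Only the operators discussed
# in this thread are included.
PRECEDENCE = {
    "~": 1,   # first row, alongside the assignment operators
    ":": 7,   # the (: |..|) row
    "+": 8,   # the (+ - ...) row
    "-": 8,
    "*": 10,  # the (* / % & ...) row
    "%": 10,
    "&": 10,
}


def tokenize(text):
    """Split a formula into identifiers and single-character operators."""
    return re.findall(r"[A-Za-z_]\w*|[~:+\-*%&]", text)


def parse(tokens, min_prec=0):
    """Precedence climbing; returns nested (op, left, right) tuples.

    Simplification: every operator is treated as left-associative.
    """
    left = tokens.pop(0)
    while tokens and PRECEDENCE[tokens[0]] >= min_prec:
        op = tokens.pop(0)
        right = parse(tokens, PRECEDENCE[op] + 1)
        left = (op, left, right)
    return left


def show(tree):
    """Render a parse tree with explicit parentheses."""
    if isinstance(tree, str):
        return tree
    op, left, right = tree
    return f"({show(left)} {op} {show(right)})"


def grouping(formula):
    return show(parse(tokenize(formula)))


print(grouping("y ~ a + b:c"))  # (y ~ ((a + b) : c))
print(grouping("y ~ a + b&c"))  # (y ~ (a + (b & c)))
```

With `:` sitting below `+`, the interaction swallows the whole sum, which is not what an R user would expect. `&` and `%` sit in the `*` row, so they group with their immediate neighbours.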
(Tom, think you hit the close button by mistake! A bit of a GitHub UI quirk...) I concur. I think we should go with & instead of :.
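For reference, a small Python sketch of what the `*` operator would have to expand to if it follows R's convention (`a*b` means `a + b + a:b`), with the interaction spelled `&` as proposed here. Whether the package ends up using exactly this expansion is an assumption, and `expand_star` is a hypothetical helper name.

```python
from itertools import combinations


def expand_star(factors, interaction_op="&"):
    """Expand a crossing such as a*b into main effects plus all interactions.

    Follows R's convention for `*`; the `&` spelling of the interaction
    operator is the proposal from this thread, not an established API.
    """
    terms = []
    for order in range(1, len(factors) + 1):
        for combo in combinations(factors, order):
            terms.append(interaction_op.join(combo))
    return terms


print(" + ".join(expand_star(["a", "b"])))       # a + b + a&b
print(" + ".join(expand_star(["a", "b", "c"])))  # a + b + c + a&b + a&c + b&c + a&b&c
```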
There is something to be said for supporting R's syntax: it's been around long enough for people to be familiar with it, and the Python people are starting to use it as well. Would this be possible if we instead parsed strings? As soon as I said that, though, it doesn't seem worth it. On the other hand, the number of operations we're talking about is pretty minimal, so people will just need to look up Julia's way of doing it. One direction I think would be cool: extend this notation to also include namespaces of features a la Vowpal Wabbit's sparse format. For example, if you have a sparse, bag-of-words representation for a text document, all of these features could be under the |
Yeah, I don't think a single-character change is a big deal here, and using Julia's parser seems a big enough win that I think we should stick with it. As for namespaces (cool -- I need to actually try VW out sometime!), we'd need a way to define them separate from the formula. Would we want to include something like "colname groups" in the DataFrame? So, you'd somehow define "dims" to be a colname group for "height", "width", and "depth", then you could use "dims" instead of a list of those three column names? That could be useful for other things too. df["dims"] becomes a shorthand for df[["height", "width", "depth"]], and df["predictors"] and df["response"] seem natural things to define, too. So you could then call |
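A minimal Python sketch of the "colname groups" idea, just to pin down the lookup behaviour being described. `GroupedFrame` is a hypothetical stand-in, not the DataFrame type under discussion, and letting a group name shadow a column of the same name is an arbitrary choice made for this sketch.

```python
class GroupedFrame:
    """Toy column store where a group name is shorthand for a list of columns."""

    def __init__(self, columns, groups=None):
        self.columns = dict(columns)      # column name -> list of values
        self.groups = dict(groups or {})  # group name -> list of column names

    def __getitem__(self, key):
        if isinstance(key, list):
            # df[["height", "width"]] selects several columns at once
            return {name: self.columns[name] for name in key}
        if key in self.groups:
            # df["dims"] expands to the columns registered under "dims"
            return self[self.groups[key]]
        return self.columns[key]


df = GroupedFrame(
    {"height": [1, 2], "width": [3, 4], "depth": [5, 6], "mass": [7, 8]},
    groups={"dims": ["height", "width", "depth"], "response": ["mass"]},
)

# The group name and the explicit column list select the same data.
print(df["dims"] == df[["height", "width", "depth"]])  # True
```

A formula could then refer to "dims" or "predictors" and have the group expanded into its member columns before the model matrix is built.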