RFC: Property interface via macros `@Select` and `@Compute` #39
One can pick more than one column with `getproperties`. The output type is configurable - defaulting to NamedTuple. This begins to form the basis of a "properties" interface. We will need a few more convenience functions yet, like a generic "select" that is friendly for columnar storage.
The relevant details are more visible in the second commit.
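To make the idea concrete, here is a minimal sketch of what a plural `getproperties` could look like in plain Julia. The name `getproperties_sketch` and its `output` keyword are hypothetical and this is not the code from the PR; it only assumes the input supports `getproperty`, and it defaults to building a `NamedTuple` as described above.

```julia
# Hypothetical sketch of a plural `getproperties`; NOT the code from this PR.
# Reads several properties at once. `output` turns the tuple of values into
# the result and defaults to a NamedTuple with the requested names.
function getproperties_sketch(x, names::Tuple{Vararg{Symbol}};
                              output = NamedTuple{names})
    vals = map(n -> getproperty(x, n), names)
    return output(vals)
end

row = (a = 1, b = 2.0, c = "three")
getproperties_sketch(row, (:a, :c))                     # (a = 1, c = "three")
getproperties_sketch(row, (:a, :c); output = identity)  # (1, "three")

# A NamedTuple of column vectors supports getproperty too, so the same call
# projects whole columns.
cols = (a = [1, 2], b = [3.0, 4.0])
getproperties_sketch(cols, (:b,))                       # (b = [3.0, 4.0],)
```

Because a column store answers `getproperty` with whole arrays, the same call doubles as a column projection, which is the hook that columnar-storage optimizations can use.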
Codecov Report

| Coverage Diff | master | #39 | +/- |
|---|---:|---:|---:|
| Coverage | 60.12% | 61.58% | +1.45% |
| Files | 5 | 6 | +1 |
| Lines | 311 | 479 | +168 |
| Hits | 187 | 295 | +108 |
| Misses | 124 | 184 | +60 |
Branch updated from `765597f` to `32d05e6`, then from `32d05e6` to `4bee895`.
(Title changed: `@select` and `@calc` → `@select` and `@compute`.)
OK, I've now implemented automatic wrapping and unwrapping of functions in the …

What this means is that AcceleratedArrays.jl accelerations will now fire by default. For example,

```julia
filter(@compute(isless($a, 100)), table)
@select(a, b, isequal($a, $b)).(table)
findall(@compute($position ∈ Sphere(centre, radius)), pointcloud)
```

may all potentially use secondary acceleration indices. (Which means once the docs for this are done I can get back to finishing the implementation of accelerations for …)
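Why the wrapping matters can be sketched without any packages. Below, a hypothetical `ColumnPredicate` type (not part of TypedTables.jl or AcceleratedArrays.jl) remembers the one column its predicate reads, so a column-store filter can unwrap it and call `findall` on that column alone. That single-column call is the point where an index-aware array type could take over from a linear scan; this is an illustration under those assumptions, not the PR's mechanism.

```julia
# Hypothetical sketch (not TypedTables.jl / AcceleratedArrays.jl code):
# a predicate that records the single column it reads.
struct ColumnPredicate{F} <: Function
    column::Symbol  # the only property the predicate looks at
    f::F            # test applied to that property's value
end

# Row-wise use: works with plain `filter` on any collection of rows.
(p::ColumnPredicate)(row) = p.f(getproperty(row, p.column))

# Column-wise use: unwrap the predicate and search only the column it needs.
# The `findall` call on that one column is where an index-aware array type
# could, in principle, replace the linear scan.
function filter_columns(p::ColumnPredicate, cols::NamedTuple)
    idx = findall(p.f, getproperty(cols, p.column))
    return map(col -> col[idx], cols)
end

pred = ColumnPredicate(:a, x -> isless(x, 100))

rows = [(a = 5, b = 'x'), (a = 150, b = 'y'), (a = 42, b = 'z')]
filter(pred, rows)           # [(a = 5, b = 'x'), (a = 42, b = 'z')]

cols = (a = [5, 150, 42], b = ['x', 'y', 'z'])
filter_columns(pred, cols)   # (a = [5, 42], b = ['x', 'z'])
```

Both paths return the same rows; the difference is that the column-wise path only ever touches column `a` when deciding which rows to keep.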
Of course the worst thing about this PR is the clash with …
Oh, this is very interesting, I like it. The use of …

One thing I've been finding slightly frustrating is the syntax of …

What you've got here looks tantalizingly like it could almost solve those problems; is this part of the plan? I must admit I'm not super keen on …
Thanks for the feedback. I too am happier with … Regarding …

Finally... what's the "plan" for joins? OK, here's a wide-open space. We could use some keyword arguments to clean it up. We could defer to Query.jl macros (or similar) to call …

But I do have one secret dream. I like the idea of using the "full" power of relational algebra. Let me paint a picture. Imagine a …

```julia
@Relate($a == $b)
```

This set is infinite but can be joined on one or more tables with columns … Users could put together powerful queries like this:

```julia
table ⨝ @Relate($a > 100)             # filter for table.a .> 100
table ⨝ @Relate($a == $b) ⨝ table2    # join table with table2, matching the table.a column with the table2.b column
pointcloud ⨝ polygon                  # return points inside polygon
```

There is also a dual operator to … Not sure how practical any of that would be to actually use, but damn, things like …
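To pin down the natural-join semantics being discussed, here is a deliberately naive sketch over vectors of `NamedTuple` rows: two rows match when they agree on every property name they share, and matched rows are merged. `natural_join` and its helpers are hypothetical names, the nested loop stands in for whatever acceleration a real implementation would use, and no `⨝` operator or `@Relate` macro is defined here.

```julia
# Hypothetical, naive natural join over vectors of NamedTuple rows.
# Two rows match when they agree on every property name they share.
sharednames(l, r) = intersect(propertynames(l), propertynames(r))

rowsmatch(l, r) =
    all(n -> isequal(getproperty(l, n), getproperty(r, n)), sharednames(l, r))

# Nested loops stand in for a real (hash- or sort-based) join strategy.
natural_join(left, right) =
    [merge(l, r) for l in left for r in right if rowsmatch(l, r)]

users  = [(user_id = 1, name = "ann"), (user_id = 2, name = "bob")]
orders = [(order_id = 10, user_id = 2), (order_id = 11, user_id = 1),
          (order_id = 12, user_id = 2)]

natural_join(users, orders)
# [(user_id = 1, name = "ann", order_id = 11),
#  (user_id = 2, name = "bob", order_id = 10),
#  (user_id = 2, name = "bob", order_id = 12)]
```

If the two row types share no names, every pair matches and the result is the Cartesian product. That is standard natural-join behaviour, and it is also why consistent column naming matters so much for this style of query.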
Cool, well stated. That's a lot to think about.
True, I think …
It would be beautiful to be able to write this. Though I'm not sure about some annoying practicalities, like two tables having the same column names for different things, etc. Having the macro does allow some insight into the expression, as presumably required for accelerating the join.
Yeah, this is the bit I was referring to: the user probably needs to buy into natural joins before developing the data model.
Exactly. Introspection of …
Oh right. So columns always need to have consistent names in tables to be joined. I feel like this would simply be too inflexible to be practical, even if you name all columns with a prefix of the entity name (so that foreign keys can naturally match the key in the source table). I wonder whether there's some middle ground of partially automated column renaming which would make this work neatly.
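One possible shape for that middle ground is a rename step just before joining. The helper below (`rename_sketch` is a hypothetical name, not an existing TypedTables.jl function) rewrites the property names of a `NamedTuple` row from `old => new` pairs and leaves the rest alone, so mapping it over rows can turn a generic key like `id` into a foreign-key-style name like `user_id`.

```julia
# Hypothetical helper: rename properties of a NamedTuple row using
# `old => new` pairs; names without a pair are kept unchanged.
function rename_sketch(row::NamedTuple, pairs::Pair{Symbol,Symbol}...)
    lookup = Dict{Symbol,Symbol}(pairs...)
    newnames = map(n -> get(lookup, n, n), keys(row))
    return NamedTuple{newnames}(values(row))
end

users = [(id = 1, name = "ann"), (id = 2, name = "bob")]

# Give the generic key an entity-prefixed name so it can line up with a
# `user_id` foreign key in another table.
map(u -> rename_sketch(u, :id => :user_id), users)
# [(user_id = 1, name = "ann"), (user_id = 2, name = "bob")]
```

Only the names change; values and their order are untouched, so the renamed rows can be fed to any name-driven join.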
To me, it would be ideal if the join operator …
(Title changed: `@select` and `@compute` → `@Select` and `@Compute`.)
Let's try this out on master. Feedback from users very welcome.
OK, here is a preview of my solution to the properties interface. This replaces #38 and I think this branch will remove the plural-`getproperties` stuff.

For user-facing tools, there are macros `@Select` and ~~`@calc`~~ `@Compute` which return functions. That's right - these aren't direct operations, let's call them "higher-order macros" :)

They are designed to act on any container that supports `getproperty`. The `@Compute` macro is more-or-less convenient syntax for building a simple anonymous function. You use `$` to indicate any input property, and all other parts of the expression are evaluated as written. I'd like to think of a better name for `@calc`, so ideas very welcome.

In the backend this creates a `Compute` object which is a type of `Function` that knows what property names it requires (useful info for columnar-storage optimizations, ~~still WIP~~ now done).
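As a rough illustration of the mechanism, here is a self-contained sketch of a `$`-substituting macro that returns a callable `Function` subtype carrying its required property names in its type. `@compute_sketch`, `ComputeSketch` and `required_names` are hypothetical names; this is not the TypedTables.jl implementation, and it ignores corner cases such as `$` inside nested quotes.

```julia
# Hypothetical, minimal re-creation of the idea behind `@Compute`;
# NOT the TypedTables.jl implementation. `$name` marks an input property.
struct ComputeSketch{names, F} <: Function
    f::F
end
ComputeSketch{names}(f::F) where {names, F} = ComputeSketch{names, F}(f)

# Calling the object forwards to the wrapped anonymous function.
(c::ComputeSketch)(x) = c.f(x)

# The required property names are stored in the type parameter.
required_names(::ComputeSketch{names}) where {names} = names

# Rewrite every `$name` into `getproperty(arg, :name)`, recording each name.
function substitute_properties!(names::Vector{Symbol}, ex, arg::Symbol)
    if ex isa Expr && ex.head === Symbol("\$") && ex.args[1] isa Symbol
        push!(names, ex.args[1])
        return :(getproperty($arg, $(QuoteNode(ex.args[1]))))
    elseif ex isa Expr
        return Expr(ex.head, map(a -> substitute_properties!(names, a, arg), ex.args)...)
    else
        return ex  # literals, plain symbols, line numbers: left as written
    end
end

macro compute_sketch(ex)
    arg = gensym("x")
    names = Symbol[]
    body = substitute_properties!(names, ex, arg)
    required = Tuple(unique(names))
    return esc(:(ComputeSketch{$(QuoteNode(required))}($arg -> $body)))
end

total = @compute_sketch($a + 2 * $b)
total((a = 1, b = 2, c = "ignored"))  # 5
required_names(total)                 # (:a, :b)
```

Because the result is an ordinary `Function`, it drops straight into `filter`, `map` or broadcasting, while `required_names` exposes exactly the columns a columnar backend would need to read.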
The `@Select` macro returns an object with a number of properties, possibly simply replicated, and sometimes they are calculations of their own. Here's a preview: …

Generally it's a `name = function_expression` pair, but you can just nominate a symbol to replicate. This creates a `Select` object which is a type of `Function` that generally contains `GetProperty` or `Compute` objects (again, column names are known for ~~potential~~ implemented columnar-storage optimizations).

This PR does contain columnar-storage optimizations for `GetProperties` in the `columnops.jl` file, from #38, as well as for `Compute` and `Select` (we automatically pre-project tables so that iteration works on fewer columns). I don't think we'll need `getproperties` for anything in the end, so I will probably ~~delete~~ not export that. But right now I gotta go to bed.

Now - how to use on a `Table`? Well, you have two options. You can manipulate the table directly, as in `@Select(...)(t)`, which performs a transformation on columns as entire arrays. Or you can broadcast this over the rows, as in `@Select(...).(t)` or `map(@Select(...), t)`, and the result can be globbed back into a table (the former is still WIP).

cc @quinnj: compared to what I see in TableOperations.jl, I see this as being more generic/fundamental about properties rather than tables, while still preserving the information critical for columnar-based storage optimizations.
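The two calling conventions can be illustrated with a small hypothetical `SelectSketch` type (not TypedTables.jl's `Select`, and no macro involved, so the functions are written out by hand). It stores one function per output property and applies each to whatever it is called on: a single row gives a row, a `NamedTuple` of columns gives columns, and broadcasting over a vector of rows gives a vector of rows.

```julia
# Hypothetical sketch of a `Select`-like object (NOT TypedTables.jl's `Select`):
# a Function that stores one function per output property.
struct SelectSketch{names, Fs <: Tuple} <: Function
    fs::NamedTuple{names, Fs}
end

# Apply every stored function to the same input; `map` keeps the names.
(s::SelectSketch)(x) = map(f -> f(x), s.fs)

# `.*` lets `ab` work on scalars (a row) and on whole columns alike.
sel = SelectSketch((a = x -> x.a, ab = x -> x.a .* x.b))

row  = (a = 1, b = 10, c = "unused")
cols = (a = [1, 2, 3], b = [10, 20, 30], c = ["x", "y", "z"])
rows = [(a = 1, b = 10), (a = 2, b = 20)]

sel(row)    # one row in, one row out: (a = 1, ab = 10)
sel(cols)   # columns in, columns out: (a = [1, 2, 3], ab = [10, 40, 90])
sel.(rows)  # broadcast over rows: [(a = 1, ab = 10), (a = 2, ab = 40)]
```

The `.*` is the design point here: with plain `*` the row call would still work, but the column call would throw a `MethodError`, because Julia does not define `*` between two vectors. That hints at why treating columns as entire arrays needs some care.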
Todos:

- … `mapreduce`, and maybe to `filter` (and `findall`).
- … `@select` not clash with Query.jl (renamed to `@Select`).