Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

map no longer relies on inference #137

Merged
merged 6 commits into from Mar 26, 2018
Merged

map no longer relies on inference #137

merged 6 commits into from Mar 26, 2018

Conversation

piever
Copy link
Collaborator

@piever piever commented Mar 20, 2018

I've addressed the last issues from #135 (allowing the user to give a mix of NamedTuples with different fields that gets stored as Vector{NamedTuples}) and update map to use the new collect_columns.

@piever
Copy link
Collaborator Author

piever commented Mar 21, 2018

Note that with the new collect_columns all sorts of new lazy iterator techniques become possible. For example

IndexedTables._map(i ->@NT(sum =  i.x+i.y, diff = i.x-i.y), Iterators.filter(f, t))

(we should maybe rename _map to map_rows and export it).

or equivalently

collect_columns(@NT(sum =  i.x+i.y, diff = i.x-i.y) for i in t if f(i))

which is a nice Columns comprehension. We may even use [...] or some variation of it as a short-hand for that, though I'm not 100% sure which method would need to be overloaded for that.

Pinging @davidanthoff on this one as I think it was one of his ideas to start implementing this kind of syntax.

@piever
Copy link
Collaborator Author

piever commented Mar 21, 2018

Even better, we could consider defining table(iterable; kwargs...) as table(collect_columns(iterable); kwargs...) but we should figure out the best way to do so without clashing with the IterableTables definitions.

@shashi
Copy link
Collaborator

shashi commented Mar 22, 2018

What happens when table is called with a tuple or named tuple of vectors then? Tuples are iterable too

@davidanthoff
Copy link
Contributor

Hehe, this is starting to look more and more like Query.jl ;) Which really is nothing other than chained lazy iterators for everything (at least for the one backend we have right now).

About the table function: if you look at the current implementation, I think the call to TableTraitsUtils.create_columns_from_iterabletable is probably pretty similar to collect_columns? With the key difference of course that my implementation requires eltype, whereas yours doesn't. I'll have to change my implementation eventually to mimic yours when I will purge the dependency on inference in Query.jl (and if you don't mind I will probably just copy your code from here).

I think you should be able right now to just replace the whole function body with the following and it probably will just work with the whole iterable tables machinery:

function table(iter; kwargs...)
    return table(collect_columns(IteratorInterfaceExtensions.getiterator(iter)); kwargs...)
end

I haven't tried that, but I think the designs are essentially equivalent, so I'm optimistic it will just work. The motivation for the getiterator indirection is described here.

Things might get a bit more complicated if IndexedTables moves over to Missing. My best guess is still that julia 1.0 won't fix all the issues with Missing, and if that is the case I will keep the iterable tables story on DataValue so that Query.jl continues to work. In that case you would have to handle the case where a source has DataValue columns and do some conversions. Probably not really tricky, just something to keep in mind.

I'm also in the process of adding two more interfaces to TableTraits.jl. Both of them are described in http://www.david-anthoff.com/jl4ds/latest/tabletraits.html. For now they should make certain scenarios much faster (initially probably mostly the file IO story, but we'll have to see). Eventually it might make sense to add support for those to the table function, but that should also be really easy and straightforward.

@piever
Copy link
Collaborator Author

piever commented Mar 22, 2018

@shashi the method to collect an iterable would be the one with the lowest dispatch preference, so it won't affect other methods. We may even consider whether we want a Columns(iterable) as well.

@davidanthoff the two advantages I see for the change are:

  • no need for eltype
  • we would accept also iterables of Tuples

Nice job on the column based interface, we should definitely support it, it seems very useful also for IndexedTables to DataFrames conversion.

Feel free to copy paste this code (or change as needed) for IterableTables.

@piever
Copy link
Collaborator Author

piever commented Mar 22, 2018

I've added the map_rows function, which can take more than one argument after the function:

julia> map_rows(tuple, 1:3, ["a","b","c"])
3-element Columns{Tuple{Int64,String}}:
 (1, "a")
 (2, "b")
 (3, "c")

I think it may come in handy later for join.

Should be good to go on my side, I'll do the table iterator change in a separate PR.

@piever
Copy link
Collaborator Author

piever commented Mar 22, 2018

@davidanthoff For the table construction from iterable, can I consider IteratorInterfaceExtensions stable and use isiterable?

@shashi shashi merged commit ab19fa3 into master Mar 26, 2018
@shashi shashi deleted the pv/noinf branch March 26, 2018 07:08
@shashi
Copy link
Collaborator

shashi commented Mar 26, 2018

🚀

@davidanthoff
Copy link
Contributor

Oh, missed this question. Yes, IteratorInterfaceExtensions.jl is stable, I don't see any changes for that coming.

@piever piever mentioned this pull request Apr 5, 2018
12 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants