reduce on GroupedDTable of DTable of DataFrames returns NamedTuple #65

schlichtanders · 2024-03-07T14:14:50Z

I think it should return a DataFrame, preserving the inner table type.

Here is an example:

```julia
using Distributed
# add two worker processes (these could also run on other machines)
addprocs(2, exeflags="--threads=2")
# @everywhere runs the given code on all processes
@everywhere using Dagger  # needed for Dagger.all_processors
# Dagger treats both threads and worker processes as processors
Dagger.all_processors()

using DTables, DataFrames, CSV

url = "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv"
files = [url, url, url, url, url]

d = DTable(DataFrame ∘ CSV.File ∘ download, files)
g = DTables.groupby(d, :species)
r = reduce(+, g, cols=[:sepal_width])
fetch(r)
# returns
# (species = String15["virginica", "setosa", "versicolor"], result_sepal_width = [743.5, 856.9999999999998, 692.4999999999995])
```
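As a possible workaround in the meantime, the NamedTuple result can be wrapped in the `DataFrame` constructor, since a NamedTuple of equal-length vectors is a Tables.jl column table. The sketch below only assumes the result has the shape printed above; it uses a literal NamedTuple (with plain `String` instead of `String15`) standing in for `fetch(r)`, so it runs without DTables or Dagger.

```julia
using DataFrames

# Stand-in for `fetch(r)`: a NamedTuple of column vectors with the same
# shape and values as the output reported above (assumption: plain String)
nt = (species = ["virginica", "setosa", "versicolor"],
      result_sepal_width = [743.5, 856.9999999999998, 692.4999999999995])

# A NamedTuple of equal-length vectors is a Tables.jl column table,
# so the DataFrame constructor accepts it directly (columns are copied)
df = DataFrame(nt)
```

With the objects from the example itself this would be `DataFrame(fetch(r))`.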
