reduce on GroupedDTable of DTable of DataFrames returns NamedTuple #65

Open
schlichtanders opened this issue Mar 7, 2024 · 0 comments

I think it should be returning a DataFrame, preserving the inner type

Here is an example:

using Distributed
# add two further julia processes which could run on other machines
addprocs(2, exeflags="--threads=2")
# Distributed.@everywhere execute code on all machines
@everywhere using Dagger  # needed for all_processors
# Dagger uses both Threads and Machines as processes
Dagger.all_processors()

using DTables, DataFrames, CSV

url = "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv"
files = [url, url, url, url, url]

d = DTable(DataFrame ∘ CSV.File ∘ download, files)
g = DTables.groupby(d, :species)
r = reduce(+, g, cols=[:sepal_width])
fetch(r)
# returns
# (species = String15["virginica", "setosa", "versicolor"], result_sepal_width = [743.5, 856.9999999999998, 692.4999999999995])
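
As a possible stopgap until the return type changes, the fetched result can be wrapped in a DataFrame by hand. This is only a sketch, and it assumes the result is a NamedTuple of equal-length column vectors, as in the output above. The `nt` literal below is a stand-in for `fetch(r)`, using plain `String` instead of `String15` so the snippet runs without the distributed setup.

```julia
using DataFrames

# Stand-in for `fetch(r)`: a NamedTuple of columns shaped like the output above.
# Assumption: plain `String` values replace `String15` to keep this self-contained.
nt = (species = ["virginica", "setosa", "versicolor"],
      result_sepal_width = [743.5, 856.9999999999998, 692.4999999999995])

# A NamedTuple of equal-length vectors is a Tables.jl column table,
# so the DataFrame constructor accepts it directly.
df = DataFrame(nt)
```

In the example above this would correspond to `DataFrame(fetch(r))`. It does not address the underlying issue, which is that `reduce` does not preserve the inner table type.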