Input from cluster memory #68

nils-braun · 2020-11-02T10:50:26Z

This PR makes it possible to use datasets published in the cluster memory as input tables.

This means that if you have published a dataset, e.g. with a distributed client

client.publish_dataset(my_table=df)

you can now register it with

CREATE TABLE some_name WITH (location='my_table', format='memory')

and start querying it as normal.
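The lookup behind `format='memory'` can be pictured roughly as follows. This is a minimal sketch, not the code from this PR: `InMemoryClient` is a hypothetical stand-in for `distributed.Client` that mimics only `publish_dataset` and `get_dataset`, and `load_table` and `tables` are illustrative names. It runs without a cluster.

```python
from typing import Any, Dict


class InMemoryClient:
    """Hypothetical stand-in for distributed.Client (assumption: only the
    two dataset methods used below are mimicked)."""

    def __init__(self) -> None:
        self._datasets: Dict[str, Any] = {}

    def publish_dataset(self, **kwargs: Any) -> None:
        # Store each keyword argument under its name, like my_table=df.
        self._datasets.update(kwargs)

    def get_dataset(self, name: str) -> Any:
        if name not in self._datasets:
            raise KeyError(f"Dataset '{name}' not found")
        return self._datasets[name]


def load_table(client: Any, location: str, fmt: str) -> Any:
    """Resolve WITH (location=..., format=...) to a dataset.

    Only the 'memory' format is sketched; other formats are out of scope.
    """
    if fmt != "memory":
        raise NotImplementedError(f"format {fmt!r} is not covered by this sketch")
    return client.get_dataset(location)


client = InMemoryClient()
df = [{"a": 1}, {"a": 2}]  # placeholder for a Dask dataframe
client.publish_dataset(my_table=df)

# Plays the role of the SQL context's table registry (illustrative only).
tables: Dict[str, Any] = {}
tables["some_name"] = load_table(client, location="my_table", fmt="memory")
```

With a real cluster, `client` would be a `distributed.Client` and `df` a Dask dataframe. The point of the sketch is only that `location` is treated as the published dataset's name rather than as a file path.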


mrocklin · 2020-11-02T14:35:48Z

This is a nice intersection of an old Dask feature (published datasets) and the Dask-SQL context. I'm curious if there are things we should consider doing on the Dask side that we could learn from the context approach.

…

codecov-io · 2020-11-02T14:37:22Z

Codecov Report

Merging #68 into main will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##              main       #68   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           31        31           
  Lines         1196      1200    +4     
  Branches       156       157    +1     
=========================================
+ Hits          1196      1200    +4

Impacted Files	Coverage Δ
dask_sql/physical/rel/custom/create.py	`100.00% <100.00%> (ø)`

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 7e22282...b354d62.

nils-braun · 2020-11-02T17:08:41Z

Thanks for commenting, @mrocklin (I am always amazed how fast you find out about what is going on in the ecosystem).

I am not sure these two can be compared so easily. The context object is local to the process it runs in (or, when running a SQL server, local to that application, although it can be controlled via SQL), whereas published datasets are always shared across the whole cluster.
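As a toy illustration of that difference in scope (a sketch only: `cluster_datasets` and `LocalContext` are hypothetical stand-ins, not Dask or dask-sql classes):

```python
from typing import Any, Dict

# Cluster-wide state: one store that every connected process can see.
cluster_datasets: Dict[str, Any] = {}


class LocalContext:
    """Hypothetical stand-in for a per-process SQL context."""

    def __init__(self) -> None:
        # Table registrations live only inside this object.
        self.tables: Dict[str, Any] = {}

    def create_table_from_memory(self, name: str, location: str) -> None:
        # Register a local table name that points at the shared dataset.
        self.tables[name] = cluster_datasets[location]


cluster_datasets["my_table"] = [1, 2, 3]  # "published" once, visible to all

ctx_a = LocalContext()
ctx_b = LocalContext()
ctx_a.create_table_from_memory("some_name", "my_table")

# ctx_a knows the table name; ctx_b does not, although both could
# register the same published dataset under any name they like.
known_in_a = "some_name" in ctx_a.tables
known_in_b = "some_name" in ctx_b.tables
```

The registration is a local name mapping, while the data it points to lives in shared cluster state.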

nils-braun added 8 commits November 1, 2020 09:47

Create tables from cluster memory

3cb3c30

Merge remote-tracking branch 'origin/main' into feature/input-from-cluster-memory

7eeea97

Use pytest for all the tests instead of unittest

0bdc0d3

Merge branch 'feature/use-pytest' into feature/input-from-cluster-memory

a3fdf07

Add a test

307ec50

Faster tests and stylefix

3c7ac10

Merge branch 'feature/use-pytest' into feature/input-from-cluster-memory

ce858fe

Merge remote-tracking branch 'origin/main' into feature/input-from-cluster-memory

f1d61cf

Information and docs on data loading

b354d62

nils-braun merged commit 194e56f into main Nov 3, 2020

nils-braun deleted the feature/input-from-cluster-memory branch November 3, 2020 13:35
