csv merge #1121

llllllllll · 2015-06-10T19:44:01Z

closes: #1119

This PR also allows you to change the suffixes of the left and right columns in a Join, similar to pandas.merge.
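For readers unfamiliar with the suffix behaviour being borrowed here: in pandas.merge, column names that appear on both sides and are not join keys get a per-side suffix appended (pandas defaults to `('_x', '_y')`). The sketch below only illustrates that renaming rule in plain Python. `inner_join` and its default suffixes are hypothetical and are not the blaze or pandas implementation.

```python
# Hypothetical helper, not part of blaze or pandas: it only illustrates
# what a `suffixes` pair does to overlapping, non-key column names.
def inner_join(left, right, on, suffixes=("_left", "_right")):
    """Inner-join two lists of dict rows on the column `on`."""
    lsuffix, rsuffix = suffixes
    left_cols = {c for row in left for c in row}
    right_cols = {c for row in right for c in row}
    # Only columns present on both sides (other than the key) are renamed.
    clashes = (left_cols & right_cols) - {on}

    def rename(row, suffix):
        return {(c + suffix if c in clashes else c): v for c, v in row.items()}

    out = []
    for lrow in left:
        for rrow in right:
            if lrow[on] == rrow[on]:
                merged = rename(lrow, lsuffix)
                merged.update(rename(rrow, rsuffix))
                out.append(merged)
    return out


accounts = [{"id": 1, "amount": 100}, {"id": 2, "amount": 200}]
refunds = [{"id": 1, "amount": 10}, {"id": 3, "amount": 30}]
joined = inner_join(accounts, refunds, on="id", suffixes=("_acct", "_refund"))
print(joined)
```

Both inputs have an `amount` column, so the joined row carries `amount_acct` and `amount_refund`, while the key column `id` keeps its name.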

llllllllll · 2015-06-10T20:08:49Z

All tests passing locally, I will rerun

cpcloud · 2015-06-10T20:36:53Z

hm what was the issue?

llllllllll · 2015-06-10T20:39:55Z

I am not sure; I think Travis may have just had a hiccup, because the results were not loading.

cpcloud · 2015-06-11T00:50:20Z

can you add a test for merging two csv files?

llllllllll · 2015-06-11T16:14:08Z

blaze/compute/core.py

    expr2, d2 = swap_resources_into_scope(expr, d)
    if pre_compute_:
-        d3 = dict([(e, pre_compute_(expr2, dat, **kwargs))
+        d3 = dict([(e, pre_compute_(e, dat, **kwargs))


This is what was causing the wrong shape to be applied to the rest of the data types. Can someone confirm that this is the correct fix?

looking now

it looks like your fix is correct. before your fix the code is mapping each subexpression to the same pre_compute result (and it's actually recomputing expr2 every time). after your fix each subexpression in expr2 gets mapped to the correct pre_computed result.

i suspect this was never caught because in many cases e == expr2
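A toy reproduction of the behaviour described in this thread, as a sketch only. `pre_compute` here is a stand-in that records which expression it was handed, and the string "expressions" and file names are made up; blaze's real function dispatches on expression and data types.

```python
# Stand-in for blaze's pre_compute: it just records which expression it
# was handed alongside the data. The real function dispatches on types.
def pre_compute(expr, data):
    return (expr, data)


# A scope maps each leaf expression to its data, e.g. the two sides of a join.
scope = {"left_leaf": "left.csv", "right_leaf": "right.csv"}
top_level = "join(left_leaf, right_leaf)"

# Before the fix: every leaf is pre-computed against the same top-level expression.
before = dict((e, pre_compute(top_level, dat)) for e, dat in scope.items())

# After the fix: each leaf is pre-computed against itself.
after = dict((e, pre_compute(e, dat)) for e, dat in scope.items())

# With a single leaf that is also the top-level expression, both versions
# agree, which is one way the bug could go unnoticed.
single = {"t": "t.csv"}
single_before = dict((e, pre_compute("t", dat)) for e, dat in single.items())
single_after = dict((e, pre_compute(e, dat)) for e, dat in single.items())

print(before["right_leaf"])
print(after["right_leaf"])
print(single_before == single_after)
```

In the two-leaf case the "before" mapping hands the join expression to every leaf, whereas the "after" mapping pairs each leaf with its own data.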

any chance i could get you to kill those brackets around the comprehension?
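For context on the bracket request: passing a generator expression to dict() builds the same mapping without first materialising a list of pairs. A dict comprehension would be the other option, but that syntax only exists from Python 2.7 onward, which presumably matters given the py2.6 compatibility commit in this PR. The names below are illustrative only.

```python
pairs = [("a", 1), ("b", 2)]

# Builds a throwaway list of tuples, then the dict.
with_list = dict([(k, v * 10) for k, v in pairs])

# Feeds the tuples to dict() one at a time; no intermediate list.
with_genexpr = dict((k, v * 10) for k, v in pairs)

print(with_list == with_genexpr)
```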


llllllllll · 2015-06-11T19:09:54Z

merge on success?

cpcloud · 2015-06-11T19:13:04Z

hm this is also happening here: https://github.com/quantopian/blaze/blob/csv-merge/blaze/compute/core.py#L170-L172, let me poke around for a bit

llllllllll · 2015-06-11T19:15:06Z

Other than the place I just changed, is there ever a case that passes in e instead of the top level expr?

cpcloud · 2015-06-11T19:17:10Z

not sure what you mean. passes in e to pre_compute_?

llllllllll · 2015-06-11T19:18:59Z

Sorry, I mean: is there a place that passes the expr from the scope rather than the top-level expr itself, i.e. a place that already does what I changed this one to do?

cpcloud · 2015-06-11T20:52:38Z

blaze/compute/core.py

    expr2, d2 = swap_resources_into_scope(expr, d)
    if pre_compute_:
-        d3 = dict([(e, pre_compute_(expr2, dat, **kwargs))


@mrocklin Are these the correct arguments to pre_compute? For expressions with multiple leaves, like Join, this does the wrong thing because it calls into with the dshape of the first leaf that it finds. According to the pipeline docs, pre_compute operates on the leaves of the expression, which is more in line with @llllllllll's change here to pre_compute each expression in scope.

It looks like most cases of pre_compute don't care about the expression being passed in, so changing the input expr to be the leaf would probably be safe. At this point we would probably want to just kill the expression as an input, though.

i'd like to leave the expression in the input for now and consider removing in release 0.8.2. @llllllllll go ahead and change the other pre_compute_ call to be similar to this one

then merge on passing

llllllllll closed this Jun 10, 2015

llllllllll reopened this Jun 10, 2015

cpcloud added the enhancement label Jun 11, 2015

cpcloud added this to the 0.8.1 milestone Jun 11, 2015

llllllllll reviewed Jun 11, 2015

Joe Jevnik added 9 commits June 11, 2015 12:29

ENH: Allow csv merging.

844f2e3

Also allows Join to take a suffixes argument like pandas.

TST: Adds tests for join suffixes

c54dd44

ENH: Adds join suffix support for sql

1552461

DOC: Adds the suffixes argument

a739682

MAINT: remove unneeded change to test_sql_compute.py

5282d1b

TST: Fix doctest and py2.6 compat

834a6dd

BUG: compute with dict of symbols would apply shape of the first symbol to the rest

3337517

TST: Adds a test for merging csv files

11a1248

STY: drop intermediate list

e3d4c6a

llllllllll force-pushed the csv-merge branch from 8795a01 to e3d4c6a Compare June 11, 2015 19:03

cpcloud reviewed Jun 11, 2015

BUG: Pass the leaf to pre_compute

8f720ee

llllllllll added a commit that referenced this pull request Jun 12, 2015

Merge pull request #1121 from quantopian/csv-merge

a062788

csv merge

llllllllll merged commit a062788 into blaze:master Jun 12, 2015

llllllllll deleted the csv-merge branch June 12, 2015 21:37

llllllllll Jun 11, 2015

cpcloud Jun 11, 2015

cpcloud Jun 11, 2015

cpcloud Jun 11, 2015

cpcloud Jun 11, 2015

llllllllll Jun 11, 2015

cpcloud Jun 11, 2015

mrocklin Jun 12, 2015

cpcloud Jun 12, 2015

cpcloud Jun 12, 2015
