New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

csv merge #1121

Merged
merged 10 commits into from Jun 12, 2015

Conversation

Projects
None yet
3 participants
@llllllllll
Member

llllllllll commented Jun 10, 2015

closes: #1119

This pr also allows you to change the suffixes of the left and right columns in a Join similar to pandas.merge

@llllllllll

This comment has been minimized.

Member

llllllllll commented Jun 10, 2015

All tests passing locally, I will rerun

@llllllllll llllllllll closed this Jun 10, 2015

@llllllllll llllllllll reopened this Jun 10, 2015

@cpcloud

This comment has been minimized.

Member

cpcloud commented Jun 10, 2015

hm what was the issue?

@llllllllll

This comment has been minimized.

Member

llllllllll commented Jun 10, 2015

I am not sure, I think travis may have just had a hiccup because the results were not loading.

@cpcloud cpcloud added the enhancement label Jun 11, 2015

@cpcloud cpcloud added this to the 0.8.1 milestone Jun 11, 2015

@cpcloud

This comment has been minimized.

Member

cpcloud commented Jun 11, 2015

can you add a test for merging two csv files?

expr2, d2 = swap_resources_into_scope(expr, d)
if pre_compute_:
d3 = dict([(e, pre_compute_(expr2, dat, **kwargs))
d3 = dict([(e, pre_compute_(e, dat, **kwargs))

This comment has been minimized.

@llllllllll

llllllllll Jun 11, 2015

Member

This is what was causing the wrong shape to be applied to the rest of the data types, can someone make sure that this is the correct fix

This comment has been minimized.

@cpcloud

cpcloud Jun 11, 2015

Member

looking now

This comment has been minimized.

@cpcloud

cpcloud Jun 11, 2015

Member

it looks like your fix is correct. before your fix the code is mapping each subexpression to the same pre_compute result (and it's actually recomputing expr2 every time). after your fix each subexpression in expr2 gets mapped to the correct pre_computed result.

This comment has been minimized.

@cpcloud

cpcloud Jun 11, 2015

Member

i suspect this was never caught because in many cases e == expr2

This comment has been minimized.

@cpcloud

cpcloud Jun 11, 2015

Member

any chance i could get you to kill those brackets around the comprehension?

This comment has been minimized.

@llllllllll

llllllllll Jun 11, 2015

Member

sure thing

@llllllllll llllllllll force-pushed the quantopian:csv-merge branch from 8795a01 to e3d4c6a Jun 11, 2015

@llllllllll

This comment has been minimized.

Member

llllllllll commented Jun 11, 2015

merge on success?

@cpcloud

This comment has been minimized.

Member

cpcloud commented Jun 11, 2015

hm this is also happening here: https://github.com/quantopian/blaze/blob/csv-merge/blaze/compute/core.py#L170-L172, let me poke around for a bit

@llllllllll

This comment has been minimized.

Member

llllllllll commented Jun 11, 2015

Other than the place I just changed, is there ever a case that passes in e instead of the top level expr?

@cpcloud

This comment has been minimized.

Member

cpcloud commented Jun 11, 2015

not sure what you mean. passes in e to pre_compute_?

@llllllllll

This comment has been minimized.

Member

llllllllll commented Jun 11, 2015

sorry, is there a place that passes the expr out of the scope vs the top level expr itself, basically a place that does what I changed it to do.

expr2, d2 = swap_resources_into_scope(expr, d)
if pre_compute_:
d3 = dict([(e, pre_compute_(expr2, dat, **kwargs))

This comment has been minimized.

@cpcloud

cpcloud Jun 11, 2015

Member

@mrocklin Are these the correct arguments to pre_compute? For expressions with multiple leaves like Join this does the wrong thing because it calls into with the dshape of the first leaf that it finds. According to the pipeline docs, pre_compute operates on the leaves of the expression, which is more inline with @llllllllll change here to pre_compute each expression in scope.

This comment has been minimized.

@mrocklin

mrocklin Jun 12, 2015

Member

It looks like most cases of pre_compute don't care about the expression being passed in so changing the input expr to be the leaf would probably be safe. At this point we would probably want to just kill the expression as an input though.

This comment has been minimized.

@cpcloud

cpcloud Jun 12, 2015

Member

i'd like to leave the expression in the input for now and consider removing in release 0.8.2. @llllllllll go ahead and change the other pre_compute_ call to be similar to this one

This comment has been minimized.

@cpcloud

cpcloud Jun 12, 2015

Member

then merge on passing

llllllllll added a commit that referenced this pull request Jun 12, 2015

@llllllllll llllllllll merged commit a062788 into blaze:master Jun 12, 2015

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

@llllllllll llllllllll deleted the quantopian:csv-merge branch Jun 12, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment