Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

csv merge #1121

Merged
merged 10 commits into from
Jun 12, 2015
Merged

csv merge #1121

merged 10 commits into from
Jun 12, 2015

Conversation

llllllllll
Copy link
Member

closes: #1119

This pr also allows you to change the suffixes of the left and right columns in a Join similar to pandas.merge

@llllllllll
Copy link
Member Author

All tests passing locally, I will rerun

@llllllllll llllllllll closed this Jun 10, 2015
@llllllllll llllllllll reopened this Jun 10, 2015
@cpcloud
Copy link
Member

cpcloud commented Jun 10, 2015

hm what was the issue?

@llllllllll
Copy link
Member Author

I am not sure, I think travis may have just had a hiccup because the results were not loading.

@cpcloud cpcloud added this to the 0.8.1 milestone Jun 11, 2015
@cpcloud
Copy link
Member

cpcloud commented Jun 11, 2015

can you add a test for merging two csv files?

expr2, d2 = swap_resources_into_scope(expr, d)
if pre_compute_:
d3 = dict([(e, pre_compute_(expr2, dat, **kwargs))
d3 = dict([(e, pre_compute_(e, dat, **kwargs))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what was causing the wrong shape to be applied to the rest of the data types, can someone make sure that this is the correct fix

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking now

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks like your fix is correct. before your fix the code is mapping each subexpression to the same pre_compute result (and it's actually recomputing expr2 every time). after your fix each subexpression in expr2 gets mapped to the correct pre_computed result.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i suspect this was never caught because in many cases e == expr2

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any chance i could get you to kill those brackets around the comprehension?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure thing

@llllllllll
Copy link
Member Author

merge on success?

@cpcloud
Copy link
Member

cpcloud commented Jun 11, 2015

hm this is also happening here: https://github.com/quantopian/blaze/blob/csv-merge/blaze/compute/core.py#L170-L172, let me poke around for a bit

@llllllllll
Copy link
Member Author

Other than the place I just changed, is there ever a case that passes in e instead of the top level expr?

@cpcloud
Copy link
Member

cpcloud commented Jun 11, 2015

not sure what you mean. passes in e to pre_compute_?

@llllllllll
Copy link
Member Author

sorry, is there a place that passes the expr out of the scope vs the top level expr itself, basically a place that does what I changed it to do.

expr2, d2 = swap_resources_into_scope(expr, d)
if pre_compute_:
d3 = dict([(e, pre_compute_(expr2, dat, **kwargs))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mrocklin Are these the correct arguments to pre_compute? For expressions with multiple leaves like Join this does the wrong thing because it calls into with the dshape of the first leaf that it finds. According to the pipeline docs, pre_compute operates on the leaves of the expression, which is more inline with @llllllllll change here to pre_compute each expression in scope.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like most cases of pre_compute don't care about the expression being passed in so changing the input expr to be the leaf would probably be safe. At this point we would probably want to just kill the expression as an input though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'd like to leave the expression in the input for now and consider removing in release 0.8.2. @llllllllll go ahead and change the other pre_compute_ call to be similar to this one

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then merge on passing

llllllllll added a commit that referenced this pull request Jun 12, 2015
@llllllllll llllllllll merged commit a062788 into blaze:master Jun 12, 2015
@llllllllll llllllllll deleted the csv-merge branch June 12, 2015 21:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

join fails, but pandas merge is successful
3 participants