-
Notifications
You must be signed in to change notification settings - Fork 389
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
csv merge #1121
csv merge #1121
Conversation
All tests passing locally, I will rerun |
hm what was the issue? |
I am not sure, I think travis may have just had a hiccup because the results were not loading. |
can you add a test for merging two csv files? |
expr2, d2 = swap_resources_into_scope(expr, d) | ||
if pre_compute_: | ||
d3 = dict([(e, pre_compute_(expr2, dat, **kwargs)) | ||
d3 = dict([(e, pre_compute_(e, dat, **kwargs)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what was causing the wrong shape to be applied to the rest of the data types, can someone make sure that this is the correct fix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it looks like your fix is correct. before your fix the code is mapping each subexpression to the same pre_compute
result (and it's actually recomputing expr2
every time). after your fix each subexpression in expr2
gets mapped to the correct pre_compute
d result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i suspect this was never caught because in many cases e == expr2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any chance i could get you to kill those brackets around the comprehension?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure thing
Also allows Join to take a suffixes argument like pandas.
merge on success? |
hm this is also happening here: https://github.com/quantopian/blaze/blob/csv-merge/blaze/compute/core.py#L170-L172, let me poke around for a bit |
Other than the place I just changed, is there ever a case that passes in |
not sure what you mean. passes in |
sorry, is there a place that passes the expr out of the scope vs the top level expr itself, basically a place that does what I changed it to do. |
expr2, d2 = swap_resources_into_scope(expr, d) | ||
if pre_compute_: | ||
d3 = dict([(e, pre_compute_(expr2, dat, **kwargs)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mrocklin Are these the correct arguments to pre_compute
? For expressions with multiple leaves like Join
this does the wrong thing because it calls into
with the dshape of the first leaf that it finds. According to the pipeline docs, pre_compute
operates on the leaves of the expression, which is more inline with @llllllllll change here to pre_compute
each expression in scope.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like most cases of pre_compute
don't care about the expression being passed in so changing the input expr to be the leaf would probably be safe. At this point we would probably want to just kill the expression as an input though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'd like to leave the expression in the input for now and consider removing in release 0.8.2. @llllllllll go ahead and change the other pre_compute_
call to be similar to this one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then merge on passing
closes: #1119
This pr also allows you to change the suffixes of the left and right columns in a
Join
similar topandas.merge