Join type promotion #1218

llllllllll · 2015-08-27T18:33:17Z

Edit: this PR also does generic type promotion of joined fields.

This also fixes an issue that could occur when joining on a single column name passed as a string while other column names were substrings of the join column name.
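As a rough illustration of what promoting joined key types means (for example the int32/int64 join discussed further down), here is a minimal plain-Python sketch. The `INT_BITS` table and the `promote` / `joined_key_types` helpers are hypothetical and only cover integers; they are not datashape's or Blaze's actual promotion rules.

```python
# Hypothetical bit widths for integer key types; not datashape's actual
# promotion rules, just enough to illustrate widening int32 -> int64.
INT_BITS = {"int8": 8, "int16": 16, "int32": 32, "int64": 64}


def promote(left_type, right_type):
    """Return a type that can hold values of both join-key types."""
    if left_type == right_type:
        return left_type
    if left_type in INT_BITS and right_type in INT_BITS:
        # The wider integer type can represent every value of the narrower one.
        return max(left_type, right_type, key=INT_BITS.get)
    raise TypeError("cannot promote %s and %s" % (left_type, right_type))


def joined_key_types(left_types, right_types):
    """Promote each pair of join-key types, position by position."""
    return [promote(lt, rt) for lt, rt in zip(left_types, right_types)]


print(joined_key_types(["int32", "string"], ["int64", "string"]))
# ['int64', 'string']
```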

cpcloud · 2015-08-27T20:07:57Z

blaze/expr/collections.py

+        on_left = self.on_left
+        on_right = self.on_right
+
+        right_params = self.rhs.schema[0].parameters[0]


you can do this with self.rhs.measure.fields, which IMO is a bit more clear

there doesn't always appear to be a measure attribute on this type.

hm weird, is that the cause of the build failure?

not sure, I just installed pyspark to run this locally; however, I am running into new issues when trying to extend this to handle the join of int32 and int64.

I can ping you when I get something working.

Ok. I'd install spark 1.3 for now, because I haven't worked up a PR to add support for 1.4 yet.

cpcloud · 2015-08-27T20:10:15Z

blaze/expr/collections.py

+            [name, extract_option(dt)
+             if isinstance(dt, Option) and
+             not isinstance(right_params[n], Option) else dt]
+            for n, (name, dt) in enumerate(self.lhs.schema[0].parameters[0])


same here, I think this should be self.lhs.measure.fields
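The diff above drops the `Option` wrapper from a left-hand type when the matching right-hand type is not optional (the commit "Allows join to join from an optional to non-optional" describes the same idea). For an inner join this is sound: if the right key has no nulls, no matched key can be null. A minimal sketch of that rule follows; the `Option` dataclass and `joined_key_type` are hypothetical stand-ins, not datashape's `Option` or Blaze's `extract_option`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    """Hypothetical stand-in for a nullable type such as datashape's ?int32."""
    ty: str


def joined_key_type(left_t, right_t):
    """Type of a joined key column for an inner join.

    Mirrors the rule in the diff: the left type loses its Option wrapper only
    when the matching right type is not optional.
    """
    if isinstance(left_t, Option) and not isinstance(right_t, Option):
        # The right key has no nulls, so no matched key can be null.
        return left_t.ty
    return left_t


print(joined_key_type(Option("int32"), "int32"))          # int32
print(joined_key_type(Option("int32"), Option("int32")))  # Option(ty='int32')
```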

llllllllll · 2015-08-27T20:57:45Z

The build is going to fail until conda gets updated with the datashape PR. I'm getting 1 error locally where the order of the columns is flipped: test_graph_double_join in the python compute tests. Not sure what is causing this. If you have a chance, some extra eyes would be appreciated @cpcloud.

Edit: After walking through the test manually, I believe that it was making an incorrect assertion. I updated the test and added a comment with some intermediate steps to make it easier for people to validate this later.

llllllllll · 2015-08-27T22:22:10Z

Should be ready to go now.

cpcloud · 2015-08-27T22:29:08Z

blaze/compute/tests/test_python_compute.py

+                       ('A', 1, 5),
+                       ('F', 6, 1),
+                       ('F', 6, 2),
+                       ('F', 6, 4)])


was this just wrong before?

ah i see your comment

where is the line that fixes the column ordering here?

I actually did not intend to fix this; it sort of fell out of the other changes. That is why I walked through the join manually, to assure myself I still had the correct answer. I think what fixed this were the checks at collections.py:446 and 450. We were doing an inclusion check of a field name against a string, and since 'a' is in 'name', the 'a' field was selected when it should not have been.
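The failure mode described here is easy to reproduce in plain Python: when the join column is passed as a single string, `field not in on` performs a substring test rather than a list-membership test. The helpers below are hypothetical and are not the code at collections.py:446/450; they only show why normalizing a string to a one-element list fixes the selection.

```python
def fields_excluding_keys_buggy(fields, on):
    # `on` may be a single column name (str) or a list of names.
    # When `on` is a str, `in` is a substring test: 'a' in 'name' is True.
    return [f for f in fields if f not in on]


def fields_excluding_keys(fields, on):
    # Normalize a single name to a one-element list so `in` tests membership.
    keys = [on] if isinstance(on, str) else list(on)
    return [f for f in fields if f not in keys]


fields = ["name", "a", "amount"]
print(fields_excluding_keys_buggy(fields, "name"))  # ['amount']
print(fields_excluding_keys(fields, "name"))        # ['a', 'amount']
```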

Ah makes sense

cpcloud · 2015-09-01T19:00:48Z

blaze/expr/collections.py

-                 if name not in self.on_right]
+        right_types = keymap(
+            dict(zip(on_right, on_left)).get,
+            dict(self.rhs.schema[0].parameters[0]),


was it not possible to use self.rhs.measure here?

no, something didn't have a measure attribute, not sure what the types were here

just tried this, you can do self.rhs.dshape.measure.dict and remove the dict() call around it
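The `keymap(dict(zip(on_right, on_left)).get, ...)` line in the diff renames the right-hand join keys to their left-hand names so key types can be compared by name. Below is a plain-Python sketch of that renaming using a dict comprehension instead of `toolz.keymap`; the helper name is hypothetical. Note that `dict.get` returns `None` for columns that are not join keys, whereas this version simply skips them.

```python
def right_key_types_by_left_name(on_left, on_right, right_fields):
    """Re-key the right-hand join-key types under the matching left-hand names.

    right_fields maps right column name -> type; it plays the role of
    self.rhs.dshape.measure.dict from the review comment. Non-key columns
    are skipped.
    """
    rename = dict(zip(on_right, on_left))
    return {rename[name]: t for name, t in right_fields.items() if name in rename}


# Keys listed in a different order on each side still pair up correctly.
print(right_key_types_by_left_name(
    ["a", "b"], ["y", "x"], {"x": "int32", "y": "int64", "z": "string"}))
# {'b': 'int32', 'a': 'int64'}
```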

cpcloud · 2015-09-01T21:44:43Z

blaze/compute/tests/test_python_compute.py

+    #  [3, C, 5],
+    #  [6, F, 1],
+    #  [6, F, 2],
+    #  [6, F, 4]]


why is the joined key all the way over to the left? is that just how the join dshape pops out?

when we construct the pairs we emit joined + left + right

Also, the shape of the measure in the join puts the joined keys first.
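To make the "joined + left + right" ordering concrete, here is a minimal sketch that builds one output row from a matched pair of tuples. It is loosely modeled on the ordering described above; `assemble_pair` is a hypothetical name and is not Blaze's `pair_assemble`, whose signature is not shown here.

```python
def assemble_pair(left_row, right_row, left_on, right_on):
    """Build one output row as joined keys + remaining left + remaining right.

    left_row and right_row are tuples; left_on and right_on are the key
    positions in each row, listed in matching order.
    """
    joined = tuple(left_row[i] for i in left_on)
    left_rest = tuple(v for i, v in enumerate(left_row) if i not in left_on)
    right_rest = tuple(v for i, v in enumerate(right_row) if i not in right_on)
    return joined + left_rest + right_rest


# Left rows are (name, id); right rows are (id, amount); join on id.
print(assemble_pair(("F", 6), (6, 1), left_on=[1], right_on=[0]))
# (6, 'F', 1)
```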

cpcloud · 2015-09-04T17:46:37Z

blaze/expr/collections.py

+    right_types = listpack(types_of_fields(on_right, rhs))
+    if len(left_types) != len(right_types):
+        raise ValueError(
+            'Length of on_left=%d not equal to lenght of on_right=%d' % (


small typo: lenght should be length
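The check under review guards against mismatched key lists before any types are compared. A minimal sketch of the same validation with the typo fixed, using a hypothetical `check_join_keys` helper that works on key names rather than on the key types that `types_of_fields` returns in the diff:

```python
def check_join_keys(on_left, on_right):
    """Raise ValueError unless both sides name the same number of join keys."""
    if len(on_left) != len(on_right):
        raise ValueError(
            "Length of on_left=%d not equal to length of on_right=%d"
            % (len(on_left), len(on_right)))
    # Pair keys positionally: the i-th left key joins with the i-th right key.
    return list(zip(on_left, on_right))


print(check_join_keys(["id", "date"], ["user_id", "day"]))
# [('id', 'user_id'), ('date', 'day')]
```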

Join type promotion

ENH: Allows join to join from an optional to non-optional

4eab599

llllllllll force-pushed the join-option-types branch from af6a0a6 to 6b5f4af on August 27, 2015 18:34

DOC: add nullable join to whatsnew

6b5f4af

cpcloud reviewed Aug 27, 2015

llllllllll added the wip label Aug 27, 2015

cpcloud reviewed Aug 27, 2015

llllllllll force-pushed the join-option-types branch from 70cd992 to c083b79 on August 27, 2015 20:43

ENH: Join promotes types

c083b79

llllllllll added 3 commits August 27, 2015 17:55

TST: incorrect test for python join

ce484ac

ENH: allows join to promote types.

be0aaa1

DOC: whatsnew for join promotion

5de6db3

llllllllll changed the title from "Join option types" to "Join type promotion" Aug 27, 2015

llllllllll removed the wip label Aug 27, 2015

cpcloud reviewed Aug 27, 2015

cpcloud added the bug label Aug 27, 2015

cpcloud added this to the 0.8.3 milestone Aug 27, 2015

BUG: handle the case where the keys are in a different order

463edd6

llllllllll force-pushed the join-option-types branch from d0397d1 to 6894ac4 on August 28, 2015 16:49

MAINT: remove unneeded lambda

6894ac4

llllllllll force-pushed the join-option-types branch 2 times, most recently from f8b46df to 7171cf3 on September 1, 2015 18:05

llllllllll added 2 commits September 1, 2015 14:05

BUG: pass the correct args to pair_assemble

7171cf3

ENH: make pair_assemble on_left and on_right optional

b295a6b

cpcloud reviewed Sep 1, 2015

MAINT: access the measure attr instead of schema.parameters

a454af8

cpcloud reviewed Sep 1, 2015

cpcloud reviewed Sep 4, 2015

llllllllll added 4 commits September 4, 2015 13:48

MAINT: typo in error

5727856

ENH: all exceptions are valueerror

18086dc

ENH: test computation of type promotion

06ac016

TST: update tests

5607b92

llllllllll added a commit that referenced this pull request Sep 4, 2015

Merge pull request #1218 from quantopian/join-option-types

1f10384

Join type promotion

llllllllll merged commit 1f10384 into blaze:master Sep 4, 2015

llllllllll deleted the join-option-types branch September 4, 2015 18:19



