coalesce and cast #1409
3e008de to d290fc6
6e0ca18 to 0b6e761
ping @kwmsmith I would like someone to look this one over because I am adding some new expressions.
freshly rebased against master
@kwmsmith I would like to merge this soon
If you don't have any comments I plan on merging by the end of tomorrow
class Cast(Expr):
    """Cast an expression to a different type.

    This is only an expression time operation.
Can you provide a brief Examples section?
done
@llllllllll Apologies for taking so long to get to this -- I've been pinned down on a number of internal things lately. This looks good to me; my only question is on the names; new users may be confused on the distinction between Because of the possibly confusing or non-obvious names, it's important that we have good docstrings with good examples.
No worries, I just wanted to make sure this didn't get forgotten. I completely agree about the naming and documentation issue. I will go through and try to add some good examples and maybe some prose docs about the difference between cast and coerce. If you have an alternative name for
Taking a page from NumPy / pandas, which have
numpy astype actually does conversions though; the equivalent in numpy is
For bz.compute(df.int_column.cast('float32')) Since
I guess it is closer to view, which is a C-style cast. Really this is just a type-level function for allowing us to use different expressions when blaze gets the type wrong. The use case I have for this is when there are bugs in the blaze type system that I need to be able to fix in my app code before I have time to fix them in blaze. For example, reductions over var returning A instead of ?A
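To make the astype-versus-view distinction in this exchange concrete, here is a minimal sketch using only the Python standard library. It is not blaze or NumPy code, and the helper names are hypothetical. It contrasts a value conversion (what astype does) with a bit-level reinterpretation (what a view or C-style cast does). Per the comment above, the proposed cast goes one step further and changes only the type that the expression reports.

```python
import struct


def convert_int_to_float(value: int) -> float:
    """Value conversion: the number is preserved, the representation changes."""
    return float(value)


def reinterpret_int_as_float(value: int) -> float:
    """Bit reinterpretation: the same 4 bytes are read back as a float32."""
    raw = struct.pack('<i', value)       # 32-bit little-endian signed int
    return struct.unpack('<f', raw)[0]   # same bytes read as a 32-bit float


converted = convert_int_to_float(1)
reinterpreted = reinterpret_int_as_float(1)

print(converted)       # the numeric value 1 is kept
print(reinterpreted)   # a tiny denormal float, not 1.0
```

The two results differ because the second function never looks at the numeric value, only at the bytes.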
Yes, I totally understand the use case for correcting blaze / datashape's option / non-option types. That makes sense. It's the other cases, for example when the user tries to cast from integers to floats and it has no effect on the computed results, that get more confusing. Is dealing with option types the only use case here, or are there others? Because my preference would be to keep this limited to handling casts between option / non-option types of the same underlying type for now, and only expand it to more general casts if there's a clear use case for that.
The use case is for any time that the blaze type system is incorrect. This was happening before with reductions on timedeltas giving back floats. This was corrected, but I would like to have the freedom to work around this class of issue in the future.
I'm OK with this for the option / non-option case, of course, and I might envision other cases as well ( My preference would be to only allow I'm OK with merging this as long as we're able to restrict the cases for
Cast is a type-level function that reinterprets a node as a different type. This is like a C cast, or C++ reinterpret_cast. This allows users to correct mistakes in the blaze type system. Coalesce is a binary operator much like SQL COALESCE. It picks the first non-null element.
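The semantics described in this commit message can be sketched in plain Python. This is an illustration only, not the implementation in this PR: `TypedNode`, its `dshape` field, and both functions are hypothetical stand-ins, with `None` playing the role of a null value.

```python
from dataclasses import dataclass
from typing import Any


def coalesce(first: Any, second: Any) -> Any:
    """Binary coalesce: return the first argument unless it is null (None)."""
    return first if first is not None else second


@dataclass(frozen=True)
class TypedNode:
    """Hypothetical expression node that only records a name and a type string."""
    name: str
    dshape: str


def cast(node: TypedNode, to: str) -> TypedNode:
    """Type-level cast: relabel the node's type without touching any data."""
    return TypedNode(node.name, to)


# Element-wise use over a column that contains nulls.
column_b = [1, None, 0, None]
filled = [coalesce(value, -1) for value in column_b]

# Correct a reduction that was wrongly typed as non-optional.
total = TypedNode('total', 'int64')
fixed = cast(total, '?int64')
```

Note that `coalesce` tests for null rather than truthiness, so a legitimate `0` in the column is kept.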
updates tests for coalesce
Added more tests, including some xfail tests around promotion of mixed shaped records which I think should work in the future. @kwmsmith tests are passing
def test_coalesce(sqla):
    t = symbol('t', discover(sqla))
    assert (
        odo(compute(coalesce(t.B, -1), {t: sqla}), list) ==
nit: for consistency, we've been using compute(expr, sql_data, return_type=list) for simple cases like this rather than an explicit odo call.
@llllllllll just some consistency nits, otherwise +1. As I mentioned above, in the future, we may need to restrict Also, please make a separate issue to add sections for
I like the
LGTM.
Thanks for looking at this
merged on cli
cast is needed when blaze is incorrect about the types of an expression. For example, the case I opened earlier about reductions of var * A being typed A instead of ?A when the collection could be empty. To make it easy to work around this we should be able to explicitly overwrite the type.

coalesce is useful for a lot of queries.
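As a rough illustration of why a reduction over a possibly empty collection should be typed ?A rather than A, consider the plain-Python sketch below. It does not use blaze, and `reduce_max` is a hypothetical helper: when the input is empty there is no value to return, so the result has to be optional.

```python
from typing import Optional, Sequence


def reduce_max(values: Sequence[int]) -> Optional[int]:
    """Maximum of a variable-length collection; None when it is empty."""
    return max(values, default=None)


non_empty_result = reduce_max([3, 1, 2])
empty_result = reduce_max([])
```

A type system that reports plain `int` here would hide the `None` case, which is the kind of mistake an explicit type override is meant to work around.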