Coerce expression #1137
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -22,7 +22,7 @@ | |
__all__ = ['Expr', 'ElemWise', 'Field', 'Symbol', 'discover', 'Projection', | ||
'projection', 'Selection', 'selection', 'Label', 'label', 'Map', | ||
'ReLabel', 'relabel', 'Apply', 'Slice', 'shape', 'ndim', 'label', | ||
- 'symbol'] | ||
+ 'symbol', 'Coerce', 'coerce'] | ||
|
||
|
||
_attr_cache = dict() | ||
|
@@ -705,6 +705,21 @@ def dshape(self): | |
return dshape(self._dshape) | ||
|
||
|
||
class Coerce(Expr): | ||
Review comment: Can we add a docstring here with some examples?
Reply: Yep, will do.
||
__slots__ = '_hash', '_child', 'to' | ||
|
||
@property | ||
def schema(self): | ||
return self.to | ||
|
||
@property | ||
def dshape(self): | ||
return DataShape(*(self._child.shape + (self.schema,))) | ||
|
||
def __str__(self): | ||
return '%s.coerce(to=%r)' % (self._child, str(self.schema)) | ||
|
||
|
||
def apply(expr, func, dshape, splittable=False): | ||
return Apply(expr, func, datashape.dshape(dshape), splittable) | ||
|
||
|
@@ -757,13 +772,18 @@ def ndim(expr): | |
return len(shape(expr)) | ||
|
||
|
||
def coerce(expr, to): | ||
return Coerce(expr, dshape(to) if isinstance(to, _strtypes) else to) | ||
|
||
|
||
dshape_method_list.extend([ | ||
(lambda ds: True, set([apply])), | ||
(iscollection, set([shape, ndim])), | ||
(lambda ds: iscollection(ds) and isscalar(ds.measure), set([coerce])) | ||
]) | ||
|
||
schema_method_list.extend([ | ||
- (isscalar, set([label, relabel])), | ||
+ (isscalar, set([label, relabel, coerce])), | ||
(isrecord, set([relabel])), | ||
]) | ||
|
||
|
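The registration lists in the diff pair a predicate with a set of functions, so an expression only gains `coerce` when its type satisfies the predicate. The sketch below illustrates that pattern under simplifying assumptions: `is_scalar`, `is_record`, and `method_names` are hypothetical stand-ins written for this example (not blaze's API), and measures are plain strings rather than datashape objects.

```python
# Hypothetical, dependency-free stand-ins: none of these names are blaze's API.
# Measures are plain strings here instead of datashape objects.

def is_record(measure):
    """Treat a measure written as '{...}' as a record."""
    return measure.startswith('{')


def is_scalar(measure):
    """Treat anything that is not a record as a scalar."""
    return not is_record(measure)


def label(expr, name):
    return '%s.label(%r)' % (expr, name)


def relabel(expr, mapping):
    return '%s.relabel(%r)' % (expr, mapping)


def coerce(expr, to):
    return '%s.coerce(to=%r)' % (expr, to)


# Same layout as the lists in the diff: (predicate, set of functions).
schema_method_list = [
    (is_scalar, {label, relabel, coerce}),
    (is_record, {relabel}),
]


def method_names(method_list, measure):
    """Collect the names of every function whose predicate accepts `measure`."""
    names = set()
    for predicate, funcs in method_list:
        if predicate(measure):
            names.update(func.__name__ for func in funcs)
    return names


print(sorted(method_names(schema_method_list, 'float32')))
# ['coerce', 'label', 'relabel']
print(sorted(method_names(schema_method_list, '{a: int64, b: float64}')))
# ['relabel']
```

Under this reading, a record-typed expression never receives a `coerce` attribute, which would be consistent with `test_coerce_record` being marked as an expected `AttributeError` failure.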
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4,7 +4,7 @@ | |
|
||
import pytest | ||
|
||
- from datashape import dshape, var, datetime_ | ||
+ from datashape import dshape, var, datetime_, float32 | ||
|
||
from blaze.expr import symbol, label, Field | ||
|
||
|
@@ -136,3 +136,15 @@ def test_hash_to_different_values(): | |
from blaze.expr.expressions import _attr_cache | ||
assert (expr, '_and') in _attr_cache | ||
assert (expr2, '_and') in _attr_cache | ||
|
||
|
||
def test_coerce(): | ||
s = symbol('s', var * float32) | ||
assert str(s.coerce('int64')) == "s.coerce(to='int64')" | ||
Review comment: Can we check the …
Reply: Good point.
||
|
||
|
||
@pytest.mark.xfail(raises=AttributeError, reason='Should this be valid?') | ||
Review comment: Is the question here if we should be able to cast down from float64 to float32? I think the idea of casting records like this is pretty cool.
Reply: The question is: Is casting an entire set of rows on a subset of columns (possibly all the columns as well) useful? How often is it the case that you want to change the type of a set of columns? I honestly don't know. I know that I do this in one particular case when I'm shoving a DataFrame that contains strings into a bcolz ctable in Python 3.4.
Reply: This would be nice because there are some places that I have tz-aware datetimes in dataframes so the type is object and I would like to coerce all of the columns to …
Reply: In the case of tz-aware datetimes they are cast to object because pandas doesn't have a way to manage a contiguous array of tz-aware datetimes (yet).
Reply: Oh actually, I see what you mean ... they might be inferred as object, but you want whatever you're sending them to to see that they are datetimes.
||
def test_coerce_record(): | ||
s = symbol('s', 'var * {a: int64, b: float64}') | ||
expr = s.coerce('{a: float64, b: float32}') | ||
assert str(expr) == "s.coerce(to='{a: float64, b: float32}')" |
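The strings asserted in these tests follow from the `__str__` format in the diff, and the resulting dshape follows from appending the target schema to the child's shape. A minimal sketch of both, assuming simplified stand-ins: `FakeSymbol` and `FakeCoerce` are hypothetical classes written for this example, and shapes are tuples of strings rather than datashape objects.

```python
# Hypothetical stand-ins, not blaze classes: shapes are tuples of strings.

class FakeSymbol:
    def __init__(self, name, shape, measure):
        self.name = name
        self.shape = shape      # e.g. ('var',)
        self.measure = measure  # e.g. 'float32'

    def __str__(self):
        return self.name


class FakeCoerce:
    def __init__(self, child, to):
        self._child = child
        self.to = to

    @property
    def schema(self):
        # Mirrors the diff: the schema is simply the target type.
        return self.to

    @property
    def dshape(self):
        # Mirrors the diff: the child's dimensions followed by the new schema.
        return ' * '.join(self._child.shape + (self.schema,))

    def __str__(self):
        # Same format string as Coerce.__str__ in the diff.
        return '%s.coerce(to=%r)' % (self._child, str(self.schema))


s = FakeSymbol('s', ('var',), 'float32')
expr = FakeCoerce(s, 'int64')
print(str(expr))    # s.coerce(to='int64')
print(expr.dshape)  # var * int64
```

A test against the real expression could assert on the dshape in the same way, in addition to the string form.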
Review comment: Should the `coerce` function be available as a standalone function from the top-level package, or do you think this is most useful as a method?
Reply: I think I'll just leave it as a method for now and remove it from `__all__`.
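For context on what dropping a name from `__all__` changes: `__all__` only controls what `from module import *` exports, so the function stays reachable as a module attribute (and as a method, if it is registered as one). The sketch below shows this with a throwaway in-memory module; `fake_expressions` and its contents are hypothetical and exist only for this illustration.

```python
import sys
import types

# Hypothetical module, built in memory purely for illustration.
fake = types.ModuleType('fake_expressions')
exec(
    "__all__ = ['symbol']\n"
    "def symbol(name):\n"
    "    return name\n"
    "def coerce(expr, to):\n"
    "    return '%s.coerce(to=%r)' % (expr, to)\n",
    fake.__dict__,
)
sys.modules['fake_expressions'] = fake

# Emulate a star import into a fresh namespace.
namespace = {}
exec('from fake_expressions import *', namespace)

print('symbol' in namespace)      # True
print('coerce' in namespace)      # False
print(fake.coerce('s', 'int64'))  # s.coerce(to='int64')
```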