Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Coerce expression #1137

Merged
merged 18 commits into from
Jun 25, 2015
Merged

Coerce expression #1137

merged 18 commits into from
Jun 25, 2015

Conversation

cpcloud
Copy link
Member

@cpcloud cpcloud commented Jun 16, 2015

closes #1050

assert str(s.coerce('int64')) == "s.coerce(to='int64')"


@pytest.mark.xfail(raises=AttributeError, reason='Should this be valid?')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the question here if we should be able to cast down from float64 to float32? I think the idea of casting records like this is pretty cool.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The question is: Is casting an entire set of rows on a subset of columns (possibly all the columns as well) useful? How often is it the case that you want to change the type of a set of columns? I honestly don't know. I know that I do this in one particular case when I'm shoving a DataFrame that contains strings into a bcolz ctable in Python3.4.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be nice because there are some places that I have tz-aware datetimes in dataframes so the type is object and I would like to coerce all of the columns to datetime instead of object.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case of tz-aware datetimes they are cast to object because pandas doesn't have a way to manage a contiguous array of tz-aware datetimes (yet).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh actually, I see what you mean ... they might be inferred as object, but you want whatever you're sending them to to see that they are datetimes

@llllllllll
Copy link
Member

Can you add a series coerce?

@cpcloud
Copy link
Member Author

cpcloud commented Jun 16, 2015

yep, just haven't gotten to it yet

@@ -705,6 +705,21 @@ def dshape(self):
return dshape(self._dshape)


class Coerce(Expr):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add a docstring here with some examples?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep will do

@@ -22,7 +22,7 @@
__all__ = ['Expr', 'ElemWise', 'Field', 'Symbol', 'discover', 'Projection',
'projection', 'Selection', 'selection', 'Label', 'label', 'Map',
'ReLabel', 'relabel', 'Apply', 'Slice', 'shape', 'ndim', 'label',
'symbol']
'symbol', 'Coerce', 'coerce']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the coerce function be available as a standalone function from the top level package or do you think this is most useful as a method

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'll just leave it as a method for now and remove it from __all__

@llllllllll
Copy link
Member

Just had one comment on the visibility of the coerce function; but, this looks good to me

cpcloud added a commit that referenced this pull request Jun 25, 2015
@cpcloud cpcloud merged commit 27fadde into blaze:master Jun 25, 2015
@cpcloud cpcloud deleted the coerce branch June 25, 2015 18:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sum(boolean) doesn't work on sql but it's exposed as an attribute on bool columns
2 participants