String broadcastables #1462

Merged
kwmsmith merged 4 commits into blaze:master from kwmsmith:string-broadcastables on Mar 29, 2016

kwmsmith (Member) commented Mar 28, 2016

This starts the process of expanding Blaze's string column support to include upper and lower. First-class (and optimized) support for common string operations helps with the string-munging pain points that users hit.

I'm marking these as part of the "experimental" API currently, since I'm not wild about the str_upper, str_lower, etc. naming scheme. I'd like to find a better naming system for these if we can.

We have immediate need for upper and lower, so I'm putting these in for 0.10.

Regarding naming schemes: I like the Pandas style, df.col.str.upper().str.replace(...). We could extend that to include df.col.dt.datetimemethod() for datetimes as well.
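To make the proposed df.col.str.* style concrete, here is a minimal pure-Python sketch of how a namespace like this can be attached to a column type with a descriptor. Column, StringMethods and NamespaceDescriptor are hypothetical toy classes written for illustration only; this is not Blaze's or Pandas' actual implementation, and it evaluates eagerly on an in-memory list rather than building expressions for a backend.

```python
class StringMethods:
    """Hypothetical namespace grouping string operations for one column."""

    def __init__(self, column):
        self._column = column

    # Each method returns a new Column, so calls chain: col.str.upper().str.replace(...)
    def upper(self):
        return Column([s.upper() for s in self._column.values])

    def lower(self):
        return Column([s.lower() for s in self._column.values])

    def len(self):
        # Inside the method body, len() still refers to the builtin.
        return Column([len(s) for s in self._column.values])

    def replace(self, old, new):
        return Column([s.replace(old, new) for s in self._column.values])


class NamespaceDescriptor:
    """Non-data descriptor: `instance.str` builds a namespace bound to that instance."""

    def __init__(self, namespace_cls):
        self._namespace_cls = namespace_cls

    def __get__(self, instance, owner):
        if instance is None:
            # Accessed on the class itself (Column.str): expose the namespace type.
            return self._namespace_cls
        return self._namespace_cls(instance)


class Column:
    """Toy in-memory column of strings standing in for a real column expression."""

    # A hypothetical `dt` namespace could be attached the same way.
    str = NamespaceDescriptor(StringMethods)

    def __init__(self, values):
        self.values = list(values)

    def __repr__(self):
        return f"Column({self.values!r})"


col = Column(["Alice", "bob"])
print(col.str.upper().str.replace("O", "0"))  # Column(['ALICE', 'B0B'])
```

Grouping the methods under one attribute keeps the column's top-level namespace small and avoids prefixes such as str_upper and str_lower.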

We'll have to think about which string and datetime methods we want to support, for which backends, and what the semantics should be when the method in question returns multiple values. All of that is outside the scope of this PR.

This PR also deprecates strlen and adds str_len for consistency.
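A common way to deprecate an old name while keeping existing code working is to leave it as a thin alias that emits a DeprecationWarning and forwards to the new name. The sketch below shows that pattern on hypothetical standalone functions that work on a plain list of strings. Blaze's real strlen and str_len are expression-building APIs, so this is only an illustration of the deprecation technique, not the code in this PR.

```python
import warnings


def str_len(values):
    """Hypothetical new-style name: length of each string in `values`."""
    return [len(v) for v in values]


def strlen(values):
    """Deprecated alias kept for backward compatibility; forwards to str_len."""
    warnings.warn(
        "strlen is deprecated; use str_len instead",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not at this wrapper
    )
    return str_len(values)
```

Python ignores DeprecationWarning by default outside of `__main__`, so callers typically see the warning only when running tests or with warnings enabled (for example `python -W default`).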

kwmsmith added some commits Mar 25, 2016

Add lower() and upper() string methods.
Just on pandas backend for now.
Expand string methods to include upper and lower.
Implement for Pandas and SQL.

kwmsmith added this to the 0.10 milestone Mar 28, 2016

kwmsmith (Member, Author) commented Mar 28, 2016

ping @ltransom

necaris commented Mar 29, 2016

FWIW 👍 for the Pandas style of df.col.str.*, especially because that lets us do more with df.col.dt.*, df.col.int.*, and so on.

llllllllll (Member) commented Mar 29, 2016

I am also +1 on the str and dt descriptor namespaces.

kwmsmith merged commit 6961844 into blaze:master Mar 29, 2016

kwmsmith deleted the kwmsmith:string-broadcastables branch Mar 29, 2016

sandhujasmine (Contributor) commented Mar 29, 2016

Tested on postgres test data in blaze-benchmarks. No issues.

sandhujasmine (Contributor) commented Apr 8, 2016

Created an epic to capture this feature, but I'm not able to create the epic and add the issue to it. It might be a permissions issue.

Here's the high level epic: #1476
Here's one issue for the first bullet point: #1475
