String broadcastables #1462

kwmsmith · 2016-03-28T19:53:35Z

This starts the process of expanding Blaze's string column support to include upper and lower. First-class (and optimized) support for common string operations addresses the string-munging pain points that users hit.

I'm marking these as part of the "experimental" API currently, since I'm not wild about the str_upper, str_lower, etc. naming scheme. I'd like to find a better naming system for these if we can.
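For concreteness, here is a minimal sketch of what upper, lower, and length compute on each kind of backend. It assumes the pandas backend dispatches to pandas' vectorized `Series.str` methods and the SQL backend to the standard `UPPER`, `LOWER`, and `LENGTH` functions. It calls pandas and the standard-library `sqlite3` module directly rather than going through Blaze, and the `accounts` table and its data are made up for illustration.

```python
import sqlite3

import pandas as pd

# Hypothetical sample data; plain ASCII because SQLite's built-in
# UPPER()/LOWER() only fold ASCII characters.
names = ["Alice", "bob", "CHARLIE"]

# pandas: element-wise string methods live under the Series.str accessor.
s = pd.Series(names, name="name")
pandas_upper = s.str.upper().tolist()
pandas_lower = s.str.lower().tolist()
pandas_len = s.str.len().tolist()

# SQL: the same element-wise operations expressed as scalar functions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO accounts (name) VALUES (?)", [(n,) for n in names])
rows = conn.execute(
    "SELECT UPPER(name), LOWER(name), LENGTH(name) FROM accounts ORDER BY id"
).fetchall()
conn.close()

sql_upper = [r[0] for r in rows]
sql_lower = [r[1] for r in rows]
sql_len = [r[2] for r in rows]

print(pandas_upper, sql_upper)
print(pandas_len, sql_len)
```

Whatever the final method names turn out to be, the point is that each backend should produce the same values row by row.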

We have immediate need for upper and lower, so I'm putting these in for 0.10.

Regarding naming schemes: I like the Pandas style, df.col.str.upper().str.replace(...). We could extend that to include df.col.dt.datetimemethod() for datetimes as well.
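One way to get pandas-style namespaces on expression objects is a descriptor that returns a small namespace object bound to the column. The sketch below is hypothetical and is not Blaze's code: `Column`, `StrNamespace`, `Accessor`, and `_derive` are illustrative names, and each method only records the operation name instead of building a real expression node.

```python
class StrNamespace:
    """Groups string operations under `col.str`, pandas-style."""

    def __init__(self, column):
        self._column = column

    def upper(self):
        return self._column._derive("upper")

    def lower(self):
        return self._column._derive("lower")

    def len(self):
        return self._column._derive("len")


class Accessor:
    """Descriptor that builds a namespace object bound to the instance."""

    def __init__(self, namespace_cls):
        self._namespace_cls = namespace_cls

    def __get__(self, instance, owner):
        if instance is None:
            return self  # accessed on the class itself, e.g. Column.str
        return self._namespace_cls(instance)


class Column:
    """Hypothetical expression node: a column name plus a chain of ops."""

    str = Accessor(StrNamespace)

    def __init__(self, name, ops=()):
        self.name = name
        self.ops = tuple(ops)

    def _derive(self, op):
        # Return a new expression instead of mutating this one, so that
        # chained calls like col.str.upper().str.lower() compose.
        return Column(self.name, self.ops + (op,))

    def __repr__(self):
        expr = self.name
        for op in self.ops:
            expr = f"{expr}.str.{op}()"
        return expr


print(Column("name").str.upper().str.lower())
```

A `dt` namespace for datetime methods would follow the same pattern with its own namespace class and a second `Accessor` attribute on `Column`.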

We'll have to think about which string and datetime methods we want to support, for which backends, and what the semantics should be when the method in question returns multiple values. All of that is outside the scope of this PR.
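Splitting a string is a typical method that returns multiple values per row. As one reference point (pandas' behavior, not a decision for Blaze), pandas offers two shapes for the result of `str.split`; the sample data is made up.

```python
import pandas as pd

s = pd.Series(["a,b", "c,d,e", "f"], name="tags")

# Shape 1: one output row per input row, each cell holding a Python list.
as_lists = s.str.split(",")

# Shape 2: expand into a DataFrame with one column per piece; rows with
# fewer pieces are padded with missing values.
as_frame = s.str.split(",", expand=True)

print(as_lists.tolist())
print(as_frame)
```

The first shape keeps the expression one-dimensional but yields a non-scalar element type; the second changes the output's shape depending on the data, which is harder to express with a fixed schema.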

This PR also deprecates strlen and adds str_len for consistency.
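A common way to deprecate an old name while keeping it working is a thin alias that emits a `DeprecationWarning` and delegates to the new function. This is a generic sketch, not Blaze's actual implementation: `str_len` here is a hypothetical stand-in operating on a plain list, and `deprecated_alias` is an illustrative helper.

```python
import functools
import warnings


def str_len(values):
    """Hypothetical stand-in: length of each string in an iterable."""
    return [len(v) for v in values]


def deprecated_alias(new_func, old_name):
    """Wrap `new_func` so calls under `old_name` warn, then delegate."""

    @functools.wraps(new_func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{old_name} is deprecated; use {new_func.__name__} instead",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller's line
        )
        return new_func(*args, **kwargs)

    wrapper.__name__ = old_name
    return wrapper


# The old spelling keeps working but warns on every call.
strlen = deprecated_alias(str_len, "strlen")
```

Calling `strlen(["ab", "cde"])` returns `[2, 3]`, the same as `str_len`, and emits a `DeprecationWarning` pointing at the caller.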

kwmsmith · 2016-03-28T20:46:03Z

ping @ltransom

necaris · 2016-03-29T15:26:05Z

FWIW 👍 for the Pandas style of df.col.str.*, especially because that lets us do more with df.col.dt.*, df.col.int.*, and so on.

llllllllll · 2016-03-29T15:33:02Z

I am also +1 on the str and dt descriptor namespaces.

sandhujasmine · 2016-03-29T22:25:05Z

Tested on postgres test data in blaze-benchmarks. No issues.

sandhujasmine · 2016-04-08T17:39:32Z

I created an epic to capture this feature, but I'm not able to set it up as an epic and add issues to it. It might be a permissions issue.

Here's the high-level epic: #1476
Here's one issue for the first bullet point: #1475

kwmsmith added 2 commits March 25, 2016 10:34

Add lower() and upper() string methods.

21f57f9

Just on pandas backend for now.

Expand string methods to include upper and lower.

6674a0c

Implement for Pandas and SQL.

kwmsmith added this to the 0.10 milestone Mar 28, 2016

kwmsmith added 2 commits March 28, 2016 14:53

Merge branch 'master' into string-broadcastables

2568b56

Update whatsnew [ci skip]

7e6701c

kwmsmith added easy new expression api design strings sql pandas labels Mar 28, 2016

kwmsmith merged commit 6961844 into blaze:master Mar 29, 2016

kwmsmith deleted the string-broadcastables branch March 29, 2016 22:11

sandhujasmine mentioned this pull request Apr 18, 2016

Add str_cat() to pandas and sql to concatenate string columns #1479

Closed