str_cat with Pandas `.str.cat()` interface #1496

kwmsmith · 2016-04-28T16:11:24Z

Ensured that chaining .str_cat().str_cat() rolls up concatenation ops in SQL query.
Will be refactored as .str.cat() when .str accessor namespace is implemented.
Tested against Pandas / SQL / postgres backends.
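As a rough illustration of what "rolling up" chained concatenation can look like on the SQL side, the sketch below builds a chained string concatenation with SQLAlchemy's `+` operator and renders it. This is not the blaze code from this PR: the table layout (an `accounts2` table with `name`, `comment`, and `sex` string columns) is an assumption based on names used in the tests, and the rendering uses SQLAlchemy's default dialect rather than postgres.

```python
import sqlalchemy as sa

# Assumed schema for illustration only; the real test table may differ.
metadata = sa.MetaData()
accounts = sa.Table(
    "accounts2", metadata,
    sa.Column("name", sa.String),
    sa.Column("comment", sa.String),
    sa.Column("sex", sa.String),
)

sep = " -- "

# On String columns, `+` builds a SQL concatenation expression.
once = accounts.c.name + sep + accounts.c.comment

# Chaining a second concatenation extends the same expression
# instead of wrapping the first one in a subquery.
twice = once + sep + accounts.c.sex

rendered = str(twice)
print(rendered)
```

Printing the expression shows a single chain of `||` operators over the three columns, with the separators passed as bound parameters.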

GH 1476

Ad-hoc testing with postgres. Still need automated tests.

- input error checking in str_cat
- clean up compute_up() functions

GH 1476: The '_child' attribute has special meaning in blaze. This function only needs 'lhs' and 'rhs' arguments, so replace '_child' and 'col' with 'lhs' and 'rhs', where the data in 'rhs' is concatenated onto the data in 'lhs' elementwise.

GH 1476: Also raise an exception if the 'sep' kwarg is not a string.
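The 'lhs'/'rhs' semantics and the 'sep' check described above can be sketched in plain Python. This is a hypothetical stand-in operating on lists, not blaze's `str_cat` or `StrCat`; the choice of `TypeError` and the length check are assumptions made for the example.

```python
def str_cat(lhs, rhs, sep=None):
    """Elementwise concatenation of rhs onto lhs (illustrative only)."""
    # Assumption: a non-string separator is rejected with TypeError.
    if sep is not None and not isinstance(sep, str):
        raise TypeError("'sep' must be a string, got %s" % type(sep).__name__)
    # Assumption: both inputs must have the same length.
    if len(lhs) != len(rhs):
        raise ValueError("lhs and rhs must have the same length")
    joiner = sep if sep is not None else ""
    # A missing value on either side yields a missing value in the output.
    return [
        None if left is None or right is None else left + joiner + right
        for left, right in zip(lhs, rhs)
    ]


names = ["Alice", "Bob", None]
comments = ["hello", None, "bye"]
joined = str_cat(names, comments, sep=" -- ")

try:
    str_cat(names, comments, sep=3)
    sep_error = None
except TypeError as exc:
    sep_error = exc
```

Here `joined` keeps one entry per input row, and `sep_error` holds the exception raised for the non-string separator.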

GH 1476: The table needs at least two string columns for a good test, so added a new table 'accounts2' for this test.

Reformat the example in docstring to see if that fixes the error thrown by doctest.

- use existing DataFrame (dfbig) for previous tests
- add a fixture to append a row with null values and test str_cat()

GH 1476: If a row in either column has a NULL, str_cat() returns None for that row. This is consistent with pandas when the na_rep kwarg is None, which is the current default behavior. TODO: add na_rep as a kwarg.

GH 1476 - if lhs or rhs arguments contain Nulls, then output of str_cat() will also contain Null values.

GH 1476: Since str_cat() is a binary operation, we should be able to chain it to concatenate more than two string columns, similar to pandas.

The table/DataFrame needs three string columns so updated test variables as well.
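For reference, the pandas behavior that the null handling and chaining are being compared against can be reproduced with `Series.str.cat`. The sample data below is made up for illustration; only the `str.cat` calls and their `sep` and `na_rep` keywords are pandas API.

```python
import pandas as pd

# Made-up sample data with three string columns, some values missing.
df = pd.DataFrame({
    "name": ["Alice", "Bob", None],
    "comment": ["hello", None, "bye"],
    "sex": ["F", "M", "M"],
})

# Default na_rep=None: a missing value on either side gives a missing result.
pair = df["name"].str.cat(df["comment"], sep=" -- ")

# With na_rep set, missing values are replaced before concatenation.
pair_filled = df["name"].str.cat(df["comment"], sep=" -- ", na_rep="?")

# Chaining concatenates a third column onto the result of the first call.
triple = df["name"].str.cat(df["comment"], sep=" -- ").str.cat(df["sex"], sep=" -- ")
```

Only the first row has values in both `name` and `comment`, so it is the only row with a non-missing result in `pair` and `triple`.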

- Handles the cases where str_cat operates on an expression with a WHERE clause, e.g.:

      s = symbol('s', ds2)
      t = s[s.amount <= 200]
      c1 = t.comment.str_cat(t.sex, sep=' -- ')

- pulled out manipulating data arrays into a separate function
- added a test
- chaining str_cat() with a WHERE clause is not consistent with pandas yet
- also unable to add (Select, Select) to the existing compute_up decorator; it fails

The exception raised when trying to concat string columns from different tables is not yet implemented fully for all use cases in the refactored code. For now, marked a test as xfail.

llllllllll · 2016-04-28T16:13:25Z

blaze/compute/sql.py

@@ -1185,7 +1188,7 @@ def compute_up(t, s, **kwargs):
 string_func_names = {
    # <blaze function name>: <SQL function name>
    'str_upper': 'upper',
-    'str_lower': 'lower',
+    'str_lower': 'lower'


If we are making a style edit, should this just be:

string_func_names = {'str_upper': 'upper', 'str_lower': 'lower'}

kwmsmith · 2016-04-28T18:23:16Z

Thanks for the pointers @llllllllll -- reconstruct_select() was exactly what I was looking for; it's been a while since I've touched the sql stuff.

sandhujasmine · 2016-04-28T19:07:57Z

blaze/compute/tests/test_sql_compute.py

+@pytest.mark.parametrize("expr",
+                         [t.name.str_cat(t_str_cat.comment),
+                          t.name.str_cat(t_str_cat.comment.str_cat(t_str_cat.name))])
+def test_str_cat_runtime_exception(expr):


Are we checking that str_cat operates on columns from the same table? If not, we should take this test out. I saw a comment that this check needs to be done more consistently across other blaze operations, so perhaps remove the test?

Jasmine Sandhu added 30 commits April 11, 2016 10:38

BUG: Add missing import for warnings

7cdfeac

ENH: Add str_cat to concatenate string columns

5b7dd25

GH 1476

ENH: Implement str_cat() for pandas. Add tests.

c7f6d0d

GH 1476

ENH: Implement str_cat() for SQL backend.

66126d8

Ad-hoc testing with postgres. Still need automated tests.

Separate str_cat() function from StrCat() class

cbeab10

- input error checking in str_cat
- clean up compute_up() functions

ENH: '_child' not needed - change it to 'lhs'

4570fdc

GH 1476: The '_child' attribute has special meaning in blaze. This function only needs 'lhs' and 'rhs' arguments, so replace '_child' and 'col' with 'lhs' and 'rhs', where the data in 'rhs' is concatenated onto the data in 'lhs' elementwise.

TST: Move str_cat() exception tests to test_strings

a39322f

GH 1476: Also raise an exception if the 'sep' kwarg is not a string.

BUG: Output is String since dtype may not be fixed length String

a0c042e

GH 1476

TST: Add test_str_cat() to check SQL

a7c775c

GH 1476: The table needs at least two string columns for a good test, so added a new table 'accounts2' for this test.

Fix: [doctest] failure in docstring. Reformat and try again.

262af14

Reformat the example in docstring to see if that fixes the error thrown by doctest.

Fix docstring to fix [doctest] CI error

9b19c27

Fix docstring to fix [doctest] CI error

4ce427c

Fix output in docstring to correct [doctest] CI error

bb6c56d

TST: test str_cat() against postgres backend

653a184

Use blaze expr to label the SQLA instead of the data

201be5b

Remove extraneous parenthesis

3945819

TST: Add str_cat() test for null values in DataFrame

7978147

- use existing DataFrame (dfbig) for previous tests
- add a fixture to append a row with null values and test str_cat()

Make str_cat() for NULL values consistent with Pandas str_cat

84ed5e1

GH 1476: If a row in either column has a NULL, str_cat() returns None for that row. This is consistent with pandas when the na_rep kwarg is None, which is the current default behavior. TODO: add na_rep as a kwarg.

TST: Modify str_cat to include Null values in rows

afebbae

TST: Simplify test and fix incorrect assertion

42ed91c

BUG: Fix schema resulting from str_cat()

58a1688

GH 1476 - if lhs or rhs arguments contain Nulls, then output of str_cat() will also contain Null values.

TST: Test str_cat() schema; use fixtures for exception tests

b223865

TST: Update test for new SQL that handles NULL values

4721c0c

Rename input args so they are consistent with StrCat

a34dece

TST: Test for schema and shape; rename test.

e6b63f1

ENH: Implement chaining str_cat() operation

e1e8240

GH 1476: Since str_cat() is a binary operation, we should be able to chain it to concatenate more than two string columns, similar to pandas.

TST: Rename test variables to more meaningful names

321ba19

TST: Add test for chaining str_cat() feature

bf8e80c

The table/DataFrame needs three string columns so updated test variables as well.

DOC: Behavior for null entries for str_cat

deca1c4

Raise exception to concat (str_cat) columns from different tables

33b8a0a

Jasmine Sandhu and others added 6 commits April 22, 2016 13:32

BUG: Fix runtime error checking for different input types

fcf2e92

TST: No exception if symbols point to same resource

a71d71b

TST: Fix test 'no exceptions if symbols point to same resource'

1afc8a4

WIP: Update decorate to remove duplicate code; mark test as xfail

65d76b8

The exception raised when trying to concat string columns from different tables is not yet implemented fully for all use cases in the refactored code. For now, marked a test as xfail.

Reimplement str_cat on top of + operator for SQL backend.

0416294

kwmsmith added the new expression, api design, strings, sql, pandas, and postgresql labels Apr 28, 2016

kwmsmith added this to the 0.11 milestone Apr 28, 2016

llllllllll reviewed Apr 28, 2016
kwmsmith added 3 commits April 28, 2016 12:50

Refactor compute_up for StrCat.

5c1805f

Uses `reconstruct_select()` properly.

Pass through encoding in StrCat _dshape().

5e32275

Style and formatting tweak.

daa0ae4

sandhujasmine reviewed Apr 28, 2016
Remove unnecessary tests.

d03c9c0

kwmsmith modified the milestones: 0.10.1, 0.11 Apr 28, 2016

kwmsmith mentioned this pull request Apr 28, 2016

Add str_cat() to pandas and sql to concatenate string columns #1479

Closed

kwmsmith added 2 commits April 28, 2016 15:42

Merge branch 'master' into feature-cat

243e076

Update whatsnew [ci skip]

26d2d9a

kwmsmith merged commit 0094aa4 into blaze:master May 2, 2016

kwmsmith deleted the feature-cat branch May 2, 2016 21:53
