Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

str_cat with Pandas .str.cat() interface #1496

Merged
merged 42 commits into from
May 2, 2016
Merged

Conversation

kwmsmith
Copy link
Member

  • Ensured that chaining .str_cat().str_cat() rolls up concatenation ops in SQL query.
  • Will be refactored as .str.cat() when .str accessor namespace is implemented.
  • Tested against Pandas / SQL / postgres backends.

ping @sandhujasmine

Jasmine Sandhu added 30 commits April 11, 2016 10:38
Ad-hoc testing with postgres. Still need automated tests.
- input error checking in str_cat
- clean up compute_up() functions
GH 1476
_child attribute had special meaning in blaze. For this function, we
only need a 'lhs' and 'rhs' argument so replace '_child' and 'col' with
'lhs' and 'rhs' where the data in 'rhs' concatenates to the data in
'lhs' elementwise.
GH 1476
also raise an exception if 'sep' kwarg is not a string
GH 1476
Table needs at least two string columns for good test so added a new
table 'accounts2' for this test.
Reformat the example in docstring to see if that fixes the error thrown
by doctest.
- use existing DataFrame (dfbig) for previous tests
- add a fixture to append row will null values and test str_cat()
GH 1476
If a row in either column has a NULL, str_cat() returns None which is
consistent with pandas for na_rep kwarg set to None; current default
behavior.

todo: add na_rep as a kwarg.
GH 1476 - if lhs or rhs arguments contain Nulls, then output of
str_cat() will also contain Null values.
GH 1476
Since str_cat() is a binary operation, we should be able to chain it to
concat more than one String columns - similar to pandas.
The table/DataFrame needs three string columns so updated test variables
as well.
Jasmine Sandhu and others added 6 commits April 22, 2016 13:32
- Handles the cases where str_cat operates on expression with WHERE
  clause: eg:
    s = symbol('s', ds2)
    t = s[s.amount <= 200]
    c1 = t.comment.str_cat(t.sex, sep=' -- ')

- pulled out manipuating data arrays into separate function
- added a test
- chaining str_cat() with WHERE clause is not consistent with pandas yet
- also unable to add (Select, Select) in existing compute_up decorator - it fails
The exception raised when trying to concat string columns from different
tables is not yet implemented fully for all use cases in the refactored
code. For now, marked a test as xfail.
@@ -1185,7 +1188,7 @@ def compute_up(t, s, **kwargs):
string_func_names = {
# <blaze function name>: <SQL function name>
'str_upper': 'upper',
'str_lower': 'lower',
'str_lower': 'lower'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are making a style edit, should this just be:

string_func_names = {'str_upper': upper,
                     'str_lower': lower}

@kwmsmith
Copy link
Member Author

Thanks for the pointers @llllllllll -- reconstruct_select() was exactly what I was looking for; it's been a while since I've touched the sql stuff.

@pytest.mark.parametrize("expr",
[t.name.str_cat(t_str_cat.comment),
t.name.str_cat(t_str_cat.comment.str_cat(t_str_cat.name))])
def test_str_cat_runtime_exception(expr):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we checking that the str_cat operates on the same table? If not, then we should take this test out - I saw a comment that we need to do this check more consistently throughout other blaze operations so perhaps remove the test?

@kwmsmith kwmsmith merged commit 0094aa4 into blaze:master May 2, 2016
@kwmsmith kwmsmith deleted the feature-cat branch May 2, 2016 21:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants