Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cbook.is_sequence_of_strings knows string objects #4358

Merged
merged 1 commit into from Apr 30, 2015

Conversation

has2k1
Copy link
Contributor

@has2k1 has2k1 commented Apr 20, 2015

Issue
cbook.is_sequence_of_strings cannot tell of a pandas series is contains strings. Pandas series are numpy arrays of type object.

Problem
cbook.is_sequence_of_strings uses cbook.is_string_like to dismiss strings as not sequences of strings. However, numpy string arrays of dtype=object are string-like since they support string concatenation.

Solution
Do not dismiss an ndarray if it appears string-like, go ahead and peak inside.

For reviewer

This is primarily how pandas stores a sequence of strings.

>>> import pandas as pd
>>> import matplotilb.cbook as cbook
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo matplotilb -> matplotlib

@tacaswell tacaswell added this to the next point release milestone Apr 21, 2015
**Issue**
`cbook.is_sequence_of_strings` cannot tell of a pandas series
is contains strings. Pandas series are numpy arrays of type
`object`.

**Problem**
`cbook.is_sequence_of_strings` uses `cbook.is_string_like` to dismiss
strings as not sequences of strings. However, numpy string arrays
of `dtype=object` are string-like since they support string
concatenation.

**Solution**
Do not dismiss an `ndarray` if it appears string-like, go ahead and
peak inside.
@has2k1
Copy link
Contributor Author

has2k1 commented Apr 27, 2015

I have done the revisions though a little bit "niggly" is that is_sequence_of_strings returns True for all empty "iterables". Considering that you can have empty arrays of dtype('S[num]'), discriminating on that and zero-length would make for a somewhat "petulant" solution.

@tacaswell
Copy link
Member

It looks like that is the current behavior so it is probably ok to leave it. In practical terms, this is probaly acceptable as anything that loops over the iterable expecting strings will work correctly (as it will never loop).

@tacaswell
Copy link
Member

attn @cpcloud is this in fact a reasonable way to handle this from the pandas pov?

@cpcloud
Copy link

cpcloud commented Apr 28, 2015

@tacaswell You can do this with pd.lib.infer_string_array or pd.lib.is_unicode_array.

In [9]: s = pd.Series(list('abc'))

In [10]: su = pd.Series(list(u'abc'))

In [11]: pd.lib.is_string_array(s.values)
Out[11]: True

In [12]: pd.lib.is_unicode_array(s.values)
Out[12]: False

In [13]: pd.lib.is_string_array(su.values)
Out[13]: False

In [14]: pd.lib.is_unicode_array(su.values)
Out[14]: True

@@ -783,8 +783,12 @@ def is_sequence_of_strings(obj):
"""
if not iterable(obj):
return False
if is_string_like(obj):
return False
if is_string_like(obj) and not isinstance(obj, np.ndarray):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does is_string_like(series of str) do? Does it loop through the array checking to see if each element is a str?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_string_like(obj) checks if string concatenation is possible. i.e If you can add a string to it, then it is "string like".

is_string_like(series of str) is indeed "string like".

Defn: is_string_like

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we can not import pandas directly in core matplotlib for both technical
and policy reasons.

On Tue, Apr 28, 2015, 11:31 Phillip Cloud notifications@github.com wrote:

In lib/matplotlib/cbook.py
#4358 (comment):

@@ -783,8 +783,12 @@ def is_sequence_of_strings(obj):
"""
if not iterable(obj):
return False

  • if is_string_like(obj):
  •    return False
    
  • if is_string_like(obj) and not isinstance(obj, np.ndarray):

What does is_string_like(series of str) do? Does it loop through the
array checking to see if each element is a str?


Reply to this email directly or view it on GitHub
https://github.com/matplotlib/matplotlib/pull/4358/files#r29254616.

@cpcloud
Copy link

cpcloud commented Apr 28, 2015

Ah, right. This fix seems reasonable to me.

tacaswell added a commit that referenced this pull request Apr 30, 2015
FIX : cbook.is_sequence_of_strings behave better on array/pd.series of strings
@tacaswell tacaswell merged commit b091e26 into matplotlib:master Apr 30, 2015
@tacaswell
Copy link
Member

@has2k1 Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants