New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cbook.is_sequence_of_strings knows string objects #4358
Conversation
This is primarily how pandas stores a sequence of strings. | ||
|
||
>>> import pandas as pd | ||
>>> import matplotilb.cbook as cbook |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo matplotilb -> matplotlib
**Issue** `cbook.is_sequence_of_strings` cannot tell of a pandas series is contains strings. Pandas series are numpy arrays of type `object`. **Problem** `cbook.is_sequence_of_strings` uses `cbook.is_string_like` to dismiss strings as not sequences of strings. However, numpy string arrays of `dtype=object` are string-like since they support string concatenation. **Solution** Do not dismiss an `ndarray` if it appears string-like, go ahead and peak inside.
I have done the revisions though a little bit "niggly" is that |
It looks like that is the current behavior so it is probably ok to leave it. In practical terms, this is probaly acceptable as anything that loops over the iterable expecting strings will work correctly (as it will never loop). |
attn @cpcloud is this in fact a reasonable way to handle this from the pandas pov? |
@tacaswell You can do this with In [9]: s = pd.Series(list('abc'))
In [10]: su = pd.Series(list(u'abc'))
In [11]: pd.lib.is_string_array(s.values)
Out[11]: True
In [12]: pd.lib.is_unicode_array(s.values)
Out[12]: False
In [13]: pd.lib.is_string_array(su.values)
Out[13]: False
In [14]: pd.lib.is_unicode_array(su.values)
Out[14]: True |
@@ -783,8 +783,12 @@ def is_sequence_of_strings(obj): | |||
""" | |||
if not iterable(obj): | |||
return False | |||
if is_string_like(obj): | |||
return False | |||
if is_string_like(obj) and not isinstance(obj, np.ndarray): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does is_string_like(series of str)
do? Does it loop through the array checking to see if each element is a str
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is_string_like(obj)
checks if string concatenation is possible. i.e If you can add a string to it, then it is "string like".
is_string_like(series of str)
is indeed "string like".
Defn: is_string_like
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But we can not import pandas directly in core matplotlib for both technical
and policy reasons.
On Tue, Apr 28, 2015, 11:31 Phillip Cloud notifications@github.com wrote:
In lib/matplotlib/cbook.py
#4358 (comment):@@ -783,8 +783,12 @@ def is_sequence_of_strings(obj):
"""
if not iterable(obj):
return False
- if is_string_like(obj):
return False
- if is_string_like(obj) and not isinstance(obj, np.ndarray):
What does is_string_like(series of str) do? Does it loop through the
array checking to see if each element is a str?—
Reply to this email directly or view it on GitHub
https://github.com/matplotlib/matplotlib/pull/4358/files#r29254616.
Ah, right. This fix seems reasonable to me. |
FIX : cbook.is_sequence_of_strings behave better on array/pd.series of strings
@has2k1 Thanks! |
Issue
cbook.is_sequence_of_strings
cannot tell of a pandas series is contains strings. Pandas series are numpy arrays of typeobject
.Problem
cbook.is_sequence_of_strings
usescbook.is_string_like
to dismiss strings as not sequences of strings. However, numpy string arrays ofdtype=object
are string-like since they support string concatenation.Solution
Do not dismiss an
ndarray
if it appears string-like, go ahead and peak inside.For reviewer
dtype=object
to be not string-like but this is probably more destructive asis_string_like
appears over 100 times in the code base, vsis_sequence_of_strings
which is used only two times.