Plotting datetime values from Pandas dataframe #5550
Comments
I'd consider trying to do a PR on this (it'd be my first!). As a general question, what's the preferred "style" for dealing with Pandas support within matplotlib? In particular, is it better to catch the |
We have a strict "do not import pandas" rule (same goes for scipy) so using attn @TomAugspurger any advice on how to reliably get the first element? I suspect that this is fall-out from @dopplershift's changes to make mpl work with pint which is less aggressive about casting inputs to numpy arrays. @arc-jim The quick workaround is to use |
How about:
try:
    x = x[0]
except KeyError:
    x = x.iloc[0]
except (TypeError, IndexError):
    pass |
Haven't looked closely at what's going on, but @8one6 is correct that |
Actually, how about:
try:
    x = x[0]
except KeyError:
    x = x.values[0]
except (TypeError, IndexError):
    pass |
Maybe this gets it past the units handling, but won't it then fail later? Instead, would it be better to check whether an input has a |
Is there a statement somewhere about how the matplotlib community has decided to handle Pandas datastructures, in general? Particularly with the newest release, the docs make reference to supporting Pandas structures (or at least Pandas-like structures): for example, here. However, there are still suggestions that users use only I don't really mind either approach. But I think consistency would help people write code that avoids corner cases. Is the example here just scratching the surface? Specifically, it would be great to know what the "right answer" is here: is the commitment to supporting "labelled data" real, in which case matplotlib will need to be patched here (and likely elsewhere), or should users know that labelled data handling is a bit of a "buyer beware" situation (not fully supported) and thus to ensure maximum stability, they should always "strip" their data before plotting. |
No, there is no such statement that I am aware of. This is an area that needs attention. I think the prevailing opinion is that we want to make mpl "just work" when possible, but at the same time we don't want to add dependencies on packages like pandas, and we don't want to add more than small amounts of code to handle their inputs. Unfortunately, this leads to the ambiguous situation you highlight. |
For reference, here is a non-date-related case where a very similar thing seems to be causing errors for a SO user. |
Regarding the labeled data, please be careful to keep that separated from pandas. The promise on labeled data is that the The problem here is that pandas made the choice to give the convenient API to index-based slicing, not to position-based slicing on Probably the safest way to get the first element is to do el = next(iter(obj)) which, so long as the object is more ndarray-like than dict/mapping-like, should do the right thing. It is marginally slower (40ns vs 170ns for lists and 86ns vs 225ns for arrays), but it does the right thing with array-like data structures that have non-standard PR coming. |
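A minimal sketch of the next(iter(obj)) idea from the comment above, using only built-in containers. The helper name first_element is hypothetical and is not the name used in matplotlib. It also shows the caveat about mapping-like objects: iterating a dict yields its keys, not its values. Iterating a pandas Series yields its values in positional order, which is why this approach sidesteps label lookup.

```python
def first_element(obj):
    """Return the first item produced by iterating over obj.

    Hypothetical helper for illustration only. It raises StopIteration
    if obj is empty and TypeError if obj is not iterable.
    """
    return next(iter(obj))


# Sequence-like inputs: the first item by position.
print(first_element([10, 20, 30]))    # 10
print(first_element((1.5, 2.5)))      # 1.5
print(first_element(range(5, 8)))     # 5

# Mapping-like input: iteration yields keys, so the result is a key,
# not the first stored value.
print(first_element({"a": 1, "b": 2}))  # a
```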
Yeah, no question, the fact that I'm not sure exactly all the situations that matplotlib needs this idea of a "first element". Maybe what you have above works well for |
@TomAugspurger By the time we got to these parts of the code we should always have a |
pd.Series prefers indexing via searching the index to positional indexing. This method will get the first element of any iterable. It will advance a generator, but they did not work previously anyway. Closes matplotlib#5550
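The generator caveat in the commit message can be illustrated with a short sketch. The helper name first_element is again hypothetical. Taking the first item from a generator consumes it, whereas a list is left untouched.

```python
def first_element(obj):
    """Hypothetical helper: first item produced by iterating over obj."""
    return next(iter(obj))


# A generator is its own iterator, so peeking at the first item advances it.
gen = (n * n for n in range(1, 4))   # would yield 1, 4, 9
first = first_element(gen)
remaining = list(gen)                # the first item is no longer available

# A list hands out a fresh iterator each time, so nothing is lost.
data = [1, 4, 9]
first_from_list = first_element(data)

print(first, remaining)        # 1 [4, 9]
print(first_from_list, data)   # 1 [1, 4, 9]
```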
@8one6 Can you make a new issue for the hist based issue? It is an entirely different code path (and a bit more work to clean up). |
xref #5557 |
This appears to be a new issue in 1.5.0.
The script below attempts to plot two 2-D graphs whose X and Y values are Pandas series. The issue seems to occur when pyplot is passed a datetime column which doesn't contain an index of value 0 - note how the second dataframe contains only odd indices (1, 3, 5, etc.)
Stack trace:
Looks like matplotlib's dates.py module is attempting to access the first value in the datetime series - but when passed as a pandas series, x[0] represents the value at index 0, and a) might not exist, and b) isn't necessarily the first value in the series! Possible fix might be to catch IndexErrors and attempt an x.iloc[0] call instead.
Script above worked fine in 1.4.3, but fails specifically in 1.5.0.
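The two failure modes described in the report, a missing label 0 and a label 0 that is not the first row, can be sketched without pandas. LabelIndexed below is a hypothetical stand-in that looks items up by label, as the report says a pandas series does. It is not pandas code and not the original script.

```python
class LabelIndexed:
    """Hypothetical stand-in for a container whose [] looks up by label."""

    def __init__(self, labels, values):
        self._labels = list(labels)
        self._values = list(values)

    def __getitem__(self, label):
        # Search the labels rather than using the argument as a position.
        try:
            pos = self._labels.index(label)
        except ValueError:
            raise KeyError(label) from None
        return self._values[pos]


# Case (a): only odd labels, so there is no label 0 to find.
odd = LabelIndexed([1, 3, 5], ["a", "b", "c"])
try:
    odd[0]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Case (b): label 0 exists but sits in the second position.
shuffled = LabelIndexed([2, 0, 1], ["first", "second", "third"])
label_zero = shuffled[0]

print(lookup_failed)   # True
print(label_zero)      # second
```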