pyplot.scatter() does not cycle colors #3041

Closed
wavexx opened this issue May 6, 2014 · 29 comments

Comments

@wavexx
Contributor

wavexx commented May 6, 2014

It looks like MPL is inconsistent in color cycling.
I would expect scatter() to cycle the current color as plot() does.

@tacaswell tacaswell added this to the v1.5.x milestone May 17, 2014
@tacaswell tacaswell modified the milestones: v1.4.0, v1.5.x May 17, 2014
@tacaswell
Member

Sorry for this not getting a response for so long.

Tagged as 1.4.0, but it might slip. I am not very enthusiastic about adding this (I think of scatter as being in a different class than the other plotting tools, as it can be colormapped).

@efiring
Member

efiring commented May 17, 2014

I agree; is there any compelling use case for having scatter cycle the color? Sufficiently compelling to change long-established behavior?

@wavexx
Contributor Author

wavexx commented May 18, 2014

I'm using scatter() to plot independent variables; as such I expect each call to have its own color automatically. I came to expect this behavior after mostly using R (both base graphics and ggplot).

In fact, I would expect every "high level" plotting command to cycle colors by default, since cycling colors or choosing a specific color from the RC index (from the user's perspective) is not convenient.

But it would be perfectly fine for me to have another command for having scatter plots. It's a shame that I have to use another library on top of MPL just for such convenience.

@tacaswell
Member

If you are not varying the size of the markers you are far better off using plot with no line than using scatter.

If you are writing loops anyway, throwing an extra variable in is not a huge deal (and you're not using the pyplot interface in a script, are you? ;) ).
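
The "plot with no line" suggestion can be sketched roughly as follows. This is an illustration only: the data is random, the labels are made up, and the Agg backend is chosen just so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, assumed here so no display is needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.RandomState(0)  # made-up data for illustration
fig, ax = plt.subplots(1, 1)

lines = []
for label in ["a", "b", "c"]:
    x, y = rng.rand(20), rng.rand(20)
    # Markers only, no connecting line: looks like a scatter plot, but goes
    # through plot() and therefore picks up the axes color cycle.
    (line,) = ax.plot(x, y, linestyle="none", marker="o", label=label)
    lines.append(line)

ax.legend(loc="best")
used_colors = [line.get_color() for line in lines]
```

Each call draws from the color cycle, so the three groups end up with three different colors without any color being passed explicitly.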

@Tillsten
Contributor

I use pyplot all the time, even in scripts :). Only very few methods use the color cycle at the moment (plot and hist, anything else?) and improving this would be nice, but I think this is a 2.0 change due to backward incompatibility.

@tacaswell
Member

@Tillsten The danger of using pyplot in scripts is that if the state machine gets out of sync you can get some really strange results. There is a genre of questions on SO that consist of 'the state machine hates me!'

fig, ax = plt.subplots(1, 1)
ax.plot(...)

is less typing, gives you better control, and gives you access to more functions (as pyplot does not wrap everything).

@Tillsten
Contributor

I know, but a) the state is really small: current axis and figure, and b) I even use pylab 😱, so it is less typing. Note, this is for making figures only; for complicated cases and interaction I change to the OO interface.

But I completely agree that beginners should learn the OO interface!

@pelson
Member

pelson commented May 20, 2014

But I completely agree that beginners should learn the OO interface!

Or just understand what the difference is. It's not so complicated. When I teach mpl, I introduce the Axes methods, and then demonstrate that pyplot is just a thin wrapper around those. At the end of it, newbie Python developers can describe the pros & cons of pyplot, which I think is a result... (http://nbviewer.ipython.org/github/SciTools/courses/blob/master/course_content/matplotlib_intro.ipynb).

@wavexx
Contributor Author

wavexx commented May 21, 2014

I think we're losing the focus on the issue here.
My argument is that high-level plotting functions should cycle colors by default.
So far, hist() and plot() cycle colors, but nothing else.

scatter() seems high-level enough to me, especially considering that a useful addition to scatter would be the option to have some jitter, which would not be very logical on plot().

I see the argument of backward compatibility, but I was hoping there was some direction in cleaning the API a bit. If not now, a new release?

The choice of colors is not uniform (box plots for instance do not respect the color index), there's no decent way to get/set the current index color conveniently, and generally speaking I keep having to look at the documentation all the time due to the lack of consistency here and there.

@Tillsten
Contributor

@wavexx I think we all agree that more functions should use color cycling. But there
is probably a lot of code depending on the current behavior, and breaking it in 1.4 is not worth it, imo. One
way to address this in the future could be adding an additional rcParam, use_color_cycle, and defaulting it to true in the following release.

@wavexx
Contributor Author

wavexx commented May 21, 2014

I like this suggestion. Though if we introduce this variable, I'd like to know some things:

  1. What other functions could benefit from color cycling? Otherwise we will end up in the same situation when someone else finds that X should also color-cycle but we cannot break the API again.

  2. We need a public method to get/set the current color index.

  3. We also need a convenient method to get a valid color from an index.

I never specify colors directly on the various calls. I copy the values myself from the RC into a variable, and keep reusing the variable to have consistent colors. I'm not sure if an integer (as opposed to a float) could be distinguished by its type() and be accepted as a valid color, in the form of rc[color_cycle][index % len(rc[color_cycle])], as gnuplot does. I find this behavior very convenient, especially when producing consistent plots for the same publication, and changing style later is easier.
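
The index-to-color lookup described above could look something like this sketch. The helper name `color_from_index` and the palette `EXAMPLE_CYCLE` are hypothetical and are not part of matplotlib; the palette is hard-coded only so the snippet is self-contained.

```python
# Hypothetical palette standing in for the rc color cycle.
EXAMPLE_CYCLE = ["b", "g", "r", "c", "m", "y", "k"]


def color_from_index(index, color_cycle=EXAMPLE_CYCLE):
    """Return the cycle entry for an integer index, wrapping around."""
    # Only true integers count as indices; floats (and bools) are rejected,
    # so an index can be told apart from other numeric color values by type.
    if isinstance(index, bool) or not isinstance(index, int):
        raise TypeError("index must be an int, got {!r}".format(type(index)))
    return color_cycle[index % len(color_cycle)]


first = color_from_index(0)    # first entry of the cycle
wrapped = color_from_index(7)  # one full pass later, same entry again
```

Keeping the index in a variable and resolving it at plot time is what makes a later style change easy: only the palette changes, the indices stay the same.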

@tacaswell
Member

Regarding jitter, that has come up before and it is very unlikely that it will get added. matplotlib should only do deterministic transforms to the plotted data. I view adding jitter as borderline data falsification and completely unacceptable in a scientific context.

The problem with the way we do it now (keeping the color cycle information in the axes) is that every plotting function needs to have this logic. I think providing a class like:

from itertools import cycle


class color_cycler(object):
    """Wrap an object with plotting methods and hand out cycling colors."""

    def __init__(self, ax, color_cycle):
        self._ax = ax
        self._cc = color_cycle
        self._iter = cycle(color_cycle)

    def __getattr__(self, fname):
        # Only reached for names that are not defined on color_cycler itself.
        if not (hasattr(self._ax, fname)
                and callable(getattr(self._ax, fname))):
            raise AttributeError(
                "no function named {fn} in axes".format(fn=fname))
        wrap_fun = getattr(self._ax, fname)

        def _inner(*args, **kwargs):
            # Take the next color only if the caller did not pass one.
            if 'color' not in kwargs:
                kwargs['color'] = next(self._iter)

            # Pass the wrapped method's return value (the artists) through.
            return wrap_fun(*args, **kwargs)

        return _inner

    def reset_cycle(self):
        self._iter = cycle(self._cc)

    def muck_with_cycle(self, arg):
        # step backwards or something, haven't thought this out very well
        pass

and stripping this logic from axes/axes methods is a more maintainable path.

@wavexx
Contributor Author

wavexx commented May 21, 2014

On 05/21/2014 04:41 PM, Thomas A Caswell wrote:

Regarding jitter, that has come up before and it is very unlikely that
it will get added. matplotlib should only do deterministic transforms to
the plotted data. I view adding jitter as borderline data
falsification and completely unacceptable in a scientific context.

On one-dimensional data, when comparing groups where data points overlap
exactly, and jitter is added along the axis perpendicular to the value,
there is no data falsification going on (the true value is not changed),
and the result is easier to read than using different markers if you have to
compare 3 groups.

As for determinism, you could even set a fixed seed.

That being said, I used jittered plots maybe three times in my career.
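
A minimal sketch of the seeded, perpendicular jitter described above, using only the standard library. The function name `jitter_positions` and its parameters are hypothetical; matplotlib offers no such option in this thread.

```python
import random


def jitter_positions(n_points, center, width=0.2, seed=0):
    """Return n_points positions spread around ``center`` on the group axis."""
    rng = random.Random(seed)  # fixed seed: the same offsets on every run
    return [center + rng.uniform(-width, width) for _ in range(n_points)]


values = [1.0, 1.0, 1.0, 2.0]                 # measured values, left untouched
xs = jitter_positions(len(values), center=1)  # group 1 sits at x = 1
points = list(zip(xs, values))                # (jittered position, true value)
```

Only the position along the group axis is perturbed, so the plotted values themselves are unchanged, and the fixed seed makes the layout reproducible.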

@tacaswell
Member

There is a slight typo in the code as first posted: ax should be self._ax in a couple of places.

This lets you do things like

fig, ax = plt.subplots(1, 1)
cc = color_cycler(ax, ['r', 'b', 'g'])
cc2 = color_cycler(ax, ['k', 'c', 'g'])

cc.plot(np.random.rand(5))
cc2.plot(range(5))

cc.plot(np.random.rand(5))
cc2.plot(range(5)[::-1])

cc.scatter(np.random.rand(5), np.random.rand(5))
cc2.plot(np.random.rand(5), np.random.rand(5), 'x')

and then you can do cute things like just drop in pyplot

fig, ax = plt.subplots(1, 1)
cc3 = color_cycler(plt, ['r', 'b', 'g'])
cc3.plot(range(5))
cc3.plot(range(5)[::-1])
cc3.plot(np.random.rand(5), np.random.rand(5), 'x')
cc3.scatter(np.random.rand(5), np.random.rand(5), color='k')

or even drop in a pandas object:

fig, ax = plt.subplots(1, 1)
import pandas as pd
s = pd.Series(np.arange(15))
cc4 = color_cycler(s, ['k', 'r'])
cc4.plot(linewidth=5)
cc4.plot(marker='o', linestyle='none')
cc4.plot(marker='x', linestyle='none')

@Tillsten
Contributor

I don't like the API too much; I would prefer to have the color cycler as a simple axes attribute.

fig, ax = plt.subplots(1, 1)
c = ax.color_cycler # gets default cc
#change it 
ax.color_cycler = color_cycler(s, ['k', 'r'])

@WeatherGod
Member

Just to chime in a bit... don't forget about the possibility to cycle other
properties such as markers and line styles.

Thomas's example is intriguing, but it has a fundamental drawback (in my
view). It changes the cycling semantics from a default operation to an
explicit operation. Now, maybe this wouldn't be as bad as one might think,
but I also don't really like the fact that color_cycler() would be an
opaque wrapper around an axes object. It would make introspection and
debugging difficult, and also abuses duck-typing because now cc.set_xlim()
wouldn't work, but cc.hist() would.
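
The point about cycling other properties, such as markers, alongside color can be sketched with plain dictionaries of keyword arguments. Everything here is illustrative: the names `styles` and `styled_kwargs` are hypothetical, and the color and marker values are arbitrary.

```python
import itertools

colors = ["k", "r"]
markers = ["o", "x", "s"]

# One dict of keyword arguments per call; product() pairs every color
# with every marker, and cycle() repeats the sequence indefinitely.
styles = itertools.cycle(
    [{"color": c, "marker": m} for c, m in itertools.product(colors, markers)]
)


def styled_kwargs(**kwargs):
    """Fill in cycled defaults without overriding what the caller passed."""
    merged = dict(next(styles))
    merged.update(kwargs)
    return merged


first_style = styled_kwargs()             # defaults taken from the cycle
second_style = styled_kwargs(marker="^")  # explicit marker wins over the cycle
```

The resulting dict could be splatted into any plotting call, for example `ax.plot(x, y, **styled_kwargs())`.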


@tacaswell
Member

Changing the cycling from implicit to explicit was the goal. The color_cycler could just as easily be handed an iterable of dictionaries which are used to (optionally) update the kwargs on the way through (and could probably be really cute and keep catching exceptions and dropping kwargs until the call worked which would get cc.set_xlim() to work).

To be clear, this was an idea I had so I wrote it up. It seems to work, but I am not advocating this to replace the current color cycle (yet), but it does solve the problem of the OP.

It is probably a better path to put that logic in a decorator which we can then wrap functions in, and add respect_color_cycle and an advance_color_cycle to enable/disable paying attention to it and advancing the cycle on a per-call basis.
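
One way the decorator idea might look, as a rough sketch rather than anything matplotlib implements. The decorator name `cycles_color` and the stand-in function `fake_scatter` are hypothetical; the two keyword names come from the comment above, and the cycle state is kept in a closure here instead of on an axes object.

```python
import functools


def cycles_color(color_cycle):
    """Decorator factory (hypothetical): inject a cycling ``color`` kwarg."""
    palette = list(color_cycle)
    state = {"index": 0}  # position in the cycle, shared across calls

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, respect_color_cycle=True,
                    advance_color_cycle=True, **kwargs):
            # An explicit color always wins; otherwise use the current entry.
            if respect_color_cycle and "color" not in kwargs:
                kwargs["color"] = palette[state["index"] % len(palette)]
                if advance_color_cycle:
                    state["index"] += 1
            return func(*args, **kwargs)
        return wrapper
    return decorator


@cycles_color(["r", "g", "b"])
def fake_scatter(x, y, color="k"):
    # Stand-in for a plotting function; it just reports the color it got.
    return color


used = [fake_scatter([], []), fake_scatter([], [])]
```

Passing `advance_color_cycle=False` reuses the current color on the next call, and `respect_color_cycle=False` falls back to the function's own default.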

@WeatherGod
Member

A decorator! Why didn't I think of that before?! This would allow
developers to specify which plotting functions are cyclable, which
properties are cyclable, and helps to keep the mess in the Axes class
manageable by not having an explicit color cycle object for each plotter!
It also would do a nice job to move the cycling logic out of the plotter.

@tcaswell, you are a genius!!! I might have some time tonight to slap
together a proof of concept.


@tacaswell
Member

@WeatherGod Given that I had decorators on my mind for your comment on #3070 I think you did think of this before.

I don't think we want to try to fit this in for 1.4, re-tagging as 1.5.x

@tacaswell tacaswell modified the milestones: v1.5.x, v1.4.0 May 22, 2014
@WeatherGod
Member

That is true, but I just never thought about using decorators for color
cycling. This is an example of the "many eyes" concept.

There are going to be some difficulties with recognizing and handling
multiple datasets in a single call, but it shouldn't be impossible. I agree
that such work should go into 1.5.

@wavexx
Contributor Author

wavexx commented May 23, 2014

I like the overall concept, and using decorators.
This could easily be extended to line style as well, which would make generating B/W plots easier.

@tacaswell
Member

We should also pull @olgabot (prettyplot), @JanSchultz (yhat ggplot), @mwaskom (seaborn), and @tonyyu (style sheets) in on this conversation. I think any work on this should make their lives easier, so getting their input sooner rather than later would be good. I also think that the discussions of how/if to change rcparams (e.g. #2637) should be folded in with this.

How many of you all are going to be at scipy?

@wavexx Sorry for completely hi-jacking your (seemingly simple) issue to be 'refactor all the things!'

@tacaswell
Member

Again meant to ping @JanSchulz, sorry for the noise.

@mwaskom

mwaskom commented May 23, 2014

I won't be at scipy.

Whether or not scatter should cycle, I would at the very least advocate for the default color being configurable somehow.

@WeatherGod
Member

I will be at SciPy, and my vote is to transition to fixing the default for
scatter so that it can cycle colors in the future. Btw, in case it hasn't
been mentioned yet, the workaround is to pass a Python "None" to the color
argument, IIRC.


@jankatins
Contributor

We don't use color cycling but either specify our own colors (which uses mpl color cycling in some themes) or use just one default color.

What would be nice is to specify more than one color/point style (linestyle/...) during the ax.something(...) call, as we currently have to do multiple calls and do a groupby with the data for such things. But that's out of scope for this bug report and also probably completely different from how mpl works right now.

@tacaswell
Member

@JanSchulz I think #4258 + discussion at end of #3818 will get you towards what you want.

I am closing this issue as it has been quiet for over a year, and the parts of this thread that will be addressed are being addressed elsewhere.

@flipdazed

flipdazed commented Jul 10, 2016

This works, taken straight from my own code.

from matplotlib import colors
import random
clist = [i for i in colors.ColorConverter.colors if i != 'w']
colour = (i for i in random.sample(clist, len(clist)))

I randomise the names because life is too short to have the same colours; I also filter out w, which stands for u'wheredmydatago'.

import matplotlib.pyplot as plt
from matplotlib import colors
import numpy as np
import random

x = np.random.random(20)
test_list = [(x, x**i) for i in range(1,5)]

fig = plt.figure(figsize=(8, 8)) # make plot
ax =[]
ax.append(fig.add_subplot(111))
clist = [i for i in colors.ColorConverter.colors if i != 'w']
colour = (i for i in random.sample(clist, len(clist)))
for xy in test_list: 
    c = next(colour)
    ax[0].scatter(*xy, marker='o', color = c, label=c)
ax[0].legend(loc='best', shadow=True, fancybox=True)
plt.show()

Finally, if you are feeling like a risk taker, replace the list with clist = [c for c in colors.cnames if c not in ['white', 'snow']] and you might just get a plot coloured by u'palegoldenrod'.

As a side comment it will break if you have more lines than colours. I leave that to someone else to figure out as I don't require it.
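
One possible way around the "more lines than colours" limit is to wrap the shuffled list in `itertools.cycle`, so the colors repeat instead of running out. The three-color palette below is a stand-in for illustration, not matplotlib's color list.

```python
import itertools
import random

palette = ["b", "g", "r"]  # stand-in palette for illustration
shuffled = random.sample(palette, len(palette))

# A plain generator, as in the snippet above, is exhausted after one pass.
one_shot = (c for c in shuffled)
for _ in shuffled:
    next(one_shot)
try:
    next(one_shot)
    exhausted = False
except StopIteration:
    exhausted = True

# itertools.cycle restarts from the beginning instead of stopping.
endless = itertools.cycle(shuffled)
picked = [next(endless) for _ in range(7)]
```

With `endless` in place of the generator, `next(colour)` in the plotting loop keeps working however many datasets there are, at the cost of colors repeating once the palette is used up.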

@tacaswell tacaswell modified the milestones: 2.0 (style change major release), 2.1 (next point release) Jul 10, 2016
@tacaswell
Member

This actually went into 2.0 via #6291. You should check out the 2.0.0b1 release.
