Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfix in stackplot (Issue #22393): Removes artifacts when input height is zero #22424

Draft
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

tlkaufmann
Copy link

@tlkaufmann tlkaufmann commented Feb 7, 2022

PR Summary

As detailed in Issue #22393, when the height of an input to stackplot is zero, the anti-aliasing of the underlying fill_between calls create artifacts (thin lines). These thin lines are not removed when setting lw=0 but are gone when setting aa=False, therefore definitely being an unwanted anti-aliasing issue.

Minimal example

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)
x = np.arange(10)
y0 = np.linspace(0, 1, 10)
y1 = np.linspace(0, 1, 10)
y2 = [0, 0, 0.25, 0, 0, 0, 0.25, 0, 0, 0]
y3 = np.linspace(0, 1, 10)

colors=['grey', 'blue', 'red', 'blue']

y = np.stack([y0, y1, y2, y3])
im = plt.stackplot(x, y, colors=colors, lw=0)

before the fix
image

after the fix
image

PR Checklist

Note:
There currently does not exist a test for stackplot, so I didn't add anything.

Tests and Styling

  • Has pytest style unit tests (and pytest passes).
  • Is Flake 8 compliant (install flake8-docstrings and run flake8 --docstring-convention=all).

Documentation

  • New features are documented, with examples if plot related.
  • New features have an entry in doc/users/next_whats_new/ (follow instructions in README.rst there).
  • API changes documented in doc/api/next_api_changes/ (follow instructions in README.rst there).
  • Documentation is sphinx and numpydoc compliant (the docs should build without error).

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for opening your first PR into Matplotlib!

If you have not heard from us in a while, please feel free to ping @matplotlib/developers or anyone who has commented on the PR. Most of our reviewers are volunteers and sometimes things fall through the cracks.

You can also join us on gitter for real-time discussion.

For details on testing, writing docs, and our review process, please see the developer guide

We strive to be a welcoming and open project. Please follow our Code of Conduct.

@tlkaufmann tlkaufmann changed the title Bugfix in stackploot: Removes artifacts when input height is zero Bugfix in stackplot: Removes artifacts when input height is zero Feb 7, 2022
@tlkaufmann tlkaufmann changed the title Bugfix in stackplot: Removes artifacts when input height is zero Bugfix in stackplot (Issue https://github.com/matplotlib/matplotlib/issues/22393): Removes artifacts when input height is zero Feb 7, 2022
@tlkaufmann tlkaufmann changed the title Bugfix in stackplot (Issue https://github.com/matplotlib/matplotlib/issues/22393): Removes artifacts when input height is zero Bugfix in stackplot (Issue #22393): Removes artifacts when input height is zero Feb 7, 2022
@@ -41,6 +41,9 @@ def stackplot(axes, x, *args,
size of each layer. It is also called 'Streamgraph'-layout. More
details can be found at http://leebyron.com/streamgraph/.

hide_empty : bool, optional
If set, hides inputs where they have zero height
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a note about the kwargs this is incompatible with.

@@ -107,9 +110,17 @@ def stackplot(axes, x, *args,
first_line = center - 0.5 * total
stack += first_line

edgecolor = kwargs.pop('edgecolor', None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to detect if the user passed any of these three kwargs (k in kwargs) and raise if they have. As implemented this will silently disregard user input which is most frustrating things that a library can do ("I know I passed edgecolor in, I can see it right there in the code, why isn't doing anything!?!") and is something that we should do our utmost to avoid.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a great point. I included an error message if that is the case and enclodes the kwargs.pop calls in an if-term.

@tacaswell tacaswell added this to the v3.6.0 milestone Feb 7, 2022
@tacaswell
Copy link
Member

I expect that the option of chopping up each level of the stack into N disconnected fill_betweens is not an option because it will break back-compatibility of the return type?

I am very concerned about the "coupled kwargs" here. That is, the meaning (or if we pay attention to!) user input depends on the values of other kwargs. This sort of coupling is (in my experiance) one of the things that makes APIs hard to use and remember (because you not only have to remember what the inputs are, you have to remember exactly how they affect each other).

@@ -107,9 +111,22 @@ def stackplot(axes, x, *args,
first_line = center - 0.5 * total
stack += first_line

if hide_empty:
if any(k in kwargs for k in ['edgecolor', 'interpolate', 'where']):
raise ValueError('hide_empty is not compatible with edgecolor, '
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is probably worth putting in the code to sort out which of the conflicting keys it is. The more information we can give the user in the error message the easier it will be for them to fix the problem, and (hopefully) the happier they will be :)

raise ValueError('hide_empty is not compatible with edgecolor, '
'interpolate or where')
else:
edgecolor = kwargs.pop('edgecolor', None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can probably get away without doing the pop and putting in the implicit defaults here by making another dictionary (hide_kwargs or something like that)and then callingfill_between` as

axes.fill_between(..., **kwargs, **hide_kwargs)

and conditionally put things in hide_kwargs which means we can have the conditional a smaller number of places in the code (I think 3 total, once at the top for error checking, once for the base, and once in the loop rather than as 6 ternary invocations + error checking).

I do not feel super strongly about this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I generally like the idea of having a hide_kwargs dict as the if ... else ... calls I wrote are rather ugly. But I don't think it would work here because where and edgecolor are unique to every iteration of the loop so that won't work for them.

It will only work for interpolate but then I thought it was better to treat all 3 arguments the same.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can update / make a new dictionary under the same conditional in the loop.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that makes sense.

I think at this point though I favor the solution of allowing where to be a list (see below). Are you also fine with that?

@tlkaufmann
Copy link
Author

@tacaswell

I am very concerned about the "coupled kwargs" here.
...

I agree with you on that. The arguments are definitely convoluted now.

I think the issue is that the real problem lies is that the anti-aliasing creates artifacts when there is a zero-width polygon. This can be fixed by setting where, interpolate and edgecolor but that is somewhat of a "hacky" fix.

So ideally, the user should never worry about hide_empty.

@tlkaufmann
Copy link
Author

The issue seems to be quite well known and appears in a bunch of different issues (see e.g. #9574). This is not a matplotlib error but rather a bug on the side of the renderer.
Usually this is solved by setting lw > 0 and edgecolor='face'. However, in the example shown in the description this won't work.

So there won't be a way to fix the underlying issue.

@tlkaufmann
Copy link
Author

I think I have an idea how to solve this issue and also circumvent the whole interfering kwargs problem:

If we allow where to be a list, we can pass it element-wise to fill_between. interpolate=True and edgecolor='face' can be passed as normal.

It would look like:

    where = kwargs.pop('where', None)
    if hasattr(where, '__len__'):
        if len(where) != len(stack):
            raise ValueError("where has to have the same length as y")
    else:
        where = len(stack) * [where]

and then

    coll = axes.fill_between(x, stack[i, :], stack[i + 1, :],
                             where=where[i],
                             facecolor=color, label=next(labels, None),
                             **kwargs)

@tacaswell, are you happy with this solution?

@tacaswell
Copy link
Member

are you happy with this solution?

It will have to be done carefully to not break the case where someone passed in a 1d where and expects it to be re-used for each of the fill_betweens. We have a number of APIs that will do this automatic broadcasting / inference so it would fit in well with the rest of the library and user expectations, but that also means we are aware of how messy they can get ;)


Would it be easier to check for 0s in the input rather than for equality in the stack?

@tlkaufmann
Copy link
Author

Yep that sounds good. Could you guide me towards an implementation of the automatic broadcasting / inference? Def gonna be messy if I try to come up with something myself :D


Checking for 0s in the input is certainly easier. I would leave this to the user, so that they can do call stackplot like plt.stackplot(x, y, colors=colors, where=(y!=0)), interpolate=True, edgecolor='face'.
Since this is now kind of "hidden" I would maybe add this example somewhere? Maybe in examples/lines_bars_and_markers/stackplot_demo.py?

@tacaswell
Copy link
Member

Given that we do not have to worry about unit support on where, you may be able to directly use np.broadcast_to(where, np.shape(y)).

@tlkaufmann
Copy link
Author

Ok so I have overhauled this whole PR as discussed above. stackplot now has one extra parameter where that is passed to elemten-wise to the individual calls of fill_between.
Where can be either a bool, an array of length N or an array of shape (M, N): M being the number of tracks and N their length.

I think this solution is way better than the old one. Now we don't have competing kwargs, and where is used in a similar way to facecolor and label. And original issue (#22393) is solved when calling stackplot(x, y, where=(y!=0), color=color, edgecolor='face', interpolate=True)

@tlkaufmann
Copy link
Author

@tacaswell, do you mind having a look at the recent changes? I think that could be a small PR that doesn't break anything else but solves the issue

@QuLogic QuLogic added the status: needs workflow approval For PRs from new contributors, from which GitHub blocks workflows by default. label Mar 1, 2022
@tacaswell tacaswell force-pushed the stackplot_PR branch 2 times, most recently from 67e3753 to f47d057 Compare April 30, 2022 23:12
@tacaswell
Copy link
Member

@tlkaufmann I apologize that this fell off my radar!

I took the liberty of rebasing and squashing this down to one commit (twice ❤️ flake8). If you want to push additional commits to this branch, before you do anything else do:

git remote update
git checkout stackplot_PR
# where YOUR_REMOTE_NAME is the name of the remote that points to your fork
git reset --hard YOUR_REMOTE_NAME/stackplot_PR   

which will discard (😱) any local commits and replace them with the current state of this branch. If you do not do this the old commits will be resurrected!

I'm going to leave one comment about the docstring which my request some additional work.

regions from being filled. The filled regions are defined by the
coordinates `x[where]`. Can be either a single bool, an array of shape
(N,) or an array of shape (M, N).
Should be used together with `interpolate=True`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we check if interpolate=True is set and raise/warn or if the user does not pass this, correctly track interpolate?

Is this comment still relevant or is this no longer needed with the new simpler approach?

@tacaswell tacaswell removed the status: needs workflow approval For PRs from new contributors, from which GitHub blocks workflows by default. label Apr 30, 2022
@tlkaufmann
Copy link
Author

@tacaswell , thanks for getting back to me.]

Rebasing was a good idea as the git history was kind of a mess :D I added a warning to check whether interpolate=True is set if where is not None and also resorted the imports in lib/matplotlib/tests/test_axes.py to fit with the main branch as my IDE automatically re-ordered these.

@tlkaufmann
Copy link
Author

The errors in the above checks seem to be irrespective of my changes

@tacaswell
Copy link
Member

I took the liberty of rebasing this.

@tacaswell
Copy link
Member

This probably needs a whats new and/or an example of how to use this to get the desired effect.

tacaswell
tacaswell previously approved these changes Dec 16, 2022
@@ -58,6 +58,15 @@ def stackplot(axes, x, *args,
data : indexable object, optional
DATA_PARAMETER_PLACEHOLDER

where: bool or array of bool, optional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm struggling with the where parameter. While it is reasonable solve the issue internally by passing where to fill_between, is there any reasonable value for where other than y!=0?

  • The generality of fill_between (where=) is not needed. It does not make sense to not plot regions in a stack plot where there are finite values. That would leave gaps.
  • I assume where needs the shape of y; i.e. must be 2D. Not having a resolution in x does not make sense. Using the same where array for all datasets should be the very exception; usually you need.

It seems like a simpler high-level API such as a boolean parameter clip_zero_regions should be enough, and that would internally use where=(y!=0).

@tacaswell tacaswell modified the milestones: v3.7.0, v3.8.0 Dec 18, 2022
@tacaswell
Copy link
Member

Pushed to 3.8 as Tim makes a good point about the API and the tests are failing.

@tacaswell tacaswell dismissed their stale review December 18, 2022 20:35

API concerns raised + failing tests.

@jklymak
Copy link
Member

jklymak commented Jan 26, 2023

@tlkaufmann any interest in taking this up. It seems your implementation was close, but maybe we do not want to expose all of the complication of where, which seems reasonable to me. I'll move to draft until we hear from you, and I'll mark as orphaned, but feel free to unmark and move out of draft.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Waiting for author
Development

Successfully merging this pull request may close these issues.

None yet

6 participants