Violin Plots #2996

solvents · 2014-04-19T23:34:42Z

This PR implements violin plots.

Violin plots can be used to represent the distribution of sample data. They are similar to box plots, but use a kernel density estimation function to present a smooth approximation of the data sample used.

We wrote this in response to issue #2873 as part of a software engineering course project (the same project as PR #2961's). It's designed to have a call signature similar to boxplot, but some customization options remain to be added.

tacaswell · 2014-04-20T02:04:03Z

Wow, awesome (again)!

@phobson Can you take a look at this (since I have labeled you the current box-plot expert)?

From skimming the code it looks like you dealt with the scipy dependency by just ripping out the bits you needed?

solvents · 2014-04-20T02:40:19Z

Yep,
We had a couple members talking to some stats experts about how they might implement kde from the ground up, but I believe they decided it was either too much work or would be sub-par for the amount of time we had to work on it.

cimarronm · 2014-04-20T14:55:51Z

I had never even heard of violin plots. Very interesting...good stuff!

One thing I noticed from your example plot is that the top of the violin has some space between the error bar tick mark and the distribution while the bottom does not.

solvents · 2014-04-20T16:34:19Z

lib/matplotlib/axes/_axes.py

+            max_val = kde.dataset.max()
+            mean = np.mean(kde.dataset)
+            median = np.median(kde.dataset)
+            coords = np.arange(min_val, max_val, (max_val - min_val)/points)


You're right. Looks like an off-by-one error. I think changing this line to

coords = np.linspace(min_val, max_val, points)

would probably do the trick.
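For illustration, the difference between the two calls can be seen on a small grid. This is a minimal sketch with made-up values for `min_val`, `max_val`, and `points`; only the two NumPy calls come from the snippet above.

```python
import numpy as np

# Made-up values, chosen only to make the endpoints easy to read.
min_val, max_val, points = 0.0, 10.0, 5

# Original approach: arange excludes the stop value, so the grid
# ends one step short of max_val and the violin top is cut off.
coords_arange = np.arange(min_val, max_val, (max_val - min_val) / points)

# Proposed fix: linspace includes both endpoints by default.
coords_linspace = np.linspace(min_val, max_val, points)

print(coords_arange)    # [0. 2. 4. 6. 8.]
print(coords_linspace)  # [ 0.   2.5  5.   7.5 10. ]
```

The last `arange` coordinate stops below `max_val`, which matches the gap at the top of the violin reported above.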

mdboom · 2014-04-20T22:08:29Z

I haven't gone through the code, but just wanted to say: 👏!

phobson · 2014-04-21T14:41:02Z

I glanced at the code last night and everything seemed in order and well thought out. Tests, docs, and examples look good.

Since Tom and Mike seem excited about this, it's undoubtedly more appropriate for matplotlib than I initially thought.

That said, here are some things I think we should consider:

My main concern is that violin plots are currently available in both seaborn and statsmodels. While seaborn does require scipy, it is pip-installable from source on a basic Windows setup, so that barrier to entry is pretty low. I'm concerned about fragmentation here, both with the plotting code and the KDE implementation. Off the top of my head, this means that a new python user will be able to choose between scipy, sklearn, statsmodels, and this stripped-down version in matplotlib for KDE.

While I'm sure the KDE implementation is fine, what if the scipy team finds a bug? Keeping on top of scipy's implementation and tests is something we'll need to be diligent about.

Both statsmodels and pandas now have a policy of directing PRs of more advanced visualizations to seaborn and python-ggplot. That might not apply to us, but I think that policy is conducive to making a more seamless and cohesive environment for new users.

TL;DR
This looks like a great PR. I'd squash the commits if possible before merging.

tacaswell · 2014-04-21T15:02:41Z

re seaborn and python-ggplot (and @olgabot prettyplot), I think we need to work with them to start pulling more of the 'basic' advanced plots back down into matplotlib.

Would it be possible to re-factor this the way that boxplot is now? That is, separate the code that draws the pretty lines from the code that turns raw data into something the drawing code can deal with.

It looks like L6888 - L6902 (https://github.com/matplotlib/matplotlib/pull/2996/files#diff-7e79bc5b4cdd21de353697e9ada248b7R6888) could be extracted into a separate function which generates a list/dict of the parameters that the fill call on L6904 takes + the mean/max/min... scalars used farther down. This would then give the higher level libraries (seaborn/ggplot) a more powerful building block and hopefully cut down on duplicated effort.

This is another facet of a simmering discussion about how much computation matplotlib should be doing in the plotting functions.

solvents · 2014-04-21T21:18:21Z

How squashed should it be? I have it at 28 by collapsing sequential commits from the same user, but I could make it into one big commit if that would be better.

solvents · 2014-04-22T23:32:36Z

@phobson I went with 22 commits after squashing, let me know if I should do them all. Even written from scratch, the kde code is another thing that could require maintenance down the road, but it seems like an alright fit in mlab (and of course, we needed it for violin plots). Maybe it will end up in a dependency down the road.

@tacaswell I took a stab at re-factoring violinplot, as you suggested. It uses axes.vplot to produce the plot, cbook.vp_stats to arrange the data, and axes.violinplot to put the two together. I dealt with the dependency between the data arranging code and mlab by using a method parameter in cbook.vp_stats. I'm wondering if vp_stats could be generalized to do what bxp_stats does too, given a different method, but I didn't want to make this PR any bigger.
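A minimal sketch of how the split might be used, assuming the names the thread later settles on (`cbook.violin_stats` and `Axes.violin`) and a `method(values, coords)` callable. The KDE below is a hand-written stand-in using Scott's rule, not the implementation added to mlab in this PR, and the sample data is made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import cbook


def gaussian_kde_method(values, coords):
    """Stand-in KDE: Gaussian kernel with a Scott's-rule bandwidth."""
    values = np.asarray(values, dtype=float)
    coords = np.asarray(coords, dtype=float)
    n = values.size
    bandwidth = values.std(ddof=1) * n ** (-1 / 5)
    z = (coords[:, None] - values[None, :]) / bandwidth
    return np.exp(-0.5 * z**2).sum(axis=1) / (n * bandwidth * np.sqrt(2 * np.pi))


rng = np.random.default_rng(0)
datasets = [rng.normal(0, 1, 200), rng.normal(2, 0.5, 200)]  # made-up samples

# Step 1: arrange the data (no drawing happens here).
stats = cbook.violin_stats(datasets, gaussian_kde_method, points=50)

# Step 2: draw from the pre-arranged data.
fig, ax = plt.subplots()
parts = ax.violin(stats, positions=[1, 2], showmedians=True)
```

Because the density estimator is only a parameter of the data-arranging step, a higher-level library could swap in its own KDE without touching the drawing code.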

tacaswell · 2014-04-23T01:37:38Z

That is exactly what I had in mind.

That looks like a reasonable number of commits. Squash much more and you would start dropping people from the work log, which is not cool.

I think vplot is a bad name (but do not have a better suggestion).

phobson · 2014-04-23T01:54:01Z

I would do violin_stats, violin, violinplot.

Does it make sense to allow people to specify a KDE computation method/function/lambda, provided it returns suitable output? The default (None) would of course fall back to the KDE implementation in this PR, but it might be nice to allow someone complete control over the kernel shape and size and to use whichever implementation they prefer (e.g., sklearn, scipy, statsmodels).

solvents · 2014-04-23T03:04:03Z

I'm happy with those names.

Specifying a method is supported through a parameter to vp_stats (which I'll rename violin_stats). Are you thinking as a parameter to violinplot as well?

Also, I separated some code that converts 1d objects into 2d objects from bxp_stats as it was useful in violin_stats as well, but now I'm wondering if np.atleast_2d would work instead?

WeatherGod · 2014-04-23T13:07:33Z

Do watch out with np.atleast_2d(). IIRC, older versions of numpy had that function turn masked arrays into regular arrays.
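One way to check this on a given NumPy installation is sketched below. The input values are made up; `np.ma.atleast_2d` is the masked-array-aware variant, and the result type of plain `np.atleast_2d` is printed rather than assumed, since it is the version-dependent part.

```python
import numpy as np

# A 1-D masked array with one invalid entry (made-up data).
data = np.ma.masked_invalid([1.0, np.nan, 3.0])

# Plain version: whether the mask survives depends on the NumPy version.
plain = np.atleast_2d(data)

# Masked-array-aware version from numpy.ma.
masked = np.ma.atleast_2d(data)

print("np.atleast_2d returned:   ", type(plain).__name__)
print("np.ma.atleast_2d returned:", type(masked).__name__)
print(masked.shape, masked.mask)
```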

phobson · 2014-04-23T13:40:18Z

@solvents Yeah. Well, I was thinking that users could supply a lambda or function that called, say, statsmodels' KDE to compute the values. But right when I went to bed last night I realized that's a pretty dumb suggestion. As long as the "schema" of the output of violin_stats is well documented, people can generate that themselves however they wish. Problem solved.
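As a sketch of what generating the output "however they wish" could look like, the dictionary below is built by hand from a normalized histogram instead of a KDE and passed straight to the drawing function. The key names (`coords`, `vals`, `mean`, `median`, `min`, `max`) and the `Axes.violin` name are assumptions based on the renamed functions discussed in this thread; the data is made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(size=200)  # made-up sample

# Any density estimate will do; here a normalized histogram stands in for a KDE.
density, edges = np.histogram(sample, bins=20, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

# One dictionary per violin, following the assumed violin_stats schema.
hand_built = [{
    "coords": centers,
    "vals": density,
    "mean": sample.mean(),
    "median": np.median(sample),
    "min": sample.min(),
    "max": sample.max(),
}]

fig, ax = plt.subplots()
parts = ax.violin(hand_built, positions=[1], showmeans=True)
```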

solvents · 2014-04-23T13:51:34Z

Ah, I didn't know atleast_2d could do that. Will leave it as is.

My last thought on the current specification for the returned dictionary is that by adding a key called, say, extra_lines, it might be possible to push violin into rendering some more advanced plots (e.g., bean plots). It's easy enough to add optional parameters to the dictionary, though, so it could be left as a possible extension.

solvents · 2014-04-23T18:01:55Z

Renamed methods, updated whats_new and CHANGELOG

tacaswell · 2014-04-23T22:25:23Z

Why did you merge master into your branch? Our typical procedure is to rebase feature branches on top of current master.

solvents · 2014-04-23T22:41:10Z

Ah, alright. Should I just drop the merge commit, or do I do the rebase?

tacaswell · 2014-04-23T22:45:38Z

Unless there is a conflict you need to address, yes.

solvents · 2014-04-23T22:50:17Z

There were a couple of conflicts but they were trivial, not sure if that counts.

tacaswell · 2014-04-23T22:55:25Z

For the github website merge to work, the merge needs to be clean. That probably means you need to rebase (which will be easy if the conflicts are trivial and, as I assume, in CHANGELOG/whats_new.rst).

solvents · 2014-04-23T23:13:22Z

OK, sorry about that. Used a rebase this time.

tacaswell · 2014-04-23T23:15:33Z

No worries, it's your first PR.

Also, please don't be discouraged by the amount of feedback you are getting, it means people are excited about your work ;).

tacaswell · 2014-05-16T02:28:00Z

@solvents Could you rebase this again?

I hope your exams went well.

solvents · 2014-05-16T14:35:02Z

Rebased. I'm not sure if the Travis results are related to anything we did.

Exams went great, thanks for asking!

tacaswell · 2014-05-17T04:38:02Z

lib/matplotlib/axes/_axes.py

+        if vert:
+            fill = self.fill_betweenx
+            rlines = self.hlines
+            blines = self.vlines


What do 'r' and 'b' stand for? I think this code is clear, I am just curious.

I think they are supposed to be right-angle and base... perp_lines and par_lines would probably make more sense.

tacaswell · 2014-05-23T14:12:14Z

Sorry, mistyped @JanSchulz

jankatins · 2014-05-24T09:44:34Z

I don't think we have any API consideration. As we do our own API we fiddle the data until it fits matplotlib. As long as it has the same signature as boxplots we are probably fine.

CC: @has2k1 to correct me :-)

One requirement we have for faceting (and for which I think our code is currently broken) is that we have a way to specify "gaps", meaning that a violin (or boxplot/bar) is not present in one facet/axis (but is in the other facets/axes).

So, is it possible to specify a "gap" in the violins? Either by saying position=[1,2,3,5,6] (gap at 4) or by specifying position=[1,2,3,4,5,6], data=[[...],[...],[...],[],[...],[...]] (empty list at 4) or whatever you feel fine :-))

solvents · 2014-05-24T22:00:24Z

Gaps can be specified by using the positions argument, the way you've written it there. I suspect having an empty data column would raise an error.
@tacaswell I think the last commit addresses the changes you suggested.
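The gap-by-positions approach described above can be sketched as follows, assuming the `positions` keyword of `violinplot` mentioned in the reply. The five datasets are made up, and the empty-column variant is deliberately not exercised since its behaviour is not confirmed in this thread.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)

# Five made-up samples, drawn at six slots with slot 4 left empty.
datasets = [rng.normal(loc=i, size=100) for i in range(5)]
positions = [1, 2, 3, 5, 6]

fig, ax = plt.subplots()
parts = ax.violinplot(datasets, positions=positions)

# Keep a tick at the empty slot so the gap is visible on the axis.
ax.set_xticks(range(1, 7))
```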

has2k1 · 2014-05-24T22:41:42Z

I think the computation separation will work for python-ggplot. Time permitting, I will go ahead and implement it on top of this branch just to make sure.

tacaswell added this to the v1.4.0 milestone Apr 20, 2014

solvents reviewed Apr 20, 2014

tacaswell reviewed May 17, 2014

solvents and others added 22 commits May 24, 2014 16:47

Added functionality for plotting the actual violin bodies to violinplot.

d9f71fb

Added ksdensity (KDE) function in mlab and adapted _axes violinplot f…

da40c9d

…unction to accept new KDE. Added basic violinplot demo in examples

Added statistics lines to the violinplot function.

b9d7e03

Added horizontal violin plot feature compatible with means, extremas …

3c4c619

…and medians.

Fixed bug for regression test matplotlib#1181 in scipy unit tests; ks…

e6f1b38

…density is now referred to as gaussian_kde and exists as a class in mlab. Fixed list comp position bug and updated examples

Reverted 69b304c.

e8b3041

Re-added points parameter to violinplot.

91a33e8

Fixed several style issues.

Merging for pep8 compliance for mlab's GaussianKDE

690c770

Added violinplot tests.

b98b591

updated version of test_axes.py with violinplot tests

2fa212a

Finished violin plot tests with horizontal feature

14e6402

Initial tests for ksdensity

8761ed4

re worked the tests in test_mlab to work with the modified GaussianKDE

ef89a14

Fixed some style issues. Added violinplot to boilerplate.py

0555ef1

Added comments for test cases referencing the origins.

Added silverman and scott tests.

6277624

implementations of the evaluate tests

39c66a9

Conflicts: lib/matplotlib/tests/test_mlab.py

Fixed kde for invalid bw_methods, misc test formatting.

ca7d605

Some course related tests were added. These are removed later.

Renamed vp_coverage baseline images, added bw_method paramater to vio…

11aee86

…linplot and updated demo

Updated kde tests in test_mlab.py to reflect changes to imports in te…

f189325

…st_mlab.py Fixed a syntax error in python 3 and fixed up some violinplot tests. Fixed some style problems. Removed course-related test from list of tests.

Fixed an issue in violinplot where the top end of violins would be tr…

ad0bcd4

…uncated. Updated test images and reran boilerplate.py

Refactored axes.violinplot into cbook.violin_stats (arranges violin p…

b8db6d2

…lot data for drawing), axes.violin (draws pre-arranged violin plot data), and axes.violinplot (uses cbook.violin_stats to draw violin plots via axes.violin) Updated whats_new.rst. Updated CHANGELOG.

Updated violinplot. Removed pdf and svg test images.

01c3176

tacaswell added a commit that referenced this pull request May 26, 2014

Merge pull request #2996 from solvents/master

687286a

Violin Plots

tacaswell merged commit 687286a into matplotlib:master May 26, 2014

QuLogic removed the status: needs revision label Nov 24, 2016

story645 mentioned this pull request May 7, 2020

We have Violinplots but no kdeplot. #17341

Open
