Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

first draft of geom ribbon #63

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

JorySchossau
Copy link

@JorySchossau JorySchossau commented Apr 12, 2020

[feature request] geom_ribbon
So this is sort of a feature request, but I also took a first stab at implementing it and, behold, here is a working version. Discussion and suggestions very welcome. I still don't understand lots of ggplotnim, like there are some things I am confused about the convention like why ggplot() uses color and fill and everything else changes it to color and fillColor? Or how should the interfacing with the arraymancer backend be done correctly and for consistency? You'll see my messy block of code about this!

Weird things to look for:

  • Added aesyMin, aesyMax and fill them if their aesthetics exist through isSome.
  • Removed the ribbon's line by default ltNone.
  • Repeating myself with some point-positioning code before adding points to end and beginning of linePoints

Example

let
  x = linspace(0.0, 15.0, 100)
  y = x.mapIt(pow(sin(it), 2.0))
  upperError = y.mapIt(it+0.2+rand(0.3))
  lowerError = y.mapIt(it-0.4-rand(0.2))
  df = seqsToDf(x, y, upperError, lowerError)

ggplot(df, aes("x", "y")) +
geom_ribbon(aes(yMin="lowerError", yMax="upperError"), alpha=some(0.3)) +
geom_line() +
ggsave("result.pdf")

@Vindaar
Copy link
Owner

Vindaar commented Apr 13, 2020

First of all thanks a lot! This is great and was definitely missing. It's a really great idea to just insert the min values at the beginning and the max values at the end to avoid having to do weird things to actually fill the resulting polygon!

I think the drawing code can be simplified significantly, but the rest is mostly fine as it is. I'll comment the code specifically.

I still don't understand lots of ggplotnim, like there are some things I am confused about the convention

To be fair, this probably often due to two reasons:

  • things actually don't make sense and are just place holders (e.g. not having a fillColor arg for geom_bar; I saw your branch) or the fact that the settings of geoms don't comply with the names of the corresponding aesthetics (e.g. the aesthetic is fill, but the setting for the equivalent of fill is fillColor)
  • big parts are still heavy WIP and haven't been cleaned up yet.

So given that I'm really happy to see that you were able to do this in the first place. It means to me that probably the code isn't as horrible anymore as it was, haha.

like why ggplot() uses color and fill and everything else changes it to color and fillColor?

Do you mean just the fact I mentioned above? That is the setting being called fillColor instead of fill while the aes is called fill? Or do you mean why internally fill is actually referred to as fillColor?

The former, as I said, is just inconsistent and should be changed.

The latter is by design, because I consider the name fill to be rather inaccurate. It's nice for the user API because int the context it gets the point across, but in random parts of the code something like result.fill would be hard to understand. That's also just the naming ginger uses.

Or how should the interfacing with the arraymancer backend be done correctly and for consistency?

Do you mean the whole when defined(defaultBackend): ... else: ... parts? Well, for the moment that is pretty ugly indeed. Essentially you have to make sure that both code paths define the same symbols. These when parts are only required where the API to access data from the old DataFrame differs from the arraymancer DataFrame.

So for instance in ggplot_drawing where you define aesyMin and aesyMax it's the correct way to handle that unfortunately. To be honest I don't envision to have the old data frame around for long. So that code duplication shouldn't be here with us for too long. If it does turn out to stay, I'll rather extend the DF API of the old data frame, to have procs that mimic the new API (even if the naming is then a little whack).

edit: I accidentally pressed Ctrl+Enter when I wasn't done yet...

@Vindaar
Copy link
Owner

Vindaar commented Apr 13, 2020

Aside from the mentioned changes, it'd be great if you could make your example a recipe, by adding it to recipes.org.

To fully add a recipe you also have to:

  • make sure the recipe saves its plot as a PNG to the media/recipes directory
  • add the resulting recipes .nim file to the recipes directory (just as a reference)
  • add the recipe to be run to recipes/runRecipes.nim
  • add it to tests/tCompareRecipes.nim, which requires:
    • the resulting png file has to be added to the media/expected directory

Especially if you've never written Org files or used ntangle this might be a little weird, so don't worry about it in that case.

And finally a changelog entry would be nice. And feel free to tag the PR and your github tag in the changelog, which will make it easier to thank the contributors when a new full release is done.

If you don't want to do any of the above, it's fine and I'll do it after merging.

And again: thanks a lot for your efforts!

const RibbonDefaultStyle = Style(lineWidth: 1.0,
lineType: ltNone,
color: transparent,
fillColor: grey20)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would propose default fillColor of
const grey70 = color(0.7, 0.7, 0.7),
which you could add above these default styles definitions. The other consts are defined in ginger at the moment. This is pretty annoying, which is why I wanted to change it anyways. At some point it's probably wise to put all those consts for colors (and possibly way more?) into this repository in its own file or something.

With this default grey it would look nice even without alpha.

Comment on lines +499 to +584
if fg.geom.aes.yMax.isSome:
aesyMax = evaluate(fg.geom.aes.yMax.unsafeGet.col, df)
else:
var xT = df[$fg.xcol].toTensor(Value)
var yT = df[$fg.ycol].toTensor(Value)
var aesyMax, aesyMin: Tensor[Value]
if fg.geom.aes.yMin.isSome:
aesyMin = evaluate(fg.geom.aes.yMin.unsafeGet.col, df).toTensor(Value)
if fg.geom.aes.yMax.isSome:
aesyMax = evaluate(fg.geom.aes.yMax.unsafeGet.col, df).toTensor(Value)
if fg.geom.kind == gkRibbon:
if not (fg.geom.aes.yMin.isSome and fg.geom.aes.yMax.isSome):
echo "WARNING: using geom ribbon requires min and max aesthetics!"
for i in 0 ..< df.len:
if styles.len > 1:
style = mergeUserStyle(styles[i], fg.geom.userStyle, fg.geom.kind)
# get current x, y values, possibly clipping them
p = getXY(view, df, xT, yT, fg, i, theme, xOutsideRange,
yOutsideRange, xMaybeString = true)
if viewMap.len > 0:
# get correct viewport if any is discrete
viewIdx = getView(viewMap, p, fg)
locView = view[viewIdx]
if needBinWidth:
# potentially move the positions according to `binPosition`
binWidths = calcBinWidths(df, i, fg)
moveBinPositions(p, binWidths, fg)
pos = getDrawPos(locView, viewIdx,
fg,
p = p,
binWidths = binWidths,
df, i,
prevVals)
if fg.geom.kind != gkRibbon: # we handle the ribbon points separately below
p = getXY(view, df, xT, yT, fg, i, theme, xOutsideRange,
yOutsideRange, xMaybeString = true)
if viewMap.len > 0:
# get correct viewport if any is discrete
viewIdx = getView(viewMap, p, fg)
locView = view[viewIdx]
if needBinWidth:
# potentially move the positions according to `binPosition`
binWidths = calcBinWidths(df, i, fg)
moveBinPositions(p, binWidths, fg)
pos = getDrawPos(locView, viewIdx,
fg,
p = p,
binWidths = binWidths,
df, i,
prevVals)
case fg.geom.position
of pkIdentity:
case fg.geom.kind
of gkLine, gkFreqPoly: linePoints.add pos
of gkRibbon:
# add yMax points to end of linePoints
# and add yMin points to beginning of linePoints
block: # namespace hygiene required maybe for template from getXY?
# add yMax point as next point
p = getXY(view, df, xT, aesyMax, fg, i, theme, xOutsideRange,
yOutsideRange, xMaybeString = true)
if viewMap.len > 0:
# get correct viewport if any is discrete
viewIdx = getView(viewMap, p, fg)
locView = view[viewIdx]
if needBinWidth:
# potentially move the positions according to `binPosition`
binWidths = calcBinWidths(df, i, fg)
moveBinPositions(p, binWidths, fg)
pos = getDrawPos(locView, viewIdx,
fg,
p = p,
binWidths = binWidths,
df, i,
prevVals)
linePoints.add pos
block:
# add yMin point as very first point
p = getXY(view, df, xT, aesyMin, fg, i, theme, xOutsideRange,
yOutsideRange, xMaybeString = true)
if viewMap.len > 0:
# get correct viewport if any is discrete
viewIdx = getView(viewMap, p, fg)
locView = view[viewIdx]
if needBinWidth:
# potentially move the positions according to `binPosition`
binWidths = calcBinWidths(df, i, fg)
moveBinPositions(p, binWidths, fg)
pos = getDrawPos(locView, viewIdx,
fg,
p = p,
binWidths = binWidths,
df, i,
prevVals)
linePoints.insert(pos,0)
else: locView.draw(fg, pos, p.y, binWidths, df, i, style)
of pkStack:
case fg.geom.kind
of gkLine, gkFreqPoly: linePoints.add pos
of gkLine, gkFreqPoly, gkRibbon: linePoints.add pos
Copy link
Owner

@Vindaar Vindaar Apr 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of these changes, I propose the following. We already have readErrorBar, which handles xMin, yMin, ... etc. for error bar plots. This is essentially all that's required. Therefore, the code in this proc can remain as is, instead of the blocks from 540 to 580 and line 584 we just write:

      of gkRibbon: linePoints.addRibbonData(locView, pos, df, i, fg)

where addRibbonData is a proc you define above:

proc addRibbonData(linePoints: var seq[Coord],
                   view: Viewport,
                   pos: Coord,
                   df: DataFrame,
                   idx: int,
                   fg: FilledGeom) =
  ## adds the error bars for the ribbon to the `linePoints`.
  ## TODO: handle stacked positions properly?
  template toC1(val: float, axKind: AxisKind): Coord1D =
    block:
      var res: Coord1D
      case axKind
      of akX:
        res = Coord1D(pos: val,
                      scale: view.xScale,
                      axis: akX,
                      kind: ukData)
      of akY:
        res = Coord1D(pos: val,
                      scale: view.yScale,
                      axis: akY,
                      kind: ukData)
      res

  let (xMin, xMax, yMin, yMax) = readErrorData(df, idx, fg)
  var posMin: Coord
  var posMax: Coord
  posMin.x = if xMin.isSome: toC1(xMin.unsafeGet, akX) else: pos.x
  posMin.y = if yMin.isSome: toC1(yMin.unsafeGet, akY) else: pos.y
  posMax.x = if xMax.isSome: toC1(xMax.unsafeGet, akX) else: pos.x
  posMax.y = if yMax.isSome: toC1(yMax.unsafeGet, akY) else: pos.y
  linePoints.add posMax
  linePoints.insert(posMin, 0)

And remove the added changes at the beginning of the proc.

This has the added benefit of also supporting ribbons for xMin, xMax too (and even weird combinations, which will result in some funky plots I assume), and instead of adding checks along the lines of only x or y related procs, I'd either add:

  • a check for either xMin, xMax / yMin, yMax to geom_ribbon
  • just explain in the doc string for geom_ribbon that the user should either use this or that.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change requires two small changes in different files.

  • in ggplot_types.nim you need to extend the geomKind field for the FilledGeom to include gkRibbon to the gkErrorBar case, so that xMin etc. fields are defined for gkRibbon.
  • in postprocess_scales.nimin fillOptFields you have to do the same, add gkRibbon to the gkErrorBar case so that these fields are actually filled and we don't have to access the aes of the raw geom.

@JorySchossau
Copy link
Author

JorySchossau commented Apr 13, 2020

Thanks for the detailed feedback - just what I was looking for.

It's a really great idea to just insert the min values at the beginning and the max values at the end to avoid having to do weird things to actually fill the resulting polygon!

Hah, I appreciate the positivity, but I'm not so sure, since insertions are probably quite costly (looking) yep, it's a while loop O(n) for each insertion. I'll probably refactor to resize linePoints and then only fill the back with the reversed min values all at once at the end. Or something. There are options.

I think the drawing code can be simplified significantly, but the rest is mostly fine as it is. I'll comment the code specifically.

Great! That's what I was hoping for.

So given that I'm really happy to see that you were able to do this in the first place. It means to me that probably the code isn't as horrible anymore as it was, haha.

True! But it took me way longer than I thought just to make sense of what was going on. It would be nice to have a rough guide for each of how to add geoms, shapes, themes, etc. but I understand the code is in flux so it's not yet worth the effort of making a quickly outdated guide.

Do you mean just the fact I mentioned above? That is the setting being called fillColor instead of fill while the aes is called fill? Or do you mean why internally fill is actually referred to as fillColor?
The former, as I said, is just inconsistent and should be changed.
The latter is by design, because I consider the name fill to be rather inaccurate. It's nice for the user API because int the context it gets the point across, but in random parts of the code something like result.fill would be hard to understand. That's also just the naming ginger uses.

Yes, the ggplot2 inconsistency, and that it was being followed only sometimes. I'm totally on board with using color and fillColor since that inconsistency 'feature' of ggplot2 always bothered me, not to mention it made it harder to teach!

Or how should the interfacing with the arraymancer backend be done correctly and for consistency?

Do you mean the whole when defined(defaultBackend): ... else: ... parts? Well, for the moment that is pretty ugly indeed. Essentially you have to make sure that both code paths define the same symbols. These when parts are only required where the API to access data from the old DataFrame differs from the arraymancer DataFrame.

No, the when defined for backends makes sense and I don't see a way around it. In retrospect after reading your suggestions, I understand now that with an earlier implementation where I was indeed trying to use readErrorBar I was getting confused because I didn't understand that geoms were variants, and so couldn't untangle why accessing the aes data was so difficult, chalking it up to some arcane principle of the backend. After resolving it I didn't go back and try readErrorBar again! oops. I'm still unfamiliar with how arraymancer tensor handling works and will have to read up on it.

Question on backends:
I noticed rather late that compiling with the defaultBackend is quite a bit faster, which helps the debugging cycle be less annoying. Is there some way to decrease compile times when using arraymancer? Maybe linking to a shared library, or some such? I'm looking at 20+ seconds on my poor laptop. Refactoring my nim example into a separate module called from a dumb entrypoint that had already included ggplotnim helped somewhat.

I would propose default fillColor of...

I'll do that.

make your example a recipe, by adding it to recipes.org

I'll do that - org files and tangling don't look too complicated.

Thanks again for responding so quickly and giving detailed feedback. I'll get to this on the weekend at the latest.

@Vindaar
Copy link
Owner

Vindaar commented Apr 13, 2020

Hah, I appreciate the positivity, but I'm not so sure, since insertions are probably quite costly (looking) yep, it's a while loop O(n) for each insertion. I'll probably refactor to resize linePoints and then only fill the back with the reversed min values all at once at the end. Or something. There are options.

Oh well sure, the implementation using insert isn't the most performant that's true. I was however referring to the idea of essentially building a seq that contains an open loop of the yMin, yMax data, like:

yMax/x[0]--------------------------------yMax/x[^1]
                                                  |
                                                  |
                                                  |
yMin/x[0]--------------------------------yMin/x[^1]                             

which is neat, because then the filling of the polygon is natural.

Feel free to optimize it, otherwise I'll think of something.

It would be nice to have a rough guide for each of how to add geoms, shapes, themes, etc. but I understand the code is in flux so it's not yet worth the effort of making a quickly outdated guide.

So lately I feel like the current design has proven to be performant enough, while being relatively elegant in terms of DRY violations etc. (which doesn't imply simple though).
This means I could think about writing something along those lines. I had in mind to write a document that explains the GgPlot creation, collection of scales, postprocessing and drawing phases anyways. If I do that correctly I can refer to that in simple explanations for such how-tos.

Yes, the ggplot2 inconsistency, and that it was being followed only sometimes. I'm totally on board with using color and fillColor since that inconsistency 'feature' of ggplot2 always bothered me, not to mention it made it harder to teach!

I suppose there's some argument to be had about improving some things from ggplot2, which may not be intuitive. But for the time being I'd rather stay compliant with it. And if only because I feel like Hadley Wickham (and of course other ggplot2 contributors) know a lot more about these things than I do. If I start to make changes, because I believe I know better might lead to bad decisions in the long run.

where I was indeed trying to use readErrorBar I was getting confused because I didn't understand that geoms were variants

It probably doesn't help that there's a Geom and a FilledGeom object. So yeah, sorry for the lack of documentation about the internals. :) In those cases though, please feel free to open an issue and just ask about these things. I'll gladly provide detailed explanations about things, if I know a person is actually interested. That would hopefully avoid a lot of time wasting for people.

I noticed rather late that compiling with the defaultBackend is quite a bit faster, which helps the debugging cycle be less annoying. Is there some way to decrease compile times when using arraymancer? Maybe linking to a shared library, or some such? I'm looking at 20+ seconds on my poor laptop.

This was indeed something I thought about originally when I wrote the first data frame implementation. It wasn't a major aspect that changed my mind on whether to use arraymancer or not for it, but I knew from the code for my PhD that arraymancer does unfortunately take some time to compile.
Maybe @mratsim can chime in. I haven't checked, but I can imagine what might help if in arraymancer we could have smaller submodules that can be imported completely independently. E.g. just import arraymancer_tensor (Note: import arraymancer.tensor or import arraymancer/tensor doesn't work, iirc).
While we do have dead code elimination I believe Nim still compiles more files than actually necessary when I compile ggplotnim now.
That would certainly be the easiest way to improve compilation times. However, 20 seconds really seems like a long time. But that's only for the first compilation, no? The ones afterwards should be significantly quicker?

I'll get to this on the weekend at the latest.

Yeah, no rush!

And also, feel free to ping me on Nim's IRC channel or just send me a message on gitter.

@mratsim
Copy link

mratsim commented Apr 23, 2020

I actually tried to allow import arraymancer/tensor/tensor but I think I need to reorganize that and fight nimble a bit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants