
Goals of the API #1

Open
nical opened this issue Sep 23, 2018 · 40 comments

Comments

@nical

nical commented Sep 23, 2018

The shape of a good 2D rendering API really depends on the things it wants to be particularly good at.
For example:

Low level / high level

Should the primitives be simple "fill a solid color path", "stroke a circle" type commands with the optimization of the drawing commands left up to the user, or should it be a high level abstraction that internally optimizes the drawing commands, performs culling, supports complex masking and clipping, filter effects, skeletal animation, etc.?

Immediate / retained

  • Some APIs are designed around making it easy to render a single frame and start over from scratch at the next frame (Cairo, Skia, D2D, and the whole immediate-mode family). Those were typically designed to render somewhat static UIs and documents, and are usually very convenient to use (but it is hard to make good use of the GPU with them).
  • Some use more retained models that take advantage of the idea that, for animated/interactive content, a frame is often similar to the previous one, so a lot of work and data transfer can be avoided at the cost of some bookkeeping. Scaleform and WebRender are examples of this approach (WebRender in particular started with a model that didn't retain a lot and increasingly moves toward retaining as much as possible).

What are the things it needs to be really good at?

Simplicity / high-quality antialiasing / low-quality but fast anti-aliasing / large paths covering a lot of pixels with a lot of overdraw / small paths like text / small scenes / very large scenes with thousands of elements / etc.
Saying "fast" isn't enough, because all approaches have trade-offs that make them good with certain kinds of content and bad with other kinds.

Just some food for thought. It is fair to experiment with different directions, but once things need to settle there are always trade-offs to make. There isn't a single approach that is optimal at doing all of the above, and the goals of something like this inevitably affect the choice of the underlying architecture and algorithms.

(I'm personally most interested in something that would be games/interactivity oriented with an API articulated around the specifics of the rendering technique used, rather than trying to fit a GPU renderer behind, say the canvas API.)

@derekdreery
Member

derekdreery commented Sep 23, 2018

Thanks for this.

I'm personally most interested in something that would be games/interactivity oriented with an API articulated around the specifics of the rendering technique used, rather than trying to fit a GPU renderer behind, say the canvas API.

Could you elaborate on what this means? Do you know of any existing APIs (in any language) that I could look at as an example?

Personally I was thinking of something very simple, where the first task is to be able to blit very fast and efficiently, so for example you don't copy the same texture over many times. To start with I'd probably just work on getting triangles and quads with either gradients or a bmp, then add in other things.

The way I saw each draw pass working is that the user specifies a list of changes and the library works out what work needs to be done to enact those changes (moving textures, vertex buffers etc.) before rendering the next frame.

In the future you could add an immediate mode like conrod where you diff against the previous state on the cpu, and then only send stuff that's changed to the gpu.
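
A rough sketch of what such a change list could look like (all names here are hypothetical, purely to illustrate the shape of the idea, not an existing API):

```rust
// Hypothetical sketch of the "list of changes" idea: none of these names
// exist in any crate yet, they only illustrate the shape of such an API.
pub struct TextureId(u64);
pub struct QuadId(u64);

pub enum Fill {
    Texture(TextureId),
    LinearGradient { from: [f32; 2], to: [f32; 2], stops: Vec<([f32; 4], f32)> },
}

pub enum SceneChange {
    /// Upload a texture once; later changes only reference its id.
    AddTexture { id: TextureId, pixels: Vec<u8>, width: u32, height: u32 },
    /// Add a quad filled with a texture or gradient.
    AddQuad { id: QuadId, rect: [f32; 4], fill: Fill },
    /// Move an existing quad without re-uploading anything heavy.
    MoveQuad { id: QuadId, rect: [f32; 4] },
    /// Remove a quad; the library decides when GPU resources can be freed.
    RemoveQuad { id: QuadId },
}

// Each frame the user hands the renderer a batch of changes; the library
// works out which uploads, buffer moves and draw calls are actually needed:
// fn apply_changes(renderer: &mut Renderer, changes: &[SceneChange]) { ... }
```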

However I'm by no means an expert, and would happily defer to someone with more knowledge. I guess I'd want the high-level goal to be to build the simplest possible API over the complexity of managing a heterogeneous collection of hardware, one that would allow higher-level stuff like D2D to be built on top.

@derekdreery
Member

Also how would this project be different from/interact with lyon?

@nical
Author

nical commented Sep 23, 2018

Lyon at the moment is only providing some tessellators to convert paths into meshes. You can then render the meshes provided by lyon using OpenGL, gfx-rs, glium, etc. but there's still quite a bit of code to write to get actual pixels on the screen.
The tessellators would be useful to use here, in my opinion.
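
(For context, a minimal lyon fill-tessellation call looks roughly like the following. The exact builder and tessellator signatures have changed across lyon versions, so treat this as a sketch rather than the exact API.)

```rust
use lyon::math::point;
use lyon::path::Path;
use lyon::tessellation::{BuffersBuilder, FillOptions, FillTessellator, FillVertex, VertexBuffers};

fn main() {
    // Build a simple path: a triangle with one curved edge.
    let mut builder = Path::builder();
    builder.begin(point(0.0, 0.0));
    builder.line_to(point(100.0, 0.0));
    builder.quadratic_bezier_to(point(100.0, 100.0), point(0.0, 100.0));
    builder.close();
    let path = builder.build();

    // Tessellate it into vertex/index buffers that any GPU API can consume.
    let mut geometry: VertexBuffers<[f32; 2], u16> = VertexBuffers::new();
    let mut tessellator = FillTessellator::new();
    tessellator
        .tessellate_path(
            &path,
            &FillOptions::default(),
            &mut BuffersBuilder::new(&mut geometry, |v: FillVertex| v.position().to_array()),
        )
        .unwrap();

    println!("{} vertices, {} indices", geometry.vertices.len(), geometry.indices.len());
}
```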

@derekdreery
Member

derekdreery commented Sep 23, 2018 via email

@nical
Author

nical commented Sep 23, 2018

I'm mostly talking about having a bottom-up approach to this: start from a rendering technique and iterate/build the API around it. For example, I have a certain bias towards geometry-based approaches (I spent so much time working on path tessellation). So I'd take an efficient way to render paths as meshes and build the API from that. For example:

  • Batching is very important, so let's make sure it is either exposed explicitly in the API, or let's design the API so that batching can be computed efficiently internally and it is hard for the user to accidentally produce content that doesn't batch well.
  • One thing that works really well in WebRender is using the depth buffer to render opaque objects front to back in a first pass and then blended objects back to front. The depth buffer saves a ton of memory bandwidth and avoids shading a lot of pixels that end up covered by something else. If the API lets you take advantage of this trick, or is designed so it knows how to reorder commands and do this, it's a net win (see the sketch after this list).
  • If we use the depth buffer, the stencil buffer kinda comes for free. Let's make it available in the API in a useful way.
  • Instancing is really nice and useful, let's make sure it is exposed in a way that maps directly to how instancing is done in Vulkan and the like, to avoid complex code between the API and the GPU commands.
  • Anti-aliasing is hard. There are techniques from games that don't match the quality of cairo or skia's anti-aliasing, but are fast and good enough for a lot of use cases, including tricks such as using previous frames to refine the anti-aliasing. Deciding what level(s) of quality are desired and building the API in a way that makes the chosen approach simple to implement and control is good.
  • Sometimes it's tiny differences, like having a flag to tell the API what things change often and what things are static so that it can organize things efficiently in memory (a flag like this doesn't make much sense in cairo, where there isn't a notion of frame, but for a game it would).
  • With Vulkan and friends there's this cool notion of being able to build commands from multiple threads. It'd be cool to take advantage of that and expose it somehow.
  • etc.
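
To make the opaque/blended two-pass idea from the list above concrete, here is a minimal sketch; `Gpu`, `DepthState` and `Batch` are placeholders rather than an existing API, and it assumes a smaller z means closer to the viewer:

```rust
struct Batch {
    z: f32,       // depth assigned to the primitives in this batch
    opaque: bool, // fully opaque content can use the depth buffer for occlusion
    // plus pipeline, vertex ranges, textures, ...
}

struct DepthState { test: bool, write: bool }

struct Gpu;
impl Gpu {
    fn set_depth(&mut self, _state: DepthState) { /* would configure the pipeline's depth state */ }
    fn draw(&mut self, _batch: &Batch) { /* would record a draw call */ }
}

fn draw_frame(gpu: &mut Gpu, batches: &[Batch]) {
    // Pass 1: opaque batches, nearest first, depth test and write enabled, so
    // pixels covered by closer opaque content are rejected before shading.
    let mut opaque: Vec<&Batch> = batches.iter().filter(|b| b.opaque).collect();
    opaque.sort_by(|a, b| a.z.partial_cmp(&b.z).unwrap());
    gpu.set_depth(DepthState { test: true, write: true });
    for batch in opaque {
        gpu.draw(batch);
    }

    // Pass 2: blended batches, farthest first, depth test only (no write), so
    // they respect the opaque content's depth but still blend among themselves.
    let mut blended: Vec<&Batch> = batches.iter().filter(|b| !b.opaque).collect();
    blended.sort_by(|a, b| b.z.partial_cmp(&a.z).unwrap());
    gpu.set_depth(DepthState { test: true, write: false });
    for batch in blended {
        gpu.draw(batch);
    }
}
```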

APIs like Cairo and Skia were originally designed around the strengths and weaknesses of the typical CPU path rasterization algorithm. If we start from scratch and do something for GPUs only, then different choices can be made that would help make things efficient and more importantly keep the implementation simple. This is as opposed to taking a familiar API and implementing a GPU renderer for it.

Would you like to see drawing primitives like svg here, or something else?

From a usability standpoint, I think that SVG paths are nice in the sense that there are many good tools to create them, so there's value in making it easy for people to turn an SVG path into something that is usable with the 2D rendering API. But SVG is a very large specification, and making an actual SVG renderer is in my opinion out of scope. For example, cairo isn't an SVG renderer but it is used in several SVG renderers.

Do you know of any existing APIs (in any language) that I could look at as an example?

Tough question. Unfortunately (in my opinion) a lot of 2D rendering libraries tend to either have been designed in the olden days or want to replace the older libraries and are shaped similarly. WebRender is interesting in its approach in the sense that it looks at the specific problem of rendering CSS interactively and tries to find the most efficient use of the GPU to that end. The notion of interactivity is built into the API (animations, scrolling), it has notions of low-priority and asynchronous updates to do expensive work without breaking smooth scrolling, etc. A game-oriented 2D API would look very different because the problems are not the same but I like the overall approach of looking at a specific and well defined problem, and doing it well by tapping directly into GPU APIs.

Personally I was thinking of something very simple, where the first task is to be able to blit very fast and efficiently, so for example you don't copy the same texture over many times. To start with I'd probably just work on getting triangles and quads with either gradients or a bmp, then add in other things.

That sounds reasonable, but in my opinion this already needs a notion of batches and render passes, or needs to be low-level enough that they could be implemented on top without overhead.

@nical
Author

nical commented Sep 23, 2018

I don't want to make it sound like things need to be super ambitious. It's perfectly fine to say "I want something super simple that can fill and stroke shapes with simple patterns reasonably efficiently and that's it".
I brought all of this up because I believe that it's good to set some requirements/goals in order to figure out something that is both useful to the people who make or want it, and also isn't over-complicated by an architecture that tried to do it all at once without taking some stances. I'd rather have something simple and opinionated that is very good at one thing, than something that is okay-ish at a lot of things.
It's also fair to say: let's experiment with a whole bunch of different things in parallel and then look at what pans out to decide what the real thing should be like. Nothing needs to be set in stone now, but gathering initial motivation is a good start IMO.

@derekdreery
Member

Apologies I think I fell asleep during the last message (which I've deleted) :P. I was just going to say thanks, I'll think about what you've said and try to firm it up into a proposal.

@derekdreery
Member

How's this as a statement of intent?

This will start as a research project. We will try to solve the problem "what's the simplest API that allows performance close to hand-written code". We will target the use case of games and highly interactive applications, although if we manage state well we should provide good performance for more static content.

Once we have a good 2D triangle-drawing system, and are handling transparency well, we can look at providing a simple drawing API, stroking and filling paths, etc. Then we can look at performance, for example spotting shapes which are the same up to a linear transformation and experimenting with doing the transformation on the GPU.

@derekdreery
Member

Also this could be split into multiple crates, maybe one just abstracting gfx, and then others that work out the batching for you.

@nical
Author

nical commented Sep 26, 2018

Sounds reasonable. I have some thoughts about the low level abstractions that I think make sense for this. I'll write them up some time in the next few days.

@derekdreery
Member

derekdreery commented Sep 26, 2018

Awesome, thanks - I probably lack some knowledge of the optimal implementation strategy.

@raphlinus

This is a topic I have strong thoughts and opinions about, in addition to some experience implementing similar things. I am hopefully a client of such an API, both in the game I'm writing, and text painting in front-ends for xi-editor (initially xi-win but potentially more). I'm happy to participate in the discussion.

Pulling back and looking at the crate layering, what I'd really, really like to see at the top level is a crate that provides a 2D graphics API abstraction, with the potential for multiple back-ends. I'm doing my prototyping now calling into direct2d, and that's working well for me. In the case of Windows, there are advantages to using the system's 2D graphics API, including small code size and minimal compile time. The tradeoffs will be different on different systems.

Another important back-end for such an abstraction would be web canvas. I think one of the great potentials for Rust is the ability to write code that can run well on both native and web deployments. The web canvas API is similar to APIs such as direct2d and cairo, and if you think about applications such as charting, it makes sense for code to be portable.

In this picture, it sounds like the proposed draw2d crate would fit in as a back-end. One of the implications of having a multi-implementation abstraction is that it would be possible to compare different back-ends for performance and fidelity (ideally having a benchmark suite). I'd love for draw2d to eventually replace most of the back-ends.
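
As a rough illustration of the kind of multi-backend abstraction meant here (every name below is hypothetical, not a proposal for the actual trait):

```rust
// Hypothetical sketch of a backend-agnostic 2D API. Direct2D, web canvas,
// cairo or a GPU renderer would each provide an implementation of the trait.
pub struct Color { pub r: f32, pub g: f32, pub b: f32, pub a: f32 }
pub struct PathData { /* move-to / line-to / curve-to commands */ }

pub trait RenderContext2D {
    type Error;

    fn fill_path(&mut self, path: &PathData, color: Color) -> Result<(), Self::Error>;
    fn stroke_path(&mut self, path: &PathData, color: Color, width: f32) -> Result<(), Self::Error>;
    fn draw_text(&mut self, text: &str, x: f32, y: f32, size: f32, color: Color) -> Result<(), Self::Error>;
    fn finish_frame(&mut self) -> Result<(), Self::Error>;
}

// A charting or UI crate written against `RenderContext2D` could then run on
// Direct2D on Windows, on canvas on the web, and on draw2d elsewhere.
```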

Some general observations. It certainly is true that most 2D APIs have an older feel: very stateful, not well suited to multithreading, etc. From my observations, most of them derive from Java2D, which in turn owes quite a bit to PostScript. I think it's possible to do better with a modern approach. Skia (and Android's libhwui before it) did a fair amount of work to reorder to improve batching, which helps, but it might be even better to expose a closer-to-the-metal interface up through to clients so you don't have to do as much work on the CPU tracing a graph to find out which reorderings are valid, etc.

Regarding the choice between immediate and retained, there is an intermediate position which may be interesting: explicit command lists. Many 2D APIs implement these in some form (CommandList in Direct2D, SkPicture in Skia, recording surfaces in Cairo). A lot of the time these are implemented as just a serialization of the drawing operations, but I think it's interesting to consider doing some "baking" so they can be replayed more efficiently. As an example, an arbitrary path may be tessellated during recording.
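
A minimal sketch of such a "baked" command list, with all names made up for illustration; the `tessellate` closure stands in for lyon or any other tessellator:

```rust
pub struct Vertex { pub pos: [f32; 2] }

#[derive(Clone, Copy)]
pub struct PaintId(pub u32);

pub struct BakedCommandList {
    vertices: Vec<Vertex>,
    indices: Vec<u32>,
    // Each recorded fill is just an index range plus a paint id at replay time,
    // so replaying the list is only buffer binds and draw calls.
    draws: Vec<(std::ops::Range<u32>, PaintId)>,
}

impl BakedCommandList {
    pub fn new() -> Self {
        BakedCommandList { vertices: Vec::new(), indices: Vec::new(), draws: Vec::new() }
    }

    /// Record a filled path. `tessellate` produces triangles directly into
    /// this list's buffers during recording ("baking").
    pub fn fill_path(
        &mut self,
        paint: PaintId,
        tessellate: impl FnOnce(&mut Vec<Vertex>, &mut Vec<u32>),
    ) {
        let start = self.indices.len() as u32;
        tessellate(&mut self.vertices, &mut self.indices);
        let end = self.indices.len() as u32;
        self.draws.push((start..end, paint));
    }
}
```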

The concept of "layer" from Flutter might also be interesting. In Flutter, a layer might be implemented as a command list, or it might be rendered into a texture; this decision is made dynamically and using heuristics. The talk on Flutter's Rendering Pipeline gives more motivation for this.

Another major, major aspect of a 2d graphics library is text rendering. The traditional model is to support a texture atlas and expect other modules to (a) render from outline fonts into the atlas and (b) do shaping of text, selecting glyphs and x,y positions given a string with attributes. Again, this is a bit of a stale model, and does a very poor job in cases such as animating text through smooth changes of transform (or, to be even more advanced, variable font parameters). Newer approaches such as Pathfinder can push a lot of the rendering farther down the pipeline. Regarding shaping, I think it's totally out of scope for something like this crate, but in scope for higher levels - if you're using direct2d to paint graphics, then it makes perfect sense to use directwrite for both text layout and painting as well.

I hope these ramblings are interesting, and am very interested in following the progress.

@nical
Author

nical commented Oct 9, 2018

Thanks @raphlinus!

I think it's possible to do better with a modern approach. Skia (and Android's libhwui before it) did a fair amount of work to reorder to improve batching, which helps, but it might be even better to expose a closer-to-the-metal interface up through to clients so you don't have to do as much work on the CPU tracing a graph to find out which reorderings are valid, etc.

This is the one thing I would like to get right first. I don't think many people have the time and energy to do something half as good as skia under the same design constraints, but if these design constraints are lifted I think we can do something that is worth the work and will be more appealing than a rust wrapper for something like skia for at least some use cases.
Having had my nose in Firefox's canvas2d code for a while, I pretty much think of it as a big collection of design choices to avoid.

In what aspects the API should be different is something I am still thinking about, but in general my process is to identify what's expensive in the way I would implement a 2d renderer, see if it can be retained and generated asynchronously and in parallel. For example turning a postscript-like representation of a collection of paths into something more directly usable by the GPU (tessellation, etc) is something that would be useful to let users do asynchronously and in different threads, rather than having it done synchronously by the implementation towards the end of a frame.
Batching is definitely something that needs careful consideration. Ideally it should interact well with culling so that minor modifications of the scene don't require invalidating many things. I like the concept of retained command lists with a little twist: animated properties which are values that you can change (transforms, colors, etc) without re-generating the heavier parts of the command list. Typically they end up just being updates to a bunch of values the shaders read from a buffer. This concept of animated properties works great in webrender, it's simple and efficient thanks to being a byproduct of how the rendering is implemented.
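
A small sketch of the animated-property idea, loosely inspired by WebRender's property bindings but with hypothetical names:

```rust
// The command list stores a binding (an index into a GPU-visible buffer)
// instead of a literal value, so animating a transform or color only rewrites
// one slot in that buffer and the command list itself is untouched.
#[derive(Clone, Copy)]
pub struct PropertyBinding(pub u32); // index into the property buffer

pub struct PropertyBuffer {
    values: Vec<[f32; 4]>, // colors, or rows of a transform, packed as vec4s
}

impl PropertyBuffer {
    pub fn new() -> Self { PropertyBuffer { values: Vec::new() } }

    pub fn add(&mut self, initial: [f32; 4]) -> PropertyBinding {
        self.values.push(initial);
        PropertyBinding(self.values.len() as u32 - 1)
    }

    /// Called every animation frame; only this small buffer is re-uploaded.
    pub fn set(&mut self, binding: PropertyBinding, value: [f32; 4]) {
        self.values[binding.0 as usize] = value;
    }
}
```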

Pulling back and looking at the crate layering, what I'd really, really like to see at the top level is a crate that provides a 2D graphics API abstraction, with the potential for multiple back-ends.

I'd like that too, but I'd like to take a bottom-up approach and first design a simple, low-level library around the strengths and weaknesses of modern GPUs (and their modern APIs), and then figure out a higher-level abstraction that can be implemented on top of other APIs and accepts the trade-offs that come with that.
Building the portability abstraction certainly should feed back into the design of the lower-level crate, but in my experience it is hard to go from high level to low level without losing simplicity and efficiency.

Getting webrender shipped and moving to a different city at the same time are draining 110% of my energy right now, so I'm being a bit quiet and not writing a lot of code to back my claims, but I should have the bandwidth to prototype these things in a few months. In the meantime, I encourage anyone to chime in, write code and bikeshed away! I don't want to sound like I'm controlling this thing, but unless someone else writes some code, that's the direction I'll try to push this towards :).

By the way, Flutter's really interesting and I learnt about it by watching this talk you recommended in one of your own presentations, so thanks for that.

@derekdreery
Member

I'm also quietly working away on getting a good grasp of vulkan and gfx-ll, I'm very much still working on this, but feel that I probably have some work to do to bring up my level of expertise. As a first exercise, I'm converting the gfx quad example to take a path, tessellated through lyon.

I see my role in this a bit like an apprentice, helping with the work and learning from people with more knowledge & experience. So I'm not at all precious about design decisions.

@federicomenaquintero

Would you be interested in a braindump of what works and what doesn't work well in Cairo and its clients? (Maybe in a different issue than this one? Would you prefer a link to a blog post or something?)

I'm the maintainer of librsvg, which uses Cairo, but is mostly written in Rust these days. I was very involved in the transition from Gtk2 (X11 drawing) to Gtk3 (Cairo-based drawing), and have been watching the transition away from Cairo for Gtk4 from the sidelines. I was the author of gnome-canvas, a rather well-loved and well-hated retained drawing API based on @raphlinus's libart, which later got replaced with Cairo.

@nical
Author

nical commented Oct 12, 2018

@federicomenaquintero I'd love to hear your thoughts on whichever medium you prefer.

@boustrophedon

I have some small experience in this area and will be available to work on it for the next 2-ish months along with some other projects (and I also would like to use it in a UI library eventually).

@nical Regarding "animated properties which are values you can change [...] updates to a bunch of values the shader reads", my first thought was SkPaint. I don't remember exactly how much caching Skia does internally but I think it does do less work if you present it with a re-used SkPath and a different SkPaint.

@federicomenaquintero I'd also be very interested in reading a blog post or other form of info dump about cairo.

I had these two little thoughts:

  • It would be nice to have, probably as a separate crate on top of draw2d, an interface exposing the usual "immediate-mode/turtle-graphics" canvas API, because a lot of people are familiar with it.
  • It would also be nice to have a fully-featured stable C API.

@raphlinus

I would also love to read @federicomenaquintero's thoughts. As he mentioned, he and I go way back on this stuff. Certainly 2D rendering has come a long way since the days of libart! I suspect such a retrospective would kick off fertile discussion, which again I'd love to participate in.

I should also point this thread to the blog I wrote about my goals for a higher-level crate, and the Reddit discussion. Perhaps that overlaps with @boustrophedon's first bullet point.

@est31

est31 commented Oct 13, 2018

Given this was featured on reddit, let me chime in. My main concern is stability across configurations, including possibly older graphics hardware. Thus, I think that the presence of a CPU fallback is important. The stability ambitions of this crate shouldn't be lower than those of Firefox, for example, which only rolls out WebRender to a few known configurations and maintains a CPU fallback for the remainder. If this crate could offer a unified API over both a CPU implementation and a GPU one, maybe including logic to choose which one to use, it would be really great.

I tried out limn a few months ago but it had weird crashes, and they were all WebRender bugs... my hardware (HD Graphics 530, Skylake GT2) is currently not whitelisted by Firefox for use. My main use case for this library would be to either build or use a GUI system that uses it.

@boustrophedon

Given that a lot of the conversation is around rendering for GUIs, it might be useful to summarize the techniques used in contemporary GUI rendering engines like Flutter, GTK/GSK, Qt, and others (WebRender? Enlightenment/Evas?).

I think in general they all use some sort of scene graph, with varying levels of control over how the scene graph is stored - e.g. as Raph mentioned, due to how Skia is architected, Flutter uses Layers that abstract over Skia command lists.

Given some pointers I could at least attempt to start a document. Qt has some relatively good documentation; Flutter, as mentioned previously, has some very informative talks on their architecture; GTK has some scattered documentation; I hadn't looked until just now, but it seems that Enlightenment has some pretty good documentation.


As a separate thread, another nice thing to have for use in GUI libraries would be built-in hit testing (for mouseover etc.).

@nical
Author

nical commented Oct 15, 2018

@nical Regarding "animated properties which are values you can change [...] updates to a bunch of values the shader reads", my first thought was SkPaint. I don't remember exactly how much caching Skia does internally but I think it does do less work if you present it with a re-used SkPath and a different SkPaint.

Skia caches the tessellation of the path (the path turned into a triangle mesh). There are plenty of other expensive parts of the pipeline that could be reused across frames but aren't, because the API makes it hard for the underlying implementation to figure out what changed and how.
In addition to caching expensive things, I think that a modern API should let the user do these expensive operations asynchronously, prior to submitting the drawing commands, to avoid the risk of blowing the frame budget the first time a complex path is submitted.
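
A toy sketch of what "do the expensive work asynchronously before submission" could look like on the application side, using plain threads and placeholder types:

```rust
use std::sync::mpsc;
use std::thread;

struct Mesh { vertices: Vec<[f32; 2]>, indices: Vec<u32> }

fn tessellate_expensive_path(path_id: u64) -> Mesh {
    // Stand-in for a real tessellation pass over a complex path.
    let _ = path_id;
    Mesh { vertices: vec![[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]], indices: vec![0, 1, 2] }
}

fn main() {
    let (tx, rx) = mpsc::channel();

    // Kick off tessellation for paths we know we'll need soon (e.g. while the
    // content is still off-screen), without blocking the frame loop.
    for path_id in 0..4u64 {
        let tx = tx.clone();
        thread::spawn(move || {
            tx.send((path_id, tessellate_expensive_path(path_id))).unwrap();
        });
    }
    drop(tx);

    // By submission time the meshes are ready; no tessellation happens inside
    // the frame budget.
    for (path_id, mesh) in rx {
        println!("path {}: {} vertices, {} triangles", path_id, mesh.vertices.len(), mesh.indices.len() / 3);
    }
}
```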

@derekdreery
Member

I've been playing with flutter to get a feel for their approach. I've had a pleasant experience, but I feel sometimes they take control away from you - you can't write your own shaders or anything. I'm going to investigate skia next to see what extra stuff you have access to.

As a separate thread, another nice thing to have for use in GUI libraries would be built-in hit testing (for mouseover etc.).

I think this is important for accessibility as well.

@nical
Author

nical commented Oct 15, 2018

There's a lot of stuff going on in this thread, so I created dedicated issues for UIs and games in the hope that separating the discussion will make the information easier to find and manage. There are other areas which we can get to in their own issues as they come up. We can also use pull requests to this repo if anyone is motivated to write up an analysis of existing techniques in an RFC fashion.

@est31

est31 commented Oct 16, 2018

The intersection between the two, UIs for games, is also very interesting I think. E.g. Citybound recently switched to using a web-based UI system for their otherwise Rust game.

@siriux

siriux commented Oct 22, 2018

Hi @nical, thank you for bringing this up!

I've been working for a few months on something really similar to what you are proposing here because I need it for a project of mine. So let me explain my approach to see if it can be helpful for others too.

For my use case I need a mix of UI and vector graphics that can be animated (including things like 2D games), so this fits perfectly with your proposal. Also, the UI and graphics are defined on the fly (like a browser that only knows the page dynamically).

My approach is also bottom up, and it's heavily based on ideas from Pathfinder (and also WebRender and Lyon), but at the same time the core idea is really different.

In my design I have a single shader capable of rendering all the base pieces needed for UI and vector graphics (a single piece is a GPU entity). Having a single shader allows you to order your elements as you described for opaque and transparent passes and execute it in a single batch, as you don't need to change shaders every time the next primitive needs a different shader. At the same time, you need a shader that is simple enough to be fast, as a really big uber-shader can be slow.

We can think of the basic pieces for the shader as two (a few more in reality, for optimizations): an axis-aligned rectangle for UI and a simple quadratic bezier curve. For both of these pieces you can define a border, including dashed borders, and a color, gradient or texture(s) for the fill and the border.

And here we have the main difference with Pathfinder's approach: we are able to render curves with border and dashing as a primitive (without tessellation), and we also integrate axis-aligned rectangles to be able to render UI elements really fast and to perform compositing at the same time.

The trick to render borders in a single primitive is to use pentagons for the entities instead of triangles or something else, which allows you to easily render the border and the fill at the same time. For the aligned rectangles we just degenerate one vertex.

To be able to render dashed borders directly, I use a fast approximation of the normal to the curve at a given point, which works reasonably well in most cases (for twisted curves with thick borders or at large zoom the quality is not perfect, and semi-transparent borders of twisted curves need a two-step fallback).

Of course this changes all the math from Pathfinder, but I'm quite happy with the results and the number of operations required.

Once you have the basic pieces working, you need a tessellator (similar to lyon, but many things will need to be reimplemented to meet the assumptions of the new math).

Finally, you need to compose all of this in a single flow. In my design I have the screen buffer and multiple offscreen (intermediate) buffers. To render a frame you use the same shader all the time but you change the blending and z-buffer config, as well as the source and target buffers.

For example, you can render text (or some tiles of it) into a buffer, and in the next pass render an axis-aligned rectangle that uses part of the text as a texture, mixed with another texture for the background and a color for the border. The basic idea is that you have up to three passes per buffer (masking, opaque and transparent) and you can do this as many times as needed with different buffers before the final pass to the screen buffer. This of course allows caching of already rendered things.

Now let's move to animations. We said that our pieces are GPU entities, but there is more to it. We have simple pieces, like axis-aligned rectangles, that are described only by the information contained in the entity, and this is great because they take little memory and they can be incredibly fast. But if we want to represent complex things as a single entity too, and we don't want to pay the memory overhead for the simple ones, we need to store this extra information somewhere else.

In my design we store it in additional textures that we index with info provided by the entity itself. This is really good, but it also opens the door to optimized animations.

The idea is that you can have two equivalent ways to define the same piece of extra information: one that is static, and another that is dynamic. The difference is which texture is used to read it. This way we can put all the static information in one texture, and all the info that can potentially change during the animation in another texture. Then, during the animation, we only have to update the small texture, and everything else can stay in GPU memory.

The final piece of my design is to represent everything on your screen as a tree that you can update to change the contents of the screen. The tree has some restrictions to be able to integrate with the rest of my system, but nothing really special.

This tree structure is perfect to describe render ordering in UI and vector groups implicitly (no expensive list ordering most of the time), and is also the perfect structure to define which elements to cache and how to separate them in layers for animations.

I think that's the perfect design for my use case, but in my opinion, it is also a really good design for UI only projects (integrated texts and icons on the pipeline) and also for 2D games.

Now, the current status. Right now, apart from the design that I described (and many, many more details I didn't describe but that are needed to make it work), I have implemented the core math to render bezier curves with dashed borders in a js+canvas simulator. That has taken a lot of time, to try different approaches and to polish it to get the quality to the level I need with a "small" number of operations.

The simulator is just a canvas with some js code, but I restrict myself to work the same way the GPU works. So, I have a single function that takes the information for the current entity and a pixel position and returns the pixel color, and it's just math, so it's straightforward to port the core algorithm to the GPU. I've also checked that everything I use is available in relatively old OpenGL and WebGL 2.

I also started porting it to rust, with glium, and I tested a few things, but then I decided to wait for gfx-rs because I think it is the way to go.

Now I think gfx-rs has reached a state that I can use it to start writing my design, and if there is interest and other people involved I can work almost full time on it for a few months, and part time afterwards.

I include some screenshots of the simulator, as well as the source code so you can play with it (really easy, just open index.html).

bezier_simulator.zip

The important source is written directly inside index.html. You can use the mouse to move the points, the left and right arrows to change the thickness of the curve, and the length and offset parameters at line 690 to change the dashing. Just two warnings: it's really messy code, and it can hang your browser if you try to do really large curves. So consider it as it is, a proof of concept.

(screenshots of the simulator)

@siriux

siriux commented Oct 22, 2018

Sorry, I forgot to mention that the simulator contains code from Pomax that I've modified (in the other .js files). https://github.com/Pomax

If you want to learn a lot about implementing bezier curves there is no better place than his page: https://pomax.github.io/bezierinfo/

@nical
Author

nical commented Oct 22, 2018

@siriux exciting! What you describe indeed reminds me of a mix of Pathfinder and WebRender and also somewhat reminiscent of FastUIDraw for the über-shader approach.

I totally agree that separating the opaque/blend passes and taking advantage of the depth buffer to reduce overdraw (and masking in some cases) is the way to go. I think that for the opaque pass, having several shaders is fine, but having fewer shaders for the blend pass certainly reduces the amount of state changes needed.

The trick to render borders in a single primitive is to use pentagons for the entities instead of triangles or something else, that allows you to easily render the border and the fill at the same time. For the aligned rectangles we just degenerate one vertex.

Interesting, could you expand a bit on this? I'm curious about what the geometry looks like in general (what type of tessellation you need, etc). Pathfinder decomposes filled shapes into trapezoids rather than using a traditional triangle mesh, which greatly simplifies the problem of generating the geometry and doing anti-aliasing, but can show seams under non-axis-aligned transformations (this problem is of course fixable, but still a bit tricky).
You mentioned tessellation, which is something I've spent quite a bit of time working on, so don't hesitate to reach out if there's anything I can do to help.

The idea is that you can have two equivalent ways to define the same piece of extra information, one that it's static, and another that is dinamic. The difference is which texture is used to read it. This way we can put all the static information in a texture, and all the info that can potentially change during the animation in another texure. Then, during the animation we only have to update the small texture, and everything else can stay in GPU memory.

Having the shader read data from two textures means it needs to know which things to read from which texture. Another way that I think works well is to have, for each entity, a structure stored in a buffer (or packed in a float texture) that contains the offsets of the parameters to the shader (transform, UVs, color(s), etc.), and to have a single buffer/texture that contains both static and dynamic data, with all of the static data stored at the beginning of the buffer and the dynamic data at the end. Updating the dynamic data is simply a matter of writing the new values into a staging buffer, uploading it, and doing a GPU copy of the staging buffer into the actual buffer that the shader reads. The classic way to update buffers, really, just being careful about separating static and dynamic data in order to avoid uploading static data when you don't need to. Having this intermediate descriptor with offsets to the parameters gives the flexibility of storing parameters where we want (and even allows aliasing parameters, for example if a lot of entities share the same transform), at the cost of having a fetch that depends on another fetch, which affects latency; but GPUs tend to be good at hiding this latency. We do this in WebRender and it performs very well.
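
A CPU-side sketch of that layout (hypothetical names and packing, just to make the static/dynamic split and the per-entity descriptor concrete):

```rust
// One big parameter buffer with static data packed at the front and animated
// data at the back, plus a small per-entity descriptor holding offsets into it.
#[repr(C)]
#[derive(Clone, Copy)]
struct EntityDescriptor {
    transform_offset: u32, // offset (in vec4 units) into the parameter buffer
    color_offset: u32,
    uv_offset: u32,
    _pad: u32,
}

struct ParameterBuffer {
    data: Vec<[f32; 4]>,  // uploaded as a float texture or storage buffer
    dynamic_start: usize, // everything at or after this index may change per frame
}

impl ParameterBuffer {
    /// Only the dynamic tail needs to be re-uploaded each frame, typically via
    /// a staging buffer followed by a GPU-side copy.
    fn dirty_region(&self) -> &[[f32; 4]] {
        &self.data[self.dynamic_start..]
    }
}
```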

Now I think gfx-rs has reached a state that I can use it to start writing my design, and if there is interest and other people involved I can work almost full time on it for a few months, and part time afterwards.

I think that a lot of people in the rust community would love to be able to use the library you describe. If I may ask, what kind of license would you apply to this work? How can people help you?

@siriux

siriux commented Oct 22, 2018

Hi @nical,

About the geometry and tessellation: I just use five vertices to represent a quadratic bezier curve (which fits inside a triangle) plus its border. You start with a single curve defined by 3 points, you offset it and you get 3 new points, but you can represent the whole curve + border using a polygon with 5 vertices: 3 from the offset curve plus 2 from the original one (we discard the original's control point).

In practice you want to offset these 5 vertices a little bit to the outside to leave some room for the antialiasing to work. This way you get AA on the curved parts of the curve, but not on the "caps" of the border; that's fine because they will usually be joined with other borders (if they are "naked" you can add AA caps if you want).
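
A rough sketch of the pentagon construction for a single quadratic segment (the math is simplified, offsets only one side, and leaves out the AA inflation mentioned above; it is illustrative, not the actual simulator code):

```rust
type Vec2 = [f32; 2];

fn sub(a: Vec2, b: Vec2) -> Vec2 { [a[0] - b[0], a[1] - b[1]] }
fn add(a: Vec2, b: Vec2) -> Vec2 { [a[0] + b[0], a[1] + b[1]] }
fn scale(a: Vec2, s: f32) -> Vec2 { [a[0] * s, a[1] * s] }
fn cross(a: Vec2, b: Vec2) -> f32 { a[0] * b[1] - a[1] * b[0] }
fn normal(a: Vec2) -> Vec2 {
    let len = (a[0] * a[0] + a[1] * a[1]).sqrt();
    [-a[1] / len, a[0] / len]
}

/// Intersect the lines (a, direction u) and (b, direction v); assumes they
/// are not parallel (a degenerate, straight segment would need a fallback).
fn intersect(a: Vec2, u: Vec2, b: Vec2, v: Vec2) -> Vec2 {
    let t = cross(sub(b, a), v) / cross(u, v);
    add(a, scale(u, t))
}

/// Returns 5 vertices enclosing the quadratic bezier (p0, ctrl, p1) plus a
/// border of `width` on one side: the two offset end points, the offset
/// control point (offset tangent lines intersected), and the two original
/// end points.
fn border_pentagon(p0: Vec2, ctrl: Vec2, p1: Vec2, width: f32) -> [Vec2; 5] {
    let t0 = sub(ctrl, p0); // tangent direction at p0
    let t1 = sub(p1, ctrl); // tangent direction at p1
    let q0 = add(p0, scale(normal(t0), width));
    let q1 = add(p1, scale(normal(t1), width));
    let qc = intersect(q0, t0, q1, t1);
    [q0, qc, q1, p1, p0]
}
```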

For the tessellation, you first need to decompose the curves into quadratic beziers that are gentle enough for the offsetting to work (I use a heuristic similar to the one described by Pomax, but improved and with some problems fixed). Also, you need to compute the intersections between curves and straight lines to find any holes.

Finally, you need to define which side is interior or exterior for each curve; this allows you to remove the curved parts and get the interior polygon, which you can triangulate in any way you want (or with trapezoids, or even pentagons).

I don't have all the details worked out for the intersections, triangulation, ... but I've read some papers, as well as your code for lyon and the tessellation in Pathfinder, and I'll work from that. I'm sure your help would be invaluable for this, as you have put a lot of effort into this part of the problem.

At the same time, I think the new way of rendering the curves simplifies some things (you don't need the border tessellators) and removes some limitations required by the Pathfinder tessellator, if I understand it correctly (I would have to check my notes, but I think it removes the need for orientation alignment with respect to the pixels).

With respect to reading the data from textures, I completely agree with you, and that is a possibility to take into account. But we will need to design the binary representation carefully to avoid consuming too much memory for the simple primitives.

And I'm really happy you mention latency hiding, because it took me some time to really understand it, and it was the thing that finally convinced me to go with the extra data in textures instead of putting everything in the entity.

And I guess it will be even better in our specific case, because I think we can provide all the data needed for the heavy math upfront (for the bezier curves), and all that math will hide even more latency.

Finally, about the license I'm thinking about the same as Rust (MIT + Apache) as I also think it could be useful in many other situations. For my whole project I'm not so sure, and I might end up with something more restrictive like MPL or even GPL, but for generic libraries (not final products) like this I think is better to give more freedom.

I would like to hear from other people willing to help with this, either coding or participating in other ways. Maybe @boustrophedon or @derekdreery would be interested in joining this design idea. @nical, how do you propose we organize? Should we use issues in this repo to communicate or is it better to use something else?

In any case, I think I would start by porting the simulator to gfx-rs to see if all the assumptions I made really hold, and work from there. I'm not really interested right now in defining the user-facing API and I don't have a real opinion on how it should look, but I feel we have to get the internals right first, and then see which API makes more sense.

Cheers

@nical
Author

nical commented Oct 22, 2018

About the geometry and tessellation: I just use five vertices to represent a quadratic bezier curve (which fits inside a triangle) plus its border. You start with a single curve defined by 3 points, you offset it and you get 3 new points, but you can represent the whole curve + border using a polygon with 5 vertices: 3 from the offset curve plus 2 from the original one (we discard the original's control point).

Ah I see. Some aspects of this are similar to vertex-aa (the need to extrude a band of vertices along the contour). It can be quite challenging to implement in a tessellator if you need the guarantee that the extra geometry does not overlap with other geometry (this might not be the case for your approach). There's a good presentation about how this works in Skia, which is probably easier to understand if you have already seen the one about how their tessellator works.

Other aspects of your approach look similar to something I have been meaning to try for a long while, but I haven't managed to spend the time to get all the way there yet (time flies, the last change to that wiki page was more than two years ago). I'm glad we are converging towards a similar vision. FWIW I have started a rewrite of lyon's tessellator (it's taking a while but I intend to finish it) with the goal of supporting monotonic quadratic bézier curves within the tessellator and doing what the wiki page explains, which goes a long way towards what you need if I understand correctly.

Finally, about the license I'm thinking about the same as Rust (MIT + Apache) as I also think it could be useful in many other situations. For my whole project I'm not so sure, and I might end up with something more restrictive like MPL or even GPL, but for generic libraries (not final products) like this I think is better to give more freedom.

Great. I would like to make at least some of the difficult core pieces like the low level code permissively licensed (MIT + Apache or at least MPL2).

And I guess it will be even better in our specific case, because I think we can provide all the data needed for the heavy math upfront (for the bezier curves), and all that math will hide even more latency.

That and also fetching some of the data early in the vertex shader instead of the fragment shader (and passing what the fragment shader needs as uniform).

I'm not really interested right now in defining the user facing API and I don't have a real opinion on how it should look like, but I feel we have to get the internals right first, and then see which API makes more sense.

I 200% agree with this.

@derekdreery
Member

derekdreery commented Oct 22, 2018

In any case, I think I would start by porting the simulator to gfx-rs to see if all the assumtions I made really hold, and work from there.

I'd be very happy to help porting. Do you think there is a natural way to divide up the simulator, or to split the work into non-intersecting tasks?

@siriux

siriux commented Oct 22, 2018

Other aspects of your approach look similar to something I have been meaning to try for a long while but I haven't managed to spend the time to get all the way there yet.

Yes, I know about that page, that's exactly what I'm trying to do. Once that's implemented, the only problem I see with using lyon, instead of a custom solution inspired by it, is that it would be nice to be able to update part of a vector path and only update that part of the tessellation. Anyway, using lyon would save a lot of time.

Great. I would like to make at least some of the difficult core pieces like the low level code permissively licensed (MIT + Apache or at least MPL2).

Sure, I would like to make all the rendering library permissively licensed, and only the products built with it (my personal project) more restricted.

That and also fetching some of the data early in the vertex shader instead of the fragment shader (and passing what the fragment shader needs as uniform).

Yes, that's the idea.

@nical
Author

nical commented Oct 22, 2018

Once that's implemented, the only problem I see with using lyon, instead of a custom solution inspired by it, is that it would be nice to be able to update part of a vector path and only update that part of the tessellation. Anyway, using lyon would save a lot of time.

Lyon isn't set in stone and I'm open to adding features if we have a good idea of how to implement them and they don't regress performance too much for the common use cases. Don't hesitate to reach out about changes and we'll look at the different solutions (making another tessellator might turn out to be the best solution we'll see).

Updating part of the vector path is a complicated topic but not unreasonable. The tessellator (like many 2D geometry algorithms) is a sweep-line algorithm, so it's very sequential and interdependent by nature, but one could imagine a scheme where the tessellator pre-generates slices of the path that can be rebuilt independently. I'm getting ahead of myself, though. I think that getting the proper data representation to the GPU and lighting up the right pixels is already an enormous amount of work and I'd rather get somewhere close to that before optimizing partial re-tessellation :)

@raphlinus

My personal feeling is that being able to update part of a vector path is not something that would be used widely. You'd need an API for being able to address into a vector path, mutate, and so on. It feels like a very specialized niche feature, probably best in 99% of cases to make rebuilding vector paths from scratch fast. If you really care about incremental editing, you can use a cache model (much simpler API), but it's not obvious that will pay for its complexity. Just my 2¢.

@siriux

siriux commented Oct 22, 2018

I'd be very happy to help porting. Do you think there is a natural way to divide up the simulator, or to split the job into non-intersecting tasks?

Hi @derekdreery,

Porting the simulator itself is, I think, fairly monolithic, and the largest part will be learning how to use gfx-hal. But it shouldn't take long to have something on the screen.

Once there is a base I'm sure it will be much easier to split the next steps into non-intersecting tasks. Give me a couple of days to see if I can have something working.

Also, if you want, you can learn more about gfx-hal and the algorithms behind Pathfinder 2, as my ideas are based on them.

@siriux

siriux commented Oct 22, 2018

Lyon isn't set in stone and I'm open to adding features if we have a good idea of how to implement them and they don't regress performance too much for the common use cases. Don't hesitate to reach out about changes and we'll look at the different solutions (making another tessellator might turn out to be the best solution we'll see).

Thanks, I think it is best to start with your new implementation of lyon and see if we can fit everything there. Anyway, there are plenty of things to do before we need the tessellator.

@siriux

siriux commented Oct 22, 2018

My personal feeling is that being able to update part of a vector path is not something that would be used widely. You'd need an API for being able to address into a vector path, mutate, and so on. It feels like a very specialized niche feature, probably best in 99% of cases to make rebuilding vector paths from scratch fast. If you really care about incremental editing, you can use a cache model (much simpler API), but it's not obvious that will pay for its complexity. Just my 2¢.

Yes, you are right. The use cases I was thinking of are animations or games that not only move things around but also modify the paths themselves (maybe not common, but useful in plenty of cases). But I'm probably getting ahead of myself, and we can see if rebuilding is a problem once we have the rest working.

@boustrophedon

boustrophedon commented Oct 22, 2018

Hi! I spent some time last week working on what an API would look like with the specific goal of having it mesh at least somewhat well with how GPUs handle data. You can look at it here: https://github.com/boustrophedon/draw2d-api-sketch and run the example with cargo run --example example. It outputs to /tmp/output.png.

The main use-case I was thinking of for this was UIs: You want to mostly draw lines, rects, rounded rects, circles, and text.

The core concept is simply that you specify some geometry, you specify some paint properties, and then optionally associate the paints to geometry using opaque handles returned by the renderer. It's basically a retained-mode version of cairo/skia/html5 canvas. What this gets you is the ability to "scroll" a bunch of objects by changing a single paint object, and the ability to change specific geometry elements (you'd still have to rebuild an arbitrary pre-tessellated vector path from scratch as discussed above) without invalidating other work done for other elements.
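
An illustrative sketch of that handle-based shape (this is not the API from the linked repo, just an approximation of the idea described above):

```rust
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
pub struct GeometryHandle(u64);
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
pub struct PaintHandle(u64);

pub struct Paint {
    pub color: [f32; 4],
    pub transform: [[f32; 3]; 2], // 2x3 affine transform
}

pub enum Geometry {
    Rect { x: f32, y: f32, w: f32, h: f32 },
    Circle { cx: f32, cy: f32, r: f32 },
    // lines, rounded rects, text runs, ...
}

pub trait Renderer {
    fn add_geometry(&mut self, geometry: Geometry) -> GeometryHandle;
    fn add_paint(&mut self, paint: Paint) -> PaintHandle;
    /// Associate a paint with a geometry; many geometries can share one paint,
    /// so "scrolling" them is a single paint (transform) update.
    fn bind(&mut self, geometry: GeometryHandle, paint: PaintHandle);
    fn update_paint(&mut self, handle: PaintHandle, paint: Paint);
    fn render(&mut self);
}
```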

I tried a few things involving fancier APIs with trees and "group" or "layer" objects, but I realized that they made things too complicated. With just the handles and ability to associate paints to multiple geometries, you can define your own layers and layer-composition structures/scene-graphs.

Some downsides/problems:

  • Handles can become stale
  • Handle resource deletion is manual resource management
  • The transform is associated with the paint; it needs to be split up more, so that you can e.g. make a layer whose children can all be transformed by a single transform.
  • An API to render only a subset of the geometry. Could be as simple as adding an enabled: bool to the paint, or could be explicitly making "GeometrySets"/groups/layers, or even simply passing in a list of handles to render. I think the last one is probably the closest to being the right one.

The next step I would do here is figure out the best way to actually implement this with a GPU. Specifically, questions like: do we keep separate buffers on the GPU for each piece of geometry+paint, or use some kind of instancing and basically only upload the paints plus a discriminant for geometry type? Should the paints act more like uniforms or per-"vertex" data which is fed into a tessellation shader? Do we use separate shaders for each type of geometry or have one ubershader?

Thankfully, there is a lot of prior art for this part; many smart people have already done the hard parts in e.g. the Ganesh backend of Skia, Pathfinder, and Lyon. :)

@nical
Author

nical commented Oct 23, 2018

It's a bit early for me to delve into API details. We can look at it in broad strokes, but I don't think we can get the details right until we have a good idea of what goes underneath.

Some properties that I would want:

  • A command list API: command lists can be built independently (on any thread) and then submitted to the renderer and reused.
  • A way to animate some properties without re-building the entire command list.
  • Organize data so that animating one property doesn't force rebuilding all other properties (for example, scrolling all of the elements should not require rebuilding all paints, but rather just changing a single transform that many elements refer to; you mention this in one of your bullet points).
  • Support for instancing, maybe at the command-list level or on groups of drawing commands, in a way that is guaranteed to use instancing with GPU backends rather than as a heuristic optimization (see the sketch after this list).
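
A minimal sketch of explicit instancing at the command-list level, as referenced in the last bullet (all names hypothetical):

```rust
// A group of drawing commands is recorded once and drawn N times with
// per-instance data, mapping directly onto instanced draw calls underneath.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct Instance {
    pub transform: [[f32; 3]; 2], // 2x3 affine transform
    pub color: [f32; 4],
}

#[derive(Clone, Copy)]
pub struct CommandListId(pub u32);

pub struct Frame {
    // Each entry ideally becomes one instanced draw of a recorded command list.
    submissions: Vec<(CommandListId, Vec<Instance>)>,
}

impl Frame {
    pub fn new() -> Self {
        Frame { submissions: Vec::new() }
    }

    /// Draw a recorded command list N times with per-instance transform/color.
    pub fn draw_instanced(&mut self, list: CommandListId, instances: Vec<Instance>) {
        self.submissions.push((list, instances));
    }
}
```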

Some other thoughts: I'm not convinced a single shape is the best level of granularity (you mention geometry sets). For example, accumulating paths with their own drawing parameters and calling it a mesh or layer, and then having the possibility of per-mesh parameters and instancing at the mesh level. Working with groups of paths simplifies a bunch of things, for example managing fewer allocations for all of its vertices and drawing parameters, and making it explicit that the lifetime of all of the pieces is the same means we don't have to manage each piece individually. I'd even like to try to have the ability to group several meshes that have the same lifetime into a single handle as far as tracking and grouping allocations is concerned (independently of drawing command submission).

do we keep separate buffers on the GPU for each piece of geometry+paint

That's costly if we want to support drawing a lot of things, which we should aim for. The grouping I'm describing above would help. In Vulkan the recommendation (at least from NVIDIA) is to create a single very large buffer and do our own allocation schemes into it, which means lots of freedom but also lots of work. So whatever the API turns out to be, it should be made to facilitate organizing data efficiently.
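
As an illustration of sub-allocating from one large buffer, here is a trivial bump allocator with a free list (purely illustrative; a real allocator also needs alignment, range splitting, and per-frame retirement):

```rust
#[derive(Clone, Copy)]
pub struct BufferRange { pub offset: u64, pub size: u64 }

pub struct GpuBufferAllocator {
    capacity: u64,
    cursor: u64,
    free: Vec<BufferRange>,
}

impl GpuBufferAllocator {
    pub fn new(capacity: u64) -> Self {
        GpuBufferAllocator { capacity, cursor: 0, free: Vec::new() }
    }

    pub fn allocate(&mut self, size: u64) -> Option<BufferRange> {
        // Reuse a freed range if one is big enough (returning the whole range;
        // a real allocator would split it), otherwise bump-allocate at the end.
        if let Some(i) = self.free.iter().position(|r| r.size >= size) {
            return Some(self.free.swap_remove(i));
        }
        if self.cursor + size > self.capacity {
            return None; // caller would grow the buffer or flush the frame
        }
        let range = BufferRange { offset: self.cursor, size };
        self.cursor += size;
        Some(range)
    }

    pub fn free(&mut self, range: BufferRange) {
        self.free.push(range);
    }
}
```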

@siriux

siriux commented Nov 5, 2018

Hi,

Some things got in the way, but I finally managed to work on this and start porting the bezier renderer from my js simulator to gfx_hal.

I just managed to get a basic curve on the screen with simple dashing using gfx_hal with the vulkan backend. Actually, porting the main algorithm was quite fast, but learning vulkan and gfx_hal was not as straightforward, xD.

There are still various important things to implement to be equivalent to the simulator, like splitting the bezier curve to meet the requirements of the algorithm (for now I use a valid hardcoded curve), adding interactivity, ... Also, it doesn't work on the GL backend because instancing is not implemented, and it crashes if you resize the window (it seems to be a gfx/driver problem, because the quad example crashes too).

I want to work a little bit more on it before uploading it to a public repo, but I wanted to tell you that the approach seems to be valid, and that I'm working on it.

Here is a screenshot with the current curve:

(screenshot of the rendered curve)

@siriux

siriux commented Nov 5, 2018

I almost forgot. If you want to see more clearly how my idea works with the 5-vertex polygon, here is another screenshot with a red background for the polygon.

(screenshot with the polygon shown in red)
