Proposal for Cross-plat Server-side Image Manipulation Library #2020
The .NET Team and many developers in the community would like a graphics API for .NET Core and so we are starting to work on this. Of course, "graphics API" is a very broad term. To narrow down the scope we are looking at these main needs:
To address these needs, we plan to start experimenting with a cross-plat server-side image manipulation library.
In the next few weeks we will have some .NET summer interns start prototyping parts of this library so we are excited to get started!
Options for Cross-plat implementation:
Option 1 is currently our preferred option. There are many libraries for cross-plat server-side image manipulation that we have been investigating. OpenGL based libraries and libGD both looked promising. For this project we are starting to think libGD is the best place to start.
We have already had some great feedback from the community on primitive drawing types which we expect to use for this library. Now, we are ready for feedback on the image manipulation library. The first scenarios we are looking into are thumbnailing and watermarking. Please provide feedback on the goals, options for implementation, native image manipulation libraries we could use, and scenarios that we should address.
"Server-side" helps define what we're initially focusing on, both in scenarios and implementation. Unless we run into something unexpected, it should also work on or off the UI thread in client apps. Also note that goal 6 is that we design something that can also support client scenarios in the future.
Direct link to specific notes about building this kind of library.
LibGD is certainly the closest fit, but it needs lots of work. I do not believe it is possible to wrap LibGD 2.1 as-is; you would have to fork LibGD into a completely incompatible API in order to implement error handling (as well as to fix some hard design flaws).
Once you've done that, you've essentially committed to maintaining your own library. I know that the maintainers plan for LibGD 3 to be much closer to our needs, but nobody is funding work in this direction, and it's a helluva lot of work.
The path of least resistance is to create a very focused C library that only implements operations that are very fast and have predictable performance. Your managed wrapper (and LibGD) then consume this new API for all core processing needs. All vector drawing and font parsing/rendering should be segmented as a plugin that wraps Cairo. PNG, JPEG, and GIF support should be built in, but every other format should be a plugin. This way users can opt in to high-risk features like TIFF parsing and font rendering, which have a terrible security record on every platform.
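As a rough sketch of the opt-in plugin idea above (all names here are mine, not an existing API): nothing beyond the built-in codecs is reachable unless the host application explicitly registers it, so high-risk parsers never even enter the process by default.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical codec plugin vtable. PNG/JPEG/GIF would be linked
 * statically; everything else (TIFF, ...) is loaded only on opt-in. */
typedef struct codec_plugin {
    const char *name;                               /* e.g. "tiff" */
    int (*can_decode)(const uint8_t *header, size_t len);
    /* decode/encode entry points would go here */
} codec_plugin;

#define MAX_CODECS 16
static const codec_plugin *codecs[MAX_CODECS];
static size_t codec_count = 0;

/* Opt-in registration: nothing is available unless the host asks. */
int register_codec(const codec_plugin *p) {
    if (codec_count >= MAX_CODECS) return -1;
    codecs[codec_count++] = p;
    return 0;
}

/* Sniff the header bytes against every registered codec. */
const codec_plugin *find_codec(const uint8_t *header, size_t len) {
    for (size_t i = 0; i < codec_count; i++)
        if (codecs[i]->can_decode(header, len)) return codecs[i];
    return NULL;
}

/* Example plugin: recognizes the 8-byte PNG signature. */
static int is_png(const uint8_t *h, size_t n) {
    static const uint8_t sig[8] = {0x89,'P','N','G','\r','\n',0x1a,'\n'};
    return n >= 8 && memcmp(h, sig, 8) == 0;
}
static const codec_plugin png_plugin = { "png", is_png };
```

An unregistered format simply yields `NULL` from `find_codec`, which keeps the "security announcement" surface limited to what the app deliberately enabled.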
Platform APIs aren't even consistent enough between versions of Windows to be considered here. Also, everything is broken in this context, and it can't be fixed. This library has to be app-local versioned all the way down, or you're creating ultimate misery and security nightmares.
Naive C++ implementations are typically orders of magnitude too slow for real-time. I think a 100% managed code implementation is unrealistic. And if it doesn't need to be real-time, stick it in a queue for a linux box to deal with. ImageMagick is orders of magnitude slower than my algorithms, but it will get the job done fine in a queue. And libvips (despite no transparency support) is quite fast - fast enough that it can beat any managed imaging even if you include I/O, queue time, datacenter-local networking, etc.
Basically, this C library needs to be good, really good, or it won't have any reason for existence outside of the .NET community, and that would bode ill for its future health.
To add some bird's-eye-level feedback here: in addition to a potential default set of .NET Core-supported image types and manipulations, it would be great if one could write and plug in custom ones: codecs to support more image/media types, additional manipulations/transformations, and even replacements for the default .NET Core ones if one wants or needs to do so.
Ideally these .NET Core interfaces should also be async/awaitable, i.e. allow transformations to be off-loaded to local GPU-driven implementations, external processes, or even remote systems/services (for example, heavy-lifting medical image reconstruction/transformation on remote server clusters). This ought to be transparent to the local .NET Core application, be it a server or desktop one.
We should separate our high-level API needs from our low-level primitive needs.
At a high level, users will want (or end up creating) both declarative (result-descriptive) and imperative (ordered operation) APIs. People reason about images in a lot of different ways, and if the tool doesn't match their existing mental pattern, they'll create one that does.
At a mid-to-high level, I'd love to see a generic graph-based representation of an image processing workflow. Visual details will change depending on the backend (no two imaging libraries produce the same results), but being able to mix and match libvips, imagemagick, libgd & managed code would be nice. Cons: hard to reason about, complex to work with directly. Multi-dimensional images (TIFF, GIF) add even more trouble. Pros: Easily wrapped as a declarative API, as an imperative API. Can apply advanced optimizations and pick the fastest or best backend depending upon image format/resolution and desired workflow. Given how easily most operations compose, this could easily make the average workflow 3-8x faster.
From a practical standpoint, it's best to start with the low-level operations, and expose reusable APIs that others can build on top of. We don't want to chase data structure genericity at a low level. For example, if you expose an interface that supports multiple color spaces or bit depths, you implicitly force APIs to support all of those permutations, many of which will make no or little sense. Most compiler optimizations for inner loops only happen when the channel byte count is known ahead of time; this matters more than you'd think.
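To make the channel-byte-count point above concrete, here is a small illustrative C sketch (function names are mine): in the generic loop the inner bound is a runtime variable, which blocks unrolling and vectorization, while the BGRA-specialized loop hands the optimizer a compile-time constant.

```c
#include <stdint.h>

/* Generic fill: channel count is a runtime parameter, so the inner
 * loop stride is unknown to the compiler and hard to vectorize. */
void fill_generic(uint8_t *p, int npix, int channels, const uint8_t *color) {
    for (int i = 0; i < npix; i++)
        for (int c = 0; c < channels; c++)
            p[i * channels + c] = color[c];
}

/* Specialized fill: the constant bound of 4 lets the compiler
 * unroll the inner loop and emit wide stores. */
void fill_bgra(uint8_t *p, int npix, const uint8_t color[4]) {
    for (int i = 0; i < npix; i++)
        for (int c = 0; c < 4; c++)   /* constant bound: unrollable */
            p[i * 4 + c] = color[c];
}
```

Both produce identical output; the difference shows up only in the generated code, which is exactly why a low-level API that fixes the pixel format pays off in inner loops.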
Key low-level primitives
Operations requiring matrix transposition (which we avoid at all costs)
Scale, convolve, rotate 90 degrees, blur, and sharpen - can be composed and require a single transposition. Separately they would require 7.
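A rough C sketch of the composition point above (grayscale only, illustrative): each separable op is written as a fast row-wise pass, and the column direction is handled by transposing once. Chaining N row-wise ops between the two transposes then pays for one transposition pair instead of one per op.

```c
#include <stdint.h>
#include <stddef.h>

/* Transpose a w x h grayscale image into an h x w buffer. */
void transpose(const uint8_t *src, int w, int h, uint8_t *dst) {
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            dst[x * h + y] = src[y * w + x];
}

/* One row-wise pass: 1-D box blur of radius 1, clamped at edges. */
void row_blur(uint8_t *row, int w) {
    uint8_t prev = row[0];
    for (int x = 0; x < w; x++) {
        uint8_t cur = row[x];
        uint8_t next = (x + 1 < w) ? row[x + 1] : row[x];
        row[x] = (uint8_t)((prev + cur + next) / 3);
        prev = cur;
    }
}

/* Separable 2-D blur: rows, one transpose, rows again, transpose
 * back. Any other row-wise ops (scale, convolve, the row-reversal
 * half of a 90-degree rotate) chained between the two transposes
 * share the same pair -- the "compose and pay once" point above. */
void blur2d(uint8_t *img, int w, int h, uint8_t *tmp) {
    for (int y = 0; y < h; y++) row_blur(img + (size_t)y * w, w);
    transpose(img, w, h, tmp);
    for (int x = 0; x < w; x++) row_blur(tmp + (size_t)x * h, h);
    transpose(tmp, h, w, img);
}
```

The transpose itself is the cache-hostile step, which is why minimizing the count of transpositions, rather than the count of ops, is the right budget to optimize.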
You'll note that affine transform/distort is notably absent. Distortion has exponentially bad performance with image size - it's not linear. Large convolution kernels have a similar effect. Distortion is rarely needed and use should be minimized. Cairo's implementation is fine.
On top of these primitives, and combined with existing codecs, we could build a respectable image library.
Please add EXIF support; it is hard to rotate an image without affecting the EXIF orientation tag. A resized image should also carry over all EXIF tags from the source image.
One of the main goals of a server is to serve existing images, so being able to use optimized images will improve both storage and bandwidth usage.
So it would be great to have some low-level methods to work on images, plus a guide or higher-level methods to quickly optimize them.
The optimizations will depend on the type of image (lossless vs. lossy, e.g. PNG vs. JPEG), as well as on whether the optimizations must be done in a lossless way (removing metadata) or some data loss is allowed.
So the first step is to remove metadata from an image. Adobe tools are known to embed huge amounts of data inside images, and many images include thumbnails that are never used. Stripping that away is step 0.
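As a toy C sketch of that "step 0" (illustrative only, not a robust parser): walk the JPEG marker segments, drop APP1-APP15 and COM (EXIF, XMP, Photoshop blobs, embedded thumbnails), keep APP0 (JFIF) and everything needed to decode, and copy the entropy-coded data through untouched.

```c
#include <stdint.h>
#include <stddef.h>

/* Copy a JPEG stream to `out` (assumed large enough), dropping
 * APP1..APP15 and COM segments. Returns bytes written, or 0 on
 * malformed input. A real implementation must also handle
 * truncation, RSTn markers, and multi-scan files. */
size_t strip_jpeg_metadata(const uint8_t *in, size_t n, uint8_t *out) {
    if (n < 2 || in[0] != 0xFF || in[1] != 0xD8) return 0;  /* SOI */
    out[0] = 0xFF; out[1] = 0xD8;
    size_t i = 2, o = 2;
    while (i + 4 <= n && in[i] == 0xFF) {
        uint8_t marker = in[i + 1];
        if (marker == 0xDA) break;            /* SOS: entropy data next */
        size_t len = ((size_t)in[i + 2] << 8) | in[i + 3]; /* incl. itself */
        if (len < 2 || i + 2 + len > n) return 0;
        int drop = (marker >= 0xE1 && marker <= 0xEF) || marker == 0xFE;
        if (!drop) {
            for (size_t k = 0; k < 2 + len; k++) out[o + k] = in[i + k];
            o += 2 + len;
        }
        i += 2 + len;
    }
    /* Copy the rest (SOS header + entropy-coded data + EOI) verbatim. */
    while (i < n) out[o++] = in[i++];
    return o;
}
```

This is the lossless half of the problem; tools like jpegtran go further by re-optimizing the Huffman tables as well.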
Integrate code from projects like jpegtran, jpegoptim, OptiPNG, pngcrush, etc., or develop similar code to reach those goals (specific ways to optimize JPEG and PNG)
For PNG, recompress the chunks using Zopfli
Being able to use other encoders like mozjpeg
Convert JPEGs to 4:2:0 chroma subsampling
Recompress based on visual quality instead of a meaningless "quality" percentage: http://calendar.perfplanet.com/2014/little-rgb-riding-hood-a-jpegs-tale/
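To illustrate the 4:2:0 item in the list above (a toy sketch in C, not tied to any codec): converting 4:4:4 to 4:2:0 just means averaging each 2x2 block of the two chroma planes, while the luma plane keeps full resolution, cutting raw chroma data by 75%.

```c
#include <stdint.h>

/* Downsample one full-resolution chroma plane (4:4:4) to 4:2:0 by
 * averaging each 2x2 block with rounding. Run once for Cb and once
 * for Cr; luma is untouched. Assumes even w and h for brevity. */
void subsample_420(const uint8_t *chroma, int w, int h, uint8_t *out) {
    for (int y = 0; y < h; y += 2) {
        for (int x = 0; x < w; x += 2) {
            int sum = chroma[y * w + x]       + chroma[y * w + x + 1]
                    + chroma[(y + 1) * w + x] + chroma[(y + 1) * w + x + 1];
            out[(y / 2) * (w / 2) + (x / 2)] = (uint8_t)((sum + 2) / 4);
        }
    }
}
```

Because the eye is far less sensitive to chroma resolution than to luma, this is usually invisible for photos, which is why 4:2:0 is the safe default for web-sized JPEGs.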
Some of these operations are far more CPU-expensive than others, so they can't be blindly applied everywhere, but it's clear that in the long term an optimized image is much better than a bloated one, and being able to optimize an uploaded image would be a great bonus for all .NET developers. The good thing is that the different optimizations can be built little by little: start by defining the final goals, then work on each part, aiming to be at least as good as the existing tools that can't be used in a managed environment.
Given the importance and technical difficulty involved in creating such a library I worry that writing it has been deemed such a low priority that it has been assigned to summer interns. Image processing done correctly is difficult and will require a wealth of experience and expertise.
I did not catch that on the first read-through. Perhaps you should consult with other imaging experts within Microsoft whom you trust (perhaps in Research or WIC) to get their opinion on whether this is the best approach? I personally find it far easier to create or work on compilers than to create correct and performant image processing algorithms; the former is easier to mathematically represent, has fewer variables, and requires less computer science knowledge.
Resources for the interns
There are not many great textbooks on the subject. Here are some from my personal bookshelf. Between them (and Wikipedia) I was able to put together about 60% of the knowledge I needed; the rest I found by reading the source code to many popular image processing libraries.
I would start by reading Principles of Digital Image Processing: Core Algorithms front-to-back, then Digital Image Warping. Wikipedia is also good, although the relevant pages are not linked or categorized together - use specific search terms, like "bilinear interpolation" and "Lab color space".
The Graphics Gems series is great for optimization inspiration:
I'm not aware of any implementations of (say, resampling) that are completely correct. Very recent editions of ImageMagick are very close, though (We got AppHarbor builds going, BTW!). Most offer a wide selection of 'filters', but fail to scale the input or output appropriately, and the error there is greater than the difference between the filters.
Source code to read
I have found the source code for OpenCV, LibGD, FreeImage, Libvips, Pixman, Cairo, ImageMagick, stb_image, Skia, and FrameWave to be very useful for understanding real-world implementations and considerations. Most textbooks assume an infinite plane and ignore off-by-one errors, floating-point limitations, color space accuracy, and operational symmetry within a bounded region. I cannot recommend any textbook as an accurate reference, only as a conceptual starting point.
Also, keep in mind that computer vision is very different from image creation. In computer vision, resampling accuracy matters very little, for example. But in image creation, you are serving images to photographers, people with far keener visual perception than the average developer. The images produced will be rendered side-by-side with other CSS and images, and the least significant bit of inaccuracy is quite visible. You are competing with Lightroom; with offline tools that produce visually perfect results. End-user software will be discarded if photographers feel it is corrupting their work.
And, as always, I suggest that it is negligent to start a new project until you have completely read all issues and bug reports filed against similar existing projects. Everything about this space looks deceptively simple, when in fact it is intractably complex, and success depends on making the correct compromises on day 1.
We're not expecting the interns to implement an entire graphics library. The scope of their summer project is still flexible, but will probably involve wrapping an existing native library with a .NET API.
We want interns to have a great experience at Microsoft, so we try to give them projects that are interesting, that they can be successful with, and that they will feel like they have made an impact. So prototyping parts of a new .NET graphics library is really an ideal intern project. They'll be working with members of the .NET team and together we'll also be relying on community feedback to guide us.
There is an unfortunate lack of overlap between permissively-licensed libraries and libraries that can be wrapped as-is. I completed a low-level generated wrapper and part of a prototype high-level wrapper for LibGD, so I know the scope of work here.
I can't think of a more cruel project to give them - this is one of those projects where you "minimize scope of failure", not "succeed at".
There are few things worse than being assigned to a project where management refuses to acknowledge the scope of the problem you're forced to solve. Since I am strongly against abusing interns, I'm going to keep writing until there is a crystal clear understanding of the challenges involved, and plenty of documentation they can reference in their final report.
I'm focusing on the native side of things, since that is the difficult part. The outer managed API is comparatively trivial.
I'd love for your team to find a library that just needs to be wrapped, but I've been analyzing and benchmarking different libraries for years without success. Name a library and I can list the issues.
This smells like throwing interns to the wolves on a problem nobody wants to think hard about (or, more likely, be responsible for).
@nathanaeljones As the PM intern assigned to this project I really appreciate the study material and all of your insight! I was wondering, what do you think of OpenGL and SDL in this scenario? I saw that you researched it, but I was curious why you decided against it?
Also, I've been here a few weeks and there has been barely any intern abuse. ;) Just so you know, our futures don't depend on whether or not a tool ships. We have great support from devs who will be in the trenches with us, and really, we are simply prototyping and identifying issues. The more problems we identify, the stronger the case we can make for whichever library Microsoft decides to invest development time in.
SDL doesn't have a software implementation of resampling, and I haven't found an effective way to leverage OpenGL yet. Texture rendering isn't resampling. Mipmapping is fine for doing part of a scaling operation, but the last 300% of scaling must be done with a correct interpolation filter. There aren't any primitives in OpenGL or in DirectX that offer acceptable visual quality.
Back in 2013 I created a benchmark to isolate performance issues in DrawImage and compare it to Direct2D. Direct2D's HighQualityCubic implementation was terribly slow (1-2 seconds for a moderately sized image). Same op was < 40ms on the CPU, single threaded.
On the OpenCL front, there's Halide, which is really interesting, but it claims OpenCL isn't production-ready. The OpenCL 2.1 SPIR-V intermediate language looks great, but we're talking about a provisional spec that's easily 5 years from being commonplace.
I haven't been able to tune Halide to approach handwritten C performance, but @jrk probably could.
There's also the question of GPU virtualization consistency. I'm quite hesitant to take the presence of a fast GPU for granted, particularly when falling back to software rendering would be prohibitively slow.
All of the above adds up to nothing more than an "I'm doubtful". I am in no way an OpenGL or OpenCL expert, and I would suggest tracking one down for better answers.
It's great to hear my suspicions are unfounded.
It looks like OpenTK relies on System.Drawing for loading 2D images: http://www.opentk.com/doc/graphics/textures/loading. Maybe you are saying the same thing above @nathanaeljones.
That's quite a daunting task...
For an Image Manipulation Library, there is a large difference between:
Then there is the level of granularity:
As @nathanaeljones suggested, it is quite tempting to integrate some of the features of 2) for images (like image effects, or composition by mask with alpha-only images rather than SVG layers) into 1), but while feasible, it is not ideal in terms of separation of concerns. IMHO, sticking to the perimeter of 1) would be better.
If it is mostly for some basic image manipulation on the server side, I would expect a high-level API. I don't know much about libGD, but while not perfect, it looks OK-ish for this task.
As mentioned earlier, don't expect anything from OpenGL/OpenCL unless you want to implement a brand-new low-level 2D API that supports hardware acceleration; and most of the time you need a software rasterizer anyway, in case you don't have access to GPU hardware (as on many servers).
I prefer the C API to be as granular as possible without sacrificing performance. I actually like many parts of WIC's API for low-level use, but the implementation leaves some things to be desired, particularly the epic fail that is IWICBitmapScaler.
The chain-based API of WIC is also wasted complexity since the implementation doesn't actually take advantage of it to reduce RAM requirements. If WIC was open-source, one could probably use it with less frustration; the docs are intentionally opaque about operating detail and resource consumption.
I'd also like to re-emphasize that one-size-fits-all is a bad idea here. First, create the low-level API that developers can build on top of. Once there's consensus about the preferred levels of abstraction, then add new APIs that expose them.
Most developers want to do their end-to-end image processing in one line of code; and that is certainly an API they should be given. Don't force them to understand resource lifetimes just to optimize or scale assets.
At the same time, don't make it hard for experts to extend and build on top of. You can't hide pointers and lifetimes from developers without repeating the failures of WPF and System.Drawing. Don't try; make it simple, consistent, and predictable instead of using leaky magical abstractions.
I also agree we don't want to reimplement SVG here. Cairo already exists, why re-create it? Cairo lacks great photo processing and has no image format support to speak of, which is conveniently the part we need most frequently. Sharing a memory layout isn't hard; we can mix libraries at will on the same buffers.
I would draw a small distinction between bitmap composition/effects and vector/layer/tree composition; bitmap alpha blending and basic effects are straightforward, unlike their vector counterparts. Given that we can heavily optimize them for a server context, I'd implement them in the core. Image overlay/watermarking shouldn't force an extra dependency, it's a basic need.
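To show how small the "basic bitmap composition" surface really is, here is a sketch of the standard premultiplied-alpha "over" operator in C (illustrative; the byte layout matches what Cairo's ARGB32 format stores on little-endian machines, which is the memory-layout compatibility argued for above):

```c
#include <stdint.h>

/* Rounded (a*b)/255 without a division, a standard trick. */
static inline uint8_t mul_div255(uint8_t a, uint8_t b) {
    int t = a * b + 128;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* Porter-Duff "over" on premultiplied BGRA pixels:
 *   dst = src + dst * (1 - src_alpha), applied per channel.
 * This one loop covers watermarking and overlay with no vector
 * machinery at all. Alpha lives in byte 3 of each 4-byte pixel. */
void composite_over(uint8_t *dst, const uint8_t *src, int npixels) {
    for (int i = 0; i < npixels * 4; i += 4) {
        uint8_t inv_a = 255 - src[i + 3];
        for (int c = 0; c < 4; c++)
            dst[i + c] = src[i + c] + mul_div255(dst[i + c], inv_a);
    }
}
```

Because premultiplied "over" needs no per-pixel division, it vectorizes well, which is why keeping it in the core (rather than delegating to a vector library) costs almost nothing.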
The stated requirements are a good start, but one of the challenges that you will face very quickly is text rendering, which will very rapidly rule out most trivial or simple solutions.
There are a few existing options that could be used.
In addition to having .NET bindings, Cairo is also the foundation that the C++ ISO committee is considering for a 2D API.
It is in general, a very pleasant library to use.
Cairo has good support for text rendering when combined with Pango: it will render Unicode properly; handle left-to-right, right-to-left, and bidirectional text correctly; and support advanced ligature features in fonts and precise layout. In addition, Pango will use the native shaping engine: Uniscribe on Windows and CoreText on OS X.
The major downside is that the text layout capabilities of System.Drawing are limited by a design that was barely aware of the complexity of Unicode. The text handling is not suitable for any scripts beyond European languages and the typography support is close to non-existent.
System.Drawing offers a few services on top of Cairo, like reading image metadata and loaders for various file formats.
Cairo is the obvious choice for rendering, and I strongly suggest maintaining a compatible memory layout for the bitmap data. It does lack in the image format and image processing departments. I don't think Cairo's scope should necessarily be extended; instead, a complementary library should be created to solve those concerns. I advocate a pay-as-you-go approach here to minimize attack surface area.
I would suggest preserving separable components through all layers, so that no layer prevents the developer from stripping out an unnecessary risk.
It's very important that high-risk components be used only when needed, and with disclaimers to monitor for security announcements. Given that nearly all use scenarios can avoid them, I think the design should reflect that.
We aren't quite ready to establish a development plan. While these discussions have taught us a lot about our options in the open-source world, they have also sparked some internal conversation about cross-plat graphics. As we iron out short- and long-term plans for a cross-plat graphics story, we are starting to think there might be allies on other teams with similar perspectives and goals. It has been stated that this is quite a big undertaking, so deciding how to best tackle it, and organizing the people ready to hop on it, will take time. This discussion has done a fabulous job of putting the project in perspective. The feedback here is very much appreciated.
We are continuing to prototype with libGD to understand its limits for server-side image manipulation. Many other developer needs have been brought up: text rendering, image file formats, low level GPU accelerated rendering, the high level API experience, the important distinction between a granular library like libGD with basic operations versus more advanced image composition operations, etc. These are part of a very large graphics story and weren't entirely expected to come up at this stage and in this issue. Regardless, the discussion is definitely helping map out the future.
I think remaining questions will come up as we delve deeper into libGD and Cairo to understand their potential, but separate issues can be started for each when they are needed. I'll leave this issue open a little longer in case folks have more to add. :-) Thank you all again for your input!
If this progresses beyond an intern project it seems prudent to hire (or come to some arrangement that allows) @nathanaeljones to lead the project.
Personally I don't think it's a priority. In a web/cloud scenario, something is very wrong if the use case requires the web server to process images in-process. Queue + job workers is easier to secure and scale, and the IPC overhead can be kept in the low milliseconds.
I'd prefer MS to put resources into easy-win optimizations for all platforms to improve the @TechEmpower benchmark results.
In my humble opinion, this project should receive high attention at Microsoft. A versatile and performant cross-platform image manipulation library would add whole new value to .NET Core. I would also second Ryanbnl's idea that MS would do best to hire @nathanaeljones for this project.
Perhaps it got lost in the noise, but you can use System.Drawing from Mono; it might require a change or two here and there:
It requires this:
I don't think it's terribly difficult to get at least the basics working. I've done a really crude port of our Windows System.Drawing (not the mono one), and didn't really hit any snags. I just ported a few things that I actually wanted to use, so I ignored all of the random stuff like Printing, etc. that weren't useful for me. By the way, @akoeplinger has a repo here that includes the System.Drawing sources from mono and seems to compile them for .NET Core: https://github.com/akoeplinger/mono-winforms-netcore
Just a heads-up: this is in no way complete and I planned to update it with DNX beta7 but got sidetracked due to other things. I'll probably just update it to beta8 when it comes out next week
Today I launched a Kickstarter to fund this effort. You can't shoehorn any existing library to fill this need – I've tried.
If you want to support the project, go here: https://www.kickstarter.com/projects/njones/imageflow-respect-the-pixels-a-secure-alt-to-image
If you can spare a retweet, that would also be fantastic. We need to reach a large audience to make this happen.