Image Processing docs (#388)

* Fill values, links and images
  This commit fills in the template left by the previous commit.
* Replace Mathjax with code blocks
  Since GitHub doesn't allow Mathjax, the formula parts have been replaced with code blocks.
* Remove section on affine transformation
  It doesn't seem like the concept is used often, so postponed for now.
* Add basic explanation of convolution
* Docs for Harris and Hessian
  The docs are written with a beginner in mind and have a basics section. The pictures and paper links are to be inserted.
* Convert markdown to rst
* Add some more relevant papers
  This commit cites and adds a link to a paper about the Hessian detector and a review paper about affine region detectors.
* Move to new concept name
  "Space extrema detector" doesn't seem to be a widespread name for the detector class. There is a paper that uses the term "Affine region detector", which has quite a few citations.
* Fix mistakes in docs
  Fix mistakes related to terminology and algorithm steps.
* Add ip docs to index.rst
  Make sure ip docs are built and are in the final output.
* Fix formatting
  It seems like some lines are not properly formatted and the rendered file contains unreadable lines. Fixed by not formatting it.
boostorg · Sep 3, 2019 · 6d61de0 · mloskot · Sep 3, 2019 · mloskot
1 parent bd00f91
commit 6d61de0

Showing 5 changed files with 171 additions and 0 deletions.
diff --git a/doc/image_processing/Moravec-window-corner.png b/doc/image_processing/Moravec-window-corner.png
diff --git a/doc/image_processing/Moravec-window-edge.png b/doc/image_processing/Moravec-window-edge.png
diff --git a/doc/image_processing/affine-region-detectors.rst b/doc/image_processing/affine-region-detectors.rst
@@ -0,0 +1,115 @@
+Affine region detectors
+-----------------------
+
+What is being detected?
+~~~~~~~~~~~~~~~~~~~~~~~
+
+An affine region is any region of the image
+that is stable under affine transformations. It can be
+an edge under affinity conditions, a corner (a small patch of an image)
+or any other stable feature.
+
+--------------
+
+Available detectors
+~~~~~~~~~~~~~~~~~~~
+
+At the moment, the following detectors are implemented:
+
+-  Harris detector
+
+-  Hessian detector
+
+--------------
+
+Algorithm steps
+~~~~~~~~~~~~~~~
+
+Harris and Hessian
+^^^^^^^^^^^^^^^^^^
+
+Both are derived from a concept called the Moravec window. Let's have a look
+at the image below:
+
+.. figure:: ./Moravec-window-corner.png
+   :alt: Moravec window corner case
+
+   Moravec window corner case
+
+As can be seen, moving the yellow window in any direction will cause
+a very big change in intensity. Now, let's have a look at the edge case:
+
+.. figure:: ./Moravec-window-edge.png
+   :alt: Moravec window edge case
+
+   Moravec window edge case
+
+In this case, the intensity will change only when the window moves in
+a particular direction, namely across the edge.
+
+This is the key concept in understanding how the two corner detectors
+work.
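
The corner and edge cases above can be reproduced numerically. The following is a minimal sketch, assuming tiny synthetic images and a hypothetical helper ``window_change`` (not part of any library API) that sums squared intensity differences between a window and its shifted copy:

```python
def window_change(image, x, y, size, dx, dy):
    """Sum of squared differences between a window and the same window shifted by (dx, dy)."""
    total = 0
    for j in range(size):
        for i in range(size):
            original = image[y + j][x + i]
            shifted = image[y + j + dy][x + i + dx]
            total += (original - shifted) ** 2
    return total


# Shifts: right, left, down, up.
SHIFTS = [(1, 0), (-1, 0), (0, 1), (0, -1)]

# 6x6 synthetic images (illustrative only): a bright square in the
# lower-right quadrant forms a corner, a bright right half forms an edge.
corner_image = [[255 if r >= 3 and c >= 3 else 0 for c in range(6)] for r in range(6)]
edge_image = [[255 if c >= 3 else 0 for c in range(6)] for r in range(6)]

# A 2x2 window with its top-left pixel at (2, 2) straddles the corner / the edge.
corner_changes = [window_change(corner_image, 2, 2, 2, dx, dy) for dx, dy in SHIFTS]
edge_changes = [window_change(edge_image, 2, 2, 2, dx, dy) for dx, dy in SHIFTS]

print(corner_changes)  # every shift changes the window content
print(edge_changes)    # shifts along the edge change nothing
```

At the corner every shift produces a non-zero change, while at the edge the vertical shifts produce zero. Taking the minimum over all shifts therefore separates corners from edges.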
+
+The algorithms have the same structure:
+
+1. Compute image derivatives
+
+2. Compute weighted sum
+
+3. Compute response
+
+4. Threshold (optional)
+
+Harris and Hessian differ in what **derivatives they compute**. Harris
+computes products of first order derivatives:
+
+``HarrisMatrix = [(dx)^2, dxdy][dxdy, (dy)^2]``
+
+(note that ``(dx)^2`` and ``(dy)^2`` are **numerical** powers, not the
+gradient applied again).
+
+The three distinct terms of the matrix can be separated into three images
+to simplify implementation. Hessian, on the other hand, computes second
+order derivatives:
+
+``HessianMatrix = [dxdx, dxdy][dxdy, dydy]``
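
The difference between the two sets of terms can be sketched as follows. This is an illustration only: it assumes central differences as the gradient (a real implementation would more likely use a gradient kernel), and all function names are hypothetical.

```python
def gradient_x(image):
    """Central difference along x; border pixels are left at 0."""
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(1, width - 1):
            out[y][x] = (image[y][x + 1] - image[y][x - 1]) / 2.0
    return out


def gradient_y(image):
    """Central difference along y; border pixels are left at 0."""
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(width):
            out[y][x] = (image[y + 1][x] - image[y - 1][x]) / 2.0
    return out


def multiply(a, b):
    """Element-wise product of two images of equal size."""
    return [[p * q for p, q in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def harris_terms(image):
    """Products of first order derivatives: (dx)^2, dxdy, (dy)^2."""
    dx, dy = gradient_x(image), gradient_y(image)
    return multiply(dx, dx), multiply(dx, dy), multiply(dy, dy)


def hessian_terms(image):
    """Second order derivatives: dxdx, dxdy, dydy (the gradient applied twice)."""
    dx, dy = gradient_x(image), gradient_y(image)
    return gradient_x(dx), gradient_y(dx), gradient_y(dy)


# Test image with intensity x^2: first derivative 2x, second derivative 2.
image = [[float(x * x) for x in range(5)] for _ in range(5)]
dx2, dxdy, dy2 = harris_terms(image)
dxx, dxy, dyy = hessian_terms(image)
print(dx2[2][2], dxx[2][2])  # squared gradient vs. second derivative at the center
```

At the center pixel the Harris term ``(dx)^2`` is the square of the gradient value, while the Hessian term ``dxdx`` is the gradient of the gradient.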
+
+**Weighted sum** is the same for both. Usually a Gaussian blur
+matrix is used as weights, because corners should have hill-like
+curvature in gradients, and other weights might be noisy.
+Basically, overlay the weights matrix over a window of the image, compute the sum of
+``s[i, j] = image[x + i, y + j] * weights[i, j]`` over all ``i, j``
+from zero to the weight matrix dimensions, then move the window
+and compute again until the whole image is covered.
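
A minimal sketch of this sliding weighted sum, assuming normalised Gaussian weights and dropping the border where the window would not fit. The names ``gaussian_weights`` and ``weighted_sum`` are hypothetical, and the image is indexed row first (``image[y][x]``):

```python
import math


def gaussian_weights(size, sigma):
    """Square Gaussian weight matrix, normalised so that its entries sum to 1."""
    center = (size - 1) / 2.0
    weights = [
        [math.exp(-((i - center) ** 2 + (j - center) ** 2) / (2.0 * sigma ** 2))
         for i in range(size)]
        for j in range(size)
    ]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]


def weighted_sum(image, weights):
    """Slide the weights over the image; each output is sum(image * weights) in the window."""
    size = len(weights)
    height, width = len(image), len(image[0])
    out = []
    for y in range(height - size + 1):
        row = []
        for x in range(width - size + 1):
            s = 0.0
            for j in range(size):
                for i in range(size):
                    s += image[y + j][x + i] * weights[j][i]
            row.append(s)
        out.append(row)
    return out


weights = gaussian_weights(3, 1.0)
flat = [[10.0] * 5 for _ in range(5)]
result = weighted_sum(flat, weights)
print(len(result), len(result[0]))  # 3 3: the border is dropped
```

Because the weights sum to 1, a constant image stays constant after the weighted sum, and the center pixel of each window always contributes the most.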
+
+**Response computation** is a matter of choice. Given the general form
+of both matrices above
+
+``[a, b][c, d]``
+
+one of the response functions is
+
+``response = det - k * trace^2 = a * d - b * c - k * (a + d)^2``
+
+``k`` is called the discrimination constant. Usual values are ``0.04`` -
+``0.06``.
+
+The other is simply the determinant
+
+``response = det = a * d - b * c``
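
Both response functions can be written down directly for a single pixel. This is a minimal sketch with hypothetical function names; the sample matrix entries are made up for illustration:

```python
def trace_response(a, b, c, d, k=0.04):
    """det - k * trace^2 for the 2x2 matrix [a, b][c, d]."""
    det = a * d - b * c
    trace = a + d
    return det - k * trace ** 2


def determinant_response(a, b, c, d):
    """Determinant of the 2x2 matrix [a, b][c, d]."""
    return a * d - b * c


# Illustrative Harris-style matrices (values are invented):
corner_like = (100.0, 0.0, 0.0, 100.0)  # strong gradients in both directions
edge_like = (100.0, 0.0, 0.0, 0.0)      # strong gradient in one direction only
flat = (0.0, 0.0, 0.0, 0.0)             # no gradients

print(trace_response(*corner_like))  # large and positive
print(trace_response(*edge_like))    # negative
print(trace_response(*flat))         # zero
```

With the trace-based response, a corner-like matrix scores positive while an edge-like matrix scores negative, which is why a positive threshold keeps corners and rejects edges.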
+
+**Thresholding** is optional, but without it the result will be
+extremely noisy. For complex images, such as outdoor scenes, the
+threshold will be on the order of 100000000 for Harris and on the order
+of 10000 for Hessian. For simpler images, values on the order of 100s and 1000s should be
+enough. These numbers assume a ``uint8_t`` gray image.
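
Thresholding itself is a simple comparison per pixel. A minimal sketch, assuming the response is stored as a list of rows and using a hypothetical ``threshold`` helper with made-up response values:

```python
def threshold(response, limit):
    """Return (x, y) coordinates whose response is strictly above the limit."""
    return [
        (x, y)
        for y, row in enumerate(response)
        for x, value in enumerate(row)
        if value > limit
    ]


# Invented response values for illustration only.
response = [
    [5, 12, 3],
    [250, 7, 1],
    [0, 90, 4000],
]
strong = threshold(response, 100)
print(strong)  # coordinates of the two strongest responses
```

The right limit depends on the image and on the detector, as described above, so in practice it is a tuning parameter.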
+
+For a deeper explanation, please refer to the following **papers**:
+
+`Harris, Christopher G., and Mike Stephens. "A combined corner and edge
+detector." In Alvey vision conference, vol. 15, no. 50, pp. 10-5244.
+1988. <http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.434.4816&rep=rep1&type=pdf>`__
+
+`Mikolajczyk, Krystian, and Cordelia Schmid. "An affine invariant interest point detector." In European conference on computer vision, pp. 128-142. Springer, Berlin, Heidelberg, 2002. <https://hal.inria.fr/inria-00548252/document>`__
+
+`Mikolajczyk, Krystian, Tinne Tuytelaars, Cordelia Schmid, Andrew Zisserman, Jiri Matas, Frederik Schaffalitzky, Timor Kadir, and Luc Van Gool. "A comparison of affine region detectors." International journal of computer vision 65, no. 1-2 (2005): 43-72. <https://hal.inria.fr/inria-00548528/document>`__
+
diff --git a/doc/image_processing/basics.rst b/doc/image_processing/basics.rst
@@ -0,0 +1,54 @@
+Basics
+------
+
+Here are some basic concepts that might help in understanding the
+documentation in this folder:
+
+Convolution
+~~~~~~~~~~~
+
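+The simplest way to look at this is "tweaking the input so that it would
+look like the shape provided". Concretely, each output pixel is a
+weighted sum of the input pixels around it, with the weights taken from
+the kernel, so the exact tweaking applied depends on the kernel.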
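
A minimal sketch of 2D convolution, assuming a "valid" output (borders dropped) and a hypothetical ``convolve`` helper. Note that convolution, strictly speaking, flips the kernel before sliding it over the image:

```python
def convolve(image, kernel):
    """'Valid' 2D convolution: the kernel is flipped, borders are dropped."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for y in range(len(image) - kh + 1):
        row = []
        for x in range(len(image[0]) - kw + 1):
            s = 0
            for j in range(kh):
                for i in range(kw):
                    # Index the window backwards: this is the kernel flip.
                    s += kernel[j][i] * image[y + kh - 1 - j][x + kw - 1 - i]
            row.append(s)
        out.append(row)
    return out


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]

identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]  # keeps the center pixel
shift = [[0, 0, 0], [0, 0, 1], [0, 0, 0]]     # picks the left neighbour after the flip
box = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]       # sums the whole window

print(convolve(image, identity))  # [[5]]
print(convolve(image, shift))     # [[4]]
print(convolve(image, box))       # [[45]]
```

The same image gives three different results purely because the kernel differs, which is the sense in which the kernel decides what tweaking is applied.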
+
+--------------
+
+Filters, kernels, weights
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+These three words usually mean the same thing, unless the context makes
+a different usage clear. Simply put, they are matrices that are used to
+achieve certain effects on the image. Let's consider a simple one, the 3 by 3
+Scharr filter
+
+``ScharrX = [3,0,-3][10,0,-10][3,0,-3]``
+
+The filter above, when convolved with a single channel image
+(intensity/luminance strength), will produce a gradient in the X
+(horizontal) direction. Some filtering cannot be done with a
+kernel though, and one good example is the median filter (the mean is the
+arithmetic mean, whereas the median is the center element of a sorted
+array).
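
The difference between the mean and the median can be sketched on a single 3 by 3 window. This is an illustration with hypothetical helper names: the mean is equivalent to a kernel with all weights equal to ``1/9``, whereas the median needs the window values to be sorted, which no fixed set of weights can express.

```python
def filter3x3(image, reduce_window):
    """Apply reduce_window to every 3x3 window of the image (borders dropped)."""
    out = []
    for y in range(len(image) - 2):
        row = []
        for x in range(len(image[0]) - 2):
            window = [image[y + j][x + i] for j in range(3) for i in range(3)]
            row.append(reduce_window(window))
        out.append(row)
    return out


def mean(window):
    """Arithmetic mean of the window values."""
    return sum(window) / len(window)


def median(window):
    """Center element of the sorted window values (odd window size assumed)."""
    return sorted(window)[len(window) // 2]


# A flat patch with one outlier pixel in the middle.
noisy = [[10, 10, 10],
         [10, 255, 10],
         [10, 10, 10]]

print(filter3x3(noisy, mean))    # the outlier leaks into the result
print(filter3x3(noisy, median))  # [[10]]: the outlier is discarded
```

The median ignores the single outlier completely, while the mean is pulled towards it.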
+
+--------------
+
+Derivatives
+~~~~~~~~~~~
+
+A derivative of an image is a gradient in one of two directions: x
+(horizontal) and y (vertical). To compute a derivative, one can use
+Scharr, Sobel and other gradient filters.
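
As a minimal sketch, the two derivatives of a small image can be computed with Sobel kernels. The sign convention of the kernels and the ``convolve`` helper are assumptions made for this illustration:

```python
def convolve(image, kernel):
    """'Valid' 2D convolution: the kernel is flipped, borders are dropped."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for y in range(len(image) - kh + 1):
        row = []
        for x in range(len(image[0]) - kw + 1):
            s = 0
            for j in range(kh):
                for i in range(kw):
                    s += kernel[j][i] * image[y + kh - 1 - j][x + kw - 1 - i]
            row.append(s)
        out.append(row)
    return out


# One common sign convention for the Sobel kernels (assumed here).
SOBEL_X = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
SOBEL_Y = [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]

# Horizontal ramp: intensity grows by 1 per pixel along x, constant along y.
ramp = [[x for x in range(5)] for _ in range(5)]

dx = convolve(ramp, SOBEL_X)
dy = convolve(ramp, SOBEL_Y)
print(dx[0])  # constant, non-zero: the image changes along x
print(dy[0])  # all zeros: the image does not change along y
```

The x derivative is a constant 8 rather than 1 because the Sobel kernel is not normalised; only the relative magnitudes matter for detection.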
+
+--------------
+
+Curvature
+~~~~~~~~~
+
+The word, when used alone, means the curvature of the surface that would be
+obtained if the values of an image were plotted in a 3D graph. The X and Z
+axes (which form the horizontal plane) correspond to the X and Y indices
+of an image, and the Y axis corresponds to the value at that pixel. With a
+little stretch of the imagination, filters (also called kernels or
+weights) can be considered images (or any 2D matrices). A mean filter
+would draw a flat plane, whereas a Gaussian filter would draw a hill that
+gets sharper as its sigma value decreases.
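
The flat plane and the hill can be checked numerically. This is a minimal sketch with hypothetical helper names, comparing the peak of each kernel with its corner value:

```python
import math


def mean_kernel(size):
    """All weights equal: a flat plane."""
    return [[1.0 / (size * size)] * size for _ in range(size)]


def gaussian_kernel(size, sigma):
    """Normalised Gaussian weights: a hill centered on the middle element."""
    center = (size - 1) / 2.0
    weights = [
        [math.exp(-((i - center) ** 2 + (j - center) ** 2) / (2.0 * sigma ** 2))
         for i in range(size)]
        for j in range(size)
    ]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]


def peak_to_corner(kernel):
    """Ratio of the center weight to the corner weight (odd size assumed)."""
    center = len(kernel) // 2
    return kernel[center][center] / kernel[0][0]


flat_ratio = peak_to_corner(mean_kernel(5))
sharp_ratio = peak_to_corner(gaussian_kernel(5, 0.5))
wide_ratio = peak_to_corner(gaussian_kernel(5, 2.0))
print(flat_ratio, sharp_ratio, wide_ratio)
```

A ratio of 1 means a flat plane. The smaller the sigma, the larger the ratio and the sharper the hill.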
diff --git a/doc/index.rst b/doc/index.rst
@@ -26,6 +26,8 @@ Documentation
 
    design_guide
    io
+   image_processing/basics.rst
+   image_processing/affine-region-detectors.rst
    toolbox
    numeric
    API Reference <./reference/index.html#://>