# Scale-Invariant Feature Transform (SIFT)

## Image scale space

Within a certain range, the human eye can recognize an object whether it appears large or small. A computer does not have this ability by default. For a machine to interpret objects consistently across scales, it must consider the features of an image at different scales.

A scale space is usually built by applying Gaussian blur.

![title](sift_3.png)

![title](sift_2.png)

The σ of the Gaussian function determines how strongly the image is smoothed: a larger σ gives a blurrier image.
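The effect of σ can be checked with a small sketch. It uses plain NumPy rather than OpenCV so that it runs on its own; the helper names `gaussian_kernel_1d` and `gaussian_blur`, the ±3σ kernel radius, and the random test image are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

def gaussian_kernel_1d(sigma):
    """Return a normalized 1-D Gaussian kernel covering about +/- 3 sigma (assumed radius)."""
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return kernel / kernel.sum()

def gaussian_blur(image, sigma):
    """Blur a 2-D grayscale image with a separable Gaussian filter."""
    kernel = gaussian_kernel_1d(sigma)
    radius = len(kernel) // 2
    # Reflect-pad so the output keeps the input shape.
    padded = np.pad(np.asarray(image, dtype=np.float64), radius, mode="reflect")
    # Filter along rows, then along columns (the 2-D Gaussian is separable).
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)

# Hypothetical test image: random noise stands in for a real photo.
rng = np.random.default_rng(0)
image = rng.random((64, 64))

# One blurred copy per sigma; a larger sigma removes more detail.
scale_space = {sigma: gaussian_blur(image, sigma) for sigma in (1.0, 2.0, 4.0)}
for sigma, blurred in scale_space.items():
    print(sigma, blurred.std())
```

The printed standard deviation shrinks as σ grows, which is the numerical counterpart of the image becoming blurrier.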

### Multi-resolution pyramid

![title](sift_4.png)

### Difference of Gaussian Pyramid (DoG)

![title](sift_5.png)

![title](sift_6.png)

### DoG space extreme value detection

To find the extrema of the scale space, each pixel is compared with all of its neighbors in both the image domain (same scale) and the scale domain (adjacent scales). A pixel is an extreme point when it is greater than (or smaller than) all of these neighbors. As shown in the figure below, the central detection point is compared with its 8 neighbors in the 3×3 neighborhood of its own image and with the 2 × 9 = 18 pixels in the corresponding 3×3 regions of the layers above and below, for a total of 26 pixels.

![title](sift_7.png)
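The 26-neighbor comparison can be sketched as follows. This is a minimal illustration that assumes one octave of DoG images is stacked into a 3-D NumPy array of shape (scales, height, width); the function names and the toy input are hypothetical, and a real implementation would usually also apply a contrast threshold.

```python
import numpy as np

def is_extremum(dog, s, y, x):
    """Return True if dog[s, y, x] is strictly above or strictly below all 26 neighbors."""
    cube = dog[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]  # 3x3x3 block around the point
    center = cube[1, 1, 1]
    neighbors = np.delete(cube.ravel(), 13)  # index 13 is the center; 26 values remain
    return bool(center > neighbors.max() or center < neighbors.min())

def find_extrema(dog):
    """Return (scale, row, col) for every interior point that is an extremum."""
    scales, height, width = dog.shape
    return [(s, y, x)
            for s in range(1, scales - 1)
            for y in range(1, height - 1)
            for x in range(1, width - 1)
            if is_extremum(dog, s, y, x)]

# Toy DoG stack: three layers of zeros with a single peak in the middle layer.
dog = np.zeros((3, 5, 5))
dog[1, 2, 2] = 1.0
print(find_extrema(dog))
```

Only interior points are tested, because a point on the border of the image or in the first or last DoG layer does not have a full set of 26 neighbors.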

### Precise positioning of key points

![title](sift_8.png)

![title](sift_9.png)

### Eliminate boundary response

![title](sift_10.png)
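The figure for this step is not reproduced in text, so the sketch below is an assumption about what it shows: the Hessian ratio test from Lowe's paper, which keeps a key point only if Tr(H)² / Det(H) < (r + 1)² / r, with r = 10 as suggested there. The function name `passes_edge_test` and the finite-difference stencil are illustrative choices.

```python
import numpy as np

def passes_edge_test(dog_image, y, x, r=10.0):
    """Return True if the point at (y, x) is blob-like rather than edge-like."""
    d = np.asarray(dog_image, dtype=np.float64)
    # Second derivatives from finite differences on the DoG image.
    dxx = d[y, x + 1] + d[y, x - 1] - 2.0 * d[y, x]
    dyy = d[y + 1, x] + d[y - 1, x] - 2.0 * d[y, x]
    dxy = (d[y + 1, x + 1] - d[y + 1, x - 1] - d[y - 1, x + 1] + d[y - 1, x - 1]) / 4.0
    trace = dxx + dyy
    det = dxx * dyy - dxy ** 2
    if det <= 0:
        # Curvatures of opposite sign (or a flat direction): reject the point.
        return False
    return bool(trace ** 2 / det < (r + 1) ** 2 / r)

# Hypothetical 5x5 patches centered on index (2, 2).
ys, xs = np.mgrid[-2:3, -2:3]
blob = -(xs ** 2 + ys ** 2)           # equal curvature in both directions
ridge = -(xs ** 2 + 0.01 * ys ** 2)   # strong curvature across, weak curvature along

print(passes_edge_test(blob, 2, 2), passes_edge_test(ridge, 2, 2))
```

A point on an edge has one large and one small principal curvature, which makes the ratio large, so the test discards it.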

### Main direction of feature points

![title](sift_11.png)

Each feature point carries three pieces of information, (x, y, σ, θ): position, scale, and direction. A key point with several dominant directions is duplicated, and each copy is assigned one of the direction values. One feature point thus yields several feature points with the same coordinates and scale but different directions.
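The duplication of key points can be sketched as follows. The 36-bin histogram and the 80% peak threshold are taken from Lowe's paper and are assumptions here, since the original text does not state them. The function names are hypothetical, and the sketch leaves out the Gaussian weighting and peak interpolation that a full implementation would use.

```python
import numpy as np

def dominant_orientations(magnitudes, angles_deg, num_bins=36, peak_ratio=0.8):
    """Return bin-center angles (degrees) of the main direction and any auxiliary ones."""
    bin_width = 360.0 / num_bins
    angles = np.asarray(angles_deg, dtype=np.float64).ravel()
    weights = np.asarray(magnitudes, dtype=np.float64).ravel()
    bins = ((angles % 360.0) // bin_width).astype(int) % num_bins
    # Each pixel votes for its direction bin, weighted by gradient magnitude.
    hist = np.bincount(bins, weights=weights, minlength=num_bins)
    threshold = peak_ratio * hist.max()
    peaks = []
    for i in range(num_bins):
        left, right = hist[i - 1], hist[(i + 1) % num_bins]  # circular neighbors
        if hist[i] >= threshold and hist[i] > left and hist[i] > right:
            peaks.append((i + 0.5) * bin_width)
    return peaks

def expand_keypoint(x, y, sigma, magnitudes, angles_deg):
    """Copy one key point once per dominant direction, giving (x, y, sigma, theta) tuples."""
    return [(x, y, sigma, theta) for theta in dominant_orientations(magnitudes, angles_deg)]

# Toy neighborhood: a strong direction near 15 degrees and a second one near 95 degrees.
angles = [15, 15, 15, 95, 95, 95, 200]
magnitudes = [1.0, 1.0, 1.0, 0.9, 0.9, 0.9, 0.5]
print(expand_keypoint(10, 20, 1.6, magnitudes, angles))
```

The second peak reaches 90% of the highest one, so the key point is emitted twice with the same position and scale but different θ; the weak direction near 200 degrees is ignored.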

### Generate feature description

Once the gradients around a key point have been computed, a histogram accumulates the gradient magnitudes and directions of the pixels in its neighborhood.

![title](sift_12.png)

To make the feature vector rotation invariant, the coordinate axes in the neighborhood of the feature point are rotated by the angle θ around the feature point, so that they are aligned with its main direction.

![title](sift_14.png)

Take an 8x8 window centered on the key point and aligned with the main direction after rotation, and compute the gradient magnitude and direction of each pixel. The direction of each arrow represents the gradient direction, and its length represents the gradient magnitude. The gradients are then weighted with a Gaussian window. Finally, an 8-direction gradient histogram is built on each 4x4 sub-block, and the accumulated value of each direction forms a seed point. In this example each feature point is therefore described by 4 seed points (2x2), and each seed point carries an 8-direction vector.

![title](sift_16.png)

The paper recommends describing each key point with 4x4 = 16 seed points, so each key point yields a 4 × 4 × 8 = 128-dimensional SIFT feature vector.

![title](sift_17.png)
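The 4 × 4 × 8 layout can be sketched as follows. The sketch assumes a 16x16 patch of gradient magnitudes and directions that has already been sampled in the rotated frame, so only the gradient directions still need to be expressed relative to θ. The function name `sift_like_descriptor` is hypothetical. Gaussian weighting and interpolation between bins are left out, and the final unit-length normalization follows Lowe's paper rather than the text above.

```python
import numpy as np

def sift_like_descriptor(magnitudes, angles_deg, theta_deg, cells=4, bins=8):
    """Build a (cells * cells * bins)-dimensional descriptor from a square patch."""
    magnitudes = np.asarray(magnitudes, dtype=np.float64)
    size = magnitudes.shape[0]
    cell = size // cells  # pixels per cell side (4 for a 16x16 patch)
    # Express gradient directions relative to the main direction theta.
    relative = (np.asarray(angles_deg, dtype=np.float64) - theta_deg) % 360.0
    bin_idx = (relative // (360.0 / bins)).astype(int) % bins
    descriptor = np.zeros((cells, cells, bins))
    for row in range(size):
        for col in range(size):
            # Each pixel adds its magnitude to the histogram of its cell (seed point).
            descriptor[row // cell, col // cell, bin_idx[row, col]] += magnitudes[row, col]
    descriptor = descriptor.ravel()
    norm = np.linalg.norm(descriptor)
    return descriptor / norm if norm > 0 else descriptor

# Hypothetical patch: random gradients stand in for a real image neighborhood.
rng = np.random.default_rng(0)
magnitudes = rng.random((16, 16))
angles = rng.integers(0, 360, size=(16, 16)).astype(np.float64)

descriptor = sift_like_descriptor(magnitudes, angles, theta_deg=0.0)
print(descriptor.shape)  # (128,)
```

Because directions are measured relative to θ, adding the same offset to every gradient direction and to θ leaves the descriptor unchanged, which is the rotation invariance described above.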

### OpenCV SIFT

```python
import cv2
import numpy as np

# Load the test image and convert it to grayscale for SIFT.
img = cv2.imread('test_1.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
```

```python
# This notebook was run with version 3.4.1.15:
#   pip install opencv-python==3.4.1.15
#   pip install opencv-contrib-python==3.4.1.15
cv2.__version__
```

```
'3.4.1'
```

Get feature points

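```python
# In OpenCV 3.4.x SIFT lives in the contrib module; from OpenCV 4.4 onward use cv2.SIFT_create().
sift = cv2.xfeatures2d.SIFT_create()
kp = sift.detect(gray, None)
```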

```python
# Draw the detected key points on the image and display the result.
img = cv2.drawKeypoints(gray, kp, img)

cv2.imshow('drawKeypoints', img)
cv2.waitKey(0)
cv2.destroyAllWindows()
```

Compute features

```python
# Compute a descriptor for each detected key point.
kp, des = sift.compute(gray, kp)
```

```python
print(np.array(kp).shape)
```

```
(6827,)
```

```python
des.shape
```

```
(6827, 128)
```

```python
des[0]
```

```
array([  0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,  21.,   8.,   0.,
         0.,   0.,   0.,   0.,   0., 157.,  31.,   3.,   1.,   0.,   0.,
         2.,  63.,  75.,   7.,  20.,  35.,  31.,  74.,  23.,  66.,   0.,
         0.,   1.,   3.,   4.,   1.,   0.,   0.,  76.,  15.,  13.,  27.,
         8.,   1.,   0.,   2., 157., 112.,  50.,  31.,   2.,   0.,   0.,
         9.,  49.,  42., 157., 157.,  12.,   4.,   1.,   5.,   1.,  13.,
         7.,  12.,  41.,   5.,   0.,   0., 104.,   8.,   5.,  19.,  53.,
         5.,   1.,  21., 157.,  55.,  35.,  90.,  22.,   0.,   0.,  18.,
         3.,   6.,  68., 157.,  52.,   0.,   0.,   0.,   7.,  34.,  10.,
        10.,  11.,   0.,   2.,   6.,  44.,   9.,   4.,   7.,  19.,   5.,
        14.,  26.,  37.,  28.,  32.,  92.,  16.,   2.,   3.,   4.,   0.,
         0.,   6.,  92.,  23.,   0.,   0.,   0.], dtype=float32)
```