**Aeronautics Institute of Technology – ITA**

**Computer Vision – CM-203**

**Professors:** 

Marcos Ricardo Omena de Albuquerque Maximo

Gabriel Adriano de Melo


**Instructions:**

Before submitting your lab, make sure that everything runs correctly in sequence: first, **restart the kernel** (`Runtime->Restart Runtime` in Colab or `Kernel->Restart` in Jupyter). Then, execute all cells (`Runtime->Run All` in Colab or `Cell->Run All` in Jupyter) and verify that all cells run without errors, especially the automatic grading ones, i.e. the ones with `assert`s.

**Do not delete the answer cells**, i.e. the ones that contain `WRITE YOUR CODE HERE` or `WRITE YOUR ANSWER HERE`, because they contain metadata with the ids of the cells for the grading system. For the same reason, **do not delete the test cells**, i.e. the ones with `assert`s. The autograding system executes all the code sequentially, adding extra tests in the test cells. There is no problem in creating new cells, as long as you do not delete answer or test cells. Moreover, keep your solutions within the reserved spaces.

The notebooks are implemented to be compatible with Google Colab, and they install the dependencies and download the datasets automatically. The commands which start with ! (exclamation mark) are bash commands and can be executed in a Linux terminal.

---

In this lab, you will implement some simple algorithms for image processing.

In [None]:
# !pip3 install opencv-contrib-python==4.6.0.66 Pillow==7.1.2 matplotlib==3.2.2 scipy==1.7.3 gdown==4.4.0
# tesseract-ocr (4.0.0-2), tesseract-ocr-eng (1:4.00)
def install_dependencies():
    """Install the dependencies and restart if needed"""
    try:
        import pytesseract
    except ImportError:
        !apt install tesseract-ocr && pip install pytesseract==0.3.10
        if 'google.colab' in str(get_ipython()):
            import os
            os.kill(os.getpid(), 9)

install_dependencies()

In [None]:
# Import the libraries
import cv2
import os
import pytesseract
import numpy as np
import PIL.Image
from pathlib import Path
from matplotlib import pyplot as plt

def ocr(imagem):
    """Returns the first line of characters detected by Tesseract as a string"""
    return pytesseract.image_to_string(imagem, config='--oem 1 --psm 7').split('\n')[0]

The next cell downloads a dataset with images of container plates.

In [None]:
# Check whether the images have already been downloaded; download and unzip them if necessary
! [ ! -d "/content/placas" ] && gdown -O /content/placas.zip 1x7ZyRx_be-U9u0NM_rSN_3-Wb_srf-5h &&  unzip /content/placas.zip -d /content && rm /content/placas.zip

imgs_path = Path("/content/placas")

The next cell uses Tesseract, which is an Optical Character Recognition (OCR) library, to detect the letters and numbers present in a container plate.

In [None]:
plate = cv2.cvtColor(cv2.imread(str(imgs_path/'placa_original.jpg')), cv2.COLOR_BGR2RGB)
print(ocr(plate))
PIL.Image.fromarray(plate)

## Color to grayscale conversion

A pixel in the image $I[y, x]$ is composed of its three color channels: blue $B[y, x]$, green $G[y, x]$, and red $R[y, x]$. Moreover, it follows the OpenCV convention of BGR: $I[y, x] = \left[ B[y, x], G[y, x], R[y, x] \right]$. Then, to convert a pixel to grayscale, we use the following linear transform:

$C[y, x] = 0.114 B[y, x] + 0.587 G[y, x] + 0.299 R[y, x]$.

These coefficients depend on the sensor sensitivity and on the display, weighted according to human perception. The coefficients above are the ones used for digital images according to the ITU-R BT.601 specification.

Note: this transform is defined for a linear color space, i.e. before the image values are gamma-encoded as $I_\text{nonlinear}[y, x] = I[y, x]^\gamma$.
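As a quick sanity check of the formula, the sketch below evaluates it for a single pixel. The pixel values are hypothetical and were chosen only so that truncation and rounding give different results.

```python
import numpy as np

# Hypothetical BGR pixel, chosen so that truncation and rounding differ.
pixel_bgr = np.array([100, 100, 200], dtype=np.uint8)

# BT.601 weights in OpenCV's B, G, R channel order.
weights_bgr = np.array([0.114, 0.587, 0.299])

# Work in float64 to avoid uint8 overflow and premature rounding.
gray_float = float(np.dot(weights_bgr, pixel_bgr.astype(np.float64)))

gray_truncated = int(gray_float)        # drops the fractional part
gray_rounded = int(round(gray_float))   # rounds to the nearest integer

print(gray_float, gray_truncated, gray_rounded)
```

The weighted sum is about 129.9, so truncation yields 129 while rounding yields 130. Converting a non-negative float array with `astype(np.uint8)` truncates in the same way.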

Implement your own function below to compute this conversion (1 point). **You are not allowed** to use `cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)`, so you need to implement the matrix operations using NumPy.

<details><summary><b>Hints (click to expand):</b></summary>

- Use `matrix.astype(np.float64)` or `matrix.astype(np.uint8)` to convert a NumPy matrix to float64 or uint8.

- To obtain the image associated to a color channel, use `matrix[:, :, c]`, where `c` is the index of the color channel.

</details>

In [None]:
def convert_bgr_to_grayscale(bgr_img: np.ndarray) -> np.ndarray:
    """
    Converts an image from BGR to grayscale using the equation:
    C[y, x] = 0.114 * B[y, x] + 0.587 * G[y, x] + 0.299 * R[y, x]
    :param bgr_img: matrix (H, W, 3) which represents an image with height H, width W, and 3 color channels as BGR.
    :return: a new image (H, W) in grayscale in the 8 bits format.
    Uses truncation when converting floats to uint8 (for autograding).
    """
    # WRITE YOUR CODE HERE! (you can delete this comment, but do not delete this cell so the ID is not lost)
    raise NotImplementedError()
    return gray_img

In [None]:
img = np.arange(180, dtype=np.uint8).reshape(6, 10, 3)
assert convert_bgr_to_grayscale(img).dtype == np.uint8
assert np.all(convert_bgr_to_grayscale(img) == np.array(
      [[  1,   4,   7,  10,  13,  16,  19,  22,  25,  28],
       [ 31,  34,  37,  40,  43,  46,  49,  52,  55,  58],
       [ 61,  64,  67,  70,  73,  76,  79,  82,  85,  88],
       [ 91,  94,  97, 100, 103, 106, 109, 112, 115, 118],
       [121, 124, 127, 130, 133, 136, 139, 142, 145, 148],
       [151, 154, 157, 160, 163, 166, 169, 172, 175, 178]], dtype=np.uint8))

See the result of the conversion in an actual image below.

Note: this photo is not in a linear color space, so the conversion is not perfect, but the result seems fine visually anyway.

In [None]:
img = cv2.imread(str(imgs_path/'picture.png'))
PIL.Image.fromarray(cv2.cvtColor(img, cv2.COLOR_BGR2RGB)) # Pillow expects a RGB image

In [None]:
gray_img = convert_bgr_to_grayscale(img)
PIL.Image.fromarray(gray_img)

## Histogram of the pixels values in the image

We may analyze the illumination by the distribution of the pixel values in the image, i.e. by its histogram. In general, the histogram is computed for each color channel separately, or for the image in grayscale.

Therefore, let us build a histogram. We need to count how many pixels have each value. For 8-bit images, the values range from 0 to 255 (inclusive).

Implement the following function to return a count of the pixel values of an image composed of a single color channel (1 point).

In [None]:
def histogram(mono_img: np.ndarray) -> np.ndarray:
    """
    Generates a histogram of the image, by counting how many pixels exist of a given value.
    :param mono_img: matrix (H, W) which represents an image of height H and width W.
    :return: an array (v) with the count (q) of pixels having each value (i) in the image, such that v[i] = q.
    """
    counts = np.zeros(256, dtype=np.uint64)
    # WRITE YOUR CODE HERE! (you can delete this comment, but do not delete this cell so the ID is not lost)
    raise NotImplementedError()
    return counts

In [None]:
img = cv2.imread(str(imgs_path/'picture.png'))
green_channel = img[:, :, 1]
hist = histogram(green_channel)
assert np.all(hist ==
      [   0,    0,    0,   11,   65,  111,  164,  261,  308,  431,  537,
        682,  846,  912, 1074, 1350, 1480, 1681, 1621, 2064, 1835, 2048,
       1989, 1911, 2258, 2129, 1748, 1776, 1850, 1687, 1605, 1661, 1355,
       1360, 1189, 1278, 1062, 1027, 1052, 1084, 1013, 1026,  915,  926,
       1029, 1013, 1023,  875, 1110, 1057,  862, 1120,  927, 1073, 1069,
       1004, 1275, 1274, 1129, 1266, 1495, 1548, 1591, 2046, 1706, 1998,
       1704, 2022, 1705, 1800, 1674, 1766, 1593, 1711, 1474, 1491, 1566,
       1456, 1445, 1524, 1371, 1545, 1278, 1606, 1364, 1475, 1513, 1537,
       1813, 1863, 1681, 1885, 2170, 1907, 2008, 1948, 2325, 2094, 1713,
       2084, 1670, 1811, 1720, 1743, 1603, 1693, 1386, 1515, 1639, 1542,
       1588, 1540, 1784, 1751, 1425, 1836, 1604, 1565, 1581, 1507, 1762,
       1769, 1448, 1654, 1872, 1914, 1844, 2103, 1925, 2023, 1811, 2206,
       1907, 1968, 2026, 2096, 2032, 2029, 1730, 1607, 1666, 1454, 1341,
       1354, 1247, 1213,  942, 1152,  931, 1080,  981,  904, 1006, 1003,
        814,  779,  848,  817,  756,  828,  690,  696,  579,  622,  514,
        528,  499,  506,  450,  504,  414,  483,  504,  473,  474,  452,
        621,  587,  512,  611,  652,  723,  645,  656,  850,  777,  720,
        719,  812,  741,  646,  703,  778,  767,  611,  800,  699,  795,
        781,  830,  816,  912,  843,  851,  847,  788,  687,  700,  573,
        499,  377,  388,  288,  294,  241,  201,  168,  167,  125,  114,
        102,  108,   68,   57,   54,   49,   16,   18,   10,    7,    5,
          9,    2,    1,    1,    0,    2,    0,    0,    0,    0,    0,
          1,    0,    0,    2,    0,    0,    1,    0,    0,    0,    0,
          0,    0,    0])

Plot of the histogram:

In [None]:
plt.bar(np.arange(256), hist, width=1)

`matplotlib` also has the function `plt.hist`, which computes and plots the histogram of an array. To use this function, we need to convert the image matrix into a 1D array using `matrix.ravel()` or `matrix.flatten()`. Moreover, `plt.hist` also receives the bins as a parameter: either the number of bins or the sequence of bin edges.

In [None]:
plt.hist(gray_img.ravel(), bins=np.arange(257))  # 257 edges -> 256 bins, one per pixel value
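When `bins` is a sequence, it lists the bin edges, so N edges define N - 1 bins. The sketch below illustrates this with `np.histogram`, which `plt.hist` uses for the binning; the sample values are hypothetical.

```python
import numpy as np

# Tiny hypothetical sample containing the two largest 8-bit values.
values = np.array([0, 254, 255, 255], dtype=np.uint8)

# 256 edges (0..255) define only 255 bins; the last bin is [254, 255],
# so the values 254 and 255 are counted together.
counts_255, _ = np.histogram(values, bins=np.arange(256))

# 257 edges (0..256) define 256 bins, one per pixel value.
counts_256, _ = np.histogram(values, bins=np.arange(257))

print(len(counts_255), counts_255[-1])                    # 255 3
print(len(counts_256), counts_256[254], counts_256[255])  # 256 1 2
```

For an 8-bit image, passing `np.arange(257)` as the edges therefore gives one bin per possible pixel value.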

## Additive and multiplicative gains

For each pixel $I[y, x]$, we will apply an affine transform composed of additive and multiplicative gains: $I_{r}[y, x] = \alpha \cdot I[y, x] + \beta$. Furthermore, the value is clipped so it stays within the interval $[0, 255]$.
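The clipping step matters because arithmetic done directly in `uint8` wraps around instead of saturating. The following minimal sketch, with hypothetical pixel values, contrasts the two behaviors for an additive gain only.

```python
import numpy as np

# Hypothetical 8-bit pixel values.
pixels = np.array([200, 10], dtype=np.uint8)
offset = np.full_like(pixels, 100)

# Adding in uint8 wraps around modulo 256: 200 + 100 -> 44.
wrapped = pixels + offset

# Converting to float first, then clipping, saturates instead: 300 -> 255.
saturated = np.clip(pixels.astype(np.float64) + 100.0, 0, 255).astype(np.uint8)

print(wrapped)    # [ 44 110]
print(saturated)  # [255 110]
```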

Implement this operation in the following cell (1 point).

The function `cv2.convertScaleAbs(image, alpha=alpha, beta=beta)` from OpenCV performs a similar operation (it also takes the absolute value of the result before saturating it to 8 bits). However, **you are not allowed to use this function**.

<details><summary><b>Hints (click to expand):</b></summary>
    
When needed, use `matrix.astype(np.float64)` / `np.uint8` to convert NumPy matrices to `float64` / `uint8`. Also, use `np.clip` to limit the values to stay within 0 to 255.
    
</details>

In [None]:
def gain(img: np.ndarray, alpha: float, beta: float) -> np.ndarray:
    """
    This function implements an affine transform composed of additive and multiplicative gains following:
    Ir[y, x] = alpha * I[y, x] + beta.
    Moreover, the value is clipped to stay within the interval [0, 255].
    :param img: matrix (H, W) or (H, W, C) which represents an image with height H, width W, and C color channels.
    :param alpha: multiplicative gain.
    :param beta: additive gain.
    :return: the transformed image.
    """
    # WRITE YOUR CODE HERE! (you can delete this comment, but do not delete this cell so the ID is not lost)
    raise NotImplementedError()
    return output_img

In [None]:
assert np.all(gain(np.ones((9, 9), dtype=np.uint8), 30, 50) == 80)
assert gain(np.ones((9, 9), dtype=np.uint8), 180, 160).dtype == np.uint8

We will now apply this function to improve the performance of an OCR algorithm when the image has been captured in a place with low illumination (1 point). To simulate this effect, the image was attenuated by applying a gain.

<details><summary><b>Hints (click to expand):</b></summary>
See the histogram of the image.

Apply a gain to make the background color close to white. See the results in the cells below. Internally, Tesseract already applies a dynamic threshold for binarization using Otsu's method.
</details>

In [None]:
def recover_dark_image(img: np.ndarray) -> np.ndarray:
    """
    Applies a gain transform to recover a dark image of a container plate.
    :param img: matrix (H, W) or (H, W, C) which represents an image of height H, width W, and C color channels.
    :return: a recovered image, where it is possible to visualize the characters of the plate.
    """
    # WRITE YOUR CODE HERE! (you can delete this comment, but do not delete this cell so the ID is not lost)
    raise NotImplementedError()
    return recovered_img

In [None]:
dark_plate = cv2.imread(str(imgs_path/'placa_escura.png'))
recovered_plate = recover_dark_image(dark_plate)
assert ocr(recovered_plate) == 'APZU 345314 4'

Notice how some characters are hardly visible to the human eye in the low illumination condition, but become clearly visible after the transform.

In [None]:
print(ocr(dark_plate))
PIL.Image.fromarray(dark_plate)

In [None]:
print(ocr(recovered_plate))
PIL.Image.fromarray(recovered_plate)

The function `cv2.equalizeHist` from OpenCV equalizes the distribution of the values in an image, i.e. it remaps the pixel values so that the histogram becomes approximately flat.

In [None]:
PIL.Image.fromarray(cv2.equalizeHist(dark_plate[:,:,1]))
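To make the idea concrete, here is a minimal NumPy sketch of the textbook formulation of histogram equalization, which maps each value through the cumulative histogram. The function name `equalize_sketch` is hypothetical, and OpenCV's implementation may differ in details such as rounding.

```python
import numpy as np

def equalize_sketch(mono_img: np.ndarray) -> np.ndarray:
    """Histogram equalization of a uint8 image via its cumulative histogram."""
    hist = np.bincount(mono_img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]          # CDF at the darkest value present
    total = mono_img.size
    if total == cdf_min:               # constant image: nothing to stretch
        return mono_img.copy()
    # Map the CDF linearly onto [0, 255]; values below the darkest one clip to 0.
    lut = np.round((cdf - cdf_min) * 255.0 / (total - cdf_min))
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[mono_img]               # apply the lookup table to every pixel

tiny = np.array([[10, 10, 20, 30, 40]], dtype=np.uint8)
print(equalize_sketch(tiny))  # [[  0   0  85 170 255]]
```

In the toy image, the four distinct values, originally packed into the range 10 to 40, are spread over the full range 0 to 255.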

## Borders

As explained in class, when we use cross correlation or convolution, we may need to add pixels to the borders so we can apply the kernel on the borders. In our case of identifying letters in a plate, we may have difficulty with letters close to the image borders due to this issue.

We will now implement a function to add pixels to the borders of an image (also called padding) (0.5 points).

<details><summary><b>Hints (click to expand):</b></summary>
    
When indexing arrays in NumPy, use `matrix[start0:stop0, start1:stop1, start2:stop2]`.
    
</details>

In [None]:
def padding(image: np.ndarray, border_color: tuple, padding: tuple) -> np.ndarray:
    """
    Add padding to an image.
    :param image: matrix (H, W, C) which represents an image of height H, width W, and C color channels.
    :param border_color: tuple (C1, C2, C3, ...) which represents a color that can be applied to a border.
    :param padding: tuple (left, right, top, bottom) which represents the amount of pixels to be added
                    to each border.
    :return: the image after padding.
    """
    h0, w0, c = image.shape
    left, right, top, bottom = padding
    padded_image = np.zeros((h0 + top + bottom, w0 + left + right, c), dtype=np.uint8)
    # WRITE YOUR CODE HERE! (you can delete this comment, but do not delete this cell so the ID is not lost)
    raise NotImplementedError()
    return padded_image

In [None]:
img = np.arange(20, dtype=np.uint8).reshape(4, 5, 1)
assert np.all(padding(img, (0, ), (2, 2, 1, 1))[:, :, 0] == np.array(
      [[ 0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  1,  2,  3,  4,  0,  0],
       [ 0,  0,  5,  6,  7,  8,  9,  0,  0],
       [ 0,  0, 10, 11, 12, 13, 14,  0,  0],
       [ 0,  0, 15, 16, 17, 18, 19,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0]], dtype=np.uint8))
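For cross-checking your own implementation, NumPy's `np.pad` can produce the same result when the border value is the same for every channel. This is only a sketch for comparison; note that `np.pad` orders its arguments per axis, unlike the `(left, right, top, bottom)` tuple used in this lab.

```python
import numpy as np

# Same toy image as in the test above: shape (4, 5, 1).
img = np.arange(20, dtype=np.uint8).reshape(4, 5, 1)

left, right, top, bottom = 2, 2, 1, 1

# np.pad takes one (before, after) pair per axis: rows, columns, channels.
padded = np.pad(
    img,
    pad_width=((top, bottom), (left, right), (0, 0)),
    mode="constant",
    constant_values=0,  # a single scalar border value for all channels
)

print(padded.shape)  # (6, 9, 1)
```

A border color with different values per channel cannot be expressed with a single scalar, which is one reason to write the slicing by hand.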

Now, let us add padding to a plate image to help the OCR algorithm (0.5 points).
Since the image background is white, we should add white pixels during padding. Moreover, adding 4 pixels on each border should be enough.

In [None]:
def plate_padding(image):
    """
    Adds padding to a plate image so the OCR method can correctly identify the characters.
    :param image: matrix (H, W, 3) which represents an image of height H, width W, and 3 color channels.
    :return: an image where characters close to the borders are recognizable by the OCR method.
    """
    # WRITE YOUR CODE HERE! (you can delete this comment, but do not delete this cell so the ID is not lost)
    raise NotImplementedError()
    return padded_image

In [None]:
cropped_plate = cv2.imread(str(imgs_path/'placa_cortada.png'))
padded_plate = plate_padding(cropped_plate)
assert ocr(padded_plate) == 'HLBU 305874 1'

Notice that only some characters are identified correctly in the cropped plate, but after adding padding, all of them are correctly identified:

In [None]:
print(ocr(cropped_plate))
PIL.Image.fromarray(cropped_plate)

In [None]:
print(ocr(padded_plate))
PIL.Image.fromarray(padded_plate)

## Cross Correlation and Convolution

In class, we discussed the differences between cross correlation and convolution. Since many computer vision libraries implement cross correlation instead of convolution to apply filters, we will implement cross correlation here. As explained in class, the difference is not very relevant, because we can use cross correlation to compute a convolution just by flipping the kernel horizontally and vertically. In fact, the implementation below is mathematically equivalent to cross correlation, since the kernel is not flipped.

The cross correlation operation is defined by the following equation:

$G[i, j] = \sum^k_{u=-k} \sum^k_{v=-k} H[u, v] I[i + u, j + v]$
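The relation between the two operations can be checked in 1D with NumPy's built-in `np.correlate` and `np.convolve`. The signal and kernel below are hypothetical; the 2D case works the same way, with the kernel flipped along both axes.

```python
import numpy as np

signal = np.array([1, 2, 3, 4, 5])
kernel = np.array([1, 0, -1])

# Cross correlation: slide the kernel as it is.
corr = np.correlate(signal, kernel, mode="valid")

# Convolution: the kernel is flipped before sliding.
conv = np.convolve(signal, kernel, mode="valid")

# Cross correlation with the flipped kernel reproduces the convolution.
corr_flipped = np.correlate(signal, kernel[::-1], mode="valid")

print(corr, conv, corr_flipped)  # [-2 -2 -2] [2 2 2] [2 2 2]
```

For a symmetric kernel, such as a Gaussian, flipping changes nothing and both operations give the same result.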

Implement the cross correlation function below (2 points).


<details><summary><b>Hints (click to expand):</b></summary>

- Use the following array indexing from NumPy: `matrix[start0:stop0, start1:stop1, start2:stop2]`. 
    
- Use element-wise multiplication through the operator `*`.
    
- Use `np.sum` to sum the elements of an array.
    
</details>

In [None]:
def cross_correlation(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """
    Executes cross correlation of an image using a filter (kernel or mask).
    :param image: matrix (H, W) which represents an image of height H and width W.
    :param kernel: matrix (Hf, Wf) which represents a filter (kernel or mask) of height Hf and width Wf.
    :return: the result of the cross correlation between the image and the filter.
    """
    h0, w0 = image.shape
    hf, wf = kernel.shape
    output = np.zeros((h0 - hf + 1, w0 - wf + 1), dtype=np.float64)
    for i in range(h0 - hf + 1):
        for j in range(w0 - wf + 1):
            # WRITE YOUR CODE HERE! (you can delete this comment, but do not delete this cell so the ID is not lost)
            raise NotImplementedError()
    return output

In [None]:
assert np.all(cross_correlation(
    np.array([[1, 1, 1, 0, 0],
              [0, 1, 1, 1, 0],
              [0, 0, 1, 1, 1],
              [0, 0, 1, 1, 0],
              [0, 1, 1, 0, 0]]), 
    np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 0, 1]])) == np.array([[4, 3, 4],
                                        [2, 4, 3],
                                        [2, 3, 4]]))

A cross correlation with a given filter (kernel) may implement a known mathematical operation. For example, the Sobel filter implements a partial derivative of the image through finite differences. The Sobel filter for computing the $x$ partial derivative of the image is given by:

$\mathbf {S} _{x}={\begin{bmatrix}+1&0&-1\\+2&0&-2\\+1&0&-1\end{bmatrix}}$

The following cell computes the $x$ partial derivative of Lena's image in grayscale.

In [None]:
sobel_x = np.array([[1, 0, -1],
                    [2, 0, -2],
                    [1, 0, -1]])
gray_img_dx = cross_correlation(gray_img, sobel_x)
plt.figure(figsize=(9,9))
plt.axis(False)
plt.imshow(gray_img_dx, cmap='gray')
plt.plot()
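To see the sign convention of the $\mathbf{S}_x$ kernel above, the sketch below evaluates a single cross correlation output on a hypothetical 3x3 patch containing a vertical edge.

```python
import numpy as np

sobel_x = np.array([[1, 0, -1],
                    [2, 0, -2],
                    [1, 0, -1]])

# Hypothetical patch with a vertical edge: dark on the left, bright on the right.
patch = np.array([[0, 0, 255],
                  [0, 0, 255],
                  [0, 0, 255]], dtype=np.float64)

# One output value of the cross correlation: element-wise product, then sum.
response = np.sum(sobel_x * patch)

# Mirroring the patch (bright on the left) flips the sign of the response.
mirrored_response = np.sum(sobel_x * patch[:, ::-1])

print(response, mirrored_response)  # -1020.0 1020.0
```

The responses fall outside $[0, 255]$ and can be negative, which is why the output is kept as `float64`; `plt.imshow` rescales the values to the colormap range by default.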

## Gaussian Filter

The Gaussian filter is frequently used to blur images or attenuate noise. The kernel of the Gaussian filter is a discrete approximation of the 2D Gaussian function:

$H(x, y) = \frac{1}{2 \pi \sigma^2} \exp \left( -\frac{(x - x_0)^2 + (y - y_0)^2}{2 \sigma^2} \right)$.

The discrete approximation is computed as:

$H[u, v] = \alpha  \exp \left( -\frac{(u-u_0)^2 + (v-v_0)^2}{2 \sigma^2} \right)$,

where $\alpha$ is a normalization constant so the kernel's values sum to 1 and $\sigma$ is a design parameter.

This is the best classic filter to attenuate Gaussian noise. It can also be interpreted as a low-pass filter that attenuates high frequencies. Using a Fourier transform, we can verify that this filter attenuates high frequencies.

Implement the function below that computes the Gaussian kernel of dimension $(k, k)$, using $u_0 = \frac{k-1}{2}$ and $v_0 = \frac{k-1}{2}$ so that the Gaussian function is centered on the kernel (1 point).

**You are not allowed to use OpenCV to do this implementation.**
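As a worked example of the normalization, the sketch below computes by hand the three distinct values of a $3 \times 3$ kernel with $\sigma = 1$. It uses only the symmetry of the kernel and is not a general implementation.

```python
import math

sigma = 1.0

# For k = 3 the kernel center is (u0, v0) = (1, 1), so the squared distances
# to the center are 0 (center), 1 (edge neighbors), and 2 (corners).
center = math.exp(-0.0 / (2 * sigma**2))
edge = math.exp(-1.0 / (2 * sigma**2))
corner = math.exp(-2.0 / (2 * sigma**2))

# One center, four edge neighbors, four corners.
total = center + 4 * edge + 4 * corner
alpha = 1.0 / total  # normalization constant so the kernel sums to 1

print(alpha * center, alpha * edge, alpha * corner)
```

The three values are approximately 0.2042, 0.1238, and 0.0751, and the nine kernel entries sum to 1.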

In [None]:
def build_gaussian_kernel(k, sigma):
    """
    Builds a Gaussian kernel of size k and standard deviation sigma.
    :param k: kernel size.
    :param sigma: standard deviation. 
    :return: the normalized Gaussian kernel, a float64 matrix of shape (k, k).
    """
    kernel = np.zeros((k, k), dtype=np.float64)
    # WRITE YOUR CODE HERE! (you can delete this comment, but do not delete this cell so the ID is not lost)
    raise NotImplementedError()
    return kernel

In [None]:
assert np.linalg.norm(build_gaussian_kernel(3, 1) - np.array(
      [[0.07511361, 0.1238414 , 0.07511361],
       [0.1238414 , 0.20417996, 0.1238414 ],
       [0.07511361, 0.1238414 , 0.07511361]])) < 1e-5

See the effect of this kernel on the image below:

In [None]:
blurred_img = cross_correlation(gray_img, build_gaussian_kernel(7, 5))
plt.figure(figsize=(9,9))
plt.axis(False)
plt.imshow(blurred_img, cmap='gray')
plt.plot()
plt.figure(figsize=(9,9))
plt.axis(False)
plt.imshow(blurred_img[350:500, 100:250], cmap='gray', interpolation='nearest')
plt.plot()

As seen in class, a simpler filter computes only the arithmetic average (box filter). However, this filter lets some high-frequency components through, which creates undesired artifacts, as we can see in the following cell:

In [None]:
blurred_img_box = cross_correlation(gray_img, np.ones(25).reshape(5,5)/25)
plt.figure(figsize=(9,9))
plt.axis(False)
plt.imshow(blurred_img_box, cmap='gray')
plt.plot()
plt.figure(figsize=(9,9))
plt.axis(False)
plt.imshow(blurred_img_box[350:500, 100:250], cmap='gray', interpolation='nearest')
plt.plot()

We can also create a filter to unblur (sharpen) an image. The idea behind this filter is to amplify the image and subtract its filtered version, so the variations in the image are amplified.

In the following cell, we use a sharpen kernel to sharpen the image.

In [None]:
identity_kernel = np.zeros((5, 5))
identity_kernel[2, 2] = 1
sharpen_kernel = 4 * identity_kernel - 3 * np.ones((5, 5)) / (5 * 5)
print('Sharpen Kernel:')
print(sharpen_kernel)
sharpened_img = cross_correlation(blurred_img, sharpen_kernel)
plt.figure(figsize=(9,9))
plt.axis(False)
plt.imshow(blurred_img, cmap='gray')
plt.figure(figsize=(9,9))
plt.axis(False)
plt.imshow(sharpened_img, cmap='gray')
plt.plot()
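A property worth checking is that the weights of the sharpen kernel sum to $4 - 3 = 1$, so flat regions keep their brightness while local deviations are amplified. The sketch below rebuilds the same kernel and applies it to two hypothetical $5 \times 5$ patches.

```python
import numpy as np

identity_kernel = np.zeros((5, 5))
identity_kernel[2, 2] = 1
box_kernel = np.ones((5, 5)) / 25
sharpen_kernel = 4 * identity_kernel - 3 * box_kernel

# A flat patch is left unchanged because the weights sum to 1.
flat_patch = np.full((5, 5), 80.0)   # hypothetical constant gray patch
flat_response = np.sum(sharpen_kernel * flat_patch)

# An isolated brighter pixel on the same background is amplified.
spike_patch = flat_patch.copy()
spike_patch[2, 2] = 120.0
spike_response = np.sum(sharpen_kernel * spike_patch)

print(sharpen_kernel.sum(), flat_response, spike_response)
```

The flat patch stays at about 80, while the center pixel that was 40 levels above the background ends up about 155 levels above it (a response of about 235.2).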

OpenCV has these functions already implemented:
- `cv2.GaussianBlur`: blurs an image using a Gaussian filter.
- `cv2.blur`: blurs an image using a box filter (arithmetic average).
- `cv2.medianBlur`: blurs an image using a median filter.
- `cv2.filter2D`: applies a kernel to an image using cross correlation.

## Noise

Gaussian noise is related especially to the sampling of incident photons on the image sensors, and to the spurious photons coming from the black body radiation.

The classical filter that best attenuates Gaussian noise is the Gaussian filter.

For each pixel, a random variable following a Gaussian distribution is added to the pixel value:

$I'[i,j] = I[i,j] + \eta[i,j]$,

where $\eta[i,j] \sim N(0,\sigma^2)$.
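The noise model above can be simulated as in the following sketch. The flat image, the value of $\sigma$, and the seed are hypothetical choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # fixed seed so the sketch is reproducible

# Hypothetical flat gray image and noise level.
clean = np.full((64, 64), 128, dtype=np.uint8)
sigma = 20.0

# eta[i, j] ~ N(0, sigma^2), drawn independently for each pixel.
eta = rng.normal(loc=0.0, scale=sigma, size=clean.shape)

# Add in float64, then clip back to the valid 8-bit range.
noisy = np.clip(clean.astype(np.float64) + eta, 0, 255).astype(np.uint8)

print(eta.mean(), eta.std())  # close to 0 and to sigma, respectively
```

Averaging neighboring pixels reduces the variance of this noise, which is the intuition behind using a smoothing filter before running the OCR.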

Implement the function below which applies a Gaussian filter to the image to allow the OCR to identify the characters in the plate (1 point).

In [None]:
def filter_noise_plate(image: np.ndarray) -> np.ndarray:
    """
    Applies a Gaussian filter to a plate image so the OCR is able to identify the characters. Uses the cross correlation
    function with a Gaussian filter.
    :param image: matrix (H, W) which represents an image with height H and width W.
    :return: filtered image in which the characters are identifiable by the OCR.
    """
    # WRITE YOUR CODE HERE! (you can delete this comment, but do not delete this cell so the ID is not lost)
    raise NotImplementedError()
    return filtered_img

In [None]:
noisy_plate = convert_bgr_to_grayscale(cv2.imread(str(imgs_path/'placa_ruido.png')))
filtered_plate = filter_noise_plate(noisy_plate)
assert ocr(filtered_plate) == 'MEDU 297781 3'


In [None]:
print(ocr(noisy_plate))
PIL.Image.fromarray(noisy_plate)

In [None]:
print(ocr(filtered_plate))
PIL.Image.fromarray(filtered_plate)

## Morphological Operations

Morphological operations are similar to cross correlation with a kernel, since we slide a window through the image. However, the operations are nonlinear. For example, the morphological operations of dilation and erosion choose the maximum and minimum of the window, respectively. These operations are implemented in OpenCV by `cv2.dilate` and `cv2.erode`, respectively.

![Dilation](https://penny-xu.github.io/dialate-d6ec2fc1995eeeb95b917db2c6e1cea0.gif)

The morphological operations are defined for binary images. Nevertheless, we can also generalize them for grayscale (maximum/minimum element of a window).
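The maximum/minimum interpretation can be sketched directly in NumPy on a tiny hypothetical image. Unlike `cv2.dilate` and `cv2.erode`, this sketch does not handle the borders, so the output is smaller than the input.

```python
import numpy as np

# Hypothetical 5x5 binary image with a single white pixel in the center.
img = np.zeros((5, 5), dtype=np.uint8)
img[2, 2] = 255

k = 3  # window size
out_h, out_w = img.shape[0] - k + 1, img.shape[1] - k + 1
dilated = np.zeros((out_h, out_w), dtype=np.uint8)
eroded = np.zeros((out_h, out_w), dtype=np.uint8)

# Slide the window over the positions where it fits entirely (no padding).
for i in range(out_h):
    for j in range(out_w):
        window = img[i:i + k, j:j + k]
        dilated[i, j] = window.max()   # dilation: maximum of the window
        eroded[i, j] = window.min()    # erosion: minimum of the window

print(dilated)  # every window contains the white pixel, so all 255
print(eroded)   # every window contains black pixels, so all 0
```

The single white pixel grows into a $3 \times 3$ white block under dilation and disappears under erosion.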

Moreover, to binarize Lena's image below, we use a threshold of 120.

In [None]:
binary_img = 255 * (gray_img > 120).astype(np.uint8)
PIL.Image.fromarray(binary_img)

The following images help visualize what the dilation and erosion operations do to an image. Notice that both operate on the white pixels, i.e. dilation expands the white regions while erosion shrinks them.

In [None]:
structuring_element = np.ones((3, 3))
dillated_img = cv2.dilate(binary_img, structuring_element)
PIL.Image.fromarray(dillated_img)

In [None]:
structuring_element = np.ones((3, 3))
eroded_img = cv2.erode(binary_img, structuring_element)
PIL.Image.fromarray(eroded_img)

In the next code cell, implement a morphological operation to recover the following plate so the OCR is able to better detect the characters (1 point).

In [None]:
faint_plate = convert_bgr_to_grayscale(cv2.imread(str(imgs_path/'placa_erodida.png')))
print(ocr(faint_plate))
PIL.Image.fromarray(faint_plate)

In [None]:
def morphological_operation_plate(image: np.ndarray) -> np.ndarray:
    """
    Executes a morphological operation to recover the plate.
    :param image: matrix (H, W) which represents an image of height H and width W.
    :return: image after morphological operation that allows the identification of the characters through OCR.
    """
    structuring_element = np.ones((3, 3))
    # WRITE YOUR CODE HERE! (you can delete this comment, but do not delete this cell so the ID is not lost)
    raise NotImplementedError()
    return output

In [None]:
faint_plate = convert_bgr_to_grayscale(cv2.imread(str(imgs_path/'placa_erodida.png')))
recovered_plate = morphological_operation_plate(faint_plate)
assert ocr(recovered_plate) == 'APZU 345314 4'

In [None]:
print(ocr(recovered_plate))
PIL.Image.fromarray(recovered_plate)

There are many other image processing methods that we can apply to an image. For example, to rotate/scale/translate/shear an image, we can use `cv2.warpAffine`.

In [None]:
img = cv2.imread(str(imgs_path/'picture.png'))
transform = cv2.getRotationMatrix2D((350, 250), 70, 1.4)
rotated_img = cv2.warpAffine(img, transform, (700, 700))
PIL.Image.fromarray(cv2.cvtColor(rotated_img, cv2.COLOR_BGR2RGB))
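The $2 \times 3$ matrix passed to `cv2.warpAffine` can also be built by hand. The sketch below assumes the formula given in the OpenCV documentation for `cv2.getRotationMatrix2D`; the helper name `rotation_matrix_sketch` is hypothetical, and the result should be compared against OpenCV's output before relying on it.

```python
import math
import numpy as np

def rotation_matrix_sketch(center, angle_deg, scale):
    """2x3 affine matrix for a rotation by angle_deg around center, with scaling."""
    cx, cy = center
    a = scale * math.cos(math.radians(angle_deg))
    b = scale * math.sin(math.radians(angle_deg))
    return np.array([[a, b, (1 - a) * cx - b * cy],
                     [-b, a, b * cx + (1 - a) * cy]])

M = rotation_matrix_sketch((350, 250), 70, 1.4)

# The rotation center is a fixed point of the transform.
center_h = np.array([350.0, 250.0, 1.0])   # homogeneous coordinates
print(M @ center_h)  # approximately [350. 250.]
```

The translation column is what keeps the chosen center fixed; without it, the image would rotate around the origin (the top-left corner).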

# Your data and feedback:

Write feedback for the lab so we can improve it for the coming years.

In the following variables, write the number of hours spent on this lab, the perceived difficulty, and the expected grade (you may delete the `raise` and the comments):

In [None]:
# meta_eval manual_graded_answer 0

horas_gastas = None    # 1.5   - Float number with the number of hours spent 
dificuldade_lab = None # 0     - Float number from 0.0 to 10.0 (inclusive)
nota_esperada = None   # 10    - Float number from 0.0 to 10.0 (inclusive)

# WRITE YOUR CODE HERE! (you can delete this comment, but do not delete this cell so the ID is not lost)
raise NotImplementedError()

Write below any other comments or feedback about the lab. If there is anything you did not understand, please also comment here.

If you find any typo or bug in the lab, please comment below so we can fix it.

WRITE YOUR SOLUTION HERE! (do not change this first line):

**ATTENTION**

**DISCURSIVE QUESTION**

WRITE YOUR ANSWER HERE (do not delete this cell so the ID is not lost)

**ATTENTION**


**End of the lab!**