# **Computer vision and cognitive systems laboratory**

# 1. Saturated arithmetic, Thresholds, Histogram

## 1.1 - Linear stretch

Your code will take as input a color image im (a torch.Tensor with dtype torch.uint8 and rank 3) and two scalars a and b. It must apply a pixel-wise linear transformation (every pixel 𝑝 is transformed to 𝑎⋅𝑝+𝑏). The code should produce a new image out with the same shape and dtype.

a and b can be either ints or floats. Be careful to: compute the exact result, round to nearest integer and then clip between 0 and 255.

In [None]:
import random
import numpy as np
import torch
from skimage import data
from skimage.transform import resize

im = data.coffee()
im = resize(im, (im.shape[0] // 8, im.shape[1] // 8), mode='reflect', preserve_range=True, anti_aliasing=True).astype(np.uint8)
im = np.swapaxes(np.swapaxes(im, 0, 2), 1, 2)
im = torch.from_numpy(im)

a = random.uniform(0,2)
b = random.uniform(-50,50)

#finish setup

out = torch.clamp(torch.round(im.float() * a + b), 0, 255).type(torch.uint8)
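
The cast to float before the arithmetic is what keeps the result exact: arithmetic between a uint8 tensor and Python ints stays in uint8 and wraps around modulo 256. The following is a minimal sketch on a toy tensor; `linear_stretch` is a hypothetical helper name that wraps the same expression used above, and the toy values are chosen only for illustration.

```python
import torch

def linear_stretch(im, a, b):
    # Hypothetical helper: compute in floating point, round, clip, cast back
    return torch.clamp(torch.round(im.float() * a + b), 0, 255).to(torch.uint8)

toy = torch.tensor([[[0, 100, 200]]], dtype=torch.uint8)  # shape (1, 1, 3)

# Integer arithmetic directly on the uint8 tensor: 200*2+10 = 410 wraps to 154
wrapped = toy * 2 + 10

# Float arithmetic followed by clipping: 410 saturates at 255
stretched = linear_stretch(toy, 2, 10)

print(wrapped.tolist())
print(stretched.tolist())
```

Note that `torch.round` rounds exact halves to the nearest even integer, which may or may not be the rounding rule the exercise checker expects.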

## 1.2 - Thresholding

Given an input grayscale image im (a torch.Tensor with shape (H, W) and dtype torch.uint8), write code that performs a binary thresholding of the image at cut value val and stores the result in out.

out should be another image, with the same shape as im, with all the pixels greater than the threshold set to 255 and all the others set to 0.

Be careful not to modify the original tensor in-place: the function should perform the thresholding on a copy of the image.



In [4]:
import random
import numpy as np
from skimage import data
from skimage.transform import resize
import torch

im = data.camera()
im = resize(im, (im.shape[0] // 2, im.shape[1] // 2), mode='reflect', preserve_range=True, anti_aliasing=True).astype(np.uint8)
im = torch.from_numpy(im)
val = random.randint(0, 255)

#finish setup

out = ((im > val) * 255).type(torch.uint8)
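
As a way to verify the "no in-place modification" requirement, the sketch below thresholds a small toy image and checks that the input is left untouched. `binary_threshold` is a hypothetical helper name, and `torch.where` is used here as one possible alternative to the comparison-and-multiply expression; both allocate a new tensor.

```python
import torch

def binary_threshold(im, val):
    # Hypothetical helper: torch.where builds a new tensor, so im is not modified
    return torch.where(im > val, torch.full_like(im, 255), torch.zeros_like(im))

toy = torch.tensor([[10, 128], [129, 255]], dtype=torch.uint8)
before = toy.clone()  # snapshot used to check that toy is unchanged

result = binary_threshold(toy, 128)

print(result.tolist())
print(torch.equal(toy, before))
```

Pixels equal to the threshold (128 here) fall on the 0 side, since only values strictly greater than val are set to 255.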

## 1.3 - Otsu Thresholding

Given an input grayscale image im (a torch.Tensor with shape (H, W) and dtype torch.uint8), write code that computes the Otsu threshold for im and stores the result in out.

Notice: beware of how the threshold is defined in the Otsu formulas. Your output should be compliant with our first definition of threshold (see slides).
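
For reference, Otsu's method picks the threshold that maximizes the between-class variance of the gray-level histogram. The formulation below assumes the convention of section 1.2 (pixels with value at most $t$ form one class, pixels strictly greater than $t$ the other); whether this matches the "first definition" in the slides is an assumption and should be checked against them. Here $p_i$ denotes the fraction of pixels with gray level $i$.

$$
\omega_0(t) = \sum_{i=0}^{t} p_i, \qquad
\omega_1(t) = 1 - \omega_0(t), \qquad
\mu_0(t) = \frac{1}{\omega_0(t)} \sum_{i=0}^{t} i\,p_i, \qquad
\mu_1(t) = \frac{1}{\omega_1(t)} \sum_{i=t+1}^{255} i\,p_i
$$

$$
\sigma_b^2(t) = \omega_0(t)\,\omega_1(t)\,\bigl(\mu_0(t) - \mu_1(t)\bigr)^2, \qquad
t^{*} = \arg\max_{t}\ \sigma_b^2(t)
$$

If the slides instead assign pixels equal to $t$ to the upper class, the resulting threshold differs by one from the value defined here.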

In [None]:
import random
import numpy as np
from skimage import data
from skimage.transform import resize
import torch

im = data.camera()
im = resize(im, (im.shape[0] // 2, im.shape[1] // 2), mode='reflect', preserve_range=True, anti_aliasing=True).astype(np.uint8)
im = torch.from_numpy(im)

#finish setup

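# Gray-level histogram, one bin per value in 0..255
hist = torch.bincount(im.flatten().long(), minlength=256).double()
levels = torch.arange(256, dtype=torch.double)
# Assumption: class 0 = pixels <= t, class 1 = pixels > t (as in section 1.2);
# check this against the threshold definition given in the slides.
n0 = torch.cumsum(hist, dim=0)            # number of pixels <= t
n1 = hist.sum() - n0                      # number of pixels > t
s0 = torch.cumsum(hist * levels, dim=0)   # sum of gray levels in class 0
s1 = (hist * levels).sum() - s0           # sum of gray levels in class 1
diff = s0 / n0.clamp(min=1) - s1 / n1.clamp(min=1)  # mu0 - mu1
# Between-class variance up to a constant factor; empty classes score 0
sigma_b = torch.where((n0 > 0) & (n1 > 0), n0 * n1 * diff ** 2, torch.zeros_like(n0))
out = int(torch.argmax(sigma_b))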

## 1.4 - Color histogram

Your code will take as input a color image im (a torch.Tensor with dtype torch.uint8 and shape (3, H, W)) and an integer nbin. It should compute a normalized color histogram of the image, quantized with nbin bins on each color plane.

The output should be a torch.Tensor with shape (3*nbin, ), containing the concatenation of the histograms computed on each color plane (in the same order as in the input tensor).

The output should be L1-normalized (i.e. all bins of the final histogram should sum up to 1).

Quantization strategy: a pixel should go in the bin with index b if and only if: pixel*nbin // 256 == b

In [None]:
import random
import numpy as np
import torch
from skimage import data

im = data.astronaut()
im = np.swapaxes(np.swapaxes(im, 0, 2), 1, 2)
im = torch.from_numpy(im)
nbin = random.randint(32,128)

#finish setup
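
One possible way to implement the quantization rule and the L1 normalization described above is sketched here on a tiny toy image. `color_histogram` is a hypothetical helper name, and the choice of `torch.bincount` is an assumption, not necessarily the intended solution. The pixels are cast to a wider integer type first, because `pixel*nbin` does not fit in uint8.

```python
import torch

def color_histogram(im, nbin):
    # Hypothetical helper. Cast to int64 so that pixel * nbin cannot overflow
    bins = torch.div(im.long() * nbin, 256, rounding_mode='floor')
    # One histogram per color plane, in the same order as the input channels
    planes = [torch.bincount(bins[c].flatten(), minlength=nbin)
              for c in range(im.shape[0])]
    hist = torch.cat(planes).float()
    # L1 normalization: all bins of the concatenated histogram sum to 1
    return hist / hist.sum()

# Toy image with shape (3, 2, 2): all-black plane, all-white plane, mixed plane
toy = torch.tensor([[[0, 0], [0, 0]],
                    [[255, 255], [255, 255]],
                    [[0, 64], [128, 255]]], dtype=torch.uint8)

hist = color_histogram(toy, 4)
print(hist.shape)
print(hist.tolist())
```

With nbin = 4, the four pixels of the first plane all land in bin 0, those of the second plane in bin 3, and the third plane spreads one pixel into each bin, so each plane contributes one third of the total mass.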



# 2. 2D convolution, 2D pooling

## 2.1 - 2D convolution

Your code will take an input tensor input with shape (n, iC, H, W) and a kernel kernel with shape (oC, iC, kH, kW). It must apply a 2D convolution over input, using kernel as the kernel tensor, with no bias, a stride of 1, no dilation, no grouping, and no padding, and store the result in out. Both input and kernel have dtype torch.float32.

In [None]:
import random
import torch

n = random.randint(2, 6)
iC = random.randint(2, 6)
H = random.randint(10, 20)
W = random.randint(10, 20)

oC = random.randint(2, 6)
kH = random.randint(2, 6)
kW = random.randint(2, 6)

input = torch.rand(n, iC, H, W, dtype=torch.float32)
kernel = torch.rand(oC, iC, kH, kW, dtype=torch.float32)

#finish setup

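# Pre-allocate the output; every entry is overwritten in the loops below
out = torch.empty(n, oC, H - kH + 1, W - kW + 1, dtype=torch.float32)

for n_im in range(n):
	for i in range(H - kH + 1):
		for j in range(W - kW + 1):
			# kernel (oC, iC, kH, kW) broadcasts against the (iC, kH, kW) patch;
			# summing over the last three dims gives one value per output channel
			out[n_im,:,i,j] = torch.sum(kernel*input[n_im,:,i:i+kH,j:j+kW], dim = (1,2,3))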
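
A loop-based result like the one above can be sanity-checked against `torch.nn.functional.conv2d`, which applies the same sliding-window sum of products (no kernel flipping). The sketch below wraps the loops in a hypothetical helper `conv2d_loops` and uses small fixed shapes chosen only for illustration.

```python
import torch
import torch.nn.functional as F

def conv2d_loops(x, kernel):
    # Hypothetical helper: stride 1, no padding, no dilation, no groups, no bias
    n, iC, H, W = x.shape
    oC, _, kH, kW = kernel.shape
    out = torch.empty(n, oC, H - kH + 1, W - kW + 1, dtype=x.dtype)
    for n_im in range(n):
        for i in range(H - kH + 1):
            for j in range(W - kW + 1):
                patch = x[n_im, :, i:i + kH, j:j + kW]        # (iC, kH, kW)
                out[n_im, :, i, j] = torch.sum(kernel * patch, dim=(1, 2, 3))
    return out

torch.manual_seed(0)
x = torch.rand(2, 3, 8, 9)
k = torch.rand(4, 3, 3, 2)

mine = conv2d_loops(x, k)
reference = F.conv2d(x, k, bias=None, stride=1, padding=0)

# Compare with a small tolerance: the summation order differs between the two
print(torch.allclose(mine, reference, atol=1e-5))
```

The comparison uses a tolerance rather than exact equality because float32 sums accumulated in a different order can differ in the last few bits.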