# Computer Vision - P3
# First part


## Delivery

Up to **1 point out of 10** will be deducted if the following requirements are not fulfilled:

- Implemented code should be commented.

- The questions introduced in the exercises must be answered.

- Add title to the figures to explain what is displayed.

- Comments need to be in **English**.

- The deliverable of both parts must be a file named **P3_Student1_Student2.zip** that includes:
    - The notebook P3_Student1_Student2.ipynb completed with the solutions to the exercises and their corresponding comments.
    - All the images used in this notebook.

**Deadline (Campus Virtual): October 26th, 23:00 h** 

==============================================================================================
## Practicum 3: Image and Video Segmentation

==============================================================================================

The main topics of Laboratory 3 are:

First part: Video Segmentation:

3.1. Segmentation of video shots with static scenes.

3.2. Background subtraction.

Second part: Image Segmentation:

3.3. Segmentation of images.

To successfully complete this practicum it is necessary to understand the following theoretical concepts: video segmentation, background subtraction, K-means clustering, and different image segmentation approaches.

The following chapters of the book “Computer Vision: Algorithms and Applications” by Richard Szeliski have further information about the topic:

* Chapter 4: Feature detection and matching.

* Chapter 5: Segmentation.



## Video segmentation

Given the video stored in ‘Barcelona-sequence’, which contains images acquired by a static camera, we ask you to:
- Find the temporal segments of the video (shots). Where does the scene change? (Section 3.1)
- Extract the background images and thus remove all the "artifacts" considered as foreground related to movement. (Section 3.2)


Note: One application of background subtraction methods is the "remove tourists" button implemented in most commercial photo cameras. For instance, Adobe uses the "Monument Mode", which automatically deletes the people passing in front of the camera. Today, most videoconferencing tools allow the user to set a virtual background. To do so, they need to extract the person (the foreground) and place it on the new virtual background.


## 3.1 Segmentation of video shots

Read and visualize the sequence of images "images/Barcelona-sequence".

Hint: In order to read and visualize a collection of images, we will use the function [animation.FuncAnimation](https://matplotlib.org/2.0.0/api/_as_gen/matplotlib.animation.FuncAnimation.html).

Observe in the following Example, how FuncAnimation is used to read and visualize a sequence of frames. Explore the parameters of animation.FuncAnimation().

In [1]:
%matplotlib inline

In [2]:
# Example
import numpy as np
import skimage
from skimage import io, img_as_float
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from skimage.exposure import histogram

# Read a sequence of images from a folder
ic = io.ImageCollection('images/Barcelona-sequence/*.png')

# Changing the backend is always necessary when visualizing a video!
%matplotlib nbagg

fig = plt.figure()  # Create the figure
im = plt.imshow(ic[0], animated=True)  # Visualize the first image

def updatefig1(i):  # Update the frame shown in the figure
    im.set_array(ic[i*5])  # Change the content of the canvas (every 5th frame)
    return im,  # Return a tuple of artists, as required by blit=True

ani = animation.FuncAnimation(fig, updatefig1, interval=5, blit=True, frames=50, repeat=False)
plt.show()


a) Each of the scenes in a video is usually called a 'shot'. Find where one shot (scene) finishes and the next one starts (the shot boundaries).

To solve this exercise, you need to create a **temporal plot** showing a frame-by-frame difference measure, defined by you, that is suitable for distinguishing the shots. Define a criterion to detect the shot boundaries and visualize it in a static plot.

In [3]:
# Your solution here
# Find where the histogram changes a lot
# Sum the RGB channel histograms before computing the difference

def difference_of_histograms(u, v):
    return np.linalg.norm(u - v)

def temporal_plot(nbins):
    prev_hist = None
    differences_list = []
    for img in ic:
        # Convert img to float
        img = img_as_float(img)

        # Get RGB channels
        r = img[:,:,0]
        g = img[:,:,1]
        b = img[:,:,2]

        # Get histogram of each channel and sum
        r_hist, r_bin = histogram(r, nbins=nbins)
        g_hist, g_bin = histogram(g, nbins=nbins)
        b_hist, b_bin  = histogram(b, nbins=nbins)

        hist = np.array(r_hist + g_hist + b_hist)
        if prev_hist is not None:
            dif = difference_of_histograms(hist, prev_hist)
            differences_list.append(dif)
        prev_hist = hist

    return differences_list

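differences_list_32bins = temporal_plot(32)
# Plot the difference between each frame and the previous one
fig, ax = plt.subplots()
ax.plot(np.arange(1, len(ic)), differences_list_32bins)
ax.set_title("Histogram difference between consecutive frames (32 bins)")
ax.set_xlabel("Frame index")
ax.set_ylabel("Euclidean distance")
plt.show()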


Additionally, create an interactive plot executing the following code in "Example A". Substitute the bottom plot with the temporal plot showing the differences between the consecutive frames. 

In [5]:
# Example A
# Generator yielding (frame index, histogram difference) pairs
def data_gen():
    t = data_gen.t
    for diff in differences_list_32bins:
        t += 1
        # yield the time step and the difference with the previous frame
        yield t, diff

data_gen.t = 0

%matplotlib nbagg

# create a figure with two subplots
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(8,6))

# initialize a line object on the second axes for plotting
line, = ax2.plot([], [], lw=2, color='r')

ax2.set_ylim(0, 10000)
ax2.set_xlim(0, 5)
ax2.grid()

# initialize the data arrays 
xdata, ydata = [], []
def run(data):
    # update the data plot
    t, y = data
    xdata.append(t) # time = x axis
    ydata.append(y) # y axis

    # Plot image on top row
    ax1.imshow(ic[len(xdata)])
    
    # Update y limit axis
    ymin, ymax = ax2.get_ylim()
    if y >= ymax:
        ax2.set_ylim(ymin, y*1.1)
    
    # Update x limit axis
    xmin, xmax = ax2.get_xlim()
    if t >= xmax:
        ax2.set_xlim(xmin, 2*xmax)
        # Redraw the canvas so that the new axis limits take effect
        ax2.figure.canvas.draw()
        
    # update the data of the line object
    line.set_data(xdata, ydata)

    return line,  # blit=True expects a sequence of artists

ani = animation.FuncAnimation(fig, run, data_gen, blit=True, interval=10, repeat=False)
plt.show()


If you used histograms to separate the shots, try different numbers of bins to see which histogram size best separates the shots.

In [6]:
# Let's try with 8 bins
differences_list_8bins = temporal_plot(8)
# Let's try with 64 bins
differences_list_64bins = temporal_plot(64)
# Plot both difference curves
fig, ax = plt.subplots()
ax.plot(np.arange(1, len(ic)), differences_list_8bins)
ax.set_title("Histogram difference between consecutive frames (8 bins)")
plt.show()

fig, ax = plt.subplots()
ax.plot(np.arange(1, len(ic)), differences_list_64bins)
ax.set_title("Histogram difference between consecutive frames (64 bins)")
plt.show()


**R:** Using fewer bins separates the shots better. Of the sizes we tried (8, 32 and 64 bins), we keep the 8-bin histogram for the next exercise.

b) Show the initial and final images of each shot extracted as follows:

<img src="images_for_notebook/result_shot_detection.png">


**Hint:**
Use the previous plot to define a proper threshold value over the histogram differences. Use the threshold to locate the initial and final frames of each shot.
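
The hint above can be sketched as follows. This is only one possible way to turn a thresholded difference signal into shot ranges, not the reference solution. The helper `shot_ranges`, the toy difference values and the threshold are hypothetical, and the sketch assumes that position `k` of the difference list compares frame `k` with frame `k + 1`.

```python
import numpy as np

def shot_ranges(differences, threshold, n_frames):
    """Return (first_frame, last_frame) index pairs, one per shot.

    differences[k] is assumed to compare frame k with frame k + 1, so a
    value above the threshold at position k means frame k closes a shot.
    """
    boundaries = np.flatnonzero(np.asarray(differences) > threshold)
    starts = np.concatenate(([0], boundaries + 1))
    ends = np.concatenate((boundaries, [n_frames - 1]))
    return [(int(s), int(e)) for s, e in zip(starts, ends)]

# Toy difference signal for 8 frames with two clear peaks (made-up values)
toy_differences = [1.0, 2.0, 50.0, 1.5, 0.5, 60.0, 1.0]
print(shot_ranges(toy_differences, threshold=10.0, n_frames=8))
```

With this convention the last shot always ends at the final frame of the sequence, so no extra peak is needed to close it.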


In [7]:
a = np.array(differences_list_8bins)
# a[k] compares frame k with frame k+1, so an index above the
# threshold marks the last frame of a shot
indices = np.where(a > 100000)[0]  # Threshold set to 100000

# First scene
img1 = ic[0] # First frame
img2 = ic[indices[0]] # Last frame
# Second scene
img3 = ic[indices[0]+1] # First frame
img4 = ic[indices[1]] # Last frame
# Third scene 
img5 = ic[indices[1]+1] # First frame
img6 = ic[indices[2]] # Last frame

# Show images
fig, axes = plt.subplots(3, 2, figsize=(13, 10))
ax = axes.ravel()

ax[0].imshow(img1)
ax[0].axis('off')
ax[0].set_title("First frame")
ax[1].imshow(img2)
ax[1].axis('off')
ax[1].set_title("Last frame")
ax[2].imshow(img3)
ax[2].axis('off')
ax[2].set_title("First frame")
ax[3].imshow(img4)
ax[3].axis('off')
ax[3].set_title("Last frame")
ax[4].imshow(img5)
ax[4].axis('off')
ax[4].set_title("First frame")
ax[5].imshow(img6)
ax[5].axis('off')
ax[5].set_title("Last frame")

plt.show()


c) Which measure have you used in order to visually distinguish the shots in a plot? Explain your solution.

**R:** First, we separated the channels of each image, then we computed the histogram of each channel and added the three histograms to get a single descriptor per frame.
After that, we computed the Euclidean distance between the descriptors of each pair of consecutive frames.
Finally, we stored the distances and plotted them in order to see the largest differences, which indicate a change of scene.
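
As a compact illustration of this measure, the sketch below recomputes it on synthetic frames with NumPy only. It is a variant under stated assumptions rather than the notebook's exact code: it uses `np.histogram` with a fixed range of (0, 1) so that all channels share the same bin edges, whereas `skimage.exposure.histogram` (as far as we understand its defaults) derives the bins from each image's own value range. The function names and the synthetic clip are hypothetical.

```python
import numpy as np

def rgb_histogram(frame, nbins=8):
    """Sum of the per-channel histograms of a float RGB frame in [0, 1]."""
    hist = np.zeros(nbins)
    for channel in range(3):
        counts, _ = np.histogram(frame[:, :, channel], bins=nbins, range=(0.0, 1.0))
        hist += counts
    return hist

def frame_differences(frames, nbins=8):
    """Euclidean distance between the histograms of consecutive frames."""
    hists = [rgb_histogram(f, nbins) for f in frames]
    return [float(np.linalg.norm(b - a)) for a, b in zip(hists, hists[1:])]

# Synthetic clip (assumption): three dark frames followed by three bright ones
rng = np.random.default_rng(0)
dark = [rng.uniform(0.0, 0.3, size=(32, 32, 3)) for _ in range(3)]
bright = [rng.uniform(0.7, 1.0, size=(32, 32, 3)) for _ in range(3)]
differences = frame_differences(dark + bright)
print(int(np.argmax(differences)))  # position of the largest jump
```

The largest value appears at the position that compares the last dark frame with the first bright one, which is where a threshold would place the shot boundary.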

d) Would your video segmentation strategy be able to capture a continuous transition? Argue your answer.

**R:** No, because we only compare each frame with the one before it, and in a continuous transition the histogram difference between two consecutive frames is very small, so it stays below the threshold.
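
A small synthetic experiment can illustrate this argument. The sketch below is only an illustration under assumptions: the two "scenes" are random grey-level images, the cross-fade is a linear blend over 20 steps, and `hist_distance` is a hypothetical helper using an 8-bin histogram with a fixed range.

```python
import numpy as np

def hist_distance(a, b, nbins=8):
    """Euclidean distance between the grey-level histograms of two frames."""
    ha, _ = np.histogram(a, bins=nbins, range=(0.0, 1.0))
    hb, _ = np.histogram(b, bins=nbins, range=(0.0, 1.0))
    return float(np.linalg.norm(ha - hb))

rng = np.random.default_rng(0)
scene_a = rng.uniform(0.0, 0.5, size=(64, 64))   # dark synthetic scene
scene_b = rng.uniform(0.5, 1.0, size=(64, 64))   # bright synthetic scene

# Hard cut: scene A is followed directly by scene B
cut_difference = hist_distance(scene_a, scene_b)

# Cross-fade: 20 steps blending A into B
alphas = np.linspace(0.0, 1.0, 21)
fade = [(1 - a) * scene_a + a * scene_b for a in alphas]
fade_differences = [hist_distance(f, g) for f, g in zip(fade, fade[1:])]

print(cut_difference, max(fade_differences))
```

Every step of the fade produces a smaller difference than the hard cut, so a threshold tuned for cuts would miss the gradual transition.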

e) Would your video segmentation strategy be able to separate and track objects? Argue your answer.

**R:** No, because our video segmentation takes all the image pixels into account at once (a global histogram), so it carries no information about where objects are. However, we can apply the background subtraction algorithm to separate and track objects.

## 3.2 Background subtraction

Apply the background subtraction algorithm (check theory material).

a) Visualize the following images for each different scene (there are 3) of the video:

    1) an image belonging to the shot
    2) the background image, and
    3) the foreground.
    
**Hint**: You can construct a mask from the original image and the background in order to know which parts of the image belong to the foreground, and then recover just the foreground regions from the original image.
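
The mask idea in the hint could look like the sketch below. It is a minimal example under assumptions: both images are float RGB arrays in [0, 1], the helper `extract_foreground` and its `threshold` parameter are hypothetical, and the threshold value of 0.1 is arbitrary and would need tuning for a real video.

```python
import numpy as np

def extract_foreground(frame, background, threshold=0.1):
    """Keep only the pixels of `frame` that differ clearly from `background`.

    Both inputs are assumed to be float RGB images in [0, 1]; the threshold
    value is an arbitrary choice and has to be tuned for each video.
    """
    # Largest absolute difference over the three colour channels
    difference = np.abs(frame - background).max(axis=2)
    mask = difference > threshold
    # Broadcast the 2-D mask over the colour axis; background pixels become black
    foreground = frame * mask[:, :, np.newaxis]
    return mask, foreground

# Toy example: grey background with a small bright square as the moving object
background = np.full((20, 20, 3), 0.5)
frame = background.copy()
frame[5:10, 5:10, :] = 1.0
mask, foreground = extract_foreground(frame, background)
print(mask.sum())  # number of pixels flagged as foreground
```

Multiplying by the mask keeps the original colours of the foreground pixels, whereas a plain absolute difference would show how much each pixel changed rather than what the object looks like.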

In [8]:
# Your solution here
# Background subtraction for the frames between an initial and a final index (both included)
def background_sub(initial, final):
    # Collect all the frames of the shot as float images in [0, 1]
    collection = []
    for i in range(initial, final + 1):
        img = img_as_float(ic[i])
        collection.append(img)
        
    # The background is the per-pixel median along the temporal axis
    first_frame = collection[0]
    background = np.median(collection, axis = 0)
    # The foreground is what differs between the first frame and the background
    foreground = np.abs(first_frame - background)
    return first_frame, background, foreground

# Create the figure
fig, axes = plt.subplots(3, 3, figsize=(14, 10))
ax = axes.ravel()
initial = 0
i = 0
# Plot first image, background and foreground for each scene
for final in indices:
    img, background, foreground = background_sub(initial, final)
    ax[i].imshow(img)
    ax[i].axis('off')
    ax[i].set_title("Image of the scene")
    ax[i+1].imshow(background)
    ax[i+1].axis('off')
    ax[i+1].set_title("Background image")
    ax[i+2].imshow(foreground)
    ax[i+2].axis('off')
    ax[i+2].set_title("Foreground image")
    i += 3
    initial = final + 1

plt.show()


b) Comment your implementation explaining all steps 

c) Answer the following questions:
- What happens if the shots are not correctly extracted? 
- What happens if you find too many shots in the video? 
- What do the static background images represent? 
- In which situations does the algorithm work and in which does it not? 
- What happens if you subtract the background image from the original one?
- Do you see any additional application for this algorithm?

d) **[OPTIONAL]**
- Apply the algorithm to some other video that you found.

**b)** We applied a median filter to every pixel along the temporal axis using the np.median function. More details are available in the code comments.
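
To make the temporal median concrete, here is a small synthetic sketch. It does not use the Barcelona sequence: the shot is an assumed stack of 9 grey-level frames with a constant scene and a bright block that moves one column per frame, and all variable names are illustrative.

```python
import numpy as np

n_frames, height, width = 9, 16, 16

# Synthetic shot: constant grey scene plus a bright 4x4 block that moves
# one column to the right in every frame
frames = np.full((n_frames, height, width), 0.4)
for t in range(n_frames):
    frames[t, 6:10, t:t + 4] = 1.0

# Median along the temporal axis (axis 0) recovers the static scene
background = np.median(frames, axis=0)

# For comparison, the temporal mean keeps a faint trail of the moving block
mean_image = np.mean(frames, axis=0)

print(background.max(), mean_image.max())
```

The median removes the block completely because every pixel shows the static scene in more than half of the frames, while the mean is still affected by the few frames in which the block covers the pixel.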

**c)** 
**1.** If the shots are not correctly extracted, frames from different scenes are mixed in the same shot. The result is still acceptable as long as the median value does not change too much.<br>
**2.** If we find too many shots, the median filter becomes less accurate because there are fewer frames per shot.<br>
**3.** The static background images represent the value that each pixel takes most of the time during the shot, that is, the parts of the scene that do not move. <br>
**4.** The algorithm works when the shots have been taken with a static camera. It does not work if the camera moves or if the brightness changes significantly. <br>
**5.** If we subtract the background image from the original one, we get the pixels whose value has changed over time, i.e. the foreground. <br>
**6.** This algorithm could be applied to motion detection.<br>