# Data exploration

The goal of this notebook is to explore the data and work out how to filter out bad frames.

In [None]:
import os
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from PIL import Image
import numpy as np
import cv2
import pandas as pd
from torchvision import transforms

In [None]:
LUCA_CARTOONS_PATH = "../../data/cartoon_frames/Luca"
luca_frames_paths = sorted([os.path.join(LUCA_CARTOONS_PATH, frame) for frame in os.listdir(LUCA_CARTOONS_PATH) if os.path.isfile(os.path.join(LUCA_CARTOONS_PATH, frame))])

In [None]:
TSLOP_CARTOONS_PATH = "../../data/cartoon_frames/TheSecretLifeOfPets"
tslop_frames_paths = sorted([os.path.join(TSLOP_CARTOONS_PATH, frame) for frame in os.listdir(TSLOP_CARTOONS_PATH) if os.path.isfile(os.path.join(TSLOP_CARTOONS_PATH, frame))])

In [None]:
CARTOONS_DF_PATH = "../../data/frames_all.csv"
frames_df = pd.read_csv(CARTOONS_DF_PATH, index_col=0)

In [None]:
PICTURES_DF_PATH = "../../data/pictures_all.csv"
pictures_df = pd.read_csv(PICTURES_DF_PATH, index_col=0)

## Explore image size

Not all the images have the same size. We need to see how they differ.

In [None]:
frames_df[["width", "height"]].value_counts()

In [None]:
pd.set_option('display.max_rows', pictures_df.shape[0]+1)
pictures_df[["width", "height"]].value_counts()

In [None]:
frames_df["ratio"] = frames_df["width"] / frames_df["height"]
frames_df["ratio"].value_counts()

In [None]:
pictures_df["ratio"] = pictures_df["width"] / pictures_df["height"]
pictures_df["ratio"].value_counts()

Resolutions vary widely, and some images have a low resolution.\
As we want a resolution of at least 256 x 256 (like in the original paper), we should discard all images whose width or height is below 256 pixels.

All the frames are in landscape orientation, but many of the pictures aren't.\
If we crop the images to match a specific resolution, these ratios don't matter. However, if we only resize the images, we may want to discard images in portrait orientation.
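
The two filtering rules discussed above can be sketched as a small helper. This is only an illustration: `filter_usable`, `MIN_SIDE`, and the toy DataFrame are hypothetical names and values, and the sketch assumes the `width` and `height` columns used earlier in this notebook.

```python
import pandas as pd

MIN_SIDE = 256  # minimum resolution mentioned in the text above


def filter_usable(df, min_side=MIN_SIDE, drop_portrait=False):
    """Keep rows whose width and height are both at least `min_side`.

    With drop_portrait=True, rows with height > width are removed as well.
    """
    mask = (df["width"] >= min_side) & (df["height"] >= min_side)
    if drop_portrait:
        mask &= df["width"] >= df["height"]
    return df[mask]


# Toy data standing in for pictures_df (values are made up for illustration).
toy_df = pd.DataFrame(
    {"width": [1920, 200, 300, 256], "height": [1080, 400, 600, 256]},
    index=["landscape", "too_small", "portrait", "square"],
)
kept_all = filter_usable(toy_df)
kept_landscape = filter_usable(toy_df, drop_portrait=True)
```

On the real data, the same call would be `filter_usable(pictures_df)` when cropping, or `filter_usable(pictures_df, drop_portrait=True)` when only resizing.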

## Resize vs crop

We need to decide how to preprocess the images.

In [None]:
# sample_frames_df = frames_df.groupby('movie', group_keys=False).apply(lambda x: x.sample(3))
# sample_frames_df["movie"].value_counts()
sample_frames_df = frames_df.sample(15)

In [None]:
for path in sample_frames_df["path"]:
    image = Image.open(path)
    plt.imshow(image)
    plt.show()

In [None]:
sample_pictures_df = pictures_df.sample(15)

In [None]:
for path in sample_pictures_df["path"]:
    image = Image.open(path)
    plt.imshow(image)
    plt.show()

In [None]:
new_size = (256, 256)

First we can resize them.

In [None]:
resize = transforms.Resize(new_size)

In [None]:
for path in sample_frames_df["path"]:
    image = Image.open(path)
    image = resize(image)
    plt.imshow(image)
    plt.show()

In [None]:
for path in sample_pictures_df["path"]:
    image = Image.open(path)
    image = resize(image)
    plt.imshow(image)
    plt.show()

The images are quite distorted here, because resizing to a square ignores the original aspect ratio.

We can now try to crop them:

In [None]:
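def crop_center(image):
    # Crop a centered region with the target aspect ratio (a square here), then resize.
    min_side = min(image.size)  # PIL's size is (width, height)
    ratio = new_size[1] / new_size[0]  # new_size is (height, width)
    image = transforms.CenterCrop((min_side, int(ratio * min_side)))(image)
    return resize(image)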

In [None]:
for path in sample_frames_df["path"]:
    image = Image.open(path)
    image = crop_center(image)
    plt.imshow(image)
    plt.show()

In [None]:
for path in sample_pictures_df["path"]:
    image = Image.open(path)
    image = crop_center(image)
    plt.imshow(image)
    plt.show()

This solution seems to be the best one as images aren't deformed here.\
We could even randomly crop the image, to avoid always taking its center.

In [None]:
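def crop_random(image):
    # Crop a randomly placed region with the target aspect ratio (a square here), then resize.
    min_side = min(image.size)  # PIL's size is (width, height)
    ratio = new_size[1] / new_size[0]  # new_size is (height, width)
    image = transforms.RandomCrop((min_side, int(ratio * min_side)))(image)
    return resize(image)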

In [None]:
for path in sample_frames_df["path"]:
    image = Image.open(path)
    image = crop_random(image)
    plt.imshow(image)
    plt.show()

In [None]:
for path in sample_pictures_df["path"]:
    image = Image.open(path)
    image = crop_random(image)
    plt.imshow(image)
    plt.show()

## Detect text in images

We want to remove images that contain titles or overlaid text.

In [None]:
interesting_images_text = [0, 1, 3, 13, 27, 28, 32, 41, 46, 47, 48, 50, 51]
interesting_paths_text = [luca_frames_paths[i] for i in interesting_images_text]

In [None]:
for path in interesting_paths_text:
  img = mpimg.imread(path)
  imgplot = plt.imshow(img)
  plt.show()

In [None]:
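mser = cv2.MSER_create()

for path in interesting_paths_text:
  image = Image.open(path)
  # np.array makes a writable copy that cv2.polylines can draw on
  img = np.array(image.convert("RGB"))
  # PIL loads images as RGB, not BGR
  gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
  regions, _ = mser.detectRegions(gray)
  hulls = [cv2.convexHull(p.reshape(-1, 1, 2)) for p in regions]
  # draw the convex hull of each detected region in green
  cv2.polylines(img, hulls, True, (0, 255, 0))

  plt.imshow(img)
  plt.show()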

In [None]:
!sudo apt install -y tesseract-ocr
!pip install pytesseract

The MSER method doesn't seem to work.

Let's try OCR instead.

In [None]:
import pytesseract

for path in interesting_paths_text:
  image = Image.open(path)
  img = np.asarray(image)
  print(pytesseract.image_to_string(img).strip())
  plt.imshow(img)
  plt.show()

The OCR seems to work pretty well.  
We can now try it on the first 500 Luca frames.

In [None]:
img_with_text = []
for i, path in enumerate(luca_frames_paths[:500]):
  image = Image.open(path)
  img = np.asarray(image)
  detected_text = pytesseract.image_to_string(img).strip()
  if detected_text != "":
    print(i, "has text:")
    print(detected_text)
    plt.imshow(img)
    plt.show()
    img_with_text.append(i)
img_with_text

And some frames from another movie:

In [None]:
for i, path in enumerate(tslop_frames_paths[:20]):
  image = Image.open(path)
  img = np.asarray(image)
  detected_text = pytesseract.image_to_string(img).strip()
  print(detected_text)
  plt.imshow(img)
  plt.show()

We see here that images with text are often detected, but the OCR also detects text in images that contain none.

We could still use this method to remove images with text, by discarding the images in which the OCR detects real words. This would be a useful filter, but filtering by hand seems much better: the OCR can miss some images with overlaid text, and it can wrongly discard images that have no text at all or only text that belongs to the scene (a sign, etc.).
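
One possible way to decide whether the OCR output contains real words is sketched below. It is a rough heuristic and not a dictionary check: `looks_like_text`, `MIN_WORD_LENGTH`, and the length threshold are assumptions made for illustration, and only ASCII letters are counted.

```python
import re

# Hypothetical heuristic: a "real word" is a run of at least
# MIN_WORD_LENGTH ASCII letters. This is an assumption, not a dictionary lookup.
MIN_WORD_LENGTH = 4
WORD_PATTERN = re.compile(r"[A-Za-z]+")


def looks_like_text(ocr_output, min_words=1, min_length=MIN_WORD_LENGTH):
    """Return True if the OCR output contains enough word-like tokens."""
    words = [w for w in WORD_PATTERN.findall(ocr_output) if len(w) >= min_length]
    return len(words) >= min_words


# Made-up OCR outputs: a title card versus typical noise on a text-free frame.
title_detected = looks_like_text("DISNEY PIXAR")
noise_detected = looks_like_text("~ |. _ i a| l1 =")
```

In the loops above, the condition `detected_text != ""` could be replaced with `looks_like_text(detected_text)`. Raising `min_length` or `min_words` makes the filter stricter, at the cost of missing short titles.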

## Detect blur

We will now try to detect blurry images.

In [None]:
for path in tslop_frames_paths[:30]:
  image = Image.open(path)
  img = np.asarray(image)
  gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)  # PIL loads images as RGB
  fm = cv2.Laplacian(gray, cv2.CV_64F).var()
  print("fm =", fm)
  plt.imshow(img)
  plt.show()

In [None]:
for path in luca_frames_paths[:30]:
  image = Image.open(path)
  img = np.asarray(image)
  gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)  # PIL loads images as RGB
  fm = cv2.Laplacian(gray, cv2.CV_64F).var()
  print("fm =", fm)
  plt.imshow(img)
  plt.show()

The Laplacian method (variance of the Laplacian as a sharpness score) doesn't seem to work well on this kind of image.

In [None]:
def detect_blur_fft(image, size=60):
  # grab the dimensions of the (grayscale) image and use them to
  # derive the center (x, y)-coordinates
  (h, w) = image.shape
  (cX, cY) = (int(w / 2.0), int(h / 2.0))
  # compute the FFT, then shift the zero-frequency component
  # (i.e., the DC component located at the top-left corner) to the
  # center, where it is easier to analyze
  fft = np.fft.fft2(image)
  fftShift = np.fft.fftshift(fft)
  # zero out the center of the shifted FFT (i.e., remove low
  # frequencies), apply the inverse shift so that the DC component
  # is once again at the top-left, and then apply the inverse FFT
  fftShift[cY - size:cY + size, cX - size:cX + size] = 0
  fftShift = np.fft.ifftshift(fftShift)
  recon = np.fft.ifft2(fftShift)
  # compute the log-magnitude of the reconstructed image, then the
  # mean of those values
  magnitude = 20 * np.log(np.abs(recon))
  mean = np.mean(magnitude)
  # return the raw score: a lower mean suggests a blurrier image, and
  # the caller is responsible for comparing it against a threshold
  return mean

In [None]:
for path in tslop_frames_paths[:30]:
  image = Image.open(path)
  img = np.asarray(image)
  gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)  # PIL loads images as RGB
  value = detect_blur_fft(gray)
  print("val =", value)
  plt.imshow(img)
  plt.show()

In [None]:
for path in luca_frames_paths[:30]:
  image = Image.open(path)
  img = np.asarray(image)
  gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)  # PIL loads images as RGB
  value = detect_blur_fft(gray)
  print("val =", value)
  plt.imshow(img)
  plt.show()

The FFT method doesn't seem to work either. Blurry frames will have to be filtered by hand.

**Remark:** there are only a few blurry images, so deleting them may not be important.