# Kaggle DeepFake Competition <a name="start"></a>
In-depth review of the winner's code: prediction part only

Link to the competition: https://www.kaggle.com/c/deepfake-detection-challenge

Link to Selim Seferbekov's winning repo: https://github.com/selimsef/dfdc_deepfake_challenge

# Table of contents

1. [Imports](#imports)
2. [Model preparation ](#model)
    1. [Efficient net](#efficient)
    2. [Adding custom layers to the efficient net](#custom)
3. [Data preparation](#data)
    1. [Video frame extraction](#frame)
    2. [Face detection](#detect)
    3. [Intermediate scoring strategy](#scoring)
    4. [Image manipulation](#manip)
    5. [Data preparation's main pipeline](#pipeline)
4. [Main processing loop](#loop)
    1. [Per file processing](#process)
    2. [Small parenthesis](#parenthesis)
    3. [Aside thought](#aside)
    4. [Prediction](#pred)
5. [Subsidiary questions](#subs)
    1. [What does partial do?](#partial)
    2. [NumPy linspace](#linspace)
    3. [NumPy random.randint](#randint)
    4. [NumPy clip](#clip)
    5. [NumPy uint8 format](#uint8)
    6. [NumPy count_nonzero](#zero)
    7. [Calling float() or putting the argument dtype=torch.float32](#float)

# Imports <a name="imports"></a>

In [2]:
from functools import partial                               # Freezes some arguments of a function
                                                            # to get a new, simpler callable.
import os                                                   # File system navigation.
import sys                                                  # System information access.
import time                                                 # Time measure tools.
import traceback                                            # Print the stack trace when an exception is manually caught.

import torch
from timm.models.efficientnet import tf_efficientnet_b7_ns  # We will only use the b7 ns version.
from torch.nn.modules.pooling import AdaptiveAvgPool2d      # Adaptive 2D average pooling layer.
from torch.nn.modules.dropout import Dropout                # Dropout layer.
from torch.nn.modules.linear import Linear                  # Fully connected (dense) layer.

from torchvision.transforms import Normalize                # Image normalization.
from facenet_pytorch.models.mtcnn import MTCNN              # Face detection model.

from PIL import Image                                       # Image manipulation library with its own Image class.

from concurrent.futures import ThreadPoolExecutor           # Using several workers to process parallelizable tasks.

import cv2                                                  # OpenCV for many kinds of image manipulation.
import numpy as np                                          # Mathematical arrays.
import pandas as pd                                         # DataFrames.

In [None]:
# Please put the train and test video folders' relative or absolute paths in these variables:
train_video_folder = "N:/Datasets/DeepFake_Kaggle/train_sample_videos"
test_video_folder = "N:/Datasets/DeepFake_Kaggle/test_videos"
# Set it to True to print information about some steps during execution.
verbose = True

[Go back to the top](#start)

# Model preparation  <a name="model"></a>

## Efficient net   <a name="efficient"></a>

In [None]:
# Let's only keep the params that were ultimately used
encoder_params = {"tf_efficientnet_b7_ns": 
           {"features": 2560,
              "init_op": partial(tf_efficientnet_b7_ns, pretrained=True, drop_path_rate=0.2)}
          }

[Go back to the top](#start)

## Adding custom layers to the efficient net <a name="custom"></a>

In [None]:
# We pass the face images through an EfficientNet, then through a final pooling step
# and a fully connected layer for the final prediction.
class DeepFakeClassifier(torch.nn.Module):
    def __init__(self, encoder, dropout_rate=0.0) -> None:
        super().__init__()
        self.encoder = encoder_params[encoder]["init_op"]()
        self.avg_pool = AdaptiveAvgPool2d((1, 1))
        self.dropout = Dropout(dropout_rate)
        self.fc = Linear(encoder_params[encoder]["features"], 1)

    def forward(self, x):
        # We encode the image using our EfficientNet. Please refer to the readme.
        x = self.encoder.forward_features(x)
        # We pass the resulting feature maps to an adaptive average pooling layer,
        # which averages each channel over its spatial grid down to a 1x1 map.
        # flatten(1) then merges every dimension from index 1 onward into a single one:
        # the (batch, features, 1, 1) tensor becomes a (batch, features) tensor.
        x = self.avg_pool(x).flatten(1)
        # Explained in the readme. The dropout is turned off by the call to model.eval()
        # and turned on by model.train().
        x = self.dropout(x)
        # Our Linear (or dense, or fully connected) layer, with the count of features as input size
        # and 1 output neuron. It outputs a logit: after the sigmoid applied in process_file,
        # a probability equal to or greater than 0.5 ultimately means fake.
        x = self.fc(x)
        return x
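To make the shapes in forward() concrete, here is a NumPy-only sketch of the classifier head. It is an illustration under assumptions, not the model itself. Random numbers stand in for the encoder output and for the trained Linear weights. The 12x12 spatial grid is an assumed value: it is roughly 380 / 32, since EfficientNets downsample by a factor of 32. Dropout is omitted because it does nothing in eval mode.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed encoder output: batch of 4, 2560 channels (the "features" value of
# encoder_params), and an assumed 12x12 spatial grid.
features = rng.standard_normal((4, 2560, 12, 12)).astype(np.float32)

# AdaptiveAvgPool2d((1, 1)): average of each channel over its spatial grid.
pooled = features.mean(axis=(2, 3), keepdims=True)   # (4, 2560, 1, 1)

# flatten(1): merge every dimension from index 1 onward.
flat = pooled.reshape(pooled.shape[0], -1)           # (4, 2560)

# Linear(2560, 1): matrix product plus bias, with random stand-in weights.
weight = rng.standard_normal((1, 2560)).astype(np.float32) * 0.01
bias = np.zeros(1, dtype=np.float32)
logits = flat @ weight.T + bias                      # (4, 1), one logit per image

# Sigmoid, applied later in process_file: logits -> probabilities in (0, 1).
probs = 1.0 / (1.0 + np.exp(-logits))
print(pooled.shape, flat.shape, logits.shape)
```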

In [None]:
# Using a CUDA compatible GPU boosts the speed of all the previously implemented processing.
model = DeepFakeClassifier(encoder="tf_efficientnet_b7_ns").to("cuda")

In [None]:
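# eval() switches Dropout (and batch normalization) to their inference behaviour.
# half() converts the weights to 16-bit floats, to match the .half() input used in process_file.
model = model.eval().half()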

Unfortunately, I could not find the weights (error 404 from the author's GitHub) and I clearly don't have time to train the network. That's why we will focus on the prediction step in this notebook.

[Go back to the top](#start)

# Data preparation  <a name="data"></a>

## Video frame extraction <a name="frame"></a>

In [None]:
# This function takes an image and applies a color conversion, and potentially some cropping.
def _post_process_frame(frame):
    # First, it converts opencv's Blue-Green-Red format to the more conventional Red-Green-Blue.
    frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    
    # In the original repo, we have cropping according to specified insets here.
    # But it is practically always called with default values which are (0, 0) (no crop).
    
    return frame

In [None]:
# This function takes the video, extracts the frames at the specified indices,
# applies post-processing to them and returns them as a single NumPy array, along with their indices.
def read_frames_at_indices(path, frame_idxs):
    """Reads frames from a video and puts them into a NumPy array.

    Arguments:
        path: the video file
        frame_idxs: a list of frame indices. Important: should be
            sorted from low-to-high, without duplicates! A repeated
            index is read once, but every index after it is then skipped.

    Returns:
        - a NumPy array of shape (num_frames, height, width, 3)
        - a list of the frame indices that were read

    Reading stops if loading a frame fails, in which case the first
    dimension returned may actually be less than num_frames.

    Returns None if an exception is thrown for any reason, or if no
    frames were read.
    """
    assert len(frame_idxs) > 0
    # Loads the video capture.
    capture = cv2.VideoCapture(path)
    # The try/finally block makes sure we release the capture, whether an exception occurs or not.
    # Releasing the capture frees the hardware and software resources,
    # so they are ready for the next video.
    try:
        # frames will store the kept frames.
        # idxs_read will store their indices.
        frames = []
        idxs_read = []
        for frame_idx in range(frame_idxs[0], frame_idxs[-1] + 1):
            ## Get the next frame, but don't decode if we're not using it.
            # The grab method is faster than "read" as it doesn't decode it for further processing.
            # Indeed, we either process it or completely pass it, 
            # therefore it's more optimized to use grab to pass.
            ret = capture.grab()
            # We check if the frame was correctly grabbed.
            # If not, we leave the loop and return the frames read so far (or None if there are none).
            if not ret:
                if verbose:
                    print("Error grabbing frame %d from movie %s" % (frame_idx, path))
                break

            ## Need to look at this frame?
            # The position of the next frame to retrieve and memorize.
            # For example, if 7 frames were already retrieved, then the next frame to retrieve is
            # at position 7 of our frame_idxs argument.
            current = len(idxs_read)
            # If the current frame is effectively the next one we are looking for, we can retrieve it.
            if frame_idx == frame_idxs[current]:
                # The retrieve method returns a tuple with:
                # - a bool to tell if the image was correctly retrieved.
                # - the retrieved image itself.
                ret, frame = capture.retrieve()
                if not ret or frame is None:
                    if verbose:
                        print("Error retrieving frame %d from movie %s" % (frame_idx, path))
                    break
                # We have to convert opencv's BGR color format to RGB.
                frame = _post_process_frame(frame)
                # We keep this frame in the list of selected frames.
                frames.append(frame)
                # Also, we keep its index in the other separated list.
                idxs_read.append(frame_idx)
            
        # Now, we can stack our frames: np.stack joins the list of (height, width, 3) arrays
        # along a new first axis, giving one (num_frames, height, width, 3) array.
        if len(frames) > 0:
            return np.stack(frames), idxs_read
        if verbose:
            print("No frames read from movie %s" % path)
        return None
    except Exception:
        if verbose:
            print("Exception while reading movie %s" % path)
            traceback.print_exc() 
        return None
    finally:
        # Thanks to the finally clause, we release the capture 
        # even if an exception was caught and after a "return".
        capture.release()
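The index-matching loop of read_frames_at_indices() can be replayed without any video to check which indices it keeps. `simulate_selection` is a hypothetical helper that copies only the `current = len(idxs_read)` bookkeeping. With sorted, distinct indices, every index is kept. With a repeated index, the comparison keeps waiting for the duplicate, and in this replay every later index is skipped.

```python
def simulate_selection(frame_idxs):
    """Hypothetical helper: replays the matching loop of read_frames_at_indices()
    and returns the indices that would be read, without opening any video."""
    idxs_read = []
    for frame_idx in range(frame_idxs[0], frame_idxs[-1] + 1):
        current = len(idxs_read)
        if frame_idx == frame_idxs[current]:
            idxs_read.append(frame_idx)
    return idxs_read

print(simulate_selection([0, 3, 7]))     # [0, 3, 7]
print(simulate_selection([0, 0, 5, 9]))  # [0]
```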

In [None]:
# This function selects the indices of evenly spaced frames and passes them to the previously implemented
# read_frames_at_indices() function, so it can return the associated frames in the wanted RGB format.
def read_frames(path, num_frames, jitter=0, seed=None):
    """Reads frames that are always evenly spaced throughout the video.

    Arguments:
        path: the video file
        num_frames: how many frames to read (must be positive)
        jitter: if not 0, adds small random offsets to the frame indices;
            this is useful so we don't always land on even or odd frames
        seed: random seed for jittering; if you set this to a fixed value,
            you probably want to set it only on the first video
    """ 
    assert num_frames > 0
    # Simply loads the video capture, using OpenCV.
    capture = cv2.VideoCapture(path)
    # Gets the total number of frames of the video.
    frame_count = int(capture.get(cv2.CAP_PROP_FRAME_COUNT))
    # We only needed the frame count, so we release the capture right away. This also covers
    # the early return below. read_frames_at_indices() opens its own capture.
    capture.release()
    
    # If the video is "empty", simply returns None.
    if frame_count <= 0:
        return None
    
    # This gives num_frames indices evenly spaced on a linear scale between the first
    # and the last frame of the video, both included.
    # np.int was removed in NumPy 1.24, so we use the built-in int as dtype.
    frame_idxs = np.linspace(0, frame_count - 1, num_frames, endpoint=True, dtype=int)
    # We don't want to take only odd or even frames, so we add random jittering, reproducible thanks to the seed.
    if jitter > 0:
        # We initialize the seed of the pseudo-random generator.
        np.random.seed(seed)
        # For each selected frame index, we draw a random offset between -jitter and jitter - 1
        # (the upper bound of randint is exclusive).
        jitter_offsets = np.random.randint(-jitter, jitter, len(frame_idxs))
        # Next we add the offsets to the frame indices and pass the result to np.clip,
        # to make sure that we won't keep an out-of-bounds index.
        # Beware: the result may contain the same index several times, especially at the
        # boundaries or when the total number of frames is low, and it may no longer be sorted.
        # read_frames_at_indices() expects sorted, distinct indices, so larger jitter
        # values would call for a deduplication step.
        frame_idxs = np.clip(frame_idxs + jitter_offsets, 0, frame_count - 1)
    # Now we can finally get the frames of all the selected indices.
    result = read_frames_at_indices(path, frame_idxs)
    return result
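Jittering followed by np.clip can produce repeated or out-of-order indices. One possible safeguard is to pass the indices through np.unique, which sorts them and drops duplicates. This is sketched here as an assumption; the original code does not do it. The small video size below is made up for illustration.

```python
import numpy as np

# Hypothetical small video: 10 frames, 8 requested indices, jitter of 2.
frame_count, num_frames, jitter = 10, 8, 2
np.random.seed(0)
frame_idxs = np.linspace(0, frame_count - 1, num_frames, endpoint=True, dtype=int)
offsets = np.random.randint(-jitter, jitter, len(frame_idxs))
jittered = np.clip(frame_idxs + offsets, 0, frame_count - 1)

# np.unique returns the sorted distinct values, which is the kind of input
# read_frames_at_indices() handles correctly. The price is that fewer than
# num_frames frames may be read.
safe_idxs = np.unique(jittered)
print("jittered:", jittered)
print("safe:    ", safe_idxs)
```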
    
    
    

[Go back to the top](#start)

## Face detection <a name="detect"></a>

Our classifier is defined, and our video frame extraction is ready. Now we need to extract the bounding boxes of the faces, using the MTCNN face detector provided by the facenet_pytorch library. (Please refer to the readme.)

In [1]:
# This function extracts all the faces, cropped, from a selection of video frames.
# The faces are detected by an MTCNN network from the facenet_pytorch library.
# The frames are read by video_read_fn, a function built from the read_frames() function we just made;
# it is bound as the first argument with partial later on.
# Please refer to the section "Subsidiary questions" to know more about the partial method.
def extract_faces(video_read_fn, video_path):
    # List storing, for each frame, a dictionary of information and data about its faces.
    results = []

    # Our MTCNN. It will run on the available CUDA device.
    detector = MTCNN(margin=0, thresholds=[0.7, 0.8, 0.8], device="cuda")

    # Calls the function with video_path as argument.
    result = video_read_fn(video_path)
    if result is None:
        return None
    ## Keep track of the original frames (need them later).
    my_frames, my_idxs = result
    
    for i, frame in enumerate(my_frames):
        # Our video frames have 3 dimensions: height, width and the color channels.
        # We only need the first two.
        h, w = frame.shape[:2]
        # np.uint8 is an 8-bit unsigned integer, storing integer values from 0 to 255.
        # OpenCV frames are already uint8, so this cast is a safety net guaranteeing
        # an array that Pillow can directly turn into an Image.
        img = Image.fromarray(frame.astype(np.uint8))
        # We resize it to half its width and half its height (img.size is (width, height)).
        # The detection runs on this smaller image, which is faster.
        img = img.resize(size=[s // 2 for s in img.size])
        # Example: a 768x1024 frame (height x width) is now a 384x512 image.

        # Uses the MTCNN detect method to detect faces.
        # We only want the bounding boxes of the faces, not their landmarks (like mouth, eyes and nose).
        # probs is the confidence the model gives to each box showing a face.
        batch_boxes, probs = detector.detect(img, landmarks=False)

        if batch_boxes is None:
            # No faces were detected in this frame: we skip it and move on to the next one.
            continue

        faces = []
        scores = []

        for bbox, score in zip(batch_boxes, probs):
            # bbox holds 4 floats [xmin, ymin, xmax, ymax]. In image coordinates the y axis
            # points down, so ymin is the top edge and ymax the bottom edge.
            if bbox is not None:
                # Since we divided the size of our original image by 2,
                # we multiply the box coordinates by 2 to locate the face in the original frame.
                xmin, ymin, xmax, ymax = [int(b * 2) for b in bbox]
                # Width and height of the box. We use new names so that
                # the frame's own w and h, stored below, are not overwritten.
                box_w = xmax - xmin
                box_h = ymax - ymin
                # A third of each box dimension is used as a margin around the face.
                p_h = box_h // 3
                p_w = box_w // 3
                # Our frame being a numpy array, slicing it along two dimensions crops it.
                # The crop is larger than the detected box by the margins computed above.
                # max(ymin - p_h, 0) ensures we never slice from a negative row index,
                # which would wrap around to the end of the array. Same for columns.
                # No min() is needed on the end indices: NumPy silently truncates
                # slices that run past the end of the array.
                crop = frame[max(ymin - p_h, 0):ymax + p_h, max(xmin - p_w, 0):xmax + p_w]
                # We save this crop in the faces list. It comes from the original frame,
                # not from the half-size image given to the detector, to keep the full resolution.
                faces.append(crop)
                scores.append(score)

        # We keep the video reference, the frame index, the frame size, the face crops
        # and their confidence scores in a convenient dictionary.
        frame_dict = {"video": video_path,
                      "frame_idx": my_idxs[i],
                      "frame_w": w,
                      "frame_h": h,
                      "faces": faces,
                      "scores": scores}
        # We append this dictionary to the result list, ensuring the same format for all videos.
        results.append(frame_dict)
    # We now have a list of all face images from our video.
    return results
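The margin arithmetic of extract_faces() can be checked on a blank frame. `crop_with_margin` is a hypothetical helper that reproduces the box rescaling and slicing. The three boxes are made-up values, chosen to fall in the middle, on the top-left corner and on the bottom-right corner of a 100x200 frame. The last line shows why only the start indices are clamped: a negative start wraps around and can yield an empty slice.

```python
import numpy as np

def crop_with_margin(frame, bbox_half_scale):
    """Hypothetical helper mirroring extract_faces(): rescales a box found on the
    half-size image and crops the frame with a one-third margin on every side."""
    xmin, ymin, xmax, ymax = [int(b * 2) for b in bbox_half_scale]
    box_w, box_h = xmax - xmin, ymax - ymin
    p_w, p_h = box_w // 3, box_h // 3
    return frame[max(ymin - p_h, 0):ymax + p_h, max(xmin - p_w, 0):xmax + p_w]

frame = np.zeros((100, 200, 3), dtype=np.uint8)  # 100 rows (height), 200 columns (width)

inner = crop_with_margin(frame, [30.0, 10.0, 45.0, 25.0])    # box well inside the frame
corner = crop_with_margin(frame, [0.0, 0.0, 15.0, 15.0])     # box touching the top-left corner
far = crop_with_margin(frame, [90.0, 40.0, 100.0, 50.0])     # box touching the bottom-right corner
print(inner.shape, corner.shape, far.shape)  # (50, 50, 3) (40, 40, 3) (26, 26, 3)

# Without the max(..., 0) clamp, the corner crop would start at row -10,
# i.e. row 90, which lies after its end row 40: the slice would be empty.
print(frame[-10:40].shape)  # (0, 200, 3)
```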

[Go back to the top](#start)

## Intermediate scoring strategy <a name="scoring"></a>

In [None]:
# This function turns the per-frame predictions of a video into one score, pushing it
# towards the extremes when the frames clearly agree: towards 1 when many frames look
# confidently fake, towards 0 when almost all of them look confidently real.
def confident_strategy(pred, threshold=0.8):
    # We cast any iterable to a convenient numpy array.
    pred = np.array(pred)
    # We need the size twice, so let's keep it in a variable.
    sz = len(pred)
    # We count how many values of pred are above our threshold.
    fakes = np.count_nonzero(pred > threshold)
    # If more than 40% of the frames (sz // 2.5) and more than 11 frames
    # are detected as fakes with high probability, the whole video is considered fake.
    if fakes > sz // 2.5 and fakes > 11:
        # We return the mean of the confident fake predictions only, which is above the threshold.
        return np.mean(pred[pred > threshold])
    # If more than 90% of the predictions are below 0.2,
    # we return the mean of those low predictions only.
    elif np.count_nonzero(pred < 0.2) > 0.9 * sz:
        return np.mean(pred[pred < 0.2])
    else:
        # Otherwise, we simply return the mean of all predictions,
        # so videos without a clear majority keep an unmodified average score.
        return np.mean(pred)
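A few hand-made prediction vectors show how confident_strategy() behaves. The function is repeated so the example runs on its own. The 32-value inputs are invented, not real model outputs. With 32 frames, `sz // 2.5` is 12.0, so at least 13 confident fakes are needed to reach the first branch.

```python
import numpy as np

def confident_strategy(pred, threshold=0.8):
    # Same logic as the cell above, repeated so this example runs on its own.
    pred = np.array(pred)
    sz = len(pred)
    fakes = np.count_nonzero(pred > threshold)
    if fakes > sz // 2.5 and fakes > 11:
        return np.mean(pred[pred > threshold])
    elif np.count_nonzero(pred < 0.2) > 0.9 * sz:
        return np.mean(pred[pred < 0.2])
    else:
        return np.mean(pred)

# Invented per-face predictions for four 32-face videos.
cases = {
    "mostly fake": [0.9] * 20 + [0.5] * 12,   # 20 confident fakes
    "borderline": [0.9] * 12 + [0.5] * 20,    # only 12 confident fakes
    "mostly real": [0.1] * 30 + [0.5] * 2,    # 30 confident reals (more than 90%)
    "uncertain": [0.5] * 32,
}
for name, pred in cases.items():
    print(f"{name:12s} strategy={confident_strategy(pred):.3f} plain mean={np.mean(pred):.3f}")
```

The mostly fake video scores 0.9 instead of its plain mean of 0.75, and the mostly real one scores 0.1 instead of 0.125. The borderline case, with exactly 12 confident fakes, falls through to the plain mean, 0.65.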

[Go back to the top](#start)

## Image manipulation <a name="manip"></a>

In [None]:
# This function resizes our rectangular face crop so that its longer side equals `size`.
# It preserves the aspect ratio (isotropic scaling) by computing the target sizes beforehand.
# Please refer to the readme for more explanations.
def isotropically_resize_image(img, size):
    # As usual, height and width are the lengths of the first two dimensions of our images.
    h, w = img.shape[:2]
    # If the longer side already has the target size, there is nothing to do.
    if max(w, h) == size:
        return img
    # We compute the target width and height. The scale depends on which side is longer.
    if w > h:
        scale = size/w
        h *= scale
        w = size
    else:
        scale = size/h
        w *= scale
        h = size
    # We choose the interpolation method according to the direction of the resizing:
    # when the image gets bigger (scale > 1) we use INTER_CUBIC,
    # and when it gets smaller we use INTER_AREA, which OpenCV recommends for shrinking.
    interpolation = cv2.INTER_CUBIC if scale > 1 else cv2.INTER_AREA
    # cv2.resize expects the target size as (width, height) and returns the resized image.
    return cv2.resize(img, (int(w), int(h)), interpolation=interpolation)
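The size computation of isotropically_resize_image() can be dry-run without OpenCV. `isotropic_target` is a hypothetical helper. It returns the (width, height) pair handed to cv2.resize, plus a flag telling whether the image grows. OpenCV's documentation suggests INTER_AREA for shrinking and INTER_CUBIC (or INTER_LINEAR) for enlarging, which is how that flag would map to an interpolation.

```python
def isotropic_target(h, w, size):
    """Hypothetical dry-run helper: returns (new_w, new_h, upscaling) the way
    isotropically_resize_image computes them, without touching any pixels."""
    if max(w, h) == size:
        return w, h, False
    if w > h:
        scale = size / w
        new_w, new_h = size, h * scale
    else:
        scale = size / h
        new_w, new_h = w * scale, size
    # cv2.resize takes its target size as (width, height), hence this order.
    return int(new_w), int(new_h), scale > 1

print(isotropic_target(190, 95, 380))    # (190, 380, True): enlarged
print(isotropic_target(760, 1000, 380))  # (380, 288, False): shrunk
```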
    

In [None]:
# This function makes sure that we obtain an image of fixed size,
# with the original image placed at its center and black padding around it.
# At this step, the input image should not be greater than the target size.
def put_to_center(img, target_size):
    # We make sure our image doesn't exceed the size limit in one or both dimensions.
    img = img[:target_size, :target_size]
    # We initialize an empty (black) square image of side target_size, with three color channels
    # and of type unsigned 8-bit integer.
    image = np.zeros((target_size, target_size, 3), dtype=np.uint8)
    # For both sides, the difference between the target size and the image size
    # gives us the length of the padding. We divide it by two
    # to get the same padding on the left and right (respectively top and bottom)
    # of the image. In other words, we center it.
    start_w = (target_size-img.shape[1]) // 2
    start_h = (target_size-img.shape[0]) // 2
    # All that remains is to copy img into the matching slice of the empty image,
    # around its exact center, and to return the result.
    image[start_h:start_h + img.shape[0], start_w: start_w + img.shape[1], :] = img
    return image
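A tiny example shows where put_to_center() places a face. The function is repeated so the example is self-contained, and the 2x4 white "face" is made up. When the padding is odd, the floor division puts the extra pixel on the right or bottom side.

```python
import numpy as np

def put_to_center(img, target_size):
    # Same logic as the cell above, repeated so this example runs on its own.
    img = img[:target_size, :target_size]
    image = np.zeros((target_size, target_size, 3), dtype=np.uint8)
    start_w = (target_size - img.shape[1]) // 2
    start_h = (target_size - img.shape[0]) // 2
    image[start_h:start_h + img.shape[0], start_w:start_w + img.shape[1], :] = img
    return image

face = np.full((2, 4, 3), 255, dtype=np.uint8)  # made-up white "face": 2 rows, 4 columns
padded = put_to_center(face, 6)
# Red channel only: the white block sits on rows 2-3 and columns 1-4.
print(padded[:, :, 0])
```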

[Go back to the top](#start)

## Data preparation's main pipeline <a name="pipeline"></a>

In [None]:
# We are about to extract 32 frames from each video.
frames_per_video = 32
# This is the callable that reads the frames of a video, but it doesn't call it right now.
video_read_fn = lambda x: read_frames(x, num_frames=frames_per_video)
# For the next step, we want to keep the face extraction pipeline ready to be applied to each video.
# The original author used a class architecture, but to stay relevant with this current project and the notebook
# format, we are going to use the partial function tool instead.
# It will only need the video_path argument to be called.
face_extractor = partial(extract_faces, video_read_fn)
# This value is the size of the input images of our custom predictor.
# It was most probably obtained after model selection and tuning. Let's keep it.
# If you intend to run this notebook without a GPU with large memory (8 GB or more),
# you should lower it.
input_size = 380

# Our list of videos. We keep all videos with mp4 extension, from our video folder.
test_videos = sorted([test_video_folder+"/"+x for x in os.listdir(test_video_folder) if x[-4:] == ".mp4"])
train_videos = sorted([train_video_folder+"/"+x for x in os.listdir(train_video_folder) if x[-4:] == ".mp4"])
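The two comprehensions that list the videos could also be written with pathlib. `list_videos` is a hypothetical helper, not part of the original code. Path builds paths with the operating system's separator, str.endswith replaces the `x[-4:]` slice, and is_file() skips any folder whose name happens to end with .mp4.

```python
import tempfile
from pathlib import Path

def list_videos(folder, suffix=".mp4"):
    """Hypothetical helper: sorted paths of the regular files in `folder`
    whose name ends with `suffix`."""
    return sorted(str(p) for p in Path(folder).iterdir()
                  if p.is_file() and p.name.endswith(suffix))

# Usage on a throwaway folder containing two videos and one unrelated file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["b.mp4", "a.mp4", "notes.txt"]:
        (Path(tmp) / name).touch()
    videos = list_videos(tmp)
print([Path(v).name for v in videos])  # ['a.mp4', 'b.mp4']
```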

[Go back to the top](#start)

# Main processing loop <a name="loop"></a>

## Per file processing <a name="process"></a>

In [None]:
# The author chose to store the means and standard deviations by hand. They are the
# usual ImageNet statistics, with one value per color channel, as expected by Normalize.
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]
# Normalize is a tensor transformation: we build it with the means and standard
# deviations as parameters, then call the transform like a function on an image tensor.
normalize_transform = Normalize(mean, std)

# This is our final and main function: it turns one test video into one prediction.
# We still need it as a callable function so that we can take advantage of concurrent workers.
def process_file(i):
    video = test_videos[i]
    # We chose 32 frames per video. The author multiplies it by 4, which leaves room for
    # up to 4 faces per frame, so you'd better have at least an RTX Titan or a 3090.
    batch_size = frames_per_video*4
    try:
        # This time, the complete function is called to extract the faces.
        # In each call, the function starts from the same state:
        # the first argument we previously bound to it remains in memory.
        faces = face_extractor(video)
        # If no faces were found (None or an empty list), we simply return 0.5,
        # which is the decision boundary for classic classification.
        if not faces:
            return 0.5
        # Here, the author made a choice that seems odd to me. I put the block behind an impossible
        # condition so that it is never reached, but we can still read it.
        # It initializes a batch of empty images, in unsigned 8-bit int format (0 to 255), as usual.
        if False:
            x = np.zeros((batch_size, input_size, input_size, 3), dtype = np.uint8)
            n = 0
            # We are going to process each individual face, stored in our list of convenient dictionaries.
            for frame_data in faces:
                for face in frame_data["faces"]:
                    # To better predict with faces and have the least disparity of sizes
                    # we apply an isotropic transformation
                    resized_face = isotropically_resize_image(face, input_size)
                    # Then we center the image inside a fixed size image.
                    resized_face = put_to_center(resized_face, input_size)
                    # At this step, all images have the exact same shape (input_size, input_size, 3).
                    # Question:
                    # this part is a mystery. Why pass instead of leaving both loops?
                    # (Note that n + 1 < batch_size also leaves the last slot of the batch unused.)
                    if n+1 < batch_size:
                        x[n] = resized_face
                        n += 1
                    else:
                        pass
        # Instead of doing what the author did,
        # we will leave both the loops when the batch is full, by using a function.
        #######################################################################
        # Please read and execute the next cell, then come back to this point.#
        #######################################################################
        x, n = resize_batch(faces, batch_size)

        # We go back to the author's train of thought.
        # We check that we have a non-empty batch.
        if n > 0:
            # We can put our batch on the CUDA compatible device, hopefully a powerful GPU.
            # GPUs are optimized for 32-bit floats, so we cast it to floats.
#             x = torch.tensor(x, device="cuda").float()
            # I personally prefer to write it like this. Please refer to the section "Subsidiary questions".
            x = torch.tensor(x, device="cuda", dtype=torch.float32)
            ## Preprocess the image
            # PyTorch convolution layers expect channels-first images,
            # so we permute the dimensions of our batch:
            # (batch, height, width, color) becomes (batch, color, height, width).
            x = x.permute((0,3,1,2))
            # Then, for each image of the batch (j, so that we don't shadow the argument i):
            for j in range(len(x)):
                # We call the normalize transform like a standard function:
                # the pixel values are first scaled to [0, 1], then each channel
                # is normalized with the fixed mean and std values.
                x[j] = normalize_transform(x[j] / 255.)
            # torch.no_grad() disables gradient tracking: we only run inference,
            # so there is nothing to back-propagate and we save memory.
            with torch.no_grad():
                # half() casts the tensor values to half precision (16-bit floats instead of 32-bit).
                # It takes half the memory, which might avoid an Out Of Memory error.
                # The model weights must be in half precision too (model.half());
                # otherwise PyTorch raises a RuntimeError because input and weight types differ.
                y_pred = model(x[:n].half())
                # squeeze(1) removes the dimension of size 1: y_pred goes from shape (n, 1) to (n,).
                # Unlike squeeze(), it keeps a 1-D tensor even when n == 1.
                # Then we apply the sigmoid function, 1 / (1 + e^(-x)), also equal to e^x / (e^x + 1),
                # which turns the logits into probabilities.
                y_pred = torch.sigmoid(y_pred.squeeze(1))
                # bpred is the numpy version of the one-dimensional tensor y_pred.
                # It has to be moved to the CPU first.
                bpred = y_pred[:n].cpu().numpy()
                # Now we pass the per-face predictions to our confident strategy,
                # which pushes the video score towards 0 or 1 when the faces agree.
                # The result is the single prediction for this video, and that's what we return.
                return confident_strategy(bpred)
        # No face could be put in the batch: we return the decision boundary.
        return 0.5
    except Exception:
        # In case of errors, we don't want to stop the other workers.
        # We print the stack trace and simply return the decision boundary, here again.
        traceback.print_exc()
        return 0.5
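The per-image normalization loop in process_file() can also be written as a single broadcast operation. This NumPy sketch, on a made-up random batch, checks that both forms give the same result. The same broadcasting applies to torch tensors. Recent torchvision versions also accept batched (..., C, H, W) tensors in Normalize, which is worth checking against the installed version.

```python
import numpy as np

# ImageNet statistics used by the notebook, one value per color channel.
mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
std = np.array([0.229, 0.224, 0.225], dtype=np.float32)

# A made-up batch of 5 RGB images of 8x8 pixels, channels last like OpenCV frames.
rng = np.random.default_rng(0)
batch = rng.integers(0, 256, size=(5, 8, 8, 3)).astype(np.float32)
# Same move as x.permute((0, 3, 1, 2)): (batch, H, W, C) -> (batch, C, H, W).
x = batch.transpose(0, 3, 1, 2) / 255.0

# Per-image loop, mirroring the notebook: mean and std reshaped to (C, 1, 1).
looped = np.empty_like(x)
for j in range(len(x)):
    looped[j] = (x[j] - mean[:, None, None]) / std[:, None, None]

# One broadcast expression for the whole batch: mean and std reshaped to (1, C, 1, 1).
vectorized = (x - mean[None, :, None, None]) / std[None, :, None, None]
print(vectorized.shape)
```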

In [None]:
# Returning from a function lets us leave both nested loops as soon as the batch is full,
# instead of going through all the remaining face images of the video.
def resize_batch(faces, batch_size):
    x = np.zeros((batch_size, input_size, input_size, 3), dtype=np.uint8)
    n = 0
    for frame_data in faces:
        for face in frame_data["faces"]:
            resized_face = isotropically_resize_image(face, input_size)
            resized_face = put_to_center(resized_face, input_size)
            if n + 1 < batch_size:
                x[n] = resized_face
                n += 1
            else:
                return x, n 
    return x, n 
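Another way to stop as soon as enough faces are collected is to flatten the nested loops into a generator and slice it with itertools.islice. `iter_faces` and `first_faces` are hypothetical helpers, and strings stand in for face crops. resize_batch() keeps at most batch_size - 1 faces because of `n + 1 < batch_size`, so passing `limit=batch_size - 1` would match it. The resizing could then be applied only to the faces actually kept.

```python
from itertools import islice

def iter_faces(faces):
    """Hypothetical helper: yields every face crop of every frame dictionary, in order."""
    for frame_data in faces:
        yield from frame_data["faces"]

def first_faces(faces, limit):
    """Hypothetical helper: at most `limit` faces; the remaining ones are never visited."""
    return list(islice(iter_faces(faces), limit))

# Strings stand in for face crops, in the format produced by extract_faces().
frames = [{"faces": ["f0a", "f0b"]}, {"faces": ["f1a"]}, {"faces": ["f2a", "f2b"]}]
print(first_faces(frames, 4))   # ['f0a', 'f0b', 'f1a', 'f2a']
print(first_faces(frames, 10))  # all 5 faces: islice stops early when the input runs out
```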

[Go back to the top](#start)

## Aside thought <a name="aside"></a>

The next and final step towards prediction involves multiple workers. To size the pool of workers, we want to know how many CPU threads are available.

In [None]:
# os.cpu_count() returns the number of logical CPUs on Windows, Linux and macOS alike
# (or None if it cannot be determined), so no OS-specific command is needed.
# The previous `grep -c cores /proc/cpuinfo` approach fails on systems without /proc.
n_threads = os.cpu_count() or 1
print(f"{sys.platform} system with {n_threads} threads.")
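os.cpu_count() reports every logical CPU of the machine. This may be more than the process is allowed to use, for example under a CPU affinity mask. `available_cpus` is a hypothetical helper that prefers os.sched_getaffinity() where it exists (Linux). Python 3.13 and later also provide os.process_cpu_count() for the same purpose.

```python
import os

def available_cpus():
    """Hypothetical helper: number of CPUs this process may run on."""
    if hasattr(os, "sched_getaffinity"):
        # Linux: honours CPU affinity masks, such as one set with taskset or a container's cpuset.
        return len(os.sched_getaffinity(0))
    # Other platforms: all logical CPUs of the machine (None if unknown).
    return os.cpu_count() or 1

print(available_cpus())
```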

[Go back to the top](#start)

## Prediction <a name="pred"></a>

As said previously, we won't execute it. You can, but without the trained weights the classifier's predictions would be meaningless.

In [None]:
# We are going to keep track of the time taken by the prediction.
# For this, we store the current time, and we'll subtract it from the current time
# once the prediction process is completed.
# time.time() returns Unix time: the number of seconds elapsed since January 1st, 1970 (UTC).
# For example: January 1st, 2020 at 00:00 UTC is the value 1577836800.
# A signed 32-bit counter overflows on January 19th, 2038, and an unsigned one
# on February 7th, 2106 at 06:28:15 (UTC).
# With 64 bits, the limit is about 292 billion years away (about 585 billion if unsigned).
stime = time.time()

# Finally, we make all the predictions using a ThreadPoolExecutor,
# to spread the video processing over multiple threads.
# We use the number of threads found during the previous step.
# Please refer to the readme for more details.
with ThreadPoolExecutor(max_workers=n_threads) as ex:
    # map submits one process_file call per video index to the pool of n_threads workers.
    # It returns a lazy iterator yielding the results in the order of the indices; list() collects them.
    predictions = list(ex.map(process_file, range(len(test_videos))))
# We put the predictions in a convenient format. The Kaggle submission expects bare file names,
# so we strip the folder part of our paths.
submission_df = pd.DataFrame({"filename": [os.path.basename(v) for v in test_videos], "label": predictions})
# We also store the results in a local csv file.
submission_df.to_csv("submission.csv", index=False)
# The current time - the start time == elapsed time (in seconds).
print("Elapsed:", time.time() - stime)
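Two properties of Executor.map matter here. Results come back in the order of the inputs, even when later tasks finish first, and they arrive through a lazy iterator. The sketch below uses a hypothetical `fake_process_file`, with sleep times chosen so that later indices finish first.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_process_file(i):
    # Hypothetical stand-in for process_file: index 0 sleeps longest, index 3 not at all.
    time.sleep(0.05 * (3 - i))
    return i / 10

with ThreadPoolExecutor(max_workers=4) as ex:
    # list() consumes the lazy iterator returned by map.
    predictions = list(ex.map(fake_process_file, range(4)))
print(predictions)  # [0.0, 0.1, 0.2, 0.3]
```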

[Go back to the top](#start)

# Subsidiary questions <a name="subs"></a>

## What does partial do? <a name="partial"></a>

In [26]:
def simili_partial(func, *args):
    args = list(args)
    def wrapper(*extra_args):
        # Basically, it adds extra args to the original args list.
        new_args = args+list(extra_args)
        return func(*new_args)
    # And it returns a callable function, not the result of calling it.
    return wrapper

In [27]:
partial_sum = simili_partial(sum, [1,10,100])
print(partial_sum)

<function simili_partial.<locals>.wrapper at 0x000002971D99C708>


In [28]:
print(partial_sum(1000))

1111


In [29]:
# The list [1, 10, 100] is still in memory; the new argument becomes sum()'s start value.
print(partial_sum(10000))
print(partial_sum(1))

10111
112


In [30]:
p = partial(sum, [1,10, 100])
print(p())
print(p(1000))
print(p(1))

111
1111
112


In [3]:
cust_sum = lambda a,b : a+b
pp = partial(cust_sum, 15)

In [4]:
pp

functools.partial(<function <lambda> at 0x0000025B3AF69D38>, 15)

In [5]:
pp(3)

18
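encoder_params uses partial with keyword arguments rather than positional ones. This sketch uses a hypothetical `build_encoder` as a stand-in for tf_efficientnet_b7_ns. It shows the func, args and keywords attributes of a partial object, and that a keyword given at call time overrides the frozen one.

```python
from functools import partial

def build_encoder(name, pretrained=False, drop_path_rate=0.0):
    # Hypothetical stand-in for a constructor such as tf_efficientnet_b7_ns.
    return f"{name}(pretrained={pretrained}, drop_path_rate={drop_path_rate})"

init_op = partial(build_encoder, "b7", pretrained=True, drop_path_rate=0.2)
print(init_op.func.__name__, init_op.args, init_op.keywords)
print(init_op())                     # b7(pretrained=True, drop_path_rate=0.2)
print(init_op(drop_path_rate=0.3))   # b7(pretrained=True, drop_path_rate=0.3)
```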

[Go back to the top](#start)

## NumPy linspace <a name="linspace"></a>

In [34]:
print(np.linspace(0,90, num=10))

[ 0. 10. 20. 30. 40. 50. 60. 70. 80. 90.]


In [35]:
print(np.linspace(start=0, stop=3, num=3, endpoint=True))
print(np.linspace(start=0, stop=3, num=3, endpoint=False))

[0.  1.5 3. ]
[0. 1. 2.]


In [36]:
print(np.linspace(0,3,3))
print(np.linspace(0, 3, 3, dtype=int))

[0.  1.5 3. ]
[0 1 3]
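This is how read_frames() picks its indices. For a hypothetical 300-frame video and `frames_per_video = 32`, the first and last frames are included, and consecutive indices are 9 or 10 frames apart. For a video shorter than 32 frames, the integer cast necessarily produces repeated indices.

```python
import numpy as np

# Hypothetical 300-frame video, 32 frames per video as in the notebook.
idxs = np.linspace(0, 299, 32, endpoint=True, dtype=int)
print(idxs[:5], idxs[-3:])
print(sorted(set(np.diff(idxs).tolist())))

# Hypothetical 20-frame video: 32 indices cannot all be distinct.
short = np.linspace(0, 19, 32, endpoint=True, dtype=int)
print(len(np.unique(short)))
```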


[Go back to the top](#start)

## NumPy random.randint <a name="randint"></a>

In [38]:
print(np.random.randint(low=0, high=100, size=10))
print(3 in np.random.randint(low=0, high=3, size=100000))

[95 14 22  7 54 20 34 43 60 58]
False


In [39]:
print(np.random.randint(low=0, high=100, size=(4, 3, 2)))

[[[81 42]
  [42 26]
  [36 75]]

 [[40 45]
  [93 98]
  [ 4 29]]

 [[66 42]
  [54 85]
  [71 75]]

 [[70 88]
  [95 42]
  [11 16]]]
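The exclusive upper bound explains the jitter offsets used in read_frames(): `np.random.randint(-jitter, jitter)` never returns +jitter. If symmetric offsets were wanted, the newer Generator API offers `integers(..., endpoint=True)`. That variant is an alternative, not what the original code uses.

```python
import numpy as np

jitter = 2
np.random.seed(0)
# Legacy API, as in read_frames(): values in [-2, 1], because high is exclusive.
offsets = np.random.randint(-jitter, jitter, size=10_000)
print(offsets.min(), offsets.max())

# Generator API: endpoint=True makes the upper bound inclusive, giving values in [-2, 2].
rng = np.random.default_rng(seed=0)
symmetric = rng.integers(-jitter, jitter, size=10_000, endpoint=True)
print(symmetric.min(), symmetric.max())
```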


[Go back to the top](#start)

## NumPy clip <a name="clip"></a>

In [40]:
np.clip(np.array([1,2,3,4,15000]), a_min=2, a_max=10)

array([ 2,  2,  3,  4, 10])

[Go back to the top](#start)

## NumPy uint8 format <a name="uint8"></a>

In [41]:
an_int, another_int = np.array([128]), np.array([-256])
print(an_int, another_int)
print(an_int.astype(np.uint8), another_int.astype(np.uint8))

[128] [-256]
[128] [0]


In [42]:
an_int, another_int = np.array([256]), np.array([257])
print(an_int, another_int)
print(an_int.astype(np.uint8), another_int.astype(np.uint8))
print(256%256, 257%256)
del(an_int, another_int)

[256] [257]
[0] [1]
0 1
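Since `astype(np.uint8)` wraps out-of-range values around modulo 256, pixel values computed with arithmetic should be clipped before the cast if saturation is what you want. Here is a short comparison on made-up values:

```python
import numpy as np

values = np.array([-5, 0, 128, 255, 256, 300])
# astype wraps around: each value is taken modulo 256.
wrapped = values.astype(np.uint8)
# Clipping first saturates instead: everything below 0 becomes 0, above 255 becomes 255.
saturated = np.clip(values, 0, 255).astype(np.uint8)
print(wrapped)
print(saturated)
```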


[Go back to the top](#start)

## NumPy count_nonzero <a name="zero"></a>

In [19]:
test_array = np.random.rand(1000)
threshold = 0.8

In [20]:
%%timeit
r1 = (test_array>threshold).sum()

4.97 µs ± 150 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [21]:
%%timeit
r2 = np.count_nonzero(test_array>threshold)

2.37 µs ± 69 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [22]:
(test_array>threshold).sum() == np.count_nonzero(test_array>threshold)

True
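count_nonzero also takes an axis argument, which would let counts like the one in confident_strategy run on several videos at once. The 2x4 prediction array below is invented, with one row per video. np.mean on the same boolean mask gives the fraction instead of the count.

```python
import numpy as np

# Invented predictions: one row per video, one column per face.
preds = np.array([[0.9, 0.95, 0.1, 0.85],
                  [0.1, 0.05, 0.3, 0.15]])
fakes_per_video = np.count_nonzero(preds > 0.8, axis=1)
fake_fraction = np.mean(preds > 0.8, axis=1)
print(fakes_per_video, fake_fraction)
```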

[Go back to the top](#start)

## Calling float() or putting the argument dtype=torch.float32 <a name="float"></a>

In [23]:
%%timeit
c = np.random.randint(256, size=(16, 96, 128, 3))
ct=torch.tensor(c, device="cuda", dtype=torch.float32)
try:
    del(ct)
except:
    pass
finally:
    torch.cuda.empty_cache()

3.64 ms ± 38.9 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [24]:
%%timeit
c = np.random.randint(256, size=(16, 96, 128, 3))
ct=torch.tensor(c, device="cuda").float()
try:
    del(ct)
except:
    pass
finally:
    torch.cuda.empty_cache()

3.41 ms ± 48.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


[Go back to the top](#start)