# Inference notebook for DFDC
A short notebook that runs inference on the Deep Fake Detection Challenge (DFDC) test set and writes the submission file.
### Imports
Separated into blocks:
- Python imports
- PyTorch/torchvision imports
- User imports**

** Note: In addition to our own code, we rely on [facenet-pytorch](https://github.com/timesler/facenet-pytorch) in the preprocessing stage to isolate people's faces.

In [59]:
%load_ext autoreload
%autoreload 2
import os
import sys
import pandas as pd
import numpy as np
import yaml
from tqdm import tqdm

import torch
from torch.utils.data import DataLoader
from torchvision import transforms

sys.path.insert(1, '../')
import dfdet as dfd
from facenet_pytorch import MTCNN

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


### Config
Set the device for processing (a GPU is expected for this analysis) and choose whether to run the preprocessing stage. The `preprocess` switch exists mainly for debugging the second stage once preprocessing has already been run.

In [None]:
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
preprocess = True

### Index data
Index the test data and return it as a DataFrame compatible with our dataset classes.

In [67]:
test_path = '../deepfake-detection'
def index_test_dir(test_path=''):
    ''' Index files and create data frame
    Parameters
    ----------
    test_path : str
        Path to directory with test files
    Returns
    -------
    df : pd.DataFrame
        DataFrame for preprocessing
    '''
    path, dirs, files = next(os.walk('{}/test_videos'.format(test_path)))
    df = pd.DataFrame(files, columns=['File'])
    df['split'] = 'test_videos'
    df['label'] = -1
    return df
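
To make the output of the indexing step concrete, here is a minimal sketch that runs the same `os.walk` logic on a throwaway directory. The file names are made up, and the hypothetical `index_test_dir_rows` returns plain dicts instead of a DataFrame so the example needs only the standard library.

```python
import os
import tempfile

def index_test_dir_rows(test_path):
    """Hypothetical stand-in for index_test_dir that returns plain dicts."""
    # next(os.walk(...)) yields only the top level: (dirpath, dirnames, filenames)
    _, _, files = next(os.walk(os.path.join(test_path, 'test_videos')))
    # os.walk gives no ordering guarantee, so sort for reproducible output
    return [{'File': name, 'split': 'test_videos', 'label': -1}
            for name in sorted(files)]

with tempfile.TemporaryDirectory() as root:
    video_dir = os.path.join(root, 'test_videos')
    os.makedirs(os.path.join(video_dir, 'nested'))
    for name in ('b.mp4', 'a.mp4'):
        open(os.path.join(video_dir, name), 'w').close()
    # A file in a subdirectory, to show that it is not picked up
    open(os.path.join(video_dir, 'nested', 'c.mp4'), 'w').close()
    rows = index_test_dir_rows(root)

print(rows)
```

Because only the first item of `os.walk` is consumed, files in subdirectories of `test_videos` are not indexed, and every row starts with the placeholder label `-1`.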

## Preprocessing
Preprocessing stage. Feed the files to facenet-pytorch to isolate faces, and save the selected frames to a local directory, `temp`.

In [None]:
if preprocess:
    df = index_test_dir(test_path)
    with open('../config_files/preprocess.yaml') as f:
        # yaml.load without an explicit Loader is deprecated and fails on PyYAML >= 6
        config = yaml.safe_load(f)
    mt = config['mtcnn']
    mtcnn = MTCNN(
        image_size=mt['image_size'], margin=mt['margin'],
        min_face_size=mt['min_face_size'], thresholds=mt['thresholds'],
        factor=mt['factor'], post_process=mt['post_process'],
        device=device
    )
    faces_df = dfd.preprocess_df(df=df, mtcnn=mtcnn, path=test_path, 
                                 outpath='./temp', n_seconds=6, debug=False)
    faces_df.to_csv('./temp/faces_metadata.csv')
    del mtcnn, df
else:
    # The first CSV column is the index written by to_csv above
    faces_df = pd.read_csv('./temp/faces_metadata.csv', index_col=0)
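
The preprocessing cell reads six keys from the `mtcnn` section of `preprocess.yaml`. The sketch below shows that structure as a Python dict, with a hypothetical helper that reports missing keys. The values are placeholders (facenet-pytorch's `MTCNN` defaults), not the settings actually used for this project.

```python
# Keys that the preprocessing cell reads from config['mtcnn']
REQUIRED_MTCNN_KEYS = ('image_size', 'margin', 'min_face_size',
                       'thresholds', 'factor', 'post_process')

# Placeholder values only (facenet-pytorch's MTCNN defaults), not the project's settings
example_config = {
    'mtcnn': {
        'image_size': 160,
        'margin': 0,
        'min_face_size': 20,
        'thresholds': [0.6, 0.7, 0.7],
        'factor': 0.709,
        'post_process': True,
    }
}

def missing_mtcnn_keys(config):
    """Hypothetical helper: list the MTCNN keys absent from a loaded config."""
    section = config.get('mtcnn', {})
    return [key for key in REQUIRED_MTCNN_KEYS if key not in section]

print(missing_mtcnn_keys(example_config))
print(missing_mtcnn_keys({'mtcnn': {'margin': 0}}))
```

Running a check like this right after loading the YAML file would surface a misspelled key before the face detector is constructed.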

### Re-indexing
Re-index the DataFrame so that the DataLoader can reference file names through the label. Not strictly needed, but added as a safeguard for a correct submission.

In [None]:
faces_df['label'] = faces_df.index
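
The idea behind this step can be sketched without PyTorch: when each sample's label is its row index, any batch of labels can be mapped back to file names by position. The file names and the `batches` helper below are hypothetical stand-ins for `faces_df['File']` and an unshuffled DataLoader.

```python
# Hypothetical file names standing in for faces_df['File']
files = ['a.mp4', 'b.mp4', 'c.mp4', 'd.mp4', 'e.mp4']
# Equivalent of faces_df['label'] = faces_df.index
labels = list(range(len(files)))

def batches(items, batch_size):
    """Yield consecutive chunks, like a DataLoader with shuffle=False."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

recovered = []
for batch_labels in batches(labels, batch_size=2):
    # Each label is a row index, so it looks up its own file name
    recovered.extend(files[label] for label in batch_labels)

print(recovered == files)
```

The lookup does not depend on batch order, which is why it works as a safeguard even if the loader were later changed to shuffle.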

### Deep Fake Detector
Create the deep fake detector model and load the pre-trained weights.

In [64]:
model = dfd.ConvLSTM(num_classes=1, attention=True, encoder='ResNet').to(device)
# map_location lets a checkpoint saved on GPU load when falling back to CPU
chpt = torch.load('../checkpoints/best_model.pth.tar', map_location=device)
model.load_state_dict(chpt['model'])
print(chpt['description'])

best model loss: 0.3598859710824972


### Data
Create the dataset and dataloader with the transformations the model expects as input.

In [65]:
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])
batch_size = 16
testset = dfd.DFDC_Dataset(df=faces_df, transform=transform, path='./temp')
testloader = DataLoader(testset, batch_size=batch_size, shuffle=False, num_workers=16)
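
As a worked example of the `Normalize` step, the sketch below applies `(value - mean) / std` per channel to a single pixel in plain Python. It assumes uint8 image input, which `ToTensor` scales to the range [0, 1]. The mean and standard deviation are the values from the cell above (the usual ImageNet statistics), and `normalize_pixel` is a hypothetical helper.

```python
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]

def normalize_pixel(rgb):
    """Apply (value - mean) / std per channel to one RGB pixel in [0, 1]."""
    return [(v - m) / s for v, m, s in zip(rgb, MEAN, STD)]

# A mid-gray uint8 pixel (128, 128, 128) after scaling to [0, 1]
mid_gray = [128 / 255] * 3
normalized = normalize_pixel(mid_gray)
print([round(c, 3) for c in normalized])
```

A pixel equal to the channel means maps to zero, and the same gray value lands at a different normalized value in each channel because the statistics differ per channel.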

## Evaluation
Inference step: produce predictions for the test set and return a DataFrame of (File, Probability) pairs.

In [68]:
def dfdc_evaluation(df=None, model=None, dataloader=None):
    ''' Evaluate model on dataset
    Parameters
    ----------
    df : pd.DataFrame
        DataFrame associated to pre-processed data
    model : torch.nn.Module
        Pytorch model for inference
    dataloader : torch.utils.data.DataLoader
        Data-loader for pre-processed data
    Returns
    -------
    probabilities : pd.DataFrame
        DataFrame with (filename, probability) pairs
    '''
    model.eval()
    model.to(device)
    probabilities = []
    for frames, lbls in dataloader:
        frames = frames.to(device)
        with torch.no_grad():
            # Reset the LSTM hidden state before each batch
            model.lstm.reset_hidden_state()
            predictions = model(frames)
        # Labels hold row indices of df (set during re-indexing), so map them back to file names
        for i in range(predictions.shape[0]):
            probabilities.append(
                {'File': df['File'][lbls[i].item()],
                 'Probability': predictions[i].cpu().item()}
            )
    return pd.DataFrame(probabilities)

In [56]:
probabilities = dfdc_evaluation(df=faces_df, model=model, dataloader=testloader)

### Save
Create the predictions CSV for submission to the competition.

In [58]:
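# `columns` expects a sequence of column names; passing `columns=False` raised
# "TypeError: 'bool' object is not iterable", so the argument is dropped here.
probabilities.to_csv('submission.csv', index=False)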
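
For reference, `DataFrame.to_csv` takes `columns` as a sequence of column names to write, and writes every column when the argument is omitted. The sketch below uses made-up rows in the shape returned by `dfdc_evaluation` and inspects the CSV text instead of writing a file.

```python
import pandas as pd

# Made-up rows in the shape returned by dfdc_evaluation
probabilities = pd.DataFrame([
    {'File': 'a.mp4', 'Probability': 0.25},
    {'File': 'b.mp4', 'Probability': 0.75},
])

# With no path argument, to_csv returns the CSV text instead of writing a file
csv_text = probabilities.to_csv(index=False, columns=['File', 'Probability'])
print(csv_text)
```

With `index=False` the row index is left out, so the file contains only the header row and one line per prediction.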