# Inference Demo based on Video Stylization

Here we provide an inference demo based on [video stylization](https://github.com/rnwang04/nuc_demo/tree/main/FSPBT-Image-Translation) to show the acceleration effect of BigDL-Nano. All results below were obtained on an [Intel® Core™ i9-12900 Processor](https://www.intel.com/content/www/us/en/products/sku/134597/intel-core-i912900-processor-30m-cache-up-to-5-10-ghz/specifications.html).

## Prepare the environment

We recommend using [Miniconda](https://docs.conda.io/en/latest/miniconda.html) to prepare the environment.

```bash
conda create -n nano python=3.7 setuptools=58.0.4  # "nano" is the conda environment name; you can use any name you like.
conda activate nano
pip install --pre --upgrade bigdl-nano[pytorch]

pip install torch==1.12.0 torchvision --extra-index-url https://download.pytorch.org/whl/cpu
# Necessary packages for inference acceleration
pip install neural-compressor==1.12
```
Initialize the environment variables with the `bigdl-nano-init` script, which is installed together with bigdl-nano.
```bash
source bigdl-nano-init
```
You need to install `ffmpeg` to deal with video input.
```bash
sudo apt install ffmpeg # for Linux
brew install ffmpeg # for Mac
```                                                                                           
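
Before running the notebook, it can help to confirm that `ffmpeg` is actually on the `PATH`, since the `display_video` helper used later shells out to it. The following is a minimal sketch using only the standard library; the helper name `find_missing_tools` is hypothetical and not part of BigDL-Nano.

```python
import shutil


def find_missing_tools(tools):
    """Return the names in `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools(["ffmpeg"])
    if missing:
        print("Missing command-line tools:", ", ".join(missing))
    else:
        print("ffmpeg is available")
```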

## Prepare data
First unzip the corresponding data (provided by the author of https://github.com/rnwzd/FSPBT-Image-Translation) under the `data/` directory:

In [None]:
!unzip webcam.zip
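
If the `unzip` command is not available, the same step can probably be done from Python. This is a sketch with the standard `zipfile` module; the function name `extract_archive` is hypothetical, and it assumes `webcam.zip` sits in the current working directory, as the command above does.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, target_dir="."):
    """Extract every member of `zip_path` into `target_dir` and return their names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
        return archive.namelist()


if __name__ == "__main__":
    archive_path = Path("webcam.zip")  # assumed location, mirrors `!unzip webcam.zip`
    if archive_path.exists():
        names = extract_archive(archive_path)
        print(f"Extracted {len(names)} entries")
    else:
        print(f"{archive_path} not found")
```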

## Training first
Before inference, train your model by running `python nano_train.py`, which is accelerated by the BigDL-Nano Trainer. Training takes about 7 minutes and produces a generator model (`generator.pt`) under the `./data/webcam/models/` directory.

In [None]:
!python train.py

## Load the model and accelerate it with InferenceOptimizer
Then you can load your trained model and accelerate it with `InferenceOptimizer`. Here we use `InferenceOptimizer.quantize` as an example.

In [1]:
from torch.utils.data import DataLoader
import torch
from tqdm import tqdm
import torchvision.transforms as transforms
from pathlib import Path
from data import read_image_tensor, write_image_tensor, ImageDataset
from train import data_path, model_save_path

# load model
device = 'cpu'
dtype = torch.float32

generator = torch.load(model_save_path/"generator.pt")
generator.eval()
generator.to(device, dtype)

# prepare calib dataloader
input_dir = data_path/'input'
file_paths = [file for file in input_dir.iterdir()]

params = {'batch_size': 1,
          'num_workers': 8,
          'pin_memory': True}

dataset = ImageDataset(file_paths, transform=None)
loader = DataLoader(dataset, **params)

from bigdl.nano.pytorch import InferenceOptimizer
model = InferenceOptimizer.quantize(model=generator,
                                    calib_dataloader=loader)
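
To give a feel for what int8 quantization trades away, here is a generic, simplified sketch of symmetric quantization in plain Python. It is an illustration only and is not how `InferenceOptimizer.quantize` is implemented; the function names and sample values are made up. The idea is that a representative range of values (which calibration data would supply) fixes a scale, and each float is then stored as a small integer.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization of floats to the int8 range [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map int8 codes back to approximate float values."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    activations = [-1.0, -0.25, 0.0, 0.4, 0.8]  # made-up sample values
    codes, scale = quantize_int8(activations)
    restored = dequantize(codes, scale)
    worst = max(abs(a - r) for a, r in zip(activations, restored))
    print("codes:", codes)
    print("scale:", scale)
    print("largest round-trip error:", worst)
```

The round-trip error of each value is bounded by half the scale, which is why the observed value range matters for output quality.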

## Helper functions
Below are three helper functions for displaying a video and for converting between video and images.

In [2]:
import os
from IPython.display import HTML
from base64 import b64encode
from PIL import Image as PILImage
import cv2
from cv2 import VideoCapture, imwrite
import numpy as np


def display_video(file_path, width=512):
    # Source: https://colab.research.google.com/drive/1_kbRZPTjnFgViPrmGcUsaszEdYa8XTpq#scrollTo=DxlIqGfATvvj&line=1&uniqifier=1
    compressed_video_path = 'comp_' + file_path
    if os.path.exists(compressed_video_path):
        os.remove(compressed_video_path)
    os.system(f'ffmpeg -i {file_path} -vcodec libx264 {compressed_video_path}')
    
    with open(compressed_video_path, 'rb') as f:
        mp4 = f.read()
    # embed the video as a base64 data URL with the video/mp4 MIME type
    data_url = 'data:video/mp4;base64,' + b64encode(mp4).decode()
    return HTML("""
        <video width={} controls>
            <source src="{}" type="video/mp4">
        </video>
        """.format(width, data_url))


def imgs_to_video(output_dir, video_name='demo_output.mp4', fps=15):
    # Refer to: https://stackoverflow.com/questions/52414148/turn-pil-images-into-video-on-linux
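    # os.listdir returns names in arbitrary order, so sort the frames first;
    # sorting by (length, name) puts unpadded numeric names such as 2.jpg before 10.jpg
    image_names = sorted((name for name in os.listdir(output_dir) if name.endswith('.jpg')),
                         key=lambda name: (len(name), name))
    imgs = [PILImage.open(os.path.join(output_dir, name)) for name in image_names]
    video_dims = (imgs[0].width, imgs[0].height)
    fourcc = cv2.VideoWriter_fourcc(*'DIVX')
    video = cv2.VideoWriter(video_name, fourcc, fps, video_dims)
    for img in imgs:
        video.write(cv2.cvtColor(np.array(img), cv2.COLOR_RGB2BGR))
    video.release()  # finalize the output file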


def video_to_imgs(video_name='demo_output.mp4', image_dir="./images/", fps=15):
    video_capture = VideoCapture(video_name)
    number = 0
    while True:
        flag, frame = video_capture.read()
        if flag is False:
            break
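        # frame.shape is (height, width, channels)
        h, w = frame.shape[0], frame.shape[1]
        if w % 4 != 0 or h % 4 != 0:
            # round both dimensions down to a multiple of 4;
            # cv2.resize expects the target size as (width, height)
            NW = int((w // 4) * 4)
            NH = int((h // 4) * 4)
            frame = cv2.resize(frame, (NW, NH))
        imwrite(image_dir + str(number) + '.jpg', frame)
        number += 1
    video_capture.release()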
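
The resize step in `video_to_imgs` rounds each frame dimension down to the nearest multiple of 4. The arithmetic can be sketched on its own as follows; the helper names `round_down_to_multiple` and `target_size` are hypothetical and only serve the illustration.

```python
def round_down_to_multiple(value, base=4):
    """Largest multiple of `base` that is less than or equal to `value`."""
    return (value // base) * base


def target_size(width, height, base=4):
    """Frame size (width, height) after rounding both dimensions down."""
    return round_down_to_multiple(width, base), round_down_to_multiple(height, base)


if __name__ == "__main__":
    print(target_size(320, 240))  # already multiples of 4, unchanged: (320, 240)
    print(target_size(322, 243))  # rounded down: (320, 240)
```

Keep in mind that OpenCV reports `frame.shape` as `(height, width, channels)`, whereas `cv2.resize` expects its target size as `(width, height)`.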

## Demo input video
Below is a demo input video from [UCF101 Dataset](https://www.crcv.ucf.edu/data/UCF101.php).

In [3]:
display_video("demo.mp4")


## Inference with input video
The code below implements inference in the `image_stylization` function and runs it across multiple processes.

You can change the number of processes by modifying this line:

`num_processes = 4  # specify number of processes`

If you want to test the performance of the original model, just replace `outputs = model(inputs)` with `outputs = generator(inputs)` in the `image_stylization` function.

In [9]:
import multiprocessing
from pathlib import Path
import time


def image_stylization(img_list):
    params = {'batch_size': 1,
            }
    dataset = ImageDataset(img_list, transform=None)
    loader = DataLoader(dataset, **params)
    output_dir = Path("./video-output/")
    with torch.no_grad():
        for inputs, names in tqdm(loader):
            inputs = inputs.to(device, dtype)
            # original model
            # outputs = generator(inputs)
            # accelerated model
            outputs = model(inputs)
            for k in range(len(outputs)):
                write_image_tensor(outputs[k], output_dir/names[k])
            del outputs


if __name__ == "__main__":
    input_video = "demo.mp4"
    num_processes = 4  # specify number of processes
    
    image_dir = "./video2pic/"
    output_dir = "./video-output/"
    os.makedirs(image_dir, exist_ok=True)
    os.makedirs(output_dir, exist_ok=True)

    video_to_imgs(input_video, image_dir)

    frames = len(os.listdir(image_dir))

    if num_processes > 1:
        print("{} processes is used.".format(num_processes))
        pool = multiprocessing.Pool(num_processes)
        st = time.perf_counter()
        slice = [frames // num_processes] * num_processes
        remainder = frames % num_processes
        if remainder > 0:
            for i in range(remainder):
                slice[i] += 1
        # assign img to each process
        num = 0
        total_img_list = []
        for i in range(num_processes):
            img_list = []
            for j in range(slice[i]):
                img_path = Path("{0}{1}.jpg".format(image_dir, num + j))
                img_list.append(img_path)
            total_img_list.append(img_list)
            num += slice[i]
        for img_list in total_img_list:
            pool.apply_async(image_stylization, args=(img_list,))
        pool.close()
        pool.join()
        end = time.perf_counter()
        print("Generation costs {}s".format(end - st))
    else:
        print("Only one processes is used.")
        img_list = []
        for x in range(frames):
            img_path = Path("{0}{1}.jpg".format(image_dir, x))
            img_list.append(img_path)
        st = time.perf_counter()
        image_stylization(img_list)
        end = time.perf_counter()
        print("Generation costs {}s".format(end - st))

    # turn the stylized images into a video
    imgs_to_video(output_dir, "demo_output.mp4", fps=24)

4 processes is used.


100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 34/34 [00:17<00:00,  1.99it/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 34/34 [00:17<00:00,  1.99it/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 35/35 [00:17<00:00,  2.02it/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 35/35 [00:17<00:00,  2.01it/s]


Generation costs 17.544292122940533s


OpenCV: FFMPEG: tag 0x58564944/'DIVX' is not supported with codec id 12 and format 'mp4 / MP4 (MPEG-4 Part 14)'
OpenCV: FFMPEG: fallback to use tag 0x7634706d/'mp4v'
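
The way the cell above assigns frames to processes can be read in isolation: every process gets `frames // num_processes` frames, and the first `frames % num_processes` processes get one extra. A small sketch of that logic follows, with hypothetical helper names `split_evenly` and `chunk_ranges`. The total of 138 frames is inferred from the progress bars above (34 + 34 + 35 + 35).

```python
def split_evenly(total, parts):
    """Split `total` items into `parts` chunk sizes differing by at most one."""
    base, remainder = divmod(total, parts)
    return [base + 1 if i < remainder else base for i in range(parts)]


def chunk_ranges(total, parts):
    """Return (start, stop) index pairs, one per process, covering range(total)."""
    ranges, start = [], 0
    for size in split_evenly(total, parts):
        ranges.append((start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    print(split_evenly(138, 4))  # [35, 35, 34, 34]
    print(chunk_ranges(138, 4))  # [(0, 35), (35, 70), (70, 104), (104, 138)]
```

Because the chunk sizes differ by at most one frame, the processes finish at nearly the same time, which matches the roughly equal 17-second progress bars.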


### Time cost

The following table shows the time taken for video stylization before and after BigDL-Nano acceleration on an [Intel® Core™ i9-12900 Processor](https://www.intel.com/content/www/us/en/products/sku/134597/intel-core-i912900-processor-30m-cache-up-to-5-10-ghz/specifications.html) with different numbers of processes. Each latency result is the average of 20 repeated experiments.

| Model               | Process=1 | Process=4 |
| ------------------- | --------- | --------- |
| Original            | 35.57s    | 30.90s    |
| Quantization (int8) | 30.35s    | 18.53s    |
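
To make the table easier to interpret, the sketch below shows one way to average repeated timings and to turn the latencies above into speedup ratios. The helper names `average_latency` and `speedup` are hypothetical; the numbers are copied from the table, and this is not necessarily how the original measurements were scripted.

```python
import time


def average_latency(fn, repeats=20):
    """Run `fn` `repeats` times and return the mean wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)


def speedup(baseline_seconds, accelerated_seconds):
    """How many times faster the accelerated run is than the baseline."""
    return baseline_seconds / accelerated_seconds


if __name__ == "__main__":
    # Latencies copied from the table above, in seconds
    original = {1: 35.57, 4: 30.90}
    quantized = {1: 30.35, 4: 18.53}
    for processes in (1, 4):
        ratio = speedup(original[processes], quantized[processes])
        print(f"Process={processes}: {ratio:.2f}x")  # 1.17x, then 1.67x
    # Original with one process versus quantized with four processes
    print(f"Overall: {speedup(original[1], quantized[4]):.2f}x")  # 1.92x
```

Quantization alone gives about 1.17x with one process, while combining it with four processes gives about 1.92x over the single-process original.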

## Demo output

Below is the output of the video stylization model. Pretty cool, isn't it?

In [12]:
display_video("demo_output.mp4")
