# 🎨 Video Neural Style Transfer - Interactive Notebook

This notebook allows you to apply a feedforward neural style transfer model to a video.

You can:
- Choose a video file from the input directory or upload your own
- Select a pretrained style model
- Adjust the temporal smoothing strength and toggle verbose output
- Stylize the video and download the result

In [None]:
import os
import torch
import cv2
import ipywidgets as widgets
import matplotlib.pyplot as plt
from IPython.display import display, Video
from utils.utils import frame_to_tensor, post_process_image, print_model_metadata
from utils.jupyter_parsing import parse_uploaded_file
from models.definitions.transformer_net import TransformerNet

## 🗂️ Paths and Video Options

In [None]:
# Directory paths
content_path = "data/input"
output_path = "data/output"
model_path = "models/binaries"

os.makedirs(content_path, exist_ok=True)
os.makedirs(output_path, exist_ok=True)

available_videos = sorted([f for f in os.listdir(content_path) if f.lower().endswith(('.mp4', '.mov'))])
available_styles = sorted([f for f in os.listdir(model_path) if f.endswith('.pth')])

# Upload or choose video
use_uploaded = widgets.Checkbox(value=False, description="Upload your own video")
uploader = widgets.FileUpload(accept=".mp4,.mov", multiple=False)

video_dropdown = widgets.Dropdown(options=available_videos, description="Choose Video:")
style_dropdown = widgets.Dropdown(options=available_styles, description="Choose Style:")

# Parameter controls
smoothing_slider = widgets.FloatSlider(value=0.3, min=0.0, max=1.0, step=0.05, description="Smoothing:")
verbose_checkbox = widgets.Checkbox(value=False, description="Verbose")

# Display widgets
display(use_uploaded, uploader)
display(video_dropdown, style_dropdown, smoothing_slider, verbose_checkbox)

## 📂 Resolve Video and Model Inputs

In [None]:
if use_uploaded.value and uploader.value:
    input_video_name = parse_uploaded_file(uploader, content_path)
    print(f"Uploaded video saved as: {input_video_name}")
else:
    input_video_name = video_dropdown.value

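style_model_name = style_dropdown.value

# Fail early if a directory was empty and a dropdown therefore has no selection
if input_video_name is None or style_model_name is None:
    raise ValueError("No input video or style model selected; check data/input and models/binaries.")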

## 🧐 Stylize the Video

In [None]:
# Setup device
if torch.backends.mps.is_available():
    device = torch.device("mps")
elif torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")

# Load model
model = TransformerNet().to(device)
checkpoint = torch.load(os.path.join(model_path, style_model_name), map_location=device)
model.load_state_dict(checkpoint["state_dict"])
model.eval()

if verbose_checkbox.value:
    print_model_metadata(checkpoint)

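# Open video
video_path = os.path.join(content_path, input_video_name)
cap = cv2.VideoCapture(video_path)
if not cap.isOpened():
    raise IOError(f"Could not open video: {video_path}")

# Keep FPS as a float: int() would truncate rates such as 29.97 and change playback speed
fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # fall back to 30 if the container reports 0
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))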

# Prepare output writer
output_filename = f"styled_{os.path.splitext(input_video_name)[0]}_{os.path.splitext(style_model_name)[0]}.mp4"
output_path_full = os.path.join(output_path, output_filename)
out_writer = cv2.VideoWriter(output_path_full, cv2.VideoWriter_fourcc(*'mp4v'), fps, (width, height))

# Stylize video frame by frame
prev_stylized = None
alpha = smoothing_slider.value

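from tqdm.notebook import tqdm

# CAP_PROP_FRAME_COUNT can be 0 or inaccurate for some containers, so read until
# the stream ends and use the reported count only for the progress bar.
progress_total = total_frames if total_frames > 0 else None

with torch.no_grad(), tqdm(total=progress_total, desc="Stylizing frames") as progress:
    while True:
        ret, frame = cap.read()
        if not ret:
            break

        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        tensor = frame_to_tensor(rgb, device, should_normalize=True)
        output_tensor = model(tensor).cpu().numpy()[0]
        stylized = post_process_image(output_tensor)

        # The network output can differ from the input size by a few pixels, and
        # VideoWriter expects frames of exactly (width, height).
        # Assumes post_process_image returns an HxWx3 uint8 array.
        if stylized.shape[:2] != (height, width):
            stylized = cv2.resize(stylized, (width, height))

        # Blend with the previous output frame to reduce flicker
        if prev_stylized is not None and alpha > 0:
            stylized = cv2.addWeighted(stylized, 1 - alpha, prev_stylized, alpha, 0)
        prev_stylized = stylized.copy()

        bgr_output = cv2.cvtColor(stylized, cv2.COLOR_RGB2BGR)
        out_writer.write(bgr_output)
        progress.update(1)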

cap.release()
out_writer.release()

print(f"\nStylized video saved to: {output_path_full}")
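
Because `prev_stylized` stores the already blended frame, the smoothing in the loop above behaves like an exponential moving average: each output is `(1 - alpha) * current + alpha * previous_output`. The sketch below reproduces that recurrence on plain numbers standing in for one pixel value; the helper name `ema_blend` is hypothetical and is not part of the notebook's utilities.

```python
def ema_blend(values, alpha):
    """Apply s_t = (1 - alpha) * y_t + alpha * s_(t-1) to a sequence of numbers.

    Mirrors the notebook loop: the first value passes through unchanged,
    and alpha <= 0 disables blending.
    """
    smoothed = []
    prev = None
    for y in values:
        if prev is None or alpha <= 0:
            s = y
        else:
            s = (1 - alpha) * y + alpha * prev
        smoothed.append(s)
        prev = s
    return smoothed


# A pixel that jumps from 0 to 100 approaches the new value gradually
print(ema_blend([0, 100, 100], 0.5))

# With alpha = 1.0 every output equals the first frame
print(ema_blend([10, 200, 50], 1.0))
```

One consequence of this recurrence is that setting the slider to its maximum of 1.0 freezes the output on the first stylized frame, so values well below 1.0 are the practical range.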

## 🎥 View Output Video

In [None]:
Video(output_path_full, embed=True, width=700)

### ⚠️ Note on Video Playback

The stylization cell writes the video with OpenCV's `mp4v` codec (MPEG-4 Part 2), which most browsers do not decode. Because notebook front ends such as Jupyter and VS Code rely on the browser's video player, the file may not play **inside the notebook**.

However, the **stylized video is correctly saved to disk** and should play normally when opened with:
- File Explorer / Finder
- VLC or any standard media player
- Browsers (after re-encoding to H.264 with `ffmpeg`)
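
The H.264 re-encode mentioned in the last bullet can also be run from a notebook cell. This is a minimal sketch that assumes `ffmpeg` is installed and on `PATH`; the helper names `build_reencode_cmd` and `reencode_h264` are hypothetical, and the `-y` flag (overwrite the output file) is an addition for unattended use.

```python
import shutil
import subprocess


def build_reencode_cmd(src, dst):
    """Return the ffmpeg argument list for an H.264 / yuv420p re-encode."""
    return ["ffmpeg", "-y", "-i", src,
            "-vcodec", "libx264", "-pix_fmt", "yuv420p", dst]


def reencode_h264(src, dst):
    """Run ffmpeg if it is installed; return True on success, False otherwise."""
    if shutil.which("ffmpeg") is None:
        print("ffmpeg not found on PATH; skipping re-encode")
        return False
    result = subprocess.run(build_reencode_cmd(src, dst), capture_output=True)
    return result.returncode == 0


# Example (not executed here):
# reencode_h264(output_path_full, "data/output/compatible_video.mp4")
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.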

If playback fails inside the notebook, re-encode the file by running this in a terminal:
```bash
ffmpeg -i your_video.mp4 -vcodec libx264 -pix_fmt yuv420p compatible_video.mp4
```

## 🔭 Future Improvements

### 1. ✨ Model: From Feedforward to Temporal-Aware Networks

The current approach uses a **feedforward image transformation network** (`TransformerNet`) trained on static images. It is fast, but it lacks **temporal awareness**, which means:

- Each frame is stylized independently
- The output can show **flickering or inconsistency** between frames
- The result is not ideal for smooth video playback

#### ✅ Better Alternative: Temporal Loss (Video Style Transfer)

Temporal-aware models (e.g., **ReCoNet**, or networks trained with optical-flow-based losses) are trained to:

- Compare the stylized current frame with the stylized previous frame
- Minimize the difference between them with a **temporal consistency loss**
- Produce **visually smoother, more consistent output** across frames

These models are **trained differently** (on consecutive frames), so the existing `.pth` checkpoints from static NST **cannot simply be reused**; the network would need to be retrained or replaced.
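
As an illustration of the idea, here is a minimal sketch of a temporal consistency term in PyTorch. It is not ReCoNet's actual loss. It assumes the previous stylized frame has already been warped into the current frame's coordinates (for example with optical flow), and that an optional mask marks pixels where that warp is reliable. The function name and its arguments are hypothetical.

```python
import torch


def temporal_consistency_loss(stylized_curr, stylized_prev_warped, valid_mask=None):
    """Mean squared difference between consecutive stylized frames.

    stylized_curr, stylized_prev_warped: float tensors of shape (N, C, H, W).
    valid_mask: optional float tensor of shape (N, 1, H, W); 1 where the
    warped previous frame is trustworthy, 0 where it is occluded.
    """
    diff = (stylized_curr - stylized_prev_warped) ** 2
    if valid_mask is None:
        return diff.mean()
    # Average only over valid pixels (times channels); avoid division by zero
    denom = (valid_mask.sum() * diff.shape[1]).clamp(min=1.0)
    return (diff * valid_mask).sum() / denom


# Toy check: frames that differ by 1 everywhere give a loss of 1
curr = torch.ones(1, 3, 4, 4)
prev = torch.zeros(1, 3, 4, 4)
print(temporal_consistency_loss(curr, prev).item())
```

During training, a term like this would be added to the usual content and style losses with its own weight, which is why it requires consecutive video frames rather than single images.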

---

### 2. 🧠 Advanced Features: Semantic Segmentation

Adding **segmentation** allows **selective stylization**, e.g.:

- Style the background differently from foreground
- Preserve faces or moving objects
- Combine different styles per object/region

This can be done by:
- Using **pretrained semantic segmentation models** (e.g., DeepLabv3)
- Creating masks per frame
- Blending stylized and original content accordingly

The trade-off:
- More control and creative freedom
- Higher compute cost and pipeline complexity
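
The blending step from the list above can be sketched with NumPy. This assumes a per-frame mask is already available (for example from a segmentation model) as a float array in [0, 1], where 1 means "use the stylized pixel"; the helper name `blend_with_mask` is hypothetical.

```python
import numpy as np


def blend_with_mask(original, stylized, mask):
    """Blend two HxWx3 uint8 frames using an HxW float mask in [0, 1].

    mask == 1 keeps the stylized pixel, mask == 0 keeps the original,
    and values in between mix the two linearly.
    """
    m = mask.astype(np.float32)[..., None]  # add a channel axis for broadcasting
    blended = m * stylized.astype(np.float32) + (1.0 - m) * original.astype(np.float32)
    return np.clip(np.rint(blended), 0, 255).astype(np.uint8)


# Toy example: stylize only the left half of a 2x4 frame
original = np.zeros((2, 4, 3), dtype=np.uint8)
stylized = np.full((2, 4, 3), 200, dtype=np.uint8)
mask = np.zeros((2, 4), dtype=np.float32)
mask[:, :2] = 1.0
result = blend_with_mask(original, stylized, mask)
```

A soft mask (for example a blurred segmentation boundary) gives a gradual transition between stylized and original regions instead of a hard edge.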