# **Negin Heidarifard**  
**M2 in Artificial Intelligence, Paris-Saclay University**  
**Course: Computer Vision**  
**Professor: Dr. Celine Hudelot** 

---


### Project Introduction

I implemented Eulerian Video Magnification (Wu et al., 2012) as a way to explore how far subtle temporal signals can be recovered from standard video without explicit motion tracking. My initial expectation was that amplifying narrow frequency bands would cleanly expose signals like pulse-related color changes, but in practice the behavior was much more sensitive to parameter choices and noise than anticipated.

Small changes in the temporal band or amplification factor often led to visible artifacts or amplification of irrelevant motion, especially outside regions of interest. This pushed me to experiment with different filtering strategies and to separate motion amplification from color-only amplification, depending on the type of signal being targeted.

Rather than treating EVM as a plug-and-play method, this project focuses on understanding where it works reliably, where it breaks down, and which design choices are critical for keeping the results visually meaningful.
 

In [1]:
# Basic environment setup (Kaggle runtime)
# Using the default Kaggle Python image for convenience and reproducibility.

import numpy as np
import pandas as pd

# Quick check of available input files
# Mostly to verify paths before running the pipeline.

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Note:
# Outputs written to /kaggle/working/ are preserved.
# Temporary files under /kaggle/temp/ are not.


I keep a reference visualization here mainly as a sanity check while implementing the pipeline.
I did not rely on this video beyond verifying that the qualitative behavior matches expectations.


In [2]:
from IPython.display import IFrame
IFrame('https://www.youtube.com/embed/ONZcjs1Pjmk', width=700, height=350)

In [3]:
# Cell 2: Imports & Environment Setup
import cv2
import numpy as np
import scipy.fftpack
import scipy.signal
import os
import gc
from IPython.display import FileLink, display


**Explanation:**  
We import the required libraries:  
- **cv2** for image/video processing  
- **numpy** for array manipulation  
- **scipy.fftpack and scipy.signal** for FFT-based filtering  
- **os, gc** for file handling and memory management  
- **IPython.display** for creating download links.  
This cell sets up our working environment.


## Helper Functions
**Explanation:**  
These functions handle loading a video (without downsampling) and saving processed videos as AVI files.  
- **load_video_no_downsample()** reads the video frame by frame and converts the pixel values to the [0,1] range.  
- **save_video_float32_as_avi()** converts the processed float32 video back to uint8 and writes it using the XVID codec.


In [4]:
# Cell 3: Helper Functions

def load_video_no_downsample(video_filename):
    """
    Loads the full video into memory without downsampling.
    This turned out to be memory-heavy for longer videos.
    """

    if not os.path.isfile(video_filename):
        raise FileNotFoundError(f"Video not found: {video_filename}")
    cap = cv2.VideoCapture(video_filename)
    fps = cap.get(cv2.CAP_PROP_FPS)
    frames = []
    while True:
        ret, frame_bgr = cap.read()
        if not ret:
            break
        frame_f = frame_bgr.astype(np.float32) / 255.0
        frames.append(frame_f)
    cap.release()
    video_array = np.array(frames, dtype=np.float32)
    return video_array, fps

def save_video_float32_as_avi(video_data, fps, out_filename="output.avi"):
    """
    Writes output video to disk.
    """
    fourcc = cv2.VideoWriter_fourcc(*'XVID')
    h, w = video_data.shape[1:3]
    out = cv2.VideoWriter(out_filename, fourcc, fps, (w, h), True)
    for i in range(video_data.shape[0]):
        frame_uint8 = np.clip(video_data[i] * 255.0, 0, 255).astype(np.uint8)
        out.write(frame_uint8)
    out.release()
    print(f"Saved {out_filename}")
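
As a rough check on the memory cost mentioned in the `load_video_no_downsample` docstring, the footprint of the float32 array can be estimated from its shape alone. The helper name `float32_video_bytes` is hypothetical and not part of the pipeline; the shape is the one logged for *wrist.mp4* further down in this notebook.

```python
import numpy as np

def float32_video_bytes(nframes, height, width, channels=3):
    """Bytes needed to hold a (nframes, height, width, channels) float32 array."""
    return nframes * height * width * channels * np.dtype(np.float32).itemsize

# Shape taken from the wrist.mp4 log line printed by the processing loop.
wrist_bytes = float32_video_bytes(894, 352, 640)
print(f"wrist.mp4 as float32: {wrist_bytes / 2**30:.2f} GiB")
```

This counts only the input array. A Laplacian video pyramid of the same clip needs at most about 4/3 of that again (1 + 1/4 + 1/16 + ...), and the FFT step allocates further temporaries, so the peak usage is noticeably higher.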


### Eulerian Motion Magnification

This block contains the core implementation of the Eulerian pipeline.
Most of the complexity here comes from handling pyramid boundaries and keeping shapes consistent across levels, which turned out to be more error-prone than expected.
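
The shape bookkeeping can be illustrated without OpenCV. The sketch below is an assumption-laden stand-in: `down2` and `up2` are hypothetical helpers that use 2x2 block averaging and pixel repetition instead of the Gaussian kernels behind `cv2.pyrDown` and `cv2.pyrUp`. It only shows why cropping the upsampled level to the finer level's shape keeps the pyramid invertible for odd frame sizes.

```python
import numpy as np

def down2(img):
    """Halve resolution by 2x2 block averaging (stand-in for cv2.pyrDown)."""
    h, w = img.shape[:2]
    # Pad odd dimensions by edge replication so the blocks tile evenly.
    img = np.pad(img, ((0, h % 2), (0, w % 2), (0, 0)), mode="edge")
    H, W = img.shape[:2]
    return img.reshape(H // 2, 2, W // 2, 2, -1).mean(axis=(1, 3))

def up2(img, shape):
    """Double resolution by pixel repetition, then crop to the target shape."""
    up = img.repeat(2, axis=0).repeat(2, axis=1)
    return up[:shape[0], :shape[1]]

def build_laplacian(frame, levels):
    gauss = [frame]
    for _ in range(levels - 1):
        gauss.append(down2(gauss[-1]))
    # Each band is the detail lost between two consecutive resolutions.
    lap = [g - up2(g_next, g.shape) for g, g_next in zip(gauss[:-1], gauss[1:])]
    lap.append(gauss[-1])  # coarsest level keeps the low-pass residual
    return lap

def collapse_laplacian(lap):
    out = lap[-1]
    for band in reversed(lap[:-1]):
        out = band + up2(out, band.shape)
    return out

rng = np.random.default_rng(0)
frame = rng.random((45, 67, 3)).astype(np.float32)  # odd sizes on purpose
pyr = build_laplacian(frame, levels=4)
recon = collapse_laplacian(pyr)
print([lvl.shape for lvl in pyr])
print("max reconstruction error:", float(np.abs(recon - frame).max()))
```

Because the same upsampling and cropping is used when building and when collapsing, the reconstruction error stays at floating-point rounding level as long as no band is modified.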


In [6]:
# Core EVM implementation


def create_laplacian_pyramid_frame(frame, pyramid_levels=4):
    gauss_pyr = [frame]
    for _ in range(1, pyramid_levels):
        gauss_pyr.append(cv2.pyrDown(gauss_pyr[-1]))
    lap_pyr = []
    for i in range(pyramid_levels - 1):
        up = cv2.pyrUp(gauss_pyr[i+1])
        h, w = gauss_pyr[i].shape[:2]
        up = up[:h, :w]
        lap_pyr.append(gauss_pyr[i] - up)
    lap_pyr.append(gauss_pyr[-1])
    return lap_pyr

def create_laplacian_video_pyramid(video, pyramid_levels=4):
    nframes = video.shape[0]
    pyramid = None
    for i in range(nframes):
        frame_pyr = create_laplacian_pyramid_frame(video[i], pyramid_levels)
        if pyramid is None:
            pyramid = []
            for lvl in range(pyramid_levels):
                lvl_h, lvl_w = frame_pyr[lvl].shape[:2]
                pyramid.append(np.zeros((nframes, lvl_h, lvl_w, 3), dtype=np.float32))
        for lvl in range(pyramid_levels):
            pyramid[lvl][i] = frame_pyr[lvl]
    return pyramid

def collapse_laplacian_pyramid_frame(lap_pyr):
    output = lap_pyr[-1]
    for lvl in reversed(range(len(lap_pyr) - 1)):
        up = cv2.pyrUp(output)
        h, w = lap_pyr[lvl].shape[:2]
        up = up[:h, :w]
        output = lap_pyr[lvl] + up
    return output

def collapse_laplacian_video_pyramid(pyramid):
    nframes = pyramid[0].shape[0]
    collapsed_frames = []
    for i in range(nframes):
        lap_pyr_frame = [pyramid[lvl][i] for lvl in range(len(pyramid))]
        collapsed_frame = collapse_laplacian_pyramid_frame(lap_pyr_frame)
        collapsed_frames.append(collapsed_frame)
    return np.array(collapsed_frames, dtype=np.float32)

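def temporal_bandpass_filter(data, fps, freq_min, freq_max, amplification=1.0, axis=0):
    # scipy.fftpack.rfft packs each frequency as consecutive (real, imag) entries,
    # and scipy.fftpack.rfftfreq repeats each frequency accordingly.
    fft_data = scipy.fftpack.rfft(data, axis=axis)
    freqs = scipy.fftpack.rfftfreq(data.shape[axis], d=1.0/fps)
    # Snap the band edges to the nearest available FFT bins.
    f_low = freqs[np.argmin(np.abs(freqs - freq_min))]
    f_high = freqs[np.argmin(np.abs(freqs - freq_max))]
    # Masking by frequency keeps both the real and imaginary part of every bin
    # in the band, and the reshape applies the mask along `axis` (not always axis 0).
    keep = (freqs >= f_low) & (freqs <= f_high)
    mask_shape = [1] * data.ndim
    mask_shape[axis] = keep.size
    fft_data *= keep.reshape(mask_shape)
    filtered = scipy.fftpack.irfft(fft_data, axis=axis)
    filtered *= amplification
    return filtered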

def eulerian_magnification(vid_data, fps, freq_min, freq_max, amplification,
                           pyramid_levels=4, skip_levels_at_top=1):
    vid_pyr = create_laplacian_video_pyramid(vid_data, pyramid_levels)
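    for lvl in range(len(vid_pyr)):
        # Level 0 is the finest band. Skip the first `skip_levels_at_top` levels
        # (mostly pixel noise) and the coarsest level (the low-pass residual).
        if lvl < skip_levels_at_top or lvl == len(vid_pyr) - 1:
            continue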
        bandpassed = temporal_bandpass_filter(vid_pyr[lvl], fps, freq_min, freq_max, amplification, axis=0)
        vid_pyr[lvl] += bandpassed
    return collapse_laplacian_video_pyramid(vid_pyr)
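
To see what the temporal filter is meant to do in isolation, here is a minimal 1-D sketch on a synthetic signal. It uses `np.fft.rfft` with an explicit frequency mask rather than the `scipy.fftpack` routine above, so it is an illustrative alternative and not the filter used in the pipeline; `ideal_bandpass` and the signal amplitudes are assumptions chosen for the demo.

```python
import numpy as np

def ideal_bandpass(data, fps, freq_min, freq_max, axis=0):
    """Zero every temporal frequency outside [freq_min, freq_max] (in Hz)."""
    n = data.shape[axis]
    spectrum = np.fft.rfft(data, axis=axis)
    freqs = np.fft.rfftfreq(n, d=1.0 / fps)
    keep = (freqs >= freq_min) & (freqs <= freq_max)
    # Reshape the 1-D mask so it broadcasts along the chosen axis only.
    shape = [1] * data.ndim
    shape[axis] = keep.size
    spectrum = spectrum * keep.reshape(shape)
    return np.fft.irfft(spectrum, n=n, axis=axis)

fps, seconds = 30.0, 10
t = np.arange(int(fps * seconds)) / fps
pulse = 0.01 * np.sin(2 * np.pi * 1.0 * t)     # weak 1 Hz "pulse"
flicker = 0.05 * np.sin(2 * np.pi * 5.0 * t)   # stronger 5 Hz disturbance
signal = 0.5 + pulse + flicker

band = ideal_bandpass(signal, fps, 0.8, 1.2)
amplified = signal + 50 * band                 # Eulerian step: add the scaled band back
print("max |band - pulse| =", float(np.abs(band - pulse).max()))
```

The demo is deliberately idealized: both sinusoids complete a whole number of cycles in the clip, so there is no spectral leakage. On real video the band edges fall between bins and nearby frequencies leak into the pass band, which is one plausible source of the parameter sensitivity described in this notebook.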


## Eulerian Color Amplification

I switched to a color-only formulation when motion amplification proved unstable under small parameter changes.


In [8]:
def eulerian_color_amplification(vid_data, fps, freq_min, freq_max, amplification, pyramid_levels=4):
    """
    Color-only variant used when motion amplification introduced visible artifacts.
    This approach turned out to be highly sensitive to frequency bounds and amplification.
    """
    nframes, orig_h, orig_w, _ = vid_data.shape
    gauss_frames = []
    for i in range(nframes):
        frame = vid_data[i]
        for _ in range(pyramid_levels - 1):
            frame = cv2.pyrDown(frame)
        gauss_frames.append(frame)

    gauss_video = np.array(gauss_frames, dtype=np.float32)
    bandpassed = temporal_bandpass_filter(
        gauss_video, fps, freq_min, freq_max, amplification, axis=0
    )
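    # Upsample only the amplified band-passed signal and add it to the original
    # frame. Upsampling gauss_video + bandpassed instead would add a blurred copy
    # of the frame on top of the original and roughly double its brightness.
    up_frames = []
    for i in range(nframes):
        up_frame = bandpassed[i]
        for _ in range(pyramid_levels - 1):
            up_frame = cv2.pyrUp(up_frame)
        up_frame = up_frame[:orig_h, :orig_w]
        amplified_frame = np.clip(vid_data[i] + up_frame, 0, 1)
        up_frames.append(amplified_frame)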

    return np.array(up_frames, dtype=np.float32)


### Parameter Search

In practice, the behavior of Eulerian magnification was highly sensitive to frequency bounds and amplification strength. Small changes often led to either no visible signal or strong artifacts, making manual tuning unreliable.

To reduce trial-and-error, I implemented a simple grid search over a narrow parameter range. The objective used here is intentionally crude and only meant to provide a relative comparison between settings, not a robust physiological estimate.

This utility helped narrow down reasonable parameter ranges, but the results were not always stable across videos or regions of interest.
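
One likely reason for the instability is frequency resolution: an FFT over `nframes` frames at `fps` has bins spaced `fps / nframes` apart, and the filter snaps each band edge to the nearest bin. The sketch below is a hedged illustration using `np.fft.rfftfreq` as a stand-in for the `scipy.fftpack.rfftfreq` call in the pipeline; the helper names are hypothetical, and the clip lengths are the ones logged in this notebook, with fps rounded to 30.

```python
import numpy as np

def frequency_resolution(nframes, fps):
    """Spacing in Hz between adjacent FFT bins for a clip of nframes at fps."""
    return fps / nframes

def nearest_bin_frequency(freq, nframes, fps):
    """Frequency of the FFT bin closest to `freq`."""
    freqs = np.fft.rfftfreq(nframes, d=1.0 / fps)
    return float(freqs[np.argmin(np.abs(freqs - freq))])

# Clip lengths taken from shapes logged in this notebook; fps rounded to 30.
for name, nframes in [("wrist.mp4", 894), ("face3.mp4", 300)]:
    df = frequency_resolution(nframes, 30.0)
    snapped = [round(nearest_bin_frequency(f, nframes, 30.0), 3) for f in (0.7, 0.75, 0.8)]
    print(f"{name}: bin spacing {df:.4f} Hz, grid 0.7/0.75/0.8 snaps to {snapped}")
```

For a 300-frame clip at 30 fps the bins are 0.1 Hz apart, so grid steps of 0.05 Hz can map to the same bin and give identical filters. Longer clips resolve the grid better.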


In [10]:
# Parameter search (heuristic)

def compute_objective(vid_data, fps, freq_min, freq_max, amplification, roi=(0, 50, 0, 50)):
    """
    Crude objective for comparing parameter settings.
    This is not a reliable physiological metric and is sensitive to ROI choice and noise.
    """
    # Scores the motion pipeline (eulerian_magnification, defined above), even for
    # videos that are later processed with color-only amplification
    processed = eulerian_magnification(
        vid_data,
        fps,
        freq_min,
        freq_max,
        amplification,
        pyramid_levels=5,
        skip_levels_at_top=1
    )

    # Aggregate signal inside the ROI over time
    x1, x2, y1, y2 = roi
    roi_signal = processed[:, y1:y2, x1:x2, :].sum(axis=(1, 2, 3))

    # Frequency-domain analysis of the aggregated signal
    fft_vals = np.abs(np.fft.fft(roi_signal))
    freqs = np.fft.fftfreq(len(roi_signal), d=1.0 / fps)

    # Keep only positive frequencies
    pos_mask = freqs > 0
    fft_vals = fft_vals[pos_mask]
    freqs = freqs[pos_mask]

    # Peak spectral magnitude in a fixed target band (0.7-1.2 Hz),
    # independent of the freq_min/freq_max being evaluated
    band_mask = (freqs >= 0.7) & (freqs <= 1.2)
    if not np.any(band_mask):
        return 0.0

    return np.max(fft_vals[band_mask])


def grid_search_eulerian_params(vid_data, fps, roi=(0, 50, 0, 50)):
    """
    Simple brute-force search over a small parameter grid.
    Used to narrow down unstable regions of the parameter space.
    """
    freq_min_vals = [0.7, 0.75, 0.8]
    freq_max_vals = [1.0, 1.05, 1.1]
    amp_vals = [70, 80, 90]

    best_score = -float("inf")
    best_params = None

    for fmin in freq_min_vals:
        for fmax in freq_max_vals:
            if fmin >= fmax:
                continue
            for amp in amp_vals:
                score = compute_objective(vid_data, fps, fmin, fmax, amp, roi)
                if score > best_score:
                    best_score = score
                    best_params = (fmin, fmax, amp)

    return best_params, best_score


### Testing and Observations

I evaluated the pipeline on a small set of short videos with different signal characteristics (infant breathing, facial color changes, wrist pulse). In practice, the behavior varied significantly across videos, and a single parameter setting did not generalize well.

For videos dominated by subtle color variations, the color-only amplification was more stable than full motion amplification, which often introduced visible artifacts. Parameter values reported below were chosen empirically after several failed or unstable runs, and should not be interpreted as universally optimal.

The goal of this section is to qualitatively assess when the method produces interpretable results and when it breaks down, rather than to report quantitative performance.


In [20]:
# Testing on a small set of videos with empirically chosen parameters

# Base path for local testing (Kaggle dataset)
base_path = "/kaggle/input/cv-eulerian-videos"

# Per-video parameters tuned empirically.
# Color-only amplification is used where motion amplification was unstable.
videos = {
    "baby": {
        "path": os.path.join(base_path, "baby.mp4"),
        "params": {
            "freq_min": 0.8,
            "freq_max": 2.0,
            "amplification": 50,
            "pyramid_levels": 4,
            "skip_levels_at_top": 1
        },
        "color_amp": False
    },
    "baby2": {
        "path": os.path.join(base_path, "baby2.mp4"),
        "params": {
            "freq_min": 2.0,
            "freq_max": 2.5,
            "amplification": 100,
            "pyramid_levels": 4
        },
        "color_amp": True
    },
    "face": {
        "path": os.path.join(base_path, "face.mp4"),
        "params": {
            "freq_min": 0.8,
            "freq_max": 1.0,
            "amplification": 80,
            "pyramid_levels": 5
        },
        "use_grid_search": True,
        "roi": (50, 150, 40, 120),
        "color_amp": True
    },
    "wrist": {
        "path": os.path.join(base_path, "wrist.mp4"),
        "params": {
            "freq_min": 0.4,
            "freq_max": 3.0,
            "amplification": 15,
            "pyramid_levels": 4,
            "skip_levels_at_top": 1
        },
        "color_amp": False
    }
}

# Process each video sequentially and save the result
for vid_name, vid_info in videos.items():
    print(f"\nProcessing {vid_name} ...")
    video_path = vid_info["path"]

    if not os.path.isfile(video_path):
        print(f"File not found: {video_path}")
        continue

    try:
        vid_data, fps = load_video_no_downsample(video_path)
        print(f"{vid_name} loaded: shape={vid_data.shape}, fps={fps}")
    except Exception as e:
        print(f"Error loading {vid_name}: {e}")
        continue

    params = vid_info["params"]

    if vid_info.get("use_grid_search", False):
        # Grid search used only to narrow down unstable parameter regions
        from IPython.display import clear_output
        print("Running grid search ...")
        best_params, best_score = grid_search_eulerian_params(
            vid_data,
            fps,
            roi=vid_info.get("roi", (0, 50, 0, 50))
        )
        if best_params:
            params["freq_min"], params["freq_max"], params["amplification"] = best_params
        clear_output(wait=True)

    if vid_info.get("color_amp", False):
        # Color-only amplification (motion amplification caused artifacts here)
        magnified = eulerian_color_amplification(
            vid_data,
            fps,
            freq_min=params["freq_min"],
            freq_max=params["freq_max"],
            amplification=params["amplification"],
            pyramid_levels=params["pyramid_levels"]
        )
    else:
        # Standard Eulerian motion magnification
        magnified = eulerian_magnification(
            vid_data,
            fps,
            freq_min=params["freq_min"],
            freq_max=params["freq_max"],
            amplification=params["amplification"],
            pyramid_levels=params["pyramid_levels"],
            skip_levels_at_top=params.get("skip_levels_at_top", 1)
        )

    out_filename = f"magnified_{vid_name}.avi"
    save_video_float32_as_avi(magnified, fps, out_filename)

    # Explicit cleanup to limit memory growth across runs
    del vid_data, magnified
    gc.collect()

print("\nAll processing completed.")


Saved magnified_face.avi

Processing wrist ...
wrist loaded: shape=(894, 352, 640, 3), fps=30.0
Saved magnified_wrist.avi

All processing completed.


In [21]:
# Access processed outputs for qualitative inspection

# Explicit list of generated result files
output_files = [
    "magnified_baby.avi",
    "magnified_baby2.avi",
    "magnified_face.avi",
    "magnified_wrist.avi"
]

# Provide direct links to the generated videos
for file in output_files:
    display(FileLink(file))


## Analysis of *baby.mp4*

When I started working with *baby.mp4*, my main question was whether Eulerian motion magnification could make the infant’s breathing visible without damaging the overall visual quality of the video. In the raw footage, the motion is extremely subtle; at first glance, the breathing is almost imperceptible unless you know exactly where to look.

I initially had some doubts about the parameter choices, especially the amplification factor. An amplification of 50 felt relatively high, and I expected it might introduce visible artifacts or edge distortions, particularly around the baby’s body. To mitigate this, I decided to skip the finest pyramid level, since early experiments showed that this level tended to amplify noise rather than meaningful motion.

After applying the magnification with a frequency band between 0.8 Hz and 2.0 Hz, the breathing pattern became noticeably easier to perceive. The motion is still subtle, but it is now clearly distinguishable from background noise. Importantly, the image remains visually stable: I did not observe obvious halos, flickering, or spatial distortions, which suggests that this parameter combination strikes a reasonable balance between amplification strength and visual fidelity.

One thing I noticed is that the result is highly dependent on the infant remaining relatively still. The temporal consistency of the amplified motion looks good, and the breathing appears smooth, which indicates that the temporal filtering behaves as intended. However, I expect that larger global movements would quickly dominate the signal and reduce the usefulness of this setup.

Overall, this experiment shows that Eulerian magnification can enhance subtle physiological motion in infant videos in a qualitative sense. With the chosen parameters, the breathing becomes more visible while preserving a natural appearance, making the output potentially useful for non-intrusive visual monitoring rather than precise measurement.


## Analysis of *wrist.mp4*

For the wrist video, the main question was whether Eulerian motion magnification could make the pulse-related motion visible without introducing distracting artifacts. In the original *wrist.mp4*, the wrist appears mostly static, and any pulsating motion is extremely subtle and easy to miss unless closely inspected.

Before processing, I was cautious about the amplification strength. Unlike the infant video, wrist motion is more localized and sensitive to noise, so I deliberately chose a lower amplification factor. An amplification of 15 felt like a safer starting point, as early trials with higher values tended to exaggerate small lighting variations rather than the pulse itself. Skipping the finest pyramid level (`skip_levels_at_top=1`, i.e. level 0 of the Laplacian pyramid) was again helpful in preventing high-frequency noise from dominating the result.

After applying Eulerian magnification with a frequency range between 0.4 Hz and 3.0 Hz, the pulsating motion in the wrist became more noticeable. The enhancement remains subtle, but the rhythmic pattern is clearer and easier to follow over time compared to the original video. This suggests that the selected frequency band captures the intended physiological signal reasonably well.

In terms of visual quality, the processed video stays stable. I did not observe strong artifacts or visible degradation, which indicates that the chosen parameters are appropriate for this type of motion. The amplified signal is temporally smooth, and the pulse appears consistent across frames, suggesting that the temporal filtering is effectively isolating the relevant frequency content.

Overall, the enhanced video preserves a natural appearance while making the wrist pulse easier to observe. This makes the output suitable for qualitative inspection and exploratory analysis, particularly in contexts where non-intrusive visualization of physiological signals is desirable.


## Analysis of *face.mp4*

When working with *face.mp4*, my main objective was to see whether Eulerian Video Magnification could make very subtle facial color variations visible, particularly those that might be linked to physiological signals such as heart rate or blood flow. In the original video, the face appears visually stable, with only minimal and barely perceptible color fluctuations across frames.

Early on, I decided to focus exclusively on color amplification rather than motion. Previous experiments showed that applying motion magnification to facial videos often introduces distracting artifacts, especially around edges and expressions, which can easily dominate the signal of interest. For this reason, the processing was restricted to color changes only.

The frequency band was initially set between 0.8 Hz and 1.0 Hz (48–60 beats per minute), roughly matching the low end of an adult resting heart rate. I chose an amplification factor of 80 somewhat cautiously: lower values tended to make the effect almost invisible, while higher values quickly led to exaggerated color shifts that no longer looked natural. Because `use_grid_search` is enabled for this video, these values act as starting points that the grid search replaces with its best-scoring combination. Using a five-level pyramid helped stabilize the reconstruction. Restricting the analysis (here, the grid-search objective) to a specific region of interest on the face, `(50, 150, 40, 120)` given as `(x1, x2, y1, y2)`, made the color variations easier to interpret by avoiding irrelevant background regions.

After processing, subtle color fluctuations become more apparent in the selected facial area. These variations are not dramatic, but they are noticeably more visible than in the original video and appear temporally consistent. Importantly, the overall facial structure and expressions remain intact, and I did not observe strong artifacts or distortions that would compromise realism.

The amplified signal looks smooth over time, suggesting that the temporal filtering is isolating the intended frequency band reasonably well. While the result is still qualitative, the enhanced visibility of these color changes suggests potential for non-contact monitoring of physiological signals.

Overall, *magnified_face.avi* demonstrates that color-based Eulerian magnification can highlight subtle facial color variations while preserving a natural appearance. This makes the approach promising for exploratory applications in non-invasive health monitoring, where visual interpretability is more important than precise numerical measurement.


## Analysis of *baby2.mp4*

For *baby2.mp4*, the focus shifted toward detecting subtle color variations rather than motion, with the assumption that these variations could be linked to physiological signals such as pulse in a newborn. In the original video, these changes are extremely faint and not easily distinguishable by visual inspection alone.

Given the higher heart rate typically observed in newborns, I selected a relatively narrow frequency band between 2.0 Hz and 2.5 Hz. This choice felt slightly aggressive at first, especially when combined with a high amplification factor. An amplification of 100 raised concerns about potential color saturation or unstable visual artifacts, but lower values tended to make the signal almost invisible. Using a four-level pyramid was a compromise between preserving spatial detail and keeping the processing stable.
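
The saturation concern can be checked numerically before `np.clip` is applied. The following is a small sketch on synthetic data, not on *baby2.mp4*: `clipped_fraction` is a hypothetical helper, and the ripple amplitude of 0.002 and the 2.2 Hz frequency are assumptions picked only to make the effect visible.

```python
import numpy as np

def clipped_fraction(unclipped_video, low=0.0, high=1.0):
    """Fraction of values that fall outside [low, high] before clipping."""
    out_of_range = (unclipped_video < low) | (unclipped_video > high)
    return float(out_of_range.mean())

# Synthetic stand-in: mid-gray frames plus a 2.2 Hz color ripple of amplitude 0.002.
fps, nframes = 30.0, 90
t = np.arange(nframes) / fps
ripple = 0.002 * np.sin(2 * np.pi * 2.2 * t)[:, None, None, None]
frames = np.full((nframes, 4, 4, 3), 0.5, dtype=np.float32)

for amp in (50, 100, 400):
    # Original signal (frames + ripple) plus the amplified ripple.
    amplified = frames + (1 + amp) * ripple
    print(amp, clipped_fraction(amplified))
```

Running the same measurement on the unclipped output of the color pipeline would show how much of the frame is pushed out of range at a given amplification. The acceptable threshold is a judgment call.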

To avoid amplifying motion-related artifacts, I relied exclusively on color-only amplification. Each frame was downsampled to obtain a coarse color representation, temporally filtered within the target frequency band, and then reconstructed back to the original resolution. This approach proved more robust than full motion amplification for this particular video.

After processing, rhythmic color variations become noticeably more visible, particularly in regions corresponding to the newborn’s face. These variations are barely perceptible in the original footage but emerge clearly in the processed output. Despite the relatively high amplification factor, the visual quality remains acceptable, and I did not observe strong artifacts or distracting distortions.

The amplified color signal appears smooth and temporally consistent, which suggests that the temporal bandpass filtering successfully isolates the intended frequency components. While the result remains qualitative, the enhanced visibility of these subtle color changes indicates that this parameter configuration is effective for exploratory analysis.

Overall, *magnified_baby2.avi* demonstrates that color-based Eulerian magnification can reveal physiological color signals in newborn videos when appropriate frequency ranges and amplification levels are used. The method preserves a natural visual appearance while making otherwise imperceptible signals easier to observe, supporting its potential use in non-invasive health monitoring contexts.


# Part 2 – My Own Dataset  
## Testing Eulerian Color Amplification on My Own Video

### Subtle Color Changes Detection with Optimized Parameters

---

### Objective

In this part, I applied **Eulerian Color Amplification (ECA)** to my own recording (*face3.mp4*) with the goal of visualizing **very subtle facial color variations** that are not noticeable in the raw video.  
The main difficulty here was finding a parameter set that enhances physiological signals clearly, without pushing the amplification to the point where the face looks unnatural or noisy.

---

### Methodology Overview

The processing pipeline follows the standard ECA framework, with a few practical choices motivated by empirical testing rather than theory alone:

1. **Preprocessing**

   - The video was first **downsampled to 640×360**, mainly to reduce memory usage and make experimentation faster.
   - A light **Gaussian denoising** step was added, since early tests showed that high-frequency noise tends to get amplified together with the signal.
   - The sequence was limited to **300 frames**, which turned out to be sufficient to capture periodic physiological variations while keeping the computation manageable.

2. **Temporal Bandpass Filtering**

   - A frequency band of **0.7–1.2 Hz** (42–72 beats per minute) was selected, covering the lower part of the typical adult resting heart-rate range.
   - An **amplification factor of 30** was chosen after testing higher values, which produced stronger signals but also introduced visible color artifacts.

3. **Color-Only Amplification**

   - A **5-level pyramid decomposition** was used to capture subtle color changes at an appropriate spatial scale.
   - During reconstruction, **cubic interpolation** was applied to keep the upsampling visually smooth and stable.

4. **Reconstruction and Saving**

   - The amplified color signal was added back to the original frames.
   - The final result was saved in **AVI format using the XVID codec**, ensuring smooth playback and reasonable file size.
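
The four steps above can be condensed into a NumPy-only sketch. This is a simplified illustration under stated assumptions, not the notebook's implementation: block averaging (`block_mean`) replaces the `cv2.pyrDown` chain, nearest-neighbour repetition replaces cubic interpolation, the denoising step is omitted, and all function names are hypothetical. A `factor` of 16 would correspond to four halvings, i.e. a 5-level pyramid.

```python
import numpy as np

def block_mean(video, factor):
    """Spatially downsample (T, H, W, C) by averaging factor x factor blocks."""
    t, h, w, c = video.shape
    h2, w2 = h // factor, w // factor
    cropped = video[:, :h2 * factor, :w2 * factor]
    return cropped.reshape(t, h2, factor, w2, factor, c).mean(axis=(2, 4))

def bandpass(video, fps, freq_min, freq_max):
    """Ideal temporal bandpass along axis 0."""
    spectrum = np.fft.rfft(video, axis=0)
    freqs = np.fft.rfftfreq(video.shape[0], d=1.0 / fps)
    keep = (freqs >= freq_min) & (freqs <= freq_max)
    spectrum[~keep] = 0
    return np.fft.irfft(spectrum, n=video.shape[0], axis=0)

def color_amplify(video, fps, freq_min, freq_max, amplification, factor=16):
    coarse = block_mean(video, factor)
    delta = amplification * bandpass(coarse, fps, freq_min, freq_max)
    # Nearest-neighbour upsampling, then edge-pad if H or W is not a
    # multiple of `factor`.
    up = delta.repeat(factor, axis=1).repeat(factor, axis=2)
    pad_h = video.shape[1] - up.shape[1]
    pad_w = video.shape[2] - up.shape[2]
    up = np.pad(up, ((0, 0), (0, pad_h), (0, pad_w), (0, 0)), mode="edge")
    # Only the amplified band is added back to the original frames.
    return np.clip(video + up, 0.0, 1.0)

# Synthetic clip: flat gray with a faint 1 Hz color oscillation.
fps, nframes = 30.0, 60
t = np.arange(nframes) / fps
pulse = 0.001 * np.sin(2 * np.pi * 1.0 * t)[:, None, None, None]
video = np.full((nframes, 38, 66, 3), 0.5) + pulse
out = color_amplify(video, fps, 0.7, 1.2, amplification=30, factor=4)
print(out.shape, float((out - 0.5).max()))
```

On this idealized input the oscillation in the output is 31 times the input amplitude (the original plus 30 times the band-passed copy), which is the behavior the amplification factor is meant to control.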

---

### Key Parameters and Rationale

- **Frequency Range (0.7–1.2 Hz):** Chosen to target physiological color variations related to blood flow.
- **Amplification Factor (30):** A compromise between visibility of the signal and visual realism.
- **Pyramid Levels (5):** Four `pyrDown` steps reduce each frame to roughly 1/16 of its width and height before filtering, which averages out pixel-level noise and keeps computation low.
- **Gaussian Denoising:** Helps prevent the amplification of irrelevant noise components.
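
For reference, the frequency bands used throughout this notebook translate into rates per minute as follows. The conversion is just a factor of 60; `hz_to_bpm` is a hypothetical helper and the dictionary simply restates the bands configured above.

```python
def hz_to_bpm(freq_hz):
    """Convert a frequency in Hz to beats (or breaths) per minute."""
    return freq_hz * 60.0

# Bands used in this notebook, as (freq_min, freq_max) in Hz.
bands = {
    "face3 (Part 2)": (0.7, 1.2),
    "face": (0.8, 1.0),
    "baby2": (2.0, 2.5),
    "baby": (0.8, 2.0),
    "wrist": (0.4, 3.0),
}
for name, (lo, hi) in bands.items():
    print(f"{name}: {lo}-{hi} Hz = {hz_to_bpm(lo):.0f}-{hz_to_bpm(hi):.0f} per minute")
```

Seeing the bands in these units makes it easier to judge whether a band plausibly contains the signal of interest for a given subject.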

---

### Output Details

- **Output File:** `magnified_face3_lower_amp.avi`  
- **Video Codec:** XVID  
- **Resolution:** 640×360  

Overall, the output highlights **subtle facial color changes** while preserving a **natural appearance** and avoiding excessive noise, which is crucial for any realistic monitoring scenario.

---

### Conclusion

This experiment shows that **Eulerian Color Amplification**, when carefully tuned, can successfully reveal subtle facial color variations in a real, unconstrained video.  
The chosen parameters produce a stable and visually coherent result, demonstrating the practical usefulness of ECA for **non-invasive physiological signal visualization**.


In [24]:
# =======================================================
# Final Experiment: Subtle Color Amplification on "face3.mp4"
# (Carefully tuned for realism rather than strong amplification)
# =======================================================

# Step 1: Load video with practical constraints in mind
# - Downsampling for memory efficiency
# - Mild denoising to avoid amplifying high-frequency noise
# - Frame limiting to keep computation reasonable
# -------------------------------------------------------
def load_video_downsampled_denoise(video_path, max_frames=300, width=640, height=360):
    if not os.path.isfile(video_path):
        raise FileNotFoundError(f"Video not found: {video_path}")

    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)

    frames = []
    count = 0
    while True:
        ret, frame = cap.read()
        if not ret or count >= max_frames:
            break

        # Resize early to reduce memory usage
        frame = cv2.resize(frame, (width, height))

        # Light Gaussian blur: enough to suppress sensor noise,
        # but not strong enough to wash out color information
        frame = cv2.GaussianBlur(frame, (3, 3), 0)

        frames.append(frame.astype(np.float32) / 255.0)
        count += 1

    cap.release()
    return np.array(frames, dtype=np.float32), fps


# -------------------------------------------------------
# Step 2: Temporal bandpass filtering
# Isolates frequency components related to physiological signals
# -------------------------------------------------------
def temporal_bandpass_filter(data, fps, freq_min, freq_max, amplification=1.0):
    fft_data = scipy.fftpack.rfft(data, axis=0)
    freqs = scipy.fftpack.rfftfreq(data.shape[0], d=1.0 / fps)

    # Snap the band edges to the nearest available FFT bins
    f_low = freqs[np.argmin(np.abs(freqs - freq_min))]
    f_high = freqs[np.argmin(np.abs(freqs - freq_max))]

    # scipy.fftpack.rfft stores each bin as consecutive (real, imag) entries, so
    # mask by frequency to keep both parts of every bin inside the band
    keep = (freqs >= f_low) & (freqs <= f_high)
    fft_data[~keep] = 0

    return scipy.fftpack.irfft(fft_data, axis=0) * amplification


# -------------------------------------------------------
# Step 3: Eulerian Color Amplification (low-amplitude version)
# This variant focuses on subtle color changes rather than motion
# -------------------------------------------------------
def eulerian_color_amplification_improved(
    vid, fps, freq_min, freq_max, amplification, pyramid_levels=5
):
    nframes, orig_h, orig_w, _ = vid.shape

    # Build a coarse representation using repeated downsampling
    coarse_frames = []
    for i in range(nframes):
        frame = vid[i]
        for _ in range(pyramid_levels - 1):
            frame = cv2.pyrDown(frame)
        coarse_frames.append(frame)

    coarse_video = np.array(coarse_frames, dtype=np.float32)

    # Apply temporal filtering on the coarse signal. Only the amplified band is
    # kept here: adding coarse_video as well would put a blurred copy of each
    # frame on top of the original and roughly double its brightness.
    filtered = temporal_bandpass_filter(
        coarse_video, fps, freq_min, freq_max, amplification
    )

    # Upsample the amplified signal to the original resolution and add it to the input
    up_frames = []
    for i in range(nframes):
        up_frame = cv2.resize(
            filtered[i], (orig_w, orig_h), interpolation=cv2.INTER_CUBIC
        )
        combined = np.clip(vid[i] + up_frame, 0, 1)
        up_frames.append(combined)

    return np.array(up_frames, dtype=np.float32)


# -------------------------------------------------------
# Step 4: Save the processed video to disk
# -------------------------------------------------------
def save_video(video, fps, filename="output.avi"):
    fourcc = cv2.VideoWriter_fourcc(*"XVID")
    h, w = video.shape[1:3]
    out = cv2.VideoWriter(filename, fourcc, fps, (w, h), True)

    for frame in video:
        out.write(np.clip(frame * 255.0, 0, 255).astype(np.uint8))

    out.release()
    print(f"Saved {filename}")


# -------------------------------------------------------
# Step 5: Run the full pipeline on face3.mp4
# Parameters are intentionally conservative to preserve realism
# -------------------------------------------------------

video_path = "/kaggle/input/cv-eulerian-videos/face3.mp4"

vid_data, fps = load_video_downsampled_denoise(
    video_path, max_frames=300, width=640, height=360
)

print(f"Loaded face3.mp4: shape={vid_data.shape}, fps={fps:.2f}")

freq_min, freq_max = 0.7, 1.2      # Typical adult heart-rate band
amplification = 30                 # Lower amplification to avoid artifacts
pyramid_levels = 5

result = eulerian_color_amplification_improved(
    vid_data, fps, freq_min, freq_max, amplification, pyramid_levels
)

save_video(result, fps, "magnified_face3_lower_amp.avi")

del vid_data, result
gc.collect()


# -------------------------------------------------------
# Step 6: Provide a download link for the final output
# -------------------------------------------------------
display(FileLink("magnified_face3_lower_amp.avi"))


Loaded face3.mp4: shape=(300, 360, 640, 3), fps=30.07
Saved magnified_face3_lower_amp.avi


## Comparative Analysis of Original and Amplified Face Video

### Context and Goal  
I applied Eulerian Color Amplification to *face3.mp4* with the intention of checking whether very subtle, physiology-related color variations could be made visible without harming the visual realism of the video. The goal was deliberately conservative: instead of pushing the amplification aggressively, I wanted to see if a **low-amplitude, carefully tuned setup** could still reveal meaningful signals while keeping noise and artifacts under control.

The comparison focuses on how the processed output (*magnified_face3_lower_amp.avi*) behaves visually relative to the original video, rather than on a quantitative metric.

---

### How the Comparison Was Performed  
- The **original video (*face3.mp4*)** was used as a baseline to assess lighting stability, natural skin tones, and overall image quality.  
- The **processed video (*magnified_face3_lower_amp.avi*)** was then examined to see whether subtle color fluctuations, invisible in the raw footage, became perceptible after amplification.

---

### Observations and Findings  

#### Visual Quality  
In the original video, lighting conditions are stable and skin tones appear natural, with no obvious color flicker. After amplification, the overall resolution and sharpness are preserved. I initially expected that even a modest amplification might introduce visible noise or pixel-level artifacts, but in practice the image remains clean and visually stable.

#### Detection of Subtle Color Changes  
The processed video reveals faint but coherent color variations across the facial region. These changes are subtle enough that they do not distract from the original appearance, yet they are clearly more noticeable than in the unprocessed video. Without amplification, these variations are essentially imperceptible, which suggests that the chosen parameters are effectively targeting signals that were present but hidden.

#### Noise and Artifacts  
One concern was that amplifying color signals would also amplify sensor noise or compression artifacts. This turned out to be less problematic than anticipated. Noise levels remain low, and there are no obvious ringing effects or color bleeding. The mild Gaussian denoising applied during preprocessing likely played an important role in stabilizing the result.

#### Temporal Consistency  
The amplified color changes evolve smoothly over time, without abrupt jumps or temporal instability. This suggests that the temporal bandpass filter was well aligned with the target frequency range. Importantly, the rhythm of the color variations appears consistent with expected physiological behavior, which makes the result more convincing from a qualitative perspective.

---

### Conclusion  
The processed video *magnified_face3_lower_amp.avi* demonstrates that Eulerian Color Amplification can enhance subtle physiological color variations while preserving the natural appearance of the face. The combination of a narrow frequency band, moderate amplification, and multi-scale processing appears to be a good compromise between visibility and realism.

Rather than dramatically altering the video, this configuration reveals information that was already present but visually inaccessible, which aligns well with the original motivation behind Eulerian approaches.

---

### Limitations and Next Steps  
This analysis is qualitative and relies on visual inspection, which means it cannot confirm whether the observed color variations correspond exactly to physiological measurements. Future work could include controlled experiments under different lighting conditions, comparisons across subjects, or validation against ground-truth signals such as heart rate sensors.
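
As a first step toward such a validation, the dominant frequency of the spatially averaged face signal could be compared with a reference heart rate. The sketch below only illustrates the idea on a synthetic trace: `estimate_bpm` and its parameters are hypothetical names and are not part of the pipeline above.

```python
import numpy as np

def estimate_bpm(trace, fps, freq_min=0.7, freq_max=1.2):
    """Return the dominant frequency inside a band, in beats per minute."""
    trace = np.asarray(trace, dtype=np.float64)
    trace = trace - trace.mean()                      # remove the constant offset
    spectrum = np.abs(np.fft.rfft(trace))
    freqs = np.fft.rfftfreq(trace.size, d=1.0 / fps)
    in_band = (freqs >= freq_min) & (freqs <= freq_max)
    peak_freq = freqs[in_band][np.argmax(spectrum[in_band])]
    return 60.0 * peak_freq

# Synthetic stand-in for the mean color value of the face region over time.
fps = 30.0
t = np.arange(300) / fps
synthetic_trace = 0.5 + 0.01 * np.sin(2 * np.pi * 1.0 * t)

bpm = estimate_bpm(synthetic_trace, fps)
print(f"Estimated rate: {bpm:.1f} bpm")
```

On real footage the trace would come from averaging the pixels inside the face region of each frame, and the estimate would only be meaningful next to a sensor reading recorded at the same time.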

From an application perspective, exploring real-time implementations or adaptive parameter tuning could further improve robustness in less controlled settings.


## Extended Eulerian Video Magnification: From a Basic Pipeline to a Paper-Aligned Implementation

When I first implemented Eulerian Video Magnification (EVM), the goal was mainly to get a working pipeline and understand the core idea: amplifying subtle temporal variations without explicitly estimating motion. As the experiments progressed, it became clear that the initial implementation, while functional, was far from the level of robustness and visual quality described in the original work by Wu et al. (2012).

To address this gap, the pipeline was gradually refined by incorporating additional components inspired directly by the paper:

> H.-Y. Wu, M. Rubinstein, E. Shih, J. Guttag, F. Durand, and W. Freeman,  
> *Eulerian Video Magnification for Revealing Subtle Changes in the World*,  
> ACM Transactions on Graphics (Proc. SIGGRAPH), 2012.

What follows is not just a feature comparison, but a reflection on how and why each step evolved from a minimal implementation toward something closer to the approach proposed in the paper.

---

## Early Implementation vs. Expanded Approach

### Initial Version

The earliest version of the pipeline focused on simplicity. Video frames were read, normalized, temporally filtered, and written back with minimal preprocessing. All processing was performed directly in the RGB color space, which worked reasonably well for small amplification factors but quickly led to color distortions when amplification increased.

Magnification was applied uniformly over the entire frame, which meant that background regions and irrelevant motion were amplified together with the signal of interest. A Laplacian pyramid was used, but with a fixed number of levels and without any adaptive strategy to control amplification across spatial scales. Temporal filtering was also fairly naive, relying on basic bandpass filtering with limited control over phase behavior.

While this version was useful for understanding the core mechanism, it became clear during testing that it was fragile and prone to noise and artifacts.

---

### Expanded, Paper-Inspired Version

The refined implementation incorporates several design choices that are directly motivated by Wu et al. (2012).

Instead of working purely in RGB, the pipeline converts frames to the YIQ color space, separating luminance from chrominance. This change alone significantly improves color stability when amplifying subtle signals, as chrominance channels can be amplified without distorting overall brightness.

Face detection using Haar Cascades was introduced to restrict amplification to the facial region. This decision was driven by practical observation: amplifying the entire frame often emphasized background noise more than the physiological signal itself. By focusing only on the face, the signal-to-noise ratio improves noticeably.

Spatial processing was also refined through a multiscale Laplacian pyramid. Rather than treating all spatial frequencies equally, each pyramid level is filtered and amplified independently, following the coarse-to-fine strategy described in the paper. Temporal filtering was upgraded to a zero-phase Butterworth bandpass filter, which provides more precise frequency isolation and avoids phase distortions that were visible in earlier experiments.

Finally, amplification is no longer uniform across pyramid levels. Instead, it is attenuated at higher spatial frequencies according to the paper’s formulation, which helps prevent the introduction of high-frequency artifacts during reconstruction.

---

## How the Pipeline Evolved, Step by Step

Rather than a complete rewrite, the implementation evolved incrementally:

- **Imports and setup** expanded to include scientific filtering tools and more robust error handling.
- **Video I/O** was improved with consistent normalization, shape checks, and flexible output codecs.
- **Color space handling** was added to decouple luminance and chrominance and reduce clipping.
- **Face masking** was introduced to localize amplification and suppress background flicker.
- **Laplacian pyramids** were extended to multiple levels, enabling scale-specific processing.
- **Temporal filtering** moved from simple bandpass logic to zero-phase Butterworth filters.
- **Mask resizing** ensured that the region of interest remained consistent across pyramid levels.
- **Adaptive amplification** scaled the amplification factor by spatial level, following the paper’s derivation.

Each of these changes was motivated by issues observed during testing rather than by theoretical completeness alone.

---

## Key Concepts Borrowed from the Paper

Several ideas from Wu et al. (2012) proved particularly important in practice. The first-order motion approximation explains why small variations can be magnified without explicit optical flow. Careful frequency range selection makes it possible to target specific physiological signals such as heartbeat or breathing. Masking and multiscale analysis help suppress noise and avoid artifacts, especially at higher spatial frequencies.
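
For reference, the first-order argument can be sketched in one dimension, following the derivation in Wu et al. (2012). Here \(f\) is the image profile, \(\delta(t)\) a small displacement, \(B(x,t)\) the temporally band-passed signal, and \(\alpha\) the amplification factor; the sketch assumes that the temporal filter passes all of \(\delta(t)\).

\[
I(x,t) = f\bigl(x + \delta(t)\bigr) \approx f(x) + \delta(t)\,\frac{\partial f(x)}{\partial x}
\]

\[
B(x,t) = \delta(t)\,\frac{\partial f(x)}{\partial x}
\]

\[
\tilde{I}(x,t) = I(x,t) + \alpha\,B(x,t) \approx f(x) + (1+\alpha)\,\delta(t)\,\frac{\partial f(x)}{\partial x} \approx f\bigl(x + (1+\alpha)\,\delta(t)\bigr)
\]

Adding the amplified band-passed signal therefore behaves like a displacement magnified by \((1+\alpha)\), as long as the Taylor expansion remains a good approximation.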

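In particular, the constraint  
\[
(1 + \alpha)\,\delta(t) < \lambda / 8
\]  
where \(\delta(t)\) is the displacement being magnified, \(\lambda\) the spatial wavelength of the image structure, and \(\alpha\) the amplification factor, provides a useful guideline for controlling amplification. Because the admissible \(\alpha\) shrinks with \(\lambda\), fine spatial detail tolerates less amplification than coarse structure, which is what the adaptive scaling strategy in the final implementation reflects.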
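
To get a feeling for the numbers, the bound can be solved for the amplification factor: \(\alpha < \lambda / (8\,\delta) - 1\). The snippet below is a small illustration with made-up values; `max_amplification` is a hypothetical helper, and the 0.1-pixel displacement is an assumption chosen only for the example.

```python
def max_amplification(wavelength_px, displacement_px):
    """Upper bound on alpha implied by (1 + alpha) * delta < lambda / 8."""
    if displacement_px <= 0:
        raise ValueError("displacement must be positive")
    return wavelength_px / (8.0 * displacement_px) - 1.0

# Hypothetical case: a 0.1-pixel motion seen at several spatial wavelengths.
for wavelength in (16, 32, 64, 128):
    bound = max_amplification(wavelength, displacement_px=0.1)
    print(f"lambda = {wavelength:3d} px  ->  alpha < {bound:.0f}")
```

Each doubling of the wavelength roughly doubles the admissible \((1 + \alpha)\), which is the intuition behind reducing the amplification by a factor of two per finer pyramid level.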

---

## Conclusion

Comparing the initial, minimal pipeline with the expanded, paper-aligned implementation highlights substantial improvements in visual stability, color fidelity, and robustness to noise. The refined version is not only closer to the methodology described by Wu et al. (2012), but also significantly more usable in practice.

Rather than simply adding complexity, each refinement addresses a specific limitation observed in earlier experiments. The result is a pipeline that produces higher-quality magnification of subtle signals such as pulse or micro-variations in color, while minimizing unintended amplification of noise or background motion.

---

### Reference

- H.-Y. Wu, M. Rubinstein, E. Shih, J. Guttag, F. Durand, and W. Freeman.  
  *Eulerian Video Magnification for Revealing Subtle Changes in the World.*  
  ACM Transactions on Graphics (Proc. SIGGRAPH), 2012.


In [1]:
# ===============================================
# STEP 1: IMPORTS & BASIC SETUP
# ===============================================
import cv2
import numpy as np
from scipy.signal import butter, filtfilt
import os


**Formulas/Concepts**  
- Loading a video involves reading each frame via OpenCV’s `VideoCapture`.
- Saving a video uses OpenCV’s `VideoWriter` with a chosen codec (here, 'XVID').


In [2]:
# ===============================================
# STEP 2: VIDEO I/O UTILITIES
# ===============================================
def load_video(path):
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Video not found: {path}")
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    frames = []
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        # Normalize to [0, 1]; note that OpenCV delivers frames in BGR channel order
        frames.append(frame.astype(np.float32) / 255.0)
    cap.release()
    return np.array(frames), fps

def save_video(video, fps, filename):
    fourcc = cv2.VideoWriter_fourcc(*'XVID')
    h, w = video.shape[1:3]
    out = cv2.VideoWriter(filename, fourcc, fps, (w, h), True)
    for frame in video:
        # Clip before casting: values outside [0, 255] would wrap around in uint8
        out.write(np.clip(frame * 255.0, 0, 255).astype(np.uint8))
    out.release()
    print(f"Saved {filename}")


## Step 3: Color Space Conversions (RGB ↔ YIQ)

At this point, I move away from the standard RGB color space and switch to **YIQ**.  
The main reason for this choice is practical rather than theoretical: when I tried amplifying signals directly in RGB, even moderate amplification values started to introduce visible color distortions, especially on skin tones.

By working in YIQ space, I can clearly separate **brightness information** from **color information**, which makes the amplification step much more stable.

---

### Why YIQ?

In RGB, intensity and color are tightly coupled.  
This means that amplifying subtle temporal changes in one channel often unintentionally alters the perceived color balance of the entire frame.

YIQ addresses this by splitting the signal into:

- **Y**: overall intensity (luminance)
- **I, Q**: chrominance (color information)

This separation is particularly useful for Eulerian Video Magnification, where the goal is to amplify *very small temporal variations* (such as pulse-related intensity changes) without introducing artificial color shifts.

---

### RGB → YIQ Transformation

The conversion from RGB to YIQ is performed using a fixed linear transformation:

\[
\begin{bmatrix}
Y \\
I \\
Q
\end{bmatrix}
=
\begin{bmatrix}
0.299 & 0.587 & 0.114 \\
0.596 & -0.274 & -0.322 \\
0.211 & -0.523 & 0.312
\end{bmatrix}
\begin{bmatrix}
R \\
G \\
B
\end{bmatrix}
\]

This projection maps the RGB values into one luminance channel and two chrominance channels.  
In practice, this makes subtle temporal variations easier to isolate and analyze, especially when they are primarily expressed as small intensity changes.

---

### YIQ → RGB Conversion

After temporal filtering and amplification, the processed video must be converted back to RGB for visualization and saving.

This is done by applying the **inverse** of the RGB→YIQ transformation matrix.  
In the implementation, the inverse is computed numerically using `np.linalg.inv` rather than typed in from a rounded table, so the round trip RGB → YIQ → RGB reproduces the input up to floating-point error.
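
The round-trip property is easy to check on random data. The following is a minimal, self-contained sketch; the constant names `RGB_TO_YIQ` and `YIQ_TO_RGB` are introduced here for illustration only.

```python
import numpy as np

RGB_TO_YIQ = np.array([[0.299,  0.587,  0.114],
                       [0.596, -0.274, -0.322],
                       [0.211, -0.523,  0.312]])
YIQ_TO_RGB = np.linalg.inv(RGB_TO_YIQ)

rng = np.random.default_rng(0)
rgb = rng.random((4, 5, 3))            # small random "image" with values in [0, 1)

yiq = rgb @ RGB_TO_YIQ.T               # per-pixel matrix product
rgb_back = yiq @ YIQ_TO_RGB.T

max_error = np.abs(rgb - rgb_back).max()
print(f"max round-trip error: {max_error:.2e}")

# A neutral grey pixel carries no chrominance: I and Q come out (numerically) zero.
grey = np.array([0.5, 0.5, 0.5]) @ RGB_TO_YIQ.T
print("grey pixel in YIQ:", grey)
```

The grey-pixel check also shows why Y can be read as brightness: the first row of the matrix sums to 1, while the I and Q rows sum to 0.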

---

### Practical Insight

In practice, operating in YIQ space significantly reduces color artifacts compared to working directly in RGB.  
I observed that amplified signals remain clearly visible, while skin tones and overall color appearance stay natural and stable.

This step turned out to be essential for making Eulerian Video Magnification robust, especially when targeting physiological signals such as heartbeat-related color fluctuations.


In [3]:
# ===============================================
# STEP 3: COLOR SPACE CONVERSIONS (RGB <-> YIQ)
# ===============================================
def rgb_to_yiq(img):
    transform = np.array([[0.299, 0.587, 0.114],
                          [0.596, -0.274, -0.322],
                          [0.211, -0.523, 0.312]])
    return np.dot(img, transform.T)

def yiq_to_rgb(img):
    transform = np.linalg.inv(np.array([[0.299, 0.587, 0.114],
                                        [0.596, -0.274, -0.322],
                                        [0.211, -0.523, 0.312]]))
    return np.dot(img, transform.T)


## Step 4: Face Mask Generation (Haar Cascades)

At this stage, I restrict the amplification process to the **facial region only**.  
The motivation behind this step is simple: applying Eulerian magnification to the entire frame often amplifies background noise and irrelevant motion, which can easily overwhelm the subtle physiological signals of interest.

To address this, I use OpenCV’s **Haar Cascade face detector** to automatically locate the face and generate a **binary mask**.

---

### What the Mask Represents

The generated mask follows a very simple rule:

- Pixels **inside** the detected face region are set to **1**
- Pixels **outside** the face region are set to **0**

This allows all subsequent amplification steps to focus exclusively on the facial area, while leaving the background untouched.

---

### Why Haar Cascades?

Haar Cascades are a classical but reliable object detection method based on:
- hand-crafted features,
- a cascade of trained classifiers,
- and fast evaluation on grayscale images.

Although more modern face detectors exist, Haar Cascades are:
- lightweight,
- easy to integrate,
- and sufficiently robust for this controlled experimental setup.

The function `detectMultiScale` returns bounding boxes corresponding to detected faces in the frame, which are then directly converted into a binary spatial mask.
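
The box-to-mask conversion itself does not depend on the detector. As a minimal sketch, assuming boxes in the `(x, y, w, h)` layout that `detectMultiScale` returns, it can be written with NumPy alone; `boxes_to_mask` and the example box are hypothetical and not an actual detection result.

```python
import numpy as np

def boxes_to_mask(boxes, height, width):
    """Turn (x, y, w, h) boxes into a float mask of shape (height, width, 1)."""
    mask = np.zeros((height, width), dtype=np.float32)
    for (x, y, w, h) in boxes:
        mask[y:y + h, x:x + w] = 1.0
    return mask[..., np.newaxis]

# Made-up box for a 360x640 frame.
example_boxes = [(250, 80, 140, 140)]
mask = boxes_to_mask(example_boxes, height=360, width=640)
print(mask.shape, int(mask.sum()), "pixels inside the face region")

# An empty detection list yields an all-zero mask, so nothing would be amplified.
empty_mask = boxes_to_mask([], height=360, width=640)
print("pixels selected without a detection:", int(empty_mask.sum()))
```

The empty case is worth keeping in mind: if the detector misses the face, a masked pipeline silently amplifies nothing.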

---

### Practical Note

In practice, even a coarse face mask significantly improves the stability of Eulerian Video Magnification.  
By limiting amplification to the face, background flickering and noise are largely suppressed, while pulse-related color variations become more visible and interpretable.

This masking step plays a key role in making the overall pipeline more robust and closer to the methodology described in the original Eulerian Video Magnification paper.


In [4]:
# ===============================================
# STEP 4: FACE MASK GENERATION (Haar Cascades)
# ===============================================
def generate_face_mask(frame):
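    # Convert to a true grayscale image (frames from load_video are BGR);
    # channel 0 alone would pass only the blue channel to the detector.
    gray = cv2.cvtColor((frame * 255).astype(np.uint8), cv2.COLOR_BGR2GRAY)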
    face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
    faces = face_cascade.detectMultiScale(gray, 1.3, 5)
    mask = np.zeros_like(gray, dtype=np.float32)
    for (x, y, w, h) in faces:
        mask[y:y + h, x:x + w] = 1.0
    return np.expand_dims(mask, axis=-1)


## Step 5: Laplacian Pyramid Construction

At this stage, I decompose each frame into a **Laplacian pyramid** in order to separate image information across different spatial scales. The motivation here is simple: subtle physiological signals, such as tiny facial color variations, tend to appear at specific spatial frequencies. Amplifying the entire image uniformly would only increase noise and visual artifacts.

By working with a coarse-to-fine representation, I gain much better control over which spatial details are later amplified and which ones are left untouched.

---

### Why a Laplacian Pyramid?

In Eulerian Video Magnification, applying amplification directly in the pixel domain often leads to unstable results. Using a Laplacian pyramid helps because:

- Fine details and smooth structures are separated across different levels.
- Noise amplification can be limited by controlling the contribution of high-frequency levels.
- Relevant physiological signals can be isolated more cleanly.

In practice, this makes the amplification process more stable and visually realistic.

---

### Building the Pyramid

For each pyramid level, the following steps are applied:

1. The current image is **downsampled** using `pyrDown`.
2. The downsampled image is **upsampled back** using `pyrUp`.
3. The difference between the original image and its upsampled version forms a **Laplacian level**, capturing details at that spatial scale.

Formally, this can be written as:

\[
L_i = I_i - \text{pyrUp}(\text{pyrDown}(I_i))
\]

where \(I_i\) denotes the image at level \(i\) (with \(I_0\) the input frame and \(I_{i+1} = \text{pyrDown}(I_i)\)), and \(L_i\) contains the spatial details lost during downsampling.

The final downsampled image is stored as the coarsest level of the pyramid.

---

### Pyramid Reconstruction

After temporal filtering and amplification, the image is reconstructed by starting from the coarsest level and iteratively upsampling and adding back the stored Laplacian details. This reconstruction step preserves the global structure of the frame while reintroducing the amplified fine-scale information.
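
The fact that an unmodified pyramid collapses back to the original frame holds for any downsampling/upsampling pair, because each level stores exactly what that pair loses. The sketch below demonstrates this with NumPy-only stand-ins (`down2`, `up2`) that replace `cv2.pyrDown` and `cv2.pyrUp` by 2×2 averaging and pixel repetition; these stand-ins are assumptions made to keep the example dependency-free and are not the filters OpenCV uses.

```python
import numpy as np

def down2(img):
    """Halve the resolution by averaging 2x2 blocks (stand-in for cv2.pyrDown)."""
    h, w = img.shape[:2]
    return img.reshape(h // 2, 2, w // 2, 2, -1).mean(axis=(1, 3))

def up2(img):
    """Double the resolution by pixel repetition (stand-in for cv2.pyrUp)."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def build_pyramid(img, levels):
    pyramid, current = [], img
    for _ in range(levels):
        smaller = down2(current)
        pyramid.append(current - up2(smaller))    # L_i = I_i - up(down(I_i))
        current = smaller                         # I_{i+1} = down(I_i)
    pyramid.append(current)                       # coarsest residual
    return pyramid

def collapse_pyramid(pyramid):
    output = pyramid[-1]
    for detail in reversed(pyramid[:-1]):
        output = detail + up2(output)
    return output

rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))                   # sides divisible by 2**levels
pyr = build_pyramid(image, levels=3)
reconstructed = collapse_pyramid(pyr)

print([level.shape for level in pyr])
print("max reconstruction error:", np.abs(image - reconstructed).max())
```

Any visible change in the output video therefore comes from the temporal filtering and amplification applied to the levels, not from the decomposition itself.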

---

### Key Takeaways

- **Multiscale representation** allows selective amplification at different spatial frequencies.
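- **Exact reconstruction**: each level stores exactly what `pyrUp(pyrDown(·))` loses, so collapsing an unmodified pyramid returns the original frame and the decomposition itself introduces no seams or distortions.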
- **Better control** over noise and artifacts compared to single-scale amplification.

This step plays a crucial role in making Eulerian Video Magnification reliable when applied to real-world videos, especially when targeting subtle facial signals rather than large motions.


In [5]:
# ===============================================
# STEP 5: LAPLACIAN PYRAMID CONSTRUCTION
# ===============================================
def build_laplacian_pyramid(frame, levels=5):
    pyramid = []
    current = frame
    for _ in range(levels):
        down = cv2.pyrDown(current)
        up = cv2.pyrUp(down, dstsize=(current.shape[1], current.shape[0]))
        lap = current - up
        pyramid.append(lap)
        current = down
    pyramid.append(current)
    return pyramid

def collapse_laplacian_pyramid(pyramid):
    output = pyramid[-1]
    for lvl in reversed(range(len(pyramid) - 1)):
        up = cv2.pyrUp(output, dstsize=(pyramid[lvl].shape[1], pyramid[lvl].shape[0]))
        output = pyramid[lvl] + up
    return output


## Step 6: Temporal Butterworth Bandpass Filter

At this stage, the focus shifts from *space* to *time*.  
Instead of asking *where* changes happen in the image, I ask **which temporal patterns persist over time** and whether they match the expected physiological frequencies.

The idea is simple: if a signal is caused by a heartbeat or blood flow, it should oscillate within a very narrow and predictable frequency range. Everything outside that range is likely noise, lighting fluctuation, or unrelated motion.

---

### Why Temporal Filtering?

In raw video data, each pixel changes over time for many reasons:
- illumination variations,
- small camera movements,
- sensor noise,
- and, occasionally, real physiological signals.

Direct amplification would boost **all of them**, which is rarely what we want.  
Temporal bandpass filtering allows me to **isolate only the frequency band of interest** before amplification.

For this project, the target range is chosen based on typical physiological rates (for the final run, 0.8–1.0 Hz, i.e. 48–60 beats per minute), while suppressing slower trends and high-frequency noise.

---

### Butterworth Filter: Practical Choice

I use a **Butterworth bandpass filter** because of its smooth frequency response.  
Unlike designs with a steeper roll-off, such as Chebyshev or elliptic filters, a Butterworth filter has a maximally flat passband with no ripple, which helps keep the amplified signal visually stable.

Another important design choice is applying the filter with `filtfilt` instead of a single forward pass.  
This performs **zero-phase filtering**, meaning:

- no temporal delay is introduced,
- rising and falling patterns remain aligned in time,
- the amplified signal looks natural rather than “shifted”.

This matters a lot when visualizing periodic biological signals.
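
The difference between a single forward pass and `filtfilt` can be made visible with an impulse. This is a minimal sketch, assuming a 30 fps video and the 0.8–1.0 Hz band used later in the notebook; it compares where the filtered response peaks relative to the input impulse.

```python
import numpy as np
from scipy.signal import butter, filtfilt, lfilter

fps = 30.0
b, a = butter(3, [0.8 / (0.5 * fps), 1.0 / (0.5 * fps)], btype="band")

# Unit impulse in the middle of a long, otherwise silent signal.
n_samples, centre = 2001, 1000
impulse = np.zeros(n_samples)
impulse[centre] = 1.0

causal = lfilter(b, a, impulse)          # single forward pass
zero_phase = filtfilt(b, a, impulse)     # forward pass followed by a backward pass

lag_causal = int(np.argmax(np.abs(causal))) - centre
lag_zero_phase = int(np.argmax(np.abs(zero_phase))) - centre
print(f"peak lag with lfilter : {lag_causal} frames")
print(f"peak lag with filtfilt: {lag_zero_phase} frames")
```

With the forward-only filter the response peaks some frames after the impulse, whereas the forward-backward response is centred on it. Note that `filtfilt` needs the whole signal in advance, so it suits offline processing like this notebook rather than streaming.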

---

### Continuous vs. Discrete Perspective (Intuition Only)

In theory, a Butterworth bandpass filter can be seen as the combination of:
- a low-pass filter (removing fast fluctuations),
- and a high-pass filter (removing slow trends).

In practice, I do **not** work with the continuous formula directly.  
Instead, cutoff frequencies are normalized relative to the **Nyquist frequency** (half the frame rate), which is standard in discrete signal processing.
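
As a small worked example of this normalization, the snippet below converts a band given in Hz into fractions of the Nyquist frequency. `normalized_band` is a hypothetical helper written for illustration, and 30 fps is an assumed frame rate.

```python
def normalized_band(freq_min_hz, freq_max_hz, fps):
    """Express a pass band as fractions of the Nyquist frequency (fps / 2)."""
    nyquist = 0.5 * fps
    if not 0 < freq_min_hz < freq_max_hz < nyquist:
        raise ValueError("band must satisfy 0 < freq_min < freq_max < fps / 2")
    return freq_min_hz / nyquist, freq_max_hz / nyquist

low, high = normalized_band(0.8, 1.0, fps=30.0)
print(f"0.8-1.0 Hz at 30 fps -> [{low:.4f}, {high:.4f}] of Nyquist")
print(f"same band in beats per minute: {0.8 * 60:.0f}-{1.0 * 60:.0f}")
```

The check against the Nyquist frequency matters for low frame rates: a band that reaches `fps / 2` cannot be represented and the filter design would fail.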

---

### Implementation Insight

The filtering is applied **along the temporal axis** for each pixel independently.  
To make this efficient, the video tensor is flattened to a 2-D array of shape (frames, pixels × channels), the filter is applied once along the time axis, and the original shape is restored afterward.
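
The reshaping trick can be checked on a small synthetic video: filtering all pixels at once gives the same result as filtering one pixel's time series on its own. The sketch assumes 30 fps and random data in place of real frames.

```python
import numpy as np
from scipy.signal import butter, filtfilt

fps = 30.0
rng = np.random.default_rng(0)
video = rng.random((120, 4, 6, 3))               # (frames, height, width, channels)

b, a = butter(3, [0.8 / (0.5 * fps), 1.0 / (0.5 * fps)], btype="band")

flat = video.reshape(video.shape[0], -1)         # (frames, height * width * channels)
filtered = filtfilt(b, a, flat, axis=0).reshape(video.shape)

# Filtering a single pixel/channel separately yields the same time series.
single = filtfilt(b, a, video[:, 1, 2, 0])
print(filtered.shape, np.allclose(filtered[:, 1, 2, 0], single))
```

One practical constraint: `filtfilt` pads the signal at both ends, so the clip has to contain more frames than the default padding length (21 samples for this third-order band-pass design).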

This approach keeps the implementation simple while remaining faithful to the theoretical model described in the original Eulerian Video Magnification paper.

Overall, this step is crucial: without proper temporal filtering, amplification quickly becomes unstable and visually unconvincing.


In [6]:
# ===============================================
# STEP 6: TEMPORAL BUTTERWORTH BANDPASS FILTER
# ===============================================
def butter_bandpass_filter(data, fps, freq_min, freq_max, order=3):
    nyquist = 0.5 * fps
    low, high = freq_min / nyquist, freq_max / nyquist
    b, a = butter(order, [low, high], btype='band')
    original_shape = data.shape
    reshaped = data.reshape((original_shape[0], -1))
    filtered = filtfilt(b, a, reshaped, axis=0)
    return filtered.reshape(original_shape)


## Step 7: Face Mask Generation (Resized per Pyramid Level)

At this point, I refine the face mask generation to make sure that the amplification is applied **only where it makes sense**.  
Instead of magnifying the entire frame, I explicitly restrict the process to the facial region, which is where the physiological color changes of interest are expected to appear.

A key issue here is that the video is processed at multiple spatial scales due to the Laplacian pyramid.  
If the face mask is not resized accordingly at each level, the region of interest becomes misaligned, leading to artifacts or partial amplification of background areas.

---

### Practical Idea Behind the Mask

The approach follows a simple logic:

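1. **Detect the face once, on the first frame**, using a classical Haar Cascade detector (the mask is then reused for every frame, which assumes the subject stays roughly still).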
2. **Create a binary mask**:
   - pixels inside the detected face region are set to 1,
   - pixels outside are set to 0.
3. **Resize the mask** to match each pyramid level before applying amplification.

By doing this, the magnification remains spatially consistent across all pyramid levels.
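
To illustrate the resizing step, the sketch below lists the level sizes obtained by repeated halving with rounding up (the default output size of `cv2.pyrDown`) and resizes a mask to each of them. The nearest-neighbour resize is a NumPy stand-in for `cv2.resize`, and both helper names and the face box are hypothetical.

```python
import numpy as np

def level_shapes(height, width, levels):
    """Sizes after repeated halving with rounding up, one entry per pyramid level."""
    shapes = [(height, width)]
    for _ in range(levels):
        height, width = (height + 1) // 2, (width + 1) // 2
        shapes.append((height, width))
    return shapes

def resize_mask_nearest(mask, new_h, new_w):
    """Nearest-neighbour resize of a 2-D mask (stand-in for cv2.resize)."""
    rows = (np.arange(new_h) * mask.shape[0]) // new_h
    cols = (np.arange(new_w) * mask.shape[1]) // new_w
    return mask[rows][:, cols]

mask = np.zeros((360, 640), dtype=np.float32)
mask[80:220, 250:390] = 1.0                      # made-up face box

shapes = level_shapes(360, 640, levels=5)
masks = [resize_mask_nearest(mask, h, w) for (h, w) in shapes]
for m in masks:
    print(m.shape, f"face fraction = {m.mean():.3f}")
```

The fraction of the frame covered by the face stays roughly constant across levels, which is what keeps the region of interest aligned from the finest to the coarsest scale.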

---

### Why This Step Matters

Without a properly resized face mask:
- background regions can start to flicker after amplification,
- noise outside the face may dominate the signal,
- and subtle color changes on the face become harder to interpret.

Restricting the amplification to the facial area significantly improves both **visual quality** and **signal-to-noise ratio**, especially when targeting weak physiological signals such as pulse-related color variations.

This step acts as a spatial constraint that complements the temporal filtering applied earlier.


In [7]:
# ===============================================
# STEP 7: FACE MASK GENERATION (Resized per level)
# ===============================================
def generate_face_mask(frame):
    """
    Detects the face in the frame using Haar cascades and returns a binary mask.
    """
    gray = cv2.cvtColor((frame * 255).astype(np.uint8), cv2.COLOR_BGR2GRAY)
    face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    
    mask = np.zeros_like(gray, dtype=np.float32)
    for (x, y, w, h) in faces:
        mask[y:y + h, x:x + w] = 1.0
    return mask[..., np.newaxis]  # Add channel dimension


## Step 8: Multiscale Eulerian Video Magnification (With Resizing)

### Purpose  
In this final step, all previous components are combined into a **complete Eulerian Video Magnification (EVM) pipeline**.  
The goal is to amplify **subtle temporal variations** in the video, such as physiological signals (e.g. heartbeat), while preserving visual realism and avoiding noise or color artifacts.

Rather than amplifying raw RGB values directly, this pipeline relies on **color space separation**, **multiscale spatial decomposition**, and **temporal filtering**, closely following the principles described in the original EVM framework.

---

### Pipeline Overview  

The processing pipeline consists of the following steps:

1. **Load the input video** and extract frames along with the frame rate.  
2. **Convert frames to YIQ color space**, separating luminance (Y) from chrominance (I, Q) to improve color stability during amplification.  
3. **Detect the face region** in the first frame and generate a binary mask to restrict amplification to relevant areas.  
4. **Build a Laplacian pyramid** for each frame, decomposing it into multiple spatial frequency bands.  
5. **Resize and apply the face mask** at each pyramid level to ensure spatial alignment across scales.  
6. **Apply a temporal Butterworth bandpass filter** at each pyramid level to isolate the target frequency band.  
7. **Amplify the filtered signals**, using a scale-dependent amplification factor to reduce noise at finer spatial levels.  
8. **Collapse the Laplacian pyramid** to reconstruct the amplified frames.  
9. **Convert frames back to RGB** and save the final magnified video.

This multiscale approach allows subtle signals to be enhanced while minimizing artifacts and background noise.



## Key Equations and Intuition

### Eulerian Video Magnification: Core Idea

At the heart of Eulerian Video Magnification, the goal is to enhance **very small temporal variations** in pixel intensity, without explicitly tracking motion or computing optical flow.

Conceptually, the magnified signal can be written as:

$$
I'(x, t) = I(x, t) + \alpha \cdot \text{filtered}\bigl(I(x, t)\bigr)
$$

where:

- \( I(x, t) \) is the original pixel intensity at spatial location \(x\) and time \(t\),
- \( \text{filtered}\bigl(I(x, t)\bigr) \) is the temporally band-pass filtered version of the signal, isolating the frequency range of interest (e.g. heartbeat),
- \( \alpha \) is the amplification factor controlling how strongly the filtered signal is emphasized.

Instead of following motion trajectories, the Eulerian approach works directly at **fixed pixel locations**, amplifying subtle temporal changes that are otherwise invisible.
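
The equation can be tried on a single synthetic pixel. In this sketch the temporal filter is an ideal FFT band-pass, used here only because it makes the result easy to verify; it is not the Butterworth filter of the actual pipeline, and all values are made up.

```python
import numpy as np

fps, n_frames = 30.0, 300
t = np.arange(n_frames) / fps
pulse = 0.01 * np.sin(2 * np.pi * 1.0 * t)       # weak 1 Hz variation
signal = 0.5 + pulse                             # constant brightness + pulse

def ideal_bandpass(x, fps, freq_min, freq_max):
    """Zero every FFT bin outside [freq_min, freq_max]."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fps)
    spectrum[(freqs < freq_min) | (freqs > freq_max)] = 0.0
    return np.fft.irfft(spectrum, n=x.size)

alpha = 50
filtered = ideal_bandpass(signal, fps, 0.8, 1.2)
magnified = signal + alpha * filtered            # I' = I + alpha * filtered(I)

print(f"pulse amplitude before: {np.ptp(signal) / 2:.3f}")
print(f"pulse amplitude after : {np.ptp(magnified) / 2:.3f}")
```

The constant brightness is left untouched because it lies outside the band, while the in-band component is scaled by \(1 + \alpha\).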

---

### Scale-Dependent (Spatially Varying) Amplification

Applying the same amplification at all spatial scales tends to amplify noise, especially at fine details.  
To avoid this, the amplification factor is adapted across the levels of the Laplacian pyramid:

$$
\alpha_{\text{level}} =
\frac{\alpha}{2^{(\text{pyramid\_levels} - \text{level} - 1)}}
$$

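In practice, with levels indexed from 0 (finest) upward:

- Coarser pyramid levels (large spatial structures) receive **stronger amplification**.
- Finer levels (small details) are amplified **less aggressively**.
- Level `pyramid_levels - 1` receives exactly \( \alpha \), and the low-pass residual stored last in the pyramid (index `pyramid_levels`) receives \( 2\alpha \).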

This strategy helps suppress high-frequency noise and visual artifacts, while still revealing meaningful large-scale variations such as subtle head motion, breathing, or pulse-related color changes.
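
For concreteness, the schedule implied by the formula can be listed for the default settings. `level_amplifications` is a hypothetical helper that simply evaluates the expression above for every level index, including the final low-pass residual.

```python
def level_amplifications(alpha, pyramid_levels):
    """alpha / 2**(pyramid_levels - level - 1) for level 0 (finest) to pyramid_levels."""
    return [alpha / 2 ** (pyramid_levels - level - 1)
            for level in range(pyramid_levels + 1)]

schedule = level_amplifications(alpha=50, pyramid_levels=5)
for level, value in enumerate(schedule):
    print(f"level {level}: alpha = {value}")
```

The factor doubles at every coarser level, so the last entry (the low-pass residual) ends up at twice the nominal amplification; this is worth keeping in mind when choosing `amplification`.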

---

### Practical Insight

Using multiscale decomposition together with scale-dependent amplification produces noticeably cleaner results than uniform amplification.  
Subtle physiological signals remain visible, while skin tones and overall color appearance stay natural.

This balance is essential for building a **stable and robust Eulerian Video Magnification pipeline**, especially when targeting weak signals such as pulse-induced color variations.


In [8]:
# ===============================================
# STEP 8: MULTISCALE EULERIAN VIDEO MAGNIFICATION (With Resizing)
# ===============================================
def eulerian_video_magnification(
    input_path,
    output_path,
    freq_min=0.8,
    freq_max=1.0,
    amplification=50,
    pyramid_levels=5
):
    # 1) Load video (OpenCV delivers frames in BGR channel order)
    video, fps = load_video(input_path)
    nframes, h, w, c = video.shape

    # 2) Convert to YIQ for color fidelity.
    #    The YIQ matrix expects RGB, so the channel order is reversed first.
    yiq_video = np.array(
        [rgb_to_yiq(frame[..., ::-1]) for frame in video], dtype=np.float32
    )

    # 3) Generate face mask from the first frame
    original_face_mask = generate_face_mask(video[0])

    # 4) Build Laplacian pyramid for each frame, store one time series per level
    pyramid_timeseries = [[] for _ in range(pyramid_levels + 1)]
    for i in range(nframes):
        pyr = build_laplacian_pyramid(yiq_video[i], levels=pyramid_levels)
        for level_idx in range(pyramid_levels + 1):
            pyramid_timeseries[level_idx].append(pyr[level_idx])

    for level_idx in range(pyramid_levels + 1):
        pyramid_timeseries[level_idx] = np.stack(pyramid_timeseries[level_idx], axis=0)

    # 5) Apply temporal Butterworth bandpass filter + masked amplification
    for level_idx in range(pyramid_levels + 1):
        # Resize face mask to match pyramid level dimensions
        level_h, level_w = pyramid_timeseries[level_idx].shape[1:3]
        resized_mask = cv2.resize(original_face_mask, (level_w, level_h))
        resized_mask = resized_mask.reshape(level_h, level_w, 1)  # Ensure channel dimension

        # Decrease amplification for higher spatial frequencies
        alpha = amplification / (2 ** (pyramid_levels - level_idx - 1))
        filtered = butter_bandpass_filter(pyramid_timeseries[level_idx], fps, freq_min, freq_max, order=3)

        # Mask only the amplified component, so pixels outside the face keep
        # their original values instead of being zeroed out.
        pyramid_timeseries[level_idx] += (alpha * filtered * resized_mask).astype(np.float32)

    # 6) Reconstruct frames by collapsing the pyramid and converting back to BGR
    out_frames = []
    for i in range(nframes):
        recon_levels = [pyramid_timeseries[level_idx][i] for level_idx in range(pyramid_levels + 1)]
        recon_frame = collapse_laplacian_pyramid(recon_levels)
        rgb_frame = yiq_to_rgb(recon_frame)
        out_frames.append(np.clip(rgb_frame[..., ::-1], 0, 1))  # RGB -> BGR for VideoWriter

    out_frames = np.array(out_frames, dtype=np.float32)

    # 7) Save the magnified video
    save_video(out_frames, fps, output_path)


## Usage Example
After defining all the functions above, you can run the final pipeline by calling:

In [10]:
eulerian_video_magnification(
    input_path="/kaggle/input/cv-eulerian-videos/face.mp4",
    output_path="magnified_face_final_optimized.avi",
    freq_min=0.8,
    freq_max=1.0,
    amplification=50,
    pyramid_levels=5
)


Saved magnified_face_final_optimized.avi


In [11]:
from IPython.display import FileLink
FileLink("magnified_face_final_optimized.avi")


# Final Analysis of Eulerian Video Magnification Results

## Overview
In this final experiment, I applied the Eulerian Video Magnification (EVM) pipeline to a facial video to reveal very subtle temporal variations, mainly those linked to physiological activity such as pulse. Comparing the original video directly with the magnified output shows that the method makes faint color fluctuations in the facial region visible, changes that are almost impossible to notice in the raw footage.

Overall, the pipeline manages to extract meaningful temporal signals while still keeping the video visually realistic, which was one of the main constraints throughout the implementation.

---

## Key Observations

### 1. Amplification of Subtle Color Variations
In the magnified video, periodic color changes become visible, especially around the forehead and cheek areas. These variations are consistent with what would be expected from blood flow–related pulsation, which suggests that the chosen frequency band (0.8–1.0 Hz, i.e. 48–60 beats per minute) is well aligned with the heartbeat in this recording.

An important point is that the amplification remains subtle. The color changes are noticeable when compared to the original video, but they do not introduce strong flickering or unnatural color shifts, which indicates that the amplification factor is reasonably balanced.
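
As a quick sanity check on the band choice, the sketch below converts a temporal passband from Hz to cycles per minute and verifies that it lies below the Nyquist limit of the video. The function name `describe_band` and the 30 fps frame rate are assumptions for illustration; the actual frame rate is whatever `load_video` returns.

```python
def describe_band(freq_min, freq_max, fps):
    """Convert a passband in Hz to cycles per minute and check Nyquist."""
    if not 0 < freq_min < freq_max:
        raise ValueError("expected 0 < freq_min < freq_max")
    nyquist = fps / 2.0  # highest frequency representable at this frame rate
    if freq_max >= nyquist:
        raise ValueError(f"freq_max must be below Nyquist ({nyquist} Hz)")
    return round(freq_min * 60.0, 3), round(freq_max * 60.0, 3)


low_bpm, high_bpm = describe_band(0.8, 1.0, fps=30.0)
print(f"0.8-1.0 Hz covers {low_bpm:g}-{high_bpm:g} cycles per minute")
```

A subject whose heart rate is above 60 beats per minute would fall outside this band, so `freq_max` would need to be raised accordingly.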

### 2. Presence of Black Borders in the Output
One visible artifact in the final output is a thin black border around the frame. This points to a slight spatial mismatch during the reconstruction stage, most likely introduced during the Laplacian pyramid collapse or the resizing steps around `cv2.pyrUp`, whose default output is exactly twice the input size and therefore does not match a level with an odd width or height.

While this issue does not affect the temporal signal itself, it slightly reduces the visual cleanliness of the output and highlights the importance of careful spatial alignment in multiscale reconstruction.
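
The size mismatch can be reproduced without any image data. The sketch below follows the default output sizes of `cv2.pyrDown` (`(n + 1) // 2` per dimension) and `cv2.pyrUp` (`2 * n`) and reports the levels where upsampling overshoots the target. The 360x640 frame size and the helper names are assumptions for illustration, not values taken from the input video.

```python
def pyramid_shapes(height, width, levels):
    """Shapes produced by repeated cv2.pyrDown with its default output size."""
    shapes = [(height, width)]
    for _ in range(levels):
        height, width = (height + 1) // 2, (width + 1) // 2
        shapes.append((height, width))
    return shapes


def mismatched_levels(shapes):
    """Levels whose size differs from the default cv2.pyrUp of the next level."""
    bad = []
    for level in range(len(shapes) - 1):
        coarse_h, coarse_w = shapes[level + 1]
        if (2 * coarse_h, 2 * coarse_w) != shapes[level]:
            bad.append(level)
    return bad


shapes = pyramid_shapes(360, 640, levels=5)
print(shapes)
print("levels needing an explicit target size:", mismatched_levels(shapes))
```

For such levels, passing the target size explicitly, for example `cv2.pyrUp(img, dstsize=(width, height))`, keeps each upsampled band aligned with the one it is added to.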

### 3. Face Mask Localization
The face mask plays a key role in restricting amplification to the facial region. In practice, it successfully prevents most of the background from being amplified, which significantly improves the clarity of the result.

That said, a small amount of amplification leakage can still be observed near the boundaries of the mask. This suggests that the mask could be refined further, but even in its current form, it represents a clear improvement over applying uniform amplification to the entire frame.
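
One possible refinement, sketched below with NumPy only, is to weight the amplified band-passed signal by a feathered mask, so that the gain fades out towards the mask boundary instead of stopping abruptly. This is an illustrative variant under assumed array shapes, not the masking used in the pipeline above; `feather_mask` and `amplify_with_soft_mask` are hypothetical names, and the data is synthetic.

```python
import numpy as np


def feather_mask(mask, radius=2):
    """Soften a binary mask with a (2*radius+1)^2 box average (edge-padded)."""
    mask = mask.astype(np.float32)
    padded = np.pad(mask, radius, mode="edge")
    size = 2 * radius + 1
    out = np.zeros_like(mask)
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
    return out / (size * size)


def amplify_with_soft_mask(level, filtered, mask, alpha):
    """Add alpha * filtered to level, weighted per pixel by mask.

    level, filtered: arrays of shape (frames, height, width, channels)
    mask: array of shape (height, width) with values in [0, 1]
    """
    return level + alpha * filtered * mask[np.newaxis, ..., np.newaxis]


# Synthetic data standing in for one pyramid band and its band-passed signal.
rng = np.random.default_rng(0)
level = rng.random((4, 16, 16, 3)).astype(np.float32)
filtered = np.full_like(level, 0.01)
hard_mask = np.zeros((16, 16), dtype=np.float32)
hard_mask[4:12, 4:12] = 1.0

soft_mask = feather_mask(hard_mask, radius=2)
out = amplify_with_soft_mask(level, filtered, soft_mask, alpha=50)
print("weight at centre:", soft_mask[8, 8], "weight at corner:", soft_mask[0, 0])
```

Because the mask multiplies only the amplified term, pixels with zero weight keep their original values, while pixels near the boundary receive a partial gain.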

### 4. Temporal Stability
The amplified signal evolves smoothly over time, without abrupt jumps or visible temporal jitter. This indicates that the Butterworth bandpass filter is operating as intended and that the target frequency range is being isolated consistently.

From a qualitative point of view, this temporal stability increases confidence that the observed color variations correspond to real underlying signals rather than filtering artifacts.
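
A simple quantitative complement to this visual check is to inspect the spectrum of a spatially averaged signal. The sketch below uses a synthetic 0.9 Hz trace in place of real data; in practice the trace would be something like the per-frame mean of one channel inside the face region. The function name `dominant_frequency` and all signal parameters are assumptions for illustration.

```python
import numpy as np


def dominant_frequency(signal, fps):
    """Return the frequency (Hz) of the strongest non-DC spectral peak."""
    signal = np.asarray(signal, dtype=np.float64)
    spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fps)
    return freqs[np.argmax(spectrum[1:]) + 1]  # skip the DC bin


fps = 30.0
t = np.arange(300) / fps                    # 10 s of samples at 30 fps
rng = np.random.default_rng(0)
trace = (
    0.5
    + 0.02 * np.sin(2 * np.pi * 0.9 * t)    # synthetic pulse-like component
    + 0.002 * rng.standard_normal(t.size)   # small measurement noise
)

peak = dominant_frequency(trace, fps)
print(f"dominant frequency: {peak:.2f} Hz")
print("inside 0.8-1.0 Hz band:", 0.8 <= peak <= 1.0)
```

A dominant peak inside the passband in the unfiltered trace would support the interpretation that the amplified variation reflects a signal already present in the footage.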

### 5. Noise and Over-Amplification Effects
In a few localized regions, particularly around sharp edges such as glasses or high-contrast background contours, slight over-amplification can be observed. These effects are most likely introduced at the fine-scale pyramid levels (high spatial frequencies), where fine spatial details are more prominent but less relevant to the physiological signal of interest.

However, these artifacts remain limited in scope and do not dominate the overall visual impression of the video.

---

## Suggestions for Improvement
Although no further modifications were applied due to time constraints, several improvements could reasonably be explored:

1. **Correct Frame Size Mismatch**  
   Ensuring consistent spatial dimensions during pyramid reconstruction, especially when using `cv2.pyrUp`, would likely remove the black borders observed in the output.

2. **Refine the Face Mask Strategy**  
   Using frame-by-frame face detection or simple tracking instead of relying on the first frame only could improve mask accuracy in dynamic scenes. Applying basic morphological operations could also help smooth mask boundaries.

3. **Limit High-Frequency Amplification**  
   Further reducing the amplification factor at the fine-scale pyramid levels (high spatial frequencies), or decreasing the total number of pyramid levels, could help suppress edge-related noise.

4. **Adapt the Frequency Range to the Target Signal**  
   If the focus shifts from pulse to slower physiological processes such as breathing, lowering the frequency band (e.g., 0.2–0.5 Hz) would be more appropriate.

---

## Conclusion
The final EVM pipeline succeeds in revealing subtle physiological color variations in facial video data while maintaining temporal stability and overall visual realism. Although minor artifacts such as black borders and localized over-amplification are present, they do not undermine the main objective of the experiment.

Overall, this implementation shows that even a relatively simple, paper-inspired Eulerian Video Magnification pipeline can effectively expose weak temporal signals. With modest refinements in spatial alignment, masking, and scale-dependent amplification, the quality and robustness of the results could be improved further.
