In [8]:
!pip install scipy

Collecting scipy
  Using cached scipy-1.10.1-cp38-cp38-win_amd64.whl (42.2 MB)
Installing collected packages: scipy
Successfully installed scipy-1.10.1


# Pre-Processing

In [2]:
import cv2
import numpy as np
import dlib

# Load face detector from dlib
detector = dlib.get_frontal_face_detector()

# Read in and simultaneously preprocess video
def read_video(path):
    cap = cv2.VideoCapture(path)
    fps = int(cap.get(cv2.CAP_PROP_FPS))
    video_frames = []
    face_rects = ()

    while cap.isOpened():
        ret, img = cap.read()
        if not ret:
            break
        # OpenCV decodes frames in BGR order, so convert from BGR
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        roi_frame = img

        # Detect face
        if len(video_frames) == 0:
            dets = detector(gray, 1)
            face_rects = [(d.left(), d.top(), d.right()-d.left(), d.bottom()-d.top()) for d in dets]

        # Select ROI
        if len(face_rects) > 0:
            for (x, y, w, h) in face_rects:
                roi_frame = img[y:y + h, x:x + w]
            if roi_frame.size != img.size:
                roi_frame = cv2.resize(roi_frame, (500, 500))
                frame = np.ndarray(shape=roi_frame.shape, dtype="float")
                frame[:] = roi_frame * (1. / 255)
                video_frames.append(frame)

    frame_ct = len(video_frames)
    cap.release()

    return video_frames, frame_ct, fps


* This code defines a function read_video that takes a video path as input and returns a tuple consisting of the video frames, frame count, and frames per second (fps). 

* The function opens the video with OpenCV's cv2.VideoCapture and makes a grayscale copy of each frame. The grayscale copy is used only for face detection; the frames that are stored stay in color.

* It runs dlib's face detector until a face has been found (normally on the first frame) and then reuses that bounding box as the region of interest (ROI) for the rest of the video.

* The ROI is resized to 500x500, normalized to values between 0 and 1, and added to the list of video frames. 

* This process is repeated for each frame in the video until all frames have been processed.

* It is important to note that this code assumes there is only one face in the video and that it is visible from the first frame. If several faces are detected, only the last box in the list is used. Frames read before the first successful detection are dropped, and because the box is never updated, the crop stops following the face if the subject moves.

* Additionally, the face detector used in this code is not perfect, so it may not always accurately detect the face in the video.

Changes made to switch from OpenCV's cascade detector to dlib:

* Import the dlib module.
* Load the face detector using dlib.get_frontal_face_detector().
* Replace the OpenCV call faceCascade.detectMultiScale() with a call to the dlib detector, detector().
* detector() returns a collection of dlib.rectangle objects, whereas the rest of the code expects OpenCV-style (x, y, w, h) tuples.
* Convert each dlib.rectangle to (x, y, w, h) using left(), top(), right() - left(), and bottom() - top(), and store the results in face_rects.
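
* The conversion above can be sketched without dlib itself. This is a minimal illustration, assuming the detector may report a box that extends past the frame edge; the helper name rect_to_xywh and its clamping behavior are assumptions for illustration and are not part of the original code. Clamping matters because a negative x or y would make a NumPy slice such as img[y:y + h, x:x + w] count from the end of the array.

```python
def rect_to_xywh(left, top, right, bottom, frame_width, frame_height):
    """Convert corner coordinates to an (x, y, w, h) box clamped to the frame."""
    x0 = max(0, left)
    y0 = max(0, top)
    x1 = min(frame_width, right)
    y1 = min(frame_height, bottom)
    # Width and height can never go negative, even for a box fully outside the frame
    return x0, y0, max(0, x1 - x0), max(0, y1 - y0)


# A box whose left edge lies 12 pixels outside a 640x480 frame
print(rect_to_xywh(-12, 40, 188, 240, 640, 480))  # (0, 40, 188, 200)
```

With a real detection d, the call would be rect_to_xywh(d.left(), d.top(), d.right(), d.bottom(), img.shape[1], img.shape[0]).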

Explanation of the above code:

* First, we import the necessary modules: OpenCV (cv2), NumPy (numpy), and dlib (dlib).

In [4]:
# import cv2
# import numpy as np
# import dlib

* Next, we load the face detector from dlib using the dlib.get_frontal_face_detector() function. 
* This returns a face detector object that we can use to detect faces in images.

In [5]:
# detector = dlib.get_frontal_face_detector()

* This function takes a video path as input and initializes some variables: a cv2.VideoCapture object cap to read the video frames, an integer fps to store the frames per second of the video (int() truncates fractional rates such as 29.97), an empty list video_frames to store the processed frames, and an empty tuple face_rects to store the face bounding box coordinates.

In [1]:
# def read_video(path):
#     cap = cv2.VideoCapture(path)
#     fps = int(cap.get(cv2.CAP_PROP_FPS))
#     video_frames = []
#     face_rects = ()
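
* Because every frequency computed later is scaled by fps, truncating the frame rate shifts the final estimate. The sketch below shows the size of that effect under assumed numbers: the 29.97 fps rate, the 72 bpm pulse, and the helper name bpm_with_assumed_fps are hypothetical and only serve the arithmetic.

```python
def bpm_with_assumed_fps(true_bpm, real_fps, assumed_fps):
    """Heart rate that would be reported if assumed_fps is used instead of real_fps."""
    # An FFT bin's frequency is proportional to the frame rate it is computed with
    return true_bpm * assumed_fps / real_fps


real_fps = 29.97                # hypothetical camera frame rate
assumed_fps = int(real_fps)     # what int(cap.get(cv2.CAP_PROP_FPS)) would keep
reported = bpm_with_assumed_fps(72.0, real_fps, assumed_fps)
print(assumed_fps, round(reported, 2))
```

Keeping fps as a float would avoid this bias if the container reports a fractional rate.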

* This loop iterates through the video until cap.read() reports that no frame is left. 
* For each frame, it reads the frame using cap.read(), makes a grayscale copy using cv2.cvtColor(), and sets roi_frame to the original color image as a default.

In [2]:
#     while cap.isOpened():
#         ret, img = cap.read()
#         if not ret:
#             break
#         gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
#         roi_frame = img

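* This block runs only while video_frames is still empty, which normally means the first frame; if no face is found there, detection is retried on the following frames. 
* It uses the detector object to detect faces in the grayscale image (the second argument, 1, upsamples the image once so that smaller faces can be found), and converts the face bounding box coordinates from dlib.rectangle objects to tuples of left, top, width, and height values.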

In [3]:
#         if len(video_frames) == 0:
#             dets = detector(gray, 1)
#             face_rects = [(d.left(), d.top(), d.right()-d.left(), d.bottom()-d.top()) for d in dets]


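* This code runs for every frame, including the first, as soon as face_rects holds at least one box. 
* It selects the ROI containing the face using the bounding box coordinates (if several boxes are present, the last one wins), and resizes it to 500x500 using cv2.resize(). 
* It then normalizes the pixel values to be between 0 and 1, and stores the result in a float NumPy array frame, which is added to the video_frames list. The check roi_frame.size != img.size skips frames for which no crop was made.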

In [5]:
#         if len(face_rects) > 0:
#             for (x, y, w, h) in face_rects:
#                 roi_frame = img[y:y + h, x:x + w]
#             if roi_frame.size != img.size:
#                 roi_frame = cv2.resize(roi_frame, (500, 500))
#                 frame = np.ndarray(shape=roi_frame.shape, dtype="float")
#                 frame[:] = roi_frame * (1. / 255)
#                 video_frames.append(frame)

* Finally, the function counts the frames that were stored and releases the cv2.VideoCapture object. 
* It returns a tuple containing the video_frames list, the number of stored frames frame_ct, and the frames per second of the video fps.

In [6]:
#     frame_ct = len(video_frames)
#     cap.release()
#     return video_frames, frame_ct, fps


* **Overall**, this code reads in a video, detects the face once at the start, crops every frame to that region of interest, and preprocesses the crop by resizing and normalizing it. 
* The resulting processed frames are stored in a list and returned as output along with the number of frames and fps.

* **The reason for converting the image to grayscale**
* is to give the face detector a single-channel intensity image, which is cheaper to process than a three-channel color image. 
* In general, grayscale images have a single channel (compared to the 3 channels of RGB color images) and only represent the intensity of the image at each pixel, without color information.

* In this particular code, the grayscale copy is used only for the face detection step. The frames that are cropped and stored stay in color, and the later pyramid and filtering steps operate on all three channels. 
* Converting to grayscale removes color variation that is not relevant to face detection; it does not remove changes in lighting, which still affect the intensity values.

* Overall, the grayscale conversion keeps the detection step simple without affecting the color frames that the rest of the pipeline uses.
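
* Channel order matters for this conversion, because cv2.VideoCapture returns frames in BGR order. The NumPy sketch below applies the standard luma weights (0.299 R, 0.587 G, 0.114 B) by hand to show how much the result changes if the channels are read in the wrong order. The helper name bgr_to_gray is an assumption for illustration and is a stand-in for cv2.cvtColor, not a replacement for it.

```python
import numpy as np


def bgr_to_gray(frame_bgr):
    """Weighted sum of the channels, assuming the last axis is ordered B, G, R."""
    b = frame_bgr[..., 0]
    g = frame_bgr[..., 1]
    r = frame_bgr[..., 2]
    return 0.114 * b + 0.587 * g + 0.299 * r


# One pure-blue pixel stored in BGR order
blue_pixel = np.array([[[255.0, 0.0, 0.0]]])
gray = bgr_to_gray(blue_pixel)
# Same pixel with the channels reversed, i.e. misread as RGB
swapped = bgr_to_gray(blue_pixel[..., ::-1])
print(gray[0, 0], swapped[0, 0])
```

The two values differ by more than a factor of two, which is why the conversion flag has to match the order in which the frame was decoded.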

* pyramids.py - Contains functions to generate and collapse image/video pyramids (Gaussian/Laplacian)

In [6]:
import cv2
import numpy as np


# Build Gaussian image pyramid
def build_gaussian_pyramid(img, levels):
    float_img = np.ndarray(shape=img.shape, dtype="float")
    float_img[:] = img
    pyramid = [float_img]

    for i in range(levels-1):
        float_img = cv2.pyrDown(float_img)
        pyramid.append(float_img)

    return pyramid


# Build Laplacian image pyramid from Gaussian pyramid
def build_laplacian_pyramid(img, levels):
    gaussian_pyramid = build_gaussian_pyramid(img, levels)
    laplacian_pyramid = []

    for i in range(levels-1):
        upsampled = cv2.pyrUp(gaussian_pyramid[i+1])
        (height, width, depth) = upsampled.shape
        # cv2.resize expects dsize as (width, height)
        gaussian_pyramid[i] = cv2.resize(gaussian_pyramid[i], (width, height))
        diff = cv2.subtract(gaussian_pyramid[i],upsampled)
        laplacian_pyramid.append(diff)

    laplacian_pyramid.append(gaussian_pyramid[-1])

    return laplacian_pyramid


# Build video pyramid by building Laplacian pyramid for each frame
def build_video_pyramid(frames):
    lap_video = []

    for i, frame in enumerate(frames):
        pyramid = build_laplacian_pyramid(frame, 3)
        for j in range(3):
            if i == 0:
                lap_video.append(np.zeros((len(frames), pyramid[j].shape[0], pyramid[j].shape[1], 3)))
            lap_video[j][i] = pyramid[j]

    return lap_video


# Collapse video pyramid by collapsing each frame's Laplacian pyramid
def collapse_laplacian_video_pyramid(video, frame_ct):
    collapsed_video = []

    for i in range(frame_ct):
        prev_frame = video[-1][i]

        for level in range(len(video) - 1, 0, -1):
            pyr_up_frame = cv2.pyrUp(prev_frame)
            (height, width, depth) = pyr_up_frame.shape
            prev_level_frame = video[level - 1][i]
            # cv2.resize expects dsize as (width, height)
            prev_level_frame = cv2.resize(prev_level_frame, (width, height))
            prev_frame = pyr_up_frame + prev_level_frame

        # Normalize pixel values
        min_val = min(0.0, prev_frame.min())
        prev_frame = prev_frame - min_val  # shift up so the minimum is 0
        max_val = max(1.0, prev_frame.max())
        prev_frame = prev_frame / max_val
        prev_frame = prev_frame * 255

        prev_frame = cv2.convertScaleAbs(prev_frame)
        collapsed_video.append(prev_frame)

    return collapsed_video

* The above code defines a set of functions that build Gaussian and Laplacian pyramids for images and videos, and use them to create and collapse pyramids of video frames. 
* These pyramids are a way of representing an image or video at multiple levels of detail, 
* where the higher levels contain less detail but more global information, and the lower levels contain more detail but less global information

* The build_gaussian_pyramid function takes an input image and a number of levels and returns a list of images, 
* where each image is a smoothed and downsampled version of the previous level. 
* This can be useful for various image processing tasks, such as image blending or texture synthesis.

* The build_laplacian_pyramid function takes an input image and a number of levels and returns a list of images, 
* where each image represents the difference between the corresponding level of the Gaussian pyramid and the upsampled and resized version of the next level of the Gaussian pyramid. 
* This can be useful for tasks such as image compression or feature extraction.

* The build_video_pyramid function takes a list of frames and builds a three-level Laplacian pyramid for each frame. It returns a list with one 4D array per level, each shaped (frames, height, width, 3), so that lap_video[j][i] is level j of frame i. 
* This can be useful for tasks such as video compression or motion analysis.

* The collapse_laplacian_video_pyramid function takes a Laplacian video pyramid and collapses it frame by frame: it starts from the coarsest (last) level, upsamples it, adds the next finer level, and repeats until it reaches full resolution. The result is then rescaled to the 0-255 range and converted to 8-bit. 
* This can be useful for tasks such as video denoising or super-resolution.
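
* The rescaling at the end of the collapse step can be sketched on its own. This is a minimal NumPy version, assuming the intent is to shift negative values up to zero and to scale down only when the maximum exceeds 1; the helper name to_uint8 is hypothetical, and np.round plus astype stands in for cv2.convertScaleAbs.

```python
import numpy as np


def to_uint8(frame):
    """Map a float frame to 0-255, shifting negatives up and scaling down overshoot."""
    low = min(0.0, float(frame.min()))
    shifted = frame - low               # minimum is now >= 0
    high = max(1.0, float(shifted.max()))
    scaled = shifted / high * 255.0     # maximum is now <= 255
    return np.round(scaled).astype(np.uint8)


frame = np.array([[-0.5, 0.0], [0.5, 1.5]])
out = to_uint8(frame)
print(out.min(), out.max(), out.dtype)
```

A frame that already lies inside [0, 1] passes through unchanged apart from the multiplication by 255.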

* **Overall**
* these functions provide a powerful toolset for representing and manipulating images and videos at different levels of detail,
* allowing for a wide range of image processing and computer vision applications.

* The above code defines several functions that are used to build and collapse Laplacian image pyramids.

* build_gaussian_pyramid(img, levels) function builds a Gaussian pyramid for an input image. A Gaussian pyramid is a multi-resolution image pyramid that contains successive reduced images with decreasing resolution. It is built by repeatedly applying a Gaussian blur filter to the image and down-sampling the image. The function takes an input image and the number of levels in the pyramid as arguments and returns a list of images that make up the pyramid.

* build_laplacian_pyramid(img, levels) function builds a Laplacian pyramid for an input image. A Laplacian pyramid is a multi-resolution image pyramid that contains successive high-pass filtered images. It is built by up-sampling each coarser level of the Gaussian pyramid and subtracting it from the next finer level; the coarsest Gaussian level is appended unchanged as the last entry. The function takes an input image and the number of levels in the pyramid as arguments and returns a list of images that make up the pyramid.

* build_video_pyramid(frames) function builds a Laplacian pyramid for each frame in a video. The function takes a list of frames as an argument and returns a list of arrays, one per pyramid level, each holding that level for every frame.

* collapse_laplacian_video_pyramid(video, frame_ct) function collapses the Laplacian pyramid of each frame in a video. The function takes the per-level list produced by build_video_pyramid and the number of frames in the video as arguments and returns a list of 8-bit images, one for each frame in the video.

* Overall, these functions can be used to build and collapse image pyramids, which can be useful in various computer vision and image processing applications, such as image compression, object detection, and feature extraction.
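
* The build and collapse steps can be illustrated with NumPy alone. The sketch below is not the notebook's implementation: it swaps cv2.pyrDown and cv2.pyrUp for a plain 2x2 block average and pixel repetition, and it assumes image sides that are divisible by 2 at every level. All function names here are hypothetical. It shows the property the pipeline relies on, namely that collapsing an unmodified Laplacian pyramid returns the original image.

```python
import numpy as np


def downsample(img):
    """Halve height and width by averaging each 2x2 block (stand-in for pyrDown)."""
    return (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2]) / 4.0


def upsample(img):
    """Double height and width by repeating pixels (stand-in for pyrUp)."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)


def build_laplacian(img, levels):
    gaussian = [img.astype(float)]
    for _ in range(levels - 1):
        gaussian.append(downsample(gaussian[-1]))
    # Each detail level is a Gaussian level minus the upsampled coarser level
    laplacian = [fine - upsample(coarse) for fine, coarse in zip(gaussian[:-1], gaussian[1:])]
    laplacian.append(gaussian[-1])  # coarsest level is kept as-is
    return laplacian


def collapse(laplacian):
    img = laplacian[-1]
    # Work from the coarsest level back up to full resolution
    for detail in reversed(laplacian[:-1]):
        img = upsample(img) + detail
    return img


rng = np.random.default_rng(0)
image = rng.random((8, 8, 3))
pyramid = build_laplacian(image, 3)
restored = collapse(pyramid)
print([level.shape for level in pyramid], np.allclose(restored, image))
```

In the magnification pipeline one of the levels is modified between these two calls, so the collapsed frame differs from the input by exactly the amplified band.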

* eulerian.py - Contains function for a temporal bandpass filter that uses a Fast-Fourier Transform

In [9]:
import numpy as np
import scipy.fftpack as fftpack


# Temporal bandpass filter with Fast-Fourier Transform
def fft_filter(video, freq_min, freq_max, fps):
    fft = fftpack.fft(video, axis=0)
    frequencies = fftpack.fftfreq(video.shape[0], d=1.0 / fps)
    bound_low = (np.abs(frequencies - freq_min)).argmin()
    bound_high = (np.abs(frequencies - freq_max)).argmin()
    fft[:bound_low] = 0
    fft[bound_high:-bound_high] = 0
    fft[-bound_low:] = 0
    iff = fftpack.ifft(fft, axis=0)
    result = np.abs(iff)
    result *= 100  # Amplification factor

    return result, fft, frequencies

* This code defines a function fft_filter that applies a temporal bandpass filter to a video using Fast-Fourier Transform.

* The input parameters are:

    * video: a 4D numpy array representing a video with dimensions (frames, height, width, channels)
    * freq_min: the lower frequency cutoff in Hz
    * freq_max: the upper frequency cutoff in Hz
    * fps: the frames per second of the video


* The function first applies FFT along the first dimension of the video array to obtain the frequency spectrum. 
* Then, it zeroes every frequency bin outside the band between the bins closest to freq_min and freq_max, for both positive and negative frequencies. 
* Finally, it applies inverse FFT along the first dimension and takes the absolute value to obtain the filtered video. 
* The filtered video is then multiplied by an amplification factor of 100.

* The function returns a tuple containing:

    * result: the filtered, amplified video as a 4D numpy array with the same dimensions as the input video
    * fft: the band-limited frequency spectrum (the spectrum after the out-of-band bins were set to zero)
    * frequencies: the frequencies corresponding to each element of the frequency spectrum.

* The code above defines a function fft_filter that applies a temporal bandpass filter to a video using Fast Fourier Transform (FFT). The input arguments are:

    * video: A numpy array containing the video frames, where the first dimension corresponds to the frame index and the remaining dimensions correspond to the frame shape.
    * freq_min: A float specifying the lower frequency cutoff of the filter in Hz.
    * freq_max: A float specifying the upper frequency cutoff of the filter in Hz.
    * fps: A float specifying the frame rate of the video in frames per second.

* The function first applies the FFT to the video frames along the temporal dimension using the fftpack.fft function from the scipy package. 
* It then calculates the frequencies of the FFT using the fftpack.fftfreq function. 
* The lower and upper bounds of the filter are determined based on the specified freq_min and freq_max values using the np.abs and argmin functions. 
* The frequencies outside of the bounds are set to zero in the FFT. 
* The inverse FFT is then applied using the fftpack.ifft function, and the absolute value of the result is computed and amplified by a factor of 100. 
* The resulting filtered video is returned along with the FFT and frequencies.
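
* The idea of the temporal band-pass can be checked on a synthetic signal. The sketch below is a simplified variant rather than the function above: it assumes a real-valued input, uses np.fft.rfft with a boolean mask instead of scipy.fftpack with index slicing, and leaves out the absolute value and the amplification. The function name bandpass and the test signal (a 1.2 Hz pulse plus 0.1 Hz drift) are assumptions for illustration.

```python
import numpy as np


def bandpass(samples, freq_min, freq_max, fps):
    """Zero every temporal frequency outside [freq_min, freq_max] along axis 0."""
    spectrum = np.fft.rfft(samples, axis=0)
    freqs = np.fft.rfftfreq(samples.shape[0], d=1.0 / fps)
    keep = (freqs >= freq_min) & (freqs <= freq_max)
    spectrum[~keep] = 0
    filtered = np.fft.irfft(spectrum, n=samples.shape[0], axis=0)
    return filtered, freqs


fps, n = 30, 300                               # 10 seconds of samples
t = np.arange(n) / fps
pulse = 0.05 * np.sin(2 * np.pi * 1.2 * t)     # 1.2 Hz, i.e. 72 bpm
drift = 0.5 * np.sin(2 * np.pi * 0.1 * t)      # slow change, outside the band
filtered, freqs = bandpass(pulse + drift, 1.0, 1.8, fps)
print(np.allclose(filtered, pulse))
```

The same function works on a (frames, height, width, channels) array, because the mask indexes the first axis only.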

* heartrate.py - Contains function to calculate heart rate from FFT results

In [10]:
from scipy import signal


# Calculate heart rate from FFT peaks
def find_heart_rate(fft, freqs, freq_min, freq_max):
    fft_maximums = []

    for i in range(fft.shape[0]):
        if freq_min <= freqs[i] <= freq_max:
            fftMap = abs(fft[i])
            fft_maximums.append(fftMap.max())
        else:
            fft_maximums.append(0)

    peaks, properties = signal.find_peaks(fft_maximums)
    max_peak = -1
    max_freq = 0

    # Find frequency with max amplitude in peaks
    for peak in peaks:
        if fft_maximums[peak] > max_freq:
            max_freq = fft_maximums[peak]
            max_peak = peak

    # No peak inside the band: report that instead of indexing freqs[-1]
    if max_peak < 0:
        return None

    return freqs[max_peak] * 60

* This code defines a function called find_heart_rate that takes as input the result of a Fast Fourier Transform (fft), the frequencies that correspond to each point in the FFT (freqs), and the minimum and maximum frequency values of interest (freq_min and freq_max), and returns an estimated heart rate in beats per minute.

* First, the function calculates the maximum amplitude of the FFT for each frequency, 
* but only considers the maximums within the frequency range of interest (freq_min and freq_max), and stores these values in the list fft_maximums.

* Next, the function uses the find_peaks function from the signal module in SciPy to find the indices of the local maxima in fft_maximums. 
* These peaks correspond to the most prominent frequencies in the FFT.

* Finally, the function loops over the indices of the peaks to find the peak with the highest amplitude in fft_maximums. 
* This peak corresponds to the frequency with the highest amplitude in the FFT, and therefore the most likely heart rate. 
* The function returns this frequency in beats per minute, which is calculated by multiplying the frequency in Hz by 60.
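
* Since the estimate is always the frequency of one FFT bin, its precision is limited by the bin spacing, which is fps divided by the number of frames. The short calculation below makes that concrete; the helper name bpm_resolution and the example clip lengths are assumptions for illustration.

```python
def bpm_resolution(fps, frame_count):
    """Spacing between neighboring FFT bins, expressed in beats per minute."""
    return 60.0 * fps / frame_count


# A 10-second clip at 30 fps can only distinguish heart rates 6 bpm apart
print(bpm_resolution(30, 300))  # 6.0
# Tripling the clip length narrows the spacing to 2 bpm
print(bpm_resolution(30, 900))  # 2.0
```

Longer recordings therefore give a finer heart rate estimate, independent of how the peak is picked.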

* The code defines a function find_heart_rate() that takes in the Fast Fourier Transform (FFT) results, frequencies, the minimum and maximum frequency of the signal of interest, and calculates the heart rate from the FFT peaks.

* The function first initializes an empty list fft_maximums to store the maximum FFT values for the frequencies in the range between freq_min and freq_max. It then iterates over the FFT results fft along its first dimension (which, after the transform, indexes frequency bins rather than time), and for each frequency in the range, finds the maximum magnitude over all pixels and channels in the corresponding FFT map and adds it to fft_maximums. If the frequency is outside the range, 0 is added to fft_maximums.

* Next, the function uses signal.find_peaks() from the SciPy signal processing module to find the peaks in fft_maximums. The function then finds the maximum peak in fft_maximums and returns the corresponding frequency, multiplied by 60 to convert it to beats per minute, as the heart rate.

* In summary, the find_heart_rate() function finds the heart rate from the FFT peaks of a signal in the frequency range of interest.
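
* The peak-picking idea can be tried on a one-dimensional trace. The sketch below is a simplified variant, not the function above: it assumes a single averaged signal instead of a per-pixel spectrum, takes the largest in-band bin with np.argmax instead of calling signal.find_peaks, and returns None when no bin falls inside the band. The function name dominant_bpm and the synthetic trace are hypothetical.

```python
import numpy as np


def dominant_bpm(trace, fps, freq_min, freq_max):
    """Return the strongest in-band frequency in beats per minute, or None."""
    spectrum = np.abs(np.fft.rfft(trace - trace.mean()))
    freqs = np.fft.rfftfreq(len(trace), d=1.0 / fps)
    in_band = (freqs >= freq_min) & (freqs <= freq_max)
    if not in_band.any():
        return None  # the band contains no FFT bin at all
    band_freqs = freqs[in_band]
    strongest = np.argmax(spectrum[in_band])
    return float(band_freqs[strongest] * 60.0)


fps, n = 30, 300
t = np.arange(n) / fps
trace = 0.5 + 0.05 * np.sin(2 * np.pi * 1.2 * t)  # synthetic 1.2 Hz pulse
bpm = dominant_bpm(trace, fps, 1.0, 1.8)
print(round(bpm, 1))  # 72.0
```

Returning None for an empty band makes the failure visible to the caller instead of producing a number that looks like a heart rate.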

* The main.py file contains the main program that utilizes all of the other modules defined in the other code files to read in the input video, run Eulerian magnification on it, and to display the results.

In [15]:
import cv2
# import pyramids
# import heartrate
# import preprocessing
# import eulerian

# Frequency range for Fast-Fourier Transform
freq_min = 1
freq_max = 1.8

# Preprocessing phase
print("Reading + preprocessing video...")
# video_frames, frame_ct, fps = preprocessing.read_video("videos/rohin_active.mov")
video_frames, frame_ct, fps = read_video("videos/rohin_active.mov")

# Build Laplacian video pyramid
print("Building Laplacian video pyramid...")
# lap_video = pyramids.build_video_pyramid(video_frames)
lap_video = build_video_pyramid(video_frames)

amplified_video_pyramid = []

for i, video in enumerate(lap_video):
    if i == 0 or i == len(lap_video)-1:
        continue

    # Eulerian magnification with temporal FFT filtering
    print("Running FFT and Eulerian magnification...")
#     result, fft, frequencies = eulerian.fft_filter(video, freq_min, freq_max, fps)
    result, fft, frequencies = fft_filter(video, freq_min, freq_max, fps)
    lap_video[i] += result

    # Calculate heart rate
    print("Calculating heart rate...")
#     heart_rate = heartrate.find_heart_rate(fft, frequencies, freq_min, freq_max)
    heart_rate = find_heart_rate(fft, frequencies, freq_min, freq_max)

# Collapse laplacian pyramid to generate final video
print("Rebuilding final video...")
# amplified_frames = pyramids.collapse_laplacian_video_pyramid(lap_video, frame_ct)
amplified_frames = collapse_laplacian_video_pyramid(lap_video, frame_ct)

# Output heart rate and final video
print("Heart rate: ", heart_rate, "bpm")
print("Displaying final video...")

for frame in amplified_frames:
    cv2.imshow("frame", frame)
    cv2.waitKey(20)

Reading + preprocessing video...
Building Laplacian video pyramid...
Running FFT and Eulerian magnification...
Calculating heart rate...
Rebuilding final video...
Heart rate:  62.30031948881789 bpm
Displaying final video...


* To recap the preprocessing step, read_video() uses OpenCV and dlib to perform the following tasks:
    * Load the face detector from dlib.
    * Read in the video from the given path and query its FPS.
    * Make a grayscale copy of each frame and detect the face at the start of the video.
    * Select the region of interest (ROI) around the detected face and resize it to (500, 500).
    * Normalize the ROI to a range of [0, 1] and append it to a list of pre-processed frames.

* The function read_video() takes a single argument, path, which is the path to the video file. The function returns a tuple consisting of:
    * a list of pre-processed frames
    * the number of frames in that list
    * the FPS of the video, truncated to an integer.

* Note that the code assumes that there is only one face in the video, and it only detects the face once, at the start. 
* If there are multiple faces, only the last detected box is used, and if the face moves, the crop does not follow it.