# Convolution: A Fundamental Operation for Motion Energy Models

## Overview

Convolution is a mathematical operation that is central to understanding how motion energy models work. In this section, we'll explore convolution in detail, focusing on its application to visual processing and motion detection.

### What we'll cover:
- The mathematical definition of convolution
- The intuition behind convolution in signal processing
- 1D and 2D convolution operations
- Implementing convolution from scratch
- Convolution with various kernels and filters
- The relationship between convolution and visual processing in the brain

## Setting Up

Let's import the libraries we'll need for this section.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import scipy.signal as signal
import sys
from matplotlib import animation

# Add the utils package to the path
sys.path.append('../../..')
try:
    from motionenergy.utils import stimuli_generation, visualization
except ImportError:
    print("Note: utils modules not found. This is expected if you haven't implemented them yet.")

# For interactive plots
%matplotlib inline
from IPython.display import HTML, display

# Set plotting style
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (10, 6)
plt.rcParams['font.size'] = 12

## 1. Introduction to Convolution

Convolution is a mathematical operation that combines two functions to produce a third function that represents how the shape of one is modified by the other. In signal processing, convolution is often used to describe how an input signal is affected by a system or filter.

### The Convolution Equation

For continuous functions, the convolution of functions $f$ and $g$ is defined as:

$(f * g)(t) = \int_{-\infty}^{\infty} f(\tau) g(t - \tau) d\tau$

For discrete functions (like our digital images and signals), the convolution becomes a sum:

$(f * g)[n] = \sum_{m=-\infty}^{\infty} f[m] g[n - m]$
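
As a small worked example of the discrete sum, the sketch below evaluates it term by term for two short sequences and compares the result with NumPy's `np.convolve`. The helper name `conv_sum` and the sample values are illustrative assumptions, not part of the course code.

```python
import numpy as np

def conv_sum(f, g):
    """Evaluate (f * g)[n] = sum_m f[m] g[n - m] for finite sequences."""
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    out = np.zeros(len(f) + len(g) - 1)
    for n in range(len(out)):
        for m in range(len(f)):
            # Terms with n - m outside g's index range contribute zero
            if 0 <= n - m < len(g):
                out[n] += f[m] * g[n - m]
    return out

f = [1, 2, 3]
g = [0, 1, 0.5]
result = conv_sum(f, g)
print(result)                                  # values: 0, 1, 2.5, 4, 1.5
print(np.allclose(result, np.convolve(f, g)))  # True
```

For instance, the entry at $n = 2$ is $f[0]g[2] + f[1]g[1] + f[2]g[0] = 0.5 + 2 + 0 = 2.5$.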

While these equations may look intimidating, the intuition behind convolution is actually quite simple. Let's visualize it to build our understanding.

## 2. Intuitive Understanding of Convolution

Convolution can be intuitively understood as a process where one function is "flipped" and slid over another function, with the resulting function being the sum of the pointwise products at each position.
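
The role of the flip is easiest to see with an asymmetric kernel. The following sketch (variable names are illustrative) slides a kernel over a zero-padded signal twice, once flipped and once unflipped. Only the flipped version reproduces `np.convolve`; the unflipped version is cross-correlation.

```python
import numpy as np

f = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([1.0, 0.0, -1.0])   # asymmetric, so the flip matters

# Zero-pad the signal so the kernel can hang over both ends
pad = len(g) - 1
f_padded = np.pad(f, pad)
n_out = len(f) + pad

# Slide the flipped kernel: sum of pointwise products at each position
slid_flipped = np.array([
    np.sum(f_padded[i:i + len(g)] * g[::-1]) for i in range(n_out)
])

# Slide the kernel without flipping it
slid_unflipped = np.array([
    np.sum(f_padded[i:i + len(g)] * g) for i in range(n_out)
])

print(np.allclose(slid_flipped, np.convolve(f, g, mode='full')))     # True
print(np.allclose(slid_unflipped, np.correlate(f, g, mode='full')))  # True
```

For symmetric kernels such as a Gaussian, the two operations coincide, which is why the flip is easy to overlook.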

Let's visualize this process for a simple 1D example:

In [None]:
def visualize_1d_convolution(signal_func, kernel_func, signal_range, kernel_range, num_positions=5):
    """
    Visualize the convolution process for 1D signals.
    
    Parameters:
    -----------
    signal_func : function
        Function that generates the signal array
    kernel_func : function
        Function that generates the kernel array
    signal_range : tuple
        (start, end) of the signal domain
    kernel_range : tuple
        (start, end) of the kernel domain
    num_positions : int
        Number of kernel positions to visualize
    """
    # Generate the signal and kernel on grids with the same sample spacing,
    # so that the discrete convolution approximates the continuous integral
    t_signal = np.linspace(signal_range[0], signal_range[1], 1000)
    dt = t_signal[1] - t_signal[0]
    signal = np.asarray(signal_func(t_signal), dtype=float)
    
    t_kernel = np.arange(kernel_range[0], kernel_range[1] + dt / 2, dt)
    kernel = np.asarray(kernel_func(t_kernel), dtype=float)
    
    # Normalize the kernel for visualization
    kernel = kernel / np.max(np.abs(kernel))
    
    # Compute the full discrete linear convolution.
    # The local variable `signal` shadows the scipy.signal module inside this
    # function, so we call np.convolve here; multiplying by dt approximates
    # the continuous convolution integral.
    conv_result = np.convolve(signal, kernel, mode='full') * dt
    
    # Time axis for the convolution result: output sample n corresponds to
    # time t_signal[0] + t_kernel[0] + n * dt
    t_conv = t_signal[0] + t_kernel[0] + dt * np.arange(len(conv_result))
    
    # Select positions for visualization
    positions = np.linspace(0, len(t_signal) - 1, num_positions, dtype=int)
    
    # Create the figure
    fig, axes = plt.subplots(num_positions + 1, 1, figsize=(10, 2 + 2 * num_positions))
    
    # Plot the signal and kernel at different positions
    for i, pos in enumerate(positions):
        ax = axes[i]
        
        # Plot the signal
        ax.plot(t_signal, signal, 'b-', label='Signal')
        
        # Position for the kernel
        t_pos = t_signal[pos]
        
        # Plot the flipped and shifted kernel
        # In convolution, the kernel is flipped and shifted
        t_kernel_shifted = t_pos + np.flip(-t_kernel + t_kernel[0] + t_kernel[-1])
        kernel_shifted = np.flip(kernel)
        ax.plot(t_kernel_shifted, kernel_shifted, 'r-', label='Kernel')
        
        # Fill the area representing the product
        # We need to interpolate the signal values at the kernel positions
        y_interp = np.interp(t_kernel_shifted, t_signal, signal)
        product = y_interp * kernel_shifted
        ax.fill_between(t_kernel_shifted, 0, product, alpha=0.3, color='purple')
        
        # Mark the convolution result for this position: use the output sample
        # whose time coordinate is closest to t_pos
        conv_pos = int(np.argmin(np.abs(t_conv - t_pos)))
        if conv_pos < len(t_conv):
            ax.plot([t_pos], [conv_result[conv_pos]], 'ko', markersize=5)
            ax.text(t_pos, conv_result[conv_pos], f'  ({t_pos:.1f}, {conv_result[conv_pos]:.2f})')
        
        ax.set_title(f'Kernel Position {i+1}')
        ax.legend()
        ax.grid(True)
    
    # Plot the full convolution result
    ax = axes[-1]
    ax.plot(t_conv, conv_result, 'g-', label='Convolution Result')
    
    # Mark the positions we visualized above
    for i, pos in enumerate(positions):
        t_pos = t_signal[pos]
        conv_pos = int(np.argmin(np.abs(t_conv - t_pos)))
        if conv_pos < len(t_conv):
            ax.plot([t_pos], [conv_result[conv_pos]], 'ko', markersize=5)
    
    ax.set_title('Full Convolution Result')
    ax.legend()
    ax.grid(True)
    
    plt.tight_layout()
    return fig

In [None]:
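# Example 1: Square wave signal and rectangular kernel
def square_wave(t):
    # A single rectangular pulse on (-2, 2), returned as floats rather than booleans
    return ((t > -2) & (t < 2)).astype(float)

def rectangular_kernel(t):
    # Rectangular window on (-0.5, 0.5), returned as floats rather than booleans
    return ((t > -0.5) & (t < 0.5)).astype(float)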

# Visualize the convolution
fig = visualize_1d_convolution(
    square_wave, rectangular_kernel, 
    signal_range=(-5, 5), kernel_range=(-1, 1), 
    num_positions=5
)
plt.suptitle('Convolution of Square Wave with Rectangular Kernel', fontsize=16)
plt.tight_layout(rect=[0, 0, 1, 0.97])  # Make space for the suptitle

Let's try another example with a Gaussian kernel, which is commonly used in signal processing and vision:

In [None]:
# Example 2: Square wave signal and Gaussian kernel
def gaussian_kernel(t, sigma=0.5):
    return np.exp(-t**2 / (2 * sigma**2))

# Visualize the convolution
fig = visualize_1d_convolution(
    square_wave, gaussian_kernel, 
    signal_range=(-5, 5), kernel_range=(-2, 2), 
    num_positions=5
)
plt.suptitle('Convolution of Square Wave with Gaussian Kernel', fontsize=16)
plt.tight_layout(rect=[0, 0, 1, 0.97])  # Make space for the suptitle

## 3. Implementing 1D Convolution from Scratch

To deepen our understanding, let's implement the convolution operation from scratch for 1D signals. This will help us see exactly what's happening at each step.

In [None]:
def convolve_1d(signal, kernel):
    """
    Implement 1D convolution from scratch.
    
    Parameters:
    -----------
    signal : ndarray
        Input signal
    kernel : ndarray
        Convolution kernel
        
    Returns:
    --------
    output : ndarray
        Convolved signal (using 'full' mode)
    """
    # Output size for 'full' convolution
    output_size = len(signal) + len(kernel) - 1
    output = np.zeros(output_size)
    
    # Perform the convolution: output[i] = sum over m of signal[m] * kernel[i - m]
    for i in range(output_size):
        # Index ranges of the kernel and signal samples that overlap at position i
        kernel_start = max(0, i - len(signal) + 1)
        kernel_end = min(len(kernel), i + 1)
        signal_start = max(0, i - len(kernel) + 1)
        signal_end = min(len(signal), i + 1)
        
        # Extract the overlapping parts. Reversing the kernel slice is the
        # "flip" in convolution: it pairs signal[m] with kernel[i - m]
        k = kernel[kernel_start:kernel_end][::-1]
        s = signal[signal_start:signal_end]
        
        # Compute the sum of the pointwise product
        output[i] = np.sum(k * s)
    
    return output

Let's compare our implementation with the built-in convolution function:

In [None]:
# Create a simple signal and kernel
# (named sig and kern so that the imported scipy.signal module is not overwritten)
sig = np.array([1, 2, 3, 4, 5])
kern = np.array([0.5, 0.5, 0.5])

# Compute the convolution using our function
our_result = convolve_1d(sig, kern)

# Compute the convolution using scipy's function
scipy_result = signal.convolve(sig, kern, mode='full')

# Compare the results
fig, axes = plt.subplots(3, 1, figsize=(10, 8))

# Plot the signal and kernel
axes[0].stem(range(len(sig)), sig, 'b', markerfmt='bo', basefmt=' ')
axes[0].set_title('Signal')

axes[1].stem(range(len(kern)), kern, 'r', markerfmt='ro', basefmt=' ')
axes[1].set_title('Kernel')

# Plot the results
x = range(len(our_result))
axes[2].stem(x, our_result, 'g', markerfmt='go', basefmt=' ', label='Our Convolution')
axes[2].plot(x, scipy_result, 'k--', label='SciPy Convolution')
axes[2].set_title('Convolution Results')
axes[2].legend()

plt.tight_layout()
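
A symmetric kernel such as `[0.5, 0.5, 0.5]` gives the same output whether or not the kernel is flipped correctly, so an asymmetric kernel makes a stricter check. The sketch below is self-contained and uses a different, shift-and-add formulation of convolution. The helper name `convolve_1d_shift_add` and the test values are assumptions for illustration.

```python
import numpy as np

def convolve_1d_shift_add(x, h):
    """'Full' 1D convolution as a sum of shifted, scaled copies of h."""
    x = np.asarray(x, dtype=float)
    h = np.asarray(h, dtype=float)
    out = np.zeros(len(x) + len(h) - 1)
    for m, x_m in enumerate(x):
        # Each input sample x[m] adds a scaled copy of the kernel starting at index m
        out[m:m + len(h)] += x_m * h
    return out

x = np.array([1, 2, 3, 4, 5])
h = np.array([1.0, 0.5, 0.25])   # asymmetric kernel

ours = convolve_1d_shift_add(x, h)
reference = np.convolve(x, h, mode='full')
print(np.allclose(ours, reference))  # True
```

This view (every input sample launches a scaled copy of the kernel) is mathematically equivalent to the flip-and-slide view, and it is often the easier one to reason about for impulse responses.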

## 4. 2D Convolution and Its Applications in Image Processing

Now let's extend our understanding to 2D convolution, which is essential for image processing and will be critical for our motion energy models.

In 2D convolution, we slide a 2D kernel, flipped along both axes, over a 2D image, computing the sum of the element-wise products at each position. The mathematical formula is similar to the 1D case, but with double summations for rows and columns:

$(I * K)[i, j] = \sum_{m} \sum_{n} I[i-m, j-n] K[m, n]$

Where $I$ is the image and $K$ is the kernel.
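
To make the double sum concrete, here is a minimal sketch that transcribes it literally with nested loops and compares the result with `scipy.signal.convolve2d`. The helper name `conv2d_direct` and the small test arrays are illustrative assumptions. The loops are written for readability, not speed.

```python
import numpy as np
from scipy.signal import convolve2d

def conv2d_direct(image, kernel):
    """Literal transcription of the 2D convolution sum ('full' output)."""
    n_rows = image.shape[0] + kernel.shape[0] - 1
    n_cols = image.shape[1] + kernel.shape[1] - 1
    out = np.zeros((n_rows, n_cols))
    for i in range(n_rows):
        for j in range(n_cols):
            for m in range(kernel.shape[0]):
                for n in range(kernel.shape[1]):
                    # Skip terms where I[i - m, j - n] falls outside the image
                    if 0 <= i - m < image.shape[0] and 0 <= j - n < image.shape[1]:
                        out[i, j] += image[i - m, j - n] * kernel[m, n]
    return out

image = np.arange(12, dtype=float).reshape(3, 4)
kernel = np.array([[1.0, 2.0], [0.0, -1.0]])  # asymmetric on purpose

direct = conv2d_direct(image, kernel)
reference = convolve2d(image, kernel, mode='full')
print(direct.shape)                    # (4, 5)
print(np.allclose(direct, reference))  # True
```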

Let's implement 2D convolution and explore some common image processing kernels:

In [None]:
def convolve_2d(image, kernel):
    """
    Implement 2D convolution from scratch.
    
    Parameters:
    -----------
    image : ndarray
        Input image (2D array)
    kernel : ndarray
        Convolution kernel (2D array)
        
    Returns:
    --------
    output : ndarray
        Convolved image (using 'full' mode)
    """
    # Get dimensions
    image_rows, image_cols = image.shape
    kernel_rows, kernel_cols = kernel.shape
    
    # Output dimensions for 'full' convolution
    output_rows = image_rows + kernel_rows - 1
    output_cols = image_cols + kernel_cols - 1
    output = np.zeros((output_rows, output_cols))
    
    # Perform the convolution:
    # output[i, j] = sum over (m, n) of image[i - m, j - n] * kernel[m, n]
    for i in range(output_rows):
        for j in range(output_cols):
            # Index ranges of the kernel and image samples that overlap at position (i, j)
            kernel_row_start = max(0, i - image_rows + 1)
            kernel_row_end = min(kernel_rows, i + 1)
            kernel_col_start = max(0, j - image_cols + 1)
            kernel_col_end = min(kernel_cols, j + 1)
            
            img_row_start = max(0, i - kernel_rows + 1)
            img_row_end = min(image_rows, i + 1)
            img_col_start = max(0, j - kernel_cols + 1)
            img_col_end = min(image_cols, j + 1)
            
            # Extract the overlapping parts. Reversing the kernel slice along
            # both axes is the "flip" in convolution
            k = kernel[kernel_row_start:kernel_row_end, kernel_col_start:kernel_col_end][::-1, ::-1]
            img = image[img_row_start:img_row_end, img_col_start:img_col_end]
            
            # Compute the sum of the pointwise product
            output[i, j] = np.sum(k * img)
    
    return output

Let's create some common image processing kernels and see them in action:

In [None]:
# Define some common kernels
kernels = {
    'Box blur': np.ones((3, 3)) / 9,
    'Gaussian blur': np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]]) / 16,
    'Sharpen': np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]]),
    'Edge detection': np.array([[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]]),
    'Sobel X': np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]),
    'Sobel Y': np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]])
}

# Visualize the kernels
fig, axes = plt.subplots(2, 3, figsize=(15, 8))
axes = axes.flatten()

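for i, (name, kernel) in enumerate(kernels.items()):
    # Use symmetric color limits per kernel so large weights (e.g. 5 or 8) are not clipped
    vlim = np.max(np.abs(kernel))
    im = axes[i].imshow(kernel, cmap='RdBu', vmin=-vlim, vmax=vlim)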
    axes[i].set_title(name)
    axes[i].set_xticks([])
    axes[i].set_yticks([])
    plt.colorbar(im, ax=axes[i])

plt.tight_layout()

Let's create a simple test image and apply these kernels to see their effects:

In [None]:
# Create a simple test image
def create_test_image(size=64):
    # Create an empty image
    image = np.zeros((size, size))
    
    # Add a square
    image[size//4:3*size//4, size//4:3*size//4] = 0.5
    
    # Add a smaller square with higher intensity
    image[3*size//8:5*size//8, 3*size//8:5*size//8] = 1.0
    
    return image

# Create the test image
test_image = create_test_image(64)

# Apply the kernels and display the results
fig, axes = plt.subplots(2, 4, figsize=(16, 8))
axes = axes.flatten()

# Display the original image
axes[0].imshow(test_image, cmap='gray')
axes[0].set_title('Original Image')
axes[0].set_xticks([])
axes[0].set_yticks([])

# Apply each kernel and display the result
for i, (name, kernel) in enumerate(kernels.items(), 1):
    # Apply the kernel using SciPy's convolve2d function
    # We use 'same' mode to get an output of the same size as the input
    filtered = signal.convolve2d(test_image, kernel, mode='same', boundary='symm')
    
    # Display the filtered image
    im = axes[i].imshow(filtered, cmap='gray')
    axes[i].set_title(name)
    axes[i].set_xticks([])
    axes[i].set_yticks([])

# Hide the unused eighth panel (seven images are shown in a 2x4 grid)
axes[-1].axis('off')

plt.tight_layout()

## 5. Separable Convolution: A Computational Advantage

In some cases, a 2D convolution kernel can be expressed as the outer product of two 1D kernels. Such kernels are called "separable", and the 2D convolution can then be performed as two sequential 1D convolutions, one along each axis. For a $k \times k$ kernel, this reduces the cost per output pixel from $k^2$ multiplications to $2k$.
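
A kernel is separable exactly when its matrix has rank 1, which can be checked with a singular value decomposition. The sketch below applies this test to three of the 3x3 kernels from the previous section. The helper name `separate` and the tolerance are assumptions for illustration.

```python
import numpy as np

example_kernels = {
    'Box blur': np.ones((3, 3)) / 9,
    'Sobel X': np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float),
    'Edge detection': np.array([[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]], dtype=float),
}

def separate(kernel_2d, tol=1e-10):
    """Return (column, row) 1D factors if the kernel has rank 1, else None."""
    u, s, vt = np.linalg.svd(kernel_2d)
    # Count singular values that are non-negligible relative to the largest
    if np.sum(s > tol * s[0]) != 1:
        return None
    col = u[:, 0] * np.sqrt(s[0])
    row = vt[0, :] * np.sqrt(s[0])
    return col, row

for name, k in example_kernels.items():
    factors = separate(k)
    if factors is None:
        print(f"{name}: not separable")
    else:
        col, row = factors
        err = np.max(np.abs(np.outer(col, row) - k))
        print(f"{name}: separable (reconstruction error {err:.1e})")
```

The box blur and Sobel X kernels factor into an outer product, while the 8-neighbor edge detection kernel has rank 2 and does not.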

For example, a 2D Gaussian kernel is separable into the outer product of two 1D Gaussian kernels:

$G(x, y) = G_x(x) \cdot G_y(y)$

Let's demonstrate this with a Gaussian filter:

In [None]:
# Create 1D Gaussian kernels
def gaussian_1d(size, sigma=1.0):
    # Sample positions centered on zero, e.g. -4, ..., 4 for size=9
    x = np.arange(size) - (size - 1) / 2
    g = np.exp(-x**2 / (2 * sigma**2))
    # Normalize so the kernel sums to 1
    return g / np.sum(g)

# Create a separable 2D Gaussian
size = 9
sigma = 1.5

g_x = gaussian_1d(size, sigma)
g_y = gaussian_1d(size, sigma)

# Create the 2D Gaussian as the outer product
g_2d_separable = np.outer(g_y, g_x)

# Normalize (the sum is already close to 1 because g_x and g_y each sum to 1)
g_2d_separable /= np.sum(g_2d_separable)

# Create the same 2D Gaussian directly from the 2D formula
# (it should match the outer product up to floating-point error)
coords = np.arange(size) - (size - 1) / 2
x, y = np.meshgrid(coords, coords)
g_2d_direct = np.exp(-(x**2 + y**2) / (2 * sigma**2))
g_2d_direct /= np.sum(g_2d_direct)

# Compare the two methods
fig, axes = plt.subplots(1, 3, figsize=(15, 5))

# 1D kernels
axes[0].plot(g_x, 'b-', label='G_x')
axes[0].plot(g_y, 'r--', label='G_y')
axes[0].set_title('1D Gaussian Kernels')
axes[0].legend()

# 2D separable Gaussian
im1 = axes[1].imshow(g_2d_separable, cmap='viridis')
axes[1].set_title('2D Gaussian (Separable)')
plt.colorbar(im1, ax=axes[1])

# 2D direct Gaussian
im2 = axes[2].imshow(g_2d_direct, cmap='viridis')
axes[2].set_title('2D Gaussian (Direct)')
plt.colorbar(im2, ax=axes[2])

plt.tight_layout()

Let's compare the computational efficiency of separable vs. direct 2D convolution:

In [None]:
import time

# Create a larger test image for better timing
large_image = create_test_image(256)

# Function to apply separable convolution
def apply_separable_convolution(image, kernel_x, kernel_y):
    # Convolve along rows with kernel_x as a 1 x k row kernel,
    # using the same boundary handling as the direct 2D convolution
    temp = signal.convolve2d(image, kernel_x[np.newaxis, :], mode='same', boundary='symm')
    
    # Convolve along columns with kernel_y as a k x 1 column kernel
    result = signal.convolve2d(temp, kernel_y[:, np.newaxis], mode='same', boundary='symm')
    
    return result

# Time the methods
# Direct 2D convolution
start_time = time.time()
result_direct = signal.convolve2d(large_image, g_2d_direct, mode='same', boundary='symm')
direct_time = time.time() - start_time

# Separable 2D convolution
start_time = time.time()
result_separable = apply_separable_convolution(large_image, g_x, g_y)
separable_time = time.time() - start_time

print(f"Direct 2D convolution time: {direct_time:.4f} seconds")
print(f"Separable 2D convolution time: {separable_time:.4f} seconds")
print(f"Speedup: {direct_time / separable_time:.2f}x")

# Compare the results
error = np.max(np.abs(result_direct - result_separable))
print(f"Maximum absolute error: {error:.10f}")

# Visualize the results
fig, axes = plt.subplots(1, 3, figsize=(15, 5))

axes[0].imshow(large_image, cmap='gray')
axes[0].set_title('Original Image')
axes[0].set_xticks([])
axes[0].set_yticks([])

axes[1].imshow(result_direct, cmap='gray')
axes[1].set_title(f'Direct 2D Convolution\n({direct_time:.4f} s)')
axes[1].set_xticks([])
axes[1].set_yticks([])

axes[2].imshow(result_separable, cmap='gray')
axes[2].set_title(f'Separable 2D Convolution\n({separable_time:.4f} s)')
axes[2].set_xticks([])
axes[2].set_yticks([])

plt.tight_layout()

## 6. Convolution in the Frequency Domain

One of the important properties of convolution is that it corresponds to multiplication in the frequency domain. The Fourier transform of the convolution of two functions is equal to the product of their Fourier transforms:

$\mathcal{F}\{f * g\} = \mathcal{F}\{f\} \cdot \mathcal{F}\{g\}$

This property is particularly useful for understanding how filtering works in the frequency domain and for implementing efficient convolution using the Fast Fourier Transform (FFT). We'll explore this more in the next section on Fourier transforms.
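
As a quick numerical check of the convolution theorem, the sketch below multiplies the FFTs of two random sequences and compares the inverse transform with a direct convolution. The sequence lengths and variable names are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.standard_normal(64)
g = rng.standard_normal(9)

# Linear convolution has length len(f) + len(g) - 1. Zero-padding both
# inputs to that length avoids the wrap-around of circular convolution.
n = len(f) + len(g) - 1
F = np.fft.rfft(f, n=n)
G = np.fft.rfft(g, n=n)
conv_fft = np.fft.irfft(F * G, n=n)

conv_direct = np.convolve(f, g, mode='full')
print(np.allclose(conv_fft, conv_direct))  # True
```

Without the zero-padding, the product of the transforms would correspond to circular rather than linear convolution.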

## 7. Convolution in Visual Processing

In the context of visual neuroscience and motion energy models, convolution plays a crucial role in modeling how neurons process visual information.

### Receptive Fields

Neurons in the visual cortex have spatially localized receptive fields, which determine how they respond to visual stimuli. These receptive fields can be modeled as convolution kernels that are applied to the visual input. The output of the convolution represents the neuron's response to the visual stimulus.

### Hierarchical Processing

The visual system processes information hierarchically, with each level of processing building upon the previous one. Convolution provides a mathematical framework for understanding this hierarchical processing:

1. At the retinal level, center-surround receptive fields can be modeled as difference-of-Gaussians kernels
2. In the primary visual cortex (V1), simple cells have oriented receptive fields that can be modeled as Gabor filters
3. Complex cells in V1 integrate the responses of multiple simple cells, which can be modeled as a nonlinear combination of the outputs of multiple convolutions
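
The three stages above can be sketched as convolution kernels. This is a minimal illustration under assumed parameters: the kernel sizes, sigmas, wavelength, and the helper names `difference_of_gaussians`, `gabor_pair`, and `complex_cell_energy` are all hypothetical, and summing the squared outputs of a quadrature pair of Gabor filters is one common way to model a complex cell, not the only one.

```python
import numpy as np
from scipy.signal import convolve2d

def grid(size):
    coords = np.arange(size) - (size - 1) / 2
    return np.meshgrid(coords, coords)  # x varies along columns, y along rows

def difference_of_gaussians(size=15, sigma_center=1.0, sigma_surround=2.0):
    """Center-surround kernel: narrow Gaussian minus broad Gaussian."""
    x, y = grid(size)
    r2 = x**2 + y**2
    center = np.exp(-r2 / (2 * sigma_center**2))
    surround = np.exp(-r2 / (2 * sigma_surround**2))
    # Normalizing each Gaussian to sum 1 makes the kernel sum to zero
    return center / center.sum() - surround / surround.sum()

def gabor_pair(size=15, sigma=3.0, wavelength=6.0, theta=0.0):
    """Even (cosine) and odd (sine) Gabor kernels.

    theta is the direction, in radians, along which the carrier varies.
    """
    x, y = grid(size)
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    phase = 2 * np.pi * x_rot / wavelength
    return envelope * np.cos(phase), envelope * np.sin(phase)

def complex_cell_energy(image, theta):
    """Sum of squared responses of a quadrature pair of Gabor filters."""
    even, odd = gabor_pair(theta=theta)
    r_even = convolve2d(image, even, mode='valid')
    r_odd = convolve2d(image, odd, mode='valid')
    return r_even**2 + r_odd**2

# A center-surround kernel gives no response to uniform illumination
dog = difference_of_gaussians()
uniform_response = convolve2d(np.ones((32, 32)), dog, mode='valid')

# Vertical grating: luminance varies along x only
x, y = grid(64)
grating = np.cos(2 * np.pi * x / 6.0)
energy_matched = complex_cell_energy(grating, theta=0.0).mean()
energy_orthogonal = complex_cell_energy(grating, theta=np.pi / 2).mean()
print(energy_matched > energy_orthogonal)  # True
```

The energy is much larger when the Gabor carrier matches the grating's orientation, and because the even and odd responses are squared and summed, it varies little with the grating's phase.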

### Motion Energy Models

In motion energy models, convolution with spatiotemporal filters is used to extract motion information from the visual input. These filters are selective for specific directions and speeds of motion. We'll explore these spatiotemporal filters in more detail in later sections of the course.
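
As a preview, here is a minimal sketch of direction selectivity in one spatial dimension plus time. It assumes a quadrature pair of Gabor-like filters oriented in the (x, t) plane and takes the sum of their squared outputs as the motion energy. The filter shape, sizes, and the helper names `spacetime_gabor_pair` and `motion_energy` are illustrative assumptions, not the filters developed later in the course.

```python
import numpy as np
from scipy.signal import convolve2d

def spacetime_gabor_pair(size=21, sigma=3.0, wavelength=8.0, direction=+1):
    """Quadrature pair of filters oriented in the (x, t) plane.

    direction=+1 prefers patterns that depend on x - t (rightward motion at
    one pixel per frame); direction=-1 prefers x + t (leftward motion).
    """
    coords = np.arange(size) - (size - 1) / 2
    t, x = np.meshgrid(coords, coords, indexing='ij')  # rows = time, columns = space
    envelope = np.exp(-(x**2 + t**2) / (2 * sigma**2))
    phase = 2 * np.pi * (x - direction * t) / wavelength
    return envelope * np.cos(phase), envelope * np.sin(phase)

def motion_energy(stimulus, direction):
    """Mean of the summed squared outputs of the quadrature pair."""
    even, odd = spacetime_gabor_pair(direction=direction)
    r_even = convolve2d(stimulus, even, mode='valid')
    r_odd = convolve2d(stimulus, odd, mode='valid')
    return np.mean(r_even**2 + r_odd**2)

# Rightward-drifting grating: luminance depends on x - t
n_t, n_x = 64, 64
t, x = np.meshgrid(np.arange(n_t), np.arange(n_x), indexing='ij')
stimulus = np.cos(2 * np.pi * (x - t) / 8.0)

e_right = motion_energy(stimulus, direction=+1)
e_left = motion_energy(stimulus, direction=-1)
print(e_right > e_left)  # True
```

The rightward-tuned pair responds far more strongly to the rightward-drifting grating than the leftward-tuned pair, which is the basic signature of a direction-selective motion energy unit.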

## 8. Summary

In this section, we've explored convolution, a fundamental operation in signal processing and computational neuroscience. Here's a summary of what we've learned:

1. **Convolution basics**: Convolution combines two functions to produce a third function that represents how one function modifies the shape of the other.

2. **1D and 2D convolution**: We implemented and visualized both 1D and 2D convolution operations, gaining an intuitive understanding of how they work.

3. **Image processing applications**: We explored common image processing kernels like Gaussian blur, edge detection, and Sobel filters, which are all based on convolution.

4. **Separable convolution**: We learned about separable kernels, which can significantly reduce the computational cost of 2D convolution.

5. **Convolution in the frequency domain**: We briefly touched on the relationship between convolution and multiplication in the frequency domain, which we'll explore further in the next section.

6. **Convolution in visual processing**: We discussed how convolution models the way neurons in the visual system process information, from retinal ganglion cells to complex cells in the visual cortex.

In the next section, we'll dive into the Fourier transform, which will provide us with another powerful tool for understanding motion perception and motion energy models.

## Further Reading

- Smith, S. W. (1997). The Scientist and Engineer's Guide to Digital Signal Processing. California Technical Publishing. Chapter 6: Convolution.
- Marr, D. (1982). Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. W. H. Freeman and Company.
- Carandini, M., Demb, J. B., Mante, V., Tolhurst, D. J., Dan, Y., Olshausen, B. A., ... & Rust, N. C. (2005). Do we know what the early visual system does? Journal of Neuroscience, 25(46), 10577-10597.