# Lab 7: Image Compression

### 1. Objectives
This lab focuses on understanding and applying various image compression techniques. You will explore the differences between lossless and lossy compression, evaluate compression performance using different metrics, and analyze the impact of compression on image data.

### 2. Submission Guidelines
- **File Format**: Jupyter Notebook (.ipynb)
- **Naming**: Lab7_StudentFullName_StudentID.ipynb
- **Submission**: Compress the Jupyter Notebook file into a .zip archive and upload it to Moodle.

### 3. Preparation

Run the following cell to install the required packages if you don't have them already.

In [1]:
!pip install numpy opencv-python matplotlib pillow


In [None]:
import cv2
import numpy as np
import os
from matplotlib import pyplot as plt
from PIL import Image

### 4. Tasks

All tasks use the uncompressed BMP image `1-bmp-sample-2.bmp` as the baseline for compression.

#### Task 1: Compression Ratio (C) and Relative Data Redundancy (R)

**Theory:**
Image compression is about reducing the amount of data required to represent a digital image. We can measure the effectiveness of a compression algorithm using two key metrics:
- **Compression Ratio (C):** This ratio compares the size of the original uncompressed data to the size of the compressed data. A higher compression ratio indicates better compression performance. It is calculated as:
  > C = `original_size` / `compressed_size`
- **Relative Data Redundancy (R):** This metric quantifies the fraction of the original data that was redundant and was removed during compression. It normally lies between 0 and 1, and a value closer to 1 means more redundancy was removed; it becomes negative if the compressed file is larger than the original (C < 1). It is calculated from the compression ratio:
  > R = 1 - (1 / C)
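
As a quick sanity check of the two formulas, here is a minimal sketch in plain Python. The helper name `compression_metrics` and the byte sizes are hypothetical, chosen only for illustration; they are not measurements of the lab image.

```python
def compression_metrics(original_size, compressed_size):
    """Return (C, R) for two file sizes given in bytes."""
    C = original_size / compressed_size   # compression ratio
    R = 1 - 1 / C                         # relative data redundancy
    return C, R

# Hypothetical sizes: a 1,000,000-byte original compressed to 250,000 bytes
C, R = compression_metrics(1_000_000, 250_000)
print(f"C = {C:.2f}, R = {R:.2f}")  # C = 4.00, R = 0.75

# If the "compressed" file is larger than the original, C < 1 and R is negative
C_bad, R_bad = compression_metrics(1_000, 1_250)
print(f"C = {C_bad:.2f}, R = {R_bad:.2f}")  # C = 0.80, R = -0.25
```

With C = 4, three quarters of the original bytes were redundant, which is what R = 0.75 expresses.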

**Guidance:**
1.  **Get Original Size:** Use `os.path.getsize()` to find the size of the original BMP file in bytes.
2.  **Compress and Save:**
    - For **PNG**, use `cv2.imwrite()` with the `cv2.IMWRITE_PNG_COMPRESSION` flag. The compression level ranges from 0 (min compression) to 9 (max compression).
    - For **JPEG**, use `cv2.imwrite()` with the `cv2.IMWRITE_JPEG_QUALITY` flag. The quality level ranges from 0 (max compression, lowest quality) to 100 (min compression, highest quality).
3.  **Get Compressed Size:** After saving the new files, use `os.path.getsize()` to get their sizes.
4.  **Calculate C and R:** Apply the formulas above using the obtained sizes.
5.  **Analyze:** Compare the C and R values for PNG and JPEG at different compression levels.

In [None]:
bmp_path = '1-bmp-sample-2.bmp'
img = cv2.imread(bmp_path)
original_size = os.path.getsize(bmp_path)
print(f"Original BMP Size: {original_size / 1024:.2f} KB\n")

# --- PNG Compression ---
print("--- PNG Compression ---")

# TODO: Define compression levels in a dictionary (Task 3 expects files for the keys 'min' and 'max')
png_compression_levels = None

for name, level in png_compression_levels.items():
    path = f'output_png_{name}.png'
    
    # TODO: Compress and save to PNG file
    pass
    
    compressed_size = os.path.getsize(path)

    # TODO: Calculate compression ratio (C) and Relative Data Redundancy (R)
    C = None
    R = None
    
    print(f"Level '{name}' ({level}): Size = {compressed_size / 1024:.2f} KB, C = {C:.2f}, R = {R:.2f}")

# --- JPEG Compression ---
print("\n--- JPEG Compression ---")

# TODO: Define quality levels in a dictionary (Task 3 expects files for the keys 'min' and 'max')
# Note: For JPEG, quality 100 is min compression, 0 is max compression.
jpeg_quality_levels = None

for name, level in jpeg_quality_levels.items():
    path = f'output_jpeg_{name}.jpg'
    
    # TODO: Compress and save to JPEG file
    pass
    
    compressed_size = os.path.getsize(path)

    # TODO: Calculate compression ratio (C) and Relative Data Redundancy (R)
    C = None
    R = None
    
    print(f"Level '{name}' ({level}): Size = {compressed_size / 1024:.2f} KB, C = {C:.2f}, R = {R:.2f}")

#### Task 2: Fidelity Criteria

**Theory:**
When using lossy compression (like JPEG), some information from the original image is lost. Fidelity criteria are metrics that measure how much the compressed image differs from the original. A lower error value means higher fidelity (i.e., the compressed image is more faithful to the original).

- **Root Mean Square Error (RMSE):** This metric is the square root of the average of the squared differences between pixel values of the original and compressed images, where M × N is the image size. Because the differences are squared before averaging, large errors dominate the result.
  > RMSE = sqrt( (1 / (M * N)) * Σ(I_original(i, j) - I_compressed(i, j))^2 )

- **Mean Absolute Error (MAE):** This metric is the average of the absolute differences between pixel values. Each error contributes in proportion to its magnitude, so MAE is less sensitive to a few large errors than RMSE.
  > MAE = (1 / (M * N)) * Σ|I_original(i, j) - I_compressed(i, j)|
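
To make the two formulas concrete, here is a small worked example on hypothetical 2×2 arrays (the values are made up for illustration and are not taken from the lab image). For a colour image, `np.mean` would additionally average over the channel axis.

```python
import numpy as np

# Hypothetical pixel values, already converted to float
original = np.array([[10, 20], [30, 40]], dtype=np.float64)
compressed = np.array([[12, 20], [30, 36]], dtype=np.float64)

diff = original - compressed          # [[-2, 0], [0, 4]]
rmse = np.sqrt(np.mean(diff ** 2))    # sqrt((4 + 0 + 0 + 16) / 4) = sqrt(5)
mae = np.mean(np.abs(diff))           # (2 + 0 + 0 + 4) / 4 = 1.5

print(f"RMSE = {rmse:.4f}, MAE = {mae:.4f}")  # RMSE = 2.2361, MAE = 1.5000
```

RMSE comes out larger than MAE here because the single error of 4 is squared before averaging.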

**Guidance:**
1.  **Load Images:** Use `cv2.imread()` to load the original BMP and the compressed PNG/JPEG images you created in Task 1.
2.  **Convert to Floating Point:** Before calculating differences, convert the image data to a floating-point type (e.g., using `astype(np.float64)`). Images are loaded as `uint8`, and subtracting `uint8` arrays wraps around instead of producing negative values.
3.  **Calculate Errors:**
    - For **RMSE**, calculate the difference between the images, square the result, find the mean of all pixels, and then take the square root.
    - For **MAE**, calculate the absolute difference between the images and then find the mean of all pixels.
4.  **Analyze:** Observe how RMSE and MAE values change with different compression levels for PNG (lossless) and JPEG (lossy).
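
The reason for the floating-point conversion in step 2 can be seen on two tiny arrays. This is a minimal sketch with made-up values, not part of the lab solution.

```python
import numpy as np

a = np.array([10, 200], dtype=np.uint8)
b = np.array([20, 100], dtype=np.uint8)

# uint8 arithmetic wraps modulo 256, so 10 - 20 becomes 246 instead of -10
wrapped = a - b

# Converting to float64 first gives the true signed differences: -10 and 100
correct = a.astype(np.float64) - b.astype(np.float64)

print(wrapped.tolist())  # [246, 100]
print(correct.tolist())  # [-10.0, 100.0]
```

An error metric computed from `wrapped` would treat a difference of 10 as a difference of 246, which is why the conversion has to happen before the subtraction.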

In [None]:
# Load the original BMP image and convert to float64 for calculations
original_img_float = cv2.imread(bmp_path).astype(np.float64)

# --- a. Root Mean Square Error (RMSE) for PNG ---
print("--- RMSE for PNG ---")
for name in png_compression_levels.keys():
    path = f'output_png_{name}.png'
    compressed_img_float = cv2.imread(path).astype(np.float64)

    # TODO: Calculate Root Mean Square Error (RMSE)
    rmse = None

    print(f"Level '{name}': RMSE = {rmse:.4f}")
    
print("\n*Note: Since PNG is a lossless format, the RMSE should be 0.0, indicating no difference.*\n")

# --- b. Mean Absolute Error (MAE) for JPEG ---
print("\n--- MAE for JPEG ---")
for name in jpeg_quality_levels.keys():
    path = f'output_jpeg_{name}.jpg'
    compressed_img_float = cv2.imread(path).astype(np.float64)

    # TODO: Calculate Mean Absolute Error (MAE)
    mae = None
    
    print(f"Level '{name}': MAE = {mae:.4f}")

#### Task 3: Lossless and Lossy Approach

**Theory:**
Image compression techniques can be broadly categorized into two types:
- **Lossless Compression (e.g., PNG):** This method reduces file size without losing any image data. When the image is decompressed, it is an exact, pixel-by-pixel replica of the original. This is achieved by representing the data more efficiently, for example by encoding repeated patterns compactly (PNG uses the DEFLATE algorithm).
- **Lossy Compression (e.g., JPEG):** This method achieves higher compression ratios by permanently discarding some data that is less perceptible to the human eye. This results in a smaller file size but also a reduction in image quality.

**Guidance:**

**a. Matrix Difference:**
1.  **Load Images:** Read the original BMP image and the compressed PNG/JPEG images (at min and max compression) as NumPy arrays.
2.  **Calculate Difference:** Compute the absolute difference between the original image's matrix and each compressed image's matrix using `cv2.absdiff()`, then sum all elements of the result to get a single total difference.
3.  **Analyze:**
    - For a **lossless** format like PNG, the difference matrix should be all zeros, indicating no data was lost.
    - For a **lossy** format like JPEG, the difference matrix will contain non-zero values, representing the data that was discarded during compression.

**b. Histogram Comparison:**
1.  **Calculate Histograms:** Convert the original image and each compressed image to grayscale (e.g., with `cv2.cvtColor()` and `cv2.COLOR_BGR2GRAY`), then use `cv2.calcHist()` to compute each histogram.
2.  **Display Histograms:** Use `matplotlib.pyplot.plot()` to display the histograms on the same chart for easy comparison.
3.  **Analyze:**
    - For **lossless** PNG, the histogram will be identical to the original's histogram.
    - For **lossy** JPEG, the histogram will be similar but slightly different from the original, reflecting the changes in pixel intensity values due to data loss.

In [None]:
# --- a. Matrix Difference ---
print("--- Matrix Difference Analysis ---")
original_img = cv2.imread(bmp_path)
files_to_check = {
    'PNG (min compression)': 'output_png_min.png',
    'PNG (max compression)': 'output_png_max.png',
    'JPEG (min compression)': 'output_jpeg_min.jpg',
    'JPEG (max compression)': 'output_jpeg_max.jpg'
}

for name, path in files_to_check.items():
    compressed_img = cv2.imread(path)
    
    # TODO: Calculate the sum of absolute differences between the original and compressed image
    total_diff = None
    
    print(f"Total absolute difference for {name}: {total_diff}")
    if total_diff == 0:
        print("  -> Result: Images are identical (Lossless).\n")
    else:
        print("  -> Result: Images are different (Lossy).\n")

# --- b. Histogram Comparison ---
print("\n--- Histogram Comparison ---")

# TODO: Convert original to grayscale and calculate the histogram
original_gray = None
hist_original = None

# TODO: Define compression levels to compare
compression_levels_to_compare = None

for level in compression_levels_to_compare:
    # TODO: Load compressed PNG images and convert to grayscale then calculate the histogram
    pass
    
    # TODO: Load compressed JPEG images and convert to grayscale then calculate the histogram
    pass
    
    plt.figure(figsize=(10, 6))
    plt.title(f'Histogram Comparison ({level} compression)')
    plt.xlabel('Pixel Intensity')
    plt.ylabel('Number of Pixels')

    # TODO: Plot the histograms of the original and compressed images in PNG and JPEG formats
    pass
    
    plt.legend()
    plt.xlim([0, 256])
    plt.grid(True)
    plt.show()