# Shape Classifier
### **Introduction**
This script processes imaging data from **MRI** or **CT** scans. It analyzes the **TIFF image slices** contained in the group folder and extracts geometric and statistical information about a segmented organ.

The script performs the following calculations for each image slice:
- **Center of the Organ ("Center Glob")**:  
  Computes the unweighted mean position (centroid) of the non-black pixels, giving the approximate center of the organ.
- **Minimum and Maximum Radius**:  
  Measures the shortest and longest distances from the center to the non-black pixels of the segmented organ. Because every non-black pixel is included, not only the edge, the minimum is close to zero whenever the center lies inside the organ.  
- **Intensity Standard Deviation**:  
  Calculates how much pixel intensity varies, which may indicate texture differences or tissue heterogeneity.
- **Organ Area**:  
  Computes the number of non-black pixels and converts it into **mm²** using the known pixel-to-mm scale.  
- **Estimated Mass Calculation (Second Script Only)**:  
  - Uses per-pixel intensity to interpolate an approximate tissue density.
  - Converts each pixel into a voxel volume in **cm³** using the **CT/MRI-specific slice thickness**.
  - Computes the **total mass** of the segmented organ.
  - Estimates the **uncertainty** of the total mass by propagating the measurement uncertainties in pixel spacing, slice thickness, and density.
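
The geometric measurements listed above can be sketched on a small synthetic slice. This is a minimal illustration rather than the script itself: the helper `slice_geometry` and the 7×7 test array are hypothetical, and the edge-only radius is an assumed alternative that the script does not compute.

```python
import numpy as np

PIXEL_SCALE = 10.0 / 17.53  # mm per pixel, the same constant the script uses


def slice_geometry(data):
    """Hypothetical helper: centroid, radii, area and intensity spread of one slice."""
    mask = data != 0
    coords = np.argwhere(mask)        # (row, col) of every non-black pixel
    centroid = coords.mean(axis=0)    # unweighted mean position, in pixels

    # Distances from the centroid to ALL non-black pixels (what the script measures).
    dist_all = np.linalg.norm(coords - centroid, axis=1)

    # Assumed alternative: keep only edge pixels, i.e. non-black pixels that have
    # at least one black (or out-of-image) 4-neighbour.
    padded = np.pad(mask, 1, constant_values=False)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                padded[1:-1, :-2] & padded[1:-1, 2:] & mask)
    edge_coords = np.argwhere(mask & ~interior)
    dist_edge = np.linalg.norm(edge_coords - centroid, axis=1)

    return {
        "center_mm": centroid * PIXEL_SCALE,
        "min_radius_mm": dist_all.min() * PIXEL_SCALE,
        "max_radius_mm": dist_all.max() * PIXEL_SCALE,
        "min_edge_radius_mm": dist_edge.min() * PIXEL_SCALE,
        "intensity_std": data[mask].std(),
        "area_mm2": int(mask.sum()) * PIXEL_SCALE ** 2,
    }


# Synthetic slice: a 3x3 "organ" of intensity 100 centred in a 7x7 black image.
demo = np.zeros((7, 7), dtype=np.uint8)
demo[2:5, 2:5] = 100
geometry = slice_geometry(demo)
print(geometry)
```

For this slice the minimum radius over all pixels is zero, because the centroid pixel is itself non-black, while the edge-only minimum is one pixel (about 0.570 mm).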

| Min/Max Radius | Sample Area |
|---------------|------------|
| ![Min/Max Radius](https://raw.githubusercontent.com/agadin/QP2_big_data_project_tools/refs/heads/main/img/min_max_radius.png) | ![Sample Area](https://raw.githubusercontent.com/agadin/QP2_big_data_project_tools/refs/heads/main/img/sample_area.png) |


---

### **Instructions**
1. **Run the script and select the root directory**  
   - This should be the folder that contains subfolders with **MRI** or **CT** in their names.
   - Each subfolder should contain **TIFF image slices** representing the segmented organ.

2. **Choose whether to process "MRI" or "CT"**  
   - The script will only process the selected scan type (either MRI or CT).
   - It processes every subfolder whose name contains the selected scan type (case-insensitive); **four** such folders are expected.

3. **Wait for processing to complete**  
   - The script will analyze each image in the selected folders.
   - Results will be saved to a **CSV file** in the selected directory: `image_analysis_output.csv` (first script) or `mass_estimation_output.csv` (second script).

4. **Review results**  
   - The CSV file contains **calculated measurements** for each image.
   - The second script will additionally display the **total estimated mass per folder** and the **overall average mass with uncertainty**.

After processing, the results can be used for further statistical or graphical analysis of organ properties across slices.
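
To make the folder selection in step 2 concrete, here is a small sketch of the name matching under an assumed directory layout. The folder names and the helper `find_scan_folders` are hypothetical; the matching rule (case-insensitive substring test on the subfolder name) mirrors `get_target_folders` in the script.

```python
import os
import tempfile


def find_scan_folders(root_dir, scan_type):
    """Return subfolders of root_dir whose names contain scan_type (case-insensitive)."""
    matches = []
    for name in sorted(os.listdir(root_dir)):
        full_path = os.path.join(root_dir, name)
        if os.path.isdir(full_path) and scan_type.lower() in name.lower():
            matches.append(name)
    return matches


# Build a throwaway directory tree with hypothetical folder names.
with tempfile.TemporaryDirectory() as root_dir:
    for name in ("MRI_1", "mri_2", "CT_1", "notes", "sections"):
        os.mkdir(os.path.join(root_dir, name))
    mri_folders = find_scan_folders(root_dir, "MRI")
    ct_folders = find_scan_folders(root_dir, "CT")

print("MRI:", mri_folders)
print("CT:", ct_folders)
```

Because the test is a plain substring match, an unrelated folder such as `sections` is also picked up for CT, so it may be worth keeping only scan folders in the root directory.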


In [1]:
!pip install SimpleITK numpy scipy pillow matplotlib pandas tifffile scikit-image
!pip show SimpleITK


Name: SimpleITK
Version: 2.4.1
Summary: SimpleITK is a simplified interface to the Insight Toolkit (ITK) for image registration and segmentation
Home-page: http://simpleitk.org/
Author: Insight Software Consortium
Author-email: insight-users@itk.org
License: Apache
Location: /Users/colehanan/anaconda3/lib/python3.11/site-packages
Requires: 
Required-by: 


In [4]:
import sys
import os
import glob
import csv
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
from scipy.ndimage import center_of_mass
import SimpleITK as sitk

# Pre-defined pixel scale (mm per pixel)
PIXEL_SCALE = 10.0 / 17.53  # ≈ 0.570 mm per pixel

def load_tiff_with_tifffile(input_path):
    """
    Loads a TIFF image using tifffile. If the image is multichannel, it converts it to grayscale.
    If the image is not uint8 (for example, a 64-bit float image), it is normalized to the [0, 255] range.
    """
    try:
        import tifffile
        from skimage.color import rgb2gray
    except ImportError as e:
        raise ImportError("Please install tifffile and scikit-image packages.") from e

    image_array = tifffile.imread(input_path)
    
    # If the TIFF is multichannel (RGB or RGBA), convert to grayscale
    if image_array.ndim == 3:
        if image_array.shape[-1] in [3, 4]:
            image_array = rgb2gray(image_array)
            image_array = (image_array * 255).astype(np.uint8)
        else:
            # Otherwise assume the third dimension is not color channels; take the first slice.
            image_array = image_array[:, :, 0]
    
    # If image data is not uint8, normalize to 0-255.
    if image_array.dtype != np.uint8:
        min_val = np.nanmin(image_array)
        max_val = np.nanmax(image_array)
        if max_val - min_val > 0:
            image_array = (image_array - min_val) / (max_val - min_val)
        else:
            image_array = np.zeros_like(image_array)
        image_array = (image_array * 255).astype(np.uint8)
    return image_array

def process_image(image_path):
    """
    Loads an image (PNG or TIFF) and converts it to grayscale.
    If the file is a TIFF, it first tries using our tifffile-based loader (to handle 64-bit samples).
    Then it computes basic statistics on non-black pixels.
    Implements multiple fallback methods if an image fails to open.
    """
    data = None

    # For TIFF files, try using tifffile first.
    if image_path.lower().endswith('.tif'):
        try:
            data = load_tiff_with_tifffile(image_path)
        except Exception as e:
            print(f"Warning: tifffile failed to open {image_path}. Error: {e}")

    # If still None, try PIL.
    if data is None:
        try:
            image = Image.open(image_path).convert('L')
            data = np.array(image)
        except Exception as e:
            print(f"Warning: PIL failed to open {image_path}. Error: {e}")

    # Fallback 1: try OpenCV (if installed)
    if data is None:
        try:
            import cv2
            data = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
            if data is None:
                raise ValueError("cv2.imread returned None")
        except Exception as e:
            print(f"Warning: OpenCV failed to open {image_path}. Error: {e}")

    # Fallback 2: try SimpleITK.
    if data is None:
        try:
            sitk_image = sitk.ReadImage(image_path)
            data = sitk.GetArrayFromImage(sitk_image)
        except Exception as e:
            print(f"Skipping {image_path} due to conversion error.")
            print(f"Error reading image with SimpleITK: {e}")
            return None

    if data is None:
        print(f"Skipping {image_path} due to persistent read failure.")
        return None

    # --- Process Image Data ---
    total_pixels = data.size
    black_pixels = np.sum(data == 0)
    black_ratio = black_pixels / total_pixels

    if black_ratio > 0.99:
        # Skip if almost all pixels are black.
        return None

    # Get indices of non-black pixels.
    indices = np.argwhere(data != 0)
    # Compute the mean (center) of the non-black pixels (in pixel coordinates)
    center_pixels = indices.mean(axis=0)  # [row, col]
    # Convert to mm
    center = center_pixels * PIXEL_SCALE

    # Compute distances (in pixels) from the center.
    distances_pixels = np.sqrt(((indices - center_pixels) ** 2).sum(axis=1))
    min_radius = distances_pixels.min() * PIXEL_SCALE  # mm
    max_radius = distances_pixels.max() * PIXEL_SCALE  # mm

    non_black_intensities = data[data != 0]
    intensity_std = non_black_intensities.std()

    area_pixels = len(non_black_intensities)
    area = area_pixels * (PIXEL_SCALE ** 2)  # in mm²

    return {
        'center': center,          # [row (mm), col (mm)]
        'min_radius': min_radius,  # in mm
        'max_radius': max_radius,  # in mm
        'intensity_std': intensity_std,
        'area': area               # in mm²
    }

def process_folder(folder_path):
    """
    Processes all TIFF images in a folder and returns a list of result dictionaries.
    Each dictionary includes the image filename.
    """
    # Collect the TIFF slices in sorted order (only the *.tif extension is matched).
    tif_files = sorted(glob.glob(os.path.join(folder_path, '*.tif')))
    results = []
    for idx, tif_file in enumerate(tif_files):
        res = process_image(tif_file)
        if res is not None:
            res['filename'] = os.path.basename(tif_file)
            res['index'] = idx
            results.append(res)
    return results

def write_csv(output_path, folder_results):
    """
    Writes the results to a CSV file at output_path.
    The CSV includes columns: Folder, Image Index, Filename, Center_Y (mm), Center_X (mm),
    Min_Radius (mm), Max_Radius (mm), Intensity_STD, Area (mm²).
    """
    header = ['Folder', 'Image Index', 'Filename', 'Center_Y (mm)', 'Center_X (mm)',
              'Min_Radius (mm)', 'Max_Radius (mm)', 'Intensity_STD', 'Area (mm²)']
    with open(output_path, mode='w', newline='') as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow(header)
        for folder_name, results in folder_results.items():
            for res in results:
                row = [
                    folder_name,
                    res['index'],
                    res['filename'],
                    f"{res['center'][0]:.2f}",
                    f"{res['center'][1]:.2f}",
                    f"{res['min_radius']:.2f}",
                    f"{res['max_radius']:.2f}",
                    f"{res['intensity_std']:.2f}",
                    f"{res['area']:.2f}"
                ]
                writer.writerow(row)
    print(f"CSV file saved to: {output_path}")

def get_target_folders(parent_path, scan_type):
    """
    Returns a list of folders to process.
    If the selected folder itself contains TIFF files, it is returned as the only target folder.
    Otherwise, it searches for subdirectories whose names contain the scan type.
    """
    tif_files = glob.glob(os.path.join(parent_path, '*.tif'))
    if tif_files:
        return [parent_path]

    target_folders = []
    for d in os.listdir(parent_path):
        d_full = os.path.join(parent_path, d)
        if os.path.isdir(d_full) and scan_type.lower() in d.lower():
            target_folders.append(d_full)
    return target_folders

def main():
    import tkinter as tk
    from tkinter import filedialog
    root = tk.Tk()
    root.withdraw()

    # Ask the user to select a directory.
    path = filedialog.askdirectory(title='Select Directory containing TIFF files or subfolders')
    if not path:
        print("No directory selected.")
        return

    scan_type = input("Enter scan type to process (CT or MRI): ").strip().upper()
    if scan_type not in ["CT", "MRI"]:
        print("Invalid input. Please enter 'CT' or 'MRI'.")
        return

    target_folders = get_target_folders(path, scan_type)
    if not target_folders:
        print(f"No target folders found containing '{scan_type}', and no TIFF files found in the selected folder.")
        return

    folder_results = {}
    for folder in target_folders:
        folder_name = os.path.basename(folder)
        print(f"Processing folder: {folder_name}")
        results = process_folder(folder)
        if results:
            folder_results[folder_name] = results
        else:
            print(f"  No valid images found in {folder_name}.")

    if not folder_results:
        print("No valid images found in any target folders.")
        return

    output_csv = os.path.join(path, "image_analysis_output.csv")
    write_csv(output_csv, folder_results)

if __name__ == '__main__':
    main()

Enter scan type to process (CT or MRI): MRI
Processing folder: MRI_2
CSV file saved to: /Users/colehanan/Desktop/group_10/MRI_2/image_analysis_output.csv
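
The TIFF loader in the cell above rescales any non-`uint8` image to the 0–255 range before the statistics are computed. The sketch below isolates that step on a tiny floating-point array; `normalize_to_uint8` is a hypothetical stand-alone name for logic that lives inside `load_tiff_with_tifffile`.

```python
import numpy as np


def normalize_to_uint8(image_array):
    """Min-max rescale an array to 0-255 and cast to uint8 (fractions are truncated)."""
    min_val = np.nanmin(image_array)
    max_val = np.nanmax(image_array)
    if max_val - min_val > 0:
        scaled = (image_array - min_val) / (max_val - min_val)
    else:
        # A constant image carries no contrast, so it becomes all black.
        scaled = np.zeros_like(image_array, dtype=np.float64)
    return (scaled * 255).astype(np.uint8)


float_slice = np.array([[0.0, 0.5], [1.0, 2.0]])
normalized = normalize_to_uint8(float_slice)
flat_slice = normalize_to_uint8(np.full((2, 2), 7.0))
print(normalized)
print(flat_slice)
```

The rescaling is done per image, so the smallest value in each slice always maps to 0 (black) and the largest to 255. Intensities from different slices are therefore not directly comparable after this step.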


# Estimate Organ Mass
### Input Approximate Densities for Tissue Types

In this cell, you can set the approximate densities (in g/cm³) for:
- **Bright spots:** regions that are brighter on CT/MRI (e.g., calcifications or bone)
- **Dark spots:** regions that are darker (e.g., soft tissue)

Look up the densities of what you believe the dark and bright spots are, and include those values in your report if you report the estimated mass. Make sure to look up values appropriate to CT and MRI. A linear gradient is used: intensity 1 corresponds approximately to `density_dark` and intensity 255 to `density_bright`.
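
As a quick check of the linear gradient, the following sketch maps a few intensities to densities. The function name `intensity_to_density` is hypothetical, and the densities 0.3 and 1.0 g/cm³ are only the example values used in this notebook.

```python
def intensity_to_density(intensity, density_dark, density_bright):
    """Linearly map an 8-bit intensity (1-255) to a density in g/cm³; 0 is background."""
    if intensity == 0:
        return 0.0
    factor = (intensity - 1) / 254.0  # 1 -> 0.0, 255 -> 1.0
    return density_dark + factor * (density_bright - density_dark)


for value in (0, 1, 128, 255):
    print(value, round(intensity_to_density(value, 0.3, 1.0), 3))
```

With these example values a mid-gray pixel (intensity 128) lands halfway between the two densities, at about 0.65 g/cm³.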

In [None]:
# Set the approximate densities (in g/cm³)
density_bright = 1     # Replace with your value (somewhere around 1 for MRI, for example)
density_dark   = 0.3   # Replace with your value (somewhere around 0.3 for CT, for example)

### Calculate Estimated Mass Based on Image Analysis and Tissue Densities

In this cell:
1. Select the root directory that contains subfolders with "CT" or "MRI" in their names.
2. For each TIFF image in those folders, load the image in grayscale.
3. For each pixel:
    - If the pixel is black (intensity = 0), its mass contribution is 0.
    - Otherwise, the density is linearly interpolated from the intensity `I`:
        - `factor = (I - 1) / 254.0` (maps intensity from 1–255 to 0–1)
        - `pixel_density = density_dark + factor * (density_bright - density_dark)`
4. Multiply the per-pixel density by the voxel volume to get the mass contribution.
5. Sum the contributions across the image, and then across all images.

**Imaging Parameters:**  
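- **pixel_spacing:** The size of one pixel in mm (fixed at 10/17.53 ≈ 0.570 mm).  
- **slice_thickness:** The slice thickness in mm, treated as the distance between slices. The script uses 4.0 mm for CT folders and 1.0 mm for MRI folders.

The per-image masses and their errors are saved to a CSV file; the per-folder totals and the overall mass are printed.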
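
The unit handling in steps 4 and 5 is easy to get wrong, so here is a minimal worked example for a single slice. The helper `slice_mass_grams`, the 2×2 array, and the 0.5 mm pixel spacing are illustrative assumptions, not values taken from the data.

```python
import numpy as np


def slice_mass_grams(data, density_dark, density_bright,
                     pixel_spacing_mm, slice_thickness_mm):
    """Sum per-pixel density times voxel volume over the non-black pixels of one slice."""
    intensities = data[data > 0].astype(np.float64)
    factor = (intensities - 1.0) / 254.0
    densities = density_dark + factor * (density_bright - density_dark)  # g/cm³
    # Voxel volume in mm³, converted to cm³ (1 cm³ = 1000 mm³).
    voxel_volume_cm3 = (pixel_spacing_mm ** 2 * slice_thickness_mm) / 1000.0
    return float(densities.sum() * voxel_volume_cm3)


# Hypothetical 2x2 slice: one background pixel and three tissue pixels.
tiny_slice = np.array([[0, 1], [128, 255]], dtype=np.uint8)
mass = slice_mass_grams(tiny_slice, density_dark=0.3, density_bright=1.0,
                        pixel_spacing_mm=0.5, slice_thickness_mm=1.0)
print(f"{mass:.7f} g")
```

Here the three tissue pixels have densities of 0.3, 0.65, and 1.0 g/cm³, and each voxel is 0.25 mm³ (0.00025 cm³), so the slice mass is roughly 0.00049 g.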

In [5]:
import os
import glob
import numpy as np
from PIL import Image
import pandas as pd
import tkinter as tk
from tkinter import filedialog
import math

# For TIFF loading, we use tifffile and skimage’s rgb2gray.
try:
    import tifffile
    from skimage.color import rgb2gray
except ImportError:
    raise ImportError("Please install the 'tifffile' and 'scikit-image' packages to load TIFF files.")

# Imaging parameters
PIXEL_SCALE = 10.0 / 17.53  # mm per pixel

def load_tiff_with_tifffile(input_path):
    """
    Loads a TIFF image using tifffile.
    If the image is multichannel (RGB/RGBA), converts it to grayscale.
    If the image is not uint8 (e.g. 64-bit float), it is normalized to 0-255.
    Returns an 8-bit grayscale numpy array.
    """
    image_array = tifffile.imread(input_path)
    
    # If multichannel, convert to grayscale.
    if image_array.ndim == 3:
        if image_array.shape[-1] in [3, 4]:
            image_array = rgb2gray(image_array)
            image_array = (image_array * 255).astype(np.uint8)
        else:
            # Otherwise assume the third dimension is not color channels.
            image_array = image_array[:, :, 0]
    
    # Normalize to uint8 if needed.
    if image_array.dtype != np.uint8:
        min_val = np.nanmin(image_array)
        max_val = np.nanmax(image_array)
        if max_val - min_val > 0:
            image_array = (image_array - min_val) / (max_val - min_val)
        else:
            image_array = np.zeros_like(image_array)
        image_array = (image_array * 255).astype(np.uint8)
    return image_array

def compute_image_mass_and_error(image_path, density_dark, density_bright, pixel_spacing, slice_thickness, delta_p, delta_t, delta_density):
    """
    Loads an image slice (PNG or TIFF), converts it to grayscale, and computes its mass
    contribution and propagated error.
    
    For TIFF files (which may have 64-bit samples), the function uses tifffile to load the image.
    
    Parameters:
      image_path      : Path to the image (PNG or TIFF).
      density_dark    : Density for the darkest (non-zero) pixels (g/cm³).
      density_bright  : Density for the brightest pixels (g/cm³).
      pixel_spacing   : Pixel spacing (mm per pixel).
      slice_thickness : Slice thickness (mm).
      delta_p         : Uncertainty in pixel spacing (mm).
      delta_t         : Uncertainty in slice thickness (mm).
      delta_density   : Uncertainty in density (g/cm³).
      
    Returns:
      A tuple (mass, mass_error) in grams.
    """
    data = None

    # Use tifffile loader for TIFF files.
    if image_path.lower().endswith('.tif'):
        try:
            data = load_tiff_with_tifffile(image_path)
        except Exception as e:
            print(f"Warning: tifffile failed to open {image_path}. Trying PIL. Error: {e}")

    # Otherwise (or if TIFF loading failed), try using PIL.
    if data is None:
        try:
            image = Image.open(image_path).convert('L')
            data = np.array(image, dtype=np.float32)
        except Exception as e:
            print(f"Warning: PIL failed to open {image_path}. Error: {e}")

    # If still no data, try OpenCV.
    if data is None:
        try:
            import cv2
            data = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
            if data is None:
                raise ValueError("cv2.imread returned None")
            data = data.astype(np.float32)
        except Exception as e:
            print(f"Warning: OpenCV failed to open {image_path}. Error: {e}")

    # Finally, try SimpleITK.
    if data is None:
        try:
            import SimpleITK as sitk
            sitk_image = sitk.ReadImage(image_path)
            data = sitk.GetArrayFromImage(sitk_image).astype(np.float32)
        except Exception as e:
            print(f"Skipping {image_path} due to conversion error.")
            print(f"Error reading image with SimpleITK: {e}")
            return None

    if data is None:
        print(f"Skipping {image_path} due to persistent read failure.")
        return None

    # Process image data.
    total_pixels = data.size
    black_pixels = np.sum(data == 0)
    black_ratio = black_pixels / total_pixels

    # Skip if almost entirely black.
    if black_ratio > 0.99:
        return 0.0, 0.0

    # Get indices of non-black pixels.
    indices = np.argwhere(data != 0)
    # Compute the mean (center) in pixel coordinates.
    center_pixels = indices.mean(axis=0)
    # Convert center to mm.
    center = center_pixels * PIXEL_SCALE

    # Compute distances from center (in pixels).
    distances_pixels = np.sqrt(((indices - center_pixels) ** 2).sum(axis=1))
    min_radius = distances_pixels.min() * PIXEL_SCALE  # mm
    max_radius = distances_pixels.max() * PIXEL_SCALE  # mm

    # Compute standard deviation of non-black intensities.
    non_black_intensities = data[data != 0]
    intensity_std = non_black_intensities.std()

    # Compute area (number of non-black pixels) in mm².
    area_pixels = len(non_black_intensities)
    area = area_pixels * (pixel_spacing**2)  # in mm² (not used in the mass calculation below)

    # Compute per-pixel effective density via linear interpolation.
    # Map intensity range: 1->0 and 255->1 (cast to float so integer data cannot wrap).
    factor = (data[data > 0].astype(np.float32) - 1.0) / 254.0
    pixel_densities = density_dark + factor * (density_bright - density_dark)
    
    # Compute voxel volume (mm³ converted to cm³):
    voxel_volume_cm3 = (pixel_spacing**2 * slice_thickness) / 1000.0

    # Total mass for the slice:
    mass = np.sum(pixel_densities) * voxel_volume_cm3  # in grams

    # --- Error Propagation ---
    # Voxel volume V = (pixel_spacing² * slice_thickness)/1000.
    # Relative error: δV/V = sqrt((2*δp/p)² + (δt/t)²).
    rel_err_volume = math.sqrt((2 * delta_p / pixel_spacing)**2 + (delta_t / slice_thickness)**2)
    
    # For effective density d, approximate relative error using the average density.
    avg_density = (density_dark + density_bright) / 2.0
    rel_err_density = delta_density / avg_density
    
    # Combine relative errors.
    rel_err = math.sqrt(rel_err_volume**2 + rel_err_density**2)
    mass_error = mass * rel_err

    return mass, mass_error

def get_target_folders(parent_path, scan_type):
    """
    Returns a list of folders to process.
    If the selected folder contains PNG or TIFF files directly, it is returned as the only target folder.
    Otherwise, it searches for subdirectories whose names (case-insensitive) contain the scan type.
    """
    image_files = glob.glob(os.path.join(parent_path, '*.png')) + glob.glob(os.path.join(parent_path, '*.tif'))
    if image_files:
        return [parent_path]

    target_folders = []
    for d in os.listdir(parent_path):
        d_full = os.path.join(parent_path, d)
        if os.path.isdir(d_full) and scan_type.lower() in d.lower():
            target_folders.append(d_full)
    return target_folders

# --- User selects the root directory ---
root = tk.Tk()
root.withdraw()
selected_dir = filedialog.askdirectory(title='Select Directory containing PNG/TIFF files or subfolders')
if not selected_dir:
    raise Exception("No directory selected.")

# Ask the user whether to process CT or MRI folders.
scan_type = input("Enter scan type to process (CT or MRI): ").strip().upper()
if scan_type not in ["CT", "MRI"]:
    raise Exception("Invalid input. Please enter 'CT' or 'MRI'.")

target_folders = get_target_folders(selected_dir, scan_type)
if not target_folders:
    raise Exception(f"No target folders found containing '{scan_type}' or PNG files in the selected directory.")

# Define imaging and density parameters.
pixel_spacing = PIXEL_SCALE  # mm per pixel (10/17.53)
delta_p = 0.01               # uncertainty in pixel spacing (mm)
delta_density = 0.02         # uncertainty in density (g/cm³)

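# Use the densities set in the cell above; fall back to example values if that cell was not run.
density_dark = globals().get('density_dark', 0.3)      # g/cm³ for darkest non-zero pixels
density_bright = globals().get('density_bright', 1.0)  # g/cm³ for brightest pixels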

mass_records = []
total_mass_squared_error = 0.0

# Process each target folder.
for folder in target_folders:
    folder_name = os.path.basename(folder)
    # Set slice thickness and its uncertainty from the scan type chosen above.
    # (Folders are already filtered by scan type, and a directly selected folder
    # may not carry "CT" or "MRI" in its name.)
    if scan_type == 'CT':
        slice_thickness = 4.0
        delta_t = 0.2
    else:  # MRI
        slice_thickness = 1.0
        delta_t = 0.1

    # Look for PNG and TIFF files.
    # (You may adjust the glob pattern if you wish to process TIFFs exclusively.)
    image_files = sorted(glob.glob(os.path.join(folder, '*.png')) + glob.glob(os.path.join(folder, '*.tif')))
    if not image_files:
        print(f"No PNG or TIFF files found in {folder_name}.")
        continue

    for idx, image_file in enumerate(image_files):
        result = compute_image_mass_and_error(
            image_file,
            density_dark,
            density_bright,
            pixel_spacing,
            slice_thickness,
            delta_p,
            delta_t,
            delta_density
        )
        if result is None:
            print(f"Skipping {image_file} due to conversion error.")
            continue
        mass, mass_error = result
        mass_records.append({
            'Folder': folder_name,
            'Image Index': idx,
            'Filename': os.path.basename(image_file),
            'Slice Thickness (mm)': slice_thickness,
            'Mass (g)': mass,
            'Mass Error (g)': mass_error
        })
        total_mass_squared_error += mass_error**2

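# Create a DataFrame with the results (stop early if nothing could be processed).
if not mass_records:
    raise Exception("No images could be processed in the target folders.")
mass_df = pd.DataFrame(mass_records)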

# Compute and print the total estimated mass per folder.
print("\nTotal Estimated Mass per Folder:")
folder_mass_totals = mass_df.groupby('Folder')['Mass (g)'].sum()
for folder, total_mass in folder_mass_totals.items():
    print(f"  {folder}: {total_mass:.3f} g")

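# Compute the overall mass and its combined error.
# Average over the folders that actually produced results, and scale the
# combined (quadrature) error by the same factor so both refer to the average.
total_mass = mass_df['Mass (g)'].sum()
n_folders = mass_df['Folder'].nunique()
average_mass = total_mass / n_folders
total_mass_error = math.sqrt(total_mass_squared_error)
average_mass_error = total_mass_error / n_folders

print(f"\nOverall Average Estimated Mass: {average_mass:.3f} g ± {average_mass_error:.3f} g")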

# Save the mass estimation details to a CSV file.
mass_csv = os.path.join(selected_dir, "mass_estimation_output.csv")
mass_df.to_csv(mass_csv, index=False)
print(f"Mass estimation details saved to: {mass_csv}")

Enter scan type to process (CT or MRI): MRI

Total Estimated Mass per Folder:
  MRI_4_converted: 46.581 g

Overall Average Estimated Mass: 46.581 g ± 0.632 g
Mass estimation details saved to: /Users/colehanan/Desktop/group_10/MRI_4/mass_estimation_output.csv
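
The ± value printed above comes from the error-propagation step in `compute_image_mass_and_error`. The sketch below reproduces that arithmetic in isolation, using the MRI settings from the script; the function names and the three per-slice masses are hypothetical and chosen only for illustration.

```python
import math


def relative_mass_error(pixel_spacing, slice_thickness, delta_p, delta_t,
                        density_dark, density_bright, delta_density):
    """Relative uncertainty of a slice mass from spacing, thickness and density errors."""
    # Voxel volume scales with pixel_spacing², so its relative error counts twice.
    rel_err_volume = math.sqrt((2 * delta_p / pixel_spacing) ** 2
                               + (delta_t / slice_thickness) ** 2)
    # The density error is taken relative to the mean of the two end-point densities.
    rel_err_density = delta_density / ((density_dark + density_bright) / 2.0)
    return math.sqrt(rel_err_volume ** 2 + rel_err_density ** 2)


def combine_in_quadrature(errors):
    """Combine per-slice errors as the script does: square root of the sum of squares."""
    return math.sqrt(sum(e ** 2 for e in errors))


# MRI settings from the script: 1.0 mm slices (±0.1 mm), pixel spacing 10/17.53 mm (±0.01 mm).
rel_err = relative_mass_error(10.0 / 17.53, 1.0, 0.01, 0.1, 0.3, 1.0, 0.02)
print(f"relative error per slice: {rel_err:.3f}")

# Hypothetical per-slice masses in grams, all sharing that relative error.
slice_masses = [0.5, 0.6, 0.4]
total_error = combine_in_quadrature(m * rel_err for m in slice_masses)
print(f"combined error: {total_error:.4f} g")
```

With these settings each slice carries a relative error of about 11%, dominated by the slice-thickness term. Summing in quadrature treats the slice errors as independent; since the pixel-spacing and density uncertainties are shared by every slice, the combined value may understate the true uncertainty.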
