# Required Libraries & Installation

Before running the scripts, install the necessary dependencies:

```sh
pip install openslide-python pillow opencv-python numpy
```

### **Libraries Used**
- `openslide` → Reads whole-slide images (NDPI files)
- `PIL (Pillow)` → Converts extracted tiles to RGB and saves them as JPEG
- `cv2 (OpenCV)` → Reads, resizes, and saves images during stitching
- `numpy` → Holds the stitched image canvas as an array

`openslide-python` is a Python binding, so ensure the OpenSlide native library itself is also installed on your system for `.ndpi` file support. 🚀
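
Before running the scripts, it can help to confirm that each module can be located in the current environment. The sketch below uses only the standard library; the helper name `find_missing` and the `REQUIRED_MODULES` list are hypothetical names introduced for illustration. Locating the `openslide` module does not prove that the native OpenSlide library loads, so a plain `import openslide` afterwards remains the real test.

```python
import importlib.util

# Import names (not pip package names) used by the scripts in this notebook.
REQUIRED_MODULES = ["openslide", "PIL", "cv2", "numpy"]


def find_missing(modules):
    """Return the module names that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```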

---

### **NDPI to JPEG Tile Converter** 

This script processes **NDPI whole-slide images** from histopathology scans, extracting **high-resolution image tiles** for machine learning and analysis.  

#### **Key Features**  
✅ **Processes NDPI files**: Reads large **histopathology whole-slide images** using OpenSlide.  
✅ **Extracts tiles efficiently**: Converts the images into **1400x1400 px JPEG tiles** to balance quality and storage.  
✅ **Pyramid Level Selection**: Extracts tiles from **Level 1** (set by `level`) rather than full-resolution Level 0, reducing file size while maintaining detail.  
✅ **Storage Optimization**: Saves **tiles as JPEG (quality 90)** to limit disk usage.  

This is useful for **deep learning applications**, particularly in medical imaging research like **glioblastoma survival prediction**. 🚀
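
The tile grid follows directly from the level dimensions and the tile size. As a minimal sketch in plain Python (no slide file needed), the helpers below count the tiles and map each tile origin from level pixels to the level-0 pixels that OpenSlide's `read_region` expects as its location. The function names `tile_grid` and `tile_origins` are hypothetical, and the downsample of 2.0 is an assumption; the actual value for a slide is available as `slide.level_downsamples[level]`.

```python
import math


def tile_grid(width, height, tile_size):
    """Return (columns, rows) needed to cover a width x height level."""
    return math.ceil(width / tile_size), math.ceil(height / tile_size)


def tile_origins(width, height, tile_size, downsample):
    """Yield (x, y, x0, y0): tile origin in level pixels and in level-0 pixels."""
    for y in range(0, height, tile_size):
        for x in range(0, width, tile_size):
            yield x, y, int(x * downsample), int(y * downsample)


# Level 1 dimensions logged for 7316UP-109.ndpi in the run output.
cols, rows = tile_grid(72960, 54144, 1400)
print(cols, rows, cols * rows)  # 53 39 2067

# Assumed downsample of 2.0 for Level 1; read the real value from the slide.
origins = list(tile_origins(72960, 54144, 1400, downsample=2.0))
print(origins[1])  # (1400, 0, 2800, 0)
```

The count of 2067 agrees with the total logged for that slide. Because neither dimension is a multiple of 1400, the last column and row extend past the slide edge, so those tiles contain padding.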

In [None]:
import os
import glob
import openslide
from PIL import Image

# Define input and output directories
input_dir = r"D:\MLPR DATASET\PKG - UPENN-GBM_v2\data"
output_root = r"D:\MLPR DATASET\converted_tiles"

# Create output root folder if it doesn't exist
os.makedirs(output_root, exist_ok=True)

# Define tile parameters
tile_size = 1400  # Adjusted for ~3MB per tile
level = 1  # Use Level 1 to reduce file size
jpeg_quality = 90  # Save as JPEG to reduce storage usage

# Get list of all NDPI files in the input directory
ndpi_files = glob.glob(os.path.join(input_dir, "*.ndpi"))
print(f"Found {len(ndpi_files)} NDPI file(s) to process.")

# Process each NDPI file
for ndpi_file in ndpi_files:
    try:
        # Extract base name and create output folder
        base_name = os.path.splitext(os.path.basename(ndpi_file))[0]
        output_folder = os.path.join(output_root, base_name)
        os.makedirs(output_folder, exist_ok=True)
        
        print(f"\nProcessing: {ndpi_file}")
        slide = openslide.OpenSlide(ndpi_file)
    except Exception as e:
        print(f"Error opening {ndpi_file}: {e}")
        continue

    # Get dimensions at the selected resolution level
    width, height = slide.level_dimensions[level]
    print(f"Slide dimensions (Level {level}): {width} x {height}")
    
    tile_count = 0
    # Loop over slide to extract tiles
    for y in range(0, height, tile_size):
        for x in range(0, width, tile_size):
            try:
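                # read_region expects level-0 coordinates, so scale by this level's downsample
                downsample = slide.level_downsamples[level]
                tile = slide.read_region((int(x * downsample), int(y * downsample)), level, (tile_size, tile_size))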
                tile = tile.convert("RGB")  # Convert to RGB
                tile_filename = os.path.join(output_folder, f"tile_{x}_{y}.jpg")
                tile.save(tile_filename, "JPEG", quality=jpeg_quality)
                tile_count += 1
            except Exception as e:
                print(f"Error processing tile ({x},{y}) in {ndpi_file}: {e}")
    
    slide.close()  # Release the file handle before moving to the next slide
    print(f"Finished processing {ndpi_file}. Total tiles created: {tile_count}")

Found 71 NDPI file(s) to process.

Processing: D:\MLPR DATASET\PKG - UPENN-GBM_v2\data\7316UP-109.ndpi
Slide dimensions (Level 1): 72960 x 54144
Finished processing D:\MLPR DATASET\PKG - UPENN-GBM_v2\data\7316UP-109.ndpi. Total tiles created: 2067

Processing: D:\MLPR DATASET\PKG - UPENN-GBM_v2\data\7316UP-1108.ndpi
Slide dimensions (Level 1): 55680 x 38016
Finished processing D:\MLPR DATASET\PKG - UPENN-GBM_v2\data\7316UP-1108.ndpi. Total tiles created: 1120

Processing: D:\MLPR DATASET\PKG - UPENN-GBM_v2\data\7316UP-1110.ndpi
Slide dimensions (Level 1): 57600 x 41472
Finished processing D:\MLPR DATASET\PKG - UPENN-GBM_v2\data\7316UP-1110.ndpi. Total tiles created: 1260

Processing: D:\MLPR DATASET\PKG - UPENN-GBM_v2\data\7316UP-1135.ndpi
Slide dimensions (Level 1): 61440 x 49536
Finished processing D:\MLPR DATASET\PKG - UPENN-GBM_v2\data\7316UP-1135.ndpi. Total tiles created: 1584

Processing: D:\MLPR DATASET\PKG - UPENN-GBM_v2\data\7316UP-1206.ndpi
Slide dimensions (Level 1): 48000 

### **Tile Stitching & Compression Script**  

This script reconstructs **whole-slide histopathology images** from previously extracted **image tiles** while reducing file size for efficient storage and visualization.  

#### **Key Features**  
✅ **Stitches Tiles Back**: Reads **JPEG tiles** and reconstructs the **original image layout** using tile coordinates.  
✅ **Downscaling for Compression**: Reduces each dimension by a **factor of 10** (about 100× fewer pixels) to save space and speed up processing.  
✅ **Optimized Storage**: Saves the stitched images as **JPEG (quality 50)** to keep the output files small.  
✅ **Automated Processing**: Loops through all tile folders and **recreates stitched images** automatically.  

This is especially useful for **machine learning preprocessing, pathology research, and efficient visualization** of large histology datasets. 🚀🔬
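
The layout step can be illustrated without any image data. The sketch below parses `tile_x_y.jpg` names, ranks the distinct x and y values, and computes where each downscaled tile lands on the canvas. The names `parse_tile_name` and `plan_layout` are hypothetical, and the square 1400 px tile size is an assumption here (the script itself reads the size from the first tile it finds).

```python
import re


def parse_tile_name(filename):
    """Return (x, y) from a name like 'tile_1400_2800.jpg', or None."""
    match = re.fullmatch(r"tile_(\d+)_(\d+)\.jpg", filename)
    return (int(match.group(1)), int(match.group(2))) if match else None


def plan_layout(filenames, tile_size=1400, scale=10):
    """Map each tile name to its (left, top) offset on the downscaled canvas.

    Returns (canvas_width, canvas_height, offsets).
    """
    coords = {name: parse_tile_name(name) for name in filenames}
    coords = {name: xy for name, xy in coords.items() if xy is not None}
    # Rank the distinct x and y values so the tile stride does not matter.
    x_rank = {x: i for i, x in enumerate(sorted({xy[0] for xy in coords.values()}))}
    y_rank = {y: i for i, y in enumerate(sorted({xy[1] for xy in coords.values()}))}
    small = tile_size // scale  # side length of one downscaled tile
    offsets = {
        name: (x_rank[x] * small, y_rank[y] * small)
        for name, (x, y) in coords.items()
    }
    return len(x_rank) * small, len(y_rank) * small, offsets


names = ["tile_0_0.jpg", "tile_1400_0.jpg", "tile_0_1400.jpg",
         "tile_1400_1400.jpg", "notes.txt"]
width, height, offsets = plan_layout(names)
print(width, height)                  # 280 280
print(offsets["tile_1400_1400.jpg"])  # (140, 140)
```

By the same arithmetic, a complete 53 x 39 grid of 1400 px tiles at `scale=10` would give a 7420 x 5460 px canvas.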

In [None]:
import os
import re
import cv2
import numpy as np

# Define input and output folders
tiles_root_folder = r"D:\MLPR DATASET\converted_tiles"
stitch_output_folder = r"D:\MLPR DATASET\Converted_Images"
os.makedirs(stitch_output_folder, exist_ok=True)

def extract_coords(filename):
    """
    Extract (x, y) coordinates from a filename formatted as 'tile_x_y.jpg'.
    """
    match = re.search(r"tile_(\d+)_(\d+)\.jpg", filename)
    if match:
        return int(match.group(1)), int(match.group(2))
    return None

def stitch_and_compress(tiles_folder, output_filename, scale=10):
    """
    Reads all JPEG tiles in a folder, downsizes each tile by the given scale,
    stitches them into one image based on their coordinates, and saves the final
    compressed image.
    """
    # List all .jpg files in the folder
    tiles = [f for f in os.listdir(tiles_folder) if f.endswith(".jpg")]
    tile_dict = {}
    tile_size = None

    # Build a dictionary mapping coordinates to tile filenames, and determine tile size
    for tile in tiles:
        coord = extract_coords(tile)
        if coord:
            tile_dict[coord] = tile
            if tile_size is None:
                img = cv2.imread(os.path.join(tiles_folder, tile))
                if img is not None:
                    tile_size = img.shape[:2]  # (height, width)
    
    if tile_size is None:
        print(f"No valid tiles found in {tiles_folder}")
        return

    # Get unique x and y coordinate values from tile filenames
    x_coords = sorted({coord[0] for coord in tile_dict.keys()})
    y_coords = sorted({coord[1] for coord in tile_dict.keys()})

    # Compute downscaled tile dimensions
    tile_height, tile_width = tile_size
    small_tile_height = tile_height // scale
    small_tile_width = tile_width // scale

    # Compute final stitched image size at downscaled resolution
    stitched_width_small = len(x_coords) * small_tile_width
    stitched_height_small = len(y_coords) * small_tile_height

    # Create a blank canvas for the downscaled stitched image
    small_grid = np.zeros((stitched_height_small, stitched_width_small, 3), dtype=np.uint8)

    # Place each tile in its correct position on the downscaled canvas
    for x_idx, x in enumerate(x_coords):
        for y_idx, y in enumerate(y_coords):
            tile_name = tile_dict.get((x, y))
            img = cv2.imread(os.path.join(tiles_folder, tile_name)) if tile_name else None
            if img is None:
                # Missing or unreadable tile: fill its cell with black
                scaled_tile = np.zeros((small_tile_height, small_tile_width, 3), dtype=np.uint8)
            else:
                # Downscale the tile
                scaled_tile = cv2.resize(img, (small_tile_width, small_tile_height), interpolation=cv2.INTER_AREA)
            # Paste the tile at its grid position on the downscaled canvas
            small_grid[y_idx * small_tile_height:(y_idx + 1) * small_tile_height,
                       x_idx * small_tile_width:(x_idx + 1) * small_tile_width] = scaled_tile

    # Save the compressed stitched image (JPEG quality 50)
    cv2.imwrite(output_filename, small_grid, [cv2.IMWRITE_JPEG_QUALITY, 50])
    print(f"Compressed stitched image saved at: {output_filename}")

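    # Optionally, display the compressed image. waitKey(0) blocks until a key is pressed
    # in the image window, so comment out these three lines for unattended batch runs.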
    cv2.imshow("Stitched Image", small_grid)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

# Process each folder in the root folder and save the resulting stitched image in the output folder
for folder_name in os.listdir(tiles_root_folder):
    folder_path = os.path.join(tiles_root_folder, folder_name)
    if os.path.isdir(folder_path):
        output_filename = os.path.join(stitch_output_folder, f"{folder_name}_stitched_compressed.jpg")
        print(f"Processing folder: {folder_name}")
        stitch_and_compress(folder_path, output_filename, scale=10)

Processing folder: 7316UP-109
Compressed stitched image saved at: D:\MLPR DATASET\Converted_Images\7316UP-109_stitched_compressed.jpg
Processing folder: 7316UP-1108
Compressed stitched image saved at: D:\MLPR DATASET\Converted_Images\7316UP-1108_stitched_compressed.jpg
Processing folder: 7316UP-1110
Compressed stitched image saved at: D:\MLPR DATASET\Converted_Images\7316UP-1110_stitched_compressed.jpg
Processing folder: 7316UP-1135
Compressed stitched image saved at: D:\MLPR DATASET\Converted_Images\7316UP-1135_stitched_compressed.jpg
Processing folder: 7316UP-1206
Compressed stitched image saved at: D:\MLPR DATASET\Converted_Images\7316UP-1206_stitched_compressed.jpg
Processing folder: 7316UP-1220
Compressed stitched image saved at: D:\MLPR DATASET\Converted_Images\7316UP-1220_stitched_compressed.jpg
Processing folder: 7316UP-1273
Compressed stitched image saved at: D:\MLPR DATASET\Converted_Images\7316UP-1273_stitched_compressed.jpg
Processing folder: 7316UP-1282
Compressed stitched