# HistoSweep: Full sweep of H&E images to identify good quality super-pixels for downstream ST analysis

### Compatible file types: .jpg, .png, .tif, .svs, .ome.tif, .ndpi

- `he-raw.*` – Original H&E image (not scaled or preprocessed) 
- `he-scaled.*` – Scaled H&E image  
- `he.*` – Final preprocessed image
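
The naming convention above can be checked before running the pipeline. The sketch below is a hypothetical helper (`find_by_stem` is not part of HistoSweep, and it is not the `find_he` function imported later); it assumes the images sit directly in one folder and that `.tiff` should be accepted alongside the listed extensions.

```python
import glob
import os

# Extensions from the compatibility list above; ".tiff" is added as an assumption.
EXTENSIONS = (".jpg", ".png", ".tif", ".tiff", ".svs", ".ndpi")

def find_by_stem(folder, stem, extensions=EXTENSIONS):
    """Return the first file named `<stem>.<ext>` in `folder`, or None if absent."""
    pattern = os.path.join(folder, stem + ".*")
    for path in sorted(glob.glob(pattern)):
        # ".ome.tif" files are covered because they also end in ".tif"
        if path.lower().endswith(extensions):
            return path
    return None
```

Because the pattern is `<stem>.*`, searching for `he` does not match `he-raw.jpg` or `he-scaled.jpg`, so the three roles stay distinct.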


##  *** Please enter input parameters ***

In [1]:
# ===== USER-DEFINED INPUT PARAMETERS =====

# Path prefix to your H&E image folder
HE_prefix = 'HE/demo/'

# Directory for output 
output_directory = f"{HE_prefix}/HistoSweep_Output" #Folder for HistoSweep output/results

# Flag for whether to rescale the image 
need_scaling_flag = True  # True if image resolution ≠ 0.5µm (or desired size) per pixel

# Flag for whether to preprocess the image 
need_preprocessing_flag = True  # True if image dimensions are not divisible by patch_size

# The pixel size (in microns) of the raw H&E image 
pixel_size_raw = 0.5  # Typically provided by the scanner/metadata (e.g., 0.25 µm/pixel for 40x)

# Parameter that controls the amount of density filtering (e.g., of artifacts); consider lowering for VERY large images
density_thresh = 100  # 100 typically works well; increase if artifacts (e.g., fiducial markers) are not being removed effectively

# Flag for whether to clean the background (i.e., remove isolated debris and small specks outside the tissue)
clean_background_flag = True  # Set to False to preserve fibrous regions that are otherwise incorrectly filtered out

# Parameter used to remove isolated debris and small specks outside the tissue
min_size = 10  # Decrease (e.g., 5) to retain fibrous areas such as adipose; increase (e.g., 50) to remove more or larger debris


# ===== Additional PARAMETERS (typically do not need to change) =====

# Size of one square patch (superpixel) used throughout processing
patch_size = 16  # 16x16 pixels → typically 8µm if pixel_size = 0.5

# Target pixel size (in microns)
pixel_size = 0.5  # Final desired resolution; keep as 0.5 µm for standardization


Store your raw histology image as 'he-raw.jpg', the scaled image as 'he-scaled.jpg', and the final preprocessed image as 'he.jpg'. If you use the scaling and preprocessing functions provided, the scaled and preprocessed images are written for you automatically.

## Load in packages and basic functions

In [2]:
%load_ext autoreload
%autoreload 2

import os, subprocess
from utils import load_image, get_image_filename, find_he
from saveParameters import saveParams
#from computeMetrics import compute_metrics
from computeMetrics import compute_metrics_fast
from densityFiltering import compute_low_density_mask
#from textureAnalysis import run_texture_analysis
from textureAnalysis import run_texture_analysis_optimized
from ratioFiltering import run_ratio_filtering
from generateMask import generate_final_mask
from additionalPlots import generate_additionalPlots

## Scale and preprocess H&E image 
Preprocess the image: <br>
(1) Scale so that each pixel is size 0.5 µm (he-scaled.jpg)<br>
(2) Pad the scaled image so its height and width are divisible by patch_size (he.jpg)<br>
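
The arithmetic behind these two steps can be sketched as follows. This is an illustration only, not the code in `rescale.py` or `preprocess.py`; the helper names `scale_factor` and `padded_size` are hypothetical, and the sketch assumes the scale is the ratio of raw to target pixel size and that padding rounds each dimension up to the next multiple of `patch_size`.

```python
def scale_factor(pixel_size_raw, pixel_size):
    """Factor applied to each image dimension so one pixel covers `pixel_size` µm."""
    return pixel_size_raw / pixel_size

def padded_size(length, patch_size):
    """Smallest multiple of `patch_size` that is >= `length` (integer ceiling division)."""
    return -(-length // patch_size) * patch_size

# Demo values from this notebook: raw and target pixel sizes are both 0.5 µm,
# so the 8448 x 18688 image is left at its original size.
scale = scale_factor(0.5, 0.5)
width, height = round(8448 * scale), round(18688 * scale)
print(scale, padded_size(width, 16), padded_size(height, 16))  # 1.0 8448 18688

# A 40x scan at 0.25 µm/pixel would be halved in each dimension.
print(scale_factor(0.25, 0.5))  # 0.5
```

With `patch_size = 16` and a 0.5 µm pixel, each superpixel covers 16 × 0.5 = 8 µm per side.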

In [3]:
# rescale and preprocess image 
if need_scaling_flag:
    subprocess.run([
        "python", "rescale.py", "--image",
        "--prefix", HE_prefix,
        "--pixelSizeRaw", str(pixel_size_raw),
        "--pixelSize", str(pixel_size),
        "--outputDir", output_directory,
    ], check=True)
else:
    print(" *Skipping rescale (need_scaling_flag=False).")

if need_preprocessing_flag:
    subprocess.run([
        "python", "preprocess.py", "--image",
        "--prefix", HE_prefix,
        "--patchSize", str(patch_size),
        "--outputDir", output_directory,
        "--pixelSizeRaw", str(pixel_size_raw),
        "--pixelSize", str(pixel_size),
    ], check=True)
else:
    print(" *Skipping preprocess (need_preprocessing_flag=False).")


Looking for files with prefix: HE/demo/he-raw
All matching files: ['HE/demo/he-raw.jpg']
Checking: HE/demo/he-raw.jpg
✅ Found: HE/demo/he-raw.jpg

Loading image from: HE/demo/he-raw.jpg
read image with pyvips.....
Image Width: 8448, Image Height: 18688, Image Bands: 3

 ***Rescaling image (scale: 1.000)...***
Rescaling took 0 sec
Image size: 8448 x 18688
Image size: 18688x8448
✅ Saved as TIFF: HE/demo//HistoSweep_Output/he-scaled.tiff

 ***Preprocessing image...***

Looking for files with prefix: HE/demo//HistoSweep_Output/he-scaled
All matching files: ['HE/demo//HistoSweep_Output/he-scaled.tiff']
✅ Found: HE/demo//HistoSweep_Output/he-scaled.tiff
Loading image from: HE/demo//HistoSweep_Output/he-scaled.tiff
read image with pyvips.....
Image Width: 8448, Image Height: 18688, Image Bands: 3
Saving preprocessed image to: HE/demo//HistoSweep_Output/he.tiff
Image size: 18688x8448
✅ Saved as TIFF: HE/demo//HistoSweep_Output/he.tiff


In [4]:
# ----- Load Final Preprocessed Image  -----

he_img = (find_he(output_directory, 'he') or
          find_he(HE_prefix, 'he'))

if he_img is None:
    raise FileNotFoundError(
        f"No usable H&E image found. Checked for 'he.*' in:\n"
        f"  {output_directory}\n  {HE_prefix}"
    )

print(f"✅ Using image: {he_img}")
image = load_image(he_img)
#print(f" Loaded image shape: {image.shape}")

✅ Using image: HE/demo//HistoSweep_Output/he.tiff

filename = HE/demo//HistoSweep_Output/he.tiff,ext=.tiff
this is tif|tiff file......
Image loaded from HE/demo//HistoSweep_Output/he.tiff


## Patchify image into super-pixels and compute metrics
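
As a rough illustration of what patchifying means here, the sketch below splits an image into non-overlapping `patch_size` × `patch_size` blocks with NumPy and computes one value per block. It is not the implementation of `compute_metrics_fast`; the function names are hypothetical and the standard deviation is only a stand-in for the metrics that function returns.

```python
import numpy as np

def patchify(image, patch_size):
    """View an (H, W, C) image as (H//p, W//p, p, p, C) non-overlapping patches."""
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    rows, cols = h // patch_size, w // patch_size
    # Split each spatial axis into (block index, offset within block), then
    # move the two block indices to the front.
    return image.reshape(rows, patch_size, cols, patch_size, c).swapaxes(1, 2)

def patch_std(image, patch_size):
    """Standard deviation over all pixels and channels of each patch (illustrative metric)."""
    patches = patchify(image.astype(np.float32), patch_size)
    return patches.std(axis=(2, 3, 4))

rng = np.random.default_rng(0)
demo = rng.integers(0, 256, size=(64, 32, 3), dtype=np.uint8)
print(patch_std(demo, 16).shape)  # (4, 2)

# Number of superpixels for the 18688 x 8448 demo image with 16-pixel patches
print((18688 // 16) * (8448 // 16))  # 616704
```

The count of 616,704 superpixels agrees with the chunk sizes reported by `compute_metrics_fast` for the demo image (3 × 200,000 + 16,704).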

In [5]:
# Create the output directory if it does not already exist
os.makedirs(output_directory, exist_ok=True)

In [6]:
saveParams(HE_prefix, output_directory, need_scaling_flag, need_preprocessing_flag,
           pixel_size_raw, density_thresh, clean_background_flag, min_size,
           patch_size, pixel_size)


✅ Parameters saved to: HE/demo//HistoSweep_Output/HistoSweep_parameters.txt


In [7]:
(he_std_norm_image_, he_std_image_, z_v_norm_image_,
 z_v_image_, ratio_norm_, ratio_norm_image_) = compute_metrics_fast(image, patch_size=patch_size)



  Chunk 1/4: 200,000 patches

  Chunk 2/4: 200,000 patches

  Chunk 3/4: 200,000 patches

  Chunk 4/4: 16,704 patches
[compute_metrics_fast] Current memory: 0.0243 GB; Peak memory: 0.4670 GB
compute_metrics_fast costs: 3.3816s


## Define threshold criteria:

### (1) Low density superpixels 

In [8]:
# identify low density superpixels
mask1_lowdensity = compute_low_density_mask(z_v_image_, he_std_image_, ratio_norm_, density_thresh=density_thresh)


[compute_low_density_mask] Current memory: 0.0006 GB; Peak memory: 0.0280 GB


In [9]:
print('Total selected for density filtering: ', mask1_lowdensity.sum())


Total selected for density filtering:  8079


In [10]:
# perform texture analysis 
mask1_lowdensity_update = run_texture_analysis_optimized(prefix=HE_prefix,
                                                         image=image,
                                                         tissue_mask=mask1_lowdensity,
                                                         output_dir=output_directory,
                                                         patch_size=patch_size,
                                                         glcm_levels=64)


Converting image to grayscale...
**************************************************
RGB-gray converts timt: 0.71 seconds
**************************************************
Running ultra-fast texture analysis...
[compute_all_texture_features_perfect] Current memory: 0.0000 GB; Peak memory: 0.0040 GB
[compute_all_texture_features_perfect] Current memory: 0.0001 GB; Peak memory: 0.0002 GB

 Entropy map saved as 'glcm_entropy_map_colored.png'

 Energy map saved as 'glcm_energy_map_colored.png'

 Homogeneity map saved as 'glcm_homogeneity_map_colored.png'
Running GMM clustering...

=== GLCM Metric Means ===
   homogeneity    energy   entropy
0     0.586861  0.403446  0.552667
1     0.253880  0.055033  0.871688
2     0.457087  0.197041  0.700667
3     0.357534  0.106133  0.796654
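
For readers unfamiliar with the three GLCM metrics in the table above, the following is a minimal NumPy sketch of how they can be computed for a single grayscale patch. It is not the code in `textureAnalysis`: the function name is hypothetical, it assumes one horizontal pixel offset and a symmetric, normalized co-occurrence matrix, and it reports raw Shannon entropy in bits, whereas the values in the table appear to be rescaled to the 0 to 1 range.

```python
import numpy as np

def glcm_features(patch, levels=64):
    """Homogeneity, energy and entropy of a 2-D grayscale patch with values in 0..255."""
    q = (patch.astype(np.int64) * levels) // 256        # quantize to `levels` gray levels
    left, right = q[:, :-1].ravel(), q[:, 1:].ravel()   # each pixel and its right neighbour
    glcm = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(glcm, (left, right), 1.0)                 # count co-occurring level pairs
    glcm = glcm + glcm.T                                # make the matrix symmetric
    p = glcm / glcm.sum()                               # normalize to probabilities
    i, j = np.indices(p.shape)
    homogeneity = float((p / (1.0 + (i - j) ** 2)).sum())
    energy = float(np.sqrt((p ** 2).sum()))
    nz = p[p > 0]
    entropy = float(-(nz * np.log2(nz)).sum())
    return homogeneity, energy, entropy
```

A uniform patch gives homogeneity 1, energy 1 and entropy 0, while a noisy patch gives lower homogeneity and energy and higher entropy, which is the same ordering seen between cluster 0 and cluster 1 in the table.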


### (2) Low ratio superpixels

In [11]:
# identify low ratio superpixels
mask2_lowratio, otsu_thresh = run_ratio_filtering(ratio_norm_, mask1_lowdensity_update)

[run_ratio_filtering] Current memory: 0.0006 GB; Peak memory: 0.0092 GB
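
The returned name `otsu_thresh` suggests that the ratio cutoff is chosen with Otsu's method. The internals of `run_ratio_filtering` are not shown in this notebook, so the block below is only a generic NumPy sketch of Otsu thresholding on a 1-D sample; the function name and the bin count are assumptions.

```python
import numpy as np

def otsu_threshold(values, bins=256):
    """Threshold that maximizes the between-class variance of a 1-D sample."""
    values = np.asarray(values, dtype=np.float64).ravel()
    hist, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2.0
    w0 = np.cumsum(hist)                      # samples at or below each bin
    w1 = w0[-1] - w0                          # samples above each bin
    s0 = np.cumsum(hist * centers)
    m0 = s0 / np.maximum(w0, 1)               # mean of the lower class (guard empty class)
    m1 = (s0[-1] - s0) / np.maximum(w1, 1)    # mean of the upper class
    between = w0 * w1 * (m0 - m1) ** 2        # proportional to between-class variance
    return centers[np.argmax(between)]
```

On a clearly bimodal sample the threshold falls in the gap between the two modes, so values at or below it can be treated as the "low ratio" group.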


## Generate final selection of superpixels

In [12]:
generate_final_mask(prefix=HE_prefix,
                    he=image,
                    mask1_updated=mask1_lowdensity_update,
                    mask2=mask2_lowratio,
                    output_dir=output_directory,
                    clean_background=clean_background_flag,
                    super_pixel_size=patch_size,
                    minSize=min_size)



 np.sum(mask_final): 21695873280
np.sum(cleaned): 84749505
✅ Final masks saved in: HE/demo//HistoSweep_Output
[generate_final_mask] Current memory: 0.0000 GB; Peak memory: 0.3004 GB
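
The background cleaning controlled by `clean_background_flag` and `min_size` is described above as removing isolated debris and small specks. One common way to do this is to drop connected regions below a size cutoff. The sketch below illustrates that idea in plain NumPy; it is not the code in `generateMask`, and the 4-connectivity, the strict `<` comparison and the unit of `min_size` (mask cells) are all assumptions.

```python
import numpy as np

def remove_small_components(mask, min_size):
    """Drop 4-connected True regions that contain fewer than `min_size` cells."""
    mask = np.asarray(mask, dtype=bool)
    cleaned = mask.copy()
    seen = np.zeros_like(mask, dtype=bool)
    rows, cols = mask.shape
    for r0, c0 in zip(*np.nonzero(mask)):
        if seen[r0, c0]:
            continue
        # Flood-fill one connected component using an explicit stack
        stack, component = [(r0, c0)], []
        seen[r0, c0] = True
        while stack:
            r, c = stack.pop()
            component.append((r, c))
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= nr < rows and 0 <= nc < cols and mask[nr, nc] and not seen[nr, nc]:
                    seen[nr, nc] = True
                    stack.append((nr, nc))
        # Erase the component if it is smaller than the cutoff
        if len(component) < min_size:
            for r, c in component:
                cleaned[r, c] = False
    return cleaned
```

For full-size masks, a library routine such as `scipy.ndimage.label` would be a faster way to find the components; the pure-Python loop here is kept only for readability.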


## Generate additional plots
These plots are optional and provide further insight into the filtering process. Generating them takes some additional time and is not required for the core HistoSweep method.

In [13]:
generate_additionalPlots(prefix = HE_prefix, 
                         he = image,
                         he_std_image = he_std_image_,
                         he_std_norm_image = he_std_norm_image_, 
                         z_v_image = z_v_image_,
                         z_v_norm_image = z_v_norm_image_, 
                         ratio_norm = ratio_norm_,
                         ratio_norm_image = ratio_norm_image_,
                         mask1 = mask1_lowdensity,
                         mask1_updated = mask1_lowdensity_update,
                         mask2 = mask2_lowratio,
                         super_pixel_size=patch_size, output_dir = output_directory,
                         generate_masked_plots = False)

✅ Created folders:
- HE/demo//HistoSweep_Output/AdditionalPlots/maskedHE_plots
- HE/demo//HistoSweep_Output/AdditionalPlots/filtering_plots
- HE/demo//HistoSweep_Output/AdditionalPlots/textureAnalysis_plots/masks
