# 1. Search Studies

- [Benchmarking the translational potential of spatial gene expression prediction from histology](https://www.nature.com/articles/s41467-025-56618-y)  
- [DeepSpot: Leveraging Spatial Context for Enhanced Spatial Transcriptomics Prediction from H&E Images](https://www.medrxiv.org/content/10.1101/2025.02.09.25321567v1)  

> These papers survey state-of-the-art techniques for predicting spatial gene expression from H&E histology images.

---

# 2. Exploratory Data Analysis (EDA)

- https://www.kaggle.com/code/tarundirector/histology-eda-spotnet-visual-spatial-dl  
- https://www.kaggle.com/code/dalloliogm/eda-exploring-cell-type-abundance  
- https://www.kaggle.com/code/prajwaldongre/only-eda-you-need-to-understand-this-data  

> Use these notebooks to visualize spot distributions, cell-type abundances, and other spatial features.

---

# 3. Data Preprocessing

## 3.1 Image Data  
1. **Stain normalization** (one of the most important steps)  
   - Essential for reducing color variability across slides.  
   - Sample patches from all images to compute stable normalization parameters.  
2. **Background masking**  
   - Grayscale: `gray = rgb2gray(image)`  
   - Threshold: `mask = gray <= mean_threshold`  
   - Closing: `closing(mask, disk(2))`  
   - Fill holes: `remove_small_holes(..., area_threshold=5000)`  
   - **Output:** clean binary tissue mask  

## 3.2 Spot Data  
1. **Expression ranking** (one of the most important steps)  
   - Replace raw counts with rank values. I compared UMAP projections of several candidate transformations and combinations to see which target would be easiest for the model to predict; replacing counts with ranks was the most promising.
2. **Spot realignment**  
   - Minor effect if background spots are already filtered out.
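
A rank replacement of this kind might look like the sketch below. The notes do not state the axis along which ranks are taken, so this assumes ranking within each spot (row) across the target columns, with ties broken by column order. The function name `rank_transform` is hypothetical.

```python
import numpy as np


def rank_transform(expr: np.ndarray) -> np.ndarray:
    """Replace each row's values with their ranks (1 = smallest value in the row)."""
    # Column indices that would sort each row in ascending order.
    order = np.argsort(expr, axis=1, kind="stable")
    ranks = np.empty_like(order)
    rows = np.arange(expr.shape[0])[:, None]
    # Invert the sorting permutation so each position receives its rank.
    ranks[rows, order] = np.arange(1, expr.shape[1] + 1)
    return ranks.astype(np.float32)


counts = np.array([[0.1, 5.0, 2.0],
                   [3.0, 1.0, 2.0]])
ranked = rank_transform(counts)   # [[1, 3, 2], [3, 1, 2]]
```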

---

# 4. Model Preparation

## 4.1 Encoder

### DeepTileEncoder  
- **Input size:** `[B, 3, 78, 78]`  
- **Purpose:** Processes the entire 78×78 tile (from spot coordinates).  
- **Residual blocks:** Stacked `ResidualBlock`s (two 3×3 Conv + BN + SiLU; 1×1 Conv shortcut if needed)  
- **Multi-scale pooling:**  
  - Global: `AdaptiveAvgPool2d(1×1)` → `[B,256]`  
  - Mid-scale: `AdaptiveAvgPool2d(3×3)` → `[B,256×3×3]`  
- **MLP (3 layers):**  
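
The residual block and the two pooling branches listed for this encoder can be sketched in PyTorch as follows. Only the block layout, the pooling sizes, and the 256 output channels come from the notes; the class name `TileEncoderSketch`, the intermediate channel widths, the strides, the MLP hidden sizes, and `out_dim` are all assumptions made for illustration.

```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """Two 3x3 Conv + BN + SiLU layers, with a 1x1 Conv shortcut when shapes differ."""

    def __init__(self, in_ch: int, out_ch: int, stride: int = 1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.SiLU(),
            nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        needs_proj = stride != 1 or in_ch != out_ch
        self.shortcut = (
            nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False)
            if needs_proj else nn.Identity()
        )
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.body(x) + self.shortcut(x))


class TileEncoderSketch(nn.Module):
    def __init__(self, out_dim: int = 128):           # out_dim is hypothetical
        super().__init__()
        # Widths 64 -> 128 -> 256 and the strides are assumed.
        self.blocks = nn.Sequential(
            ResidualBlock(3, 64),
            ResidualBlock(64, 128, stride=2),
            ResidualBlock(128, 256, stride=2),
        )
        self.global_pool = nn.AdaptiveAvgPool2d(1)    # global branch
        self.mid_pool = nn.AdaptiveAvgPool2d(3)       # mid-scale branch
        self.mlp = nn.Sequential(                     # three linear layers, sizes assumed
            nn.Linear(256 + 256 * 3 * 3, 512), nn.SiLU(),
            nn.Linear(512, 256), nn.SiLU(),
            nn.Linear(256, out_dim),
        )

    def forward(self, x):                             # x: [B, 3, 78, 78]
        f = self.blocks(x)
        g = self.global_pool(f).flatten(1)            # [B, 256]
        m = self.mid_pool(f).flatten(1)               # [B, 256*3*3]
        return self.mlp(torch.cat([g, m], dim=1))     # [B, out_dim]
```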


### SubtileEncoder  
- **Input size:** `[B, 9, 3, 26, 26]`  
- **Purpose:** Captures local details from 9 subtiles.  
- **Residual blocks:** Applied after reshaping to `[B×9, 3, 26, 26]`  
- **Multi-scale pooling (per patch):**  
  - Global 1×1 → `[B×9,128]`  
  - Mid 2×2 → `[B×9,128×2×2]`  
  - Large 3×3 → `[B×9,128×3×3]`  
- **Aggregation:**  
  1. Reshape to `[B, 9, total_dim]`  
  2. Average over 9 → `[B, total_dim]`  
- **MLP (2 layers):**  
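
The tensor bookkeeping around this encoder (splitting a tile into nine subtiles, flattening the batch, and averaging back) can be illustrated with NumPy alone. This sketch assumes the nine subtiles are the non-overlapping 3×3 grid of the 78×78 tile and that the center subtile is the middle cell of that grid; the random `features` array is a stand-in for the output of the residual blocks and pooling.

```python
import numpy as np


def split_into_subtiles(tiles: np.ndarray, grid: int = 3) -> np.ndarray:
    """[B, C, H, W] -> [B, grid*grid, C, H/grid, W/grid], row-major over the grid."""
    b, c, h, w = tiles.shape
    sh, sw = h // grid, w // grid
    x = tiles.reshape(b, c, grid, sh, grid, sw)   # split H and W into (cell, offset)
    x = x.transpose(0, 2, 4, 1, 3, 5)             # [B, grid_y, grid_x, C, sh, sw]
    return x.reshape(b, grid * grid, c, sh, sw)


tiles = np.random.rand(4, 3, 78, 78).astype(np.float32)
subtiles = split_into_subtiles(tiles)             # [4, 9, 3, 26, 26]
flat = subtiles.reshape(-1, 3, 26, 26)            # [36, 3, 26, 26], one row per subtile
center = subtiles[:, 4]                           # middle cell of the 3x3 grid

# Pooled width per subtile from the listed branches: 128 * (1 + 2*2 + 3*3) = 1792.
total_dim = 128 * (1 + 4 + 9)
features = np.random.rand(flat.shape[0], total_dim)        # stand-in features
pooled = features.reshape(4, 9, total_dim).mean(axis=1)    # average over 9 -> [4, 1792]
```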


### CenterSubtileEncoder  
- **Input size:** `[B, 3, 26, 26]`  
- **Purpose:** Focuses on the single central patch.  
- **Structure:** Same as SubtileEncoder but only for the center.  
- **Multi-scale pooling:** 1×1, 2×2, 3×3 → concatenate → Flatten  
- **MLP (2 layers):** 

---

## 4.2 Decoder

1. **Concatenate:** `[tile_dim] + [subtile_dim] + [subtile_dim]` → `[B, tile_dim+2×subtile_dim]`  
2. **Four-layer MLP:**  
3. **Output:** Multi-task prediction vector of length `output_dim`, which is 35.

**Loss function:** Mean Squared Error (MSE)
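
As a rough illustration of the fusion step, the sketch below concatenates the three encoder outputs and passes them through a four-layer MLP trained with MSE. The class name `DecoderSketch`, the hidden width, the activation, and the example feature sizes are assumptions; only the concatenation order, the layer count, the output length of 35, and the loss come from the notes.

```python
import torch
import torch.nn as nn


class DecoderSketch(nn.Module):
    def __init__(self, tile_dim: int, subtile_dim: int,
                 output_dim: int = 35, hidden: int = 256):   # hidden is assumed
        super().__init__()
        in_dim = tile_dim + 2 * subtile_dim
        self.mlp = nn.Sequential(                 # four linear layers
            nn.Linear(in_dim, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden // 2), nn.SiLU(),
            nn.Linear(hidden // 2, output_dim),
        )

    def forward(self, tile_feat, subtile_feat, center_feat):
        # [B, tile_dim] + [B, subtile_dim] + [B, subtile_dim]
        fused = torch.cat([tile_feat, subtile_feat, center_feat], dim=1)
        return self.mlp(fused)                    # [B, output_dim]


decoder = DecoderSketch(tile_dim=128, subtile_dim=64)      # example sizes only
pred = decoder(torch.randn(8, 128), torch.randn(8, 64), torch.randn(8, 64))
loss = nn.MSELoss()(pred, torch.randn(8, 35))              # random stand-in targets
```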

---

# 5. Training Strategy

## 5.1 Leave-One-Out (LOO) Cross-Validation  
- Six folds → six models.
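
A leave-one-out split of this kind can be generated as below, assuming each fold holds out one whole sample and trains on the remaining five. The identifiers in `slides` are hypothetical placeholders.

```python
def leave_one_out_folds(sample_ids):
    """Yield (train_ids, held_out_id) pairs, one per sample."""
    for held_out in sample_ids:
        train = [s for s in sample_ids if s != held_out]
        yield train, held_out


# Hypothetical sample identifiers; the notes only state that there are six folds.
slides = ["S_1", "S_2", "S_3", "S_4", "S_5", "S_6"]
folds = list(leave_one_out_folds(slides))

for train_ids, val_id in folds:
    # One model would be trained per fold on train_ids and validated on val_id.
    print(f"validate on {val_id}, train on {train_ids}")
```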

## 5.2 Model Stacking  
1. Use each of the six models to predict on all samples → produce a dataframe of shape `(spots_n, 35 × 6)`.  
2. Train a small MLP on this dataframe → final output `(spots_n, 35)`.  
   - Learns to weight each fold’s prediction adaptively.
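
The shape of the stacking input can be checked with a short NumPy sketch. The random arrays stand in for the six fold models' predictions, and the plain average is included only as a fixed-weight baseline to compare against the learned meta-model; neither is taken from the original code.

```python
import numpy as np

n_spots, n_targets, n_models = 100, 35, 6
rng = np.random.default_rng(0)

# Stand-ins for each fold model's predictions on the same spots.
fold_preds = [rng.random((n_spots, n_targets)) for _ in range(n_models)]

# Column-wise concatenation gives the meta-model input: (spots_n, 35 * 6).
stack_features = np.concatenate(fold_preds, axis=1)

# Fixed equal-weight baseline: average the six predictions per target.
mean_baseline = np.stack(fold_preds, axis=0).mean(axis=0)   # (spots_n, 35)
```

With this layout, columns `35*k` to `35*(k+1) - 1` of `stack_features` hold the predictions of fold model `k`.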

---

# Optimization Notes

1. **Model structure:** Residual + multi-scale design proved most stable.  
2. **Input size:** 78×78 tile + 9×26×26 subtiles + center 26×26 gave best results.  
3. **Patch size choice:** Mean spot spacing ≈ 26 px, so 3×26 = 78 px covers the full receptive field. That is why I use 26×3 as the global view for the input and 26 as the local view.

# Future Improvements

- Diversify the stacked models so that their predictions are less correlated and the ensemble gains more from combining them.  