<h1 style="text-align:center;">Demo for PathRWKV</h1>

<div align="center">

![Python](https://img.shields.io/badge/Python-3.12.12-3776AB?style=for-the-badge&logo=python&logoColor=white)
![PyTorch](https://img.shields.io/badge/PyTorch-2.9.1-EE4C2C?style=for-the-badge&logo=pytorch&logoColor=white)
![CUDA](https://img.shields.io/badge/CUDA-12.8-76B900?style=for-the-badge&logo=nvidia&logoColor=white)

</div>

---
## Summary

This notebook demonstrates the complete PathRWKV pipeline:

| Step | Description | Output |
|------|-------------|--------|
| Preprocessing | WSI → Tiles | `.jpeg` images + `dataset.csv` |
| Embedding | Tiles → Features | `.safetensors` files |
| Training | Features → Model | Checkpoints + TensorBoard logs |
| Testing | Model → Metrics | `results.json` |
| Visualization | WSI → Heatmap | `.jpeg` image |

## Google Colab Setup

Run this cell only if you are using Google Colab.

In [None]:
import os
import sys

IN_COLAB = 'google.colab' in sys.modules

if IN_COLAB:
    print("Running in Google Colab")
    %cd /content
    !git clone https://github.com/Puzzle-Logic/PathRWKV.git
    %cd PathRWKV

    !apt-get update && apt-get install -y openslide-tools
    !pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128
    !pip install pytorch-lightning torchmetrics timm monai polars pyyaml safetensors ninja
    !pip install scikit-survival openslide-python pillow tqdm scipy tensorboard matplotlib awscli
    print("Colab setup complete")
else:
    print("Running locally")

## Import Libraries

In [None]:
import sys
import torch
from PIL import Image
from pathlib import Path
import matplotlib.pyplot as plt

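# Re-derive IN_COLAB here so this cell also works if the Colab setup cell was skipped
IN_COLAB = 'google.colab' in sys.modules

if IN_COLAB:
    PROJECT_ROOT = Path('/content/PathRWKV')
else:
    PROJECT_ROOT = Path('.').resolve()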

sys.path.insert(0, str(PROJECT_ROOT))

print(f"Python: {sys.version}")
print(f"PyTorch: {torch.__version__}")
print(f"CUDA Available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
print(f"Project Root: {PROJECT_ROOT}")

---
## Step 1: WSI Preparation

This step downloads a test sample (`test_001.tif`) from the CAMELYON16 dataset on AWS S3 and converts it into small tiles for feature embedding.

### Key Parameters:
- `tile_size`: Size of each tile (default: 224×224 pixels)
- `target_mpp`: Microns per pixel (default: 0.5 for 20x magnification)
- `t_occupancy`: Minimum tissue occupancy threshold (default: 0.1)
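
The sketch below illustrates how these three parameters could interact. It is not the code in `UpStream/preprocess.py`: the function names, the `native_mpp` argument, and the brightness-based tissue test (`background_threshold`) are assumptions made for illustration, and the real script may detect tissue differently.

```python
def level0_read_size(tile_size: int, target_mpp: float, native_mpp: float) -> int:
    """Side length, in full-resolution pixels, that one output tile covers.

    native_mpp is the slide's own resolution (hypothetical argument); a tile of
    tile_size pixels at target_mpp spans tile_size * target_mpp microns.
    """
    return round(tile_size * target_mpp / native_mpp)


def tissue_occupancy(gray_tile, background_threshold: int = 220) -> float:
    """Fraction of pixels darker than the threshold, treated here as tissue.

    gray_tile is a list of rows of 0-255 grayscale values. The brightness
    threshold is an assumed stand-in for whatever tissue detector is used.
    """
    total = 0
    tissue = 0
    for row in gray_tile:
        for value in row:
            total += 1
            if value < background_threshold:
                tissue += 1
    return tissue / total if total else 0.0


def keep_tile(gray_tile, t_occupancy: float = 0.1) -> bool:
    """Keep a tile only if enough of it is tissue."""
    return tissue_occupancy(gray_tile) >= t_occupancy


# A slide scanned at 0.25 mpp (40x) needs a 448-pixel read per 224-pixel tile at 0.5 mpp.
read_size = level0_read_size(224, 0.5, 0.25)

# One dark pixel out of four gives an occupancy of 0.25, above the 0.1 default.
mostly_background = [[255, 255], [100, 255]]
kept = keep_tile(mostly_background)
```

Raising `t_occupancy` discards more border tiles that are mostly background, at the cost of losing thin tissue fragments.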

In [None]:
from pathlib import Path

wsi_dir = Path("/content/PathRWKV/CAMELYON16/images")
wsi_dir.mkdir(exist_ok=True, parents=True)
target_file = wsi_dir / "test_001.tif"

if not target_file.exists():
    print(f"Downloading {target_file.name} from AWS Open Data (CAMELYON16)...")
    !aws s3 cp s3://camelyon-dataset/CAMELYON16/images/test_001.tif {wsi_dir} --no-sign-request
    print(f"Download complete! File downloaded to {wsi_dir}")
else:
    print(f"{target_file.name} already exists.")

In [None]:
# Preprocess the WSI sample; this takes about 2-3 minutes
!python UpStream/preprocess.py \
  --input_dir "/content/PathRWKV/CAMELYON16/images" \
  --output_dir "/content/PathRWKV/CAMELYON16/tiles" \
  --gen_thumbnails

### Visualize Tiling Results

In [None]:
import math
from pathlib import Path

slide_dir = Path("/content/PathRWKV/CAMELYON16/tiles/thumbnails")
tile_files = sorted(slide_dir.glob('*ROI*.jpeg'))
print(f"Found {len(tile_files)} ROI tiles.")

n_cols = 5
# Use at least one row so plt.subplots does not fail when no thumbnails are found
n_rows = max(1, math.ceil(len(tile_files) / n_cols))
fig, axes = plt.subplots(n_rows, n_cols, figsize=(16, 4 * n_rows))
axes = axes.flatten()

for idx, ax in enumerate(axes):
    if idx < len(tile_files):
        tile_path = tile_files[idx]
        img = Image.open(tile_path)
        ax.imshow(img)
        filename = tile_path.name
        short_title = filename.split("original_")[-1]
        ax.set_title(short_title, fontsize=10)

    ax.axis('off')

plt.suptitle(f'All ROI Tiles from {slide_dir.name}', fontsize=16, fontweight='bold', y=1.005)
plt.tight_layout()
plt.show()

---
## Step 2: Feature Embedding

Download the precomputed feature embeddings directly from Hugging Face.

In [None]:
from huggingface_hub import hf_hub_download

repo_id = "PuzzleLogic/CAMELYON16_Embeddings"
repo_type = "dataset"
filename = "tiles-embeddings/test_001.safetensors"
target_base_dir = "/content/PathRWKV/CAMELYON16"
file_path = hf_hub_download(
    repo_id=repo_id,
    filename=filename,
    repo_type=repo_type,
    local_dir=target_base_dir,
    local_dir_use_symlinks=False,
)
print(f"File successfully downloaded to: {file_path}")


Alternatively, extract the features from the tiles yourself using Prov-GigaPath.

In [None]:
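# Embed the tiles with Prov-GigaPath
# ⚠️ This requires access to the model on Hugging Face and takes up to 5 minutes on Colab
import os  # imported here so the cell also works if the Colab setup cell was skipped

os.environ['HF_TOKEN'] = "YOUR_HF_TOKEN_HERE"  # replace with your own token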
!python UpStream/embed.py \
  --input_dir "/content/PathRWKV/CAMELYON16/tiles" \
  --output_dir "/content/PathRWKV/CAMELYON16/tiles-embeddings" \
  --model_name "hf_hub:prov-gigapath/prov-gigapath" \
  --batch_size 16 \
  --num_workers 2 \
  --disable_compile \
  --disable_bf16


### Inspect Embedding Results

In [None]:
from safetensors import safe_open

path = Path("/content/PathRWKV/CAMELYON16/tiles-embeddings/test_001.safetensors")
with safe_open(path, framework='pt', device='cpu') as f:
    features = f.get_tensor('features')
    coords = f.get_tensor('coords_yx')

print(f"File: {path.name}")
print(f"  • Number of tiles: {features.shape[0]}")
print(f"  • Feature dimension: {features.shape[1]}")
print(f"  • Coordinates shape: {coords.shape}")
print(f"  • Feature stats: mean={features.mean():.4f}, std={features.std():.4f}")

# Visualize coordinate distribution
fig, axes = plt.subplots(1, 2, figsize=(10, 4))

axes[0].scatter(coords[:, 1], coords[:, 0], alpha=0.5, s=5)
axes[0].set_xlabel('X coordinate')
axes[0].set_ylabel('Y coordinate')
axes[0].set_title('Tile Positions')
axes[0].invert_yaxis()

# Cast to float32 first: .numpy() fails on bfloat16 tensors
axes[1].hist(features.float().flatten().numpy(), bins=100, color="steelblue", edgecolor="none")
axes[1].set_xlabel("Feature Value")
axes[1].set_ylabel("Count")
axes[1].set_title("Feature Value Distribution")
axes[1].set_yscale('log')

plt.tight_layout()
plt.show()
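
The scatter plot above treats column 0 of `coords_yx` as y and column 1 as x. As a minimal sketch of how such coordinates could feed the heatmap step, the hypothetical helper below places one score per tile into a 2D grid. It assumes the coordinates are top-left pixel offsets spaced on a regular grid of `tile_size`; if the file stores them at another resolution, the stride would need to change. The function name and the `scores` input are illustrative and not part of PathRWKV.

```python
def coords_to_grid(coords_yx, scores, tile_size: int = 224):
    """Arrange per-tile scores into a 2D grid indexed by tile row and column.

    coords_yx: sequence of (y, x) pixel offsets, assumed to be multiples of tile_size.
    scores:    one value per tile, in the same order as coords_yx.
    Cells with no tile stay None.
    """
    rows = [y // tile_size for y, _ in coords_yx]
    cols = [x // tile_size for _, x in coords_yx]
    min_row, min_col = min(rows), min(cols)
    height = max(rows) - min_row + 1
    width = max(cols) - min_col + 1

    grid = [[None] * width for _ in range(height)]
    for row, col, score in zip(rows, cols, scores):
        grid[row - min_row][col - min_col] = score
    return grid


# Three tiles: two side by side in the first row, one two rows further down.
example_coords = [(0, 0), (0, 224), (448, 224)]
example_scores = [0.1, 0.5, 0.9]
heat = coords_to_grid(example_coords, example_scores)
```

With the tensors loaded in the cell above, the coordinates can be passed as `coords.tolist()`.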

---
## Step 3: Training and Testing PathRWKV

This step trains and tests PathRWKV. Note that it cannot run on a free Colab account because of RAM constraints.

In [None]:
# ⚠️ This cell cannot run on a free Colab account because of RAM constraints.
from huggingface_hub import snapshot_download
local_path = snapshot_download(
    repo_id="PuzzleLogic/CAMELYON16_Embeddings",
    repo_type="dataset",
    local_dir="/content/PathRWKV/CAMELYON16",
    allow_patterns="tiles-embeddings/*",
    local_dir_use_symlinks=False,
    resume_download=True
)

!python DownStream/main.py \
  --data_path "/content/PathRWKV/CAMELYON16/tiles-embeddings" \
  --lr 1e-03