---
# **How to Run Notebook**
---


1. Set up `virtual conda environment` if you have not already done so. Uncomment to run.

In [None]:
# !conda env create -f ../environments/environment.yml
# !conda activate TILSEG_PROJECT2024
# !cd TILSEG_PROJECT2024

2. Update the `repository_path` variable to point to the cloned 'TILSEG_PROJECT2024' GitHub folder.
This path is needed to access the example files used in the notebook.

In [1]:
import os
directory_path = os.getcwd()
repository_path = os.path.dirname(directory_path)

3. Run the `Initialization Block`. This is necessary because Python only searches the locations listed in `sys.path` when importing modules; appending the repository path lets this notebook import the `tilseg` package.

In [2]:
import sys
sys.path.append(repository_path)
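
Re-running the cell above appends the same path again each time. A minimal sketch of an idempotent alternative follows; the helper name `add_to_sys_path` is hypothetical and not part of the package.

```python
import sys
from pathlib import Path


def add_to_sys_path(path):
    """Append `path` to sys.path once; return True only if it was added."""
    resolved = str(Path(path).resolve())
    if resolved in sys.path:
        return False  # already importable from here, nothing to do
    sys.path.append(resolved)
    return True
```

In this notebook it would be called as `add_to_sys_path(repository_path)`.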

4. Import the needed modules in the `Import Block`

In [3]:
# External library imports
import matplotlib.pyplot as plt
import numpy as np

5. Run the data download block. It is used to access data from Google Drive.

In [1]:
!pip install gdown #only need to run once on laptop

Collecting gdown
  Downloading gdown-5.1.0-py3-none-any.whl.metadata (5.7 kB)
Collecting beautifulsoup4 (from gdown)
  Downloading beautifulsoup4-4.12.3-py3-none-any.whl.metadata (3.8 kB)
Collecting filelock (from gdown)
  Downloading filelock-3.13.1-py3-none-any.whl.metadata (2.8 kB)
Collecting soupsieve>1.2 (from beautifulsoup4->gdown)
  Downloading soupsieve-2.5-py3-none-any.whl.metadata (4.7 kB)
Downloading gdown-5.1.0-py3-none-any.whl (17 kB)
Downloading beautifulsoup4-4.12.3-py3-none-any.whl (147 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 147.9/147.9 kB 3.6 MB/s eta 0:00:00
Downloading filelock-3.13.1-py3-none-any.whl (11 kB)
Downloading soupsieve-2.5-py3-none-any.whl (36 kB)
Installing collected packages: soupsieve, filelock, beautifulsoup4, gdown
Successfully installed beautifulsoup4-4.12.3 filelock-3.13.1 gdown-5.1.0 soupsieve-2.5


### **The current repository contains the following file structure in the Example folder:**
#### These files will be used to walk through an example of using TILSEG_PROJECT2024 in analysis.
<img src= "Notebook_Images/Image_9.png" style="width: 600px;">

---
# **Core Features Overview**
---

### The TILSEG_PROJECT2024 software package is intended for use in breast cancer slide segmentation analysis, aimed at accelerating breast cancer detection. This package consists of 4 main components:

## From 2023 Capstone (OLD COMPONENTS):
### 1. <u>Preprocessing (preprocessing.py):</u>
#### Creates a superpatch .tif file from cropped 3000 by 4000 pixel patches of a stained breast cancer slide. The original image is divided into all possible patches, from which a select number (default: 6) are chosen to represent different sections of the gray-scale values drawn from a Gaussian distribution.
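
To make the patch-selection idea concrete, here is a rough sketch, not the package's implementation. It assumes each patch is summarized by its mean gray level and that the chosen patches are those closest to evenly spaced quantiles of that distribution. The function name `pick_representative_patches` is hypothetical.

```python
import numpy as np


def pick_representative_patches(patches, n_select=6):
    """Return indices of patches spread across the gray-level distribution.

    patches: sequence of H x W x 3 uint8 arrays (RGB).
    """
    if n_select > len(patches):
        raise ValueError("n_select cannot exceed the number of patches")
    # Mean gray level per patch (plain channel average, an assumption).
    gray_means = np.array([p.mean() for p in patches])
    # Target gray levels at evenly spaced quantiles of the distribution.
    quantiles = (np.arange(n_select) + 0.5) / n_select
    targets = np.quantile(gray_means, quantiles)
    chosen = []
    for target in targets:
        # Closest patch to this target that has not been chosen yet.
        order = np.argsort(np.abs(gray_means - target))
        chosen.append(next(int(i) for i in order if int(i) not in chosen))
    return chosen
```

The returned indices could then be used to tile the selected patches into a single superpatch image.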

Sub-Components:
* test

<span style="background-color: rgba(255, 255, 0, 0.5)">UPDATES/BUG FIXES FROM 2024 PROJECT: </span>
* Changed the os handling to read in the full filepaths of each .svs image since the original code was using only the filename (this led to filepath exception errors)

<img src= "Notebook_Images/image_7.png" style="width: 600px;">
<img src= "Notebook_Images/image_8.png" style="width: 597px;">

### 2. <u>Image Segmentation (seg.py >> def segment_TILs) </u>
#### Fits a clustering model (e.g. KMeans) on a superpatch and applies the model to a folder of patches to generate the following files: TILs overlaid on the original H&E patch, binary segmentation masks of each cluster, individual clusters overlaid on the original patch, an image of all the clusters, and a CSV file containing contour information for each TIL segmented from the patch. Currently accepts fitted and non-fitted 'KMeans', 'DBSCAN', 'OPTICS', and 'BIRCH' algorithms.
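
The fit-on-superpatch, predict-on-patch idea can be sketched with scikit-learn directly. This is an illustration on synthetic arrays, not the body of `segment_TILs`; the image sizes and `n_clusters=4` are arbitrary assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-ins for a superpatch and a patch (assumed RGB, uint8).
rng = np.random.default_rng(0)
superpatch = rng.integers(0, 256, size=(40, 40, 3), dtype=np.uint8)
patch = rng.integers(0, 256, size=(20, 30, 3), dtype=np.uint8)

# One row per pixel, scaled to [0, 1], as in the walkthrough cells.
superpatch_pixels = np.float32(superpatch.reshape((-1, 3)) / 255.0)
patch_pixels = np.float32(patch.reshape((-1, 3)) / 255.0)

# Fit on the superpatch pixels, then label every pixel of the patch.
model = KMeans(n_clusters=4, n_init=10, random_state=0).fit(superpatch_pixels)
label_image = model.predict(patch_pixels).reshape(patch.shape[:2])

# One binary mask per cluster, analogous to the saved cluster masks.
cluster_masks = {k: label_image == k for k in range(model.n_clusters)}
```

Each pixel belongs to exactly one cluster, so the masks partition the patch.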

Sub-Components:
* Test

<span style="background-color: rgba(255, 255, 0, 0.5)">UPDATES/BUG FIXES FROM 2024 PROJECT: </span>
* def segment_TILs was updated to take a `multiple_images` flag so that a KMeans model can be fit to a single patch rather than only to a superpatch

* def immune_cluster_analyzer (def segment_TILs << def image_postprocessing << def immune_cluster_analyzer) was updated to return the `cluster mask` with the highest TIL contour count, enabling further segmentation with DBSCAN (explained in the next section)

* def draw_til_images (def segment_TILs << def image_postprocessing << def draw_til_images): fixed a bug caused by use of the wrong array type

* def segment_TILs was fixed to check only for .tif files in a patches folder (avoids errors from hidden .ipynb files or other files)
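
The .tif-only check in the last bullet can be illustrated as follows. This is a sketch rather than the package's code: the helper name `list_tif_patches` is hypothetical, and skipping dot-prefixed files and matching the extension case-insensitively are assumptions.

```python
import tempfile
from pathlib import Path


def list_tif_patches(folder):
    """Return sorted paths of visible .tif files in `folder`."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file()
        and p.suffix.lower() == ".tif"
        and not p.name.startswith(".")
    )


# Demo on a throwaway folder containing a mix of files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["a.tif", "b.TIF", ".hidden.tif", "notes.txt"]:
        (Path(tmp) / name).write_bytes(b"")
    (Path(tmp) / ".ipynb_checkpoints").mkdir()
    found = [p.name for p in list_tif_patches(tmp)]
```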

## From 2024 Software Project (NEW COMPONENTS):

### 3. <u>Spatial Modeling (refine_kmeans.py >> def kmean_to_spatial_model wrappers) </u>
#### Created wrappers to run def segment_TILs on a folder of patches and use the output KMeans labels of the cluster with the highest contour count for further clustering with DBSCAN. Similarly, a wrapper was created to run segment_TILs on a single patch as both the superpatch and the patch, to run DBSCAN on the patch itself and generate a ground-truth DBSCAN classification of the cluster for scoring.

Sub-Components:
* mask_to_features
* km_dbscan_wrapper
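
A minimal sketch of the mask-then-DBSCAN step is shown below. It assumes the features are the (row, col) coordinates of the mask's foreground pixels; the actual signature and output of `mask_to_features` are not shown in this notebook, so `mask_to_coordinates` is a hypothetical stand-in, and the `eps` and `min_samples` values are chosen for this toy mask only.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def mask_to_coordinates(mask):
    """Return an (N, 2) array of (row, col) positions where `mask` is True."""
    return np.argwhere(mask)


# Toy binary mask with two dense blobs and one isolated pixel.
mask = np.zeros((40, 40), dtype=bool)
mask[5:10, 5:10] = True      # blob A: 25 pixels
mask[25:30, 28:33] = True    # blob B: 25 pixels
mask[38, 2] = True           # lone pixel, too sparse to form a cluster

coords = mask_to_coordinates(mask)
labels = DBSCAN(eps=1.5, min_samples=5).fit_predict(coords)

# Write DBSCAN labels back into image space (-1 is noise, -2 is background).
label_image = np.full(mask.shape, -2, dtype=int)
label_image[coords[:, 0], coords[:, 1]] = labels
```

DBSCAN groups pixels by spatial density, so isolated pixels that KMeans assigned to the TIL color cluster are marked as noise instead of being counted.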


### 4. <u>Scoring / Preprocessing Updates (functions HERE) </u>
#### Hanson and Stanley add information about what you did

### 4. <u>Bug Fixes from Original Code</u>

---
# **Example Walkthrough**
---

## 1) Pre-Preprocessing Step on Slide Image

#### Downloading Sample Raw Slide Image (.svs)

In [10]:
!gdown 'https://drive.google.com/uc?id=1_aR-Vwd0B3suQW214zfkLudl6HK3w4q3' -O "Image_Files/TCGA-A2-A0CW-01Z-00-DX1.svs"

Downloading...
From (original): https://drive.google.com/uc?id=1_aR-Vwd0B3suQW214zfkLudl6HK3w4q3
From (redirected): https://drive.usercontent.google.com/download?id=1_aR-Vwd0B3suQW214zfkLudl6HK3w4q3&confirm=t&uuid=e7898a4d-117e-4d43-950e-cead52cbeefb
To: /Users/laurenfrank/TilsegV2/Example/Image_Files/TCGA-A2-A0CW-01Z-00-DX1.svs
100%|████████████████████████████████████████| 667M/667M [00:23<00:00, 28.4MB/s]


The .svs slide image was saved to the `Image_Files` folder.
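
The same download can be done from Python instead of the shell. The sketch below uses `gdown.download`; the helper names are hypothetical, and creating the output folder first is an added convenience, not something the command above does.

```python
from pathlib import Path


def drive_download_url(file_id):
    """Build the direct-download URL form used in the command above."""
    return f"https://drive.google.com/uc?id={file_id}"


def download_slide(file_id, output_path):
    """Download a Google Drive file with gdown (requires network access)."""
    import gdown  # imported lazily so the URL helper works without gdown

    # Make sure the destination folder exists before downloading.
    Path(output_path).parent.mkdir(parents=True, exist_ok=True)
    return gdown.download(drive_download_url(file_id), str(output_path), quiet=False)
```

For this walkthrough the call would be `download_slide('1_aR-Vwd0B3suQW214zfkLudl6HK3w4q3', 'Image_Files/TCGA-A2-A0CW-01Z-00-DX1.svs')`.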

<img src= "Notebook_Images/Image_4.png" style="width: 165px;">, <span style="font-size: 6em;">&rarr;</span> <img src= "Notebook_Images/image_5.png" style="width: 170px;">

#### Creating Superpatch and Patch Images Using the `preprocess` Function in the `tilseg.preprocessing` Module

In [4]:
from tilseg.preprocessing import preprocess
path = repository_path + '/Example/Image Files'
superpatch = preprocess(path, patches=6, training=True, save_im=True)

/Users/laurenfrank/TILseg_Project2024/Example/Image Files/TCGA-A2-A0CW-01Z-00-DX1.svs
Percent of pixels lost in pre-processing for TCGA-A2-A0CW-01Z-00-DX1.svs:                       1.7593642775049286e-06 %


| Before     | After    |
|--------------|--------------|
| <img src= "Notebook_Images/Image_4.png" style="width: 165px;">, <span style="font-size: 6em;">&rarr;</span> <img src= "Notebook_Images/image_5.png" style="width: 170px;"> | <img src= "Notebook_Images/Image_4.png" style="width: 170px;">, <span style="font-size: 6em;">&rarr;</span> <img src= "Notebook_Images/image_6.png" style="width: 500px;"> |


#### For the sake of time, only three images from the created folder "TCGA-A2-..." will be used in model construction. The 3 patches chosen had a good ratio of pink (breast tissue) to slide background (white), which will be useful in downstream analysis:
* position_7_8tissue.tif: /Users/laurenfrank/TilsegV2/Example/Image_Files/TCGA-A2-A0CW-01Z-00-DX1/position_7_8tissue.tif
* position_14_20tissue.tif: /Users/laurenfrank/TilsegV2/Example/Image_Files/TCGA-A2-A0CW-01Z-00-DX1/position_14_20tissue.tif
* position_6_16tissue.tif: /Users/laurenfrank/TilsegV2/Example/Image_Files/TCGA-A2-A0CW-01Z-00-DX1/position_6_16tissue.tif

#### Create the Three_Patches_Example and Single_Patch_Example Folders and Move Patches to These Folders

In [18]:

!mkdir Image_Files/Three_Patches_Example
!mv Image_Files/TCGA-A2-A0CW-01Z-00-DX1/position_7_8tissue.tif Image_Files/TCGA-A2-A0CW-01Z-00-DX1/position_14_20tissue.tif Image_Files/TCGA-A2-A0CW-01Z-00-DX1/position_6_16tissue.tif Image_Files/Three_Patches_Example

!mkdir Image_Files/Single_Patch_Example
!cp Image_Files/Three_Patches_Example/position_7_8tissue.tif Image_Files/Single_Patch_Example/position_7_8tissue.tif

mkdir: Image_Files/Three_Patches_Example: File exists


mv: Image_Files/TCGA-A2-A0CW-01Z-00-DX1/position_7_8tissue.tif: No such file or directory
mv: Image_Files/TCGA-A2-A0CW-01Z-00-DX1/position_14_20tissue.tif: No such file or directory
mv: Image_Files/TCGA-A2-A0CW-01Z-00-DX1/position_6_16tissue.tif: No such file or directory
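
The errors above appear when the cell is run a second time, because the folder already exists and the patches have already been moved. A sketch of a re-runnable alternative in Python follows. It copies instead of moving, which is a deliberate change from the `mv` command, and the helper name `stage_patches` is hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def stage_patches(src_dir, dst_dir, names):
    """Copy the named patches into dst_dir; return the names not found."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    missing = []
    for name in names:
        src = Path(src_dir) / name
        if src.is_file():
            shutil.copy2(src, dst / name)
        else:
            missing.append(name)
    return missing


# Demo on throwaway folders; the second call shows it is safe to repeat.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "slide_patches"
    src.mkdir()
    (src / "position_7_8tissue.tif").write_bytes(b"demo")
    dst = Path(tmp) / "Three_Patches_Example"
    first = stage_patches(src, dst, ["position_7_8tissue.tif", "absent.tif"])
    second = stage_patches(src, dst, ["position_7_8tissue.tif"])
    copied = (dst / "position_7_8tissue.tif").exists()
```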


<img title="a title" alt="Alt text" src="Notebook_Images/image_10.png" width="180">  
<span style="font-size: 6em;">&rarr;</span>
<img title="a title" alt="Alt text" src="Notebook_Images/image_11.png" width="440"><br>
<img title="a title" alt="Alt text" src="Notebook_Images/image_13.png" width="180">
<span style="font-size: 6em;">&rarr;</span>
<img title="a title" alt="Alt text" src="Notebook_Images/image_14.png" width="180">  

## 2) Single Image (Testing Model Accuracy)

### - Running Segment_TILS on Single Patch - KMeans Only

`position_7_8tissue.tif`

<img src= "Notebook_Images/image_12.png" style="width: 600px;">

#### Run segment_TILs
    Applies a clustering model to patches and generates multiple files: TILs
    overlaid on the original H&E patch, binary segmentation masks of each
    cluster, individual clusters overlaid on the original patch, an image of all
    the clusters, and a CSV file containing contour information for each TIL
    segmented from the patch. These images are saved locally inside a "ClusteringResults" 
    folder for each image.

In [32]:
from PIL import Image
from tilseg.seg import segment_TILs
from tilseg.model_selection import opt_kmeans
from tilseg.refine_kmeans import KMeans_superpatch_fit

# Open the superpatch image and retrieve its pixel data
superpatch_path = 'comparison/superpatch_training.tif'
img = Image.open(superpatch_path)
numpy_img = np.array(img)
# Flatten to one row per pixel (R, G, B) and scale values to [0, 1]
numpy_img_reshape = np.float32(numpy_img.reshape((-1, 3))/255.)

hyperparameter_dict = opt_kmeans(numpy_img_reshape,n_clusters = [1,2,3,4,6,7,8])
kmeans_fit = KMeans_superpatch_fit(superpatch_path,hyperparameter_dict)

TIL_count_dict, kmean_labels_dict, cluster_mask_dict = segment_TILs(in_dir_path = repository_path + '/Example/comparison/TCGA-A2-A0CW-01Z-00-DX1/position_7_8tissue.tif',
                                                        out_dir_path = repository_path + '/Example/comparison',
                                                        hyperparameter_dict = None,
                                                        algorithm = 'KMeans',
                                                        model = kmeans_fit,
                                                        save_TILs_overlay = True,
                                                        save_cluster_masks = True,
                                                        save_cluster_overlays = True,
                                                        save_all_clusters_img = True,
                                                        save_csv = True,
                                                        multiple_images = False)
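
The `reshape((-1, 3))` step in the cell above assumes a three-channel image. If a .tif carries an alpha channel, the reshape can either fail or silently mix channels across pixels. A small guard, sketched here with the hypothetical name `to_pixel_rows`, makes that assumption explicit.

```python
import numpy as np


def to_pixel_rows(numpy_img):
    """Flatten an H x W x 3 image into (H*W, 3) float32 rows in [0, 1]."""
    if numpy_img.ndim != 3 or numpy_img.shape[2] != 3:
        raise ValueError(f"expected H x W x 3 array, got shape {numpy_img.shape}")
    return np.float32(numpy_img.reshape((-1, 3)) / 255.0)
```

With this helper, `numpy_img_reshape = to_pixel_rows(numpy_img)` would replace the inline reshape.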

In [None]:
#Contour
from PIL import Image
import matplotlib.pyplot as plt

# Open the images
image1 = Image.open('comparison/TCGA-A2-A0CW-01Z-00-DX1/position_7_8tissue.tif')
image2 = Image.open('path_to_second_image')
image3 = Image.open('path_to_third_image')

# Create a figure and axis objects
fig, axs = plt.subplots(3, 1, figsize=(8, 8))

# Display the images on separate axes
axs[0].imshow(image1)
axs[0].axis('off')
axs[0].set_title('Image 1')

axs[1].imshow(image2)
axs[1].axis('off')
axs[1].set_title('Image 2')

axs[2].imshow(image3)
axs[2].axis('off')
axs[2].set_title('Image 3')

# Adjust layout to prevent overlap
plt.tight_layout()

# Display the images
plt.show()

### Running Kmeans-Dbscan Model on Same Patch - Kmeans fed into Dbscan

## From 2024 Software Project (NEW COMPONENTS):

## 1) Multiple Images (Predicting Superpatch Model on Superpatches)

### Running Segment_TILS on Folder of Patches from Slide - KMeans Only

### Running Kmeans-Dbscan Model on Superpatch and Folder of Patches - KMeans fed into Dbscan

In [5]:
from tilseg.seg import kmean_to_spatial_model_superpatch_wrapper
im_labels, dbscan_model, cluster_mask_dict = kmean_to_spatial_model_superpatch_wrapper(superpatch_path = repository_path + '/Example/Image Files/superpatch_training.tif',
                                            in_dir_path = repository_path + '/Example/Image Files/TCGA-A2-A0CW-01Z-00-DX1',
                                            spatial_hyperparameters= {'eps': 15,'min_samples': 100},
                                            n_clusters = [1,2,4,5,6,7,8,9],
                                            out_dir_path = repository_path + '/Example/Results',
                                            save_TILs_overlay = True,
                                            save_cluster_masks = True,
                                            save_cluster_overlays =  True,
                                            save_all_clusters_img = True,
                                            save_csv = True)

Found hyperparameters. Time took: 4.0150078694025675 minutes.


KeyboardInterrupt: 

Clustering results should have been saved to the `Results` folder.

<img title="a title" alt="Alt text" src="Notebook_Images/Image_1.png" width="200">  
<span style="font-size: 6em;">&rarr;</span>
<img title="a title" alt="Alt text" src="Notebook_Images/image_2.png" width="200">
<span style="font-size: 6em;">&rarr;</span>
<img title="a title" alt="Alt text" src="Notebook_Images/image_3.png" width="600">

### BREAK