# **LoupeBrowser annotated output parser**
> **Author:** Cevi Bainton\
> **Date:** 2/8/2024

This notebook is forked from `loupebrowser_parser.ipynb` made by Kacper Maciejewski for the `WSI-ST_framework` project.

### Load libraries and specify settings

In [1]:
import os
import pandas as pd

In [2]:
INPUT_PATH = R'..\original_data\Pathologist_annotations_LoupeBrowser\Pathologist_annotations_LoupeBrowser\exported_annotation_csvs'

CSV_INPUTS = {}

for lb_path in os.listdir(INPUT_PATH):
    short_name = lb_path[1:3] + lb_path[4]
    CSV_INPUTS[short_name] = os.path.join(INPUT_PATH, lb_path)

In [3]:
# Specify a directory path to save parsed CSVs (created if it does not exist yet)
SAVE_DIR_PATH = "cleaned_classification_testing"
os.makedirs(SAVE_DIR_PATH, exist_ok=True)

# Specify file paths of LoupeBrowser annotation outputs to parse with their names 
LB_PATH = CSV_INPUTS

# Specify the category name of exported labels
# If 'Graph-based' (as in the first version of the Malak annotations), the script
# extracts ST_cluster automatically (based on "Cluster_" before every cluster label),
# so it works with the existing data as-is
CATEGORY = "Graph-based"

# Specify clinical clusters with their labels
CLINICAL_LABELS = {
    0: "normal",
    1: "cancer"
    }
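
The `CATEGORY` comments above describe splitting each exported string into a cluster number and a label. A minimal sketch of that split on toy data follows. The sample strings and the `Cluster <number> <label>` layout are assumptions for illustration, not values taken from the real exports, and the `\s*` after the digits is an extra not present in the parser below.

```python
import pandas as pd

# Assumed export format: "Cluster <number> <label>"; one missing value included on purpose.
raw = pd.Series(["Cluster 1 Fat", "Cluster 12 Invasive ca", None])

# Drop the leading "Cluster " prefix
stripped = raw.str.replace(r"^Cluster\s+", "", regex=True)

# Leading digits are the cluster number; missing values become -1
st_cluster = (
    pd.to_numeric(stripped.str.extract(r"^(\d+)", expand=False), errors="coerce")
    .fillna(-1)
    .astype(int)
)

# Whatever follows the digits is the label, lowercased
st_label = stripped.str.replace(r"^\d+\s*", "", regex=True).str.lower()

print(st_cluster.tolist())
print(st_label.tolist())
```

Using `-1` for missing cluster numbers keeps the column an integer type, at the cost of hiding the missing values, so it is worth checking for `-1` downstream.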

In [4]:
# Collect every distinct annotation label across all files
all_classifications = []
for path in LB_PATH.values():
    annotations = pd.read_csv(path)
    if CATEGORY == "Graph-based":
        # Strip the "Cluster <number>" prefix the same way the parsing loop does,
        # so multi-digit cluster numbers are removed completely
        short_list = (
            annotations[CATEGORY]
            .dropna()
            .str.replace(r'^Cluster\s+\d+', '', regex=True)
            .str.lower()
            .unique()
        )
        for label in short_list:
            if label not in all_classifications:
                all_classifications.append(label)
preclass_translator = pd.DataFrame({'annotations': all_classifications})

if not os.path.exists("label_clean_classifiers.csv"):
    preclass_translator.to_csv("PREMADE_label_clean_classifiers.csv")

Add your own translations to `PREMADE_label_clean_classifiers.csv`, then save the edited file as `label_clean_classifiers.csv`.
The columns are:

0. Pandas row names, no header
1. `annotations`: old annotation; string, no quotes
2. `new_label`: new annotation [`normal`, `cancer`, `DCIS`]; string, no quotes
3. `mapping`: numeric class [`normal` --> 0, `cancer` --> 1, `DCIS` --> 2]; int
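
As a rough illustration of the column layout above, the sketch below reads a small in-memory table and turns it into two lookups. The rows are hypothetical examples rather than the real translation table, and `label_of` and `class_of` are names made up for this sketch.

```python
import io
import pandas as pd

# Hypothetical contents of label_clean_classifiers.csv (illustrative rows only).
# The first column is the unnamed pandas index.
csv_text = """,annotations,new_label,mapping
0,fat,normal,0
1,fat+stroma,normal,0
2,invasive ca,cancer,1
3,dcis?,dcis,2
"""

translator = pd.read_csv(io.StringIO(csv_text), index_col=0)

# old annotation -> cleaned label
label_of = dict(zip(translator["annotations"], translator["new_label"]))

# cleaned label -> numeric class (assumes each label always has the same number)
class_of = dict(zip(translator["new_label"], translator["mapping"].tolist()))

print(label_of["dcis?"])
print(class_of)
```

If one `new_label` appears with two different `mapping` values, `dict(zip(...))` silently keeps the last one, so the table should be checked for consistency before use.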

My assumptions:
* Any annotation with a question mark, or describing a mix, is assigned the more severe label
* Calcification is treated as normal

My annotations:

0. Normal
1. Cancer
2. DCIS

For the analysis below, DCIS and Cancer will be grouped.
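
The notebook does not show how the grouping is done. Since `CLINICAL_LABELS` defines only classes 0 and 1, one possible approach is to fold class 2 into class 1 before mapping to clinical labels. The sketch below assumes that rule and uses made-up class values.

```python
import pandas as pd

# Numeric classes as listed above: 0 normal, 1 cancer, 2 DCIS (toy values)
clinical_cluster = pd.Series([0, 1, 2, 0, 2])

# Assumed grouping rule: fold DCIS (2) into cancer (1), leave the rest unchanged
grouped = clinical_cluster.replace({2: 1})

CLINICAL_LABELS = {0: "normal", 1: "cancer"}
clinical_label = grouped.map(CLINICAL_LABELS)

print(clinical_label.tolist())
# ['normal', 'cancer', 'cancer', 'normal', 'cancer']
```

Without a step like this, `Series.map` returns `NaN` for any class that has no entry in `CLINICAL_LABELS`.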

In [5]:
class_translator = pd.read_csv("label_clean_classifiers.csv")

class_translator_dict = {}  # old annotation -> cleaned label
class_mapper_dict = {}      # cleaned label -> numeric class

for _, row in class_translator.iterrows():
    class_translator_dict[row["annotations"]] = row["new_label"]
    # Keep the first mapping seen for each cleaned label
    if row["new_label"] not in class_mapper_dict:
        class_mapper_dict[row["new_label"]] = int(row["mapping"])


# When True, labels are translated from label_clean_classifiers.csv instead of prompting for input
NO_USER_INPUT = True


NOTE: _The following annotations don't seem to work: 70c, 74a, 74b, 84b, 86a._

### Parse and save all the files

In [7]:
# Iterate over files to parse
for file_name, path in LB_PATH.items():

    # Read the file into a dataframe
    print(f"Parsing {file_name}...")
    file = pd.read_csv(path)

    # Split cluster strings into their numbers and names
    if CATEGORY == "Graph-based":
        file[CATEGORY] = file[CATEGORY].str.replace(r'^Cluster\s+', '', regex=True)
        file["ST_cluster"] = file[CATEGORY].str.extract(r'(\d+)')
        file["ST_label"] = file[CATEGORY].str.replace(r'^\d+', '', regex=True)
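    else:
        file["ST_cluster"] = 0
        # Assumption: without cluster prefixes, the category values are the labels themselves
        file["ST_label"] = file[CATEGORY]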
    # Convert cluster numbers to int; missing values (NaN) become -1. Be careful!
    file["ST_cluster"] = pd.to_numeric(file["ST_cluster"], errors="coerce").fillna(-1).astype(int)
    file.drop(columns=[CATEGORY], inplace=True)

    # Make everything lowercase
    file["ST_label"] = file["ST_label"].str.lower()

    # Iterate over all labels and correct their names
    labels = file['ST_label'].unique()
    print(f"Labels found: {labels}")
    for n in range(len(labels)):
        if NO_USER_INPUT:
            if labels[n] == "":
                print("Warning: empty label found")
                new_label = " "
            else:
                new_label = class_translator_dict[labels[n]]
        else:
            new_label = input(f"Rename '{labels[n]}'")
        if new_label:
            file["ST_label"] = file["ST_label"].replace(labels[n], new_label)
            labels[n] = new_label

    # Iterate over new labels and classify them into clinical categories
    labels = list(set(labels))
    print(f"Labels to classify: {labels} with classifiers: {CLINICAL_LABELS}")
    if NO_USER_INPUT:
        mapping = class_mapper_dict
    else:
        mapping = {}
        for label in labels:
            while True:
                clinical = input(f"Classify clinically '{label}' with numeric label")
                try:
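                    if int(clinical) in CLINICAL_LABELS:
                        mapping[label] = int(clinical)
                        break
                    # Valid number, but not one of the known clinical classes
                    print(f"Choose one of {list(CLINICAL_LABELS)}")
                except ValueError:
                    print("Enter numerical value!")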

    # Map user input into new columns
    file['clinical_cluster'] = pd.to_numeric(file['ST_label'].map(mapping))
    file['clinical_label'] = file['clinical_cluster'].map(CLINICAL_LABELS)

    print(len(file.columns))
    # Save parsed document
    file.to_csv(os.path.join(SAVE_DIR_PATH, f"{file_name}.csv"), index=False, header=False)

Parsing 33A...
Labels found: ['fat' 'fat+stroma']
Labels to classify: ['normal'] with classifiers: {0: 'normal', 1: 'cancer'}
5
Parsing 33B...
Labels found: ['fat+stroma' 'fat']
Labels to classify: ['normal'] with classifiers: {0: 'normal', 1: 'cancer'}
5
Parsing 33C...
Labels found: ['invasive ca' 'invasive ca with mucin']
Labels to classify: ['cancer'] with classifiers: {0: 'normal', 1: 'cancer'}
5
Parsing 33D...
Labels found: ['invasive ca with some mucin' 'invasive ca' 'fat']
Labels to classify: ['cancer', 'normal'] with classifiers: {0: 'normal', 1: 'cancer'}
5
Parsing 34A...
Labels found: ['fat' 'stroma']
Labels to classify: ['normal'] with classifiers: {0: 'normal', 1: 'cancer'}
5
Parsing 34B...
Labels found: ['fat+stroma' 'calcification']
Labels to classify: ['normal'] with classifiers: {0: 'normal', 1: 'cancer'}
5
Parsing 34C...
Labels found: ['invasive ca' 'fat' 'dcis?']
Labels to classify: ['cancer', 'normal', 'dcis'] with classifiers: {0: 'normal', 1: 'cancer'}
5
Parsing 34