# **LoupeBrowser annotated output parser**
> **Author:** Kacper Maciejewski\
> **Supervisor:** prof. Carsten Daub\
> **Date:** 05.09.2023\
> Daub Lab, Karolinska Institutet, Sweden

This tool parses CSV files exported from LoupeBrowser into the format expected by `image.ipynb`. Its purpose is to automate two manual steps: correcting expert annotations (so that label names are consistent) and classifying expert labels into clinical classes (such as `cancer`/`non-cancer`). Remember that `image.ipynb` matches the fetched barcodes to those already present in the other `image_config.csv` files. This means you can manually remove unwanted spots in LoupeBrowser to exclude them from the ML analysis.

> **NOTE:** See the example input [here](/WSI-ST_framework/example_loupebrowser_70B.csv) (that is, exported annotations from LoupeBrowser) and example output [here](/WSI-ST_framework/example_parsed_output_70B.csv) (compatible *annotation file*).
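
The barcode matching described above behaves like an inner join: spots deleted in LoupeBrowser are missing from the annotation file, so they drop out of the joined table. The sketch below is only an illustration of that idea, not the code used in `image.ipynb`. The `Barcode` and `tile_path` column names and the toy barcodes are assumptions.

```python
import pandas as pd

# Hypothetical image config: every spot that has image data (column names assumed)
image_config = pd.DataFrame({
    "Barcode": ["AAAC-1", "AAAG-1", "AACT-1"],
    "tile_path": ["t0.png", "t1.png", "t2.png"],
})

# Hypothetical parsed annotations: spot "AAAG-1" was removed in LoupeBrowser
annotations = pd.DataFrame({
    "Barcode": ["AAAC-1", "AACT-1"],
    "clinical_label": ["normal", "cancer"],
})

# An inner join keeps only barcodes present in both tables,
# so the removed spot never reaches the ML analysis
matched = image_config.merge(annotations, on="Barcode", how="inner")
print(matched)
```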

### Load libraries and specify settings

```python
import os
import pandas as pd
```

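```python
# Folder with the CSV files exported from LoupeBrowser
INPUT_PATH = R'..\original_data\Pathologist_annotations_LoupeBrowser\Pathologist_annotations_LoupeBrowser\exported_annotation_csvs'

CSV_INPUTS = {}

# Build a short sample name (e.g. "33A") for every exported file.
# The slicing assumes the sample number sits at characters 1-2 of the
# file name and the section letter at character 4.
for lb_path in sorted(os.listdir(INPUT_PATH)):
    if not lb_path.lower().endswith(".csv"):
        continue  # skip anything that is not an exported CSV
    short_name = lb_path[1:3] + lb_path[4]
    CSV_INPUTS[short_name] = os.path.join(INPUT_PATH, lb_path)
    print(short_name)
```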

```text
33A
33B
33C
33D
34A
34B
34C
34D
35A
35B
35C
35D
36A
36B
36C
36D
70A
70B
70C
70D
71A
71B
71C
71D
73A
73B
73C
73D
74A
74B
74C
74D
83A
83B
83C
83D
84A
84B
84C
84D
85A
85B
85C
85D
86A
86B
86C
86D
```
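
The slicing `lb_path[1:3] + lb_path[4]` only works while every exported file name follows the same fixed layout. A regular expression is a slightly more defensive alternative that fails loudly on unexpected names. The sketch below assumes file names such as `P33_A_annotations.csv`, a hypothetical example of that layout; `short_name_from_file` is likewise a hypothetical helper, and the pattern should be adapted to the real naming scheme.

```python
import re

# Assumed layout: two-digit sample number followed by a section letter A-D,
# optionally separated by "_" (hypothetical; adapt to the real export names)
SHORT_NAME_PATTERN = re.compile(r"(\d{2})_?([A-D])")


def short_name_from_file(file_name: str) -> str:
    """Return a short sample name such as '33A' from an export file name."""
    match = SHORT_NAME_PATTERN.search(file_name)
    if match is None:
        raise ValueError(f"Cannot derive a sample name from {file_name!r}")
    return match.group(1) + match.group(2)


example = "P33_A_annotations.csv"
print(short_name_from_file(example))  # 33A
# Same result as the fixed-position slicing for this layout
print(example[1:3] + example[4])      # 33A
```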


```python
# Directory where the parsed CSVs are saved
SAVE_DIR_PATH = "cleaned_classification"

# LoupeBrowser annotation exports to parse, keyed by sample name
LB_PATH = CSV_INPUTS

# Category (column) name of the exported labels.
# If "Graph-based" (as in the first version of Malak's annotations), the script
# strips the leading "Cluster" prefix from every label and extracts the numeric
# ST_cluster, so the output is compatible with the existing data.
CATEGORY = "Graph-based"

# Clinical classes: numeric code -> label name
CLINICAL_LABELS = {
    0: "normal",
    1: "cancer",
}
```
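
To see what the `Graph-based` branch does to each label, the same pandas string operations can be applied to a few sample values. The inputs below assume an export format of `Cluster <number> <name>`; real LoupeBrowser labels may differ. Trimming whitespace after removing the number is an extra step here, useful when the number and name are separated by a space.

```python
import pandas as pd

# Hypothetical "Graph-based" values; the exact export format is an assumption
raw = pd.Series(["Cluster 1 fat", "Cluster 2 fat+stroma", "Cluster 10 invasive ca"])

# Drop the "Cluster" prefix, then split the cluster number from the label name
no_prefix = raw.str.replace(r"^Cluster\s+", "", regex=True)
st_cluster = pd.to_numeric(no_prefix.str.extract(r"(\d+)", expand=False))
st_label = no_prefix.str.replace(r"^\d+", "", regex=True).str.lower().str.strip()

print(st_cluster.tolist())  # [1, 2, 10]
print(st_label.tolist())    # ['fat', 'fat+stroma', 'invasive ca']
```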

### Parse and save all the files

```python
# Make sure the output directory exists before writing to it
os.makedirs(SAVE_DIR_PATH, exist_ok=True)

# Iterate over the files to parse
for file_name, path in LB_PATH.items():

    # Read the file into a dataframe
    print(f"Parsing {file_name}...")
    file = pd.read_csv(path)

    # Split cluster strings into their numbers and names
    if CATEGORY == "Graph-based":
        file[CATEGORY] = file[CATEGORY].str.replace(r'^Cluster\s+', '', regex=True)
        file["ST_cluster"] = file[CATEGORY].str.extract(r'(\d+)', expand=False)
        file["ST_label"] = file[CATEGORY].str.replace(r'^\d+', '', regex=True)
    else:
        # No cluster numbers: keep the whole label and use a single cluster 0
        file["ST_cluster"] = 0
        file["ST_label"] = file[CATEGORY]
    file["ST_cluster"] = pd.to_numeric(file["ST_cluster"])
    file.drop(columns=[CATEGORY], inplace=True)

    # Make everything lowercase and trim surrounding whitespace
    file["ST_label"] = file["ST_label"].str.lower().str.strip()

    # Iterate over all labels and correct their names (empty input keeps the name)
    labels = file["ST_label"].unique()
    print(f"Labels found: {labels}")
    for n in range(len(labels)):
        new_label = input(f"Rename '{labels[n]}': ")
        if new_label:
            file["ST_label"] = file["ST_label"].replace(labels[n], new_label)
            labels[n] = new_label

    # Iterate over the new labels and classify them into clinical categories
    labels = list(set(labels))
    print(f"Labels to classify: {labels} with classifiers: {CLINICAL_LABELS}")
    mapping = {}
    for label in labels:
        while True:
            clinical = input(f"Classify '{label}' clinically with a numeric label: ")
            try:
                if int(clinical) in CLINICAL_LABELS:
                    mapping[label] = int(clinical)
                    break
                print(f"Choose one of: {list(CLINICAL_LABELS)}")
            except ValueError:
                print("Enter a numerical value!")

    # Map user input into new columns
    file["clinical_cluster"] = pd.to_numeric(file["ST_label"].map(mapping))
    file["clinical_label"] = file["clinical_cluster"].map(CLINICAL_LABELS)

    # Save the parsed document (no index column, no header row)
    file.to_csv(os.path.join(SAVE_DIR_PATH, f"{file_name}.csv"), index=False, header=False)
```

```text
Parsing 33A...
Labels found: ['fat' 'fat+stroma']
Labels to classify: ['noncancer'] with classifiers: {0: 'normal', 1: 'cancer'}
Parsing 33B...
Labels found: ['fat+stroma' 'fat']
Labels to classify: ['noncancer'] with classifiers: {0: 'normal', 1: 'cancer'}
Parsing 33C...
Labels found: ['invasive ca' 'invasive ca with mucin']
Labels to classify: ['cancer'] with classifiers: {0: 'normal', 1: 'cancer'}
Parsing 33D...
Labels found: ['invasive ca with some mucin' 'invasive ca' 'fat']
Labels to classify: ['noncancer', 'cancer'] with classifiers: {0: 'normal', 1: 'cancer'}
Parsing 34A...
Labels found: ['fat' 'stroma']
Labels to classify: ['noncancer'] with classifiers: {0: 'normal', 1: 'cancer'}
Parsing 34B...
Labels found: ['fat+stroma' 'calcification']
Labels to classify: ['noncancer'] with classifiers: {0: 'normal', 1: 'cancer'}
Parsing 34C...
Labels found: ['invasive ca' 'fat' 'dcis?']
Labels to classify: ['noncancer', 'cancer'] with classifiers: {0: 'normal', 1: 'cancer'}
Parsing 34D...
Labels found: ['invasive ca' 'fat+stroma' 'fat']
```
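
The `input()` prompts mean every rerun requires typing the same answers again. One alternative, sketched here under the assumption that the renames and classifications are agreed in advance, is to store them in dictionaries and apply them with `Series.replace` and `Series.map`. `RENAME`, `CLINICAL_MAPPING` and `classify` are hypothetical names; the label values mirror the session output above, but the exact rename and class assignments are assumptions.

```python
import pandas as pd

CLINICAL_LABELS = {0: "normal", 1: "cancer"}

# Hypothetical, pre-agreed answers to the interactive prompts
RENAME = {
    "fat": "noncancer",
    "fat+stroma": "noncancer",
    "invasive ca": "cancer",
    "invasive ca with mucin": "cancer",
}
CLINICAL_MAPPING = {"noncancer": 0, "cancer": 1}


def classify(labels: pd.Series) -> pd.DataFrame:
    """Rename labels and add clinical columns without prompting the user."""
    st_label = labels.str.lower().str.strip().replace(RENAME)
    unknown = set(st_label.dropna()) - set(CLINICAL_MAPPING)
    if unknown:
        # Fail loudly instead of silently writing NaN clinical classes
        raise ValueError(f"No clinical class for: {sorted(unknown)}")
    clinical_cluster = st_label.map(CLINICAL_MAPPING)
    return pd.DataFrame({
        "ST_label": st_label,
        "clinical_cluster": clinical_cluster,
        "clinical_label": clinical_cluster.map(CLINICAL_LABELS),
    })


result = classify(pd.Series(["fat", "Invasive ca", "fat+stroma"]))
print(result)
```

Keeping the answers in code makes the parsing reproducible and lets a reviewer see every rename at a glance; any new label in a later export raises an error rather than being written with an empty class.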
