# IMC breast cancer

In this notebook we demonstrate how mosna can be used to analyze spatially resolved omics data.  
The data used is from the publication by [Danenberg et al., Nature Genetics, 2023](https://doi.org/10.1038/s41588-022-01041-y) "Breast tumor microenvironment structures are associated with genomic features and clinical outcome".  
Here, 693 breast tumors were processed with the [Imaging Mass Cytometry](https://doi.org/10.1038/s43018-020-0026-6) method (cytometry by time of flight, CyTOF) to produce maps of 37 proteins and study the relationships between cell types in tumors with respect to survival.  

## Imports and data loading

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os
from time import time
import warnings
import joblib
from pathlib import Path
from tqdm import tqdm
import copy
import matplotlib as mpl
import napari
import colorcet as cc
import composition_stats as cs
from sklearn.impute import KNNImputer

from tysserand import tysserand as ty
from mosna import mosna

# Use white backgrounds for figures, axes, and saved files
mpl.rcParams["figure.facecolor"] = 'white'
mpl.rcParams["axes.facecolor"] = 'white'
mpl.rcParams["savefig.facecolor"] = 'white'

In [2]:
# Reload the modules if they were modified after being imported
from importlib import reload
ty = reload(ty)
mosna = reload(mosna)

In [3]:
RUN_LONG = False

### Objects data

Load the files that contain all detected objects (the cells) across all samples, as well as the clinical data.  
Data is available here: https://zenodo.org/records/7324285  
Images are available here: https://zenodo.org/records/6036188

In [None]:
data_dir = Path("../data/raw/IMC_Breast_cancer_Danenberg_2022")
objects_path = data_dir / "SingleCells.csv"

if objects_path.with_suffix('.parquet').exists():
    obj = pd.read_parquet(objects_path.with_suffix('.parquet'))
else:
    obj = pd.read_csv(objects_path)
    # cache as parquet for faster loading later
    obj.to_parquet(objects_path.with_suffix('.parquet'))
obj

`ObjectNumber` is the cell ID and restarts at 1 in each image (there are 797 cells with `ObjectNumber` = 1).
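
Because `ObjectNumber` restarts in every image, it cannot identify a cell on its own. The sketch below illustrates this on a small made-up table: only the column names `ImageNumber` and `ObjectNumber` come from the dataset, while the values and the `cell_id` column are hypothetical and introduced purely for illustration. It also assumes every image contains a cell numbered 1, which is what makes counting those rows equivalent to counting images.

```python
import pandas as pd

# Toy table with made-up values; only the column names come from the dataset
toy = pd.DataFrame({
    "ImageNumber":  [1, 1, 1, 2, 2],
    "ObjectNumber": [1, 2, 3, 1, 2],
})

# ObjectNumber restarts in every image, so it is not unique on its own
object_number_is_unique = toy["ObjectNumber"].is_unique

# Counting rows with ObjectNumber == 1 gives the number of images,
# assuming every image has a cell numbered 1
n_images = int((toy["ObjectNumber"] == 1).sum())

# The pair (ImageNumber, ObjectNumber) does identify a single cell
is_unique_pair = not toy.duplicated(["ImageNumber", "ObjectNumber"]).any()

# Hypothetical helper column: a string ID that is unique across images
toy["cell_id"] = (
    toy["ImageNumber"].astype(str) + "_" + toy["ObjectNumber"].astype(str)
)
```

The same checks could be run on `obj` to confirm that the pair of columns is a valid key before building per-image networks.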

In [None]:
obj.rename(columns={'Location_Center_X': 'x', 'Location_Center_Y': 'y'}, inplace=True)
sample_cols = ['ImageNumber', 'ObjectNumber', 'metabric_id']
all_epitopes = pd.read_csv(data_dir / 'markerStackOrder.csv').iloc[:, 1].values
# remove Histone H3 and DNA markers
marker_cols = all_epitopes[1:-2]
pos_cols = ['x', 'y']
cell_type_cols = [
    'is_epithelial',
    'is_tumour',
    'is_normal',
    'is_dcis',
    'is_interface',
    'is_perivascular',
    'is_hotAggregate',
    ]
nb_phenotypes = obj['cellPhenotype'].unique().size

print(f'nb phenotypes: {nb_phenotypes}')
print(f'nb used markers: {len(marker_cols)}')

In [None]:
# # Load published network
# neighbs = pd.read_csv(data_dir / 'CellNeighbours.csv')

In [None]:
clinical_path = data_dir / "IMCClinical.csv"
clin = pd.read_csv(clinical_path, index_col=0)
clin

In [None]:
# Show number of cells per sample (one row per image)
pd.set_option('display.max_rows', None)
obj.groupby(['ImageNumber', 'metabric_id'], dropna=False).size().rename('n_cells')