# Post-Training Analyses

In this notebook we demonstrate how to perform the post-training analyses shown in our [manuscript](https://www.biorxiv.org/content/10.1101/2024.01.30.577989v1).

In [29]:
import pandas as pd
from metmhn.model import MetMHN
from metmhn.state import MetState

# load in mutation and annotation data
mut_handle = "../data/luad/G14_LUAD_Events.csv"
annot_handle = "../data/luad/G14_LUAD_sampleSelection.csv"
log_theta_handle = "../results/luad/luad_g14_cv_20muts_8cnvs.csv"

Let us first load some patient data:

In [2]:
mut_data = pd.read_csv(mut_handle, index_col=0)
mut_data.head()

,P.TP53 (M),M.TP53 (M),P.TERT/5p (Amp),M.TERT/5p (Amp),P.MCL1/1q (Amp),M.MCL1/1q (Amp),P.KRAS (M),M.KRAS (M),P.CDKN2A/9p (Del),M.CDKN2A/9p (Del),...,M.SETD2 (M),P.RB1 (M),M.RB1 (M),P.MET (M),M.MET (M),P.KMT2C (M),M.KMT2C (M),paired,P.AgeAtSeqRep,M.AgeAtSeqRep
GENIE-MSK-P-0000030,0,1,0,1,0,1,0,0,0,1,...,0,0,0,0,0,0,0,0,No primary included,68
GENIE-MSK-P-0000036,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,58,No metastasis included
GENIE-MSK-P-0000082,1,0,0,0,1,0,1,0,0,0,...,0,0,0,0,0,0,0,0,60,No metastasis included
GENIE-MSK-P-0000110,1,0,1,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,71,No metastasis included
GENIE-MSK-P-0000133,1,0,0,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,81,No metastasis included


Each row is a patient. All but the last three columns indicate whether a genomic event was found in the patient's primary tumor (`P.event`) or metastasis (`M.event`).
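
This naming convention makes it easy to separate the two lineages. The following is a minimal sketch on a toy frame whose values are invented for illustration. It assumes, as in the output above, that the last three columns are `paired`, `P.AgeAtSeqRep` and `M.AgeAtSeqRep`, and that the age of a missing sample is stored as text:

```python
import pandas as pd

# Toy stand-in for mut_data; the values are invented for illustration.
toy = pd.DataFrame(
    {
        "P.TP53 (M)": [0, 1],
        "M.TP53 (M)": [1, 0],
        "P.KRAS (M)": [0, 1],
        "M.KRAS (M)": [0, 0],
        "paired": [0, 0],
        "P.AgeAtSeqRep": ["No primary included", "58"],
        "M.AgeAtSeqRep": ["68", "No metastasis included"],
    },
    index=["patient_A", "patient_B"],
)

# Everything except the last three columns is a binary event indicator.
event_cols = toy.columns[:-3]
prim_cols = [c for c in event_cols if c.startswith("P.")]
meta_cols = [c for c in event_cols if c.startswith("M.")]

# Number of events observed per patient in each lineage.
n_prim_events = toy[prim_cols].sum(axis=1)
n_meta_events = toy[meta_cols].sum(axis=1)

# Ages are stored as text when a sample is missing; coerce those to NaN.
prim_age = pd.to_numeric(toy["P.AgeAtSeqRep"], errors="coerce")
```
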
To get relevant metadata, let us look at the annotation dataframe:

In [3]:
annot_data = pd.read_csv(annot_handle, index_col=0)
annot_data.head()

patientID,primID,metaID,paired,nPrim,nMeta,metaStatus,surgeryToLastContact,primTMB_GENIE,metaTMB_GENIE
GENIE-MSK-P-0000030,,GENIE-MSK-P-0000030-T01-IM3,False,0,1,isMetastasis,,,0.981011
GENIE-MSK-P-0000036,GENIE-MSK-P-0000036-T01-IM3,,False,1,0,unknown,,0.588606,
GENIE-MSK-P-0000082,GENIE-MSK-P-0000082-T01-IM3,,False,1,0,present,9.95,1.177213,
GENIE-MSK-P-0000110,GENIE-MSK-P-0000110-T01-IM3,,False,1,0,present,4.35,1.471516,
GENIE-MSK-P-0000133,GENIE-MSK-P-0000133-T01-IM3,,False,1,0,present,3.92,0.196202,


Here, only two columns of the dataframe matter to us: `paired` and `metaStatus`. The first denotes whether the patient contributes a paired sample (both primary tumor and metastasis) or an unpaired one (only one of the two).
The second records what is known about metastases: for paired samples (`isPaired`) and unpaired metastasis samples (`isMetastasis`) a metastasis was necessarily present, whereas for unpaired primary tumors the status is given as `absent`, `present` or `unknown`.
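
To make the categories concrete, here is a small sketch on a toy annotation frame. The patient labels and rows are invented; only the column names and status values come from the description above. It counts the categories and selects every patient for whom a metastasis is known to exist:

```python
import pandas as pd

# Toy stand-in for annot_data; the rows are invented for illustration.
toy_annot = pd.DataFrame(
    {
        "paired": [False, False, False, False, True],
        "metaStatus": ["isMetastasis", "unknown", "present", "absent", "isPaired"],
    },
    index=["pat_1", "pat_2", "pat_3", "pat_4", "pat_5"],
)

# How many patients fall into each category.
status_counts = toy_annot["metaStatus"].value_counts()

# Patients for whom a metastasis is known to exist: sequenced metastases
# (paired or unpaired) plus unpaired primaries annotated as "present".
has_met = toy_annot["metaStatus"].isin(["isPaired", "isMetastasis", "present"])
patients_with_met = toy_annot.index[has_met].tolist()
```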

We will use a pretrained metMHN to reconstruct some patients' histories. A metMHN is represented by its parameters, which can be stored in a `.csv` file. From this file we can create a `MetMHN` object:

In [4]:
parameters = pd.read_csv(log_theta_handle, index_col=0)
obs1, obs2, log_theta = parameters.iloc[0], parameters.iloc[1], parameters.iloc[2:]
events = parameters.columns
metMHN = MetMHN(log_theta, obs1, obs2, events)
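
The slicing above implies a particular file layout: two leading rows followed by the parameter matrix. The sketch below rebuilds that layout with a made-up three-event example. The row labels `obs1` and `obs2`, the event names and all numbers are hypothetical, and the expectation that the remaining block is square is an assumption rather than something the file format is known to guarantee:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical parameter table with three events; all numbers are invented.
toy_events = ["A", "B", "C"]
toy_params = pd.DataFrame(
    np.arange(15, dtype=float).reshape(5, 3),
    index=["obs1", "obs2"] + toy_events,
    columns=toy_events,
)

# Round-trip through CSV text, as would happen with a file on disk.
buffer = io.StringIO(toy_params.to_csv())
loaded = pd.read_csv(buffer, index_col=0)

# Same slicing as in the cell above: two single rows, then the rest.
toy_obs1, toy_obs2, toy_log_theta = loaded.iloc[0], loaded.iloc[1], loaded.iloc[2:]

# Assumed sanity check: one row and one column per event.
is_square = toy_log_theta.shape == (len(toy_events), len(toy_events))
```

A check of this kind can catch a parameter file that was saved with an extra or a missing row before it is passed on to the model.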

Let us now have a look at one patient:

In [21]:
annot_data[annot_data["metaStatus"] == "present"]

patientID,primID,metaID,paired,nPrim,nMeta,metaStatus,surgeryToLastContact,primTMB_GENIE,metaTMB_GENIE
GENIE-MSK-P-0000082,GENIE-MSK-P-0000082-T01-IM3,,False,1,0,present,9.95,1.177213,
GENIE-MSK-P-0000110,GENIE-MSK-P-0000110-T01-IM3,,False,1,0,present,4.35,1.471516,
GENIE-MSK-P-0000133,GENIE-MSK-P-0000133-T01-IM3,,False,1,0,present,3.92,0.196202,
GENIE-MSK-P-0000233,GENIE-MSK-P-0000233-T01-IM3,,False,1,0,present,4.69,0.098101,
GENIE-MSK-P-0000239,GENIE-MSK-P-0000239-T01-IM3,,False,1,0,present,3.49,0.392404,
...,...,...,...,...,...,...,...,...,...
GENIE-MSK-P-0078038,GENIE-MSK-P-0078038-T01-IM7,,False,1,0,present,,0.724760,
GENIE-MSK-P-0078819,GENIE-MSK-P-0078819-T02-IM7,,False,1,0,present,,0.072476,
GENIE-MSK-P-0078987,GENIE-MSK-P-0078987-T02-IM7,,False,1,0,present,,0.000000,
GENIE-MSK-P-0079198,GENIE-MSK-P-0079198-T03-IM7,,False,1,0,present,,1.956853,


In [26]:
patient = "GENIE-MSK-P-0000219"
mut_data.loc[patient]

P.TP53 (M)            0
M.TP53 (M)            0
P.TERT/5p (Amp)       0
M.TERT/5p (Amp)       1
P.MCL1/1q (Amp)       0
M.MCL1/1q (Amp)       1
P.KRAS (M)            0
M.KRAS (M)            0
P.CDKN2A/9p (Del)     0
M.CDKN2A/9p (Del)     0
P.EGFR/7p (Amp)       0
M.EGFR/7p (Amp)       0
P.EGFR (M)            1
M.EGFR (M)            1
P.RB1/13q (Del)       0
M.RB1/13q (Del)       0
P.TP53/17p (Del)      0
M.TP53/17p (Del)      0
P.STK11/19p (Del)     0
M.STK11/19p (Del)     0
P.STK11 (M)           0
M.STK11 (M)           0
P.KRAS/12p (Amp)      0
M.KRAS/12p (Amp)      0
P.KEAP1 (M)           0
M.KEAP1 (M)           0
P.RBM10 (M)           0
M.RBM10 (M)           0
P.SMARCA4 (M)         0
M.SMARCA4 (M)         0
P.ATM (M)             0
M.ATM (M)             0
P.NF1 (M)             0
M.NF1 (M)             0
P.PTPRD (M)           0
M.PTPRD (M)           0
P.PTPRT (M)           0
M.PTPRT (M)           1
P.ARID1A (M)          0
M.ARID1A (M)          0
P.BRAF (M)            1
M.BRAF (M)      

In [27]:
annot_data.loc[patient]

primID                  GENIE-MSK-P-0000219-T01-IM3
metaID                  GENIE-MSK-P-0000219-T02-IM6
paired                                         True
nPrim                                             1
nMeta                                             1
metaStatus                                 isPaired
surgeryToLastContact                            NaN
primTMB_GENIE                              0.392404
metaTMB_GENIE                              0.468237
Name: GENIE-MSK-P-0000219, dtype: object

Now let us find the most likely order in which the events accumulated according to metMHN.


In [40]:
state = MetState.from_seq(mut_data.loc[patient].to_numpy()[:-2])
order, likelyhood = metMHN.likeliest_order(state, met_status="isPaired", first_obs="PT")
mut_data.columns[list(order)]

Index(['P.EGFR (M)', 'M.EGFR (M)', 'P.BRAF (M)', 'M.BRAF (M)', 'paired',
       'M.PTPRT (M)', 'M.TERT/5p (Amp)', 'M.MCL1/1q (Amp)'],
      dtype='object')
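
The returned labels mix events from both lineages. As a reading aid, the following sketch splits each label into its lineage prefix and event name and then lists the events that appear only for the metastasis. The helper `split_label` is hypothetical and not part of `metmhn`; the labels are copied from the output above:

```python
# Ordered labels copied from the output above.
ordered = [
    "P.EGFR (M)", "M.EGFR (M)", "P.BRAF (M)", "M.BRAF (M)", "paired",
    "M.PTPRT (M)", "M.TERT/5p (Amp)", "M.MCL1/1q (Amp)",
]


def split_label(label):
    """Return (lineage, event); labels without a P./M. prefix get lineage None."""
    prefix, sep, rest = label.partition(".")
    if sep and prefix in ("P", "M"):
        return prefix, rest
    return None, label


steps = [split_label(label) for label in ordered]

# Events reported for the metastasis without a primary-tumor counterpart,
# kept in the reported order.
met_only = [
    event for lineage, event in steps
    if lineage == "M" and ("P", event) not in steps
]
```

For this patient, `met_only` contains the `PTPRT`, `TERT/5p` and `MCL1/1q` events, which all come after the `paired` entry in the reported order.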

metMHN allows us to reconstruct branched tumor histories:
We can put in a patient's genomic events, and metMHN will compute the order in which they most probably occurred: