# METABRIC Data Processing 
This notebook shows how to turn the expression data and clinical data we used for ACM BCB 2022 into a working dataset with corresponding knowledge graphs. Note: the gene expression values have already been scaled.

## Imports and Installations
All imports needed by this notebook.

In [131]:
import pandas as pd
import numpy as np

## Loading Gene Expression Dataset
The gene expression dataset is the input to the neural network; it is not used to build the knowledge graphs. We still need it here so that the clinical patient data can be mapped back to the correct patients.

In [132]:
gene_df = pd.read_csv("/large/metabric/expression_with_gene_ids_min_max.csv.gz")
ids = gene_df["Sample ID"].tolist()
gene_df = gene_df.drop(["Sample ID"], axis=1)
gene_df.index = ids
gene_df

Unnamed: 0,A1CF,A2M,A2ML1,A4GALT,A4GNT,AAAS,AACS,AACSP1,AADAC,AADACL2,...,ZWILCH,ZWINT,ZXDA,ZXDB,ZXDC,ZYG11A,ZYG11B,ZYX,ZZEF1,ZZZ3
MB-0362,1.219751,2.150558,1.372684,3.192991,2.993176,3.794798,4.126604,3.618550,1.012695,1.721840,...,2.215936,2.997841,3.623574,2.257687,3.675544,1.329378,3.281754,5.739061,5.390703,1.472493
MB-0346,1.925591,0.423069,2.044851,1.031714,3.152924,3.918240,5.442258,3.723501,0.871232,1.644038,...,1.563026,5.162684,4.896753,3.631203,4.579050,1.559356,1.215235,4.832701,4.114377,2.244343
MB-0386,1.290165,3.020347,1.042124,4.788568,3.076941,2.657606,1.911599,2.143955,1.244911,1.359461,...,1.671339,2.107369,4.203899,1.644563,2.680256,1.699300,3.581889,6.161312,5.155100,1.245346
MB-0574,1.865689,2.045825,0.849215,1.556568,2.595019,2.257794,2.762416,3.635041,1.562547,1.744279,...,1.772231,2.116088,6.696511,5.302001,5.487437,0.547193,3.649910,4.802795,5.649488,1.580370
MB-0185,1.107773,3.378350,1.338016,2.659079,5.552565,3.493465,2.222683,3.819486,0.581141,1.117930,...,3.824240,4.503181,6.446397,2.800815,3.050452,1.356709,2.598287,5.314861,5.035567,1.654862
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
MB-0812,2.635687,3.271664,1.494861,1.872801,4.675713,1.301922,2.111742,4.413416,2.503573,1.772415,...,1.875131,2.642349,5.778178,3.472448,6.653620,1.448253,5.384181,5.550904,7.651306,1.695729
MB-1076,1.860126,5.761932,1.318949,7.421969,3.096525,2.649793,2.362217,4.930373,3.852400,1.677142,...,0.919340,0.655563,3.437923,2.432257,4.132323,1.093682,5.714509,6.350396,6.499980,2.982819
MB-0814,2.997981,5.757508,1.064581,4.262213,4.196830,3.357376,3.401833,5.021007,1.339406,1.391563,...,0.826025,0.634754,4.924586,2.146230,5.000811,1.515802,5.384034,7.908487,7.665773,4.000816
MB-1087,1.125263,5.299443,0.925080,5.889678,3.916698,1.306057,2.208252,4.584160,5.786027,2.477522,...,0.559075,0.639761,3.427818,2.572478,5.036591,0.476961,7.130607,6.032467,6.567629,2.438793


## Knowledge Graph Construction Functions
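Two functions are provided to turn clinical data into graphs.

The first function, "generate_fully_connected_graph()", takes two arguments: the dataframe of clinical data and the feature (column) we want to build a graph for. It drops patients with a missing value for that feature, standardizes the feature, and uses the absolute difference between two patients' standardized values as the weight of the edge between them. The result is a fully-connected, weighted graph in the form of an adjacency matrix that can be used for a Neural Graph Machine. The function returns this matrix together with the remaining clinical dataframe and the standardized column.

The second function, "fc_to_knn()", takes two arguments: the graph matrix from the first function and k, the number of neighbors we want each patient to have. k defaults to 4, so if you have no preference you only need to pass in the graph matrix. It creates an unweighted graph in the form of an adjacency matrix that can be used for a Graph Convolutional Neural Network. Because each patient's neighbors are selected independently, the matrix is not guaranteed to be symmetric.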

In [133]:
def generate_fully_connected_graph(df, column):
    # Keep only patients that have a recorded value for this feature
    df1 = df[df[column].notna()]
    ids = df1["Patient ID"].tolist()
    df1 = df1.reset_index(drop=True)
    # Standardize the feature (z-score) so edge weights are in standard deviations
    col = df1[column]
    col = (col - col.mean()) / col.std()
    df1 = df1.drop([column], axis=1)
    # Pairwise absolute differences via broadcasting; the diagonal stays 0
    values = col.to_numpy()
    adj = np.abs(values[:, None] - values[None, :])
    adj = pd.DataFrame(adj, index=ids, columns=ids)
    col.index = ids
    return adj, df1, col

In [134]:
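def fc_to_knn(fc, k=4):
    # Work on a copy so the caller's distance matrix is left untouched
    fc1 = fc.to_numpy().copy()
    # A patient is not its own neighbor: mask the zero diagonal with infinity
    np.fill_diagonal(fc1, np.inf)
    knn = np.zeros(shape=(fc1.shape[0], fc1.shape[0]))
    for i in range(len(fc1)):
        # Indices of the k smallest distances in row i, i.e. the k nearest patients
        # (the previous np.argpartition(fc1[i], -k)[-k:] selected the k largest)
        nearest = np.argpartition(fc1[i], k)[:k]
        knn[i, nearest] = 1
    knn = pd.DataFrame(knn, index=fc.columns, columns=fc.columns)
    return knn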
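
To make the two constructions concrete, here is a minimal sketch on a toy table. The patient IDs, the "Age" column, and the values are made up for illustration and are not METABRIC data. The sketch assumes that "nearest" means smallest distance and that a patient is never its own neighbor.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: five patients with made-up ages (not METABRIC values)
toy = pd.DataFrame({
    "Patient ID": ["P1", "P2", "P3", "P4", "P5"],
    "Age": [40.0, 42.0, 55.0, 69.0, 71.0],
})
toy_ids = toy["Patient ID"].tolist()

# Weighted graph: absolute difference of z-scored ages for every pair of patients
z = ((toy["Age"] - toy["Age"].mean()) / toy["Age"].std()).to_numpy()
toy_fc = pd.DataFrame(np.abs(z[:, None] - z[None, :]), index=toy_ids, columns=toy_ids)

# Unweighted graph: mark the k closest other patients in each row
k = 2
dist = toy_fc.to_numpy().copy()
np.fill_diagonal(dist, np.inf)               # exclude self-matches
nearest = np.argsort(dist, axis=1)[:, :k]    # column indices of the k smallest distances
toy_knn = np.zeros_like(dist)
np.put_along_axis(toy_knn, nearest, 1.0, axis=1)
toy_knn = pd.DataFrame(toy_knn, index=toy_ids, columns=toy_ids)

print(toy_knn)
```

In this toy graph P1 selects P3 as a neighbor, but P3 does not select P1, which shows why a k-nearest-neighbors matrix built row by row need not be symmetric.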

## Clinical Data Knowledge Graphs
For this conference, we build the knowledge graphs from the patients' clinical data. Other datasets could be used as well, provided their Sample IDs/Patient IDs match those in the gene expression dataset.

In [135]:
clinical_df = pd.read_csv('brca_metabric_clinical_data.tsv',sep='\t')
clinical_df.head()

Unnamed: 0,Study ID,Patient ID,Sample ID,Age at Diagnosis,Type of Breast Surgery,Cancer Type,Cancer Type Detailed,Cellularity,Chemotherapy,Pam50 + Claudin-low subtype,...,Relapse Free Status (Months),Relapse Free Status,Number of Samples Per Patient,Sample Type,Sex,3-Gene classifier subtype,TMB (nonsynonymous),Tumor Size,Tumor Stage,Patient's Vital Status
0,brca_metabric,MB-0000,MB-0000,75.65,MASTECTOMY,Breast Cancer,Breast Invasive Ductal Carcinoma,,NO,claudin-low,...,138.65,0:Not Recurred,1,Primary,Female,ER-/HER2-,0.0,22.0,2.0,Living
1,brca_metabric,MB-0002,MB-0002,43.19,BREAST CONSERVING,Breast Cancer,Breast Invasive Ductal Carcinoma,High,NO,LumA,...,83.52,0:Not Recurred,1,Primary,Female,ER+/HER2- High Prolif,2.615035,10.0,1.0,Living
2,brca_metabric,MB-0005,MB-0005,48.87,MASTECTOMY,Breast Cancer,Breast Invasive Ductal Carcinoma,High,YES,LumB,...,151.28,1:Recurred,1,Primary,Female,,2.615035,15.0,2.0,Died of Disease
3,brca_metabric,MB-0006,MB-0006,47.68,MASTECTOMY,Breast Cancer,Breast Mixed Ductal and Lobular Carcinoma,Moderate,YES,LumB,...,162.76,0:Not Recurred,1,Primary,Female,,1.307518,25.0,2.0,Living
4,brca_metabric,MB-0008,MB-0008,76.97,MASTECTOMY,Breast Cancer,Breast Mixed Ductal and Lobular Carcinoma,High,YES,LumB,...,18.55,1:Recurred,1,Primary,Female,ER+/HER2- High Prolif,2.615035,40.0,2.0,Died of Disease
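
Since the IDs of the two datasets must match, it can help to check how much they overlap before building any graphs. The following is a minimal sketch on hypothetical stand-in tables; `expr`, `clin`, and their contents are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for the expression table (indexed by sample ID)
# and the clinical table (sample ID stored in a column)
expr = pd.DataFrame({"GENE_A": [0.1, 0.5, 0.9]}, index=["S1", "S2", "S3"])
clin = pd.DataFrame({
    "Sample ID": ["S2", "S3", "S4"],
    "Age at Diagnosis": [50.0, 61.0, 47.0],
})

clin_ids = pd.Index(clin["Sample ID"])
shared = expr.index.intersection(clin_ids)    # IDs present in both tables
only_expr = expr.index.difference(clin_ids)   # expression rows without a clinical record
only_clin = clin_ids.difference(expr.index)   # clinical rows without expression data

print(len(shared), list(only_expr), list(only_clin))
```

A large `only_expr` or `only_clin` set would suggest that the two files use different ID conventions.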


### "Age at Diagnosis" Fully Connected Graph (Adjacency Matrix)
An example of creating a fully-connected graph for "Age at Diagnosis".

In [136]:
age_fc = pd.DataFrame(generate_fully_connected_graph(clinical_df, "Age at Diagnosis")[0])
age_fc

Unnamed: 0,MB-0000,MB-0002,MB-0005,MB-0006,MB-0008,MB-0010,MB-0014,MB-0020,MB-0022,MB-0025,...,MTS-T2418,MTS-T2419,MTS-T2421,MTS-T2423,MTS-T2424,MTS-T2425,MTS-T2426,MTS-T2427,MTS-T2428,MTS-T2429
MB-0000,0.000000,2.490601,2.054784,2.146091,0.101281,0.239392,1.473184,0.433515,1.030461,0.045270,...,1.743267,0.958337,0.530193,0.043735,0.510243,0.280058,0.867030,0.619198,0.429679,0.924576
MB-0002,2.490601,0.000000,0.435817,0.344510,2.591883,2.729994,1.017418,2.057086,3.521063,2.535871,...,0.747334,1.532265,3.020794,2.534336,1.980358,2.210543,1.623571,1.871404,2.060923,1.566025
MB-0005,2.054784,0.435817,0.000000,0.091307,2.156066,2.294177,0.581601,1.621269,3.085246,2.100054,...,0.311517,1.096448,2.584977,2.098520,1.544541,1.774726,1.187754,1.435587,1.625106,1.130208
MB-0006,2.146091,0.344510,0.091307,0.000000,2.247373,2.385484,0.672907,1.712576,3.176553,2.191361,...,0.402824,1.187754,2.676284,2.189826,1.635848,1.866033,1.279061,1.526894,1.716413,1.221515
MB-0008,0.101281,2.591883,2.156066,2.247373,0.000000,0.138111,1.574465,0.534796,0.929180,0.056012,...,1.844549,1.059618,0.428911,0.057546,0.611525,0.381340,0.968311,0.720479,0.530960,1.025858
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
MTS-T2425,0.280058,2.210543,1.774726,1.866033,0.381340,0.519451,1.193125,0.153457,1.310520,0.325328,...,1.463209,0.678278,0.810251,0.323794,0.230185,0.000000,0.586972,0.339139,0.149620,0.644518
MTS-T2426,0.867030,1.623571,1.187754,1.279061,0.968311,1.106422,0.606154,0.433515,1.897491,0.912300,...,0.876237,0.091307,1.397223,0.910765,0.356787,0.586972,0.000000,0.247832,0.437351,0.057546
MTS-T2427,0.619198,1.871404,1.435587,1.526894,0.720479,0.858590,0.853986,0.185683,1.649659,0.664467,...,1.124070,0.339139,1.149390,0.662933,0.108954,0.339139,0.247832,0.000000,0.189519,0.305379
MTS-T2428,0.429679,2.060923,1.625106,1.716413,0.530960,0.669071,1.043505,0.003836,1.460140,0.474948,...,1.313589,0.528658,0.959871,0.473414,0.080565,0.149620,0.437351,0.189519,0.000000,0.494898


### "Age at Diagnosis" K-Nearest Graph (Adjacency Matrix)
An example of creating a k-nearest neighbors graph for "Age at Diagnosis".

In [137]:
age_knn = pd.DataFrame(fc_to_knn(age_fc, k=4))
age_knn

Unnamed: 0,MB-0000,MB-0002,MB-0005,MB-0006,MB-0008,MB-0010,MB-0014,MB-0020,MB-0022,MB-0025,...,MTS-T2418,MTS-T2419,MTS-T2421,MTS-T2423,MTS-T2424,MTS-T2425,MTS-T2426,MTS-T2427,MTS-T2428,MTS-T2429
MB-0000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
MB-0002,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
MB-0005,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
MB-0006,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
MB-0008,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
MTS-T2425,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
MTS-T2426,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
MTS-T2427,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
MTS-T2428,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
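
The corner of the matrix displayed above contains only zeros, which is plausible for a sparse graph in which each row holds only k ones. A quick way to confirm that such a matrix is well formed is to inspect row sums, self-loops, and symmetry. The sketch below does this on a hypothetical 4-patient matrix with k = 2; the symmetrization rule at the end (keep an edge if either endpoint selected it) is one possible choice, not necessarily the one used for the paper.

```python
import numpy as np
import pandas as pd

# Hypothetical 4-patient k-nearest-neighbors adjacency matrix with k = 2
ids = ["P1", "P2", "P3", "P4"]
knn_example = pd.DataFrame(
    [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 1, 1, 0]],
    index=ids, columns=ids, dtype=float,
)
a = knn_example.to_numpy()

neighbors_per_row = knn_example.sum(axis=1)   # should equal k for every patient
has_self_loops = bool(np.diag(a).any())       # True if any patient is its own neighbor
is_symmetric = bool((a == a.T).all())         # True only if every edge is mutual

# One way to obtain an undirected graph: keep an edge if either endpoint selected it
symmetric_knn = pd.DataFrame(np.maximum(a, a.T), index=ids, columns=ids)

print(neighbors_per_row.tolist(), has_self_loops, is_symmetric)
```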


### "TMB" Fully Connected Graph (Adjacency Matrix)
An example of creating a fully-connected graph for "TMB".

In [138]:
tmb_fc = pd.DataFrame(generate_fully_connected_graph(clinical_df, "TMB (nonsynonymous)")[0])
tmb_fc

Unnamed: 0,MB-0000,MB-0002,MB-0005,MB-0006,MB-0008,MB-0010,MB-0014,MB-0020,MB-0022,MB-0025,...,MTS-T2423,MTS-T2424,MTS-T2425,MTS-T2426,MTS-T2427,MTS-T2428,MTS-T2429,MTS-T2430,MTS-T2431,MTS-T2432
MB-0000,0.000000,0.491512,0.491512,0.245756,0.491512,0.983023,0.983023,0.000000,0.245756,1.228779,...,1.966046,0.737267,0.000000,0.491512,0.000000,0.491512,0.983023,1.474535,1.720290,1.228779
MB-0002,0.491512,0.000000,0.000000,0.245756,0.000000,0.491512,0.491512,0.491512,0.245756,0.737267,...,1.474535,0.245756,0.491512,0.000000,0.491512,0.000000,0.491512,0.983023,1.228779,0.737267
MB-0005,0.491512,0.000000,0.000000,0.245756,0.000000,0.491512,0.491512,0.491512,0.245756,0.737267,...,1.474535,0.245756,0.491512,0.000000,0.491512,0.000000,0.491512,0.983023,1.228779,0.737267
MB-0006,0.245756,0.245756,0.245756,0.000000,0.245756,0.737267,0.737267,0.245756,0.000000,0.983023,...,1.720290,0.491512,0.245756,0.245756,0.245756,0.245756,0.737267,1.228779,1.474535,0.983023
MB-0008,0.491512,0.000000,0.000000,0.245756,0.000000,0.491512,0.491512,0.491512,0.245756,0.737267,...,1.474535,0.245756,0.491512,0.000000,0.491512,0.000000,0.491512,0.983023,1.228779,0.737267
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
MTS-T2428,0.491512,0.000000,0.000000,0.245756,0.000000,0.491512,0.491512,0.491512,0.245756,0.737267,...,1.474535,0.245756,0.491512,0.000000,0.491512,0.000000,0.491512,0.983023,1.228779,0.737267
MTS-T2429,0.983023,0.491512,0.491512,0.737267,0.491512,0.000000,0.000000,0.983023,0.737267,0.245756,...,0.983023,0.245756,0.983023,0.491512,0.983023,0.491512,0.000000,0.491512,0.737267,0.245756
MTS-T2430,1.474535,0.983023,0.983023,1.228779,0.983023,0.491512,0.491512,1.474535,1.228779,0.245756,...,0.491512,0.737267,1.474535,0.983023,1.474535,0.983023,0.491512,0.000000,0.245756,0.245756
MTS-T2431,1.720290,1.228779,1.228779,1.474535,1.228779,0.737267,0.737267,1.720290,1.474535,0.491512,...,0.245756,0.983023,1.720290,1.228779,1.720290,1.228779,0.737267,0.245756,0.000000,0.491512


### "TMB" K-Nearest Graph (Adjacency Matrix)
An example of creating a k-nearest neighbors graph for "TMB".

In [139]:
tmb_knn = pd.DataFrame(fc_to_knn(tmb_fc, k=4))
tmb_knn

Unnamed: 0,MB-0000,MB-0002,MB-0005,MB-0006,MB-0008,MB-0010,MB-0014,MB-0020,MB-0022,MB-0025,...,MTS-T2423,MTS-T2424,MTS-T2425,MTS-T2426,MTS-T2427,MTS-T2428,MTS-T2429,MTS-T2430,MTS-T2431,MTS-T2432
MB-0000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
MB-0002,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
MB-0005,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
MB-0006,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
MB-0008,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
MTS-T2428,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
MTS-T2429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
MTS-T2430,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
MTS-T2431,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


## Merge Subtype Column into Gene Expression Dataset
First, we map each patient's subtype from the clinical data onto the gene expression data. Patients without a subtype, or with the subtype "NC", are dropped. Then we keep only the patients that have entries in both datasets. Rows are labeled by the patient's ID (the "Sample ID" column) instead of by row number.

In [140]:
clinical_df.rename(columns={"Pam50 + Claudin-low subtype": "Subtypes"}, inplace=True)
clinical_df = clinical_df.dropna(subset=["Subtypes"])
clinical_df = clinical_df[clinical_df["Subtypes"] != "NC"]
ids = clinical_df["Sample ID"].tolist()
clinical_df = clinical_df.drop(["Patient ID", "Sample ID"], axis=1)
clinical_df.index = ids
clinical_df.head()

Unnamed: 0,Study ID,Age at Diagnosis,Type of Breast Surgery,Cancer Type,Cancer Type Detailed,Cellularity,Chemotherapy,Subtypes,Cohort,ER status measured by IHC,...,Relapse Free Status (Months),Relapse Free Status,Number of Samples Per Patient,Sample Type,Sex,3-Gene classifier subtype,TMB (nonsynonymous),Tumor Size,Tumor Stage,Patient's Vital Status
MB-0000,brca_metabric,75.65,MASTECTOMY,Breast Cancer,Breast Invasive Ductal Carcinoma,,NO,claudin-low,1.0,Positve,...,138.65,0:Not Recurred,1,Primary,Female,ER-/HER2-,0.0,22.0,2.0,Living
MB-0002,brca_metabric,43.19,BREAST CONSERVING,Breast Cancer,Breast Invasive Ductal Carcinoma,High,NO,LumA,1.0,Positve,...,83.52,0:Not Recurred,1,Primary,Female,ER+/HER2- High Prolif,2.615035,10.0,1.0,Living
MB-0005,brca_metabric,48.87,MASTECTOMY,Breast Cancer,Breast Invasive Ductal Carcinoma,High,YES,LumB,1.0,Positve,...,151.28,1:Recurred,1,Primary,Female,,2.615035,15.0,2.0,Died of Disease
MB-0006,brca_metabric,47.68,MASTECTOMY,Breast Cancer,Breast Mixed Ductal and Lobular Carcinoma,Moderate,YES,LumB,1.0,Positve,...,162.76,0:Not Recurred,1,Primary,Female,,1.307518,25.0,2.0,Living
MB-0008,brca_metabric,76.97,MASTECTOMY,Breast Cancer,Breast Mixed Ductal and Lobular Carcinoma,High,YES,LumB,1.0,Positve,...,18.55,1:Recurred,1,Primary,Female,ER+/HER2- High Prolif,2.615035,40.0,2.0,Died of Disease


In [141]:
df = gene_df.join(clinical_df["Subtypes"])
df = df.dropna(subset=["Subtypes"])
df

Unnamed: 0,A1CF,A2M,A2ML1,A4GALT,A4GNT,AAAS,AACS,AACSP1,AADAC,AADACL2,...,ZWINT,ZXDA,ZXDB,ZXDC,ZYG11A,ZYG11B,ZYX,ZZEF1,ZZZ3,Subtypes
MB-0362,1.219751,2.150558,1.372684,3.192991,2.993176,3.794798,4.126604,3.618550,1.012695,1.721840,...,2.997841,3.623574,2.257687,3.675544,1.329378,3.281754,5.739061,5.390703,1.472493,LumA
MB-0346,1.925591,0.423069,2.044851,1.031714,3.152924,3.918240,5.442258,3.723501,0.871232,1.644038,...,5.162684,4.896753,3.631203,4.579050,1.559356,1.215235,4.832701,4.114377,2.244343,Her2
MB-0386,1.290165,3.020347,1.042124,4.788568,3.076941,2.657606,1.911599,2.143955,1.244911,1.359461,...,2.107369,4.203899,1.644563,2.680256,1.699300,3.581889,6.161312,5.155100,1.245346,LumA
MB-0574,1.865689,2.045825,0.849215,1.556568,2.595019,2.257794,2.762416,3.635041,1.562547,1.744279,...,2.116088,6.696511,5.302001,5.487437,0.547193,3.649910,4.802795,5.649488,1.580370,LumA
MB-0185,1.107773,3.378350,1.338016,2.659079,5.552565,3.493465,2.222683,3.819486,0.581141,1.117930,...,4.503181,6.446397,2.800815,3.050452,1.356709,2.598287,5.314861,5.035567,1.654862,LumB
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
MB-5453,1.238993,7.584621,1.110820,3.889014,4.698544,2.651107,2.927911,3.748606,1.124726,1.430791,...,1.520239,3.399244,2.624860,3.980454,0.068855,6.457221,7.065948,6.739844,4.135328,Normal
MB-5471,0.868334,3.833235,1.481315,4.550401,4.245418,2.640081,3.230452,4.661994,1.337361,1.117667,...,3.360484,5.783653,3.606067,4.092534,1.349411,4.894363,7.840426,5.242286,3.462662,LumA
MB-5127,1.610597,3.229142,0.892671,2.490161,5.061196,3.474714,2.628163,3.134904,1.369727,1.242351,...,1.254301,6.126349,5.661292,4.973883,1.251135,7.088266,5.480289,5.373914,3.624108,LumB
MB-4313,1.684777,5.902019,1.183312,5.624849,4.646454,10.000000,0.990035,2.399242,1.161928,1.865723,...,0.927818,2.140717,0.512243,6.215700,1.532939,9.674210,6.452384,5.998580,1.480651,LumA
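
The join-then-drop pattern above keeps exactly the rows whose ID appears in both tables, as long as the subtype column itself has no missing values (they were dropped earlier). A minimal sketch on hypothetical data, comparing it with an inner join:

```python
import pandas as pd

# Hypothetical stand-ins: an expression table and a subtype column, both indexed by sample ID
expr = pd.DataFrame({"GENE_A": [0.1, 0.5, 0.9]}, index=["S1", "S2", "S3"])
subtypes = pd.Series(["LumA", "Her2"], index=["S2", "S4"], name="Subtypes")

# Left join (the default), then drop the rows that found no subtype
left_then_drop = expr.join(subtypes).dropna(subset=["Subtypes"])

# Inner join keeps only the IDs present in both tables in a single step
inner = expr.join(subtypes, how="inner")

print(left_then_drop)
print(inner)
```

Both results contain only sample S2, the one ID shared by the two toy tables.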


## Write DataFrames out to CSV files
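Now that we're done, let's save our work so that the files can be reused later. We write the merged expression dataset and each graph matrix to its own CSV file. Before saving, every graph is restricted to the patients kept in the merged dataset and reordered to match its row order.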

In [143]:
df.to_csv("./expressions_and_subtypes.csv")  

In [144]:
age_fc = age_fc.loc[df.index]
age_fc = age_fc[df.index]
age_fc.to_csv("./age_fc_graph.csv") 
age_knn = age_knn.loc[df.index]
age_knn = age_knn[df.index]
age_knn.to_csv("./age_knn_graph.csv") 
tmb_fc = tmb_fc.loc[df.index]
tmb_fc = tmb_fc[df.index]
tmb_fc.to_csv("./tmb_fc_graph.csv") 
tmb_knn = tmb_knn.loc[df.index]
tmb_knn = tmb_knn[df.index]
tmb_knn.to_csv("./tmb_knn_graph.csv") 
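
The alignment and round trip can be sketched on a toy graph as follows. The matrix, the IDs, and the in-memory buffer standing in for a file are all hypothetical. Note that selecting with `.loc` raises a `KeyError` if any requested ID is missing from the graph, so every patient in the merged dataset needs a row and a column in each graph.

```python
import io
import pandas as pd

# Hypothetical 3-patient weighted graph
ids = ["P1", "P2", "P3"]
graph = pd.DataFrame(
    [[0.0, 0.5, 1.0],
     [0.5, 0.0, 0.5],
     [1.0, 0.5, 0.0]],
    index=ids, columns=ids,
)

# Patients kept in the merged dataset, in that dataset's row order
keep = pd.Index(["P3", "P1"])

# Restrict and reorder rows and columns in one step
aligned = graph.loc[keep, keep]

# Round trip through CSV; an in-memory buffer stands in for a file on disk
buffer = io.StringIO()
aligned.to_csv(buffer)
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)   # index_col=0 restores the patient IDs as the index

print(restored)
```

Reading the file back with `index_col=0` matters because `to_csv` writes the row labels as the first, unnamed column.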