# Preparations for CIBERSORTx

[CIBERSORTx](https://cibersortx.stanford.edu/) is a tool that estimates the abundance of cell types of interest in a mixed cell sample from bulk gene expression data, using reference gene expression profiles called signature matrices. As a first step, we prepared a gene expression matrix from the TCGA BRCA cohort, restricted to the genes of the [LM22](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5895181/) signature matrix, which allows the abundance of several immune cell types to be deconvolved. The tables containing the gene expression data were retrieved via the UCSC Xena Browser.

In [None]:
# Import the necessary libraries
import numpy as np
import pandas as pd
import openpyxl  # not called directly; pandas uses it to read and write .xlsx files

# Suppress FutureWarning messages
from warnings import simplefilter
simplefilter(action='ignore', category=FutureWarning)

In [3]:
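# Load the LM22 gene subset of the TCGA BRCA expression table, indexed by sample ID
df_samples = pd.read_csv("BRCA_LM22.tsv", sep="\t", index_col=0)
# Drop the "samples" column so that only expression values remain
df_samples = df_samples.drop(["samples"], axis=1).copy()
X = df_samples
X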

The gene expression data from the Xena Browser are log2(x+1)-normalized.
We therefore transformed the data back out of logarithmic space to make them suitable for CIBERSORTx.

In [5]:
# Invert the log2(x+1) normalization: x = 2**y - 1
df_raw = np.power(2, X) - 1
pd.set_option("display.max_columns", 16)
df_raw.describe()

sample,ABCB4,ABCB9,ACAP1,ACHE,ACP5,ADAM28,ADAMDEC1,ADAMTS3,ADRB2,AIF1,...,ZBTB10,ZBTB32,ZFP36L2,ZNF135,ZNF165,ZNF204P,ZNF222,ZNF286A,ZNF324,ZNF442
TCGA-AN-A0FK-01,8.952,5.622,4.399,2.945,12.110,3.771,6.3730,3.200,5.338,7.663,...,10.050,1.6930,10.39,3.810,6.423,5.700,6.617,7.609,7.819,5.690
TCGA-BH-A1ES-06,8.887,8.016,4.011,8.925,8.072,9.120,3.2140,4.609,4.403,6.943,...,10.590,0.4436,10.52,3.269,6.882,4.731,6.461,7.725,7.136,4.328
TCGA-AO-A0JC-01,8.447,6.061,11.860,8.587,10.470,9.997,10.4000,6.311,7.158,10.100,...,8.127,6.4760,13.74,7.360,6.206,7.571,7.211,8.533,7.321,6.681
TCGA-GI-A2C8-11,8.223,4.479,5.295,12.140,5.024,1.351,0.6013,4.719,8.868,6.741,...,9.448,0.0000,11.45,5.587,3.952,6.842,5.420,6.670,6.491,4.801
TCGA-E9-A2JT-01,7.968,7.112,11.170,8.024,10.190,9.962,9.4730,5.813,7.649,9.336,...,8.688,6.3150,12.41,7.083,6.121,6.735,6.810,8.517,8.066,5.968
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
TCGA-E2-A1LI-11,,,,,,,,,,,...,,,,,,,,,,
TCGA-E9-A1N8-11,,,,,,,,,,,...,,,,,,,,,,
TCGA-E9-A1NE-11,,,,,,,,,,,...,,,,,,,,,,
TCGA-E9-A1NH-11,,,,,,,,,,,...,,,,,,,,,,
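The back-transformation above can be sanity-checked on a few toy values. The following is a minimal sketch, assuming the normalization is base 2, as implied by `np.power(2, X) - 1`. The gene names, sample names and values are hypothetical and not taken from the TCGA data.

```python
import numpy as np
import pandas as pd

# Toy linear-scale expression values (hypothetical, not TCGA data)
linear = pd.DataFrame(
    {"GENE_A": [0.0, 3.0, 1023.0], "GENE_B": [1.0, 7.0, 15.0]},
    index=["S1", "S2", "S3"],
)

# Forward transform assumed for the Xena data: log2(x + 1)
logged = np.log2(linear + 1)

# Inverse transform as used in the notebook: 2**y - 1
restored = np.power(2, logged) - 1

# The round trip should reproduce the input up to floating-point error
assert np.allclose(restored, linear)
print(restored)
```

A value of 0 in log space maps back to 0 in linear space, so genes without detectable expression stay at zero after the transformation.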


In the next step, we joined the clustered HER2+ BRCA patient cohort from our main analysis with the LM22 gene expression matrix prepared earlier in this notebook.
This limited the analysis to the patients included in our cluster analysis. After dropping all information not needed to run CIBERSORTx, we saved the matrix in a suitable format for the analysis.

In [6]:
# Import the table that resulted from the clustering analysis
df_clusters = pd.read_excel("BRCA_HER2+_Main.xlsx", index_col = 0)

# Drop all information that is not necessary for the CIBERSORTx deconvolution
all_clusters = df_clusters.drop([
    "IL22RA1", "IL22RA2", "IL10RB", "PVR", "histological_type", "sample_type",
    "OS_event_nature2012", "OS_Time_nature2012", "ER_Status_nature2012",
    "PR_Status_nature2012", "HER2_Final_Status_nature2012", "AJCC_Stage_nature2012",
    "samples",
], axis=1)
# Add the linear-scale LM22 expression values, matched on the sample index
all_cluster_LM22 = all_clusters.join(df_raw)
# Remove the cluster label so that only expression values remain
ac_CISO = all_cluster_LM22.drop(["Cluster"], axis=1)

# Save the table in a fitting format: genes in rows, samples in columns, tab-separated
ac_exp = ac_CISO.transpose()
ac_exp.to_csv("BRCA_all_clusters_CISO.txt", sep="\t")
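The join-and-transpose step can be illustrated on toy data. This is a sketch under stated assumptions rather than the code we ran: the tables, sample IDs and gene names are hypothetical. It uses `how="inner"`, whereas the cell above relies on the default left join of `DataFrame.join`. With a left join, a clustered sample that is missing from the expression table would be kept with `NaN` values.

```python
import pandas as pd

# Hypothetical cluster table indexed by sample ID
clusters = pd.DataFrame({"Cluster": [0, 1, 1]}, index=["S1", "S2", "S3"])

# Hypothetical linear-scale expression table (samples x genes); S3 is absent
expr = pd.DataFrame(
    {"GENE_A": [5.0, 12.0], "GENE_B": [0.0, 3.5]},
    index=["S1", "S2"],
)

# Keep only samples present in both tables (assumption: an inner join is wanted)
joined = clusters.join(expr, how="inner")

# Drop the label column and transpose to genes x samples
mixture = joined.drop(columns=["Cluster"]).transpose()
mixture.index.name = "Gene"

# Basic checks before export: no missing and no negative values
assert not mixture.isna().any().any()
assert (mixture >= 0).all().all()

# Render as tab-separated text; pass a file path instead of None to write a file
tsv_text = mixture.to_csv(None, sep="\t")
print(tsv_text)
```

Checking for missing values before export is a cheap way to notice sample IDs that do not match between the two tables.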

After the CIBERSORTx run was completed, we read in the Excel file containing the deconvolution results and joined it with the cluster labels again for further analyses, which were carried out in Excel and GraphPad Prism.

In [8]:
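# Read the CIBERSORTx results (read_excel takes no "sep" argument, so it is omitted)
df_ciso_results = pd.read_excel("BRCA_CIBERSORTx_Job3_Results.xlsx", index_col=0)
# Re-attach the cluster labels, matched on the sample index
df_ciso_results = df_ciso_results.join(all_clusters)
df_ciso_results.to_excel("CISO_Abs_Clusters_BRCA.xlsx")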
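Before moving the joined table to other tools, a per-cluster summary can serve as a quick plausibility check that the labels were attached to the right samples. The sketch below uses toy data. The cell type column names, sample IDs and values are hypothetical placeholders and do not reproduce the actual CIBERSORTx output.

```python
import pandas as pd

# Hypothetical deconvolution output: one row per sample, one column per cell type
results = pd.DataFrame(
    {
        "T cells CD8": [0.10, 0.30, 0.20, 0.40],
        "Macrophages M2": [0.50, 0.10, 0.30, 0.20],
    },
    index=["S1", "S2", "S3", "S4"],
)

# Hypothetical cluster labels indexed by the same sample IDs
labels = pd.DataFrame({"Cluster": [0, 0, 1, 1]}, index=["S1", "S2", "S3", "S4"])

# Attach the labels by sample ID, then average each cell type within a cluster
merged = results.join(labels)
cluster_means = merged.groupby("Cluster").mean()
print(cluster_means)
```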
