# Preparations for CIBERSORTx

[CIBERSORTx](https://cibersortx.stanford.edu/) is a tool that estimates the abundance of cell types of interest in a mixed cell sample from bulk gene expression data, using reference gene expression profiles called signature matrices. As a first step, we prepared a gene expression matrix from the TCGA LUAD cohort, restricted to the genes of the [LM22](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5895181/) signature matrix, which allows deconvolution of the abundance of several immune cell types. The tables containing the gene expression data were retrieved via the UCSC Xena Browser.
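The gene filtering itself is not shown in this notebook, since the input file already contains only the LM22 genes. As a rough sketch of how such a restriction could be done with pandas, the example below uses a made-up expression table and a hypothetical three-gene excerpt standing in for the full LM22 gene list; all names and values are illustrative assumptions.

```python
import pandas as pd

# Toy expression matrix (made-up values): rows are samples, columns are genes.
expr = pd.DataFrame(
    {"ABCB4": [10.8, 9.2], "TP53": [11.0, 10.5], "ACAP1": [6.3, 6.0]},
    index=["sample_A", "sample_B"],
)

# Hypothetical excerpt of a signature gene list; the real LM22 list is much longer.
signature_genes = ["ABCB4", "ACAP1", "ZBP1"]

# Keep only the signature genes that are present in the expression matrix,
# and record the ones that are missing so they can be checked by hand.
shared = [g for g in signature_genes if g in expr.columns]
missing = [g for g in signature_genes if g not in expr.columns]
expr_sig = expr.loc[:, shared]
```

Listing the missing genes explicitly makes it easier to spot gene symbols that differ between the signature matrix and the expression data.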

In [None]:
# Import the necessary libraries

import numpy as np
import pandas as pd
import openpyxl  # engine pandas uses to read and write .xlsx files


# import warnings filter
from warnings import simplefilter
# ignore all future warnings
simplefilter(action='ignore', category=FutureWarning)


In [3]:
df_samples = pd.read_csv("TCGA_LUAD_LM22_samples.tsv", sep="\t", index_col=0)
df_samples = df_samples.drop(["samples"], axis=1).copy()
X = df_samples
X

The gene expression data from the Xena Browser are log2(x+1)-transformed.
CIBERSORTx expects expression values in linear (non-log) space, so we reversed the transformation by computing 2^x - 1.
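To illustrate why raising 2 to the power of the values and subtracting 1 recovers the original scale, here is a small round-trip check on made-up numbers. It assumes a base-2 logarithm with a pseudocount of 1, which is what the back-transformation in this notebook implies.

```python
import numpy as np
import pandas as pd

# Made-up expression values in linear space.
linear = pd.DataFrame(
    {"GENE_A": [0.0, 3.0, 1023.0], "GENE_B": [1.0, 7.0, 15.0]}
)

# Forward transform: log2(x + 1), the assumed form of the downloaded data.
logged = np.log2(linear + 1)

# Inverse transform: 2**y - 1 brings the values back to linear space.
restored = np.power(2, logged) - 1

# The round trip should reproduce the original values up to floating-point error.
round_trip_ok = np.allclose(restored.to_numpy(), linear.to_numpy())
```

A zero in linear space stays zero after the round trip, which is the reason for the pseudocount in the forward transform.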

In [5]:
df_raw = np.power(2, X)-1
pd.set_option("display.max_columns", 16)
df_raw.describe()


sample,ABCB4,ABCB9,ACAP1,ACHE,ACP5,ADAM28,ADAMDEC1,ADAMTS3,ADRB2,AIF1,...,ZBP1,ZBTB10,ZBTB32,ZFP36L2,ZNF135,ZNF165,ZNF204P,ZNF222,ZNF286A,ZNF324
TCGA-97-8179-01,10.8000,6.182,6.325,4.012,11.310,8.107,1.735,3.420,5.512,8.689,...,3.521,9.359,2.8500,11.10,7.092,7.142,7.305,5.768,7.801,6.551
TCGA-78-7161-01,9.2260,6.342,5.979,9.600,9.926,8.745,3.754,4.760,5.954,6.916,...,5.195,9.257,2.4350,12.93,4.594,7.454,7.720,5.856,8.476,6.849
TCGA-MP-A4T8-01,9.1850,7.178,6.532,3.470,9.934,8.416,6.259,5.041,3.893,6.895,...,5.398,10.520,3.2960,12.77,4.403,6.450,7.404,4.757,7.704,7.201
TCGA-49-4510-01,8.8660,5.606,6.885,5.249,10.870,6.393,7.794,4.603,5.473,8.388,...,5.780,10.010,2.8850,11.37,5.969,6.628,7.459,6.370,7.236,7.940
TCGA-55-6971-01,8.7730,7.036,9.487,9.317,11.870,9.439,10.040,6.568,7.992,9.498,...,8.516,8.568,5.7610,11.60,4.999,5.084,7.380,7.661,7.538,7.702
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
TCGA-91-6847-01,1.0200,7.591,5.287,2.973,6.817,4.880,1.245,7.858,3.661,5.573,...,6.024,9.554,1.0200,10.49,5.274,7.806,9.369,7.032,9.162,7.424
TCGA-91-8496-01,0.8273,7.657,7.421,7.558,12.890,7.117,1.733,4.043,7.421,10.600,...,4.173,7.076,2.9940,11.34,5.725,5.293,8.566,6.390,7.060,7.043
TCGA-44-6144-11,0.6950,5.921,8.078,6.017,13.680,6.198,6.977,5.344,10.190,10.190,...,4.652,8.041,2.7160,12.02,6.523,5.872,8.962,5.811,6.897,7.377
TCGA-78-7155-01,0.6587,7.097,4.757,9.502,9.259,4.410,7.158,7.086,2.161,7.286,...,2.763,10.050,0.6587,10.86,7.817,8.354,7.313,7.104,10.720,8.641


In the next step, we joined the clustered LUAD patient cohort from our main analysis with the LM22 gene expression matrix prepared earlier in this notebook.
This restricted the analysis to the patients included in our cluster analysis. After dropping all information that is not needed to run CIBERSORTx, we transposed the matrix so that genes are in rows and samples in columns, and saved it as a tab-delimited text file.
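The effect of the join and the transpose can be seen on a toy example. The sketch below uses invented sample IDs, gene names, and values; the index name `Gene` is an illustrative choice, not something taken from the original analysis.

```python
import pandas as pd

# Cluster table: only samples S1 and S3 were part of the (hypothetical) clustering.
clusters = pd.DataFrame(
    {"Cluster": [1, 2]},
    index=pd.Index(["S1", "S3"], name="sample"),
)

# Expression table in linear space for three samples (made-up values).
expr = pd.DataFrame(
    {"GENE_A": [5.0, 6.0, 7.0], "GENE_B": [1.0, 2.0, 3.0]},
    index=pd.Index(["S1", "S2", "S3"], name="sample"),
)

# DataFrame.join is a left join on the index by default,
# so only the clustered samples (S1, S3) are kept.
joined = clusters.join(expr)

# Drop the label column and transpose: genes become rows, samples become columns.
mixture = joined.drop(columns=["Cluster"]).transpose()
mixture.index.name = "Gene"

# Samples without expression data would show up as NaN after the left join.
has_missing = bool(mixture.isna().any().any())
```

Checking for missing values after the join is a cheap way to catch sample IDs that do not match between the two tables.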

In [7]:
# Import the table that resulted from the clustering analysis
df_clusters = pd.read_excel("LUAD_Table_for_decon.xlsx", sheet_name=3, index_col=0)

# Drop all information that is not necessary for the CIBERSORTx deconvolution
all_clusters = df_clusters.drop(["IL22RA1", "IL22RA2", "IL10RB", "PVR", "histological_type", "sample_type", "OS", "OS.time", "pathologic_stage"], axis=1)
all_cluster_LM22 = all_clusters.join(df_raw)
ac_CISO = all_cluster_LM22.drop(["Cluster"], axis=1)

# Save the table in a suitable format (genes in rows, samples in columns)
ac_exp = ac_CISO.transpose()
ac_exp.to_csv("LUAD_all_clusters_CISO.txt", sep="\t")

,ABCB4,ABCB9,ACAP1,ACHE,ACP5,ADAM28,ADAMDEC1,ADAMTS3,...,ZBTB32,ZFP36L2,ZNF135,ZNF165,ZNF204P,ZNF222,ZNF286A,ZNF324
count,576.0,576.0,576.0,576.0,576.0,576.0,576.0,576.0,...,576.0,576.0,576.0,576.0,576.0,576.0,576.0,576.0
mean,37.679112,177.616752,299.412888,354.689443,3137.844593,473.605969,245.574062,60.811615,...,15.365313,3873.388727,85.868312,112.215498,272.851171,67.185126,208.250309,157.765984
std,96.755229,136.831687,256.26418,653.048563,3209.729226,573.819944,413.970964,100.203187,...,14.015394,2050.095911,81.522608,66.934446,203.386024,28.361015,135.305972,53.245688
min,0.497546,23.43708,0.597372,2.319578,111.75128,4.567386,0.0,2.24901,...,0.0,729.125167,1.711329,5.570486,3.756828,14.369518,38.72568,42.411338
25%,7.830544,90.249836,122.63985,44.37262,1397.825223,151.747085,32.626174,18.024086,...,6.147805,2447.201333,27.403563,69.144135,124.997438,48.582189,135.476287,119.697383
50%,14.364193,140.288553,218.564361,117.603528,2232.373249,322.249016,103.040622,34.126796,...,11.17627,3443.311717,63.759367,97.325982,218.716604,62.756511,177.094701,154.632548
75%,30.130354,211.63741,382.943599,368.455109,3690.521895,562.101063,277.736452,64.776496,...,19.825102,4836.345941,116.722809,139.093874,383.076548,81.796637,241.35869,189.348615
max,1781.887554,1175.267116,1477.583496,7803.01048,31432.166662,5517.268731,4328.545894,1540.372669,...,98.525893,14461.205891,782.071374,648.416396,1378.567183,244.401456,1685.714403,406.879729


After the CIBERSORTx run was completed, we read in the Excel file containing the deconvolution results and joined it with the cluster labels for further analyses, which were carried out in Excel and GraphPad Prism.

In [8]:
# Excel files have no field separator, so no `sep` argument is passed to read_excel
df_ciso_results = pd.read_excel("CIBERSORTx_Job1_Results_LUAD_all_clusters.xlsx", index_col=0)
df_ciso_results = df_ciso_results.join(all_clusters)
df_ciso_results.to_excel("CISO_Abs_Clusters_LUAD.xlsx")
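As a minimal sketch of this last join, the example below attaches cluster labels to a small made-up results table and lists any samples that end up without a label. The sample IDs, fractions, and the choice of two cell-type columns are assumptions for illustration only.

```python
import pandas as pd

# Made-up deconvolution results: one row per sample, one column per cell type.
results = pd.DataFrame(
    {"B cells naive": [0.10, 0.25, 0.40], "T cells CD8": [0.30, 0.05, 0.20]},
    index=["S1", "S2", "S3"],
)

# Cluster labels are only available for S1 and S3 in this toy example.
labels = pd.DataFrame({"Cluster": [1, 2]}, index=["S1", "S3"])

# Left join on the index: every result row is kept, labels are added where available.
annotated = results.join(labels)

# Samples that did not receive a cluster label.
unlabelled = annotated.index[annotated["Cluster"].isna()].tolist()
```

In the notebook itself every sample should receive a label, because the CIBERSORTx input was built from the clustered cohort; an empty `unlabelled` list would confirm that.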