# tut 1

NSCLC PBMCs Single Cell RNA-Seq (Fig. 2a,b):
* This example builds a signature matrix from single-cell RNA sequencing data of NSCLC PBMCs, then uses S-mode batch correction to enumerate the proportions of the different cell types in an RNA-seq dataset profiled from whole blood.


# example 1: generate signature matrix

```
docker run \
    -v absolute/path/to/input/dir:/src/data \
    -v absolute/path/to/output/dir:/src/outdir \
    cibersortx/fractions \
    --username email_address_registered_on_CIBERSORTx_website \
    --token token_obtained_from_CIBERSORTx_website \
    --single_cell TRUE \
    --refsample Fig2ab-NSCLC_PBMCs_scRNAseq_refsample.txt \
    --mixture Fig2b-WholeBlood_RNAseq.txt \
    --fraction 0 \
    --rmbatchSmode TRUE 
```

## set up some stuff

In [None]:
import logging

In [None]:
logging.basicConfig()

## download data

In [None]:
%%bash

pushd /mnt/liulab/csx_example_files/

export BASE_URL="https://cibersortx.stanford.edu/inc/inc.download.page.handler.php"
# curl -O -J -L "${BASE_URL}?file=NSCLC_PBMCs_Single_Cell_RNA-Seq_Fig2ab.zip"
# unzip NSCLC_PBMCs_Single_Cell_RNA-Seq_Fig2ab.zip
# curl -O -J -L "${BASE_URL}?file=RNA-Seq_mixture_melanoma_Tirosh_Fig2b-d.txt"

tree -h

popd

### read data into dataframes

In [None]:
import pandas as pd

logging.getLogger('pandas').setLevel('DEBUG')

In [None]:
path = (
    "/mnt/liulab/csx_example_files/Fig2ab-NSCLC_PBMCs/"
    "Fig2ab-NSCLC_PBMCs_scRNAseq_refsample.txt"
)

nsclc_pbmc_sc = pd.read_csv(
    path,
    sep='\t',
    index_col=0
)

nsclc_pbmc_sc

In [None]:
nsclc_pbmc_sc.sum(axis=0).sort_values()
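
The column sums above give one total per cell. A related check is how many cells each cell type contributes. This is a minimal sketch on toy data, assuming the reference file labels every cell column with its cell type, so column names repeat and pandas renames the repeats to `name.1`, `name.2`, and so on. The toy table and the regex-based label recovery are illustrative assumptions, and the regex would mislabel a cell type whose real name ends in `.<digits>`.

```python
import io

import pandas as pd

# Toy stand-in for the single-cell reference file (values are made up).
toy_tsv = (
    "GeneSymbol\tT cells\tT cells\tB cells\n"
    "GENE1\t0\t3\t1\n"
    "GENE2\t5\t0\t2\n"
)
toy_ref = pd.read_csv(io.StringIO(toy_tsv), sep="\t", index_col=0)

# pandas de-duplicates repeated headers: "T cells", "T cells.1", "B cells".
# Strip the numeric suffix to recover the cell-type label of each column.
labels = toy_ref.columns.str.replace(r"\.\d+$", "", regex=True)

# Number of cells (columns) per cell type.
cells_per_type = labels.value_counts()
print(cells_per_type)

# Total counts per cell, as in the cell above.
print(toy_ref.sum(axis=0).sort_values())
```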

In [None]:
path = (
    "/mnt/liulab/csx_example_files/Fig2ab-NSCLC_PBMCs/"
    "Fig2b-WholeBlood_RNAseq.txt"
)

nsclc_wholeblood_mixtures = pd.read_csv(
    path,
    sep='\t',
    index_col=0
)

nsclc_wholeblood_mixtures

## run csx with docker

```
docker run \
    -v absolute/path/to/input/dir:/src/data \
    -v absolute/path/to/output/dir:/src/outdir \
    cibersortx/fractions \
    --username email_address_registered_on_CIBERSORTx_website \
    --token token_obtained_from_CIBERSORTx_website \
    --single_cell TRUE \
    --refsample Fig2ab-NSCLC_PBMCs_scRNAseq_refsample.txt \
    --mixture Fig2b-WholeBlood_RNAseq.txt \
    --fraction 0 \
    --rmbatchSmode TRUE
```
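
The same command can be assembled from Python, which makes it easier to swap directories and file names between runs. This is a sketch only: `build_csx_fractions_command` is a hypothetical helper, the flags are copied from the command above, and the two directories are assumed from the `~/csx` layout used in this notebook. It only builds and prints the command; it does not run docker.

```python
import shlex


def build_csx_fractions_command(input_dir, output_dir, username, token,
                                refsample, mixture):
    """Return the docker invocation as an argument list (hypothetical helper)."""
    return [
        "docker", "run",
        "-v", f"{input_dir}:/src/data",      # input dir mounted into the container
        "-v", f"{output_dir}:/src/outdir",   # output dir mounted into the container
        "cibersortx/fractions",
        "--username", username,
        "--token", token,
        "--single_cell", "TRUE",
        "--refsample", refsample,
        "--mixture", mixture,
        "--fraction", "0",
        "--rmbatchSmode", "TRUE",
    ]


command = build_csx_fractions_command(
    input_dir="/home/jupyter/csx/input",    # assumed absolute paths
    output_dir="/home/jupyter/csx/output",
    username="email_address_registered_on_CIBERSORTx_website",
    token="token_obtained_from_CIBERSORTx_website",
    refsample="Fig2ab-NSCLC_PBMCs_scRNAseq_refsample.txt",
    mixture="Fig2b-WholeBlood_RNAseq.txt",
)

# Shell-quoted form, handy for logging or pasting into a terminal.
print(shlex.join(command))
```

Passing the list to `subprocess.run(command, check=True)` would execute it without going through a shell.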

In [1]:
!ls -l /mnt/liulab/csx_example_files

total 18239
drwxr-xr-x 1 jupyter jupyter        0 Jul 14 11:48 Expression_datasets
drwxr-xr-x 1 jupyter jupyter        0 Jul 14 11:48 Fig2ab-NSCLC_PBMCs
-rw-r--r-- 1 jupyter jupyter      835 Jul  2 21:48 Fig2b_ground_truth_whole_blood.txt
-rw-r--r-- 1 jupyter jupyter   146759 Jul  3 04:39 LM22.txt
-rw-r--r-- 1 jupyter jupyter 12259563 Jul 13 08:06 NSCLC_PBMCs_Single_Cell_RNA-Seq_Fig2ab.zip
-rw-r--r-- 1 jupyter jupyter  6264562 Jul 13 08:39 RNA-Seq_mixture_melanoma_Tirosh_Fig2b-d.txt
drwxr-xr-x 1 jupyter jupyter        0 Jul 14 11:48 Single_Cell_RNA-Seq_Melanoma_SuppFig_3b-d
-rw-r--r-- 1 jupyter jupyter     1974 Jul  2 21:48 groundtruth_HNSCC_Puram_et_al_Fig2cd.txt
-rw-r--r-- 1 jupyter jupyter     1216 Jul  2 21:48 groundtruth_Melanoma_Tirosh_et_al_SuppFig3b-d.txt


In [4]:
!tree -h /home/jupyter/csx

/home/jupyter/csx
├── [4.0K]  input
│   ├── [4.0K]  mixture.txt
│   │   └── [4.1M]  Fig2b-WholeBlood_RNAseq.txt
│   └── [ 52M]  refsample.txt
└── [4.0K]  output
    ├── [2.0M]  CIBERSORTx_cell_type_sourceGEP.txt
    ├── [ 84K]  CIBERSORTx_refsample_inferred_phenoclasses.CIBERSORTx_refsample_inferred_refsample.bm.K999.pdf
    ├── [228K]  CIBERSORTx_refsample_inferred_phenoclasses.CIBERSORTx_refsample_inferred_refsample.bm.K999.txt
    ├── [ 421]  CIBERSORTx_refsample_inferred_phenoclasses.txt
    └── [9.6M]  CIBERSORTx_refsample_inferred_refsample.txt

3 directories, 7 files


In [11]:
!rsync -r ~/csx/ ~/csx.1

In [7]:
!./run_csx_fractions.sh

Fig2b-WholeBlood_RNAseq.txt

sent 4,351,876 bytes  received 35 bytes  8,703,822.00 bytes/sec
total size is 4,350,725  speedup is 1.00
Fig2ab-NSCLC_PBMCs_scRNAseq_refsample.txt

sent 54,724,713 bytes  received 35 bytes  109,449,496.00 bytes/sec
total size is 54,711,251  speedup is 1.00
/home/jupyter/csx
├── [4.0K]  in
│   ├── [4.1M]  mixture.txt
│   └── [ 52M]  refsample.txt
└── [4.0K]  out

2 directories, 2 files
>Running CIBERSORTxFractions...
>[Options] username: lyronctk@stanford.edu
>[Options] token: dfeba2c8b9d61daebee5fa87026b8e56
>[Options] single_cell: TRUE
>[Options] refsample: refsample.txt
>[Options] mixture: mixture.txt
>[Options] rmbatchSmode: TRUE
>[Options] verbose: TRUE
>Making reference sample file.
>Making phenotype class file.
>single_cell is set to TRUE, so quantile normalization is set to FALSE, and the default parameters for building the signature matrix have been set to the following values:
	- G.min <- 300
	- G.max <- 500
	

In [None]:
!ls -hlt /home/jupyter/csx/output

In [None]:
path = "/home/jupyter/csx/output/CIBERSORTx_sigmatrix_Adjusted.txt"

learned_sigmatrix = pd.read_csv(
    path,
    sep='\t',
    index_col=0
)

In [None]:
learned_sigmatrix

In [None]:
# NOTE: tirosh_tumor_mixtures is used below but never defined in this notebook.
# Assumption: it is the Tirosh melanoma mixture file from the `ls -l` listing above.
tirosh_tumor_mixtures = pd.read_csv(
    "/mnt/liulab/csx_example_files/RNA-Seq_mixture_melanoma_Tirosh_Fig2b-d.txt",
    sep='\t',
    index_col=0
)

In [None]:
tirosh_tumor_mixtures['53']

In [None]:
pd.merge(learned_sigmatrix, tirosh_tumor_mixtures['53'], left_index=True, right_index=True)

# attempt inferring fractions myself with sigmatrix, mixture

In [None]:
from sklearn.svm import NuSVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

In [None]:
_combined_data = pd.merge(learned_sigmatrix, tirosh_tumor_mixtures['53'], left_index=True, right_index=True)
y = _combined_data.values[:, -1]
X = _combined_data.values[:, :-1]
y.shape, X.shape

In [None]:
regr = make_pipeline(StandardScaler(), NuSVR(kernel='linear'))
regr.fit(X, y)

In [None]:
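import numpy as np

# coef_ has shape (1, n_cell_types) for a linear kernel; flatten it to one weight per cell type
coefs = regr.named_steps['nusvr'].coef_.ravel()

# normalize the weights so they sum to 1
coefs / np.sum(coefs)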
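
Dividing by the plain sum can give negative "fractions" whenever some regression weights are negative. As far as I understand the original CIBERSORT procedure, negative weights are set to zero before the rest are scaled to sum to 1. A minimal sketch of that step follows; `coefs_to_fractions` is a hypothetical helper name and the toy weights are made up.

```python
import numpy as np


def coefs_to_fractions(coefs):
    """Clip negative weights to zero, then scale so the result sums to 1."""
    clipped = np.clip(np.asarray(coefs, dtype=float).ravel(), 0.0, None)
    total = clipped.sum()
    if total == 0:
        raise ValueError("all weights are <= 0; cannot normalize")
    return clipped / total


# Toy weights with the same (1, n_cell_types) shape as NuSVR.coef_.
toy_coefs = np.array([[0.2, -0.1, 0.6]])
fractions = coefs_to_fractions(toy_coefs)
print(fractions)
```

With the fitted pipeline above, the call would be `coefs_to_fractions(regr.named_steps['nusvr'].coef_)`.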

# check fractions inferred by csx

In [None]:
!find /home/jupyter/csx/output -name '*txt'

In [None]:
path = "/home/jupyter/csx/output/CIBERSORTx_Adjusted.txt"

pd.read_csv(
    path,
    sep='\t',
    index_col=0
).loc[53]
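
To put the hand-fitted fractions next to the CIBERSORTx row, one option is to align the two by cell type and report RMSE and Pearson correlation. This is a sketch on toy numbers: `compare_fractions` is a hypothetical helper, and the per-sample metric columns `P-value`, `Correlation` and `RMSE` are an assumption about the output file's layout, so they are dropped only if present.

```python
import numpy as np
import pandas as pd


def compare_fractions(own, reference):
    """Align two fraction vectors on cell type and summarize their agreement."""
    # Assumed metric columns in the CIBERSORTx output; ignored if absent.
    reference = reference.drop(
        labels=["P-value", "Correlation", "RMSE"], errors="ignore"
    )
    own, reference = own.align(reference, join="inner")
    rmse = float(np.sqrt(np.mean((own - reference) ** 2)))
    pearson = float(own.corr(reference))
    return {"rmse": rmse, "pearson": pearson}


# Toy values only; real inputs would be the fitted fractions and the .loc[53] row.
own_fractions = pd.Series({"B cells": 0.2, "T cells": 0.5, "Monocytes": 0.3})
csx_row = pd.Series(
    {"T cells": 0.5, "B cells": 0.2, "Monocytes": 0.3, "RMSE": 0.9}
)

agreement = compare_fractions(own_fractions, csx_row)
print(agreement)
```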

# extra

In [None]:
pd.read_csv(
    "/mnt/liulab/csx_example_files/Fig2ab-NSCLC_PBMCs/Fig2ab-NSCLC_PBMCs_scRNAseq_sigmatrix.txt",
    sep='\t',
    index_col=0
)

In [None]:
pd.read_csv(
    "/mnt/liulab/csx_example_files/Fig2ab-NSCLC_PBMCs/Fig2b-WholeBlood_RNAseq.txt",
    sep='\t',
    index_col=0
)