# Cho et al. study from Mayo

__Excerpts from Abstract__: Single-cell RNA sequencing to create a cellular atlas of APC heterogeneity in mouse visceral adipose tissue. Our analysis identified two distinct populations of adipose tissue-derived stem cells (ASCs) and three distinct populations of preadipocytes (PAs).



We will analyze the data with the **kallisto | bustools** workflow from the Lior Pachter lab, run through the `kb` command provided by the `kb-python` package. This notebook is adapted from the incredible resources available at [kallistobus.tools](https://www.kallistobus.tools/).

## Setup

In [2]:
# Record the start time so the total notebook run time can be reported later
import time
start_time = time.time()

### Install python packages

In [3]:
!pip install matplotlib
!pip install scikit-learn
!pip install numpy
!pip install scipy



In [5]:
%%time
# `kb` is a wrapper for the kallisto and bustools programs; the kb-python package bundles both executables.
!pip install kb-python

Wall time: 1.82 s


In [None]:
%%time
# Install scanpy and other packages needed for single-cell RNA-seq analysis
!pip install scanpy python-igraph louvain MulticoreTSNE pybiomart

In [None]:
# Import packages
import anndata
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scanpy as sc
from sklearn.decomposition import TruncatedSVD
from scipy import sparse, io

matplotlib.rcParams.update({'font.size': 12})
%config InlineBackend.figure_format = 'retina'

### Download the kallisto index for mouse

The data consist of cells from mouse visceral adipose tissue, so we download the pre-built mouse index (GRCm38.98 from [Ensembl](https://uswest.ensembl.org/index.html)).

In [None]:
!kb ref -d mouse -i index.idx -g t2g.txt -f1 transcriptome.fasta
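
The `-g t2g.txt` argument names the transcript-to-gene mapping that `kb count` later uses to collapse transcript-level counts into gene-level counts. The sketch below shows one way to inspect such a file. It assumes a tab-separated layout with transcript ID, gene ID and gene name in the first three columns; the sample rows and the `read_t2g` helper are hypothetical and only illustrate the idea.

```python
import csv
import io

# Hypothetical sample rows; a real t2g.txt contains Ensembl identifiers.
# Assumed layout: tab-separated, with transcript ID, gene ID, gene name first.
SAMPLE_T2G = (
    "TRANSCRIPT_A1\tGENE_A\tGeneA\n"
    "TRANSCRIPT_A2\tGENE_A\tGeneA\n"
    "TRANSCRIPT_B1\tGENE_B\tGeneB\n"
)


def read_t2g(handle):
    """Return a dict mapping each transcript ID to (gene ID, gene name)."""
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        transcript_id, gene_id = row[0], row[1]
        # Fall back to the gene ID when no gene name is present
        gene_name = row[2] if len(row) > 2 and row[2] else gene_id
        mapping[transcript_id] = (gene_id, gene_name)
    return mapping


t2g = read_t2g(io.StringIO(SAMPLE_T2G))
genes = {gene_id for gene_id, _ in t2g.values()}
print(f"{len(t2g)} transcripts map to {len(genes)} genes")
```

To apply the same helper to the downloaded file, open it with `open("t2g.txt", newline="")` and pass the handle to `read_t2g`.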

## Pseudoalignment and counting

The data we will process were produced with the **10xv2** chemistry. For the full list of supported technologies, run `kb --list`.

In [8]:
!kb --list

List of supported single-cell technologies

name        whitelist provided    barcode (file #, start, stop)        umi (file #, start, stop)    read file #    
--------    ------------------    ---------------------------------    -------------------------    -----------    
10XV1       yes                   (2, 0, 0)                            (1, 0, 0)                    0              
10XV2       yes                   (0, 0, 16)                           (0, 16, 26)                  1              
10XV3       yes                   (0, 0, 16)                           (0, 16, 28)                  1              
CELSEQ                            (0, 0, 8)                            (0, 8, 12)                   1              
CELSEQ2                           (0, 6, 12)                           (0, 0, 6)                    1              
DROPSEQ                           (0, 0, 12)                           (0, 12, 20)                  1              
INDROPS     yes             
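
The barcode and UMI columns above are `(file #, start, stop)` triples that say where each sequence sits in the input reads. As a rough illustration, the sketch below slices a read pair using the 10XV2 row. It assumes the positions are 0-based with an exclusive stop; the `split_read` helper and the read sequences are made up for this example.

```python
# (file index, start, stop) triples copied from the 10XV2 row of `kb --list`.
BARCODE = (0, 0, 16)
UMI = (0, 16, 26)


def split_read(reads, barcode=BARCODE, umi=UMI):
    """Slice the cell barcode and UMI out of one read pair.

    `reads` holds one read string per FASTQ file, in file order.
    Positions are assumed to be 0-based with an exclusive stop.
    """
    b_file, b_start, b_stop = barcode
    u_file, u_start, u_stop = umi
    return reads[b_file][b_start:b_stop], reads[u_file][u_start:u_stop]


# Made-up first read (16-base barcode + 10-base UMI) and made-up cDNA read.
read_pair = ("ACGTACGTACGTACGT" + "TTTTGGGGCC", "GATTACAGATTACA")
cell_barcode, umi_seq = split_read(read_pair)
print(cell_barcode, umi_seq)
```

Under this reading of the table, the first FASTQ file carries the 16-base barcode followed by the 10-base UMI, and the second file carries the cDNA read that is pseudoaligned.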

### Run kallisto and bustools

The following command generates an RNA count matrix of cells (rows) by genes (columns) in H5AD format, a binary format used to store [AnnData](https://anndata.readthedocs.io/en/stable/) objects. It requires the index and the transcript-to-gene mapping downloaded in the previous step, passed to the `-i` and `-g` arguments, respectively. Because the reads were generated with the 10x Genomics Chromium Single Cell v2 chemistry, the `-x 10xv2` argument is used. To view other supported technologies, run `kb --list`.

__Note:__ To output a [Loom](https://linnarssonlab.org/loompy/format/index.html) file instead, replace the `--h5ad` flag with `--loom`. To obtain the raw matrix output by `kb` instead of the H5AD or Loom converted files, omit these flags.

In [None]:
%%time
# Run `kb` to pseudoalign the reads and generate the cells x genes matrix in H5AD format.
!kb count -i index.idx -g t2g.txt -x 10xv2 --h5ad -t 2 \
ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR103/078/SRR10305578/SRR10305578_1.fastq.gz ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR103/078/SRR10305578/SRR10305578_2.fastq.gz
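
The choice between `--h5ad`, `--loom` and the raw matrix can also be made programmatically. The following is a minimal sketch that assembles the same `kb count` argument list in Python, using only the flags that appear in this notebook. The `kb_count_command` helper and the FASTQ file names are hypothetical.

```python
import shlex


def kb_count_command(fastqs, index="index.idx", t2g="t2g.txt",
                     technology="10xv2", output_format="h5ad", threads=2):
    """Build the `kb count` argument list used in this notebook.

    output_format: "h5ad", "loom", or None for the raw matrix only.
    """
    if output_format not in ("h5ad", "loom", None):
        raise ValueError(f"unsupported output format: {output_format!r}")
    cmd = ["kb", "count", "-i", index, "-g", t2g, "-x", technology]
    if output_format is not None:
        cmd.append(f"--{output_format}")
    cmd += ["-t", str(threads), *fastqs]
    return cmd


# Placeholder file names; substitute the real FASTQ paths or URLs.
cmd = kb_count_command(["reads_1.fastq.gz", "reads_2.fastq.gz"],
                       output_format="loom")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` to run the tool outside a notebook cell.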

## Basic QC

In [None]:
# Load the unfiltered count matrix written by `kb count`
adata = anndata.read_h5ad('counts_unfiltered/adata.h5ad')
adata
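
A common first QC step is to look at the total counts and the number of detected genes per barcode. The sketch below computes both on a toy matrix laid out like the `kb` output (cells as rows, genes as columns). The numbers and the `min_umis` cutoff are made up for illustration and are not taken from the study.

```python
import numpy as np

# Toy count matrix (made-up numbers): 4 cells (rows) x 5 genes (columns).
counts = np.array([
    [0, 3, 1, 0, 2],
    [0, 0, 0, 0, 0],
    [5, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
])

umis_per_cell = counts.sum(axis=1)         # total counts per barcode
genes_per_cell = (counts > 0).sum(axis=1)  # genes with at least one count

# Keep barcodes at or above an illustrative threshold; the cutoff is arbitrary.
min_umis = 1
keep = umis_per_cell >= min_umis
print(umis_per_cell, genes_per_cell, int(keep.sum()))
```

For the loaded object, where `adata.X` is typically a SciPy sparse matrix, the per-cell totals can be obtained in a similar way with `np.asarray(adata.X.sum(axis=1)).ravel()`.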