Data and code below were taken from the DIY.transcriptomics course by Dr. Daniel Beiting
These FASTQ files come from 1,000 peripheral blood mononuclear cells (PBMCs) and are one of the sample datasets provided by 10X Genomics.
Storage space: ~5 GB
Download here. Note: do not uncompress them.
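Since the archives should stay compressed, they can still be checked by streaming them. This is a minimal sketch, assuming the two file names used in the kb count step and standard gzip and awk on your PATH. count_reads is a hypothetical helper, not part of any package.

```shell
# Hypothetical helper: count reads in a gzipped FASTQ without writing an
# uncompressed copy to disk (each FASTQ record spans four lines).
count_reads() {
  gzip -cd "$1" | awk 'END { print NR / 4 }'
}

# File names are assumed from the kb count step; adjust to your download.
for fq in pbmc_1k_v3_S1_mergedLanes_R1.fastq.gz pbmc_1k_v3_S1_mergedLanes_R2.fastq.gz; do
  if [ -f "$fq" ]; then
    # gzip -t tests archive integrity and prints nothing on success.
    gzip -t "$fq" && echo "$fq: $(count_reads "$fq") reads"
  else
    echo "$fq: not found" >&2
  fi
done
```

For paired files, R1 and R2 should report the same number of reads. A mismatch usually points to an incomplete download.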
Get the reference sequences from Ensembl (cDNA fasta file for Human) here
(Optional) The transcript-to-gene mapping file will be generated on the fly, but the link is included here just in case: transcript to gene mapping file
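If you would rather build the mapping file yourself, the Ensembl cDNA FASTA headers carry the needed fields. This is a minimal sketch with several assumptions: headers of the form ">TRANSCRIPT_ID cdna ... gene:GENE_ID ... gene_symbol:NAME ...", a three-column tab-separated layout (transcript, gene, gene name), and the file names used in the later steps. make_t2g is a hypothetical helper.

```shell
# Hypothetical helper: read an Ensembl cDNA FASTA on stdin and print one
# tab-separated line per transcript: transcript ID, gene ID, gene symbol.
make_t2g() {
  awk '
    /^>/ {
      tx = substr($1, 2)            # drop the leading ">"
      gene = ""; sym = ""
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^gene:/)        gene = substr($i, 6)
        if ($i ~ /^gene_symbol:/) sym  = substr($i, 13)
      }
      if (sym == "") sym = gene     # fall back to the gene ID if no symbol
      print tx "\t" gene "\t" sym
    }'
}

# Assumed file names; for a gzipped FASTA, use: gzip -cd file.fa.gz | make_t2g
fa=Homo_sapiens.GRCh38.cdna.all.fa
if [ -f "$fa" ]; then
  make_t2g < "$fa" > t2g.txt
fi
```

The transcript IDs keep their version suffix so that they match the target names kallisto takes from the first token of each FASTA header.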
- Create a conda environment (name it sva_demo) and activate it
conda create --name sva_demo
conda activate sva_demo
- Install the Kallisto package (popular for single-cell analysis)
conda install -c conda-forge -c bioconda kallisto
- Install the kb-python package, which provides the bustools functionality required to preprocess the dataset
pip install kb-python
More info about kb-python here
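Before moving on, it can help to confirm that both tools are visible in the active environment. This is a minimal sketch using only the shell's built-in command -v. check_tools is a hypothetical helper, and the executable names kallisto and kb are taken from the commands used in this guide.

```shell
# Hypothetical helper: report which of the given command-line tools are on
# the PATH; returns non-zero if any of them is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

check_tools kallisto kb || echo "activate sva_demo and re-run the install steps"
```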
- Use kallisto to build an index from the reference sequences
kallisto index -i Homo_sapiens.GRCh38.cdna.all.index Homo_sapiens.GRCh38.cdna.all.fa
The general form is:
kallisto index -i output_index input_fasta
- Preprocessing scRNA-seq data
kb count \
pbmc_1k_v3_S1_mergedLanes_R1.fastq.gz pbmc_1k_v3_S1_mergedLanes_R2.fastq.gz \
-i Homo_sapiens.GRCh38.cdna.all.index \
-x 10XV3 \
-g t2g.txt \
-t 8 \
--cellranger
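Once kb count finishes, a quick look at the count matrix header shows whether anything was actually quantified. The sketch below assumes that the --cellranger flag wrote a Matrix Market file at counts_unfiltered/cellranger/matrix.mtx. That path is my assumption about the output layout and may differ with your kb-python version. mtx_dims is a hypothetical helper.

```shell
# Hypothetical helper: print the dimensions of a Matrix Market file.
# Lines starting with "%" are header/comments; the first remaining line
# holds "rows cols nonzero_entries".
mtx_dims() {
  awk '!/^%/ { print "rows=" $1, "cols=" $2, "nonzero=" $3; exit }' "$1"
}

# Assumed output path; adjust if your kb-python version writes elsewhere.
mtx=counts_unfiltered/cellranger/matrix.mtx
if [ -f "$mtx" ]; then
  mtx_dims "$mtx"
else
  echo "$mtx not found; check the kb count output directory" >&2
fi
```

In Cell Ranger style matrices the rows are typically genes and the columns barcodes. For the unfiltered matrix, expect many more barcodes than the ~1000 cells in the sample.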
Great, now you are done with the initial setup and preprocessing!
- You must have R and RStudio installed. If not .....
- Now open the DIY_scRNAseq script in RStudio and follow the instructions in it.
ML_input.tsv.gz