# 1. Installing `spaceranger`

* For detailed instructions: https://support.10xgenomics.com/spatial-gene-expression/software/pipelines/latest/installation

**System requirements**

```
Space Ranger pipelines run on Linux systems that meet these minimum requirements:

- 8-core Intel or AMD processor (32 cores recommended)
- 64GB RAM (128GB recommended)
- 1TB free disk space
- 64-bit CentOS/RedHat 6.0 or Ubuntu 12.04
- Note: Version 1.3 is the last version that will support CentOS/RedHat 6 or 
        Ubuntu 12.04. Future versions will require CentOS/RedHat 7 or newer, or 
        Ubuntu 14.04 or newer.
```
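
The minimums above can be compared against the current machine before installing. This is a minimal sketch that assumes a Linux host with `nproc`, `/proc/meminfo`, and `df`; `TARGET_DIR` and the `check` helper are hypothetical names introduced here for illustration.

```bash
# Minimums quoted above: 8 cores, 64 GB RAM, 1 TB free disk
MIN_CORES=8
MIN_MEM_GB=64
MIN_DISK_GB=1024
TARGET_DIR=${TARGET_DIR:-.}   # hypothetical: directory where output will be written

cores=$(nproc)
# MemTotal in /proc/meminfo is reported in kB
mem_gb=$(awk '/^MemTotal:/ {printf "%d", $2 / 1024 / 1024}' /proc/meminfo)
# df -Pk prints available space in 1 kB blocks in column 4
disk_gb=$(df -Pk "$TARGET_DIR" | awk 'NR == 2 {printf "%d", $4 / 1024 / 1024}')

check() {  # usage: check LABEL ACTUAL MINIMUM
    if [ "$2" -ge "$3" ]; then
        echo "OK    $1: $2 (minimum $3)"
    else
        echo "WARN  $1: $2 is below the minimum of $3"
    fi
}

check "CPU cores" "$cores" "$MIN_CORES"
check "RAM (GB)" "$mem_gb" "$MIN_MEM_GB"
check "Free disk (GB)" "$disk_gb" "$MIN_DISK_GB"
```

Values are truncated to whole GiB, so a machine sold as 64 GB may report slightly less and trigger a warning; treat the output as a hint rather than a verdict.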

```bash
LOCAL=$(realpath ~/data)
LOCALSOURCE=$LOCAL/source_files
LOCALBIN=$LOCAL/local

# Create the directories if they don't already exist (-p makes this safe to re-run)
mkdir -p "$LOCALSOURCE" # Directory to store the downloaded .tar.gz file
mkdir -p "$LOCALBIN"    # Directory to store locally installed packages/software

# The signed link below expires, so always get an updated download link from:
# https://support.10xgenomics.com/spatial-gene-expression/software/downloads/latest
wget -O $LOCALSOURCE/spaceranger-1.3.1.tar.gz "https://cf.10xgenomics.com/releases/spatial-exp/spaceranger-1.3.1.tar.gz?Expires=1649214425&Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiaHR0cHM6Ly9jZi4xMHhnZW5vbWljcy5jb20vcmVsZWFzZXMvc3BhdGlhbC1leHAvc3BhY2VyYW5nZXItMS4zLjEudGFyLmd6IiwiQ29uZGl0aW9uIjp7IkRhdGVMZXNzVGhhbiI6eyJBV1M6RXBvY2hUaW1lIjoxNjQ5MjE0NDI1fX19XX0_&Signature=okfVjpy~mx1NuUK0hY~~ttQ-nu3t1QDPJYPKfQ9Khw2de7HJOrDEk37B0eUW5HmxzfpX639mG-qOrBZRY3rqqNfW5-wuXXyXnq~pbiurN2rCN21PDdLxeK9iNwvitPzmcCnlyfoKNaXV6koK1pyUwAm2fOFhyj9hjXwZhae8AyghKJy8MlF062MJ7UKibYN4qPZoCXDN5PwiSVxWVvR14DpxgP1z~i6qhbmllk5N7SDoxeSN8XLyH7PfMVBahDJomhTKOLNJrEe-C80KvjrOwvV4V5j9J~mVXtCko6aiStjstgpDT8bO8Hr0XfK1~XEfHYapFnda1YiD2UMxqSJK5g__&Key-Pair-Id=APKAI7S6A5RYOXBWRPDA"

# Extract the downloaded .tar.gz file to LOCALBIN
tar -xzvf "$LOCALSOURCE/spaceranger-1.3.1.tar.gz" -C "$LOCALBIN"

# Add the spaceranger directory to PATH (current shell session only)
export PATH=$LOCALBIN/spaceranger-1.3.1:$PATH

# Verify installation
spaceranger testrun --id=tiny
```
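
If `spaceranger` is reported as "command not found", the `PATH` change is the usual suspect, since `export` only affects the current shell. The sketch below uses a hypothetical helper, `on_path`, to show where a command resolves; it relies only on the standard `command -v` builtin.

```bash
# Report whether a command is reachable through PATH and where it resolves to
on_path() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at $(command -v "$1")"
    else
        echo "$1 not found on PATH" >&2
        return 1
    fi
}

on_path spaceranger || echo "Re-run the 'export PATH=...' line, or add it to ~/.bashrc"
```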
---

# 2. Dependencies

10x provides prebuilt reference packages for human and mouse. For other species (rat in this example), you can build a custom reference by following the build steps described here:

https://support.10xgenomics.com/single-cell-gene-expression/software/release-notes/build

```bash
# Create a directory to store dependency files
LOCAL=$(realpath ~/data)
WORKDIR=$LOCAL/sr_workdir
mkdir -p $WORKDIR

# Download fasta
wget -P $WORKDIR http://ftp.ensembl.org/pub/release-105/fasta/rattus_norvegicus/dna/Rattus_norvegicus.mRatBN7.2.dna.toplevel.fa.gz
gunzip $WORKDIR/Rattus_norvegicus.mRatBN7.2.dna.toplevel.fa.gz
 
# Download GTF
wget -P $WORKDIR http://ftp.ensembl.org/pub/release-105/gtf/rattus_norvegicus/Rattus_norvegicus.mRatBN7.2.105.gtf.gz
gunzip $WORKDIR/Rattus_norvegicus.mRatBN7.2.105.gtf.gz
```
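
Large downloads are sometimes truncated, and `gunzip` then fails partway through. A hedged sketch of an integrity check that can be run on each `.gz` file before decompressing is shown here on throwaway files; `check_gz` and `DEMO_DIR` are hypothetical names, and `gzip -t` only verifies the archive's internal CRC, not that the file matches the copy on the Ensembl server.

```bash
# Demo on throwaway files; point check_gz at the downloaded .fa.gz / .gtf.gz instead
DEMO_DIR=$(mktemp -d)
printf '>1 test\nACGT\n' | gzip -c > "$DEMO_DIR/demo.fa.gz"

check_gz() {
    # gzip -t reads the whole archive and verifies it without writing any output
    if gzip -t "$1" 2>/dev/null; then
        echo "intact: $1"
    else
        echo "corrupt or truncated: $1" >&2
        return 1
    fi
}

check_gz "$DEMO_DIR/demo.fa.gz"

# Simulate an interrupted download by keeping only the first 10 bytes
head -c 10 "$DEMO_DIR/demo.fa.gz" > "$DEMO_DIR/truncated.fa.gz"
check_gz "$DEMO_DIR/truncated.fa.gz" || echo "this file would need to be downloaded again"
```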
---

Before creating the reference package that `spaceranger` requires, the headers of the input FASTA file need a small modification. The 10x build steps describe the process as follows (their example uses the mouse assembly GRCm38):

```
# Modify sequence headers in the Ensembl FASTA to match the file
# "GRCm38.primary_assembly.genome.fa" from GENCODE. Unplaced and unlocalized
# sequences such as "GL456210.1" have the same names in both versions.
#
# Input FASTA:
#   >1 dna:chromosome chromosome:GRCm38:1:1:195471971:1 REF
#
# Output FASTA:
#   >chr1 1
```

```bash
# sed commands:
# 1. Replace metadata after space with original contig name, as in GENCODE
# 2. Add "chr" to names of autosomes and sex chromosomes
# 3. Handle the mitochondrial chromosome

cat $WORKDIR/Rattus_norvegicus.mRatBN7.2.dna.toplevel.fa \
    | sed -E 's/^>(\S+).*/>\1 \1/' \
    | sed -E 's/^>([0-9]+|[XY]) />chr\1 /' \
    | sed -E 's/^>MT />chrM /' \
    > $WORKDIR/Rattus_norvegicus.mRatBN7.2.dna.toplevel.modified.fa
```
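
To see what the three `sed` steps do, the same pipeline can be run on a tiny synthetic FASTA. The headers below are made up for illustration (they are not real rat contigs), and `\S` assumes GNU `sed`.

```bash
DEMO_DIR=$(mktemp -d)
# Synthetic Ensembl-style headers: name, then metadata after the first space
cat > "$DEMO_DIR/demo.fa" <<'EOF'
>1 dna:chromosome chromosome:DEMO:1:1:8:1 REF
ACGTACGT
>X dna:chromosome chromosome:DEMO:X:1:4:1 REF
GGCC
>MT dna:chromosome chromosome:DEMO:MT:1:4:1 REF
TTAA
>SCAFFOLD1.1 dna:scaffold scaffold:DEMO:SCAFFOLD1.1:1:4:1 REF
CCGG
EOF

cat "$DEMO_DIR/demo.fa" \
    | sed -E 's/^>(\S+).*/>\1 \1/' \
    | sed -E 's/^>([0-9]+|[XY]) />chr\1 /' \
    | sed -E 's/^>MT />chrM /' \
    > "$DEMO_DIR/demo.modified.fa"

# Show only the rewritten headers; sequence lines are left untouched
grep '^>' "$DEMO_DIR/demo.modified.fa"
# >chr1 1
# >chrX X
# >chrM MT
# >SCAFFOLD1.1 SCAFFOLD1.1
```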
---

The `.gtf` file downloaded above already appears to be in the format `spaceranger` expects. Specifically, for `.gtf` files that require formatting, the documentation does the following:
```
# Remove version suffix from transcript, gene, and exon IDs in order to match
# previous Cell Ranger reference packages
#
# Input GTF:
#     ... gene_id "ENSMUSG00000102693.1"; ...
# Output GTF:
#     ... gene_id "ENSMUSG00000102693"; gene_version "1"; ...
```
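
Whether a given GTF still carries version suffixes can be checked rather than assumed. The sketch below counts lines whose `gene_id`, `transcript_id`, or `exon_id` value ends in `.<digits>`; `count_versioned_ids` is a hypothetical helper, and the two one-line GTF files are synthetic (the rat gene ID is invented for the demo).

```bash
DEMO_DIR=$(mktemp -d)
# Synthetic one-line GTFs: unversioned ID with separate gene_version vs. versioned ID
printf '1\tdemo\tgene\t1\t100\t.\t+\t.\tgene_id "ENSRNOG00000000001"; gene_version "1";\n' \
    > "$DEMO_DIR/unversioned.gtf"
printf '1\tdemo\tgene\t1\t100\t.\t+\t.\tgene_id "ENSMUSG00000102693.1";\n' \
    > "$DEMO_DIR/versioned.gtf"

count_versioned_ids() {
    # grep -c prints the number of matching lines (0 when nothing matches)
    grep -cE '(gene|transcript|exon)_id "[^"]+\.[0-9]+"' "$1"
}

echo "unversioned.gtf: $(count_versioned_ids "$DEMO_DIR/unversioned.gtf") line(s) with versioned IDs"
echo "versioned.gtf:   $(count_versioned_ids "$DEMO_DIR/versioned.gtf") line(s) with versioned IDs"
```

A count of 0 on the real file would support skipping the reformatting step; any other count suggests the version suffixes still need to be removed.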

Since the formatting is already taken care of, we now filter out entries that are not needed in the analysis (as per 10x Genomics):

```bash

BIOTYPE_PATTERN=\
"(protein_coding|lncRNA|\
IG_C_gene|IG_D_gene|IG_J_gene|IG_LV_gene|IG_V_gene|\
IG_V_pseudogene|IG_J_pseudogene|IG_C_pseudogene|\
TR_C_gene|TR_D_gene|TR_J_gene|TR_V_gene|\
TR_V_pseudogene|TR_J_pseudogene)"
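# Ensembl GTFs label biotypes as gene_biotype/transcript_biotype
# (GENCODE GTFs, used in the 10x build script, use gene_type/transcript_type)
GENE_PATTERN="gene_biotype \"${BIOTYPE_PATTERN}\""
TX_PATTERN="transcript_biotype \"${BIOTYPE_PATTERN}\""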
READTHROUGH_PATTERN="tag \"readthrough_transcript\""

LOCAL=$(realpath ~/data)
WORKDIR=$LOCAL/sr_workdir


cat $WORKDIR/Rattus_norvegicus.mRatBN7.2.105.gtf \
    | awk '$3 == "transcript"' \
    | grep -E "$GENE_PATTERN" \
    | grep -E "$TX_PATTERN" \
    | grep -Ev "$READTHROUGH_PATTERN" \
    | sed -E 's/.*(gene_id "[^"]+").*/\1/' \
    | sort \
    | uniq \
    > $WORKDIR/Rattus_norvegicus.mRatBN7.2.105.gene_allowlist

# Copy header lines beginning with "#"
grep -E "^#" $WORKDIR/Rattus_norvegicus.mRatBN7.2.105.gtf > $WORKDIR/Rattus_norvegicus.mRatBN7.2.105.filtered.gtf

# Filter to the gene allowlist
grep -Ff $WORKDIR/Rattus_norvegicus.mRatBN7.2.105.gene_allowlist \
         $WORKDIR/Rattus_norvegicus.mRatBN7.2.105.gtf \
         >> $WORKDIR/Rattus_norvegicus.mRatBN7.2.105.filtered.gtf
```
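
The filter only works if the attribute keys in the patterns match the keys actually present in the GTF: GENCODE files use `gene_type`/`transcript_type`, whereas Ensembl files typically use `gene_biotype`/`transcript_biotype`. If the keys do not match, the allowlist comes out empty and the filtered GTF contains only header lines. The sketch below, run on a synthetic two-line GTF, counts each key and guards against an empty allowlist; `count_key` and the demo file names are hypothetical.

```bash
DEMO_DIR=$(mktemp -d)
# Synthetic Ensembl-style transcript records (IDs are placeholders)
printf '1\tdemo\ttranscript\t1\t100\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; gene_biotype "protein_coding"; transcript_biotype "protein_coding";\n' > "$DEMO_DIR/demo.gtf"
printf '1\tdemo\ttranscript\t200\t300\t.\t+\t.\tgene_id "G2"; transcript_id "T2"; gene_biotype "snRNA"; transcript_biotype "snRNA";\n' >> "$DEMO_DIR/demo.gtf"

count_key() {  # usage: count_key ATTRIBUTE_KEY GTF
    grep -c "$1 \"" "$2"
}

# Which biotype keys does this GTF use?
for key in gene_type gene_biotype transcript_type transcript_biotype; do
    echo "$key: $(count_key "$key" "$DEMO_DIR/demo.gtf")"
done

# Build a small allowlist and make sure it is not empty before filtering with it
ALLOWLIST=$DEMO_DIR/demo.gene_allowlist
grep -E 'gene_biotype "(protein_coding|lncRNA)"' "$DEMO_DIR/demo.gtf" \
    | sed -E 's/.*(gene_id "[^"]+").*/\1/' | sort -u > "$ALLOWLIST"
n_genes=$(wc -l < "$ALLOWLIST" | tr -d ' ')
if [ -s "$ALLOWLIST" ]; then
    echo "allowlist has $n_genes gene(s)"
else
    echo "allowlist is empty: check the attribute keys in the patterns" >&2
fi
```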
---

Once the above filtering steps are done, run `spaceranger mkref` to generate the reference package.

```bash
# Create reference package (mkref writes it to ./mRatBN7.2.105, so run it from $WORKDIR)
cd "$WORKDIR"
spaceranger mkref --genome=mRatBN7.2.105 \
    --fasta="$WORKDIR/Rattus_norvegicus.mRatBN7.2.dna.toplevel.modified.fa" \
    --genes="$WORKDIR/Rattus_norvegicus.mRatBN7.2.105.filtered.gtf"
```
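
One thing that may be worth verifying on the real files: the FASTA headers were renamed to `chr1`, `chrX`, `chrM` to mimic GENCODE, while the GTF was left as downloaded, and Ensembl GTFs normally name contigs `1`, `X`, `MT` in column 1. This is an assumption to check, not a confirmed problem. The sketch below lists contig names that appear in a GTF but not in a FASTA, demonstrated on synthetic files; `missing_contigs` is a hypothetical helper.

```bash
DEMO_DIR=$(mktemp -d)
# Synthetic inputs: FASTA with "chr" names, GTF with Ensembl-style names
printf '>chr1 1\nACGT\n>chrM MT\nTTAA\n' > "$DEMO_DIR/demo.fa"
printf '#!genome-build demo\n1\tdemo\tgene\t1\t4\t.\t+\t.\tgene_id "G1";\nMT\tdemo\tgene\t1\t4\t.\t+\t.\tgene_id "G2";\n' > "$DEMO_DIR/demo.gtf"

missing_contigs() {  # usage: missing_contigs FASTA GTF
    fa_names=$(mktemp)
    gtf_names=$(mktemp)
    # First word of each FASTA header, without the leading ">"
    grep '^>' "$1" | sed -E 's/^>([^ ]+).*/\1/' | LC_ALL=C sort -u > "$fa_names"
    # Column 1 of every non-comment GTF line
    grep -v '^#' "$2" | cut -f1 | LC_ALL=C sort -u > "$gtf_names"
    # comm -13 keeps only the lines unique to the second file
    LC_ALL=C comm -13 "$fa_names" "$gtf_names"
    rm -f "$fa_names" "$gtf_names"
}

# Prints "1" and "MT": both are used by the GTF but absent from the FASTA
missing_contigs "$DEMO_DIR/demo.fa" "$DEMO_DIR/demo.gtf"
```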
---

# 3. Preparing PBS script

Once the reference package has been downloaded or created, you are ready to run `spaceranger count`. Below is an example command (replace each bracketed placeholder with your own value):

```
LOCAL=$(realpath ~/data)

spaceranger count \
	--id=[run_id] \
	--slide=[slide_id] \
	--transcriptome=$LOCAL/sr_workdir/mRatBN7.2.105 \
	--fastqs=[path-to-fastq-file-directory] --sample=[sample-name] \
	--image=[sample-slide-image].tif \
	--loupe-alignment=[cloupe-alignment].json \
	--area=[A1, B1, etc]
```

If you want to queue multiple jobs using a PBS script rather than running them in an interactive session, follow the template below to create a `.pbs` file:

```bash
#PBS -N JM_spaceranger_count
#PBS -l nodes=1:ppn=5
#PBS -l mem=8gb
#PBS -l walltime=05:30:00
#PBS -q hive
#PBS -k oe

LOCAL=$(realpath ~/data)
LOCALBIN=$LOCAL/local
SRPATH=$LOCALBIN/spaceranger-1.3.1
# Make spaceranger visible inside the batch job
export PATH=$SRPATH:$PATH

spaceranger count \
    --transcriptome=$LOCAL/sr_workdir/mRatBN7.2.105 \
    --id=$run_id \
    --slide=$slide_id \
    --fastqs=$fastq_dir \
    --sample=$sample \
    --image=$image \
    --loupe-alignment=$loupe \
    --area=$area
```
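
Because the template reads all of its inputs from variables, a job submitted with a missing or misspelled `-v` name would pass an empty argument to `spaceranger count`. A hedged sketch of a guard that could be placed near the top of the `.pbs` file is shown below; `check_required_vars` and `REQUIRED_VARS` are hypothetical names, and the variable list simply mirrors the template above.

```bash
# Names of the variables the template expects from `qsub -v`
REQUIRED_VARS="run_id slide_id fastq_dir sample image loupe area"

check_required_vars() {
    missing=""
    for name in $REQUIRED_VARS; do
        # Indirect lookup: read the value of the variable whose name is in $name
        eval "value=\${$name:-}"
        [ -n "$value" ] || missing="$missing $name"
    done
    if [ -n "$missing" ]; then
        echo "Missing qsub -v variables:$missing" >&2
        return 1
    fi
}

# Demo: with only two variables set, the check reports the rest.
# In the .pbs file this line would instead read: check_required_vars || exit 1
run_id=demo_run slide_id=V11L12-118
check_required_vars || echo "job would stop here"
```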
---

For detailed information on customizing a PBS script, refer to the PACE documentation: https://docs.pace.gatech.edu/software/PBS_script_guide/.

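Every variable used in the PBS script (`run_id`, `slide_id`, `fastq_dir`, `sample`, `image`, `loupe`, `area`) must be supplied through `qsub -v`; only `--transcriptome` (the rat reference package) is hard-coded. An example is shown below: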

```bash
qsub -d ./ -v run_id=sr_count_JM04_1_220217,slide_id=V11L12-118,fastq_dir=./fastq,sample=JM04_1,image=a1_Composite.tif,loupe=V11L12-118-A1.json,area=A1 re_helper.pbs
```