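It is better to map one large FASTQ that contains all samples than to map each sample separately; UMI demultiplexing then works properly. The barcode read is laid out as follows (1-based positions):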

pos 1-8: bc1
pos 9-16: 8 bp index
pos 17-24: bc2 (reverse complement of whitelist)  
pos 25-30: UMI
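
As a quick sanity check on the layout above, the sketch below slices a barcode read into its four fields with `cut -c`, which uses the same 1-based inclusive positions as these notes (the `--soloCBposition` and `--soloUMIposition` values in the mapping call are the 0-based equivalents). The function names `split_indrop_read` and `revcomp` are hypothetical helpers, not part of STAR or of the original pipeline, and `revcomp` assumes uppercase A/C/G/T input.

```shell
# Hypothetical helper: split a barcode read into bc1, index, bc2 and UMI.
# $1: read sequence (at least 30 bases). Positions are 1-based, as in the notes.
split_indrop_read () {
    printf 'bc1=%s index=%s bc2=%s umi=%s\n' \
        "$(printf '%s\n' "$1" | cut -c1-8)" \
        "$(printf '%s\n' "$1" | cut -c9-16)" \
        "$(printf '%s\n' "$1" | cut -c17-24)" \
        "$(printf '%s\n' "$1" | cut -c25-30)"
}

# Hypothetical helper: reverse-complement an uppercase DNA string.
# Complement with tr, then reverse the characters with awk.
revcomp () {
    printf '%s\n' "$1" | tr 'ACGT' 'TGCA' | awk '{
        out = ""
        for (i = length($0); i >= 1; i--) out = out substr($0, i, 1)
        print out
    }'
}
```

Because bc2 is read as the reverse complement of the whitelist, `revcomp` applied to the extracted bc2 field should give a sequence that can be looked up in the forward barcode list.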

In [1]:
star_solo_indrop_rna_mapping_unique_cellfilter () {
    local fastq_R1_filename="${1}";
    local fastq_R2_filename="${2}";
    local bam_filename="${3}";
    # 3 barcode lists:
    local whitelist_part1_filename='indrop_whitelist/gel_barcode2_list.txt';
    local whitelist_part2_filename='indrop_whitelist/index_whitelist.txt';
    local whitelist_part3_filename='indrop_whitelist/gel_barcode2_list_revcomp.txt';
    local star_reference_dir='/lustre1/project/stg_00002/lcb/fderop/data/00000000_genomes/mm10_STAR_2.7.8_premrna';
    local sjdbgtf='/lustre1/project/stg_00002/lcb/fderop/data/00000000_genomes/mm10_STAR_2.7.8_premrna/genes.gtf';
    #module load STAR/2.7.5b-foss-2018a;
    #module load SAMtools/1.10-foss-2018a;
    # Map with STAR solo
    /lustre1/project/stg_00002/lcb/fderop/scripts/STAR-2.7.8a/bin/Linux_x86_64/STAR \
        --runThreadN 32 \
        --runMode alignReads \
        --outSAMtype BAM SortedByCoordinate \
        --sysShell /bin/bash \
        --genomeDir "${star_reference_dir}" \
        --readFilesIn "${fastq_R1_filename}" "${fastq_R2_filename}" \
        --readFilesCommand 'gzip -c -d' \
        --soloCBwhitelist "${whitelist_part1_filename}" "${whitelist_part2_filename}" "${whitelist_part3_filename}" \
        --soloType CB_UMI_Complex \
        --soloCBposition 0_0_0_7 0_8_0_15 0_16_0_23 \
        --soloUMIposition 0_24_0_29 \
        --sjdbGTFfile "${sjdbgtf}" \
        --soloCellFilter CellRanger2.2 2000 0.99 10 \
        --soloCBmatchWLtype 1MM \
        --outFilterMultimapNmax 1 \
        --outSAMattributes NH HI AS nM CB UB CR CY UR UY \
        --outFileNamePrefix "${bam_filename%bam}" \
        --outReadsUnmapped Fastx \
        --quantMode GeneCounts \
        --bamRemoveDuplicatesType UniqueIdentical \
        --soloFeatures Gene GeneFull
    # Index BAM file.
    # samtools index "${bam_filename%bam}Aligned.sortedByCoord.out.bam"
}

In [2]:
# cat SRR10545068_234.fastq.gz SRR10545069_234.fastq.gz SRR10545070_234.fastq.gz SRR10545071_234.fastq.gz SRR10545072_234.fastq.gz SRR10545073_234.fastq.gz SRR10545074_234.fastq.gz SRR10545075_234.fastq.gz SRR10545076_234.fastq.gz SRR10545077_234.fastq.gz SRR10545078_234.fastq.gz SRR10545079_234.fastq.gz > indrop_234.fastq.gz 

In [3]:
# cat SRR10545068_1.fastq.gz SRR10545069_1.fastq.gz SRR10545070_1.fastq.gz SRR10545071_1.fastq.gz SRR10545072_1.fastq.gz SRR10545073_1.fastq.gz SRR10545074_1.fastq.gz SRR10545075_1.fastq.gz SRR10545076_1.fastq.gz SRR10545077_1.fastq.gz SRR10545078_1.fastq.gz SRR10545079_1.fastq.gz > indrop_1.fastq.gz
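
The two commented `cat` commands above list every run by hand, and both lists must be in the same run order so that read pairs stay in sync. A minimal sketch of the same merge using a glob is shown here; `merge_fastq_by_suffix` is a hypothetical helper, and it assumes the per-run files follow the `SRR*_<suffix>.fastq.gz` naming seen above and that all runs in the directory should be merged. Concatenated gzip files remain a valid gzip stream, so no decompression is needed.

```shell
# Hypothetical helper: concatenate per-run FASTQ files for one read suffix.
# $1: input directory, $2: read suffix (e.g. 1 or 234), $3: output file.
# Glob expansion is sorted, so every suffix is merged in the same run order.
merge_fastq_by_suffix () {
    cat "$1"/SRR*_"$2".fastq.gz > "$3"
}
```

For example, `merge_fastq_by_suffix . 1 indrop_1.fastq.gz` followed by `merge_fastq_by_suffix . 234 indrop_234.fastq.gz` would build both merged files; the output names do not match the `SRR*` glob, so they are never picked up as inputs.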

In [4]:
dir=demultiplexed_indrop_index
star_solo_indrop_rna_mapping_unique_cellfilter \
    fastq_indrop/indrop_1.fastq.gz \
    fastq_indrop/indrop_234.fastq.gz \
    $dir/merged.bam

Jan 31 17:49:01 ..... started STAR run
Jan 31 17:49:02 ..... loading genome
Jan 31 17:50:38 ..... processing annotations GTF
Jan 31 17:50:45 ..... started mapping
Jan 31 20:20:31 ..... finished mapping
Jan 31 20:20:33 ..... started Solo counting
Jan 31 20:33:02 ..... finished Solo counting
Jan 31 20:33:02 ..... started sorting BAM
Jan 31 20:51:01 ..... finished successfully


Next, print the STARsolo summary statistics for each sample and save the numeric column to Summary_numbers.csv:

In [5]:
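for sample in "$dir"/*/Gene/Summary.csv
do
    # Print the path without the leading directory, then the summary itself.
    echo "${sample#*/}"
    cat "$sample"
    # Keep only the values (second CSV column) for later comparison.
    cut -d, -f2 "$sample" > "${sample%/Gene/Summary.csv}/Gene/Summary_numbers.csv"
    printf "\n"
done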

SRR10545068.Solo.out/Gene/Summary.csv
Number of Reads,136252173
Reads With Valid Barcodes,0.4216
Sequencing Saturation,0.261272
Q30 Bases in CB+UMI,0.834279
Q30 Bases in RNA read,0.800453
Reads Mapped to Genome: Unique+Multiple,0.488978
Reads Mapped to Genome: Unique,0.488978
Reads Mapped to Transcriptome: Unique+Multipe Genes,0.290449
Reads Mapped to Transcriptome: Unique Genes,0.282861
Estimated Number of Cells,13956
Reads in Cells Mapped to Unique Genes,20664237
Fraction of Reads in Cells,0.536171
Mean Reads per Cell,1480
Median Reads per Cell,1256
UMIs in Cells,14962203
Mean UMI per Cell,1072
Median UMI per Cell,919
Mean Genes per Cell,795
Median Genes per Cell,717
Total Genes Detected,20789

SRR10545069.Solo.out/Gene/Summary.csv
Number of Reads,103036640
Reads With Valid Barcodes,0.352682
Sequencing Saturation,0.189946
Q30 Bases in CB+UMI,0.734679
Q30 Bases in RNA read,0.711272
Reads Mapped to Genome: Unique+Multiple,0.450167
Reads Mapped to Genome: Unique,0.450167
Reads Mapped to
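
To compare a single statistic across samples without reading the full listings, one metric can be pulled out of each `Summary.csv` by name. The sketch below is a hypothetical helper (`summary_metric` is not part of STAR); it assumes the two-column `name,value` layout shown in the output above and that metric names contain no commas.

```shell
# Hypothetical helper: print the value of one metric from a STARsolo Summary.csv.
# $1: path to Summary.csv, $2: exact metric name as it appears in column 1.
summary_metric () {
    awk -F, -v name="$2" '$1 == name { print $2 }' "$1"
}
```

A call such as `summary_metric "$sample" "Estimated Number of Cells"` inside the loop over `$dir/*/Gene/Summary.csv` would then print one number per sample.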

In [3]:
gunzip demultiplexed_indrop_index/merged.Solo.out/Gene/raw/*

/lustre1/project/stg_00002/lcb/fderop/scripts/STAR-2.7.8a/bin/Linux_x86_64/STAR \
    --runThreadN 36 \
    --runMode soloCellFiltering demultiplexed_indrop_index/merged.Solo.out/Gene/raw demultiplexed_indrop_index/merged.Solo.out/Gene/filtered_27094/ \
    --soloCellFilter CellRanger2.2 27094 0.99 10 \
    --soloCBmatchWLtype 1MM \
    --quantMode GeneCounts \
    --soloFeatures Gene GeneFull

gzip: demultiplexed_indrop_index/merged.Solo.out/Gene/raw/barcodes.tsv: unknown suffix -- ignored
gzip: demultiplexed_indrop_index/merged.Solo.out/Gene/raw/features.tsv: unknown suffix -- ignored
gzip: demultiplexed_indrop_index/merged.Solo.out/Gene/raw/matrix.mtx: unknown suffix -- ignored
Feb 09 15:17:09 ..... started STAR run
Feb 09 15:17:09 ..... starting SoloCellFiltering
Feb 09 15:17:40 ..... finished successfully
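
The `unknown suffix -- ignored` messages above appear because `gunzip` was given files that do not end in `.gz`, which here means the raw matrix files were already uncompressed. A small guard that only touches `.gz` files keeps the cell quiet when it is re-run; `gunzip_if_compressed` is a hypothetical helper name.

```shell
# Hypothetical helper: decompress only the .gz files in a directory.
# $1: directory to scan. Files without a .gz suffix are left untouched.
gunzip_if_compressed () {
    for f in "$1"/*.gz; do
        # If nothing matches, the glob stays literal; skip it.
        [ -e "$f" ] || continue
        gunzip "$f"
    done
}
```

In this notebook it would be called as `gunzip_if_compressed demultiplexed_indrop_index/merged.Solo.out/Gene/raw` before the `soloCellFiltering` run.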


In [6]:
wc -l demultiplexed_indrop_index/merged.Solo.out/Gene/filtered_31293/barcodes.tsv

20845 demultiplexed_indrop_index/merged.Solo.out/Gene/filtered_31293/barcodes.tsv
