HyDrop produces a steady stream of cell/bead emulsion. The user can choose the volume of emulsion used for droplet PCR and downstream processing. A larger emulsion volume means fewer downstream steps, and thus less work. However, if the number of cells is too high relative to the barcode complexity (the total number of barcodes available to index cells), the probability of barcode collisions (two different cells receiving the same barcode by chance) increases according to a Poisson distribution.

For each replicate, we created an emulsion containing the equivalent of 3000 recovered cells. We split each emulsion into two aliquots and indexed them separately to avoid these barcode collisions. We do this only with the 384x384 version of HyDrop, as its barcode complexity is only ~140k, compared to ~880k for the 96x96x96 version.
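
As a rough illustration of why splitting helps, the expected fraction of cells that share their barcode with at least one other cell can be approximated as 1 - exp(-(N-1)/B) for N cells and B barcodes. This is a back-of-the-envelope sketch only: it assumes barcodes are drawn uniformly at random, takes the complexities to be exactly 384x384 and 96x96x96, and ignores that the number of encapsulated nuclei may differ from the number of recovered cells. The function name `collision_fraction` is hypothetical.

```bash
# Approximate fraction of cells sharing a barcode with at least one other cell,
# assuming N cells draw uniformly at random from B barcodes (an idealisation).
collision_fraction() {
    # $1 = number of cells (N), $2 = barcode complexity (B)
    awk -v n="$1" -v b="$2" 'BEGIN { printf "%.4f\n", 1 - exp(-(n - 1) / b) }'
}

collision_fraction 3000 $((384 * 384))      # one 3000-cell emulsion, 384x384
collision_fraction 1500 $((384 * 384))      # one aliquot after splitting in two
collision_fraction 3000 $((96 * 96 * 96))   # one 3000-cell emulsion, 96x96x96
```

Under these assumptions, halving the number of cells per indexing reaction roughly halves the expected collision fraction (about 2% versus about 1% for the 384x384 design).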

These four aliquots correspond to the following fragments files:

```
fragments_mm/HYA__24010b__20210813_384_PBMC_11_S9.sinto.mm.fragments.tsv
fragments_mm/HYA__2beafa__20210813_384_PBMC_12_S10.sinto.mm.fragments.tsv
fragments_mm/HYA__3d6da9__20210813_384_PBMC_21_S11.sinto.mm.fragments.tsv
fragments_mm/HYA__5028cb__20210813_384_PBMC_22_S12.sinto.mm.fragments.tsv
```
Each of these aliquot fragments files bears a two-digit number denoting replicate and aliquot, e.g. `HYA__3d6da9__20210813_384_PBMC_21_S11` corresponds to aliquot 1 from replicate 2. The two fragments files originating from the same emulsion can be merged into a single fragments file. To do this, however, we must first append the replicate-aliquot number to each aliquot's barcodes. Otherwise, we would not be able to distinguish different cells that by chance were indexed with the same barcode in different aliquots. This is done below.

# Merge fragments per-sample and append samplename

We continue with the `bam_postbap/` and `fragments_mm/` directories generated in notebook 1.

In [None]:
gunzip fragments_mm/*.tsv.gz

In [2]:
ls fragments_mm/*.tsv

fragments_mm/HYA__24010b__20210813_384_PBMC_11_S9.sinto.mm.fragments.tsv
fragments_mm/HYA__2beafa__20210813_384_PBMC_12_S10.sinto.mm.fragments.tsv
fragments_mm/HYA__3d6da9__20210813_384_PBMC_21_S11.sinto.mm.fragments.tsv
fragments_mm/HYA__5028cb__20210813_384_PBMC_22_S12.sinto.mm.fragments.tsv


In [3]:
module load mawk

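Then, for each aliquot, append a unique identifier to every barcode in its fragments file, denoting the replicate and aliquot of origin. The cell below shows this for aliquot 2 of replicate 2 (suffix `-22`); the other three files are processed in the same way with their own suffixes.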

In [10]:
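# Fragments file of replicate 2, aliquot 2
frags=fragments_mm/HYA__5028cb__20210813_384_PBMC_22_S12.sinto.mm.fragments.tsv
newname=${frags%.tsv}.ID.tsv
# Append "-22" to the barcode (column 4); the other columns are printed unchanged
mawk '{ print $1 "\t" $2 "\t" $3 "\t" $4 "-22\t" $5}' "$frags" > "$newname"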
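
The cell above handles a single aliquot. A minimal sketch of applying the same step to all four files in one loop follows, assuming the replicate-aliquot number is always the field directly before the `_S<number>` sample index in the file name, as in the listing above. The helper names `aliquot_id` and `tag_fragments` are hypothetical, and plain `awk` is used where the notebook uses `mawk`; either should work here.

```bash
# Extract the replicate-aliquot number (e.g. "22") from a file name such as
# HYA__5028cb__20210813_384_PBMC_22_S12.sinto.mm.fragments.tsv
aliquot_id() {
    base=${1##*/}            # strip the directory
    base=${base%_S[0-9]*}    # strip "_S12" and everything after it
    printf '%s\n' "${base##*_}"
}

# Append "-<id>" to the barcode column (column 4) of a fragments file
tag_fragments() {
    awk -v suffix="-$(aliquot_id "$1")" \
        'BEGIN { FS = OFS = "\t" } { $4 = $4 suffix; print }' "$1"
}

for frags in fragments_mm/HYA__*.sinto.mm.fragments.tsv; do
    [ -e "$frags" ] || continue          # skip if the glob matched nothing
    tag_fragments "$frags" > "${frags%.tsv}.ID.tsv"
done
```

Deriving the suffix from the file name avoids hand-editing the suffix per file, at the cost of depending on the naming scheme staying consistent.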

In [11]:
ls fragments_bap/fragments_mm/*ID.tsv

fragments_bap/fragments_mm/HYA__24010b__20210813_384_PBMC_11_S9.sinto.mm.fragments.ID.tsv
fragments_bap/fragments_mm/HYA__2beafa__20210813_384_PBMC_12_S10.sinto.mm.fragments.ID.tsv
fragments_bap/fragments_mm/HYA__3d6da9__20210813_384_PBMC_21_S11.sinto.mm.fragments.ID.tsv
fragments_bap/fragments_mm/HYA__5028cb__20210813_384_PBMC_22_S12.sinto.mm.fragments.ID.tsv


Then merge the fragments files originating from the same replicate.

In [11]:
module load BCFtools

In [None]:
newfile=fragments_mm/VIB_Hydrop_1.sinto.mm.fragments.tsv
sort --parallel=8 -S 8G -k1,1 -k 2,2n -k3,3n fragments_mm/HYA__24010b__20210813_384_PBMC_11_S9.sinto.mm.fragments.ID.tsv fragments_mm/HYA__2beafa__20210813_384_PBMC_12_S10.sinto.mm.fragments.ID.tsv > $newfile
bgzip -@ 4 -i $newfile
tabix -p bed $newfile.gz

In [None]:
newfile=fragments_mm/VIB_Hydrop_2.sinto.mm.fragments.tsv
sort --parallel=8 -S 8G -k1,1 -k 2,2n -k3,3n fragments_mm/HYA__3d6da9__20210813_384_PBMC_21_S11.sinto.mm.fragments.ID.tsv fragments_mm/HYA__5028cb__20210813_384_PBMC_22_S12.sinto.mm.fragments.ID.tsv > $newfile
bgzip -@ 4 -i $newfile
tabix -p bed $newfile.gz
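
As a quick sanity check on a merged file, one could count fragments per barcode suffix and confirm that only the two expected aliquots are present. This sketch assumes the barcode sits in column 4 and that the suffix follows the last `-`; the function name `count_by_suffix` is hypothetical.

```bash
# Count fragments per aliquot suffix; reads a fragments file on stdin
count_by_suffix() {
    awk -F '\t' '{ n = split($4, parts, "-"); count[parts[n]]++ }
                 END { for (s in count) print s "\t" count[s] }' | sort
}

merged=fragments_mm/VIB_Hydrop_1.sinto.mm.fragments.tsv.gz
if [ -e "$merged" ]; then
    # bgzip output is gzip-compatible, so gzip can decompress it
    gzip -dc "$merged" | count_by_suffix   # expect only suffixes 11 and 12
fi
```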

# Similarly, merge BAMs

First, we need to append the same replicate-aliquot ID to all barcodes in the BAM files.

In [16]:
bam=bam_postbap/HYA__24010b__20210813_384_PBMC_11_S9.bwa.out.possorted.mm.bam
outbam=${bam%.bam}.ID.bam
/staging/leuven/stg_00002/lcb/ghuls/software/single_cell_toolkit_rust/target/release/append_string_to_bc_tag_try $bam $outbam DB '-11' &

[1] 15971


In [18]:
bam=bam_postbap/HYA__2beafa__20210813_384_PBMC_12_S10.bwa.out.possorted.mm.bam
outbam=${bam%.bam}.ID.bam
/staging/leuven/stg_00002/lcb/ghuls/software/single_cell_toolkit_rust/target/release/append_string_to_bc_tag_try $bam $outbam DB '-12' &

[2] 16061


In [19]:
bam=bam_postbap/HYA__3d6da9__20210813_384_PBMC_21_S11.bwa.out.possorted.mm.bam
outbam=${bam%.bam}.ID.bam
/staging/leuven/stg_00002/lcb/ghuls/software/single_cell_toolkit_rust/target/release/append_string_to_bc_tag_try $bam $outbam DB '-21' &

[3] 16081


In [20]:
bam=bam_postbap/HYA__5028cb__20210813_384_PBMC_22_S12.bwa.out.possorted.mm.bam
outbam=${bam%.bam}.ID.bam
/staging/leuven/stg_00002/lcb/ghuls/software/single_cell_toolkit_rust/target/release/append_string_to_bc_tag_try $bam $outbam DB '-22' &

[4] 16118
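
The four cells above differ only in the input file and the suffix, so they could be collapsed into a single loop. The sketch below assumes the same `append_string_to_bc_tag_try` binary and argument order used above (input BAM, output BAM, tag name, suffix); the `APPEND_TOOL` variable and the helper names are hypothetical. The final `wait` blocks until all background jobs have finished, which matters because the merge step needs complete `.ID.bam` files.

```bash
# Path to the barcode-tagging binary used in the cells above
APPEND_TOOL=${APPEND_TOOL:-/staging/leuven/stg_00002/lcb/ghuls/software/single_cell_toolkit_rust/target/release/append_string_to_bc_tag_try}

# Extract the replicate-aliquot number (e.g. "11") from the BAM file name
bam_aliquot_id() {
    base=${1##*/}            # strip the directory
    base=${base%_S[0-9]*}    # strip "_S9" and everything after it
    printf '%s\n' "${base##*_}"
}

# Append "-<id>" to the DB tag of every read, writing <name>.ID.bam
tag_bam() {
    "$APPEND_TOOL" "$1" "${1%.bam}.ID.bam" DB "-$(bam_aliquot_id "$1")"
}

for bam in bam_postbap/HYA__*.possorted.mm.bam; do
    [ -e "$bam" ] || continue            # skip if the glob matched nothing
    tag_bam "$bam" &                     # one background job per aliquot
done
wait   # do not start merging until every tagging job is done
```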


Then, merge the resulting BAMs per replicate.

In [24]:
module load SAMtools
samtools merge -@ 12 -o bam_postbap/Hydrop_1.bwa.out.possorted.mm.bam bam_postbap/HYA__24010b__20210813_384_PBMC_11_S9.bwa.out.possorted.mm.ID.bam bam_postbap/HYA__2beafa__20210813_384_PBMC_12_S10.bwa.out.possorted.mm.ID.bam -f &
samtools merge -@ 12 -o bam_postbap/Hydrop_2.bwa.out.possorted.mm.bam bam_postbap/HYA__3d6da9__20210813_384_PBMC_21_S11.bwa.out.possorted.mm.ID.bam bam_postbap/HYA__5028cb__20210813_384_PBMC_22_S12.bwa.out.possorted.mm.ID.bam -f &


The following have been reloaded with a version change:
  1) XZ/5.2.4-GCCcore-6.4.0 => XZ/5.2.5-GCCcore-6.4.0

[1] 18219
[2] 18233


In [34]:
samtools index bam_postbap/Hydrop_1.bwa.out.possorted.mm.bam &
samtools index bam_postbap/Hydrop_2.bwa.out.possorted.mm.bam &

[3] 22295
[4] 22296
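
To spot-check a merged BAM, one could tally the suffixes found in the `DB` tag of the first reads. This is only a sketch: it assumes the barcode is stored as a string tag (`DB:Z:`) and that the suffix follows the last `-`; the function name `count_db_suffixes` is hypothetical. Because the merged BAM is position-sorted, the first reads are not a random sample, so the counts indicate presence rather than proportions.

```bash
# Count reads per DB-tag suffix; reads SAM records (no header) on stdin
count_db_suffixes() {
    awk -F '\t' '{
        for (i = 12; i <= NF; i++)       # optional tags start at field 12
            if ($i ~ /^DB:Z:/) { n = split($i, parts, "-"); count[parts[n]]++ }
    }
    END { for (s in count) print s "\t" count[s] }' | sort
}

bam=bam_postbap/Hydrop_1.bwa.out.possorted.mm.bam
if command -v samtools >/dev/null 2>&1 && [ -e "$bam" ]; then
    # Inspect the first 100000 reads; expect only suffixes 11 and 12
    samtools view "$bam" | head -n 100000 | count_db_suffixes
fi
```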


The final result is two BAM files and two fragments files instead of four, each corresponding to one HyDrop replicate run.