# BAM quality control for 10xNEW samples

In [1]:
%%bash
module load qualimap java


The following have been reloaded with a version change:
  1) R/4.3.0 => R/4.2.2     2) java/17 => java/11



In [12]:
%%bash
## Load module
module load qualimap java

## Define variables
ref_gen_ver='v3.5'
batch='10xNEW-Plate9'
WORKDIR='/nfs/scistore18/bartogrp/apal/snap_hap'
bamfolder=$WORKDIR/bams/${ref_gen_ver}/ema_align/$batch
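## First BAM in the batch folder (change "1p" to "Np" to pick the N-th file)
bamfile=$(ls "$bamfolder"/*bam | sed -n "1p")
## Output folder: BAM name with '.sorted.BXnum.bam' replaced by '_stats'
outfolder=$WORKDIR/bamqc/${ref_gen_ver}/$batch/$(basename "${bamfile/.sorted.BXnum.bam/_stats}")
mkdir -p "$outfolder"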

## Print variables
echo BAM File: $bamfile
echo BAMqc output Folder: $outfolder

## BAM-QC 
time qualimap bamqc -bam $bamfile \
        --paint-chromosome-limits \
        -hm 3 -nr 1000 -nw 400 -nt 24 \
        --collect-overlap-pairs \
        -outdir $outfolder \
        -outformat PDF:HTML \
        --skip-dup-mode 2 \
        --java-mem-size=8G #only set for 60x samples
        # --output-genome-coverage $outfolder/coverage.txt


The following have been reloaded with a version change:
  1) R/4.3.0 => R/4.2.2     2) java/17 => java/11



BAM File: /nfs/scistore18/bartogrp/apal/snap_hap/bams/v3.5/ema_align/10xNEW-Plate9/10xNEW-Plate9-10_Am_Pla_pb0873_v3.5.sorted.BXnum.bam
BAMqc output Folder: /nfs/scistore18/bartogrp/apal/snap_hap/bamqc/v3.5/10xNEW-Plate9/10xNEW-Plate9-10_Am_Pla_pb0873_v3.5_stats
Java memory size is set to 1200M
Launching application...





QualiMap v.2.2.1
Built on 2016-10-03 18:14

Selected tool: bamqc
Available memory (Mb): 33
Max memory (Mb): 1258

Starting bam qc....
Loading sam header...
Loading locator...
Loading reference...
Number of windows: 400, effective number of windows: 407
Chunk of reads size: 1000
Number of threads: 24
Processed 50 out of 407 windows...
Processed 100 out of 407 windows...
Processed 150 out of 407 windows...
Processed 200 out of 407 windows...
Processed 250 out of 407 windows...
Processed 300 out of 407 windows...
Processed 350 out of 407 windows...
Processed 400 out of 407 windows...
Total processed windows:407
Number of reads: 20871614
Number of valid reads: 20417220
Number of correct strand reads:0

Inside of regions...
Num mapped reads: 20417220
Num mapped first of pair: 10234066
Num mapped second of pair: 10183154
Num singletons: 388600
Time taken to analyze reads: 167
Computing descriptors...
numberOfMappedBases: 2588200661
referenceSize: 508957748
numberOfSequencedBases: 2580241405



real	2m55.044s
user	21m22.643s
sys	0m16.944s
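
The summary figures in the log above can be cross-checked by hand. The following is a minimal sketch that uses only numbers printed by this run; the variable names are illustrative, and mean coverage is taken here as mapped bases divided by reference size, which is the usual definition rather than something read from Qualimap's source.

```python
# Values copied from the bamqc log above
n_reads = 20_871_614             # "Number of reads"
n_mapped = 20_417_220            # "Num mapped reads"
mapped_bases = 2_588_200_661     # "numberOfMappedBases"
reference_size = 508_957_748     # "referenceSize"
n_windows_requested = 400        # from the -nw 400 option

mapped_fraction = n_mapped / n_reads
mean_coverage = mapped_bases / reference_size
approx_window_bp = reference_size / n_windows_requested

print(f"Mapped reads:  {mapped_fraction:.2%}")
print(f"Mean coverage: {mean_coverage:.2f}x")
print(f"Window size:   ~{approx_window_bp:,.0f} bp")
```

With these inputs the sample comes out at roughly 5x mean coverage, and each of the requested windows spans about 1.27 Mb of the reference.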


## Running BAM-QC on the cluster

For the actual cluster runs, per-base coverage (__*coverage.txt*__, written by `--output-genome-coverage`) is **NOT** stored.

In [None]:
%%bash
## 10xNEW-Plate9
cd /nfs/scistore18/bartogrp/apal/snap_hap/bamqc/jobs/10xNEW-Plate9
sbatch --array=1-64 /nfs/scistore18/bartogrp/apal/snap_hap/_scripts/sbatch/bam-utils/job-bamqc.sbatch \
    v3.5 \
    10xNEW-Plate9

## 10xNEW-Plate10
cd /nfs/scistore18/bartogrp/apal/snap_hap/bamqc/jobs/10xNEW-Plate10
sbatch --array=1-96 /nfs/scistore18/bartogrp/apal/snap_hap/_scripts/sbatch/bam-utils/job-bamqc.sbatch \
    v3.5 \
    10xNEW-Plate10
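
The contents of `job-bamqc.sbatch` are not shown in this notebook. A plausible reading, offered only as an assumption, is that each array task picks the N-th BAM of the batch in the same way the interactive cell picks the first one with `sed -n "1p"`. The sketch below illustrates that mapping; `bam_for_task` and the demo file names are hypothetical, while `SLURM_ARRAY_TASK_ID` is the environment variable that SLURM sets inside each array task.

```python
import os
import tempfile
from pathlib import Path

def bam_for_task(bamfolder, task_id):
    """Return the task_id-th BAM (1-based) in sorted order."""
    # Same '*bam' pattern as the interactive cell; index files (.bai) do not match
    bams = sorted(Path(bamfolder).glob("*bam"))
    if not 1 <= task_id <= len(bams):
        raise IndexError(f"task {task_id} out of range: {len(bams)} BAM files found")
    return bams[task_id - 1]

# Demo on a throwaway folder with hypothetical file names
with tempfile.TemporaryDirectory() as tmp:
    for name in ("s2.sorted.BXnum.bam", "s1.sorted.BXnum.bam", "s1.sorted.BXnum.bam.bai"):
        Path(tmp, name).touch()
    # Inside a SLURM array task the index comes from the environment; default to 1 elsewhere
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "1"))
    print(bam_for_task(tmp, min(task_id, 2)).name)
```

Under this assumption, `--array=1-64` and `--array=1-96` would correspond to the number of BAM files in each plate folder. Note that Python's `sorted` orders by code point, which may differ from the locale-dependent order of `ls`.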

## Running multi-BAM-QC

### 10xNEW samples ONLY

In [10]:
%%bash
cd /nfs/scistore18/bartogrp/apal/snap_hap/bamqc/multi_bamqc/
## Make config file
paste <(realpath /nfs/scistore18/bartogrp/apal/snap_hap/bamqc/v3.5/10xNEW-Plate*/*_stats | cut -d/ -f10 | cut -d_ -f 1-4) \
        <(realpath /nfs/scistore18/bartogrp/apal/snap_hap/bamqc/v3.5/10xNEW-Plate*/*_stats) > config_files/config_10xNEW.txt
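
The `paste`/`cut` pipeline above writes one line per sample: a sample name followed by the path to that sample's bamqc output folder. The sketch below shows the same transformation on a single path so the field logic is easier to check; `config_line` is an illustrative helper, and it takes the folder name directly instead of relying on field 10 of the absolute path.

```python
from pathlib import PurePosixPath

def config_line(stats_dir):
    """Return 'sample_name<TAB>stats_dir' for one bamqc output folder."""
    stats_dir = PurePosixPath(stats_dir)
    # First four underscore-separated fields of the folder name, like `cut -d_ -f 1-4`
    sample = "_".join(stats_dir.name.split("_")[:4])
    return f"{sample}\t{stats_dir}"

# Output folder printed by the single-sample bamqc cell earlier in this notebook
example = ("/nfs/scistore18/bartogrp/apal/snap_hap/bamqc/v3.5/10xNEW-Plate9/"
           "10xNEW-Plate9-10_Am_Pla_pb0873_v3.5_stats")
print(config_line(example))
```

For this path the sample name becomes `10xNEW-Plate9-10_Am_Pla_pb0873`. Using the folder name keeps the result independent of how deep the project directory sits, whereas `cut -d/ -f10` depends on the exact path depth.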

In [2]:
%%bash
## Load modules
module load qualimap java

## Set variables 
WORKDIR='/nfs/scistore18/bartogrp/apal/snap_hap'
batch='10xNEW'

## Working directory
cd ~/snap_hap/bamqc/multi_bamqc/

time qualimap multi-bamqc --data $WORKDIR/bamqc/multi_bamqc/config_files/config_10xNEW.txt \
        -outdir $WORKDIR/bamqc/multi_bamqc/$batch \
        -outformat PDF:HTML \
        --java-mem-size=64G


The following have been reloaded with a version change:
  1) R/4.3.0 => R/4.2.2     2) java/17 => java/11



Java memory size is set to 64G
Launching application...





QualiMap v.2.2.1
Built on 2016-10-03 18:14

Selected tool: multi-bamqc

Running multi-sample BAM QC

Checking input paths
Loading sample data
Creating charts

Preparing result report
Writing PDF:HTML report...
HTML report created successfully

PDF file created successfully 

Finished



real	0m16.863s
user	0m24.835s
sys	0m1.196s


In [3]:
%%bash
## Run multi-BAM-QC on cluster

# cd /nfs/scistore18/bartogrp/apal/snap_hap/bamqc/jobs/multi_bamqc
# sbatch --array=1 /nfs/scistore18/bartogrp/apal/snap_hap/_scripts/sbatch/bam-utils/job-bamqc.sbatch \
#     v3.5 \
#     10xNEW

In [18]:
import numpy as np
import matplotlib.pyplot as plt
import os

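# Print the windowed coverage table that qualimap bamqc wrote for one sample
cov_path = "/nfs/scistore18/bartogrp/apal/snap_hap/bamqc/v3.5/10xNEW-Plate9/10xNEW-Plate9-1_Am_Pla_pb0810_v3.5_stats/raw_data_qualimapReport/coverage_across_reference.txt"
with open(cov_path, "r") as cov:  # the context manager closes the file afterwards
    print(cov.read())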

#Position (bp)	Coverage
636198.0	6.255404178733805
1908593.0	3.9944938482153733
3180988.0	6.554929090416105
4453383.0	3.6413487949889776
5725778.0	4.178345560930372
6998173.0	4.344840242220379
8270568.0	4.173998640359322
9542963.0	4.411609602364046
1.0815358E7	4.205695558376133
1.2087753E7	4.737125656733954
1.3360148E7	4.053925864216693
1.4632543E7	3.930739275146476
1.5904938E7	4.069487855579439
1.7177333E7	4.309986285705304
1.8449728E7	4.165773993138923
1.9722123E7	4.117674935849323
2.0994518E7	3.9126254032749266
2.2266913E7	4.3000962751346865
2.3539308E7	4.5856019553676335
2.4811703E7	4.172883420635888
2.6084098E7	4.672529363916079
2.7356493E7	4.00145395101364
2.8628888E7	5.061606655166045
2.9901283E7	5.115373763650439
3.1173678E7	4.608748855504777
3.2446073E7	4.25073110158402
3.3718468E7	4.467834281021224
3.4990863E7	4.020124253867706
3.6263258E7	5.259439875195989
3.7535653E7	4.192444956165342
3.8808048E7	3.5324321456780323
4.0080443E7	3.7930650466246725
4.1352838E7	4.17522624656651
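
The table printed above has two whitespace-separated columns, a window position in bp and the coverage in that window, with positions partly in scientific notation. A minimal sketch of parsing it into numbers follows; `read_coverage` is an illustrative helper, and the inline sample repeats only the first three rows of the output above so the block runs without access to the cluster path.

```python
import io

def read_coverage(handle):
    """Parse the two-column coverage table into (position, coverage) tuples."""
    rows = []
    for line in handle:
        line = line.strip()
        # Skip blank lines and the '#Position (bp)  Coverage' header
        if not line or line.startswith("#"):
            continue
        pos, cov = line.split()
        # float() also accepts values such as '1.0815358E7'
        rows.append((float(pos), float(cov)))
    return rows

# First rows of the output above, for illustration only
sample = io.StringIO(
    "#Position (bp)\tCoverage\n"
    "636198.0\t6.255404178733805\n"
    "1908593.0\t3.9944938482153733\n"
    "3180988.0\t6.554929090416105\n"
)
rows = read_coverage(sample)
window_bp = rows[1][0] - rows[0][0]
mean_cov = sum(cov for _, cov in rows) / len(rows)
print(f"{len(rows)} windows, spacing {window_bp:,.0f} bp, mean coverage {mean_cov:.2f}x")
```

To use it on the real file, pass an open file handle instead of the `io.StringIO` object, for example `with open(path) as fh: rows = read_coverage(fh)`.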