In [None]:
%mkdir samtools_out

#### 1. convert `sam` into `bam`

In [9]:
%%bash
samtools view \
-b -S -o \
./samtools_out/lsdv_51.bam \
./bowtie_out/lsdv_51.sam 

#### 2. sorting bam file by genome position 

In [10]:
%%bash
samtools sort ./samtools_out/lsdv_51.bam > ./samtools_out/lsdv_51_sorted.bam

[bam_sort_core] merging from 4 files...


#### 3. removing PCR duplicates. `rmdup` is an old command; see the NOTE at the end.

In [7]:
%%bash
samtools rmdup  -s ./samtools_out/lsdv_51_sorted.bam ./samtools_out/lsdv_51_sorted_rmdup.bam

[bam_rmdupse_core] 26074 / 126444 = 0.2062 in library '	'
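
The log line above appears to report duplicates removed over reads examined. A minimal sketch for pulling those counts out of such a line follows; the helper name `parse_rmdup_line` is hypothetical, and the regex assumes the `removed / total = fraction` layout seen in this output.

```python
import re

# Matches the "removed / total = fraction" part of an rmdup log line.
_RMDUP_RE = re.compile(r"(\d+)\s*/\s*(\d+)\s*=\s*([\d.]+)")


def parse_rmdup_line(line):
    """Return (removed, total, kept) parsed from one rmdup log line."""
    match = _RMDUP_RE.search(line)
    if match is None:
        raise ValueError(f"not an rmdup summary line: {line!r}")
    removed, total = int(match.group(1)), int(match.group(2))
    return removed, total, total - removed


log_line = "[bam_rmdupse_core] 26074 / 126444 = 0.2062 in library '\t'"
removed, total, kept = parse_rmdup_line(log_line)
print(f"removed {removed} of {total} reads ({removed / total:.2%}); {kept} kept")
```

For this run that works out to roughly 20.6% of reads removed, leaving 100370.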


#### The one-liners for steps 4 and 5 are taken from here:
https://sarahpenir.github.io/bioinformatics/awk/calculating-mapping-stats-from-a-bam-file-using-samtools-and-awk/

#### 4. mean depth using bash one-liner:
- `-a` reports depth at all positions, including those with zero coverage

In [8]:
%%bash
samtools depth -a ./samtools_out/lsdv_51_sorted_rmdup.bam | awk '{c++;s+=$3}END{print s/c}'

139.233


#### 5. breadth of coverage (% of positions with depth > 0) using bash one-liner:

In [11]:
%%bash
samtools depth -a ./samtools_out/lsdv_51_sorted_rmdup.bam | awk '{c++; if($3>0) total+=1}END{print (total/c)*100}'

99.9556
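
The two awk one-liners can be combined into a single pass. This sketch uses only the standard library and assumes tab-separated `samtools depth -a` lines (reference, position, depth); the function name `depth_stats` and the `min_depth` parameter are illustrative, not part of samtools.

```python
def depth_stats(lines, min_depth=1):
    """Return (mean_depth, breadth_percent) from `samtools depth -a` lines.

    Each line is expected to be tab-separated: reference, position, depth.
    """
    positions = 0
    depth_sum = 0
    covered = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        depth = int(line.split("\t")[2])
        positions += 1
        depth_sum += depth
        if depth >= min_depth:
            covered += 1
    if positions == 0:
        raise ValueError("no depth records found")
    return depth_sum / positions, 100 * covered / positions


# Tiny made-up example: four positions with depths 0, 2, 4 and 6.
example = ["ref\t1\t0", "ref\t2\t2", "ref\t3\t4", "ref\t4\t6"]
mean_depth, breadth = depth_stats(example)
print(mean_depth, breadth)  # 3.0 75.0
```

The function accepts any iterable of lines, so an open file handle on saved `samtools depth -a` output works as well. Passing `min_depth=10`, for instance, would give the breadth of coverage at 10x or more.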


#### mean depth using Python :

In [13]:
%%bash
samtools depth -a ./samtools_out/lsdv_51_sorted_rmdup.bam > ./samtools_out/lsdv_51_depth.txt

In [30]:
import pandas as pd

In [31]:
cov_df = pd.read_csv("./samtools_out/lsdv_51_depth.txt", sep="\t", header=None)

In [32]:
cov_df.columns = ["chr", "pos", "cov"]

In [40]:
cov_df["cov"] = pd.to_numeric(cov_df["cov"])
cov_df["pos"] = pd.to_numeric(cov_df["pos"])

In [37]:
total_cov = cov_df["cov"].sum()

In [41]:
ref_seq_len = len(cov_df["cov"])

#### mean coverage depth

In [47]:
mean_cov_depth = int(total_cov / ref_seq_len)
mean_cov_depth

139

In [60]:
# count positions with non-zero depth
covered = 0
for i in cov_df["cov"]:
    if i != 0:
        covered += 1

#### coverage breadth

In [62]:
cov_breadth = (covered / ref_seq_len) * 100
round(cov_breadth, 2)

99.96

#### `flagstat` writes summary statistics based on the alignment flags:

In [14]:
%%bash
samtools flagstat ./samtools_out/lsdv_51_sorted_rmdup.bam > ./samtools_out/lsdv_51_flagstat.txt
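
To use the saved summary programmatically, a small parser is one option. The sketch below assumes the usual `<passed> + <failed> <label>` line layout of flagstat output. The sample text uses invented numbers, not the results of this run, and `parse_flagstat` is a hypothetical helper.

```python
import re

# One flagstat record: "<passed> + <failed> <label>"
_FLAGSTAT_RE = re.compile(r"^(\d+) \+ (\d+) (.+)$")


def parse_flagstat(text):
    """Map each flagstat label to its (QC-passed, QC-failed) read counts."""
    counts = {}
    for line in text.splitlines():
        match = _FLAGSTAT_RE.match(line.strip())
        if match is None:
            continue  # ignore anything that is not a count line
        # Drop the trailing "(...)" details, e.g. "mapped (95.00% : N/A)"
        label = match.group(3).split(" (")[0]
        counts[label] = (int(match.group(1)), int(match.group(2)))
    return counts


# Invented numbers, only to illustrate the assumed layout
sample = """\
1000 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
950 + 0 mapped (95.00% : N/A)
"""
stats = parse_flagstat(sample)
mapped_pct = 100 * stats["mapped"][0] / stats["in total"][0]
print(f"{mapped_pct:.2f}% mapped")
```

With the real file, the text could be read from `./samtools_out/lsdv_51_flagstat.txt` and passed to the same function.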

#### 6. indexing the sorted file (needed, for example, to view it in IGV)

In [12]:
%%bash
samtools index ./samtools_out/lsdv_51_sorted_rmdup.bam

In [6]:
%%bash
samtools


Program: samtools (Tools for alignments in the SAM format)
Version: 1.3.1 (using htslib 1.3.1)

Usage:   samtools <command> [options]

Commands:
  -- Indexing
     dict           create a sequence dictionary file
     faidx          index/extract FASTA
     index          index alignment

  -- Editing
     calmd          recalculate MD/NM tags and '=' bases
     fixmate        fix mate information
     reheader       replace BAM header
     rmdup          remove PCR duplicates
     targetcut      cut fosmid regions (for fosmid pool only)
     addreplacerg   adds or replaces RG tags

  -- File operations
     collate        shuffle and group alignments by name
     cat            concatenate BAMs
     merge          merge sorted alignments
     mpileup        multi-way pileup
     sort           sort alignment file
     split          splits a file by read group
     quickcheck     quickly check if SAM/BAM/CRAM file appears intact
     fastq          converts a BAM to a FASTQ
     fast

NOTE: 

`rmdup` is obsolete in newer samtools versions; use `markdup` with `-r` instead:

http://www.htslib.org/doc/samtools.html

https://bioinformatics.stackexchange.com/questions/4615/difference-between-samtools-mark-duplicates-and-samtools-remove-duplicates

The `markdup` workflow starts with `fixmate -m`, but in the version used here (1.3.1) the `fixmate` command has no `-m` flag:

In [2]:
%%bash
samtools fixmate -m ./samtools_out/lsdv_51_sorted.bam ./samtools_out/lsdv_51_sorted_fixmate.bam

fixmate: invalid option -- 'm'
Usage: samtools fixmate <in.nameSrt.bam> <out.nameSrt.bam>
Options:
  -r           Remove unmapped reads and secondary alignments
  -p           Disable FR proper pair check
  -c           Add template cigar ct tag
      --input-fmt-option OPT[=VAL]
               Specify a single input file format option in the form
               of OPTION or OPTION=VALUE
  -O, --output-fmt FORMAT[,OPT[=VAL]]...
               Specify output format (SAM, BAM, CRAM)
      --output-fmt-option OPT[=VAL]
               Specify a single output file format option in the form
               of OPTION or OPTION=VALUE
      --reference FILE
               Reference sequence FASTA FILE [null]

As elsewhere in samtools, use '-' as the filename for stdin/stdout. The input
file must be grouped by read name (e.g. sorted by name). Coordinated sorted
input is not accepted.
