# Exercise 9

Given a `BAM` file as input:

- check whether any *paired-end* reads are present
- determine the lengths of the introns supported by the alignments
- find the *reference* base with the highest coverage in terms of aligned reads, and produce a SAM file containing only the alignments that cover that base

## Import `pysam` and the `AlignmentFile` class

In [2]:
import pysam
from pysam import AlignmentFile

## Index and open the `BAM` file

In [3]:
pysam.index('./sample.bam')  # writes sample.bam.bai; fetch() needs this index
bam_file = AlignmentFile('./sample.bam', 'rb')

## Check whether *paired-end* reads are present

In [4]:
alignment_iter = bam_file.fetch()

In [5]:
all_alignments = list(alignment_iter)

In [6]:
[alignment for alignment in all_alignments if alignment.is_paired] != []

False

In [7]:
any(alignment.is_paired for alignment in all_alignments)

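False

Both checks return `False`: none of the alignments returned by `fetch()` (which iterates over mapped reads only) is paired, so the data are single-end.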
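`is_paired` reflects bit `0x1` of the SAM FLAG field ("template having multiple segments in sequencing"). The following is a minimal sketch of the same check performed on plain SAM text lines, without pysam; the helper `has_paired_reads` and the sample records are hypothetical.

```python
# Hypothetical helper: scan SAM text records for FLAG bit 0x1 (read is paired).
PAIRED_FLAG = 0x1


def has_paired_reads(sam_lines):
    """Return True if any alignment record has the 0x1 (paired) FLAG bit set."""
    for line in sam_lines:
        if not line or line.startswith("@"):
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # FLAG is the second mandatory SAM column
        if flag & PAIRED_FLAG:
            return True
    return False


# Hypothetical records: FLAG 0 and 16 are single-end (forward/reverse strand);
# FLAG 99 = 0x1 + 0x2 + 0x20 + 0x40 is the first mate of a properly paired read.
single_end = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t100\t60\t50M\t*\t0\t0\t*\t*",
    "r2\t16\tchr1\t200\t60\t50M\t*\t0\t0\t*\t*",
]
paired = single_end + ["r3\t99\tchr1\t300\t60\t50M\t=\t400\t150\t*\t*"]

print(has_paired_reads(single_end))  # False
print(has_paired_reads(paired))      # True
```

Testing a single bit with `&` is what makes the check independent of the other FLAG bits (strand, mate state, and so on).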

## Determine the lengths of the introns supported by the alignments

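a) Derive the set of supported intron lengths from the *CIGAR strings*. Each `N` operation is a region skipped on the reference (an intron), and a single CIGAR string can contain several of them.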

In [10]:
import re

In [11]:
# re.findall collects every N operation; re.search would stop at the first intron of each read
{int(length)
 for alignment in all_alignments if alignment.cigarstring
 for length in re.findall(r'(\d+)N', alignment.cigarstring)}

{57, 287, 309, 598, 980, 1514, 1999, 4116, 4226}
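`re.search` returns only the first match, so a read spanning two introns (a CIGAR with two `N` operations) contributes only its first intron, while `re.findall` returns all of them. A minimal illustration on hypothetical CIGAR strings:

```python
import re

# Hypothetical CIGAR strings: the second read spans two introns.
cigars = ["30M57N20M", "10M287N15M309N25M", "50M"]

# First N operation only (the behaviour of re.search)
first_only = {int(re.search(r"(\d+)N", c).group(1)) for c in cigars if "N" in c}
# Every N operation (the behaviour of re.findall)
every_intron = {int(n) for c in cigars for n in re.findall(r"(\d+)N", c)}

print(sorted(first_only))    # [57, 287]
print(sorted(every_intron))  # [57, 287, 309]
```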

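b) Verify that the `find_introns()` method yields the same set. It returns a dictionary mapping the `(start, end)` reference coordinates of each intron to the number of reads supporting it, so the length is `end - start`.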

In [14]:
{intron_end - intron_start for intron_start, intron_end in bam_file.find_introns(bam_file.fetch())}

{57, 287, 309, 598, 980, 1514, 1999, 4116, 4226}
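A minimal pure-Python sketch of how `(start, end)` intron coordinates can be derived from CIGAR operations, in the spirit of `find_introns()` (this is not pysam's source). It assumes 0-based alignment starts and CIGARs given as `(operation, length)` tuples with the BAM operation codes used by `AlignedSegment.cigartuples` (0 = M, 2 = D, 3 = N, 4 = S, 7 = `=`, 8 = X). The helper `introns_from_cigar` and the reads are hypothetical.

```python
from collections import Counter

# BAM CIGAR operation codes, as in pysam's AlignedSegment.cigartuples
MATCH, DEL, REF_SKIP, SOFT_CLIP, SEQ_MATCH, SEQ_MISMATCH = 0, 2, 3, 4, 7, 8
ADVANCES_REFERENCE = {MATCH, DEL, SEQ_MATCH, SEQ_MISMATCH}


def introns_from_cigar(reads):
    """Count (start, end) skipped regions; reads are (ref_start, cigartuples) pairs."""
    introns = Counter()
    for ref_start, cigartuples in reads:
        position = ref_start
        for op, length in cigartuples:
            if op in ADVANCES_REFERENCE:
                position += length
            elif op == REF_SKIP:
                # an N operation is an intron: record its half-open interval
                introns[(position, position + length)] += 1
                position += length
            # I, S, H and P do not consume reference bases
    return introns


# Hypothetical reads: the first two share a junction, the third spans two introns.
reads = [
    (100, [(MATCH, 30), (REF_SKIP, 57), (MATCH, 20)]),
    (110, [(MATCH, 20), (REF_SKIP, 57), (MATCH, 25)]),
    (500, [(SOFT_CLIP, 5), (MATCH, 10), (REF_SKIP, 287),
           (MATCH, 15), (REF_SKIP, 309), (MATCH, 25)]),
]
introns = introns_from_cigar(reads)
print(introns[(130, 187)])                              # 2
print(sorted({end - start for start, end in introns}))  # [57, 287, 309]
```

Keeping the counts rather than only the lengths preserves how many reads support each junction, which is useful for filtering weakly supported introns.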

## Find the reference base with maximum coverage

a) Build the list of *pileup* columns.

In [18]:
pileup_iter = bam_file.pileup()

In [19]:
pileup_columns = list(pileup_iter)

In [20]:
pileup_columns

[<pysam.libcalignedsegment.PileupColumn at 0x10e9700b0>,
 <pysam.libcalignedsegment.PileupColumn at 0x10e970120>,
 ...]

b) Extract the column of maximum height (i.e. the one covered by the largest number of alignments).

In [23]:
max_height = max(pileup_col.nsegments for pileup_col in pileup_columns)

In [27]:
# first column (in reference order) reaching the maximum height
max_pileup_col = next(pileup_col for pileup_col in pileup_columns if pileup_col.nsegments == max_height)

In [28]:
max_pileup_col.pos

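286723

The coordinate is 0-based, as everywhere in pysam: in the 1-based convention of SAM files and samtools this is base 286724.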
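The height of a pileup column is the number of reads overlapping that base. A minimal sketch of the same "maximum coverage" search without `pileup()`, using a difference array over hypothetical reads on a single contig (0-based starts, pysam `cigartuples` codes). Assumption: a base counts as covered when it falls inside an `M`, `=`, `X` or `D` operation. htslib's pileup also lists reads that skip the base with an `N` (flagged `is_refskip` in pysam), so `nsegments` can be higher than this count at positions inside introns. The helper `max_coverage_position` is hypothetical.

```python
from itertools import accumulate

# BAM CIGAR operation codes, as in pysam's AlignedSegment.cigartuples
MATCH, DEL, REF_SKIP, SEQ_MATCH, SEQ_MISMATCH = 0, 2, 3, 7, 8
COVERS_BASE = {MATCH, DEL, SEQ_MATCH, SEQ_MISMATCH}
ADVANCES_REFERENCE = COVERS_BASE | {REF_SKIP}


def max_coverage_position(reads, contig_length):
    """Return (position, depth) of the first 0-based base with maximum depth.

    reads: iterable of (ref_start, cigartuples) on a single contig.
    """
    diff = [0] * (contig_length + 1)
    for ref_start, cigartuples in reads:
        position = ref_start
        for op, length in cigartuples:
            if op in COVERS_BASE:
                diff[position] += 1           # this block starts covering here
                diff[position + length] -= 1  # ...and stops after its last base
            if op in ADVANCES_REFERENCE:
                position += length            # I, S, H, P leave position unchanged
    depth = list(accumulate(diff[:contig_length]))
    best = max(range(contig_length), key=depth.__getitem__)  # ties: first wins
    return best, depth[best]


# Hypothetical reads on a 100-base contig
reads = [
    (10, [(MATCH, 20)]),                               # bases 10..29
    (15, [(MATCH, 10), (REF_SKIP, 30), (MATCH, 10)]),  # bases 15..24 and 55..64
    (20, [(MATCH, 5), (DEL, 2), (MATCH, 5)]),          # bases 20..31
]
print(max_coverage_position(reads, 100))  # (20, 3)
```

The difference array makes the cost proportional to the number of CIGAR blocks plus the contig length, instead of touching every covered base of every read.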

## Write the `SAM` file containing the alignments that cover the maximum-coverage base

Write these alignments to a SAM file that uses the same header section as the BAM file.

In [29]:
# A PileupColumn is a proxy for the pileup engine: once the iterator advances
# (list() above consumed it entirely), its .pileups no longer describe this column.
# Re-run the pileup on the maximum-coverage base only and collect its reads there.
contig, position = max_pileup_col.reference_name, max_pileup_col.reference_pos
pileup_alignments = []
for column in bam_file.pileup(contig, position, position + 1, truncate=True, min_base_quality=0):
    # pileup() drops bases with quality < 13 by default; 0 keeps every read,
    # consistent with nsegments, which ignores the base-quality filter
    pileup_alignments = [pileup_read.alignment for pileup_read in column.pileups]

In [33]:
# mode 'w' writes plain-text SAM; template=bam_file reuses the header of the BAM file
output_file = AlignmentFile('./max-coverage-position.sam', 'w', template=bam_file)

In [34]:
for alignment in pileup_alignments:
    output_file.write(alignment)
    
output_file.close()
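With `template=bam_file`, the SAM output starts with the BAM header lines (`@HD`, `@SQ`, ...) followed by the selected records. A minimal text-level sketch of the same selection, assuming an uncompressed SAM input given as a list of lines and a 1-based target position (SAM convention). Unlike the pileup, this sketch keeps only reads with an `M`, `D`, `=` or `X` operation on the target base, so reads skipping it with `N` are excluded. The helpers `covers` and `filter_sam` and the records are hypothetical.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
COVERS_BASE = set("MD=X")
ADVANCES_REFERENCE = set("MDN=X")


def covers(pos1, cigar, target1):
    """True if an alignment starting at 1-based pos1 has M/D/=/X over target1."""
    position = pos1
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op in COVERS_BASE and position <= target1 < position + length:
            return True
        if op in ADVANCES_REFERENCE:
            position += length
    return False


def filter_sam(sam_lines, rname, target1):
    """Keep every header line plus the records covering rname:target1 (1-based)."""
    kept = []
    for line in sam_lines:
        if line.startswith("@"):
            kept.append(line)  # header section is copied unchanged
            continue
        fields = line.split("\t")
        if fields[2] == rname and fields[5] != "*" and covers(int(fields[3]), fields[5], target1):
            kept.append(line)
    return kept


sam = [
    "@HD\tVN:1.6\tSO:coordinate",
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t100\t60\t50M\t*\t0\t0\t*\t*",          # bases 100..149
    "r2\t0\tchr1\t120\t60\t10M100N10M\t*\t0\t0\t*\t*",   # bases 120..129, 230..239
    "r3\t0\tchr1\t140\t60\t20M\t*\t0\t0\t*\t*",          # bases 140..159
]
print([line.split("\t")[0] for line in filter_sam(sam, "chr1", 145)])
# ['@HD', '@SQ', 'r1', 'r3']
```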