# Exercise 7 - Pysam

Take as input a file in `BAM` format containing alignments to chromosome X (the *reference*) and:

- check whether *paired-end* reads are present
- determine the lengths of the introns supported by the alignments in the `BAM` file, without using the `find_introns()` method
- determine the *reference* base with the highest coverage in terms of aligned reads, and produce a SAM file containing the alignments that cover that base

### 1) Import `pysam` and the `AlignmentFile` class

In [2]:
import pysam

In [3]:
from pysam import AlignmentFile

### 2) Read the input `BAM` file

In [4]:
pysam.index('./sample.bam')

''

In [5]:
bam_file = AlignmentFile('./sample.bam', 'rb')

In [6]:
bam_file

<pysam.libcalignmentfile.AlignmentFile at 0x1118280d0>

### 3) Check whether *paired-end* reads are present

In [7]:
alignment_iter = bam_file.fetch()
alignment_list = list(alignment_iter)

any([alignment.is_paired for alignment in alignment_list])

False
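The `is_paired` attribute reflects bit `0x1` of the SAM FLAG field ("template having multiple segments in sequencing"). The sketch below shows the same test on plain integers, without `pysam`; the helper name `flag_is_paired` and the sample FLAG values are illustrative assumptions, not data from `sample.bam`.

```python
# Bit 0x1 of the SAM FLAG field marks a read as paired in sequencing.
FLAG_PAIRED = 0x1


def flag_is_paired(flag):
    """Return True if the paired bit is set in a SAM FLAG value."""
    return bool(flag & FLAG_PAIRED)


# Hypothetical FLAG values: 0 and 16 are typical of single-end reads,
# 99 and 147 are typical of the two mates of a properly paired read.
sample_flags = [0, 16, 99, 147]
paired = [flag_is_paired(flag) for flag in sample_flags]

# Same question as in the cell above: is at least one read paired?
has_paired_reads = any(paired)
print(paired, has_paired_reads)
```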

### 4) Determine the lengths of the introns supported by the alignments in the `BAM` file

a) Collect the set of intron lengths supported by the CIGAR strings that contain an `N` operation (that is, a region of the reference skipped by the read, which corresponds to an intron).

In [8]:
import re

In [9]:
# re.findall collects every N operation, so reads spanning more than one intron are fully counted
{int(length) for alignment in alignment_list if alignment.cigarstring for length in re.findall(r'(\d+)N', alignment.cigarstring)}

{57, 287, 309, 598, 980, 1514, 1999, 4116, 4226}
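To see how the `(\d+)N` pattern behaves on concrete CIGAR strings, here is a minimal sketch that runs on hand-written examples rather than on the alignments of `sample.bam`. The helper name `intron_lengths` and the CIGAR strings are assumptions made for illustration.

```python
import re


def intron_lengths(cigar):
    """Return the length of every N (reference skip) operation in a CIGAR string."""
    return [int(length) for length in re.findall(r'(\d+)N', cigar)]


# Hypothetical CIGAR strings: no intron, one intron, two introns.
example_cigars = ['50M', '20M309N30M', '10M57N25M1514N15M']

lengths = set()
for cigar in example_cigars:
    lengths.update(intron_lengths(cigar))

print(sorted(lengths))
```

A read spliced across two introns carries two `N` operations, which is why the sketch collects all matches instead of stopping at the first one.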

b) Verify that the `find_introns()` method gives the same result.

In [10]:
set([intron[1]-intron[0] for intron in bam_file.find_introns(bam_file.fetch())])

{57, 287, 309, 598, 980, 1514, 1999, 4116, 4226}
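`find_introns()` reports introns as `(start, end)` pairs on the reference together with the number of supporting reads, which is why the cell above takes `intron[1]-intron[0]`. The following sketch derives comparable pairs from a start position and a CIGAR string in plain Python. It is an approximation written for illustration, not pysam's implementation; `introns_from_cigar` and the sample alignments are hypothetical.

```python
import re
from collections import Counter

# CIGAR operations that advance the position on the reference.
REF_CONSUMING = set('MDN=X')


def introns_from_cigar(reference_start, cigar):
    """Return 0-based, half-open (start, end) pairs, one per N operation."""
    introns = []
    position = reference_start
    for length, op in re.findall(r'(\d+)([MIDNSHP=X])', cigar):
        length = int(length)
        if op == 'N':
            introns.append((position, position + length))
        if op in REF_CONSUMING:
            position += length
    return introns


# Hypothetical alignments given as (0-based reference start, CIGAR string).
alignments = [(100, '20M309N30M'), (110, '10M309N40M'), (500, '5S10M2D57N25M')]

# Count how many alignments support each intron.
intron_counts = Counter()
for start, cigar in alignments:
    intron_counts.update(introns_from_cigar(start, cigar))

lengths = {end - start for start, end in intron_counts}
print(intron_counts, lengths)
```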

### 5) Find the reference base with maximum coverage

a) Build the list of *pileup* columns.

In [12]:
pileup_iter = bam_file.pileup()

In [13]:
pileup_columns = list(pileup_iter)

In [14]:
pileup_columns

[<pysam.libcalignedsegment.PileupColumn at 0x111819ac0>,
 <pysam.libcalignedsegment.PileupColumn at 0x111819b30>,
 <pysam.libcalignedsegment.PileupColumn at 0x111819ba0>,
 <pysam.libcalignedsegment.PileupColumn at 0x111819c10>,
 <pysam.libcalignedsegment.PileupColumn at 0x111819c80>,
 <pysam.libcalignedsegment.PileupColumn at 0x111819cf0>,
 <pysam.libcalignedsegment.PileupColumn at 0x111819d60>,
 <pysam.libcalignedsegment.PileupColumn at 0x111819dd0>,
 <pysam.libcalignedsegment.PileupColumn at 0x111819e40>,
 <pysam.libcalignedsegment.PileupColumn at 0x111819eb0>,
 <pysam.libcalignedsegment.PileupColumn at 0x111819f20>,
 <pysam.libcalignedsegment.PileupColumn at 0x111819f90>,
 <pysam.libcalignedsegment.PileupColumn at 0x112ced040>,
 <pysam.libcalignedsegment.PileupColumn at 0x112ced0b0>,
 <pysam.libcalignedsegment.PileupColumn at 0x112ced120>,
 <pysam.libcalignedsegment.PileupColumn at 0x112ced190>,
 <pysam.libcalignedsegment.PileupColumn at 0x112ced200>,
 ...

b) Extract the column of maximum height (that is, the one covered by the largest number of alignments).

In [18]:
max_height = max([pileup_col.nsegments for pileup_col in pileup_columns])

In [21]:
max_pileup_col = [pileup_col for pileup_col in pileup_columns if pileup_col.nsegments == max_height][0]

In [22]:
max_pileup_col.nsegments

1469

In [23]:
max_pileup_col.pos

286723
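The search for the tallest column can be illustrated without a BAM file by counting, for each position, how many intervals contain it. The sketch below is a toy model: the function name `max_coverage_position` and the intervals are assumptions, and it ignores the read filters that `pileup()` applies. Positions are 0-based, as in pysam, and ties go to the leftmost position, matching the `[0]` used above.

```python
from collections import Counter


def max_coverage_position(intervals):
    """Return (position, depth) of the most covered position.

    Intervals are 0-based and half-open: (start, end) covers start..end-1.
    Ties are resolved in favour of the smallest position.
    """
    depth = Counter()
    for start, end in intervals:
        for position in range(start, end):
            depth[position] += 1
    best = min(depth, key=lambda position: (-depth[position], position))
    return best, depth[best]


# Hypothetical aligned intervals on the reference.
intervals = [(0, 10), (5, 15), (8, 12), (20, 25)]
best_position, best_depth = max_coverage_position(intervals)
print(best_position, best_depth)
```

Counting every position explicitly keeps the example short; for long references a sweep over interval endpoints would use less memory.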

### 6) Produce the `SAM` file containing the alignments that cover the base of maximum coverage

Write the alignments to a SAM file, reusing the Header Section of the BAM file.

In [24]:
max_pileup_col.set_min_base_quality(0)

In [25]:
max_pileup_col.pileups

[<pysam.libcalignedsegment.PileupRead at 0x112e5ed60>,
 <pysam.libcalignedsegment.PileupRead at 0x112e5f310>,
 <pysam.libcalignedsegment.PileupRead at 0x112e5f4a0>,
 <pysam.libcalignedsegment.PileupRead at 0x112e5f680>,
 <pysam.libcalignedsegment.PileupRead at 0x112e5f5e0>,
 <pysam.libcalignedsegment.PileupRead at 0x112e5f720>,
 <pysam.libcalignedsegment.PileupRead at 0x112e5f770>,
 <pysam.libcalignedsegment.PileupRead at 0x112e5f6d0>,
 <pysam.libcalignedsegment.PileupRead at 0x112e5f7c0>,
 <pysam.libcalignedsegment.PileupRead at 0x112e5f860>,
 <pysam.libcalignedsegment.PileupRead at 0x112e5f8b0>,
 <pysam.libcalignedsegment.PileupRead at 0x112e5f950>,
 <pysam.libcalignedsegment.PileupRead at 0x112e5f9a0>,
 <pysam.libcalignedsegment.PileupRead at 0x112e5fc70>,
 <pysam.libcalignedsegment.PileupRead at 0x112e5fbd0>,
 <pysam.libcalignedsegment.PileupRead at 0x112e5fcc0>,
 <pysam.libcalignedsegment.PileupRead at 0x112e5fd10>,
 <pysam.libcalignedsegment.PileupRead at 0x112e5fd60>,
 ...

In [27]:
pileup_alignments = [pileup_read.alignment for pileup_read in max_pileup_col.pileups]

In [28]:
output_file = pysam.AlignmentFile('./prova.sam', 'w', template=bam_file)

In [29]:
for alignment in pileup_alignments:
    output_file.write(alignment)
    
output_file.close()
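Which alignments end up in the output depends on whether their span on the reference contains the chosen position. As a rough sketch of that criterion, the code below computes the reference span from the start position and the CIGAR string and keeps the records that contain a target position. The helpers `reference_end` and `covers`, the records, and the target are hypothetical. Skipped regions (`N`) count as part of the span here, so this is only an approximation of what a pileup column reports.

```python
import re


def reference_end(reference_start, cigar):
    """Return the 0-based exclusive end of an alignment on the reference."""
    consumed = sum(
        int(length)
        for length, op in re.findall(r'(\d+)([MIDNSHP=X])', cigar)
        if op in 'MDN=X'  # operations that consume reference bases
    )
    return reference_start + consumed


def covers(reference_start, cigar, position):
    """Return True if the alignment span contains the 0-based position."""
    return reference_start <= position < reference_end(reference_start, cigar)


# Hypothetical records: (read name, 0-based reference start, CIGAR string).
records = [
    ('read1', 100, '50M'),
    ('read2', 140, '10M100N40M'),
    ('read3', 200, '50M'),
]
target = 145

covering = [name for name, start, cigar in records if covers(start, cigar, target)]
print(covering)
```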