In [None]:
import sys
sys.path.append('../')

import cigarmath as cm

# Blocks

In this context, a **block** refers to a contiguous region of a sequence, usually denoted as a `(start, end)` tuple.

## Alignment Blocks

Often, when provided with a set of cigartuples, we want to know the start and end positions on both the reference and the query sequence.
`cigarmath` provides a number of tools for this purpose.

Consider the following alignment example:

In [None]:
cigartuples = cm.cigarstr2tup('3H4M1D3M2I3M4H')
reference_start = 3

First, we may want to know where on the reference this read aligns to.
Our aligner will provide us with a **reference_start** but not a reference end.

In [None]:
# Length of the block along the reference
cm.reference_offset(cigartuples)

11

In [None]:
# Or as a block
cm.reference_block(cigartuples, reference_start=reference_start)

(3, 14)
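To make the arithmetic behind this result explicit, here is a minimal pure-Python sketch. It is not `cigarmath`'s implementation: `sketch_reference_block` is a hypothetical helper that parses the CIGAR string directly and assumes the standard SAM rule that only `M`, `D`, `N`, `=` and `X` consume reference bases.

```python
import re

# SAM operations that consume reference bases.
REF_CONSUMING = set("MDN=X")


def sketch_reference_block(cigar, reference_start=0):
    """Hypothetical helper: (start, end) of the alignment on the reference."""
    # Sum the lengths of every operation that advances along the reference.
    length = sum(
        int(n)
        for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
        if op in REF_CONSUMING
    )
    return (reference_start, reference_start + length)


# 4M + 1D + 3M + 3M = 11 reference bases, starting at position 3.
print(sketch_reference_block("3H4M1D3M2I3M4H", reference_start=3))  # (3, 14)
```

The clipping (`H`) and insertion (`I`) operations contribute nothing here, which is why the block spans 11 reference positions even though the CIGAR describes more query bases.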

The same can be done on the query side of the alignment.
The aligner provides none of these values, so all of them are derived from the cigartuples.

In [None]:
# This is just the left_clipping but included for consistency
cm.query_start(cigartuples)

3

In [None]:
# The length of the block along the query
cm.query_offset(cigartuples)

12

In [None]:
# The start/end location of the block along the query
cm.query_block(cigartuples)

(3, 15)
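The query-side values can be reproduced by hand in the same way. The sketch below is an illustration, not the library's code: `sketch_query_block` is a hypothetical helper, and it assumes (consistent with the outputs above) that leading clipping, hard or soft, sets the start, and that `M`, `I`, `=` and `X` make up the aligned length.

```python
import re

ALIGNED_QUERY_OPS = set("MI=X")  # operations that place query bases in the alignment
CLIP_OPS = set("SH")             # soft and hard clipping


def sketch_query_block(cigar):
    """Hypothetical helper: (start, end) of the aligned region on the query."""
    ops = [(op, int(n)) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]

    # The start is the total clipping before the first aligned operation.
    start = 0
    for op, n in ops:
        if op not in CLIP_OPS:
            break
        start += n

    # The length counts only bases that take part in the alignment.
    length = sum(n for op, n in ops if op in ALIGNED_QUERY_OPS)
    return (start, start + length)


# 3H of leading clipping, then 4M + 3M + 2I + 3M = 12 aligned query bases.
print(sketch_query_block("3H4M1D3M2I3M4H"))  # (3, 15)
```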

Often there is a need to calculate the overlap (or lack of overlap) between different blocks.
`block_overlap_length` takes two blocks and calculates the amount of overlap.

In [None]:
cm.block_overlap_length((5, 10), (7, 13))

3

In [None]:
# They can also be provided in any order
cm.block_overlap_length((7, 13), (5, 10))

3

In [None]:
# Blocks that do not overlap will give negative values
# indicating the nearest distance
cm.block_overlap_length((5, 7), (9, 13))

-2
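All three results above are consistent with a simple formula: the smaller of the two ends minus the larger of the two starts. The following sketch shows that formula with a hypothetical helper, `sketch_overlap_length`; it is an assumption about the behaviour, not a copy of how `block_overlap_length` is written.

```python
def sketch_overlap_length(block_a, block_b):
    """Hypothetical helper: overlap of two (start, end) blocks.

    Positive values are the overlap length; negative values are the
    size of the gap between blocks that do not overlap.
    """
    return min(block_a[1], block_b[1]) - max(block_a[0], block_b[0])


print(sketch_overlap_length((5, 10), (7, 13)))  # 3
print(sketch_overlap_length((7, 13), (5, 10)))  # 3
print(sketch_overlap_length((5, 7), (9, 13)))   # -2
```

Because `min` and `max` are symmetric in their arguments, the order of the two blocks does not matter.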

## Region Blocks

Given an alignment, sometimes we want to know the positions along the reference of large mapping or deleted regions.
Consider the example below:

In [None]:
cigartuples = cm.cigarstr2tup('7M3D4M6D4M')

If we want to know the regions that _map_ to the genome, we can use `reference_mapping_blocks`.

In [None]:
# Split at deletions of any size
list(cm.reference_mapping_blocks(cigartuples, deletion_split=1))

[(0, 7), (10, 14), (20, 24)]

In [None]:
# Split only at larger deletions
list(cm.reference_mapping_blocks(cigartuples, deletion_split=5))

[(0, 14), (20, 24)]
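The splitting behaviour can be sketched as a single pass over the CIGAR operations. `sketch_mapping_blocks` below is a hypothetical helper, not the library's implementation. It assumes that a deletion splits a block when its length is at least `deletion_split`, which matches both outputs above but is not confirmed against the library's exact threshold, and it treats `N` like `D` as a choice of this sketch.

```python
import re


def sketch_mapping_blocks(cigar, deletion_split, reference_start=0):
    """Hypothetical helper: yield (start, end) reference blocks that map,
    split at deletions of at least `deletion_split` bases (assumed rule)."""
    pos = reference_start
    block_start = None  # start of the block currently being extended

    for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        n = int(n)
        if op in "M=X":
            if block_start is None:
                block_start = pos
            pos += n
        elif op in "DN":
            # A large enough deletion closes the current block.
            if n >= deletion_split and block_start is not None:
                yield (block_start, pos)
                block_start = None
            pos += n
        # I, S, H and P do not consume reference bases, so pos is unchanged.

    if block_start is not None:
        yield (block_start, pos)


print(list(sketch_mapping_blocks("7M3D4M6D4M", deletion_split=1)))
# [(0, 7), (10, 14), (20, 24)]
print(list(sketch_mapping_blocks("7M3D4M6D4M", deletion_split=5)))
# [(0, 14), (20, 24)]
```

With `deletion_split=5` the 3-base deletion is absorbed into the first block, so that block runs from 0 to 14 even though positions 7 to 10 are deleted in the read.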

The converse is also available.

In [None]:
# Consider any sized deletion
list(cm.reference_deletion_blocks(cigartuples, min_size=1))

[(7, 10), (14, 20)]

In [None]:
# Consider only large deletions
list(cm.reference_deletion_blocks(cigartuples, min_size=5))

[(14, 20)]
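The deletion side follows the same walk along the reference, reporting the deleted spans instead of the mapped ones. As before, `sketch_deletion_blocks` is a hypothetical helper written for illustration; it assumes a deletion is reported when its length is at least `min_size`, which agrees with the two outputs above.

```python
import re

REF_CONSUMING = set("MDN=X")  # operations that advance along the reference


def sketch_deletion_blocks(cigar, min_size=1, reference_start=0):
    """Hypothetical helper: yield (start, end) reference blocks covered by
    deletions of at least `min_size` bases (assumed rule)."""
    pos = reference_start
    for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        n = int(n)
        if op == "D" and n >= min_size:
            yield (pos, pos + n)
        if op in REF_CONSUMING:
            pos += n


print(list(sketch_deletion_blocks("7M3D4M6D4M", min_size=1)))  # [(7, 10), (14, 20)]
print(list(sketch_deletion_blocks("7M3D4M6D4M", min_size=5)))  # [(14, 20)]
```

Taken together with the mapping blocks for a threshold of 1, these blocks tile the aligned reference span: `(0, 7)`, `(7, 10)`, `(10, 14)`, `(14, 20)`, `(20, 24)`.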