# Motif Objects in Biopython

## Context:
- A (sequence) motif is a short, recurring nucleotide or amino-acid sequence pattern. It differs from a structural motif, which is formed by the 3D arrangement of amino acids that may not be adjacent in the sequence.
- Biopython provides a dedicated module, **Bio.motifs**, for working with sequence motifs.
- The *Consensus Sequence* is the sequence of letters, one per motif position, that have the largest value in the corresponding column of the counts matrix. Its opposite, the *Anticonsensus Sequence*, takes the letter with the smallest value in each column.
- Note: The definitions of the consensus and anticonsensus sequences are somewhat ambiguous, because in some columns several nucleotides can share the maximum or minimum count. One solution is a *Degenerate Consensus Sequence*, in which ambiguous nucleotide codes (e.g., W = A/T, V = A/C/G) are used at positions where several nucleotides have high counts.

## Creating a DNA motif

In [1]:
from Bio import motifs

In [2]:
from Bio.Seq import Seq

instances = [
    Seq("AATTC"),
    Seq("ATTCC"),
    Seq("AAGTC"),
    Seq("ACTGC"),
    Seq("CATTG"),
    Seq("TAGTC"),
    Seq("GATAC")
]

In [7]:
m = motifs.create(instances)
print(m)

AATTC
ATTCC
AAGTC
ACTGC
CATTG
TAGTC
GATAC



In [8]:
m.instances # motif instances (newer Biopython releases expose these as m.alignment.sequences)

[Seq('AATTC'),
 Seq('ATTCC'),
 Seq('AAGTC'),
 Seq('ACTGC'),
 Seq('CATTG'),
 Seq('TAGTC'),
 Seq('GATAC')]

In [9]:
m.counts # number of occurrences of each nucleotide at each position

{'A': [4, 5, 0, 1, 0],
 'C': [1, 1, 0, 1, 6],
 'G': [1, 0, 2, 1, 1],
 'T': [1, 1, 5, 4, 0]}
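To make the counts matrix less of a black box, here is a minimal pure-Python sketch that rebuilds the same table from the seven instances. The helper name `count_matrix` is hypothetical and is not part of Bio.motifs; it only illustrates the idea of counting letters column by column.

```python
from collections import Counter

# The same seven instances used above, as plain strings.
instances = ["AATTC", "ATTCC", "AAGTC", "ACTGC", "CATTG", "TAGTC", "GATAC"]


def count_matrix(seqs, alphabet="ACGT"):
    """Return {letter: [count at position 0, count at position 1, ...]}.

    Hypothetical helper for illustration; not a Biopython function.
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all instances must have the same length")
    # zip(*seqs) yields one tuple of letters per motif position (column).
    columns = [Counter(column) for column in zip(*seqs)]
    return {letter: [col[letter] for col in columns] for letter in alphabet}


counts = count_matrix(instances)
for letter, row in counts.items():
    print(letter, row)
```

Each row lists how often one nucleotide occurs at positions 0 to 4, so every column sums to the number of instances (7 here).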

In [10]:
m.counts['A'] # occurrences of one nucleotide across all positions

[4, 5, 0, 1, 0]

In [11]:
m.counts[:, 3] # occurrences of each nucleotide at one position

{'A': 1, 'C': 1, 'G': 1, 'T': 4}

In [12]:
m.consensus # the consensus sequence of the motif

Seq('AATTC')

In [13]:
m.anticonsensus # the anticonsensus sequence of the motif

Seq('CGAAA')
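The consensus and anticonsensus can be reproduced by hand from the counts matrix by taking the letter with the largest (or smallest) count in each column. The sketch below is a plain-Python illustration, not Biopython's own code; the helper `pick_per_column` is hypothetical, and it assumes ties are broken in favour of the alphabetically first letter, which is consistent with the outputs shown above.

```python
# Counts matrix copied from the m.counts output above.
counts = {
    "A": [4, 5, 0, 1, 0],
    "C": [1, 1, 0, 1, 6],
    "G": [1, 0, 2, 1, 1],
    "T": [1, 1, 5, 4, 0],
}


def pick_per_column(counts, chooser):
    """Apply max or min to every column and join the chosen letters.

    Hypothetical helper. Python's max/min return the first of several
    equal candidates, so ties go to the alphabetically first letter.
    """
    letters = sorted(counts)
    length = len(counts[letters[0]])
    return "".join(
        chooser(letters, key=lambda letter: counts[letter][i])
        for i in range(length)
    )


consensus = pick_per_column(counts, max)
anticonsensus = pick_per_column(counts, min)
print(consensus)      # AATTC
print(anticonsensus)  # CGAAA
```

Several columns contain ties for the minimum (for example, C, G and T each occur once at position 0), which is why the anticonsensus depends on the tie-breaking rule.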

In [14]:
m.degenerate_consensus # the degenerate consensus sequence of the motif

Seq('AATTC')
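In this motif every position has one clearly dominant nucleotide, so the degenerate consensus equals the plain consensus. The following sketch shows how ambiguity codes could arise for other columns. The thresholds are an assumption modeled on the rules described in the Biopython documentation (a single letter if it exceeds 50% and twice the runner-up, two letters if together they exceed 75%, three letters if the fourth is absent, otherwise N); `degenerate_letter` is a hypothetical helper, not the library's implementation.

```python
# IUPAC ambiguity codes keyed by the sorted set of nucleotides they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AC": "M", "AG": "R", "AT": "W", "CG": "S", "CT": "Y", "GT": "K",
    "ACG": "V", "ACT": "H", "AGT": "D", "CGT": "B", "ACGT": "N",
}


def degenerate_letter(column):
    """Map one column of counts over A, C, G, T to an IUPAC code.

    Hypothetical helper; the thresholds are assumed, see the text above.
    """
    # Rank letters by descending count; ties keep alphabetical order.
    ranked = sorted(sorted(column), key=lambda n: column[n], reverse=True)
    c = [column[n] for n in ranked]
    total = sum(c)
    if c[0] > total - c[0] and c[0] > 2 * c[1]:
        chosen = ranked[:1]   # one clearly dominant nucleotide
    elif 4 * (c[0] + c[1]) > 3 * total:
        chosen = ranked[:2]   # top two cover more than 75%
    elif c[3] == 0:
        chosen = ranked[:3]   # one nucleotide never occurs
    else:
        chosen = ranked       # no preference: any nucleotide
    return IUPAC["".join(sorted(chosen))]


# Counts matrix copied from the m.counts output above.
counts = {
    "A": [4, 5, 0, 1, 0],
    "C": [1, 1, 0, 1, 6],
    "G": [1, 0, 2, 1, 1],
    "T": [1, 1, 5, 4, 0],
}
degenerate = "".join(
    degenerate_letter({n: counts[n][i] for n in counts}) for i in range(5)
)
print(degenerate)                                             # AATTC
print(degenerate_letter({"A": 3, "C": 0, "G": 0, "T": 3}))    # W
print(degenerate_letter({"A": 2, "C": 2, "G": 2, "T": 0}))    # V
```

The last two calls use made-up columns to show the W (A/T) and V (A/C/G) codes mentioned in the Context section.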

## Summary:
- In this tutorial, we created a DNA motif from multiple instances and obtained its consensus, anticonsensus, and degenerate consensus sequences.

# Finish!