# Motif Objects in Biopython

A sequence motif is a nucleotide or amino-acid sequence pattern that recurs across sequences and is usually assumed to have biological significance. It differs from a structural motif, which is formed by the three-dimensional arrangement of amino acids that may not be adjacent in the sequence. Biopython provides the `Bio.motifs` module for creating and analyzing sequence motifs.

In [1]:
from Bio import motifs

## Creating a simple DNA Motif

In [2]:
from Bio.Seq import Seq

In [3]:
instances = [
    Seq("TACAA"),
    Seq("TACGC"),
    Seq("TACAC"),
    Seq("TACCC"),
    Seq("AACCC"),
    Seq("AATGC"),
    Seq("AATGC"),
]

In [4]:
m = motifs.create(instances)
m

<Bio.motifs.Motif at 0x19e26e56a50>

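The instances are stored in the `.instances` attribute. Recent Biopython releases deprecate this attribute in favor of `.alignment`, so the call below may emit a deprecation warning depending on the installed version.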

In [5]:
m.instances



[Seq('TACAA'),
 Seq('TACGC'),
 Seq('TACAC'),
 Seq('TACCC'),
 Seq('AACCC'),
 Seq('AATGC'),
 Seq('AATGC')]

The Motif object has a `.counts` attribute containing the count of each nucleotide at each position. Indexing it by letter returns one row, and slicing it by position returns one column.

In [6]:
m.counts

{'A': [3.0, 7.0, 0.0, 2.0, 1.0],
 'C': [0.0, 0.0, 5.0, 2.0, 6.0],
 'G': [0.0, 0.0, 0.0, 3.0, 0.0],
 'T': [4.0, 0.0, 2.0, 0.0, 0.0]}

In [7]:
m.counts['A']

[3.0, 7.0, 0.0, 2.0, 1.0]

In [8]:
m.counts[:, 3]

{'A': 2.0, 'C': 2.0, 'G': 3.0, 'T': 0.0}
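To make the layout of the counts matrix concrete, here is a minimal pure-Python sketch that rebuilds the same numbers from the seven sequences. It is an illustration only and not how Biopython implements `.counts`; the names `column_tallies` and `column_3` are made up for this example.

```python
from collections import Counter

# Same seven sequences as the motif above, written as plain strings.
instances = ["TACAA", "TACGC", "TACAC", "TACCC", "AACCC", "AATGC", "AATGC"]
alphabet = "ACGT"

# zip(*instances) yields one tuple of letters per motif position (column).
column_tallies = [Counter(column) for column in zip(*instances)]

# One row per letter, one entry per position.
# A Counter returns 0 for letters that never occur in a column.
counts = {
    letter: [float(tally[letter]) for tally in column_tallies]
    for letter in alphabet
}

# Plain-dictionary equivalent of the column slice m.counts[:, 3].
column_3 = {letter: counts[letter][3] for letter in alphabet}

print(counts["A"])
print(column_3)
```

Each row therefore has one entry per motif position, and the entries of any single column sum to the number of instances (7 here).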

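The motif has an associated **consensus** sequence (`m.consensus`), built from the letter with the largest count in each column of the `.counts` matrix, as well as an **anticonsensus** sequence, built from the letter with the smallest count in each column.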

In [9]:
m.anticonsensus

Seq('CCATG')
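The column-wise selection can be sketched in plain Python from the counts shown earlier. This is a hedged illustration rather than Biopython's code: the helper `pick` is hypothetical, and breaking ties by alphabetical order is an assumption that happens to reproduce the output above.

```python
# Counts matrix copied from the m.counts output above.
counts = {
    "A": [3.0, 7.0, 0.0, 2.0, 1.0],
    "C": [0.0, 0.0, 5.0, 2.0, 6.0],
    "G": [0.0, 0.0, 0.0, 3.0, 0.0],
    "T": [4.0, 0.0, 2.0, 0.0, 0.0],
}
alphabet = "ACGT"
length = len(counts["A"])


def pick(chooser):
    """Apply max or min to every column and join the chosen letters.

    Python's max() and min() return the first qualifying item, so ties
    are resolved in alphabet order (an assumption of this sketch).
    """
    return "".join(
        chooser(alphabet, key=lambda letter: counts[letter][i])
        for i in range(length)
    )


consensus = pick(max)      # most frequent letter per position
anticonsensus = pick(min)  # least frequent letter per position

print(consensus)
print(anticonsensus)
```

Several columns contain more than one letter with a count of zero, so the anticonsensus depends entirely on the tie-breaking rule chosen.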

Note that the definition of the consensus and anticonsensus sequences is ambiguous when several nucleotides in a column share the maximum or minimum count. You can also ask for a degenerate consensus sequence, which uses ambiguous nucleotide codes at positions where more than one nucleotide has a high count.

In [10]:
m.degenerate_consensus

Seq('WACVC')

Here W and V follow the IUPAC nucleotide ambiguity codes: W is either A or T, and V is A, C, or G.
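One way to see where W and V come from is to apply threshold rules to each column. The sketch below uses rules in the style of Cavener, which the Biopython documentation cites for the degenerate consensus. The exact thresholds and the function `degenerate_consensus` are assumptions of this example, not a copy of the library's code.

```python
# Counts matrix copied from the m.counts output above.
counts = {
    "A": [3.0, 7.0, 0.0, 2.0, 1.0],
    "C": [0.0, 0.0, 5.0, 2.0, 6.0],
    "G": [0.0, 0.0, 0.0, 3.0, 0.0],
    "T": [4.0, 0.0, 2.0, 0.0, 0.0],
}

# IUPAC ambiguity codes, keyed by the sorted set of letters they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AC": "M", "AG": "R", "AT": "W", "CG": "S", "CT": "Y", "GT": "K",
    "ACG": "V", "ACT": "H", "AGT": "D", "CGT": "B", "ACGT": "N",
}


def degenerate_consensus(counts):
    """Return one IUPAC code per column (thresholds are assumed)."""
    length = len(next(iter(counts.values())))
    codes = []
    for i in range(length):
        # Letters ordered from most to least frequent at position i.
        ranked = sorted(counts, key=lambda letter: counts[letter][i], reverse=True)
        values = [counts[letter][i] for letter in ranked]
        total = sum(values)
        if values[0] > total - values[0] and values[0] > 2 * values[1]:
            key = ranked[0]  # one clearly dominant letter
        elif 4 * (values[0] + values[1]) > 3 * total:
            key = "".join(sorted(ranked[:2]))  # top two cover more than 75%
        elif values[3] == 0:
            key = "".join(sorted(ranked[:3]))  # exactly one letter never occurs
        else:
            key = "ACGT"  # no clear preference
        codes.append(IUPAC[key])
    return "".join(codes)


result = degenerate_consensus(counts)
print(result)
```

Under these rules, position 0 becomes W because T (4) and A (3) together account for every instance but neither dominates, and position 3 becomes V because A, C, and G all occur while T never does.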