# Molecular types

The ``MolType`` object resolves ambiguity codes and supplies the correct ambiguity code when recoding. It also maintains the mappings between different kinds of alphabets, sequences and alignments.

If your analysis involves handling ambiguous states, or translation via a genetic code, it's critical to specify the appropriate moltype.

## Available molecular types

In [1]:
from cogent3 import available_moltypes

available_moltypes()

| Abbreviation | Number of states | Moltype |
|---|---|---|
| ab | 2 | `MolType(('a', 'b'))` |
| dna | 4 | `MolType(('T', 'C', 'A', 'G'))` |
| rna | 4 | `MolType(('U', 'C', 'A', 'G'))` |
| protein | 21 | `MolType(('A', 'C', 'D', 'E', 'F', 'G', ...` |
| protein_with_stop | 22 | `MolType(('A', 'C', 'D', 'E', 'F', 'G', ...` |
| text | 52 | `MolType(('a', 'b', 'c', 'd', 'e', 'f', ...` |
| bytes | 256 | `MolType(('\x00', '\x01', '\x02', '\x03'...` |


For functions that take a `moltype` argument, pass the value from the "Abbreviation" column. For example:

```python
from cogent3 import load_aligned_seqs

seqs = load_aligned_seqs("path/to/data.fasta", moltype="dna")
```

## Getting a `MolType`

In [2]:
from cogent3 import get_moltype

dna = get_moltype("dna")
dna

MolType(('T', 'C', 'A', 'G'))

## Using a `MolType` to get ambiguity codes

This uses the `dna` instance created above.

In [3]:
dna.ambiguities

{'?': ('T', 'C', 'A', 'G', '-'),
 '-': ('-',),
 'N': ('A', 'C', 'T', 'G'),
 'R': ('A', 'G'),
 'Y': ('C', 'T'),
 'W': ('A', 'T'),
 'S': ('C', 'G'),
 'K': ('T', 'G'),
 'M': ('C', 'A'),
 'B': ('C', 'T', 'G'),
 'D': ('A', 'T', 'G'),
 'H': ('A', 'C', 'T'),
 'V': ('A', 'C', 'G'),
 'T': ('T',),
 'C': ('C',),
 'A': ('A',),
 'G': ('G',)}
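
To illustrate what "providing the correct ambiguity for recoding" can mean, here is a minimal sketch in plain Python that does not use cogent3. It copies the mapping from the output above and inverts it, so that a set of observed states leads back to the single character representing it. The names `AMBIGUITIES`, `STATES_TO_CODE` and `code_for_states` are hypothetical and are not part of the cogent3 API.

```python
# Mapping copied from the `dna.ambiguities` output shown above.
AMBIGUITIES = {
    "?": ("T", "C", "A", "G", "-"),
    "-": ("-",),
    "N": ("A", "C", "T", "G"),
    "R": ("A", "G"),
    "Y": ("C", "T"),
    "W": ("A", "T"),
    "S": ("C", "G"),
    "K": ("T", "G"),
    "M": ("C", "A"),
    "B": ("C", "T", "G"),
    "D": ("A", "T", "G"),
    "H": ("A", "C", "T"),
    "V": ("A", "C", "G"),
    "T": ("T",),
    "C": ("C",),
    "A": ("A",),
    "G": ("G",),
}

# Invert the mapping: each set of states points back to its code.
# (Hypothetical helper, not a cogent3 object.)
STATES_TO_CODE = {frozenset(states): code for code, states in AMBIGUITIES.items()}


def code_for_states(states):
    """Return the character that represents exactly this set of states."""
    return STATES_TO_CODE[frozenset(states)]


print(code_for_states("AG"))  # R
print(code_for_states(["C", "T", "G"]))  # B
```

The lookup is order-independent because the states are compared as sets, so `"GA"` and `"AG"` both give `R`.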

## `MolType` definition of degenerate codes

In [4]:
dna.degenerates

{'N': ('A', 'C', 'T', 'G'),
 'R': ('A', 'G'),
 'Y': ('C', 'T'),
 'W': ('A', 'T'),
 'S': ('C', 'G'),
 'K': ('T', 'G'),
 'M': ('C', 'A'),
 'B': ('C', 'T', 'G'),
 'D': ('A', 'T', 'G'),
 'H': ('A', 'C', 'T'),
 'V': ('A', 'C', 'G'),
 '?': 'TCAG-'}
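
One way to see what a degenerate code stands for is to expand a sequence into every concrete sequence it could represent. The sketch below is plain Python rather than cogent3 code; the mapping is copied from the output above, and `DEGENERATES` and `expand` are hypothetical names used only for illustration.

```python
from itertools import product

# Mapping copied from the `dna.degenerates` output shown above.
DEGENERATES = {
    "N": ("A", "C", "T", "G"),
    "R": ("A", "G"),
    "Y": ("C", "T"),
    "W": ("A", "T"),
    "S": ("C", "G"),
    "K": ("T", "G"),
    "M": ("C", "A"),
    "B": ("C", "T", "G"),
    "D": ("A", "T", "G"),
    "H": ("A", "C", "T"),
    "V": ("A", "C", "G"),
    "?": "TCAG-",
}


def expand(seq):
    """List every sequence obtainable by resolving each degenerate character."""
    # Characters without an entry in the mapping stand only for themselves.
    choices = [DEGENERATES.get(char, (char,)) for char in seq]
    return ["".join(combo) for combo in product(*choices)]


print(expand("ARY"))  # ['AAC', 'AAT', 'AGC', 'AGT']
```

The number of expansions is the product of the number of states at each position, so this is only practical for short sequences with few degenerate characters.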

## Nucleic acid `MolType` and complementing


In [5]:
dna.complement("AGG")

'TCC'
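
For intuition, complementing the four canonical DNA bases can be sketched in plain Python with a translation table. This is an illustration only, not how cogent3 implements `complement`; the names `COMPLEMENT_TABLE`, `complement` and `reverse_complement` below are hypothetical, and any character outside `ACGT` (including ambiguity codes) is passed through unchanged here.

```python
# Translation table for the canonical Watson-Crick pairs: A<->T and C<->G.
COMPLEMENT_TABLE = str.maketrans("ACGT", "TGCA")


def complement(seq):
    """Complement each canonical base; other characters are left as-is."""
    return seq.translate(COMPLEMENT_TABLE)


def reverse_complement(seq):
    """Complement the sequence, then reverse it to read the opposite strand."""
    return complement(seq)[::-1]


print(complement("AGG"))  # TCC
print(reverse_complement("AGG"))  # CCT
```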

## Making sequences

Use either the top-level `cogent3.make_seq` function or the `make_seq` method on the `MolType` instance.

In [6]:
seq = dna.make_seq("AGGCTT", name="seq1")
seq

DnaSequence(AGGCTT)

## Verify sequences

In [7]:
rna = get_moltype("rna")
rna.is_valid("ACGUACGUACGUACGU")

True
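
Conceptually, a validity check asks whether every character in a sequence belongs to the set of characters the molecular type allows. The plain Python sketch below assumes that set is just the four canonical RNA states; the library's own check may well accept more (for example ambiguity codes or gap characters), so treat `RNA_STATES` and this `is_valid` function as hypothetical illustrations rather than a description of cogent3's behaviour.

```python
# Assumption: only the four canonical RNA states count as valid here.
RNA_STATES = frozenset("UCAG")


def is_valid(seq, allowed=RNA_STATES):
    """Return True if every character of seq is in the allowed set."""
    return set(seq) <= allowed


print(is_valid("ACGUACGUACGUACGU"))  # True
print(is_valid("ACGT"))  # False, because T is not an RNA state
```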

## Making a custom ``MolType``

We demonstrate this by customising DNA so that it allows ``.`` as a gap character.

In [8]:
from cogent3.core import moltype as mt

DNAgapped = mt.MolType(seq_constructor=mt.DnaSequence,
                       motifset=mt.IUPAC_DNA_chars,
                       ambiguities=mt.IUPAC_DNA_ambiguities,
                       complements=mt.IUPAC_DNA_ambiguities_complements,
                       pairs=mt.DnaStandardPairs,
                       gaps='.')
seq = DNAgapped.make_seq('ACG.')
seq

DnaSequence(ACG.)

In [9]:
from cogent3 import DNA
from cogent3.core.sequence import DnaSequence

# Reset the moltype attribute on DnaSequence to the standard DNA moltype
DnaSequence.moltype = DNA