# Biomolecules and BuildAMol

> ### In this tutorial we will cover:
> - how we can use the packages in the `bio` extension to make
>   - peptides
>   - lipids
>   - glycans
>   - oligonucleotides

## The Bio Extension

BuildAMol comes with a `bio` extension that contains functions to quickly model small biomolecules. It currently covers peptides, glycans, several types of lipids, and short stretches of DNA or RNA.

Each of these lives in its own package inside the extension, which we import as needed. The sections below walk through them one by one.

## Making a Peptide

We can use the `peptide` function from the `bio.proteins` package to build a model from an amino acid sequence in single-letter code. The implementation is very low-level: it will accept long sequences, but it is not a tool for modeling protein structures. It can build small peptides, but it does not model secondary or tertiary structure!

In [2]:
# import the proteins extension
from buildamol.extensions.bio import proteins

# make a peptide
peptide_seq = "MAARGRRAWLSVLLGLVLGF"
peptide = proteins.peptide(peptide_seq)

# now optimize using rdkit's forcefield
peptide.optimize(algorithm="rdkit")

# show the peptide
peptide.py3dmol().show()

And there we have a peptide molecule that we can work with. Of course, it does not have any notable structure; BuildAMol does not do any kind of folding, after all!
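
Because `peptide` reads a plain single-letter string, a typo in the sequence is easy to miss. The following is a minimal sketch of a pre-check in plain Python. It is not part of BuildAMol: `check_peptide_sequence` and `ONE_TO_THREE` are hypothetical names, and restricting the input to the 20 standard amino acids is our assumption, since this tutorial does not say how `peptide` treats other letters.

```python
from typing import List

# the 20 standard amino acids: single-letter code -> three-letter code
ONE_TO_THREE = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
}


def check_peptide_sequence(seq: str) -> List[str]:
    """Return the three-letter codes for seq; raise ValueError on unknown letters."""
    seq = seq.strip().upper()
    unknown = sorted(set(seq) - set(ONE_TO_THREE))
    if unknown:
        raise ValueError(f"unsupported residue letters: {', '.join(unknown)}")
    return [ONE_TO_THREE[letter] for letter in seq]


# hypothetical usage with the sequence from the cell above
residues = check_peptide_sequence("MAARGRRAWLSVLLGLVLGF")
```

The three-letter codes are only there for inspection; the single-letter string itself is what would be passed on to `proteins.peptide`.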

## Making a Glycan

In glycobiology, the [IUPAC nomenclature](https://iupac.qmul.ac.uk/2carb/) is widely used to represent glycans in textual form. One reason is that glycans tend to produce very long and complex SMILES strings, which also makes them difficult to compute from SMILES. Different flavors of the IUPAC nomenclature exist; BuildAMol supports the _condensed_ version and can read text inputs in that format.

To create a glycan model from an IUPAC string we can import the `glycans` extension like so:

In [8]:
# import the glycans extension
from buildamol.extensions.bio import glycans

# make a glycan
iupac = "Gal(b1-3)[Fuc(a1-4)]Man(b1-4)GalNAc(b1-4)GlcNAc"
glycan = glycans.glycan(iupac)

# now optimize using rdkit's forcefield
glycan.optimize(algorithm="rdkit")

# show the glycan
glycan.py3dmol().show()

And there we have a small glycan model that we can use further...
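
To get a feeling for how a condensed IUPAC string is put together, it can help to split it into its parts. The sketch below is a plain-Python illustration and not BuildAMol's own parser: the name `tokenize_condensed` is hypothetical, and it only handles the simple `Residue(a1-4)` pattern with square-bracket branches that appears in the example above. Each token records the residue name, its linkage to the next residue, and how deeply it sits inside branch brackets.

```python
import re
from typing import List, Optional, Tuple

# a residue name, optionally followed by a linkage such as (b1-4)
RESIDUE = re.compile(r"([A-Za-z0-9]+)(?:\(([ab])(\d+)-(\d+)\))?")

Token = Tuple[str, Optional[str], int]


def tokenize_condensed(iupac: str) -> List[Token]:
    """Split a condensed IUPAC string into (residue, linkage, branch depth) tokens."""
    tokens: List[Token] = []
    depth = 0
    pos = 0
    while pos < len(iupac):
        char = iupac[pos]
        if char == "[":
            depth += 1
            pos += 1
        elif char == "]":
            depth -= 1
            if depth < 0:
                raise ValueError("closing bracket without an opening bracket")
            pos += 1
        else:
            match = RESIDUE.match(iupac, pos)
            if match is None:
                raise ValueError(f"unexpected character {char!r} at position {pos}")
            name, anomer, from_pos, to_pos = match.groups()
            linkage = f"{anomer}{from_pos}-{to_pos}" if anomer else None
            tokens.append((name, linkage, depth))
            pos = match.end()
    if depth != 0:
        raise ValueError("unbalanced branch brackets")
    return tokens


tokens = tokenize_condensed("Gal(b1-3)[Fuc(a1-4)]Man(b1-4)GalNAc(b1-4)GlcNAc")
```

For the example string this gives five residues, with `Fuc` at branch depth 1 and the final `GlcNAc` carrying no linkage because nothing follows it in the string.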

## Making Lipids

Lipids are a little more diverse than peptides or glycans. For one, the length and saturation of fatty acids can vary widely. Furthermore, different types of lipids have different backbones or head groups. In the `lipids` extension BuildAMol offers functions to create:
- fatty acids
- mono-, di-, and triacylglycerols (all in the `triacylglycerol` function)
- phospholipids
- sphingolipids

Let's explore a bit how we can work with these functions:

In [9]:
# import the lipids extension
from buildamol.extensions.bio import lipids

### Making Fatty Acids

Using the `fatty_acid` function we can control the length as well as the saturation of the fatty acids we produce. Saturation can be controlled in two ways: either by specifying exactly where we want double bonds and whether each should be in cis configuration, or by simply providing the number of double bonds together with the probability of a bond being in cis configuration. This way we can quickly generate both specific fatty acids and populations of random ones. Here's how:

In [11]:
# make a fatty acid with 18 carbons and 2 double bonds, with a 50% chance for cis configuration
fa1 = lipids.fatty_acid(18, 2, cis=0.5)
fa1.py3dmol().show()

And there we have one fatty acid! Let's make some more:

In [12]:
# make a fatty acid with 16 carbons and one double bond at the 9th position in trans configuration
# to specify individual double bonds, we use a tuple of positions rather than an integer
fa2 = lipids.fatty_acid(16, (9,), cis=False)
fa2.py3dmol().show()

In [13]:
# make a fatty acid with 20 carbons and 2 double bonds at the 5th and 8th positions, in cis and trans configuration respectively
fa3 = lipids.fatty_acid(20, (5, 8), cis=(True, False))
fa3.py3dmol().show()

Feel free to play around a little more if you like. Now let's make some larger lipids using these fatty acids!
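
Before moving on, a quick way to sanity-check a generated fatty acid is to compare it against the molecular formula we expect. As a sketch in plain Python (the helper `fatty_acid_formula` is hypothetical and not part of BuildAMol): a saturated fatty acid with n carbons is CnH2nO2, and every C=C double bond removes two hydrogens. We assume here that the carbon count passed to `fatty_acid` includes the carboxyl carbon, which this tutorial does not state explicitly.

```python
def fatty_acid_formula(n_carbons: int, n_double_bonds: int = 0) -> str:
    """Molecular formula of an unbranched fatty acid (free acid form)."""
    if n_carbons < 2:
        raise ValueError("a fatty acid needs at least two carbons")
    # C=C bonds can only sit between the carbons after the carboxyl carbon
    if not 0 <= n_double_bonds <= n_carbons - 2:
        raise ValueError("impossible number of double bonds for this chain length")
    n_hydrogens = 2 * n_carbons - 2 * n_double_bonds
    return f"C{n_carbons}H{n_hydrogens}O2"


# the three chain length / double bond combinations used above
print(fatty_acid_formula(18, 2))  # C18H32O2
print(fatty_acid_formula(16, 1))  # C16H30O2
print(fatty_acid_formula(20, 2))  # C20H36O2
```

The cis or trans configuration does not enter here, since both isomers share the same formula.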

### Making Acylglycerols

Using the function `triacylglycerol` we can make mono-, di-, and triacylglycerols by passing one, two, or three fatty acid molecules as arguments. Positions that should stay empty are specified by passing `None`.
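
The `None` convention can be illustrated without building any molecules. The following plain-Python sketch (the helper `acylglycerol_class` is hypothetical, not a BuildAMol function) simply counts the filled positions and names the resulting lipid class. That the three arguments map to the three glycerol positions in order is our assumption.

```python
from typing import Optional


def acylglycerol_class(
    fa1: Optional[object], fa2: Optional[object], fa3: Optional[object]
) -> str:
    """Name the lipid class implied by which positions are filled (not None)."""
    n_chains = sum(fa is not None for fa in (fa1, fa2, fa3))
    if n_chains == 0:
        raise ValueError("at least one fatty acid is needed")
    names = {1: "monoacylglycerol", 2: "diacylglycerol", 3: "triacylglycerol"}
    return names[n_chains]


# strings stand in for fatty acid molecules here
print(acylglycerol_class("fa1", None, "fa3"))  # diacylglycerol
```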

In [15]:
# make a triacylglycerol 
tag = lipids.triacylglycerol(fa1, fa2, fa3)
tag.py3dmol().show()

In [20]:
# make a diacylglycerol with the middle position empty
dag = lipids.triacylglycerol(fa1, None, fa3)
dag.py3dmol().show()

### Making Phospho- and Sphingolipids

Phospho- and sphingolipids have two hydrocarbon chains and one headgroup (in the sphingolipid example below, one of the chains belongs to the sphingosine backbone, so only one fatty acid is passed). The headgroups are more diverse in structure, which is why the `phospholipid` and `sphingolipid` functions require, in addition to the molecules themselves, a _Linkage_ that defines how the headgroup should be connected. The Linkage only needs to specify the _source_ atom and its deleters (the atoms to remove); the target settings are applied automatically.
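
To make the bookkeeping behind such a linkage concrete, here is a toy sketch in plain Python. It does not use BuildAMol and does not reflect how its Linkage is implemented: `HeadgroupLinkSketch` is a hypothetical class, and the serine atom names are assumed PDB-style names. It records which headgroup atom forms the new bond and which atoms are removed, and it fails early if any of those names is missing from the headgroup.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class HeadgroupLinkSketch:
    """Hypothetical stand-in for the headgroup side of a linkage."""

    attach_atom: str
    delete_atoms: Tuple[str, ...] = ()

    def remaining_atoms(self, atom_names: List[str]) -> List[str]:
        """Return the headgroup atoms left once the deleters are removed."""
        wanted = (self.attach_atom, *self.delete_atoms)
        missing = [name for name in wanted if name not in atom_names]
        if missing:
            raise ValueError(f"atoms not found in headgroup: {missing}")
        if self.attach_atom in self.delete_atoms:
            raise ValueError("the attachment atom cannot also be deleted")
        return [name for name in atom_names if name not in self.delete_atoms]


# assumed PDB-style atom names of a free serine
SERINE_ATOMS = ["N", "CA", "C", "O", "OXT", "CB", "OG",
                "H", "H2", "HA", "HB2", "HB3", "HG", "HXT"]

# attach via CB and drop the side-chain hydroxyl (OG and its hydrogen HG)
link_sketch = HeadgroupLinkSketch(attach_atom="CB", delete_atoms=("OG", "HG"))
kept = link_sketch.remaining_atoms(SERINE_ATOMS)
```

A check like this catches misspelled atom names before any molecule is built, which is where headgroup linkages most easily go wrong.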

In [26]:
# make a phosphatidylserine
from buildamol import molecule, linkage, load_amino_acids

# load the amino acids to get a serine molecule
load_amino_acids()
ser = molecule("SER")

# define the linkage with which to attach the serine to the glycerol
# (we do not need to specify atom1 because the position where the headgroup is attached
# is already known in the glycerol; the only uncertainty is which headgroup atoms to use)
link = linkage(atom1=None, atom2="CB", delete_in_source=["OG", "HG"])

# make the phosphatidylserine
ps = lipids.phospholipid(fa1, fa2, headgroup=ser, headgroup_link=link)
ps.py3dmol().show()

In [33]:
# make a sphingoglycolipid
# let's actually just attach the glycan we made before as a headgroup

# make sure we attach the glycan via its first (i.e. root) residue
glycan.set_attach_residue(1)

# define the linkage with which to attach the glycan to the sphingosine
link = linkage(None, "C1", delete_in_source=["O1", "HO1"])

# make the sphingoglycolipid
sgl = lipids.sphingolipid(fa1, headgroup=glycan, headgroup_link=link)
sgl.py3dmol().show()

## Making Nucleic Acids

Oligonucleotides are again a simpler case that is very similar to the peptide extension. We can create small oligonucleotides from a sequence using the `dna` or `rna` functions like so:

In [2]:
from buildamol.extensions.bio import nucleic_acids

# make an RNA strand
seq = "ACCUCAAGAGACUAC"
rna = nucleic_acids.rna(seq)

# now optimize using rdkit's forcefield
rna.optimize(algorithm="rdkit")
rna.py3dmol().show()
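
Since `dna` and `rna` both take a plain sequence string, it is easy to pass a sequence to the wrong one. As a small sketch in plain Python (the helper `classify_sequence` is hypothetical and not part of BuildAMol, and limiting the alphabet to A, C, G, T and U is our assumption), we can check the letters before choosing a function:

```python
def classify_sequence(seq: str) -> str:
    """Return 'dna', 'rna', or 'ambiguous' based on the letters in seq."""
    letters = set(seq.strip().upper())
    unknown = sorted(letters - set("ACGTU"))
    if unknown:
        raise ValueError(f"unsupported nucleotide letters: {', '.join(unknown)}")
    if "T" in letters and "U" in letters:
        raise ValueError("sequence mixes T and U")
    if "U" in letters:
        return "rna"
    if "T" in letters:
        return "dna"
    # only A, C and G present: the sequence is valid for either type
    return "ambiguous"


print(classify_sequence("ACCUCAAGAGACUAC"))  # rna
```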

So that just about wraps up the `bio` extension! We saw how we can build simple peptides, glycans, various lipids, and nucleic acids using easy top-level functions. Of course, the lipids especially might require some additional post-processing to align the fatty acid chains properly (for instance, if one wanted to construct things like membranes).

Thanks for checking out this tutorial and good luck in your next project using BuildAMol!