> ### In this tutorial we will cover:
> - which built-in resources are available
> - how to set your own default settings

## Built-in resources

`biobuild` has three built-in data resources: the _CHARMM_ force field, the _PDBE compound library_, and _PubChem_ (queried remotely).

```mermaid

flowchart TB
  node_1(("CHARMM"))
  node_2["pre-defined linkages"]
  node_3(("PDBE Compounds"))
  node_4["small molecules"]
  node_5(("PubChem"))
  node_6["larger molecules as well"]
  node_1 --> node_2
  node_3 --> node_4
  node_5 --> node_4
  node_5 --> node_6
  
  

```

### CHARMM Force Field

To connect molecules, users can define their own `Linkage` by specifying which atoms to connect and which atoms to remove in the process. To make life easier, `biobuild` also references the CHARMM force field, which already specifies a number of linkage types, so-called _patches_. Each patch specifies the atoms to connect and remove as well as the _internal coordinates_ around the newly formed bond. Because the resulting geometry is already specified, biobuild can generate structures for these linkages by pure matrix transformation.
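
To make the "atoms to connect, atoms to remove" idea concrete, here is a minimal sketch in plain Python. It is not biobuild's `Linkage` class: the `PatchSketch` name, its fields, and the `apply_patch` helper are hypothetical, molecules are reduced to lists of atom names, and internal coordinates are left out entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatchSketch:
    """Hypothetical stand-in for a linkage/patch (not biobuild's Linkage)."""

    id: str
    atom1: str  # atom in the first (target) molecule that forms the new bond
    atom2: str  # atom in the second (source) molecule that forms the new bond
    delete_in_target: tuple = ()  # atoms removed from the target molecule
    delete_in_source: tuple = ()  # atoms removed from the source molecule


def apply_patch(patch, target_atoms, source_atoms):
    """Return the surviving atoms of both molecules and the new bond."""
    kept_target = [a for a in target_atoms if a not in patch.delete_in_target]
    kept_source = [a for a in source_atoms if a not in patch.delete_in_source]
    new_bond = (patch.atom1, patch.atom2)
    return kept_target, kept_source, new_bond


# toy atom lists; a real molecule carries coordinates, elements and bonds too
atoms = ["C1", "O1", "HO1", "C2", "O2", "HO2"]
patch = PatchSketch("demo", "C1", "O2", ("O1", "HO1"), ("HO2",))

kept_target, kept_source, new_bond = apply_patch(patch, atoms, atoms)
print(kept_target)  # ['C1', 'C2', 'O2', 'HO2']
print(kept_source)  # ['C1', 'O1', 'HO1', 'C2', 'O2']
print(new_bond)     # ('C1', 'O2')
```

A real patch additionally carries the internal coordinates around the new bond, which is what lets the second molecule be placed without any geometry optimization.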

We can check what linkages are available by default using:

In [2]:
import biobuild as bb

bb.available_linkages()



[Linkage(12aa),
 Linkage(12ab),
 Linkage(12ba),
 Linkage(12bb),
 Linkage(14bb),
 Linkage(16ab),
 Linkage(16bb),
 Linkage(26ab),
 Linkage(13ab),
 Linkage(NGLB),
 Linkage(WGPA),
 Linkage(CERB)]

Each linkage is identified by an ID within the CHARMM force field; for example, `12aa` stands for the `1->2 alpha glycosidic linkage`. Any of the pre-defined linkages can be referenced by its (string) ID when connecting molecules.

For example, we can connect two glucoses using a `12aa` linkage by:

In [3]:
glc = bb.molecule("GLC")

# use pre-defined 12aa linkage
glc2 = bb.connect(glc, glc, "12aa")
glc2.show()

#### Setting default linkages

If you use a particular linkage all the time but do not want to store it in a module of your own or copy/paste it into your code repeatedly, you can use the `add_linkage` function to add it to the available defaults. The linkage is then available by ID for the rest of the session. If `overwrite=True` is also set, the linkage is added permanently to the defaults and will be available in all future sessions.

In [4]:
# define a custom 1->2 glycosidic linkage
my_12aa = bb.linkage("C1", "O2", ["O1", "HO1"], ["O2"], id="my_12aa")

# add the linkage to the library
bb.add_linkage(my_12aa)

bb.available_linkages()

[Linkage(12aa),
 Linkage(12ab),
 Linkage(12ba),
 Linkage(12bb),
 Linkage(14bb),
 Linkage(16ab),
 Linkage(16bb),
 Linkage(26ab),
 Linkage(13ab),
 Linkage(NGLB),
 Linkage(WGPA),
 Linkage(CERB),
 Linkage(my_12aa)]

If the set of available linkages is large, we can use the `has_linkage` function to check whether a linkage with a given ID is loaded in the current session.

In [5]:
bb.has_linkage("my_12aa")

True
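
Conceptually, the session defaults behave like a registry keyed by linkage ID. The sketch below illustrates that idea only; `LinkageRegistry` and its methods are hypothetical names, linkages are represented by plain dictionaries, and the permanent storage triggered by `overwrite=True` is deliberately left out.

```python
class LinkageRegistry:
    """Hypothetical session-level registry (not biobuild's internals)."""

    def __init__(self, builtin_ids=()):
        # built-in entries are placeholders in this sketch
        self._by_id = {linkage_id: None for linkage_id in builtin_ids}

    def add(self, linkage_id, linkage):
        """Register a linkage under its ID for the rest of the session."""
        self._by_id[linkage_id] = linkage

    def has(self, linkage_id):
        """Check whether a linkage with this ID is registered."""
        return linkage_id in self._by_id

    def available(self):
        """List all registered IDs in insertion order."""
        return list(self._by_id)


registry = LinkageRegistry(["12aa", "14bb"])
registry.add("my_12aa", {"bond": ("C1", "O2")})

print(registry.has("my_12aa"))  # True
print(registry.has("99zz"))     # False
print(registry.available())     # ['12aa', '14bb', 'my_12aa']
```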

#### The `CHARMMTopology` class

The data from the CHARMM force field is handled by the `CHARMMTopology` class, which parses (you guessed it) a CHARMM topology file (**not** parameter file) and stores its data in a dictionary structure. Its purpose is to store linkages.
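
As a rough illustration of what "parsing patches into a dictionary structure" can look like, the sketch below reads `PRES` blocks from topology-style text and collects their deleted atoms and new bonds. It is a simplification under stated assumptions, not the `CHARMMTopology` parser: `parse_patches` is a hypothetical helper, the sample text is made up for this example, and everything except `PRES`, `DELETE ATOM`, and `BOND` lines is ignored.

```python
def parse_patches(text):
    """Collect PRES blocks (deleted atoms and bonds), keyed by patch name."""
    patches = {}
    current = None
    for raw in text.splitlines():
        line = raw.split("!", 1)[0].strip()  # drop '!' comments
        if not line:
            continue
        fields = line.split()
        keyword = fields[0].upper()
        if keyword == "PRES":
            current = {"delete": [], "bonds": []}
            patches[fields[1]] = current
        elif keyword == "RESI":
            current = None  # residue entries are skipped in this sketch
        elif current is None:
            continue  # title lines and anything outside a PRES block
        elif (
            keyword.startswith("DELE")
            and len(fields) >= 3
            and fields[1].upper() == "ATOM"
        ):
            current["delete"].append(fields[2])
        elif keyword == "BOND":
            atoms = fields[1:]
            # BOND lines list atom names pairwise: a1 b1 a2 b2 ...
            current["bonds"].extend(zip(atoms[0::2], atoms[1::2]))
    return patches


# illustrative excerpt written for this example, not taken from a real file
SAMPLE = """\
* hypothetical topology excerpt
PRES demo_14bb  0.02 ! made-up 1->4 patch
DELETE ATOM 1HO4
DELETE ATOM 2HO1
DELETE ATOM 2O1
BOND 1O4 2C1
"""

patches = parse_patches(SAMPLE)
print(patches["demo_14bb"]["delete"])  # ['1HO4', '2HO1', '2O1']
print(patches["demo_14bb"]["bonds"])   # [('1O4', '2C1')]
```

The leading `1` and `2` on the atom names indicate which of the two residues an atom belongs to.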

The default instance of this class can be accessed using the `get_default_topology` function. Why is this useful? Well, if there is a "get"-default topology function, there may be a "set" version as well (which is totally the case). If you have your own CHARMM topology file with defined linkages and molecules, you can use `read_topology` to parse it and pass `set_default=True` to make your topology the default, thus tailoring biobuild to your specific needs.

In [6]:
# read a custom topology file to make a CHARMMTopology
# (but don't set it as the default topology)
my_top = bb.read_topology("files/my_top.top", set_default=False)

# check out the patches (linkages) in the topology
my_top.patches

[Linkage(my_14bb), Linkage(my_16ab)]

If we want to use a non-default topology, we can either pass it to functions and methods that accept a `_topology` argument, or directly provide the linkage objects we obtain from the topology.

In [7]:
# connect the two glucoses using the `my_16ab` linkage from my_top

my_glc2 = bb.connect(glc, glc, "my_16ab", _topology=my_top)
# or
my_16ab = my_top.get_patch("my_16ab")
my_glc2 = bb.connect(glc, glc, my_16ab)

my_glc2.show()
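
The two call styles above suggest a simple resolution rule: a linkage object is used as-is, while a string ID is looked up in the given topology or, failing that, in the default one. The following sketch shows one way such a rule could be written. `resolve_linkage` is a hypothetical helper, topologies are modelled as plain dictionaries, and nothing here is taken from biobuild's source.

```python
def resolve_linkage(link, topology=None, default_topology=None):
    """Return a linkage object from either an ID string or an object."""
    if not isinstance(link, str):
        return link  # already a linkage object, no lookup needed
    lookup = topology if topology is not None else default_topology
    if lookup is None or link not in lookup:
        raise KeyError(f"no linkage with id {link!r}")
    return lookup[link]


# toy topologies: ID -> linkage object (plain dicts in this sketch)
default_top = {"12aa": {"id": "12aa"}}
my_top = {"my_16ab": {"id": "my_16ab"}}

by_default = resolve_linkage("12aa", default_topology=default_top)
by_custom = resolve_linkage("my_16ab", topology=my_top, default_topology=default_top)
by_object = resolve_linkage(my_top["my_16ab"])

print(by_default["id"])        # 12aa
print(by_custom["id"])         # my_16ab
print(by_object is by_custom)  # True
```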

### PDBE Compounds

biobuild ships with a part of the PDBE component library of small molecules, so molecular structures can be obtained directly while coding, without the need to download any PDB files. Molecules can be obtained from the library using their _PDB ID_, their name, or another available identifier such as SMILES.
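
Looking up the same compound by several identifiers amounts to keeping more than one index over the same records. The sketch below shows the idea with two toy records. `CompoundIndex` is a hypothetical class rather than biobuild's `PDBECompounds`, and the choice to match IDs and names case-insensitively while matching SMILES exactly is an assumption made for this example.

```python
class CompoundIndex:
    """Hypothetical multi-key compound index (not biobuild's PDBECompounds)."""

    def __init__(self, records):
        self._records = list(records)
        self._by_label = {}   # IDs and names, matched case-insensitively
        self._by_smiles = {}  # SMILES are case-sensitive, so matched exactly
        for record in self._records:
            self._by_label[record["id"].upper()] = record
            self._by_label[record["name"].upper()] = record
            self._by_smiles[record["smiles"]] = record

    def __len__(self):
        return len(self._records)

    def get(self, query):
        """Return the record matching an ID, name or SMILES, else None."""
        found = self._by_label.get(query.upper())
        if found is None:
            found = self._by_smiles.get(query)
        return found


# two toy records for illustration only
index = CompoundIndex(
    [
        {"id": "HOH", "name": "WATER", "smiles": "O"},
        {"id": "EOH", "name": "ETHANOL", "smiles": "CCO"},
    ]
)

print(len(index))                  # 2
print(index.get("hoh")["name"])    # WATER
print(index.get("ethanol")["id"])  # EOH
print(index.get("CCO")["id"])      # EOH
print(index.get("XYZ"))            # None
```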

> In fact, when we called `bb.molecule("GLC")` earlier, the `molecule` function internally referenced the loaded PDBE compounds database, found that a compound with ID `"GLC"` was available, and returned the corresponding `Molecule` object. Referencing the built-in PDBE compounds database is the first go-to source for molecular structures in biobuild. 

Similar to the CHARMM topology, we can get the default instance of the `PDBECompounds` class that handles the database using:

In [9]:
comp = bb.get_default_compounds()
# print how many compounds are available in the library
len(comp)

36335