# Protein structure superposing



## **Superposing structures locally**

Web-based services are extremely powerful, especially when searching against large databases such as the PDBe archive or the AlphaFold Database. However, you might be working with newly solved protein structures which are not yet in one of these databases, or you might want to customise your superposition beyond the options these tools provide. This is where superposing structures on your own machine can be useful. Most superposition software runs on modern consumer hardware in a reasonable time, which makes exploring proteins on a laptop completely feasible. We will be looking at two powerful superposition algorithms used in structural biology packages such as PyMOL and Coot.

Running software from the command line provides several benefits over GUI-based applications. One advantage is the (sometimes significantly) lower computational overhead of command-line applications compared with those that have polished GUIs. This can make operations faster, especially on older hardware or where a visual interface is not immediately necessary. Secondly, once we know the parameters we need, we can execute the program without traversing menus, toggling optional fields and locating files in cumbersome file managers. Running software from the command line also allows us to quickly repeat executions, modify parameters and even incorporate them into our own scripts. We can also obtain a log from our execution of the program (a report of programmatic events, errors, warnings, values, runtime and more), which can be very useful when troubleshooting problems. Logs are not exclusive to command-line applications, but GUIs often omit this information for brevity.

Many tools for protein superposition exist as command-line tools. Many also exist as web servers, which can be used to query databases such as the PDBe or AlphaFoldDB, or as modules we can load into our own code. We will be using the structural biology software suite CCP4, as it contains several excellent superposition algorithms. Let's take a closer look at CCP4!

----------------------


### 2.1) Set up CCP4

The software suite CCP4 contains several command-line programs for superposing structures. CCP4 should already be installed on your virtual machine, but it can also be downloaded from [the CCP4 website](https://www.ccp4.ac.uk/). Once it is installed, run the command below so that you can execute CCP4 programs as terminal commands:

> `source /path/to/ccp4-8.0/bin/ccp4.setup-sh`

You will need to adjust the path to point to the location of your CCP4 installation. For users working on EMBL-EBI's virtual machine, this is:

> ` change me source /path/to/ccp4-8.0/bin/ccp4.setup-sh`

Now you will be able to run any program offered by the CCP4 suite! You can find the documentation for all the programs CCP4 contains [here](https://www.ccp4.ac.uk/html/). In this tutorial, we will limit our use of CCP4 to the programs useful for protein superposition. Let us begin by comparing the two superposition algorithms: SSM and GESAMT.
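Before moving on, it can be worth confirming that the setup script took effect in your current terminal. The sketch below is one way to do that, assuming the two programs used in this tutorial are installed under the names `superpose` and `gesamt` (as in the commands that follow); the function name `check_ccp4_programs` is our own and not part of CCP4.

```shell
# Report whether each named program can be found on the PATH.
# Returns 0 if all were found, 1 if at least one is missing.
check_ccp4_programs() {
    missing=0
    for prog in "$@"; do
        if command -v "$prog" >/dev/null 2>&1; then
            echo "found:   $prog"
        else
            echo "missing: $prog"
            missing=1
        fi
    done
    return "$missing"
}

# Check the two superposition programs used in this tutorial.
check_ccp4_programs superpose gesamt || echo "Source ccp4.setup-sh in this terminal, then retry."
```

Note that the setup script only affects the terminal session in which it was sourced, so a new terminal window will need it to be sourced again.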

### 2.2) Superpose: SSM 

> `superpose ./examples_mmcif/6mka.cif ./examples_mmcif/6mkf.cif -o superpose_example_output.pdb`

This command will superpose the structure `6mkf` onto `6mka`, saving the transformed coordinates to the file `superpose_example_output.pdb`. Either mmCIF or PDB files can be passed to `superpose`, although the program currently returns the structure in PDB format only.

### 2.3) Superpose: GESAMT

> `gesamt ./examples_mmcif/6mka.cif ./examples_mmcif/6mkf.cif -o gesamt_example_output.pdb`

This command will superpose the structure `6mkf` onto `6mka`, saving the transformed coordinates to the file `gesamt_example_output.pdb`. Either mmCIF or PDB files can be passed to `gesamt`, although the program currently returns the structure in PDB format only.
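One benefit of the command line mentioned earlier is that a command can be repeated over many files. As a minimal sketch, the function below applies the same `reference moving -o output` pattern shown above to every `.cif` file in a directory. The function name `superpose_all`, the `SUPERPOSE_PROGRAM` variable and the output layout are our own illustrative choices, not part of CCP4.

```shell
# Superpose every .cif file in a directory against one reference file.
# Usage: superpose_all REFERENCE INPUT_DIR OUTPUT_DIR
# The program defaults to gesamt; set SUPERPOSE_PROGRAM=superpose to use SSM.
superpose_all() {
    reference=$1
    input_dir=$2
    output_dir=$3
    program=${SUPERPOSE_PROGRAM:-gesamt}

    mkdir -p "$output_dir"
    for moving in "$input_dir"/*.cif; do
        # Skip the unexpanded pattern if the directory has no .cif files.
        if [ ! -e "$moving" ]; then continue; fi
        # Do not superpose the reference against itself.
        if [ "$moving" = "$reference" ]; then continue; fi

        name=$(basename "$moving" .cif)
        # Same argument order as the single commands above; keep one log per file.
        "$program" "$reference" "$moving" -o "$output_dir/$name.pdb" \
            > "$output_dir/$name.log" 2>&1
    done
}
```

With the example files used in this tutorial, a call might look like `superpose_all ./examples_mmcif/6mka.cif ./examples_mmcif ./gesamt_results`, where `./gesamt_results` is a hypothetical output directory.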

### 2.4) Viewing the results

In addition to saving our superposition as a PDB file, the programs print log information to the terminal. This includes details of the structural alignment, such as the RMSD, the Q-score and the rotation-translation matrix (or matrices), as well as the sequence identity and the sequence alignment. We can capture this information by following our superposition command with the `>>` operator and the name of the file we want to send the information to. Note that `>>` appends to the file if it already exists, so repeated runs accumulate in the same log. For example,

> `gesamt test1.cif test2.cif -o test.pdb >> test.out`
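If you would rather start each log afresh, `>` overwrites the file where `>>` appends to it, and adding `2>&1` also captures error messages, which are normally written to a separate stream. The helper below is a small sketch of this; the name `run_logged` is our own, and the demonstration uses `echo` in place of a CCP4 program so that it runs anywhere.

```shell
# Run a command, writing both its normal output and its error messages
# to a log file. The log is overwritten on every call.
# Usage: run_logged LOG_FILE COMMAND [ARGS...]
run_logged() {
    log_file=$1
    shift
    "$@" > "$log_file" 2>&1
}

# Demonstration with echo standing in for a real program.
demo_log=$(mktemp)
run_logged "$demo_log" echo "first run"
run_logged "$demo_log" echo "second run"
cat "$demo_log"    # prints only "second run": the first log was overwritten
rm -f "$demo_log"
```

For the command above, this would be used as `run_logged test.out gesamt test1.cif test2.cif -o test.pdb`.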

Once you have saved all the log information you might need later, you can open the output PDB file in your favourite molecular graphics viewer. We suggest opening your results in the [Mol* viewer](https://molstar.org/viewer/), a feature-rich online viewer that opens in your browser. Compare whether SSM and GESAMT give the same result. Many molecular graphics viewers are packaged with SSM as their default structural alignment tool. GESAMT was built on SSM to remediate some of its limitations, without a prohibitive runtime penalty.
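Alongside the visual comparison, the saved logs can be compared numerically. The sketch below prints the lines of each log that mention an RMSD or Q-score. The search pattern is an assumption: the exact labels differ between programs and versions (for example `RMSD` versus `r.m.s.d`), so adjust it after looking at your own logs. The function name `show_scores` is our own.

```shell
# Print the lines of each log file that mention an RMSD or Q-score.
# The pattern is a guess at common labels; edit it to match your logs.
show_scores() {
    for log_file in "$@"; do
        echo "== $log_file"
        grep -i -E 'r\.?m\.?s\.?d|q-score' "$log_file" || echo "(no matching lines)"
    done
}
```

For example, `show_scores superpose_example.log gesamt_example.log` would list the matching lines from two logs side by side, assuming you saved the output of each program under those (hypothetical) file names.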

----------------------------

#### **Practice exercise: Explore the active site**

