<h1><span style="color:red">Please read this very carefully! </span></h1>

To set up your own experiments, you need to download remote files to your Linux disk image in the Collaboratory environment. Because the data in your user account is NOT reset when you close or reload the HBP, you need to organise and structure your data carefully. To help with this, we create a unique working directory for each molecular use case you run.

Please also be aware that this use case changes the current working directory. To return to your starting directory, you have to restart the kernel and clear all output.

# Calculate an electrostatic potential of a protein from its atomic structure

**Aim:** This use case shows how to use the multipipsa tool to calculate the electrostatic potential surrounding a protein in aqueous solution.

**Version:** 1.0 (April 2019)

**Contributors:**  Neil Bruce, Lukas Adam, Stefan Richter, Rebecca Wade (HITS, Heidelberg, Germany)

**Contact:** [mcmsoft@h-its.org](mailto:mcmsoft@h-its.org)

**Note:** This notebook has graphical output using nglview. If you use the "RunAll" function of the notebook, this graphical output might not appear on your screen. The cell defined to show the output must be visible in the browser during execution.

## Setting up your environment

### Check that all required python packages are installed and working

In [None]:
# Check that required packages are installed
! pip install --upgrade "hbp-service-client" 
! pip install wget python-magic
! pip --quiet install "numpy>=1.16"
! pip --quiet install rpy2==2.9.1
! pip --quiet install --extra-index-url https://projects.h-its.org/pypi multipipsa==4.0.10
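
After the installation commands have run, it can be useful to confirm which versions were actually picked up. The sketch below is one possible way to do this with the standard-library `importlib.metadata` module. It assumes a Python 3.8 or later kernel, and the helper name `installed_versions` is hypothetical and not part of multipipsa.

```python
from importlib import metadata


def installed_versions(packages):
    """Return a dict mapping each distribution name to its installed version.

    Distributions that are not installed map to None instead of raising.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Distribution names taken from the pip commands above
required = ["hbp-service-client", "wget", "python-magic",
            "numpy", "rpy2", "multipipsa"]
for name, version in installed_versions(required).items():
    print("%-20s %s" % (name, version or "NOT INSTALLED"))
```

A `None` entry points to an installation step that failed silently, which is easy to miss when pip is run with `--quiet`.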

In [None]:
# Import python packages used in this notebook
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
import numpy
import rpy2
import os, wget, datetime, magic, inspect
from multipipsa.multipipsa import ApbsRun
from hbp_service_client.storage_service.client import Client
import nglview
print("Python environment set correctly")

### Set up local directory structure

In [None]:
# Create a local working directory
try:
    homeDir = os.environ['HOME']
except KeyError:
    print("Error in environment: the HOME variable is not set")

else:
    workDir = os.path.join(homeDir, 'work')
    if not os.path.isdir(workDir):
        try:
            os.mkdir(workDir)
        except OSError:
            print("unable to make working directory")
    
    # Make a new directory to run the use case in. 
    # If directory already exists, add a number to make a unique name
    baseDir = 'calcEP'
    dirIter = 0
    useCaseDir = os.path.join(workDir, baseDir)
    print(useCaseDir)
    
    while os.path.exists(useCaseDir):
        dirIter += 1
        useCaseDir = os.path.join(workDir, baseDir + '.' + str(dirIter))
    
    try:
        os.mkdir(useCaseDir)
    except OSError:
        print("Failed to make use case working directory")
    else:
        print("Working directory for current use case: %s" % useCaseDir)
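
The numbering scheme used above (`calcEP`, `calcEP.1`, `calcEP.2`, ...) can also be written as a small reusable function. The following is a minimal sketch, not the notebook's own code: the name `make_unique_dir` is hypothetical, and it attempts the creation directly instead of checking for existence first, so that a name taken between the check and the creation is simply skipped.

```python
import os
import tempfile


def make_unique_dir(parent, base):
    """Create parent/base, or parent/base.N for the first free N >= 1."""
    candidate = os.path.join(parent, base)
    counter = 0
    while True:
        try:
            os.makedirs(candidate)  # raises FileExistsError if the name is taken
            return candidate
        except FileExistsError:
            counter += 1
            candidate = os.path.join(parent, "%s.%d" % (base, counter))


# Usage with a throwaway parent directory
with tempfile.TemporaryDirectory() as parent:
    first = make_unique_dir(parent, "calcEP")
    second = make_unique_dir(parent, "calcEP")
    print(os.path.basename(first), os.path.basename(second))  # calcEP calcEP.1
```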


### Set up collab storage for saving data at end of calculation

In [None]:
# Find your own collab storage path
collab_path = get_collab_storage_path()
print(collab_path)
# Create a client for the collab storage service
storage_client = Client.new(oauth.get_token())

# Calculate an electrostatic potential of a protein from its atomic structure 

This use case describes how to calculate the electrostatic potential surrounding a protein in aqueous solution. It uses the [multipipsa](https://collab.humanbrainproject.eu/#/collab/19/nav/2108?state=software,multipipsa) software tool, which automates these calculations by providing a Python wrapper for the following open-source software tools:

* [PDB2PQR](https://apbs-pdb2pqr.readthedocs.io/en/latest/pdb2pqr/index.html): A tool that takes a protein structure in [PDB format](http://www.wwpdb.org/documentation/file-format), adds missing hydrogen atoms, and creates a structure file in PQR format. The PQR file format is derived from the PDB format for describing atomic data, but with the occupancy and temperature factor fields replaced with atomic partial charges and radii.

* [APBS](https://apbs-pdb2pqr.readthedocs.io/en/latest/apbs/index.html): A tool that calculates electrostatic potentials through solution of the Poisson-Boltzmann equation, one of the most common continuum models for describing electrostatic interactions between molecular solutes in salty, aqueous media. 

In addition to automating these calculations, the main use of multipipsa is to compare the electrostatic potentials surrounding a set of similar protein structures. These comparisons are described in other molecular use cases.

This use case also makes use of the [NGL View](https://github.com/arose/nglview) python package for displaying molecular data. NGL View provides IPython widgets for displaying molecular data inside notebooks, using the [NGL Viewer](http://nglviewer.org/) WebGL molecular viewer.

### **What is an electrostatic potential?** ###

An electrostatic potential $\phi$ describes the potential energy of a unit charge located at a position in an electric field. In a uniform dielectric medium with a dielectric constant $\epsilon$, the electrostatic potential at a position $\mathbf{r}$ due to a set of fixed charges $q_i$ can be calculated from Coulomb's law:

$$ \phi\left(\mathbf{r}\right) = \sum_i \frac{q_{i}}{4 \pi \epsilon r_{i}} $$

where $r_i$ is the distance from point $\mathbf{r}$ to the $i^\textrm{th}$ charge. For a protein in solution, we no longer have a uniform dielectric, as the interior has a much lower dielectric than the solvent. Here, we can calculate the potential by solving the Poisson equation: 

$$ \nabla \cdot \left( -\epsilon\left(\mathbf{r}\right) \nabla \phi\left(\mathbf{r}\right) \right)  = \rho\left(\mathbf{r}\right) $$

which links the divergence of the electric displacement field ($-\epsilon\left(\mathbf{r}\right) \nabla \phi\left(\mathbf{r}\right)$) with the charge density $\rho\left(\mathbf{r}\right)$. Biological solvents are rarely pure water, and instead contain small dissolved ions. The total charge density in the region of the protein is then the sum of the charge density due to the fixed charges in the protein ($\rho_\textrm{p}\left(\mathbf{r}\right)$) and that of the dissolved ions, which are assumed to follow a Boltzmann distribution. This leads to the Poisson-Boltzmann equation:

$$ \nabla \cdot \left( - \epsilon\left(\mathbf{r}\right) \nabla \phi\left(\mathbf{r}\right) \right)= \rho_\textrm{p}\left(\mathbf{r}\right) + \sum_{i} z_{i}e_l c_{i}^0\exp\left(\frac{-z_{i}e_l\phi \left ( \mathbf{r} \right )}{k_BT}\right) $$

where $z_{i}$ is the charge number (valence) of dissolved ion type $i$, $c_{i}^0$ is its bulk concentration, $e_l$ is the elementary charge, $k_B$ is the Boltzmann constant and $T$ the temperature. If the argument of the exponential function is small, and the solution contains a 1:1 ratio of positive and negative ions with charges of equal magnitude, this differential equation can be linearised to simplify its solution.
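
To give a feel for the length scale over which the dissolved ions screen the protein charges, the sketch below evaluates the Debye length $1/\kappa$, with $\kappa^2 = 2 N_A e_l^2 I / (\epsilon k_B T)$ for an ionic strength $I$. This is the characteristic decay length of solutions of the linearised equation in the solvent. The relative permittivity of water (78.5) is an assumed value chosen for illustration, and the function name is hypothetical; neither is taken from multipipsa.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
K_B = 1.380649e-23           # Boltzmann constant, J/K
N_A = 6.02214076e23          # Avogadro constant, 1/mol
EPS_0 = 8.8541878128e-12     # vacuum permittivity, F/m


def debye_length_angstrom(ionic_strength_molar, temp_kelvin=298.15, eps_r=78.5):
    """Debye screening length in Angstrom for an ionic strength in mol/L.

    eps_r is the relative permittivity of the solvent (assumed value for water).
    """
    # Convert mol/L to particles per cubic metre
    ions_per_m3 = ionic_strength_molar * 1000.0 * N_A
    kappa_sq = (2.0 * E_CHARGE ** 2 * ions_per_m3
                / (EPS_0 * eps_r * K_B * temp_kelvin))
    return 1e10 / math.sqrt(kappa_sq)


# Ionic strength and temperature used later in this use case
print("Debye length: %.2f Angstrom" % debye_length_angstrom(0.1, 298.15))
```

With these assumptions the result is roughly 9.6 Å at 0.1 M, and it halves when the ionic strength is increased fourfold, since the length scales as $1/\sqrt{I}$.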



## Downloading the protein structure

In this use case, our input is a structure of the catalytic domain of the enzyme adenylyl cyclase 5 (AC5), modelled during the work described in [Tong et al (2016)](https://doi.org/10.1002/prot.25167). The following cell downloads this structure from the CSCS storage area.

In [None]:
# Download AC5 structure file from CSCS storage for calculation
try:
    print("Downloading AC5 structure file from CSCS storage area")
    fileUrl= 'https://object.cscs.ch/v1/AUTH_c0a333ecf7c045809321ce9d9ecdfdea/SGA2_molecular_models/data/Modelled_adenylyl_cyclase_AC_isoform_structures/refined/AC5.pdb'
    wget.download(fileUrl, useCaseDir)
except Exception as err:
    print("Error downloading structure file from CSCS storage: %s" % err)
else:
    print("Successfully downloaded the structure file from CSCS storage")
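
A download can complete without raising an exception and still deliver something other than a structure file, for example an error page. One possible sanity check, sketched below, is to count the `ATOM` and `HETATM` records in the file. The function name `count_atom_records` is hypothetical, and the demonstration uses synthetic content, not the AC5 file.

```python
import os
import tempfile


def count_atom_records(pdb_path):
    """Count ATOM/HETATM records in a PDB-format text file."""
    count = 0
    with open(pdb_path) as handle:
        for line in handle:
            if line.startswith(("ATOM  ", "HETATM")):
                count += 1
    return count


# Demonstration on a throwaway file with synthetic content
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.pdb")
    with open(path, "w") as handle:
        handle.write("REMARK synthetic example\n")
        handle.write("ATOM".ljust(21) + "A\n")
        handle.write("END\n")
    print(count_atom_records(path))  # 1
```

In the notebook, the same function could be applied to `os.path.join(useCaseDir, 'AC5.pdb')`; a count of zero would suggest that the download did not return a structure.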

### Viewing the protein structure
The following cell creates a molecular viewer to visualise the structure of AC5. The catalytic domain of AC5 is a dimer consisting of two protein chains. In the full structure of AC5 these two chains are connected by a series of transmembrane helices that anchor the protein in the post-synaptic membrane.

In [None]:
# View the downloaded structure
# Create a NGL widget object
viewPDB = nglview.NGLWidget()
# Set the display size
viewPDB._remote_call('setSize', target='Widget', args=['600px','400px'])

# Define files to load
AC5_struct_file = nglview.FileStructure(os.path.join(useCaseDir, 'AC5.pdb'))

# Create a component object for displaying the structure
viewPDB_struct = viewPDB.add_component(AC5_struct_file)
# Clear default representation from the component and add cartoon representations for both chains
viewPDB_struct.clear_representations()
viewPDB_struct.add_representation('cartoon', sele=':A', color='orange')
viewPDB_struct.add_representation('cartoon', sele=':B', color='green')

# Display the widget
viewPDB
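
The selections `:A` and `:B` used above rely on the chain identifiers stored in the structure file. As a rough check, the sketch below counts `ATOM` records per chain by reading column 22 of each record, which holds the chain ID in the PDB format. The helper name `atoms_per_chain` is hypothetical, and the demonstration records are synthetic.

```python
from collections import Counter


def atoms_per_chain(pdb_lines):
    """Count ATOM records per chain ID (column 22 of the PDB format)."""
    counts = Counter()
    for line in pdb_lines:
        if line.startswith("ATOM  ") and len(line) > 21:
            counts[line[21]] += 1
    return counts


# Synthetic records for illustration only (not taken from AC5.pdb);
# padding to 21 characters places the chain ID in column 22.
demo = ["ATOM".ljust(21) + chain for chain in "AAB"]
print(dict(atoms_per_chain(demo)))  # {'A': 2, 'B': 1}
```

Passing an open file handle for the downloaded structure in place of `demo` would list the chains that the viewer selections can address.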

## Multipipsa

In the following cells we use multipipsa to solve the linearised Poisson-Boltzmann equation for AC5. First we create an object of the ApbsRun class, and set a number of parameters for the calculation.

In [None]:
# Define which structures we want to use. 
# This should be the name of the PDB file with the '.pdb' extension removed
structures = ['AC5']

# Find location of PIPSA executables
pipsaDir = os.path.join(os.path.dirname(inspect.getfile(ApbsRun)), 'data', 'pipsa')

# Create an ApbsRun object for the current calculation
epCalc = ApbsRun(
                    dataDir=useCaseDir,    # Pass the use case work directory as the directory for running the calculation
                    pipsaRoot=pipsaDir,    # Pass the location of the PIPSA executables defined above
                    temp='298.15',         # Define the temperature in Kelvin
                    ios='0.100',           # Define the solvent ionic strength in Molar concentration
                    pH='7.4',              # Define the solvent pH
                    structures=structures  # Pass the list of structures defined above
                ) 

### Predict amino acid protonation states and assign atomic charges and radii
The next cell calls the runPdb2Pqr method of ApbsRun, which runs PDB2PQR. Proteins contain a number of ionisable amino acids, which can exist in different protonation states, depending on the pH of the solution they are in. PDB2PQR can predict the states of these amino acids at a given pH (defined as 7.4 in the last cell, a normal physiological pH), then add all missing hydrogen atoms to the structure, and assign atomic charges and radii to all atoms. By default, multipipsa assigns charges and radii from the [Amber](http://dx.doi.org/10.1021/ja00124a002) force field. 

In [None]:
# Run pdb2pqr to predict the protonation states of ionisable residues at the chosen pH, 
# and define atomic charges and radii
epCalc.runPdb2Pqr()
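
One way to inspect the outcome of this step is to sum the partial charges in the resulting PQR file, which gives the net charge of the protein in the predicted protonation state. The sketch below assumes the whitespace-delimited PQR layout, in which the last five fields of each record are x, y, z, charge and radius. The function name `pqr_net_charge` is hypothetical, the example records are invented for illustration, and the name of the PQR file that multipipsa writes is not stated in this notebook.

```python
def pqr_net_charge(pqr_lines):
    """Sum the partial charges of all ATOM/HETATM records in PQR text.

    Assumes whitespace-delimited records whose last two fields are
    the atomic charge and the atomic radius.
    """
    total = 0.0
    for line in pqr_lines:
        if line.startswith(("ATOM", "HETATM")):
            fields = line.split()
            total += float(fields[-2])
    return total


# Invented records for illustration only
demo = [
    "ATOM 1 N  MET A 1 27.340 24.430 2.614  0.5000 1.8240",
    "ATOM 2 CA MET A 1 26.266 25.413 2.842 -0.2500 1.9080",
    "ATOM 3 C  MET A 1 26.913 26.639 3.531  0.7500 1.9080",
]
print(pqr_net_charge(demo))  # 1.0
```

For a complete protein the partial charges are expected to sum to a value close to an integer, so a clearly fractional total may indicate missing atoms or a parsing problem.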

### Run APBS to solve the Poisson-Boltzmann equation

The next cell calls the runApbs method of ApbsRun, which calls APBS to solve the linearised Poisson-Boltzmann equation to obtain the electrostatic potential in the dx and UHBD file formats. It also creates a dx file describing the solvent excluded volume of AC5. This is used for visualisation later.


The potential is calculated using a protein internal relative dielectric of 1.0 and an ionic strength of 0.1 M (as used in [Tong et al (2016)](https://doi.org/10.1002/prot.25167)).

In [None]:
# Run APBS to solve the Poisson-Boltzmann equation to calculate the electrostatic potential
epCalc.runApbs()
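
The dx files written in this step are plain text in the OpenDX scalar grid format: a short header giving the grid dimensions, origin and spacing, followed by the potential values with the z index varying fastest. The following is a minimal sketch of a reader for that layout, assuming an axis-aligned orthogonal grid. The function name `read_dx` is hypothetical and the demonstration grid is synthetic.

```python
import numpy as np


def read_dx(lines):
    """Parse OpenDX scalar grid text into (origin, spacing, values).

    values has shape (nx, ny, nz); in this format the z index varies fastest.
    """
    counts, origin, deltas, data = None, None, [], []
    in_data = False
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        if in_data:
            if fields[0] in ("attribute", "object", "component"):
                in_data = False  # trailer reached, data section is over
                continue
            data.extend(float(x) for x in fields)
        elif fields[0] == "object" and "gridpositions" in fields:
            counts = tuple(int(x) for x in fields[-3:])
        elif fields[0] == "origin":
            origin = np.array([float(x) for x in fields[1:4]])
        elif fields[0] == "delta":
            deltas.append([float(x) for x in fields[1:4]])
        elif fields[0] == "object" and "array" in fields:
            in_data = True
    spacing = np.diag(np.array(deltas))  # grid step along x, y and z
    values = np.array(data).reshape(counts)
    return origin, spacing, values


# Synthetic 2x2x2 grid for illustration only
demo = """# synthetic example
object 1 class gridpositions counts 2 2 2
origin 0.0 0.0 0.0
delta 1.0 0.0 0.0
delta 0.0 1.0 0.0
delta 0.0 0.0 1.0
object 2 class gridconnections counts 2 2 2
object 3 class array type double rank 0 items 8 data follows
0.0 1.0 2.0
3.0 4.0 5.0
6.0 7.0
attribute "dep" string "positions"
""".splitlines()

origin, spacing, values = read_dx(demo)
print(values.shape, values[0, 0, 1], values[1, 0, 0])
```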

### Visualise the electrostatic potential

The following cell creates a molecular viewer to display the electrostatic potential calculated above. The potential is set to zero inside the solvent excluded volume of AC5, to make visualisation easier. The potential is displayed as two isopotential surfaces, at potentials of $1\  k_BT/e$ (blue) and $-1\  k_BT/e$ (red). For comparison, the same potential is shown in Figure 4F of [Tong et al (2016)](https://doi.org/10.1002/prot.25167).

**Note:** the potential sometimes takes a little time to appear after the structure appears, so please be patient, and do not run the cell again.
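
The isosurface levels are given in units of $k_BT/e$. For orientation, the short sketch below converts this unit to millivolts at a chosen temperature; the function name is hypothetical.

```python
K_B = 1.380649e-23           # Boltzmann constant, J/K
E_CHARGE = 1.602176634e-19   # elementary charge, C


def kT_per_e_in_millivolts(temp_kelvin=298.15):
    """Return the value of 1 kT/e in millivolts at the given temperature."""
    return 1000.0 * K_B * temp_kelvin / E_CHARGE


print("1 kT/e = %.2f mV" % kT_per_e_in_millivolts(298.15))
```

At the temperature used in this use case (298.15 K), the two surfaces therefore correspond to potentials of about +25.7 mV and -25.7 mV.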


In [None]:
# Create a NGL widget object
viewEP = nglview.NGLWidget()
# Set the display size
viewEP._remote_call('setSize', target='Widget', args=['600px','400px'])

# Define files to load
AC5_struct_file = nglview.FileStructure(os.path.join(useCaseDir, 'AC5.pdb'))
AC5_vol_file = nglview.FileStructure(os.path.join(useCaseDir, 'AC5_vis.dx'))

# Create a component object for displaying the structure
viewEP_struct = viewEP.add_component(AC5_struct_file)
# Clear default representation from the component and add cartoon representations for both chains
viewEP_struct.clear_representations()
viewEP_struct.add_representation('cartoon', sele=':A', color='orange')
viewEP_struct.add_representation('cartoon', sele=':B', color='green')
                              
# Create a component object for displaying the potential                              
viewEP_vol = viewEP.add_component(AC5_vol_file)
# Clear default representation from the component and add  +/- 1kT/e potential isosurfaces
viewEP_vol.clear_representations()
viewEP_vol.add_representation('surface', isolevel=1, isolevelType='value', color='blue')
viewEP_vol.add_representation('surface', isolevel=-1, isolevelType='value', color='red')

# Display the widget
viewEP

### Saving your data to the collab storage area 
In the final cell, your data will be moved to the storage area for your collab, from where you can download your files, and the local working directory will be cleaned.

In [None]:
# Set up a timestamped directory name for saving results to the storage area
baseStorageDir = 'multipipsaCalcEP_'
timestamp = datetime.datetime.now().strftime('%Y-%m-%d-%H-%M-%S')
storageDir = os.path.join(collab_path, baseStorageDir + timestamp)
try:
    print('Creating storage directory: %s' % storageDir)
    storage_client.mkdir(storageDir)
except Exception as err:
    print('There was an error creating the storage directory: %s' % err)
else:
    # Copy files to the storage area and remove the local files
    cleanDir = True
    for fName in os.listdir(useCaseDir):
        localFile = os.path.join(useCaseDir, fName)
        storageFile = os.path.join(storageDir, fName)
        fType =  magic.Magic(mime=True).from_file(localFile)
        try:
            storage_client.upload_file(localFile, storageFile, fType)
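        except Exception as err:
            print('Error copying %s to storage: %s' % (fName, err))
            cleanDir = False
        else:
            os.remove(localFile)

    # Leave the use case directory before trying to remove it
    os.chdir(homeDir)
    if cleanDir:
        print('All files in the working directory have been moved to the storage area directory:')
        print(storageDir)
        os.rmdir(useCaseDir)
    else:
        print('Some files could not be copied and remain in: %s' % useCaseDir)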
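
The timestamp format used for the storage directory puts the year first and zero-pads every field, so directory names sort chronologically when listed alphabetically. The sketch below isolates that naming step in a hypothetical helper, `timestamped_name`, which accepts a fixed time so that the result can be checked.

```python
import datetime


def timestamped_name(prefix, when=None):
    """Return prefix followed by a sortable YYYY-MM-DD-HH-MM-SS timestamp."""
    if when is None:
        when = datetime.datetime.now()
    return prefix + when.strftime('%Y-%m-%d-%H-%M-%S')


fixed = datetime.datetime(2019, 4, 1, 12, 30, 5)
print(timestamped_name('multipipsaCalcEP_', fixed))
# multipipsaCalcEP_2019-04-01-12-30-05
```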