# Cluster Interface / Interface Similarity

## Data Resources

* (2007) [PISA](https://www.ebi.ac.uk/pdbe/pisa/)
* (2011, 2020) [ProtCID](http://dunbrack2.fccc.edu/ProtCiD/Default.aspx)
* (2012) [InterEvol](http://biodev.cea.fr/interevol/interevol.aspx)
* (2014) [PIFACE](http://prism.ccbb.ku.edu.tr/piface/)
* (2014) [EPPIC](http://www.eppic-web.org/ewui/#)
* (2017) [~~QSbio~~](http://www.qsbio.org/)

## Methods

* (2004) [MultiProt](https://onlinelibrary.wiley.com/doi/abs/10.1002/prot.10628)
* (2005) [TM-align](https://doi.org/10.1093/nar/gki524)
* (2009) [MM-align](https://doi.org/10.1093/nar/gkp318)
* (2010) [iAlign](http://doi.org/10.1093/bioinformatics/btq404)
* (2015) [PCalign](https://doi.org/10.1186/s12859-015-0471-x)
* (2015) [PROSTA-inter](https://doi.org/10.1093/bioinformatics/btv242)
* (2018) [InterComp](https://doi.org/10.1093/bioinformatics/bty587)
* (2018) [PatchBag](https://doi.org/10.1038/s41598-018-26497-z)

## Some Other Insights

* (2018) [Integrating co‐evolutionary signals and other properties of residue pairs to distinguish biological interfaces from crystal contacts](https://onlinelibrary.wiley.com/doi/full/10.1002/pro.3448)
* (2018) [Distinguishing crystallographic from biological interfaces in protein complexes: role of intermolecular contacts and energetics for classification](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-018-2414-9)
* (2019) [Accurate Classification of Biological and non-Biological Interfaces in Protein Crystal Structures using Subtle Covariation Signals](http://dx.doi.org/10.1038/s41598-019-48913-8)

## References

1. Xu, Q., Dunbrack, R.L. ProtCID: a data resource for structural information on protein interactions. Nat Commun 11, 711 (2020). https://doi.org/10.1038/s41467-020-14301-4
2. Gao M, Skolnick J. iAlign: a method for the structural comparison of protein-protein interfaces. Bioinformatics (Oxford, England). 2010 Sep;26(18):2259-2265. DOI: 10.1093/bioinformatics/btq404.
3. Cukuroglu E, Gursoy A, Nussinov R, Keskin O. Non-redundant unique interface structures as templates for modeling protein interactions. PLoS One. 2014;9(1):e86738. Published 2014 Jan 27. doi:10.1371/journal.pone.0086738
4. Baskaran, K., Duarte, J.M., Biyani, N. et al. A PDB-wide, evolution-based assessment of protein-protein interfaces. BMC Struct Biol 14, 22 (2014). https://doi.org/10.1186/s12900-014-0022-0
5. Krissinel E, Henrick K. Inference of macromolecular assemblies from crystalline state. J Mol Biol. 2007;372(3):774-797. doi:10.1016/j.jmb.2007.05.022
6. Yang Zhang, Jeffrey Skolnick, TM-align: a protein structure alignment algorithm based on the TM-score, Nucleic Acids Research, Volume 33, Issue 7, 1 April 2005, Pages 2302–2309, https://doi.org/10.1093/nar/gki524
7. Xuefeng Cui, Hammad Naveed, Xin Gao, Finding optimal interaction interface alignments between biological complexes, Bioinformatics, Volume 31, Issue 12, 15 June 2015, Pages i133–i141, https://doi.org/10.1093/bioinformatics/btv242
8. Claudio Mirabello, Björn Wallner, Topology independent structural matching discovers novel templates for protein interfaces, Bioinformatics, Volume 34, Issue 17, 01 September 2018, Pages i787–i794, https://doi.org/10.1093/bioinformatics/bty587
9. Budowski-Tal I, Kolodny R, Mandel-Gutfreund Y. A Novel Geometry-Based Approach to Infer Protein Interface Similarity. Sci Rep. 2018;8(1):8192. Published 2018 May 29. doi:10.1038/s41598-018-26497-z
10. Faure G, Andreani J, Guerois R. InterEvol database: exploring the structure and evolution of protein complex interfaces. Nucleic Acids Res. 2012;40(Database issue):D847-D856. doi:10.1093/nar/gkr845
11. Elez K, Bonvin AMJJ, Vangone A. Distinguishing crystallographic from biological interfaces in protein complexes: role of intermolecular contacts and energetics for classification. BMC Bioinformatics. 2018;19(Suppl 15):438. Published 2018 Nov 30. doi:10.1186/s12859-018-2414-9
12. Fukasawa Y, Tomii K. Accurate Classification of Biological and non-Biological Interfaces in Protein Crystal Structures using Subtle Covariation Signals. Sci Rep. 2019;9(1):12603. Published 2019 Aug 30. doi:10.1038/s41598-019-48913-8
13. Dey S, Ritchie DW, Levy ED. PDB-wide identification of biological assemblies from conserved quaternary structure geometry. Nat Methods. 2018;15(1):67-72. doi:10.1038/nmeth.4510
14. Hu J, Liu HF, Sun J, Wang J, Liu R. Integrating co-evolutionary signals and other properties of residue pairs to distinguish biological interfaces from crystal contacts. Protein Sci. 2018;27(9):1723-1735. doi:10.1002/pro.3448
15. Shatsky M, Nussinov R, Wolfson HJ. A method for simultaneous alignment of multiple protein structures. Proteins. 2004;56:143-156. doi:10.1002/prot.10628
16. Cheng, S., Zhang, Y. & Brooks, C.L. PCalign: a method to quantify physicochemical similarity of protein-protein interfaces. BMC Bioinformatics 16, 33 (2015). https://doi.org/10.1186/s12859-015-0471-x
17. Mukherjee S, Zhang Y. MM-align: a quick algorithm for aligning multiple-chain protein complex structures using iterative dynamic programming. Nucleic Acids Res. 2009;37(11):e83. doi:10.1093/nar/gkp318


In [57]:
import nglview
import pandas as pd
import ujson as json
# `surface` and `interface` are NGL selection strings built in cell In [33];
# that cell has to run before this one.
rep1 = [
    {"type": "line", "params": {
        "sele": surface, "color": "chainindex", "opacity": 0.2
    }},
    {"type": "spacefill", "params": {
        "sele": interface, "color": "chainindex", "opacity": 0.3
    }},
    {"type": "line", "params": {
        "sele": interface, "color": "residueindex"
    }},
    {"type": "surface", "params": {
        "sele": interface, "color": "chainindex", "opacity": 0.1
    }}
]
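
The representation list above reads `surface` and `interface` from the notebook's global state, so it only works once the cell that builds those selection strings has run. One way to make that dependency explicit is to wrap the list in a function, as in the minimal sketch below. `make_representations` is a hypothetical helper name (it is not part of nglview), and the selection strings passed to it are placeholders.

```python
from typing import Dict, List


def make_representations(surface: str, interface: str) -> List[Dict]:
    """Return the same list of representation dicts as `rep1`.

    Hypothetical helper: the two selection strings are passed as
    arguments instead of being read from notebook globals.
    """
    return [
        {"type": "line", "params": {
            "sele": surface, "color": "chainindex", "opacity": 0.2}},
        {"type": "spacefill", "params": {
            "sele": interface, "color": "chainindex", "opacity": 0.3}},
        {"type": "line", "params": {
            "sele": interface, "color": "residueindex"}},
        {"type": "surface", "params": {
            "sele": interface, "color": "chainindex", "opacity": 0.1}},
    ]


# Placeholder selections, for illustration only.
rep_example = make_representations(
    surface=":A and protein",
    interface="(10 and ^) and :A",
)
```

The returned list can be assigned to `view.representations` in the same way as `rep1`.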

In [40]:
view = nglview.show_file("./pdb_files/1u7f.cif")
view.background = '#212121'
view.representations = rep1
view

NGLWidget(background='#212121')

![fig](../docs/figs/1u7f_A_B_interface.png)

In [33]:
view.clear_representations()
chain_res_tem = "({res_str}) and :{chain_id}"
a_ab = chain_res_tem.format(res_str=ab_dict['A'], chain_id='A')
b_ab = chain_res_tem.format(res_str=ab_dict['B'], chain_id='B')
b_bc = chain_res_tem.format(res_str=bc_dict['B'], chain_id='B')
c_bc = chain_res_tem.format(res_str=bc_dict['C'], chain_id='C')
a_ac = chain_res_tem.format(res_str=ac_dict['A'], chain_id='A')
c_ac = chain_res_tem.format(res_str=ac_dict['C'], chain_id='C')

a_ab_s = chain_res_tem.format(res_str=ab_s_dict['A'], chain_id='A')
b_ab_s = chain_res_tem.format(res_str=ab_s_dict['B'], chain_id='B')

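# Only the A-B interface is selected here; b_bc, c_bc, a_ac and c_ac can be added to the tuple.
i_chains = ' or '.join(f"({i})" for i in (a_ab, b_ab))
s_chains = ' or '.join(f"({i})" for i in (a_ab_s, b_ab_s))
# In NGL's selection language a bare `%` should match atoms without an
# alternate-location code, and `/0` restricts the selection to the first model.
interface = f"({i_chains}) and % and /0 and protein"
surface = f"({s_chains}) and % and /0 and protein"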

view.add_cartoon(selection="(:A or :B) and protein", color="chainindex", opacity=0.5) # (:A or :B) and protein
view.add_spacefill(selection="(:A or :B) and protein", color="gray", opacity=0.1)
view.add_spacefill(selection=surface, color="chainindex", opacity=0.3)
view.add_surface(selection=interface, color="chainindex")
# view.add_surface(selection=interface, color="residueindex", opacity=0.05)


# 
# "352 or 355 and ^ and :B and % and /0"
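
The cell above assembles the selection strings in three nested steps: residue expressions are OR-ed per chain, each chain group is restricted with `:{chain_id}`, and the groups are OR-ed and filtered with `% and /0 and protein`. The sketch below reproduces that nesting in one function so the resulting string can be inspected. `build_selection` and `or_join` are hypothetical names, and the residue numbers are made up.

```python
from typing import Dict, Iterable, Tuple

Residue = Tuple[int, str]  # (author residue number, insertion code or "")


def or_join(parts: Iterable[str]) -> str:
    """Wrap each part in parentheses and join the parts with ' or '."""
    return " or ".join(f"({p})" for p in parts)


def build_selection(residues_by_chain: Dict[str, Iterable[Residue]]) -> str:
    """Compose a selection string with the same shape as `interface`."""
    per_chain = []
    for chain_id, residues in residues_by_chain.items():
        # Same pattern as the `pic` column: "<number> and ^<insertion code>"
        res_str = or_join(f"{number} and ^{inscode}" for number, inscode in residues)
        per_chain.append(f"({res_str}) and :{chain_id}")
    return f"({or_join(per_chain)}) and % and /0 and protein"


# Made-up residues, for illustration only.
example_selection = build_selection({"A": [(352, ""), (355, "")], "B": [(12, "A")]})
print(example_selection)
```

Printing the string before handing it to `view.add_surface` makes it easier to spot unbalanced parentheses or an empty chain group.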

In [46]:
converters = {
    'pdb_id': str,
    'chain_id': str,
    'struct_asym_id': str,
    'entity_id': int,
    'author_residue_number': int,
    'residue_number': int,
    'author_insertion_code': str}

eec_as_df = pd.read_csv("C:\\Download\\20200716\\biounit\\0725.tsv", sep="\t", converters=converters)
check = pd.read_csv(
    r"C:\Download\20200716\biounit\pisa%interfacedetail%+1u7f%1%3.tsv", 
    sep="\t", 
    usecols=['pdb_code', 'assemble_code', 'interface_number', 'chain_id', 'residue', 'sequence', 'insertion_code', 'buried_surface_area','solvent_accessible_area', 'hsdc'],
    na_values=[' ']
    ).rename(columns={"pdb_code":"pdb_id",
                      "sequence":"author_residue_number",
                      "insertion_code":"author_insertion_code",
                      "residue":"residue_name",
                      "chain_id": "struct_asym_id_in_assembly"})
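# Assign the result back: an in-place fillna on an attribute-accessed column is
# not guaranteed to modify `check` in recent pandas versions.
check['author_insertion_code'] = check['author_insertion_code'].fillna('')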

chain_df_check = eec_as_df[eec_as_df.pdb_id.eq('1u7f') & eec_as_df.assembly_id.eq(1)]
residues_check = pd.read_csv("C:\\Download\\20200716\\biounit\\pdb%entry%residue_listing%+1u7f.tsv", sep="\t", converters=converters)
check = check.merge(chain_df_check, how="left")
check = check.merge(residues_check, how="left")


def annotate_pisa(df: pd.DataFrame) -> pd.DataFrame:
    '''
    Flag residues using the PISA area columns (modifies `df` in place and returns it).

    Buried residues:    solvent_accessible_area == 0
    Surface residues:   solvent_accessible_area > 0
    Interface residues: buried_surface_area > 0
    '''
    df['pisa_surface'] = df.solvent_accessible_area.gt(0).astype(int)
    df['pisa_interface'] = df.buried_surface_area.gt(0).astype(int)
    return df

annotate_pisa(check)
# check['pic'] = check.apply(lambda x: f"P1_{x['residue_number']}_{x['residue_name']}" if x['struct_asym_id'] == 'B' else f"P2_{x['residue_number']}_{x['residue_name']}", axis=1)
check['pic'] = check.author_residue_number.astype(str)+' and ^'+check.author_insertion_code
check_interface = check[check.pisa_interface.eq(1)]
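
The docstring of `annotate_pisa` defines three residue classes from two PISA columns. As a quick check of those rules, here is a minimal pure-Python sketch on a single residue at a time. `classify_residue` is a hypothetical helper and the area values are invented, not taken from 1u7f.

```python
def classify_residue(solvent_accessible_area: float, buried_surface_area: float) -> dict:
    """Apply the ASA/BSA rules from `annotate_pisa` to one residue."""
    return {
        "pisa_surface": 1 if solvent_accessible_area > 0 else 0,
        "pisa_interface": 1 if buried_surface_area > 0 else 0,
    }


# (ASA, BSA) pairs, invented for illustration.
toy_residues = {
    "buried core residue": (0.0, 0.0),
    "exposed, outside the interface": (42.5, 0.0),
    "exposed, in the interface": (60.1, 18.3),
}
toy_flags = {
    name: classify_residue(asa, bsa)
    for name, (asa, bsa) in toy_residues.items()
}
```

Under these rules an interface residue is also a surface residue, since a residue needs accessible area before any of it can be buried by a partner chain.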

In [60]:
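# ab_i, ac_i and bc_i hold (residue_number, struct_asym_id) pairs, one tuple per interface.
# All three must be defined before `info` is built. The commented lines show the
# expression used; each one is presumably evaluated after loading the matching
# PISA interface file into `check`.
# ab_i = tuple(zip(check_interface.residue_number, check_interface.struct_asym_id))
# ac_i = tuple(zip(check_interface.residue_number, check_interface.struct_asym_id))
# bc_i = tuple(zip(check_interface.residue_number, check_interface.struct_asym_id))
info = {"atom_site": [dict(zip(("label_seq_id", "label_asym_id"), tp)) for tp in set(ab_i+ac_i+bc_i)]}
# json.dumps(info)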
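
The `info` dictionary merges the residue pairs of several interfaces and removes duplicates with `set`, which leaves the order of the entries unspecified. The sketch below does the same merge but sorts the pairs so the output is reproducible, and serialises it with the standard-library `json` module. `build_atom_site_payload` is a hypothetical helper, and the pairs stand in for `ab_i`, `ac_i` and `bc_i`.

```python
import json
from itertools import chain
from typing import Iterable, Tuple

Pair = Tuple[int, str]  # (label_seq_id, label_asym_id)


def build_atom_site_payload(*interfaces: Iterable[Pair]) -> dict:
    """Merge residue pairs from several interfaces, dropping duplicates."""
    unique_pairs = sorted(
        set(chain.from_iterable(interfaces)),
        key=lambda pair: (pair[1], pair[0]),  # sort by chain, then residue
    )
    return {
        "atom_site": [
            {"label_seq_id": seq_id, "label_asym_id": asym_id}
            for seq_id, asym_id in unique_pairs
        ]
    }


# Invented pairs standing in for ab_i, ac_i and bc_i.
ab_example = ((10, "A"), (25, "B"))
ac_example = ((10, "A"), (7, "C"))
bc_example = ((25, "B"), (7, "C"))
payload = build_atom_site_payload(ab_example, ac_example, bc_example)
payload_json = json.dumps(payload)
```

Sorting is a design choice of this sketch, not something the original cell does; it only matters when the JSON is compared or cached between runs.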

In [31]:
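def str_int_join(iterable):
    '''Wrap each item in parentheses and join the items with " or ".'''
    return ' or '.join(f"({i})" for i in iterable)


# ab_dict, bc_dict and ac_dict (used in cell In [33]) map each chain to its
# interface-residue expression. The commented lines show the expression used;
# each one is presumably evaluated after loading the matching PISA interface file.
# ab_dict = check[check.pisa_interface.eq(1)].groupby(['struct_asym_id_in_assembly']).pic.apply(str_int_join).to_dict()
# bc_dict = check[check.pisa_interface.eq(1)].groupby(['struct_asym_id_in_assembly']).pic.apply(str_int_join).to_dict()
# ac_dict = check[check.pisa_interface.eq(1)].groupby(['struct_asym_id_in_assembly']).pic.apply(str_int_join).to_dict()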

In [27]:
ab_s_dict = check[check.pisa_surface.eq(1)].groupby(['struct_asym_id_in_assembly']).pic.apply(str_int_join).to_dict()
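
The `groupby(...).pic.apply(str_int_join).to_dict()` chain used above yields one OR-ed residue expression per chain. For readers who want to see that step without pandas, the sketch below performs the same grouping with `collections.defaultdict`. `group_expressions` is a hypothetical name and the rows are invented.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_expressions(rows: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Group (chain id, residue expression) rows into one expression per chain."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for chain_id, expression in rows:
        grouped[chain_id].append(expression)
    return {
        chain_id: " or ".join(f"({e})" for e in expressions)
        for chain_id, expressions in grouped.items()
    }


# Invented rows: (struct_asym_id_in_assembly, pic)
toy_rows = [("A", "352 and ^"), ("B", "12 and ^A"), ("A", "355 and ^")]
toy_surface_dict = group_expressions(toy_rows)
```

Each value can then be passed as `res_str` to the `chain_res_tem` template together with its chain id.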