# Amino Acid Specific Contact Distance Cutoffs

## Motivation

We want to characterize amino-acid-specific distance distributions over a non-redundant PDB dataset. The goal is to refine the current contact definition, which typically uses a single distance cutoff (e.g., 4.5 or 6.0 angstroms) for all residue pairs. We want to develop amino-acid-specific cutoffs that account for the distinct interaction ranges of, e.g., pairs of ionizable residues and pairs of non-polar residues.
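To make the idea concrete, here is a minimal sketch of how a pair-specific cutoff could replace a single global one. The names `DEFAULT_CUTOFF`, `PAIR_CUTOFFS`, `contact_cutoff`, and `in_contact` are hypothetical, and the numeric values are placeholders for illustration only, not results from this analysis.

```python
# Hypothetical cutoffs in angstroms. These are placeholders, NOT values
# derived from any dataset; the real values would come from the
# distance distributions this notebook sets out to compute.
DEFAULT_CUTOFF = 4.5
PAIR_CUTOFFS = {
    frozenset(["LYS", "GLU"]): 6.0,  # placeholder for an ionizable pair
    frozenset(["LEU", "VAL"]): 4.5,  # placeholder for a non-polar pair
}


def contact_cutoff(res_a, res_b):
    """Return the cutoff for an unordered residue pair, or the default."""
    return PAIR_CUTOFFS.get(frozenset([res_a, res_b]), DEFAULT_CUTOFF)


def in_contact(res_a, res_b, distance):
    """True if the distance falls within the cutoff for this residue pair."""
    return distance <= contact_cutoff(res_a, res_b)
```

Keying the table on a `frozenset` makes the lookup independent of residue order, so `("GLU", "LYS")` and `("LYS", "GLU")` resolve to the same cutoff.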

In [34]:
from pyspark import SparkContext
from pyspark.sql import SparkSession
from mmtfPyspark.io import mmtfReader, mmtfWriter
from mmtfPyspark.webfilters import Pisces
from mmtfPyspark.mappers import StructureToPolymerChains
from mmtfPyspark.utils import traverseStructureHierarchy, ColumnarStructure
from mmtfPyspark import structureViewer
import numpy as np
from scipy.spatial.distance import pdist, squareform
import matplotlib.pyplot as plt

In [None]:
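def distmap(structure):
    """Return the all-atom pairwise distance matrix for the first structure in an RDD."""
    # Take the first structure from the RDD and expose its atoms as flat arrays
    arrays = ColumnarStructure(structure.values().first(), firstModelOnly=True)
    x = arrays.get_x_coords()
    y = arrays.get_y_coords()
    z = arrays.get_z_coords()

    # TODO: exclude hydrogens before computing distances, e.g. with a mask
    # built from arrays.get_atom_names() (names starting with 'H')

    # Stack into an (n_atoms, 3) array and compute all pairwise distances
    coords = np.column_stack((x, y, z))
    return squareform(pdist(coords))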

**Configure Spark**

In [2]:
spark = SparkSession.builder.master("local[4]").appName("Hackathon").getOrCreate()
sc = spark.sparkContext

In [3]:
path = "../resources/mmtf_full_sample"

pdb = mmtfReader.read_sequence_file(path, sc)

In [8]:
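from mmtfPyspark.filters import ContainsLProteinChain

# Filter at the entry level, split into polymer chains, then filter again at the chain level
nr_chains = pdb \
    .filter(Pisces(sequenceIdentity=20, resolution=1.6)) \
    .flatMap(StructureToPolymerChains()) \
    .filter(Pisces(sequenceIdentity=20, resolution=1.6)) \
    .filter(ContainsLProteinChain())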

In [5]:
nr_chains.count()

2991

In [30]:
pdbids = nr_chains.keys().collect()[:10]
pdbids

['4WN5.A',
 '4WND.B',
 '4WP9.A',
 '4WPG.A',
 '4WPK.A',
 '4WRI.A',
 '4WSF.A',
 '1GWM.A',
 '1GWM.A',
 '1GXU.A']

In [31]:
structures = nr_chains.filter(lambda x: x[0] in pdbids)

In [32]:
structures.count()

10

In [43]:
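# ColumnarStructure wraps a single structure, not an RDD, so take the first chain
arrays = ColumnarStructure(structures.values().first(), firstModelOnly=True)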
#type(arrays)


In [46]:
#help(ColumnarStructure)

In [44]:
x = arrays.get_x_coords()
y = arrays.get_y_coords()
z = arrays.get_z_coords()
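With per-atom coordinate arrays like these, the pairwise distances can be computed using the `pdist` and `squareform` functions imported at the top. The sketch below uses small synthetic arrays so it runs on its own. The helper name `distance_matrix` and the `elements` array of per-atom element symbols are assumptions for illustration; how that array would be obtained from `ColumnarStructure` is not shown in this notebook.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def distance_matrix(xs, ys, zs, elements=None):
    """Pairwise Euclidean distances between atoms.

    If `elements` is given (an assumed array of element symbols, one per
    atom), hydrogens are dropped before the distances are computed.
    """
    coords = np.column_stack((xs, ys, zs)).astype(float)
    if elements is not None:
        heavy = np.asarray(elements) != "H"
        coords = coords[heavy]
    # pdist returns the condensed upper triangle; squareform expands it
    # into a symmetric (n_atoms, n_atoms) matrix with a zero diagonal.
    return squareform(pdist(coords))


# Synthetic coordinates for four atoms, the last one a hydrogen
xs = np.array([0.0, 3.0, 0.0, 1.0])
ys = np.array([0.0, 4.0, 0.0, 0.0])
zs = np.array([0.0, 0.0, 5.0, 0.0])
elements = np.array(["C", "N", "O", "H"])

dmat = distance_matrix(xs, ys, zs, elements)

# Atom pairs (i < j) closer than a single global cutoff of 6.0 angstroms
close_pairs = np.argwhere(np.triu(dmat < 6.0, k=1))
```

For large chains the full square matrix grows quadratically with the number of atoms, so a neighbor search such as `scipy.spatial.cKDTree` may be a better fit when only short distances are needed.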

In [None]:
spark.stop()