## Introduction

### Importing data

In [1]:
from pyprot.protein import Protein

A PDB structure can be loaded either directly from a file path or from an existing Bio.PDB.Structure.Structure object.

In [2]:
# From a filepath
Protein("2XJX.pdb")

<pyprot.protein.Protein at 0x7f6f145bd0b8>

In [3]:
# From a Biopython Structure object
from Bio.PDB import PDBParser
parser = PDBParser()
pdb = parser.get_structure("pdb_id", "2XJX.pdb")
Protein(pdb)

<pyprot.protein.Protein at 0x7f6f4c0c6be0>

## pandas.DataFrame functions

In [4]:
prot = Protein(pdb)

In [5]:
# Get head of DataFrame representation
prot.head()

| | bfactor | chain | coord | disordered_flag | element | full_id | mass | resname | occupancy | x | y | z |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 34.75 | A | [34.384, 9.301, 30.781] | 0.0 | N | (pdb_id, 0, A, ( , 11, ), (N, )) | 14.0067 | PRO | 1.0 | 34.384 | 9.301 | 30.781 |
| 1 | 35.93 | A | [35.484, 8.424, 30.523] | 0.0 | C | (pdb_id, 0, A, ( , 11, ), (CA, )) | 12.0107 | PRO | 1.0 | 35.484 | 8.424 | 30.523 |
| 2 | 36.26 | A | [36.65, 9.185, 31.133] | 0.0 | C | (pdb_id, 0, A, ( , 11, ), (CB, )) | 12.0107 | PRO | 1.0 | 36.65 | 9.185 | 31.133 |
| 3 | 36.5 | A | [36.335, 10.577, 30.706] | 0.0 | C | (pdb_id, 0, A, ( , 11, ), (CG, )) | 12.0107 | PRO | 1.0 | 36.335 | 10.577 | 30.706 |
| 4 | 36.32 | A | [34.824, 10.693, 30.664] | 0.0 | C | (pdb_id, 0, A, ( , 11, ), (CD, )) | 12.0107 | PRO | 1.0 | 34.824 | 10.693 | 30.664 |
| 5 | 36.38 | A | [35.266, 7.063, 31.134] | 0.0 | C | (pdb_id, 0, A, ( , 11, ), (C, )) | 12.0107 | PRO | 1.0 | 35.266 | 7.063 | 31.134 |
| 6 | 39.92 | A | [34.124, 6.624, 31.188] | 0.0 | O | (pdb_id, 0, A, ( , 11, ), (O, )) | 15.9994 | PRO | 1.0 | 34.124 | 6.624 | 31.188 |
| 7 | 36.02 | A | [36.328, 6.399, 31.581] | 0.0 | N | (pdb_id, 0, A, ( , 12, ), (N, )) | 14.0067 | MET | 1.0 | 36.328 | 6.399 | 31.581 |
| 8 | 34.7 | A | [36.363, 4.945, 31.562] | 0.0 | C | (pdb_id, 0, A, ( , 12, ), (CA, )) | 12.0107 | MET | 1.0 | 36.363 | 4.945 | 31.562 |
| 9 | 36.89 | A | [35.567, 4.32, 32.727] | 0.0 | C | (pdb_id, 0, A, ( , 12, ), (CB, )) | 12.0107 | MET | 1.0 | 35.567 | 4.32 | 32.727 |
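In the output above, the coord column holds each atom's position as a list, while x, y and z repeat the same values as separate columns. The sketch below shows one way such columns could be derived with plain pandas, using three rows copied from the output. It is an illustration only and not necessarily how pyprot builds them.

```python
import pandas as pd

# Three atoms copied from the head() output (PRO 11, chain A).
df = pd.DataFrame({
    "element": ["N", "C", "C"],
    "resname": ["PRO", "PRO", "PRO"],
    "coord": [
        [34.384, 9.301, 30.781],
        [35.484, 8.424, 30.523],
        [36.65, 9.185, 31.133],
    ],
})

# Expand each coordinate triple into separate x, y, z columns.
xyz = pd.DataFrame(df["coord"].tolist(), columns=["x", "y", "z"], index=df.index)
df = pd.concat([df, xyz], axis=1)

# Separate numeric columns make vectorised operations easy,
# for example the geometric centre of the selected atoms.
centroid = df[["x", "y", "z"]].mean()
print(centroid)
```

Keeping both representations is a trade-off: the list column is convenient to pass around as a single value, while the scalar columns work directly with pandas arithmetic and filtering.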


In [6]:
# Access the underlying DataFrame directly
prot.df.resname.str.lower().head()

0    pro
1    pro
2    pro
3    pro
4    pro
Name: resname, dtype: object
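Since the cell above uses the regular pandas string accessor on prot.df, other standard pandas operations should presumably work as well. The sketch below uses a small stand-in DataFrame with a few values copied from the head() output, so it runs without pyprot; the variable names are illustrative.

```python
import pandas as pd

# A few rows copied from the head() output (two PRO atoms, two MET atoms).
df = pd.DataFrame({
    "resname": ["PRO", "PRO", "MET", "MET"],
    "element": ["N", "C", "N", "C"],
    "bfactor": [34.75, 35.93, 36.02, 34.7],
})

# Boolean mask: keep only methionine atoms.
met_atoms = df[df["resname"] == "MET"]

# Mean B-factor per residue name.
mean_bfactor = df.groupby("resname")["bfactor"].mean()
print(met_atoms)
print(mean_bfactor)
```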

You can choose whether to build the DataFrame when creating a Protein. If you pass generate_dataframe=False, the DataFrame is not built, and you will need to call Protein.generate_dataframe() yourself before using it.

In [7]:
prot = Protein(pdb, generate_dataframe=False)
prot.df # nothing happens!

So you first need to call generate_dataframe():

In [8]:
prot.generate_dataframe()

In [9]:
prot.df.resname.str.lower().head()

0    pro
1    pro
2    pro
3    pro
4    pro
Name: resname, dtype: object
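The deferred construction shown above can be pictured with a small stand-alone class. This is a minimal sketch of the general pattern, not pyprot's implementation: the LazyTable class and its internals are hypothetical, and it assumes the df attribute is simply None until the table is generated.

```python
import pandas as pd


class LazyTable:
    """Hypothetical container that builds its DataFrame on request."""

    def __init__(self, records, generate_dataframe=True):
        self._records = list(records)
        self.df = None  # assumption: no table exists until it is generated
        if generate_dataframe:
            self.generate_dataframe()

    def generate_dataframe(self):
        # Build the table once and keep it on the instance.
        self.df = pd.DataFrame(self._records)
        return self.df


records = [
    {"resname": "PRO", "element": "N"},
    {"resname": "MET", "element": "C"},
]

lazy = LazyTable(records, generate_dataframe=False)
built_before = lazy.df is not None  # construction was skipped

lazy.generate_dataframe()
lowered = lazy.df.resname.str.lower().tolist()
```

Skipping the table at construction time can be worthwhile when many structures are loaded but only a few need the tabular view.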

When generating the DataFrame, you can choose whether to raise an exception if the coordinates column contains null values. By default no error is raised and the null values are kept in the DataFrame, but you can change this behaviour.

In [10]:
prot.df[~prot.df.coord.isnull()].shape # not-null coordinates

(1950, 12)

In [11]:
prot.df[prot.df.coord.isnull()].shape # null coordinates

(20, 12)
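The two policies for null coordinates (keep going, or fail early) can be sketched with plain pandas. The helper below and its raise_on_null flag are hypothetical names chosen for illustration; they are not pyprot parameters, and the small DataFrame is made-up stand-in data with one missing coordinate.

```python
import pandas as pd


def check_coords(df, raise_on_null=False):
    """Return rows with usable coordinates; optionally fail on nulls.

    `raise_on_null` is a hypothetical flag, not a documented pyprot option.
    """
    missing = df["coord"].isnull()
    if raise_on_null and missing.any():
        raise ValueError(f"{int(missing.sum())} atoms have null coordinates")
    return df[~missing]


# Stand-in data: the last atom has no coordinates.
atoms = pd.DataFrame({
    "resname": ["PRO", "MET", "HOH"],
    "coord": [[34.384, 9.301, 30.781], [36.328, 6.399, 31.581], None],
})

# Lenient policy: drop the rows that cannot be used.
clean = check_coords(atoms)

# Strict policy: surface the problem immediately.
try:
    check_coords(atoms, raise_on_null=True)
except ValueError as err:
    print(f"strict mode failed: {err}")
```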

## Graph models

In [13]:
from pyprot.graph_models import StructureGraphGenerator

In [15]:
sgraph = StructureGraphGenerator()