Graph Convolutional Neural Network for Atom Classification

alescrnjar/MolAsNet

MolAsNet

MolAsNet is a Graph Convolutional Network (GCN) that takes the structure of a given protein as a graph and predicts whether each atom is a hydrogen or a heavy atom (C, N, O, S), thus performing a node classification task. The protein structure is loaded from a .mol2 file, which provides both the atom identities and the bond network.
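As an illustration of how a .mol2 file maps onto a graph, the following is a minimal sketch that reads the `@<TRIPOS>ATOM` and `@<TRIPOS>BOND` sections with the standard library only. The function name `parse_mol2` and the toy `EXAMPLE` text are assumptions made for this sketch; they are not taken from the repository, whose own loader may work differently.

```python
from typing import Dict, List, Tuple

# Toy three-atom input, made up for illustration (not taken from 6EQE).
EXAMPLE = """@<TRIPOS>MOLECULE
example
3 2 1
@<TRIPOS>ATOM
1 N 0.000 0.000 0.000 N.3 1 ALA1 0.0
2 H 1.000 0.000 0.000 H 1 ALA1 0.0
3 CA 0.000 1.400 0.000 C.3 1 ALA1 0.0
@<TRIPOS>BOND
1 1 2 1
2 1 3 1
"""


def parse_mol2(text: str) -> Tuple[List[Dict], List[Tuple[int, int]]]:
    """Return (atoms, bonds) parsed from the text of a Tripos .mol2 file."""
    atoms: List[Dict] = []
    bonds: List[Tuple[int, int]] = []
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("@<TRIPOS>"):
            # Remember which record type the following lines belong to.
            section = line[len("@<TRIPOS>"):]
            continue
        fields = line.split()
        if section == "ATOM" and len(fields) >= 6:
            # Columns: id, name, x, y, z, SYBYL type, [subst id, subst name, charge]
            element = fields[5].split(".")[0]  # "C.3" -> "C"
            atoms.append({
                "id": int(fields[0]),
                "name": fields[1],
                "element": element,
                # The substructure name may carry a residue number, e.g. "ALA1".
                "resname": fields[7] if len(fields) > 7 else "",
                "is_hydrogen": element == "H",
            })
        elif section == "BOND" and len(fields) >= 4:
            # Columns: bond id, origin atom id, target atom id, bond type
            bonds.append((int(fields[1]), int(fields[2])))
    return atoms, bonds


atoms, bonds = parse_mol2(EXAMPLE)
print([a["is_hydrogen"] for a in atoms], bonds)
```

The atoms become the graph nodes, the bonds become undirected edges, and `is_hydrogen` plays the role of the class label.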

Node features include the atomic species (C, N, O, S, H), the residue name, whether the atom is a hydrogen or a heavy atom (the label to be predicted), and whether the atom belongs to the protein backbone or to a side chain.

The default embedding (numerical representation) of each node is its degree.
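To make the node-degree embedding concrete, here is a minimal sketch that counts bonds per atom from an edge list. The helper name `node_degrees` and the zero-based node indexing are assumptions for this example, not the repository's code.

```python
from collections import Counter
from typing import List, Tuple


def node_degrees(num_nodes: int, edges: List[Tuple[int, int]]) -> List[List[float]]:
    """Return one single-element feature vector per node holding its degree.

    `edges` lists each undirected bond once, using zero-based node indices.
    """
    counts = Counter()
    for i, j in edges:
        counts[i] += 1
        counts[j] += 1
    # One row per node, so the result has shape (num_nodes, 1).
    return [[float(counts[n])] for n in range(num_nodes)]


# Toy graph: atom 0 is bonded to atoms 1 and 2.
features = node_degrees(3, [(0, 1), (0, 2)])
print(features)  # [[2.0], [1.0], [1.0]]
```

A hydrogen forms a single bond, so its degree is 1, but some heavy atoms (for example a carbonyl oxygen) also have degree 1. Degree alone therefore cannot fully separate the two classes, which is presumably where the information aggregated from neighboring nodes comes in.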

The code is adapted from this tutorial: https://towardsdatascience.com/a-beginners-guide-to-graph-neural-networks-using-pytorch-geometric-part-1-d98dc93e7742

The provided example .mol2 file contains the crystal structure of a polyethylene terephthalate-degrading hydrolase (PDB ID: 6EQE, https://www.rcsb.org/structure/6eqe). The downloaded .pdb file already included hydrogens, and VMD was used to write a .mol2 file for the selection of protein atoms. An associated .pdb file was also written for later use.

Using the MDAnalysis library, an output .pdb file is produced in which each atom's temperature factor (beta) takes one of three values: 0 (red with the Beta coloring method in VMD) for an incorrectly classified atom, 1 (white) for an unclassified atom (i.e. one that is not part of the test set), and 2 (blue) for a correctly classified atom.
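The three-value coding described above can be sketched as a small pure-Python function. The name `beta_values` and its arguments are hypothetical; the repository may compute these values differently.

```python
from typing import List, Sequence


def beta_values(predicted: Sequence[int],
                actual: Sequence[int],
                test_mask: Sequence[bool]) -> List[float]:
    """Map per-atom predictions to the beta coding 0 / 1 / 2.

    0.0: atom in the test set, classified incorrectly
    1.0: atom not in the test set (not classified)
    2.0: atom in the test set, classified correctly
    """
    betas = []
    for pred, true, in_test in zip(predicted, actual, test_mask):
        if not in_test:
            betas.append(1.0)
        elif pred == true:
            betas.append(2.0)
        else:
            betas.append(0.0)
    return betas


# Toy example with four atoms; the third atom is outside the test set.
betas = beta_values([1, 0, 1, 0], [1, 1, 1, 0], [True, True, False, True])
print(betas)  # [2.0, 0.0, 1.0, 2.0]
```

With MDAnalysis, one plausible way to store such a list is to assign it to the `tempfactors` attribute of the atoms of a `Universe` loaded from the associated .pdb file before writing the output; this step is an assumption here and has not been checked against the repository.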

Required Libraries

  • numpy >= 1.21.5

  • pandas >= 1.5.1

  • torch_geometric >= 2.2.0

  • torch >= 1.13.0

  • networkx >= 2.8.4

  • tensorboardX >= 2.11.2

  • matplotlib >= 3.5.2

  • MDAnalysis >= 2.2.0

Case Study: 6EQE

With 8000 training epochs:

Train Accuracy: 0.947305745757666

Test Accuracy: 0.9428571428571428

Temperature factors beta for 6EQE predictions (0: red, 1: white, 2: blue), with resids 90-96 in Licorice representation.