
Goal

Classification of protein function based on protein sequences.

The protein function this project focuses on is ATP binding.

Protein phosphorylation

Data collection

Protein sequences and their functional annotations were scraped from biological databases, mainly UniProt.

The data-scrapes folder contains the protein sequences in FASTA format along with their annotations.
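FASTA files like the ones in data-scrapes can be read without extra dependencies. The following is a minimal sketch, not the parser used in this project; the function name `parse_fasta` and the example records are hypothetical. It assumes the standard FASTA layout: a `>` header line followed by one or more sequence lines.

```python
from typing import Dict, Iterable, List, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {header: sequence} dict.

    Header lines start with '>'; a sequence may span several lines,
    which are concatenated. Blank lines are ignored.
    """
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Store the previous record before starting a new one.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].strip()
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Hypothetical example records (not taken from data-scrapes).
example = """>example_1
MKTAYIAKQR
QISFVKSHFS
>example_2
MSTNPKPQRK
"""
seqs = parse_fasta(example.splitlines())
print(seqs)
```

For real files, `parse_fasta(open(path))` works the same way, since a file object yields lines.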

Approach

Protein sequences were truncated after 500 residues, and shorter sequences were artificially padded with '_' to the same length. I used an artificial neural network (ANN) to classify protein function, with the following configuration (obtained after hyperparameter tuning) and scores:

| Parameter | Value |
| --- | --- |
| ANN | 2/3 layers |
| Embedding dimension | 10 |
| Sequence length | 23 |
| Optimizer | Stochastic gradient descent (SGD) |
| Loss | Binary cross-entropy |
| Nodes | 128 |
| Batch size | 128 |
| Learning rate | 0.001 |
| Accuracy | 0.95 |
| Precision | 0.93 |
| Recall | 0.93 |
| F1 score (macro) | 0.93 |
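The preprocessing and evaluation around the network can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the project's code. It uses the 500-residue truncation and '_' padding from the Approach paragraph. The vocabulary (padding symbol plus the 20 standard amino acids), the `UNKNOWN` index for non-standard residues, and all function names are hypothetical. The README does not say how precision and recall were averaged, so this sketch reports them for the positive (ATP-binding) class and averages F1 over both classes for the macro score.

```python
from typing import Dict, List, Sequence, Tuple

# --- Input encoding ---
MAX_LEN = 500  # truncation length stated in this README
PAD = "_"      # padding character stated in this README
# Assumption: vocabulary = padding symbol + the 20 standard amino acids.
ALPHABET = PAD + "ACDEFGHIKLMNPQRSTVWY"
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}  # PAD maps to 0
UNKNOWN = len(ALPHABET)  # hypothetical index for non-standard residues (e.g. X, U)


def truncate_and_pad(seq: str, max_len: int = MAX_LEN) -> str:
    """Cut the sequence at max_len residues; right-pad shorter ones with PAD."""
    return seq[:max_len].ljust(max_len, PAD)


def encode(seq: str, max_len: int = MAX_LEN) -> List[int]:
    """Map each residue of the fixed-length sequence to an integer index."""
    return [INDEX.get(ch, UNKNOWN) for ch in truncate_and_pad(seq.upper(), max_len)]


# --- Evaluation metrics for binary labels (1 = ATP binding) ---
def _prf(y_true: Sequence[int], y_pred: Sequence[int], positive: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 when `positive` is treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, positive-class precision/recall, and F1 averaged over both classes."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    precision, recall, f1_pos = _prf(y_true, y_pred, positive=1)
    _, _, f1_neg = _prf(y_true, y_pred, positive=0)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1_macro": (f1_pos + f1_neg) / 2,
    }


print(encode("MKV", max_len=5))
print(binary_metrics([1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 0, 1]))
```

Mapping the padding symbol to index 0 fits embedding layers that treat 0 as padding. For example, a Keras `Embedding(input_dim=len(ALPHABET) + 1, output_dim=10, mask_zero=True)` would cover every index above, including `UNKNOWN`, with the embedding dimension of 10 from the table.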