Vikasdubey0551/Sequential-analysis-ANN

Goal

Classification of protein function based on amino-acid sequence.

The protein function this project focuses on is ATP binding.

Protein phosphorylation

Data collection

Protein sequences and their functional annotations were scraped from biological databases, mainly UniProt.

The data-scrapes folder contains the sequences in FASTA format and the annotations of the various proteins.
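
To illustrate how FASTA records like those in data-scrapes could be read, here is a minimal sketch in plain Python. The function name `parse_fasta` and the example records are hypothetical and are not taken from the repository; the actual project may well use a library such as Biopython instead.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each FASTA header (without the leading '>') to its full sequence."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence data may span several lines; join them together.
            records[header] += line
    return records


# Hypothetical records, for illustration only.
example = [
    ">sp|P00001|EXAMPLE_1 hypothetical record",
    "MKTAYIAKQR",
    "QISFVKSHFS",
    ">sp|P00002|EXAMPLE_2 hypothetical record",
    "MSEQ",
]
records = parse_fasta(example)
for name, seq in records.items():
    print(name, len(seq))
```

To read a real file, the same function could be called as `parse_fasta(open(path))`, where `path` points at one of the FASTA files.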

Approach

The protein sequences were augmented after 500 residues. Sequences shorter than this were artificially padded with '_'. I used an Artificial Neural Network (ANN) to classify protein function; the configuration below was obtained after hyperparameter tuning:

| Method | Numbers |
| --- | --- |
| ANN | 2/3 layers |
| Embedding dim | 10 |
| Sequence length | 23 |
| Optimizer | Stochastic Gradient Descent (SGD) |
| Loss | Binary crossentropy |
| Nodes | 128 |
| Batch size | 128 |
| Learning Rate | 0.001 |
| Accuracy Score | 0.95 |
| Precision Score | 0.93 |
| Recall Score | 0.93 |
| F1-macro | 0.93 |
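
The preprocessing described in the Approach section can be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's code: it assumes that sequences longer than 500 residues are cut at 500, that shorter ones are right-padded with '_', and that each residue is then mapped to an integer index so it can feed an embedding layer. The alphabet, the index assignment, and the function names are all hypothetical.

```python
from typing import List

MAX_LEN = 500    # cut-off named in the text
PAD_CHAR = "_"   # padding symbol named in the text

# Assumption: the 20 standard amino acids; index 0 is reserved for padding.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
VOCAB = {PAD_CHAR: 0, **{aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}}
UNKNOWN = len(VOCAB)  # assumed index for any other symbol (e.g. X, B, Z)


def pad_or_truncate(seq: str, max_len: int = MAX_LEN) -> str:
    """Cut the sequence at max_len, or right-pad it with PAD_CHAR."""
    return seq[:max_len].ljust(max_len, PAD_CHAR)


def encode(seq: str, max_len: int = MAX_LEN) -> List[int]:
    """Turn a sequence into a fixed-length list of integer indices."""
    return [VOCAB.get(ch, UNKNOWN) for ch in pad_or_truncate(seq, max_len)]


print(pad_or_truncate("MKT", 5))
print(encode("ACX", 4))
```

Reserving index 0 for the padding symbol is a common convention because it lets a downstream model mask padded positions; whether the original model does so is not stated in the source.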

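For reference, the four scores in the table can be computed from binary predictions as in the sketch below. It assumes that label 1 means "ATP binding", that precision and recall are reported for that positive class, and that F1-macro is the unweighted mean of the per-class F1 scores. The function name `binary_metrics` and the toy labels are hypothetical and unrelated to the reported results.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, positive-class precision/recall, and macro-averaged F1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def f1(hit: int, false_alarm: int, miss: int) -> float:
        denom = 2 * hit + false_alarm + miss
        return 2 * hit / denom if denom else 0.0

    f1_pos = f1(tp, fp, fn)  # class 1 treated as the positive class
    f1_neg = f1(tn, fn, fp)  # class 0 treated as the positive class
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "f1_macro": (f1_pos + f1_neg) / 2,
    }


# Toy labels, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
scores = binary_metrics(y_true, y_pred)
print(scores)
```

Macro averaging weights both classes equally, which matters when ATP-binding and non-binding proteins are not equally represented in the data.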