Brief description of scripts

The scripts were developed as part of the study titled "Predicting structural susceptibility of proteins to proteolytic processing", which has been submitted to IJMS (International Journal of Molecular Sciences).

1_download_structure_with_sep_chain.py: a) to download experimentally verified structures from RCSB PDB or modelled structures from the AlphaFold Protein Structure Database; b) to extract a specific protein chain from the PDB file of the whole structure into a separate PDB file (experimental structures only).
2_del_ligands_pyv2.py: a) to remove ligands from the PDB file of a specific protein chain using Chimera.
3_parse_pdb.py: a) to extract B-factor values (one of the structural features); b) to map polypeptide sequence positions to structure positions.
3_parse_pdb_with_cuts.py: a) to extract B-factor values (one of the structural features); b) to map polypeptide sequence positions to structure positions; c) to map proteolytic cleavage sites from the sequence onto the structure.
4_create_dssp.py: a) to generate DSSP files from PDB files, either locally or remotely.
5_extract_features.py: a) to extract feature information from DSSP files (initial type of secondary structure, solvent accessibility and others); b) to generate ad hoc features (loop length, type of secondary structure, terminal regions) based on the initial type of secondary structure; c) to map structure information between DSSP files and PDB files.
6_norm_data.py: a) to normalise features of float type; b) to generate dummy variables from secondary structure information. As a rule, we apply normalisation only within the protein structure chain. While creating the training dataset, we applied two modes of normalisation: within the protein structure chain and within the whole dataset.
7_get_structural_score.py: a) to predict structural scores of proteolytic cleavage sites using our structural model.
ROC_AUC.py: a) to generate ROC curves and ROC AUC scores on the testing dataset; b) to compare our results with ProCleave results.
corr_proba.py: a) to plot PWM and structural scores for a specific protein substrate.
decision_boundary.py: a) to plot the decision boundary obtained when computing the total score.
dist_proba.py: a) to visualise the distribution of predicted scores.
evaluate_StrModel.py: a) to evaluate our model on the training dataset; b) to visualise the performance scores of our model.
evaluate_features.py: a) to estimate the predictive performance of specific structural features; b) to visualise the estimates.
filter_blastp.py: a) to filter BLASTp output by two conditions: 1) % identity >= 90; 2) % coverage >= 67.
map_cuts_pyv2.py: a) to visualise (map) proteolytic cleavage sites onto structures using Chimera.
map_scores_pyv2.py: a) to visualise (map) proteolytic cleavage scores onto structures using Chimera.
preprocess1_CutDB.py: a) to aggregate information about proteolytic cleavage sites from CutDB and the sequences of protein substrates.
preprocess2_CutDB.py: a) to present the proteolytic cleavage sites and the sequence of each protein substrate in tabular form.
save_model.py: a) to save the ML model.
statistics_*.py: a) to get summary statistics (the number of unique protein substrate IDs, unique structure IDs, proteolytic cleavage sites and proteases) at the different steps of creating the training dataset.
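
To illustrate the kind of B-factor extraction that 3_parse_pdb.py is described as doing, here is a minimal sketch. It is not the script itself: the function name `parse_ca_bfactors` is hypothetical, and taking one value per residue from the C-alpha atom is an assumption. The column positions follow the fixed-width ATOM record layout of the PDB format.

```python
def parse_ca_bfactors(pdb_lines, chain_id):
    """Return [(residue_number, residue_name, b_factor)] for one chain.

    Assumption: one B-factor per residue, read from the C-alpha atom.
    Alternate locations and multiple models are not handled here.
    """
    records = []
    for line in pdb_lines:
        # Only ATOM records long enough to hold the B-factor field.
        if not line.startswith("ATOM") or len(line) < 66:
            continue
        atom_name = line[12:16].strip()   # columns 13-16
        if line[21] != chain_id or atom_name != "CA":  # column 22 = chain ID
            continue
        res_name = line[17:20].strip()    # columns 18-20
        res_seq = int(line[22:26])        # columns 23-26
        b_factor = float(line[60:66])     # columns 61-66
        records.append((res_seq, res_name, b_factor))
    return records
```

Calling `parse_ca_bfactors(open(path), "A")` on a PDB file would give the per-residue values for chain A, with the residue numbers as they appear in the structure.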
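
The two normalisation modes mentioned for 6_norm_data.py (within a chain and within the whole dataset) can be sketched as below. The normalisation formula is not stated here, so z-score scaling is an assumption, and the names `zscore_within_groups` and `one_hot` and the `ss_` prefix are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def zscore_within_groups(values, groups):
    """Z-score each value against the mean and std of its own group.

    Pass chain IDs as `groups` for per-chain normalisation, or the same
    label for every row to normalise within the whole dataset.
    """
    by_group = defaultdict(list)
    for value, group in zip(values, groups):
        by_group[group].append(value)
    stats = {g: (mean(vs), pstdev(vs)) for g, vs in by_group.items()}
    scaled = []
    for value, group in zip(values, groups):
        mu, sigma = stats[group]
        # A constant feature within a group maps to 0.0 to avoid dividing by zero.
        scaled.append((value - mu) / sigma if sigma > 0 else 0.0)
    return scaled


def one_hot(labels):
    """Turn secondary structure labels into dummy (0/1) variables."""
    categories = sorted(set(labels))
    return [{f"ss_{c}": int(label == c) for c in categories} for label in labels]
```

With this layout, switching between the two modes only changes the `groups` argument, so the same feature table can be normalised both ways.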
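
For reference, the ROC AUC score that ROC_AUC.py reports can be computed directly from labels and scores. The sketch below uses the pairwise (Mann-Whitney) formulation: the fraction of positive/negative pairs in which the positive example scores higher, counting ties as half. The function name `roc_auc` is hypothetical, and the script may well use a library routine instead.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a positive outranks a negative.

    `labels` holds 1 for cleavage sites and 0 for non-sites (assumed
    encoding); `scores` holds the predicted scores in the same order.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    # Count pairwise wins; a tie contributes half a win.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))
```

This version compares every pair, so it is quadratic in the dataset size and is meant for checking results on small examples rather than for large test sets.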
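
The two thresholds listed for filter_blastp.py can be applied as in the sketch below. The input layout is an assumption: tab-separated lines with four columns (query ID, subject ID, % identity, % coverage), which is not the default BLAST tabular output. The names `COLUMNS` and `filter_blast_hits` are hypothetical.

```python
import csv

# Assumed column layout of the tab-separated BLASTp output.
COLUMNS = ("query_id", "subject_id", "pident", "coverage")


def filter_blast_hits(lines, min_identity=90.0, min_coverage=67.0):
    """Keep hits with % identity >= 90 and % coverage >= 67 (both inclusive)."""
    # Skip blank lines and comment lines before handing rows to the CSV reader.
    data_lines = (l for l in lines if l.strip() and not l.startswith("#"))
    reader = csv.DictReader(data_lines, fieldnames=COLUMNS, delimiter="\t")
    kept = []
    for row in reader:
        if (float(row["pident"]) >= min_identity
                and float(row["coverage"]) >= min_coverage):
            kept.append(row)
    return kept
```

Both comparisons are inclusive, so a hit with exactly 90% identity and 67% coverage is kept.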
