Refactoring of PDB_Prepare #2

discoleo · 2023-01-06T18:19:07Z

Function PDB_Prepare

This function would benefit a lot from refactoring:

split the sub-tasks into separate functions, which can be useful on their own;
reduce the complexity of the code;
optimize the code and remove the dependency on dplyr;
correct an ugly bug.

I will address all these topics below. The refactored code is also available on GitHub:

PDB_Prepare function: https://github.com/discoleo/R/blob/master/Chemistry/Proteins.Structure.FiScore.R
separate helper functions: https://github.com/discoleo/R/blob/master/Chemistry/Proteins.Structure.R

PDB_Prepare Main Function

The refactored main function is in file Proteins.Structure.FiScore.R (this is only a convenience name for my script files; the initial name can be retained).

most of the code has been moved to external helper functions;
the lower limit of aa (amino acids) can be set explicitly: default = 5;
the last for-loop should run more efficiently and the dependence on dplyr has also been removed;

Features

The features are extracted by the helper function features.pdb (see file Proteins.Structure.R). This function can be used on its own and should be exported by the package.

the function also uses the helper functions as.type.helix and as.type.sheet;
the structure name is stored as a factor (for efficiency) and therefore requires an explicit as.character() when used in the main function;
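
The need for as.character() can be illustrated with a small sketch. The vector and the lookup table below are hypothetical and are not taken from the refactored code; they only show what can go wrong when a factor is used where a character vector is expected:

```r
# Hypothetical structure names stored as a factor
# (levels are sorted alphabetically: helix = 1, sheet = 2).
type = factor(c("helix", "sheet", "helix"))

# Hypothetical lookup table, deliberately NOT in alphabetical order.
lookup = c(sheet = "E", helix = "H")

# Indexing with the factor uses its internal integer codes (1, 2, 1),
# so the values are picked by position, not by name.
by_code = lookup[type]

# Converting explicitly to character indexes by name, as intended.
by_name = lookup[as.character(type)]

print(unname(by_code))
print(unname(by_name))
```

Only the second form returns the value that matches each structure name; the first silently returns the values by position.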

Torsions & B-Factor

These are computed by separate functions (see file Proteins.Structure.R), which can also be used on their own.

string extraction: the vectorized version is used directly and should run far more efficiently, e.g.:
df_resno = as.numeric(stringr::str_extract(rownames(pdb_df), "[0-9]{1,}"));
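
As a rough illustration of the difference, the sketch below extracts the residue numbers once element by element and once in a single vectorized call. It uses base R (regexpr and regmatches) instead of stringr so that it runs without extra packages; the row names are made up and the real format may differ:

```r
# Hypothetical row names; the actual row names of pdb_df may look different.
rn = c("12.A.MET", "13.A.GLY", "105.B.SER", "x.B.ALA")

# Element-wise version: one regex call per row name.
resno_loop = numeric(length(rn))
for (i in seq_along(rn)) {
  m = regmatches(rn[i], regexpr("[0-9]{1,}", rn[i]))
  resno_loop[i] = if (length(m) == 0) NA else as.numeric(m)
}

# Vectorized version: a single regex call over the whole vector.
m = regexpr("[0-9]{1,}", rn)
resno_vec = rep(NA_real_, length(rn))
# regmatches() returns only the matched elements, hence the index m > 0.
resno_vec[m > 0] = as.numeric(regmatches(rn, m))

print(resno_vec)
```

Both versions give the same result, with NA where a row name contains no digits; stringr::str_extract behaves the same way for non-matching strings.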

An ugly bug was also corrected:

the torsions function now stores an attribute with the complete cases (as a logical vector):
attr(pdb_df, "complete") = isComplete;
the BFactor function explicitly uses this information to select only the complete cases;
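
The mechanism can be sketched as follows. The two functions are simplified, hypothetical stand-ins (torsions.demo and bfactor.demo are not the names used in Proteins.Structure.R) and the data are invented; only the attribute name "complete" is taken from the snippet above:

```r
# Hypothetical stand-in for the torsions helper: keeps only the complete
# rows and records which of the original rows were kept.
torsions.demo = function(x) {
  isComplete = complete.cases(x)
  res = x[isComplete, , drop = FALSE]
  # Set the attribute after subsetting, so that it is not dropped.
  attr(res, "complete") = isComplete
  return(res)
}

# Hypothetical stand-in for the B-factor helper: selects the values that
# correspond to the complete cases, so both results stay aligned.
bfactor.demo = function(b, tors) {
  isComplete = attr(tors, "complete")
  if (is.null(isComplete)) stop("Missing attribute: complete")
  if (length(isComplete) != length(b)) stop("Length mismatch")
  return(b[isComplete])
}

# Invented example data: only residue 2 has both angles.
tors = data.frame(phi = c(NA, -60, -120), psi = c(140, -45, NA))
b = c(10, 20, 30)

tors_ok = torsions.demo(tors)
b_ok = bfactor.demo(b, tors_ok)
```

Passing the logical vector along with the result avoids recomputing the selection in the second function, where it could silently fall out of sync with the first.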

Other

read.pdb is a minor helper function that is not actually used in the code;

The refactored code should be faster and more robust. The function names are provisional and may be changed or adapted to better suit various workflows.

Note:

the refactored code has NOT been thoroughly tested!
