Improved Protein Relative Solvent Accessibility Prediction Using Deep Multi-View Feature Learning Framework.
- Linux system
- python3.7
- pytorch (version 1.3.1)
- HHblits
- uniclust30_2018_08
- blast-2.2.26
- nr
- PSIPRED (version 3.2)
*Install and configure Python3, Java, PyTorch, HHblits, uniclust30_2018_08, blast+, nr, the ProtChain database, and PSIPRED on your Linux system. Please make sure that python3 includes the modules 'os', 'math', 'numpy-0.47', 'configparser', 'numba', 'random', 'subprocess', 'sys', and 'shutil'. If any module is missing, install it with the 'pip install xxx' command, where "xxx" is the module name.
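A quick way to check the module list above before running anything is sketched below. It is only an illustration: the helper name `find_missing_modules` is hypothetical and not part of DMVFL-RSA, and the check only confirms that a module can be located, not that it has a particular version.

```python
import importlib.util

# Import names taken from the module list in this README.
REQUIRED_MODULES = [
    "os", "math", "numpy", "configparser", "numba",
    "random", "subprocess", "sys", "shutil",
]


def find_missing_modules(names):
    """Return the names that the import system cannot locate (hypothetical helper)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = find_missing_modules(REQUIRED_MODULES)
if missing:
    # Each missing module can then be installed with 'pip install xxx'.
    print("Missing modules: " + ", ".join(missing))
else:
    print("All required modules are available.")
```

Standard-library modules such as 'os' and 'sys' should always be found; in practice only third-party modules like 'numpy' and 'numba' are likely to be reported.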
*Download this repository at (705,644KB). Then uncompress it and run the following commands on your Linux system.
$ jar xvf
$ chmod -R 777 ./DMVFL-RSA-main
$ cd ./DMVFL-RSA-main
$ java -jar ./Util/FileUnion.jar ./save_model/ ./
$ rm -rf ./save_model
$ unzip
$ cd ./Util
$ java -jar ./FileUnion.jar ./database/ ./
$ rm -rf ./database
$ unzip
$ cd ../
Here, you will see two configuration files.
*Configure the following tools or databases in
The file "" should be set as follows:
- HHblits
- uniclust30_2018_08
- blast-2.2.26
- nr
- ProtChain
For example:
# Generate PSSM PSS config path
# Generate RPRSA config
*Configure the following tools or databases in DMVFL-RSA.config
The file "DMVFL-RSA.config" should be set as follows:
- HHblits
- uniclust30_2018_08
For example:
HHBLITS_EXE = hhblits
HHBLITS_DB = /data/commonuser/library/uniclust30_2018_08/uniclust30_2018_08
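The two settings above can be sanity-checked before a run. The sketch below assumes the config file consists of plain `KEY = value` lines with optional `#` comments, as in the example; the functions `read_config` and `check_hhblits_settings` are hypothetical helpers and are not how DMVFL-RSA itself reads the file.

```python
import os
import shutil


def read_config(text):
    """Parse 'KEY = value' lines; blank lines and '#' comments are skipped."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def check_hhblits_settings(settings):
    """Return a list of problems found; an empty list means nothing obvious is wrong."""
    problems = []
    exe = settings.get("HHBLITS_EXE")
    if not exe:
        problems.append("HHBLITS_EXE is not set")
    elif shutil.which(exe) is None:
        problems.append("HHBLITS_EXE not found or not executable: " + exe)
    db = settings.get("HHBLITS_DB")
    if not db:
        problems.append("HHBLITS_DB is not set")
    else:
        # HHBLITS_DB is a file-name prefix, so only its directory is checked.
        db_dir = os.path.dirname(db) or "."
        if not os.path.isdir(db_dir):
            problems.append("directory of HHBLITS_DB does not exist: " + db_dir)
    return problems
```

For instance, `check_hhblits_settings(read_config(open("DMVFL-RSA.config").read()))` would list any setting that is missing or points to a location that does not exist on your machine.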
For example:
Brief introduction for protein solvent accessibility prediction by DMVFL-RSA
Step 0. Generate an MSA (in a3m format) for your protein sequence with HHblits.
Step 1. Generate one PSFM profile from the MSA.
Step 2. Generate one PSSM profile and one PSS profile for your protein sequence with blast+ and PSIPRED.
Step 3. Generate one RPRSA profile for your protein sequence from TBP.
Step 4. "protein name +.rsa" is the result file.
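To make Step 1 more concrete, the sketch below shows one common way to turn an a3m alignment into a position-specific frequency matrix: insertion columns (lowercase letters) are dropped and the 20 standard amino acids are counted per column. This is an assumption-laden illustration, not the DMVFL-RSA implementation; the function names are hypothetical, and the real pipeline may apply sequence weighting or handle gaps differently.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def read_a3m(text):
    """Return the aligned sequences of an A3M text with insertion columns removed."""
    sequences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current:
                sequences.append("".join(current))
            current = []
        else:
            # Lowercase letters are insertions relative to the query; drop them.
            current.append("".join(ch for ch in line if not ch.islower()))
    if current:
        sequences.append("".join(current))
    return sequences


def compute_psfm(sequences):
    """Return an L x 20 list of unweighted per-column amino-acid frequencies."""
    if not sequences:
        raise ValueError("the alignment is empty")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("aligned sequences differ in length")
    psfm = []
    for col in range(length):
        counts = dict.fromkeys(AMINO_ACIDS, 0)
        for seq in sequences:
            residue = seq[col]
            if residue in counts:  # gaps and non-standard letters are ignored
                counts[residue] += 1
        total = sum(counts.values())
        psfm.append([counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS])
    return psfm
```

Each row of the returned matrix corresponds to one residue of the query sequence, and each column to one amino acid in the order given by `AMINO_ACIDS`.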
*The predicted solvent accessibility of each residue can be found in the output file, i.e., "protein name +.rsa". In each result file, "NO" is the position of each residue in your protein, "AA" is the name of each residue, "RSA" is the predicted relative accessible surface area of each residue, and "ASA" is the predicted accessible surface area of each residue.
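A result file with these four columns could be read as sketched below. The exact layout of the ".rsa" file is not specified here, so the code assumes whitespace-separated columns in the order NO, AA, RSA, ASA with an optional header line, and assumes RSA is reported on a 0 to 1 scale. The function names and the 0.25 exposure threshold are illustrative choices, not part of DMVFL-RSA.

```python
def parse_rsa(text):
    """Parse rows of 'NO AA RSA ASA'; lines not starting with a residue number are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4 or not fields[0].isdigit():
            continue
        rows.append({
            "NO": int(fields[0]),
            "AA": fields[1],
            "RSA": float(fields[2]),
            "ASA": float(fields[3]),
        })
    return rows


def exposed_positions(rows, threshold=0.25):
    """Return positions whose RSA is at or above the (assumed) exposure threshold."""
    return [row["NO"] for row in rows if row["RSA"] >= threshold]


# Made-up sample content, only to show the assumed layout.
sample = """NO AA RSA ASA
1 M 0.62 120.5
2 K 0.10 21.3
"""
rows = parse_rsa(sample)
print(exposed_positions(rows))
```

If your result files use a different separator or report RSA as a percentage, adjust the parsing and the threshold accordingly.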
First release 2021-08-03 First release 2021-10-20
[1] Xue-Qiang Fan, Jun Hu*, Ning-Xin Jia, Dong-Jun Yu*, and Gui-Jun Zhang*. Improved Protein Relative Solvent Accessibility Prediction Using Deep Multi-View Feature Learning Framework. Analytical Biochemistry. Submitted.