TMCrys was developed to help target selection in structural genomics projects. It predicts the propensity of success of the solubilization, purification and crystallization steps of the crystallization process, as well as of the whole process.
If you find TMCrys useful, please cite:
Julia K. Varga and Gábor E. Tusnády
TMCrys: predict propensity of success for transmembrane protein crystallization
Bioinformatics, bty342
https://doi.org/10.1093/bioinformatics/bty342
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
To install the required packages and modules, copy and paste the commands below.
TMCrys was developed with R v3.4.1 and Perl v5.18.2. Earlier versions may not work properly.
R packages (run from an R shell):
install.packages(c("xgboost", "caret", "docopt", "protr"), repos = "http://cran.rstudio.com/")
Perl modules (run from a terminal):
cpan install XML::LibXML
cpan install Bio::Tools::Protparam
cpan install Getopt::Std
cpan install Statistics::R
Depending on your system, you may need to run these commands as root by prefixing them with sudo -i.
You will also need a modified version of the OB module (used for OB-score calculation); it is downloaded together with TMCrys into the tools directory. Please do not remove it or the data file belonging to it (data/zmat.dat).
TMCrys requires an installed copy of BioPerl (http://bioperl.org/INSTALL.html) to run properly. BioPerl can also be installed while installing Bio::Tools::Protparam, when the installer asks whether to install all modules.
Download or clone the git repository from https://github.com/brgenzim/tmcrys/.
git clone https://github.com/brgenzim/tmcrys/
If downloaded as a compressed file, please uncompress it to a folder.
Set the TMCRYS environment variable with
export TMCRYS=/path/to/tmcrys/folder
To set it in every new shell, add this line to ~/.bashrc, ~/.profile or ~/.bash_profile, depending on your system settings.
To make it permanent system-wide, add TMCRYS=/path/to/tmcrys/folder to /etc/environment.
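To check that the variable is set in the current shell, the hypothetical helper below (not part of TMCrys) verifies that TMCRYS is non-empty, points to a directory, and that this directory contains an executable named tmcrys. It is a sketch that only inspects the filesystem; it does not run TMCrys.

```shell
# Hypothetical helper (not part of TMCrys): sanity-check the TMCRYS variable.
check_tmcrys_env() {
  if [ -z "${TMCRYS:-}" ]; then
    echo "TMCRYS is not set; run: export TMCRYS=/path/to/tmcrys/folder" >&2
    return 1
  fi
  if [ ! -d "$TMCRYS" ]; then
    echo "TMCRYS points to '$TMCRYS', which is not a directory" >&2
    return 1
  fi
  if [ ! -x "$TMCRYS/tmcrys" ]; then
    echo "no executable 'tmcrys' found in $TMCRYS" >&2
    return 1
  fi
  echo "TMCRYS=$TMCRYS looks usable"
}
```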
After installing all modules and packages, run:
cd $TMCRYS
./tmcrys --test
If the output ends with 'Test ok', the installation was successful.
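For scripted installations, the check above can be automated. The sketch below defines a hypothetical function, tmcrys_test_passed (not part of TMCrys), that reads the self-test output on standard input, ignores blank lines and trailing whitespace, and succeeds only if the last line ends with 'Test ok'. Whether ./tmcrys --test writes to stdout or stderr is not stated here, so the usage example merges both streams.

```shell
# Hypothetical helper (not part of TMCrys): succeed if the text on stdin
# ends with 'Test ok'.
tmcrys_test_passed() {
  local last_line
  # Drop blank lines and trailing whitespace (including \r), keep the last line.
  last_line=$(grep -v '^[[:space:]]*$' | sed 's/[[:space:]]*$//' | tail -n 1)
  case "$last_line" in
    *"Test ok") return 0 ;;
    *) echo "self-test did not end with 'Test ok': $last_line" >&2; return 1 ;;
  esac
}

# Usage:
#   cd "$TMCRYS" && ./tmcrys --test 2>&1 | tmcrys_test_passed && echo "installation OK"
```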
To run TMCrys you will need:
- The sequence and topology of the transmembrane protein(s). There are several input options:
  - A single CCTOP result file containing one CCTOP entry. Use the -i <CCTOPFILE> option.
  - A directory of CCTOP files. Use the -d <CCTOPDIR> option. In this case, also provide a project name with the --name NAME option.
  - A space-delimited file in which each line has the form 'proteinID sequence topology', where the topology is a string in the same format as in the test/test.txt file. Use the -s <DELIMITEDFILE> option.
  You can predict the topology of your protein with CCTOP at http://cctop.enzim.ttk.mta.hu. For multiple proteins, a Python script is available at http://cctop.enzim.ttk.mta.hu/?_=/documents/direct_interface.html.
- NetSurfP result (.rsa) files. Provide them with the -n <NETSURFPFILE> option; a file may contain results for one or multiple proteins. NetSurfP can be run online or downloaded from http://www.cbs.dtu.dk/services/NetSurfP/.
- A working directory, specified with the --wd <DIR> option.
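Before passing a space-delimited file with -s, it can help to check its layout. The sketch below defines a hypothetical function, validate_tmcrys_input (not part of TMCrys), that reports lines without exactly three fields, sequences containing non-letter characters, and lines whose sequence and topology lengths differ. The length check assumes that the topology string has one character per residue; please confirm this against test/test.txt before relying on it.

```shell
# Hypothetical validator (not part of TMCrys) for 'proteinID sequence topology' files.
# ASSUMPTION: the topology string has one character per residue.
validate_tmcrys_input() {
  awk '
    NF == 0 { next }    # skip blank lines
    NF != 3 {
      printf "line %d: expected 3 fields (proteinID sequence topology), found %d\n", NR, NF
      bad = 1; next
    }
    $2 !~ /^[A-Za-z]+$/ {
      printf "line %d (%s): sequence contains non-letter characters\n", NR, $1
      bad = 1
    }
    length($2) != length($3) {
      printf "line %d (%s): sequence length %d differs from topology length %d\n", NR, $1, length($2), length($3)
      bad = 1
    }
    END { exit bad }    # non-zero exit status if any problem was found
  ' "$1"
}
```

Files written on Windows may carry a trailing carriage return on each line, which this check would count as an extra topology character.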
Example files for all of these inputs are included in the ./test folder.
To run TMCrys, type:
cd $TMCRYS
./tmcrys (-i <CCTOPFILE> | -d <CCTOPDIR> | -s <DELIMITEDFILE>) -n <NETSURFPFILE> --wd <DIR>
Help for every script is available by running it with -h or --help, or with no arguments.
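When processing a directory of CCTOP files, a small wrapper can catch missing inputs before TMCrys starts. The function below, run_tmcrys_dir, is a hypothetical sketch (not part of TMCrys) that uses only the -d, --name, -n and --wd options from the command above. Because the documented command is run from inside $TMCRYS, the wrapper converts the input paths to absolute paths first and creates the working directory if it does not exist; whether tmcrys would create it itself is not stated here.

```shell
# Hypothetical wrapper (not part of TMCrys): validate inputs, then run tmcrys
# on a directory of CCTOP files.
run_tmcrys_dir() {
  local cctop_dir ns_file work_dir
  if [ "$#" -ne 4 ]; then
    echo "usage: run_tmcrys_dir CCTOPDIR NETSURFPFILE DIR NAME" >&2
    return 2
  fi
  if [ -z "${TMCRYS:-}" ]; then
    echo "TMCRYS is not set" >&2
    return 1
  fi
  # Resolve absolute paths, because tmcrys is started from inside $TMCRYS.
  cctop_dir=$(cd "$1" >/dev/null 2>&1 && pwd) || {
    echo "CCTOP directory not found: $1" >&2; return 1; }
  if [ ! -f "$2" ]; then
    echo "NetSurfP file not found: $2" >&2
    return 1
  fi
  ns_file="$(cd "$(dirname "$2")" >/dev/null && pwd)/$(basename "$2")"
  mkdir -p "$3" || return 1
  work_dir=$(cd "$3" >/dev/null && pwd)
  # Same options as in the documented command: -d, --name, -n and --wd.
  (cd "$TMCRYS" && ./tmcrys -d "$cctop_dir" --name "$4" -n "$ns_file" --wd "$work_dir")
}

# Usage (paths are placeholders):
#   run_tmcrys_dir my_cctop_dir my_netsurfp.rsa my_workdir my_project
```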
The TMCrys result file contains the propensity of success for every step (pr1, pr2, pr3) and for the whole process (prw), together with the final predictions (pred1, pred2, pred3, predw) based on the thresholds described in the paper.
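To pick out proteins by their whole-process prediction, the result table can be filtered by column name. The sketch below defines a hypothetical function, tmcrys_select (not part of TMCrys), and rests on assumptions about the file layout that are not stated above: the file is whitespace-delimited, its first line is a header containing a predw column, the first column holds the protein ID, and a predw value of 1 marks a predicted success. Double quotes, as written by R with quote = TRUE, are stripped. If the header has one field fewer than the data rows (R row names), the column index needs adjusting.

```shell
# Hypothetical filter (not part of TMCrys): print the IDs of proteins whose
# predw value equals $2 (default 1). ASSUMES a whitespace-delimited table with
# a header row containing 'predw' and protein IDs in the first column.
tmcrys_select() {
  awk -v want="${2:-1}" '
    # Remove double quotes, e.g. from tables written by R with quote = TRUE.
    function unquote(s) { gsub(/"/, "", s); return s }
    NR == 1 {
      for (i = 1; i <= NF; i++) if (unquote($i) == "predw") col = i
      if (!col) { print "no predw column in header" > "/dev/stderr"; exit 2 }
      next
    }
    unquote($col) == want { print unquote($1) }
  ' "$1"
}

# Usage (file name is a placeholder):
#   tmcrys_select my_results.txt      # proteins with predw = 1
#   tmcrys_select my_results.txt 0    # proteins with predw = 0
```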
Julia K. Varga
Gábor E. Tusnády
If you encounter any problems, please feel free to open an issue or contact: tmcrys@ttk.mta.hu
This project is licensed under the GNU License - see the LICENSE.md file for details.
- Dobson,L., Reményi,I. and Tusnády,G.E. (2015) CCTOP: a Consensus Constrained TOPology prediction web server. Nucleic Acids Res., 43, W408–W412.
- Xiao,N., Cao,D.-S., Zhu,M.-F. and Xu,Q.-S. (2015) protr/ProtrWeb: R package and web server for generating various numerical representation schemes of protein sequences. Bioinformatics, 31, 1857–1859.
- Walker,J.M. ed. (2005) The Proteomics Protocols Handbook. Humana Press, Totowa, NJ.
- Overton,I.M. and Barton,G.J. (2006) A normalised scale for structural genomics target ranking: The OB-Score. FEBS Lett., 580, 4005–4009.
- Petersen,B., Petersen,T.N., Andersen,P., Nielsen,M. and Lundegaard,C. (2009) A generic method for assignment of reliability scores applied to solvent accessibility predictions. BMC Struct. Biol., 9, 51.
- Chen,T., He,T., Benesty,M., Khotilovich,V. and Tang,Y. (2017) xgboost: Extreme Gradient Boosting.
- Kuhn,M. et al. (2017) caret: Classification and Regression Training.
- Kawashima,S., Ogata,H. and Kanehisa,M. (1999) AAindex: Amino Acid Index Database. Nucleic Acids Res., 27, 368–369.
- Yan,Y. (2016) rBayesianOptimization: Bayesian Optimization of Hyperparameters.