Postic G., Hamelryck T., Chomilier J., Stratmann D.
Type 'make' in the terminal. This will create executable binaries named 'scoring' and 'training'.
Run each program without any argument (or with -h option).
$ ./training -l example/list1.txt -d example/dataset/ -o myPotentials
This will create a statistical potential for each residue pair represented by the carbons alpha (n=210; *.nrg files).
The 'myPotentials/' output directory will also contain 3 Tab-Separated Values (.tsv) files with some statistics about the training dataset:
- the atomic pairs ranked by their lowest energy peaks (top_energies.tsv);
- the atomic pairs ranked by their frequencies (top_occurrences.tsv);
- the 100 shortest distances (top_distances.tsv).
Note: The same results can be obtained with the following command:
$ ./training -L example/list3.txt -d example/dataset/ -o myPotentials
Unlike the -l argument, -L does not require using a list of native protein structures (i.e. a list of PDB codes). This allows using a set of decoys as an input (each having any type of filename).
$ ./scoring -i example/dataset/1BKR.pdb -d myPotentials/
This will calculate the pseudo-energy of the structure 1BKR by using the previously computed potentials.
$ ./training -l example/list1.txt -d example/dataset/ -o myPotentials -r CB -p -g
This will create statistical potentials, with residues represented by their carbons beta (-r CB) Each potential will be plotted as a SVG file (-p). This interatomic squared distances used for the calculations are written into *.dat files (-g).
Note: Any previously created 'myPotentials/parameters.log' file will be overwritten.
$ ./scoring -i example/dataset/1BKR.pdb -d myPotentials/ -c -p -w -o myResults
The pseudo-energy of 1BKR will be calculated with cubic-interpolated potentials (-c). These interpolated potentials will be plotted as SVG files (-p). Two TSV files will be written (-w):
- the pseudo-energy and distance for each atomic pair (data.tsv);
- the pseudo-energy for each residue of the protein sequence (energy_[WINDOW_SIZE].tsv). All these data are written into 'myResults' directory (-o myResults).
Notes:
- the default representation is now CB (carbons beta), as defined in 'myPotentials/parameters.log';
- the WINDOW_SIZE is defined as in https://prosa.services.came.sbg.ac.at/prosa_help.html
$ ./training -l example/list1.txt -d example/dataset/ -o myPotentials -k e -b SJ-dpi -p
Same training as case#1 but with Kernel Density Estimations (KDE) Here, we use an Epanechnikov kernel (-k e), and the kernel bandwidth is selected with the Sheather-Jones direct plug-in (-b SJ-dpi) method. Each potential will be plotted as a SVG file (-p).
$ ./scoring -i example/dataset/1BKR.pdb -d myPotentials/ -q 10A,11A,12A,13A,14A,15A,16A,17A,18A,19A,20A -z -s 2000
Only the residues 10A to 20A of 1BKR will be processed (-q). A Z-score will be computed to evaluate the absolute structural quality (-z); the more negative, the better the model. This Z-score will be computed on 2000 random sequence decoys (-s 2000).
After any training:
$ ./scoring -l example/list2.txt -d myPotentials/
Multiple inputs: a pseudo-energy will be calculated for each of the 25 structures of the 'example/list2.txt'. The chain name is provided for 2 structures in this list. By default, all chains found will be processed.
$ ./training -l example/list1.txt -d example/dataset/ -o myRefState -r allatom -W
This trains the reference state separately (-W) on all atoms (-r allatom). A 'frequencies.ref' file is created, which can then be used (-R) to train a statistical potential.
$ ./training -l example/list1.txt -d example/dataset/ -o myPotentials -R myRefState/frequencies.ref -r backbone
Thus, the observed frequencies are trained on backbones, while the reference state is trained on all atoms.
Contact: guillaume.postic@u-paris.fr