- Improved README.md
- Phenix dependency removed
- PanDDA defaults to using present cifs with no rebuilding
- Cleaner logs
- Improved ranking for unbuildable events
- Memory performance optimizations
- Which filters to apply to datasets can be configured with --data_quality_filters and --reference_comparability_filters
PanDDA 2 is still in development and feedback is always appreciated!
If you have a problem with installation, then it is most likely specific to your system and best to just email me.
If the program errors while it runs, it is most helpful if you include the command line output and the json log in a GitHub issue.
If you uncertain about the correctness of the results, then a GitHub issue is appropriate if you can share information publicly, in particular screenshots of maps or ligand fits. If you cannot, then an email to me is the best way to raise the concerns. Either way, please include the program output, the json log and screenshots of the offending z maps/event maps/autobuilds.
It is recommended that you install PanDDA 2 in its own python 3.8 anaconda environment. If you do not have an Anaconda enviroment you can install one by following the instructions at https://www.anaconda.com/products/distribution#linux.
Then:
conda create -n pandda2 python=3.9
conda activate pandda2
conda install -c conda-forge -y fire numpy scipy joblib scikit-learn umap-learn bokeh dask dask-jobqueue hdbscan matplotlib rich seaborn rdkit openbabel
pip install ray
git clone https://github.com/ConorFWild/pandda_2_gemmi.git
cd pandda_2_gemmi
pip install -e .
cd _gemmi
pip install .
pip install numpy==1.21.0
Installing PanDDA 2 this way will add various scripts to your path, but only while you are in this anaconda environment.
PanDDA 2 supports the autobuilding of events and ranking them by autobuildability. All one needs to do is ensure that BUSTER is set up in their path (and hence ana_pdbmaps and rhofit).
Once you have installed PanDDA 2 in a conda environment, it can be run from that enviroment with autobuilding and automated ground state identification with the following:
python /path/to/analyse.py --data_dirs=<data directories> --out_dir=<output directory> --pdb_regex=<pdb regex> --mtz_regex=<mtz regex> <options>
If you want to run the lightest possible PanDDA (no clustering of datasets, no autobuilding, ect: basically PanDDA 1), then a command like the following is appropriate:
python /path/to/analyse.py --data_dirs=<data directories> --out_dir=<output directory> --pdb_regex=<pdb regex> --mtz_regex=<mtz regex> --autobuild=False --rank_method="size" --comparison_strategy="high_res_first" <options>
It is strongly recommended that if you are qsub'ing a script that will run PanDDA 2 you set up your environment on the head node (by activating the anaconda environment in which PanDDA 2 is installed) and use the "-V" option on qsub to copy your current environment to the job.
An example of how to run with distributed computing at Diamond Light Source is as follows:
# Ensuring availability of Global Phasing code for autobuilding and phenix for building cifs
module load ccp4
module load buster
# Put the following in the file submit.sh
python /path/to/analyse.py --data_dirs=<data dirs> --out_dir=<output dirs> --pdb_regex=<pdb regex> --mtz_regex=<mtz regex> --global_processing="distributed" <options>
# Submitting
chmod 777 submit.sh
qsub -V -o submit.o -e submit.e -q medium.q -pe smp 12 -l m_mem_free=15G submit.sh
PanDDA 2 differs from PanDDA 1 in two major methodological ways.
Firstly, it attempts to identify which sets of datasets should be used to produce a statistical model which has the optimal contrast (to each test dataset individually). This allows it to handle subtle heterogeneity between datasets.
Secondly it attempts to autobuild the events returned, and then rank the events based on the quality of the model of the fragment that could be constructed. This allows improved rankings of events.
PanDDA 2 selects the datasets to construct each statistical model it tries by identifying the sets of datasets which are closest to each other. This is achieved by:
- Finding all pairwise distances between electron density maps
- Finding the nearest neighbours of each dataset's electron density map
- Finding the non-overlapping neighbourhoods with the minimum standard deviation between the maps they contain.
Currently PanDDA 2 only chooses one statistical model to progress to event map generation and autobuilding. This is done by fitting several conformations of the expected fragment into each event from each statistical by differential evolution.
The statistical which has the best fragment fit is then selected for progression to event map generation and autobuilding.
PanDDA 2 autobuilds the events of each dataset. This is done with a custom Rhofit invocation to account for the differing distribution of values between event maps and normal crystallographic 2Fo-Fc maps.
Current limitations with the interaction between pandda.inspect and PanDDA 2 mean that it is only possible to show one autobuild in the GUI.
Therefore, the highest scoring autobuild by RSCC from any event in each dataset is selected and included in the initial model pandda.inspect shows the user.
This has the effect that users may open apparently good hit density with no autobuild present, if another hit which is better fit by the autobuilding is present in the same dataset.