Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New qfit ligand #426

Merged
merged 35 commits into from
May 15, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
35 commits
Select commit Hold shift + click to select a range
67a59a3
Update qfit_ligand.py
jessicaflowers Apr 23, 2024
6bd862e
Update qfit.py
jessicaflowers Apr 23, 2024
541e8dc
add self.sample_b()
jessicaflowers Apr 23, 2024
fc0b889
update requirements.txt
jessicaflowers Apr 29, 2024
d0ebaf6
Update environment.yml
jessicaflowers Apr 29, 2024
b60b685
Update setup.py
jessicaflowers Apr 29, 2024
5998ef1
Update test_qfit_ligand.py
jessicaflowers Apr 29, 2024
c26f60c
Update test_qfit_ligand.py
jessicaflowers Apr 29, 2024
0c96bf9
Update qfit_ligand.py
jessicaflowers Apr 29, 2024
43ffb31
Update qfit.py
jessicaflowers Apr 29, 2024
8d83242
Update qfit_ligand.py
jessicaflowers Apr 29, 2024
2da7b3a
Update qfit.py
jessicaflowers Apr 29, 2024
b2511a5
remove biopython from setup.py
jessicaflowers Apr 29, 2024
1af3258
remove biopython from requirements.txt
jessicaflowers Apr 29, 2024
a1cd2ea
remove biopython from environment.yml
jessicaflowers Apr 29, 2024
88305ee
remove biopython from qfit.py
jessicaflowers Apr 29, 2024
ce48b11
Add try/except OSError when writing the ligand pdb file in qfit_ligan…
jessicaflowers Apr 29, 2024
52242d7
Create qfit_final_refine_ligand.sh
jessicaflowers Apr 29, 2024
daa0aa5
add qfit_final_refine_ligand.sh to setup.py
jessicaflowers Apr 29, 2024
454d734
Add qfit-ligand to README.md
jessicaflowers Apr 29, 2024
df1f82d
Update README.md
jessicaflowers Apr 29, 2024
2f22bc0
Update example README.md
jessicaflowers Apr 29, 2024
5610494
remove old qfit code from qfit.py
jessicaflowers Apr 29, 2024
d6401fd
update to match dev
Apr 29, 2024
7851ca9
Merge branch 'dev' into new_qfit_ligand
Apr 29, 2024
c4f6eec
Update qfit_final_refine_ligand.sh
jessicaflowers May 1, 2024
133006c
update input requirements
May 2, 2024
976a1c4
change print -> logger.info/debug
May 2, 2024
e3a2e3c
clean up comments, update rot/trans functions in qfit.py
jessicaflowers May 2, 2024
6f9f454
Update qfit_ligand.py
jessicaflowers May 2, 2024
47e0d02
remove old arguments qfit_ligand.py
jessicaflowers May 2, 2024
fc7b40c
add logger errors qfit.py
jessicaflowers May 2, 2024
3638138
typo in logger error qfit.py
jessicaflowers May 3, 2024
975a574
Update qfit_ligand.py
May 6, 2024
13a4ab1
update for mac compatability
May 15, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
6 changes: 2 additions & 4 deletions .github/workflows/main.yml
Original file line number Diff line number Diff line change
Expand Up @@ -17,14 +17,12 @@ jobs:
strategy:
fail-fast: false
matrix:
os: ['ubuntu-latest', 'macos-latest', 'windows-latest']
os: ['ubuntu-latest', 'macos-13', 'windows-latest']
python-version: ['3.9', '3.10']
steps:
- uses: actions/checkout@v2
- uses: conda-incubator/setup-miniconda@v2
- uses: conda-incubator/setup-miniconda@v3
with:
miniconda-version: ${{ runner.os == 'macOS' && 'latest' || null }}
auto-update-conda: true
channels: anaconda,ibmdecisionoptimization
auto-activate-base: false
python-version: ${{ matrix.python-version }}
Expand Down
20 changes: 20 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@ qFit is a collection of programs for modeling multi-conformer protein structures

Electron density maps obtained from high-resolution X-ray diffraction data are a spatial and temporal average of all conformations within the crystal. qFit evaluates an extremely large number of combinations of sidechain conformers, backbone fragments and small-molecule ligands to locally explain the electron density.


## Installation

1) Download qFit
Expand Down Expand Up @@ -96,6 +97,25 @@ After *multiconformer_model2.pdb* has been generated, refine this model using:

More advanced features of qFit (modeling single residue, more advanced options, and further explainations) are explained in [TUTORIAL](example/TUTORIAL.md).

To model alternate conformations of ligands using qFit, execute the following command:

`qfit_ligand [COMPOSITE_OMIT_MAP_FILE] [PDB_FILE] -l [LABELS] [SELECTION] -sm [SMILES]`

This command facilitates the incorporation of alternate ligand conformations into your protein model. The results are outputted to two files: *multiconformer_ligand_bound_with_protein.pdb*, which is the multiconformer model of the protein-ligand complex, and *multiconformer_ligand_only.pdb*, which is the multiconformer model of the ligand alone.

After running qFit-ligand, it is recommended to perform a final refinement using the script found in [scripts](scripts/post). Run this in the same directory as your models.

If you wish to specify the number of ligand conformers for qFit to sample, use the flag `-nc [NUM_CONFS]`. The default number is set to 10,000.

Using the example 4MS6:

`qfit_ligand example/qfit_ligand_example/4ms6_composit_map.mtz example/qfit_ligand_example/4ms6.pdb -l 2FOFCWT,PH2FOFCWT A,702 -sm 'C1C[C@H](NC1)C(=O)CCC(=O)N2CCC[C@H]2C(=O)O' -nc 10000`

To refine *multiconformer_ligand_bound_with_protein.pdb*, use the following command

`qfit_final_refine_ligand.sh 4ms6.mtz`


## Citations
If you use this software, please cite:
- [Wankowicz SA, Ravikumar A, Sharma S, Riley BT, Raju A, Hogan DW, van den Bedem H, Keedy DA, & Fraser JS. Uncovering Protein Ensembles: Automated Multiconformer Model Building for X-ray Crystallography and Cryo-EM. bioRxiv. (2023).](https://www.biorxiv.org/content/10.1101/2023.06.28.546963v2.abstract)
Expand Down
1 change: 1 addition & 0 deletions environment.yml
Original file line number Diff line number Diff line change
Expand Up @@ -13,3 +13,4 @@ dependencies:
- pyparsing>=2.2.0
- cvxopt
- tqdm
- rdkit
14 changes: 11 additions & 3 deletions example/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -86,11 +86,19 @@ Using the example 3K0N, spawning 30 parallel processes:
To model alternate conformers of ligands, the command line tool *qfit_ligand*
should be used:

`qfit_ligand [COMPOSITE_OMIT_MAP_FILE] -l [LABEL] [PDB_FILE] [CHAIN,LIGAND]`
`qfit_ligand [COMPOSITE_OMIT_MAP_FILE] -l [LABEL] [PDB_FILE] [CHAIN,LIGAND] -sm [SMILES]`

Where *LIGAND* corresponds to the numeric identifier of the ligand on the PDB
(aka res. number). The main output file is named *multiconformer_model_[CHAIN]_[LIGAND].pdb*
(aka res. number). The main output file is named *multiconformer_ligand_bound_with_protein.pdb*


If you wish to specify the number of ligand conformers for qFit to sample, use the flag `-nc [NUM_CONFS]`. The default number is set to 10,000.

Using the example 4MS6:

`qfit_ligand qfit_ligand_example/4ms6_composite_map.mtz -l 2FOFCWT,PH2FOFCWT qfit_ligand_example/4ms6.pdb A,702`
`qfit_ligand qfit_ligand_example/4ms6_composite_map.mtz -l 2FOFCWT,PH2FOFCWT qfit_ligand_example/4ms6.pdb A,702 -sm 'C1C[C@H](NC1)C(=O)CCC(=O)N2CCC[C@H]2C(=O)O' -nc 10000`


To refine *multiconformer_ligand_bound_with_protein.pdb*, use the following command

`qfit_final_refine_ligand.sh 4ms6.mtz`
1 change: 1 addition & 0 deletions requirements.txt
Original file line number Diff line number Diff line change
Expand Up @@ -3,3 +3,4 @@ cvxpy
pyscipopt
scipy
scikit-learn
rdkit
173 changes: 173 additions & 0 deletions scripts/post/qfit_final_refine_ligand.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,173 @@
#!/bin/bash
# This script works with Phenix version 1.20.

qfit_usage() {
echo >&2 "Usage:";
echo >&2 " $0 mapfile.mtz [multiconformer_ligand_bound_with_protein.pdb] [multiconformer_ligand_only.pdb]";
echo >&2 "";
echo >&2 "mapfile.mtz, multiconformer_ligand_bound_with_protein.pdb, and multiconformer_ligand_only.pdb MUST exist in this directory.";
echo >&2 "Outputs will be written to mapfile_qFit.{pdb|mtz|log}.";
exit 1;
}

#___________________________SOURCE__________________________________
# Check that Phenix is loaded
if [ -z "${PHENIX}" ]; then
echo >&2 "I require PHENIX but it's not loaded.";
echo >&2 "Please \`source phenix_env.sh\` from where it is installed.";
exit 1;
else
export PHENIX_OVERWRITE_ALL=true;
fi

# Check that qFit is loaded.
command -v remove_duplicates >/dev/null 2>&1 || {
echo >&2 "I require qFit (remove_duplicates) but it's not loaded.";
echo >&2 "Please activate the environment where qFit is installed.";
echo >&2 " e.g. conda activate qfit"
exit 1;
}

# Assert required files exist
mapfile=$1
multiconf=${2:-multiconformer_ligand_bound_with_protein.pdb}
multiconf_lig=${2:-multiconformer_ligand_only.pdb}
echo "mapfile : ${mapfile} $([[ -f ${mapfile} ]] || echo '[NOT FOUND]')";
echo "qfit unrefined model : ${multiconf} $([[ -f ${multiconf} ]] || echo '[NOT FOUND]')";
echo "qfit unrefined ligand model : ${multiconf_lig} $([[ -f ${multiconf_lig} ]] || echo '[NOT FOUND]')";


echo "";
if [[ ! -f "${mapfile}" ]] || [[ ! -f "${multiconf}" ]]; then
qfit_usage;
fi
pdb_name="${mapfile%.mtz}"
echo >&2 "PDB name is "${pdb_name}". ";
echo "${pdb_name}"


#__________________________________DETERMINE RESOLUTION AND (AN)ISOTROPIC REFINEMENT__________________________________
mtzmetadata=`phenix.mtz.dump "${pdb_name}.mtz"`
resrange=`grep "Resolution range:" <<< "${mtzmetadata}"`

echo "${resrange}"

res=`echo "${resrange}" | cut -d " " -f 4 | cut -c 1-5`
res1000=`echo $res | awk '{tot = $1*1000}{print tot }'`

if (( $res1000 < 1550 )); then
adp='adp.individual.anisotropic="not (water or element H)"'
else
adp='adp.individual.isotropic=all'
fi

#__________________________________DETERMINE FOBS v IOBS v FP__________________________________
# List of Fo types we will check for
obstypes=("FP" "FOBS" "F-obs" "I" "IOBS" "I-obs" "F(+)" "I(+)" "FSIM")

# Get amplitude fields
ampfields=`grep -E "amplitude|intensity|F\(\+\)|I\(\+\)" <<< "${mtzmetadata}"`
ampfields=`echo "${ampfields}" | awk '{$1=$1};1' | cut -d " " -f 1`

# Clear xray_data_labels variable
xray_data_labels=""

# Is amplitude an Fo?
for field in ${ampfields}; do
# Check field in obstypes
if [[ " ${obstypes[*]} " =~ " ${field} " ]]; then
# Check SIGFo is in the mtz too!
if grep -F -q -w "SIG$field" <<< "${mtzmetadata}"; then
xray_data_labels="${field},SIG${field}";
break
fi
fi
done
if [ -z "${xray_data_labels}" ]; then
echo >&2 "Could not determine Fo field name with corresponding SIGFo in .mtz.";
echo >&2 "Was not among "${obstypes[*]}". Please check .mtz file\!";
exit 1;
else
echo "data labels: ${xray_data_labels}"
# Start writing refinement parameters into a parameter file
echo "refinement.input.xray_data.labels=$xray_data_labels" > ${pdb_name}_refine.params
fi

#_____________________________DETERMINE R FREE FLAGS______________________________
gen_Rfree=True
rfreetypes="FREE R-free-flags"
for field in ${rfreetypes}; do
if grep -F -q -w $field <<< "${mtzmetadata}"; then
gen_Rfree=False;
echo "Rfree column: ${field}";
echo "refinement.input.xray_data.r_free_flags.label=${field}" >> ${pdb_name}_refine.params
break
fi
done
echo "refinement.input.xray_data.r_free_flags.generate=${gen_Rfree}" >> ${pdb_name}_refine.params

#__________________________________REMOVE DUPLICATE HET ATOMS__________________________________
remove_duplicates "${multiconf}"

#__________________________________NORMALIZE OCCUPANCIES________________________________________
echo "Normalizing occupancies, before refinement "

redistribute_cull_low_occupancies -occ 0.09 "${multiconf}.fixed"
mv -v "${multiconf}.f_norm.pdb" "${multiconf}.fixed"


#________________________________REMOVE TRAILING HYDROGENS___________________________________
phenix.pdbtools remove="element H" "${multiconf}.fixed"

#__________________________________GET CIF FILE__________________________________
echo "Getting the cif file with ready_set/elbow"
phenix.ready_set hydrogens=false \
trust_residue_code_is_chemical_components_code=true \
pdb_file_name="${multiconf}.f_modified.pdb"

# If ready_set doesn't generate a ligand cif file, use elbow
if [ ! -f "${multiconf}.f_modified.ligands.cif" ]; then
phenix.elbow multiconformer_ligand_only.pdb --output ${multiconf}.f_modified.ligands
fi

if [ ! -f "${multiconf}.f_modified.ligands.cif" ]; then
echo "Ligand CIF generation failed"
fi

#__________________________________FINAL REFINEMENT__________________________________

# Write refinement parameters into parameters file
echo "refinement.refine.strategy=*individual_sites *individual_adp *occupancies" >> ${pdb_name}_refine.params
echo "refinement.output.prefix=${pdb_name}" >> ${pdb_name}_refine.params
echo "refinement.output.serial=2" >> ${pdb_name}_refine.params
echo "refinement.main.number_of_macro_cycles=5" >> ${pdb_name}_refine.params
echo "refinement.main.nqh_flips=True" >> ${pdb_name}_refine.params
echo "refinement.refine.${adp}" >> ${pdb_name}_refine.params
echo "refinement.output.write_maps=False" >> ${pdb_name}_refine.params
echo "refinement.hydrogens.refine=riding" >> ${pdb_name}_refine.params
echo "refinement.main.ordered_solvent=True" >> ${pdb_name}_refine.params
echo "refinement.target_weights.optimize_xyz_weight=true" >> ${pdb_name}_refine.params
echo "refinement.target_weights.optimize_adp_weight=true" >> ${pdb_name}_refine.params

echo "refinement.input.monomers.file_name='${multiconf}.f_modified.ligands.cif'" >> ${pdb_name}_refine.params

phenix.refine "${multiconf}.f_modified.pdb" \
"${pdb_name}.mtz" \
"${pdb_name}_refine.params" \
--overwrite

#__________________________________NAME FINAL FILES__________________________________
cp -v "${pdb_name}_002.pdb" "${pdb_name}_qFit_ligand.pdb"
cp -v "${pdb_name}_002.mtz" "${pdb_name}_qFit_ligand.mtz"
cp -v "${pdb_name}_002.log" "${pdb_name}_qFit_ligand.log"

#__________________________COMMENTARY FOR USER_______________________________________
if [ -f "${pdb_name}_002.pdb" ]; then
echo ""
echo "[qfit_final_refine_ligand] Refinement is complete."
echo " Please be sure to INSPECT your refined structure, especially all new altconfs."
echo " The output can be found at ${pdb_name}_qFit_ligand.(pdb|mtz|log) ."
else
echo "Refinement did not complete."
echo "Please check for failure reports by examining log files."
fi
2 changes: 2 additions & 0 deletions setup.py
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,7 @@ def main():
"pandas>=1.2",
"pyparsing>=2.2.0",
"tqdm>=4.0.0",
"rdkit",
]

setup(
Expand Down Expand Up @@ -70,6 +71,7 @@ def main():
scripts=[
"scripts/post/qfit_final_refine_xray.sh",
"scripts/post/qfit_final_refine_cryoem.sh",
"scripts/post/qfit_final_refine_ligand.sh",
"scripts/post/find_largest_ligand.py",
"scripts/post/find_altlocs_near_ligand.py",
"scripts/post/get_lig_chain_res.py",
Expand Down