diff --git a/README.md b/README.md index 3460a853e..0b0bee55b 100644 --- a/README.md +++ b/README.md @@ -14,59 +14,29 @@ If you use this software, please cite: - [Keedy, D. A., Fraser, J. S. & van den Bedem, H. Exposing Hidden Alternative Backbone Conformations in X-ray Crystallography Using qFit. PLoS Comput. Biol. 11, e1004507 (2015)](https://dx.doi.org/10.1371/journal.pcbi.1004507) -## Installation (conda recommended) - -We recommend using the _conda_ package manager to install _qFit_. +## Installation You will need the following tools: * git -* _conda_ package manager (which you can get by installing [Miniconda3](https://docs.conda.io/en/latest/miniconda.html)) +* pip Once these are installed, you can: -1. Create a new conda env & activate it - ```bash - conda create --name qfit "python>=3.9" - conda activate qfit - ``` - 1. Install dependencies ```bash - conda install -c anaconda mkl numpy=1.22 - conda install -c anaconda -c ibmdecisionoptimization \ - cvxopt cplex + pip install -r requirements.txt ``` - For some of the post analysis scripts, you will also need sklean - conda install -c anaconda scikit-learn -1. Clone the latest release of the qFit source, and install to your conda env +1. Clone the latest release of the qFit source and install it using pip ```bash - git clone -b main https://github.com/ExcitedStates/qfit-3.0.git + git clone https://github.com/ExcitedStates/qfit-3.0.git cd qfit-3.0 pip install . ``` 1. You're now ready to run qFit programs! See [usage examples](#sec:usage-examples) below for some examples. -### M1 Macs - -Unfortunately, the Anaconda repos don't contain 'osx-arm64' binaries for IBM's CPLEX and Intel's mkl. -We don't currently have plans to switch to a different MIQP solver (e.g. Gurobi). - -As a workaround, you'll have to force conda to install the 'osx-64' binaries for everything (x86_64). -macOS's Rosetta 2 translation will handle the Intel→AppleSilicon translation. - -Instead of the first step in the above Installation section, use this: - -1. Create a new conda env & activate it - ```bash - CONDA_SUBDIR=osx-64 conda create --name qfit "python>=3.9" - conda activate qfit; conda env config vars set CONDA_SUBDIR=osx-64; conda deactivate - conda activate qfit - ``` - -then follow the rest of the instructions. ### Advanced @@ -75,15 +45,10 @@ If you prefer to manage your environments using other methods, qFit has the foll * [Python 3.6+](https://python.org) * [numpy](https://numpy.org) * [scipy](https://scipy.org) -* [cvxopt](https://cvxopt.org) -* [IBM ILOG CPLEX Optimization Studio (Community Edition)](https://www.ibm.com/products/ilog-cplex-optimization-studio) - -Installation instructions using `pip` can be found in the `docs` folder. +* [cvxpy](https://www.cvxpy.org) Once dependencies are installed, you can clone the qFit source, and install to your env as above. -(Note: `python setup.py install` will only work if numpy has _already_ been installed.) - ## Contributing @@ -167,5 +132,3 @@ Gohlke. See file header. The `Xpleo` software and `LoopTK` package have been major inspirations for the inverse kinematics functionality. - -[1]: https://www-01.ibm.com/software/websphere/products/optimization/cplex-studio-community-edition/ "IBM website" diff --git a/docs/aws_deploy.sh b/docs/aws_deploy.sh index c12aaa1ce..87e79ed9a 100644 --- a/docs/aws_deploy.sh +++ b/docs/aws_deploy.sh @@ -2,31 +2,11 @@ # Tested on Amazon Linux 2, but should work on most RPM-based Linux distros -# install Anaconda RPM GPG keys -sudo rpm --import https://repo.anaconda.com/pkgs/misc/gpgkeys/anaconda.asc - -# add Anaconda repository -cat </cplex/python//x86_64_ - python setup.py install - -where -- `` is the directory where you installed CPLEX, -- `` is the appropriate python version in your environment, -- `` is a platform dependent string, such as `linux` for Linux systems and `osx` for macOSX. diff --git a/environment.yml b/environment.yml index b9774c44d..98fe0ed15 100644 --- a/environment.yml +++ b/environment.yml @@ -7,11 +7,9 @@ dependencies: - libblas[build=*mkl] - numpy>=1.20,<2 - scipy>=1.0 + - cvxpy + - pyscipopt - pandas>=1.2 - pyparsing>=2.2.0 - cvxopt - - ibmdecisionoptimization::cplex - - osqp - - pip: - - "git+https://github.com/osqp/miosqp.git@ac672338b0593d865dd15b7a76434f25e24244a9#egg=miosqp" - tqdm diff --git a/pyproject.toml b/pyproject.toml index e24fe928c..21a2f0ab0 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -5,4 +5,6 @@ requires = [ "wheel", # PEP 508 specifications. "setuptools_scm", "numpy>=1.20,<2", + "cvxpy", + "pyscipopt", ] diff --git a/requirements.txt b/requirements.txt new file mode 100644 index 000000000..c940bc598 --- /dev/null +++ b/requirements.txt @@ -0,0 +1,5 @@ +numpy +cvxpy +pyscipopt +scipy +scikit-learn diff --git a/scripts/post/calc_rscc.py b/scripts/post/calc_rscc.py index fec4e10b5..c70ecb96c 100644 --- a/scripts/post/calc_rscc.py +++ b/scripts/post/calc_rscc.py @@ -8,64 +8,68 @@ from qfit.volume import XMap from qfit.validator import Validator -''' +""" This script will calculate the RSCC of a ligand (or any residue) defined by their ligand name (--ligand) or residue number and chain id (--resi_chain). It will only work on mtz maps with 2FOFCWT,PH2FOFCWT. To run: calc_rscc.py PDB_FILE.pdb MTZ_FILE.mtz --ligand AR6 --pdb PDB_NAME --directory /path/for/output/csv/file -''' +""" + def build_argparser(): p = ArgumentParser(description=__doc__) p.add_argument("structure", type=str, help="PDB-file containing structure.") p.add_argument("map", type=str, help="Map.") - p.add_argument("--ligand", type=str, help="name of ligand for RSCC to be calculated on") - p.add_argument("--residue", type=str, help="Chain_ID, Residue_ID for RSCC to be calculated on") + p.add_argument( + "--ligand", type=str, help="name of ligand for RSCC to be calculated on" + ) + p.add_argument( + "--residue", type=str, help="Chain_ID, Residue_ID for RSCC to be calculated on" + ) p.add_argument("--pdb", type=str, help="name of PDB") p.add_argument("--directory", type=str, help="Where to save RSCC info") return p def main(): - p = build_argparser() - options = p.parse_args() - # Load structure and prepare it - structure = Structure.fromfile(options.structure) - if options.ligand is not None: - ligand = structure.extract("resn", options.ligand, "==") - elif options.residue is not None: - chainid, resi = options.residue.split(",") - ligand = structure.extract(f"resi {resi} and chain {chainid}") - else: - print('Please provide ligand name or residue ID and chain ID') + p = build_argparser() + options = p.parse_args() + # Load structure and prepare it + structure = Structure.fromfile(options.structure) + if options.ligand is not None: + ligand = structure.extract("resn", options.ligand, "==") + elif options.residue is not None: + chainid, resi = options.residue.split(",") + ligand = structure.extract(f"resi {resi} and chain {chainid}") + else: + print("Please provide ligand name or residue ID and chain ID") + + # Load and process the electron density map: + xmap = XMap.fromfile(options.map, label="2FOFCWT,PH2FOFCWT") + scaler = MapScaler(xmap) + xmap = xmap.canonical_unit_cell() + footprint = ligand + scaler.scale(footprint, radius=1.5) - # Load and process the electron density map: - xmap = XMap.fromfile( - options.map, label='2FOFCWT,PH2FOFCWT' - ) - scaler = MapScaler(xmap) - xmap = xmap.canonical_unit_cell() - footprint = ligand - scaler.scale(footprint, radius=1.5) + xmap = xmap.extract(ligand.coor, padding=8) - xmap = xmap.extract(ligand.coor, padding=8) + # Now that the conformers have been generated, the resulting + # # conformations should be examined via GoodnessOfFit: + validator = Validator(xmap, xmap.resolution, options.directory) + rscc = validator.rscc(ligand) + print(rscc) - # Now that the conformers have been generated, the resulting - # # conformations should be examined via GoodnessOfFit: - validator = Validator(xmap, xmap.resolution, options.directory) - rscc = validator.rscc(ligand) - print(rscc) + csv_filename = f"{options.pdb}_rscc.csv" - csv_filename = f"{options.pdb}_rscc.csv" + # Write to CSV + with open(csv_filename, "w", newline="") as csvfile: + writer = csv.writer(csvfile) + # Write the header + writer.writerow(["PDB", "RSCC"]) + # Write the data + writer.writerow([options.pdb, rscc]) - # Write to CSV - with open(csv_filename, 'w', newline='') as csvfile: - writer = csv.writer(csvfile) - # Write the header - writer.writerow(["PDB", "RSCC"]) - # Write the data - writer.writerow([options.pdb, rscc]) if __name__ == "__main__": main() diff --git a/scripts/post/find_close_residues.py b/scripts/post/find_close_residues.py index 17f03fa93..dd2f5469f 100644 --- a/scripts/post/find_close_residues.py +++ b/scripts/post/find_close_residues.py @@ -22,7 +22,6 @@ from qfit.structure.rotamers import ROTAMERS - def parse_args(): p = argparse.ArgumentParser(description=__doc__) p.add_argument("structure", type=str, help="PDB-file containing structure.") @@ -69,11 +68,11 @@ def main(): close_res.loc[n, "res_id"] = residue_id close_res.loc[n, "chain"] = chain n += 1 - close_res.to_csv(pdb_name + "_" + args.ligand + "_" +str(args.dist) + "_closeres.csv", - index=False, - ) + close_res.to_csv( + pdb_name + "_" + args.ligand + "_" + str(args.dist) + "_closeres.csv", + index=False, + ) if __name__ == "__main__": main() - diff --git a/scripts/post/lig_occ.py b/scripts/post/lig_occ.py index c0a82bd06..094d8c6a8 100644 --- a/scripts/post/lig_occ.py +++ b/scripts/post/lig_occ.py @@ -42,7 +42,8 @@ def get_occ(structure, ligand, pdb): ) ) occ = pd.DataFrame( - lig, columns=["PDB", "ligand_name", "chain", "min_occ", "average_b", "num_altloc"] + lig, + columns=["PDB", "ligand_name", "chain", "min_occ", "average_b", "num_altloc"], ) occ.to_csv(pdb + "_ligand_occupancy.csv", index=False) diff --git a/setup.py b/setup.py index 243e99ca2..b0d26664a 100644 --- a/setup.py +++ b/setup.py @@ -27,6 +27,8 @@ def main(): install_requires = [ "numpy>=1.20,<2", "scipy>=1.0", + "cvxpy", + "pyscipopt", "pandas>=1.2", "pyparsing>=2.2.0", "tqdm>=4.0.0", @@ -44,10 +46,6 @@ def main(): ext_modules=ext_modules, setup_requires=setup_requires, install_requires=install_requires, - extra_dependencies={ - "cplex": ["cvxopt", "cplex"], - "osqp": ["osqp", "miosqp @ git+https://github.com/osqp/miosqp.git@ac672338b0593d865dd15b7a76434f25e24244a9#egg=miosqp"], - }, zip_safe=False, python_requires=">=3.9", entry_points={ diff --git a/src/qfit/qfit.py b/src/qfit/qfit.py index 1420a5473..c45ecdf70 100644 --- a/src/qfit/qfit.py +++ b/src/qfit/qfit.py @@ -70,7 +70,7 @@ def __init__(self): # Sampling options self.clash_scaling_factor = 0.75 self.external_clash = False - self.dofs_per_iteration = 1 + self.dofs_per_iteration = 1 self.dihedral_stepsize = 6 self.hydro = False self.rmsd_cutoff = 0.01 @@ -104,7 +104,6 @@ def __init__(self): self.rotamer_neighborhood = 24 self.remove_conformers_below_cutoff = False - # General settings # Exclude certain atoms always during density and mask creation to # influence QP / MIQP. Provide a list of atom names, e.g. ['N', 'CA'] @@ -319,7 +318,7 @@ def _solve_miqp( model_params_per_atom = 3 + int(self.options.sample_bfactors) k = ( model_params_per_atom * natoms * nconfs * 1.5 - ) #hyperparameter 1.5 determined to be the best cut off between too many conformations and improving Rfree + ) # hyperparameter 1.5 determined to be the best cut off between too many conformations and improving Rfree if segment is not None: k = nconfs # for segment, we only care about the number of conformations come out of MIQP. Considering atoms penalizes this too much BIC = n * np.log(rss / n) + k * np.log(n) @@ -414,7 +413,7 @@ def _zero_out_most_similar_conformer(self): if ( self.options.write_intermediate_conformers ): # Output all conformations before we remove them - self._write_intermediate_conformers(prefix="cplex_remove") + self._write_intermediate_conformers(prefix="qp_remove") self._occupancies[idx_to_zero] = 0 def _update_conformers(self, cutoff=0.002): @@ -478,7 +477,6 @@ def __init__(self, residue, structure, xmap, options): self.resi, self.icode = residue.id self.identifier = f"{self.chain}/{self.resn}{''.join(map(str, residue.id))}" - # Check if residue has complete heavy atoms. If not, complete it. expected_atoms = np.array(self.residue._rotamers["atoms"]) missing_atoms = np.isin( @@ -738,7 +736,9 @@ def _sample_backbone(self): # Choose direction vectors as Cβ-Cα, C-N, and then (Cβ-Cα × C-N) # Find coordinates of Cα, Cβ, N atoms pos_CA = self.residue.extract("name", "CA").coor[0] - pos_CB = self.residue.extract("name", atom_name).coor[0] # Position of CB for all residues except for GLY, which is position of O + pos_CB = self.residue.extract("name", atom_name).coor[ + 0 + ] # Position of CB for all residues except for GLY, which is position of O pos_N = self.residue.extract("name", "N").coor[0] # Set x, y, z = Cβ-Cα, Cα-N, (Cβ-Cα × Cα-N) vec_x = pos_CB - pos_CA @@ -855,7 +855,7 @@ def _sample_angle(self): new_bs = [] for coor in self._coor_set: self.residue.coor = coor - # Initialize rotator + # Initialize rotator perp_rotator = CBAngleRotator(self.residue) # Rotate about the axis perpendicular to CB-CA and CB-CG vectors for perp_angle in angles: @@ -875,14 +875,14 @@ def _sample_angle(self): mask = self.residue.e[active_mask] != "H" if np.min(values[mask]) < self.options.density_cutoff: continue - + # Move on if these coordinates cause a clash if self.options.external_clash: if self._cd() and self.residue.clashes(): continue elif self.residue.clashes(): continue - + # Valid, non-clashing conformer found! new_coor_set.append(self.residue.coor) new_bs.append(self.conformer.b) @@ -1029,7 +1029,7 @@ def _sample_sidechain(self): logger.warning( f"[{self.identifier}] Too many conformers generated ({len(self._coor_set)}). Splitting QP scoring." ) - + if not self._coor_set: msg = ( "No conformers could be generated. Check for initial " @@ -1044,9 +1044,9 @@ def _sample_sidechain(self): self._write_intermediate_conformers( prefix=f"sample_sidechain_iter{iteration}" ) - + if len(self._coor_set) <= 15000: - # If <15000 conformers are generated, QP score conformer occupancy normally + # If <15000 conformers are generated, QP score conformer occupancy normally self._convert() self._solve_qp() self._update_conformers() @@ -1087,7 +1087,7 @@ def _sample_sidechain(self): self._coor_set = section_2_coor self._bs = section_2_bs - # QP score the second section + # QP score the second section self._convert() self._solve_qp() self._update_conformers() @@ -1397,7 +1397,7 @@ def find_paths(self, segment_original): logger.debug("MIQP failed, dropping a fragment-conformer") self._zero_out_most_similar_conformer() # Remove conformer if self.options.write_intermediate_conformers: - self._write_intermediate_conformers(prefix="cplex_kept") + self._write_intermediate_conformers(prefix="miqp_kept") continue else: # No Exceptions here! Solvable! diff --git a/src/qfit/qfit_protein.py b/src/qfit/qfit_protein.py index 1e29d0373..0891e2015 100644 --- a/src/qfit/qfit_protein.py +++ b/src/qfit/qfit_protein.py @@ -69,7 +69,7 @@ def build_argparser(): "Chain, residue id, and optionally insertion code for " "residue in structure, e.g. A,105, or A,105:A.", ) - + # Map options p.add_argument( "-l", @@ -420,9 +420,9 @@ def run(self): self.pdb = self.options.pdb + "_" else: self.pdb = "" - - if self.options.residue is not None: #run qFit residue - multiconformer = self._run_qfit_residue_parallel() + + if self.options.residue is not None: # run qFit residue + multiconformer = self._run_qfit_residue_parallel() elif self.options.only_segment: multiconformer = self._run_qfit_segment(self.structure) multiconformer = self._create_refine_restraints(multiconformer) @@ -481,7 +481,7 @@ def _run_qfit_residue_parallel(self): .extract("resn", "HOH", "!=") .single_conformer_residues ) - + else: residues = list( self.structure.extract("record", "HETATM", "!=") @@ -608,7 +608,7 @@ def _error_cb(e): ) continue residue_multiconformer = Structure.fromfile(fname) - fname = f'{residue.shortcode}_qFit_residue.pdb' + fname = f"{residue.shortcode}_qFit_residue.pdb" residue_multiconformer.tofile(fname) if self.options.residue is None: @@ -642,7 +642,6 @@ def _error_cb(e): residue_multiconformer ) - # Write out multiconformer_model.pdb only if in debug mode. # This output is not a final qFit output, so it might confuse users. if self.options.debug: @@ -679,7 +678,7 @@ def _run_qfit_segment(self, multiconformer): ) if self.options.scale or self.options.cryst_info: multiconformer.tofile( - fname, self.options.scale_info, self.options.cryst_info + fname, self.options.scale_info, self.options.cryst_info ) else: multiconformer.tofile(fname) diff --git a/src/qfit/samplers.py b/src/qfit/samplers.py index f0834aeea..03983a7cc 100644 --- a/src/qfit/samplers.py +++ b/src/qfit/samplers.py @@ -134,20 +134,21 @@ def __call__(self, angle): R @ self._coor_to_rotate.T ).T + self._origin + class BisectingAngleRotator: - """" - Deflects a residue's sidechain by bending the CA-CB-CG angle. Rotate the aromatic side chain about an axis bisecting this angle. + """ " + Deflects a residue's sidechain by bending the CA-CB-CG angle. Rotate the aromatic side chain about an axis bisecting this angle. Attributes: residue (qfit._BaseResidue): Residue being manipulated. atoms_to_rotate (np.ndarray[int]): Atom indices that will be moved by the flexion. """ - + def __init__(self, residue): """ Inits a BisectingAngleRotator to rotate the sidechain about an axis bisecting the CA-CB-CG angle of a residue - + Args: residue (qfit._BaseResidue): Residue to manipulate. """ @@ -165,7 +166,7 @@ def __init__(self, residue): # Define the origin of rotation as the coordinates of the CB atom self._origin = self.residue.extract("name", "CB").coor[0] - # Get the coordinates of the atoms to rotate + # Get the coordinates of the atoms to rotate self._coor_to_rotate = self.residue._coor[self.atoms_to_rotate] self._coor_to_rotate -= self._origin # Extract the coordinates of the CA and CG atoms, and translate them relative to the origin @@ -177,12 +178,11 @@ def __init__(self, residue): vec_CG = axis_coor[1] vec_CA /= np.linalg.norm(vec_CA) vec_CG /= np.linalg.norm(vec_CG) - # Define a new rotation axis as the normalized sum of vec_CA and vec_CG. This makes the axis bisect the angle between these two vectors + # Define a new rotation axis as the normalized sum of vec_CA and vec_CG. This makes the axis bisect the angle between these two vectors new_axis = vec_CA + vec_CG new_axis /= np.linalg.norm(new_axis) self.new_axis = new_axis - # Align the new rotation axis to the Z-axis to simplify the rotation transformation aligner = ZAxisAligner(new_axis) self._forward = aligner.forward_rotation @@ -196,6 +196,8 @@ def __call__(self, angle): self.residue._coor[self.atoms_to_rotate] = ( R @ self._coor_to_rotate.T ).T + self._origin + + class GlobalRotator: """Rotate ligand around its center.""" diff --git a/src/qfit/solvers.py b/src/qfit/solvers.py index 01c92d24a..49dab3495 100644 --- a/src/qfit/solvers.py +++ b/src/qfit/solvers.py @@ -11,13 +11,14 @@ import scipy as sci import scipy.sparse # pylint: disable=unused-import from numpy.typing import NDArray -from cplex.exceptions import CplexSolverError + +import cvxpy as cp from .utils.optional_lazy_import import lazy_load_module_if_available logger = logging.getLogger(__name__) -SolverError: tuple[type[Exception], ...] = (RuntimeError) +SolverError: tuple[type[Exception], ...] = RuntimeError __all__ = [ "available_qp_solvers", @@ -148,579 +149,104 @@ def solve_miqp( ############################### -class CVXOPTSolver(QPSolver): - driver_pkg_name = "cvxopt" - driver = lazy_load_module_if_available(driver_pkg_name) - if TYPE_CHECKING: - import cvxopt - - def __init__(self, target: NDArray[np.float_], models: NDArray[np.float_]) -> None: - if TYPE_CHECKING: - self.driver = self.cvxopt - assert self.driver is not None - - # Initialize variables - self.target = target - self.models = models - - # self.quad_obj: self.driver.matrix - # self.lin_obj: self.driver.matrix - # self.le_constraints: self.driver.spmatrix - # self.le_bounds: self.driver.matrix - - self.weights: Optional[NDArray[np.float_]] = None - self.objective_value: Optional[float] = None - - self.solution: dict[str, Any] - self.nconformers = models.shape[0] - - # Get the driver & set options - self.driver.solvers.options["show_progress"] = False - self.driver.solvers.options["abstol"] = 1e-8 - self.driver.solvers.options["reltol"] = 1e-7 - self.driver.solvers.options["feastol"] = 1e-8 - - def solve_qp(self) -> None: - if TYPE_CHECKING: - self.driver = self.cvxopt - assert self.driver is not None - - # Set up the matrices and restraints - logger.debug( - "Building cvxopt matrix, size: (%i,%i)", - self.nconformers, - self.nconformers, - ) - - # minimize 0.5 x.T P x + q.T x - # where P = self.quad_obj = ρ_model.T ρ_model - # q = self.lin_obj = - ρ_model.T ρ_obs - self.quad_obj = self.driver.matrix(np.inner(self.models, self.models), tc="d") - self.lin_obj = self.driver.matrix(-np.inner(self.models, self.target), tc="d") - - # subject to: - # G x ≤ h - # where G = self.le_constraints - # h = self.le_bounds - # - # Each weight x falls in the closed interval [0..1], and the sum of all weights is <= 1. - # This corresponds to (2 * nconformers + 1) constraints, imposed on (nconformers) variables: - # the lower bound accounts for (nconformers) constraints, - # the upper bound accounts for (nconformers) constraints, - # the summation constraint accounts for 1. - # We construct G with a sparse matrix. - # constraint idx=[0..N) constraint idx=[N..2N) constraint idx=2N - rowidxs = [*range(0, self.nconformers)] + [*range(self.nconformers, 2 * self.nconformers)] + (self.nconformers * [2 * self.nconformers]) # fmt:skip - # -x_i ≤ 0 x_i ≤ 1 Σ_i x_i ≤ 1 - coeffs = (self.nconformers * [-1.0]) + (self.nconformers * [1.0]) + (self.nconformers * [1.0]) # fmt:skip - colidxs = [*range(0, self.nconformers)] + [*range(0, self.nconformers)] + [*range(0, self.nconformers)] # fmt:skip - threshs = (self.nconformers * [0.0]) + (self.nconformers * [1.0]) + [1.0] # fmt:skip - self.le_constraints = self.driver.spmatrix(coeffs, rowidxs, colidxs, tc="d") - self.le_bounds = self.driver.matrix(threshs, tc="d") - - # Solve - self.solution = self.driver.solvers.qp( - self.quad_obj, self.lin_obj, self.le_constraints, self.le_bounds - ) - - # Store the density residual and the weights - self.objective_value = ( - 2 * self.solution["primal objective"] + - np.inner(self.target, self.target) - ) # fmt:skip - self.weights = np.asarray(self.solution["x"]).ravel() - - -class CPLEXSolver(QPSolver, MIQPSolver): - driver_pkg_name = "cplex" +class CVXPYSolver(QPSolver, MIQPSolver): + driver_pkg_name = "cvxpy" driver = lazy_load_module_if_available(driver_pkg_name) - if TYPE_CHECKING: - import cplex - def __init__( - self, target: NDArray[np.float_], models: NDArray[np.float_], nthreads: int = 1 - ) -> None: - if TYPE_CHECKING: - self.driver = self.cplex - assert self.driver is not None - # Initialize variables + def __init__(self, target, models, nthreads=1): self.target = target self.models = models - self.quad_obj: list[object] = [] - self.lin_obj: list[tuple[int, float]] = [] + self.quad_obj = None + self.lin_obj = None - self.weights: Optional[NDArray[np.float_]] = None - self.objective_value: Optional[float] = None + self.weights = None + self._objective_value = None self.nconformers = models.shape[0] - self.nthreads = nthreads - - # Get the driver & append raisable Exceptions to SolverError class in module (global) scope - global SolverError - SolverErrors = (SolverError, CplexSolverError) - - def compute_quadratic_coeffs(self) -> None: - """Precompute the quadratic coefficients (P, q). - - These values don't depend on threshold/cardinality. - Having these objectives pre-computed saves a few cycles when MIQP - is evaluated for multiple values of threshold/cardinality. - """ - if TYPE_CHECKING: - self.driver = self.cplex - assert self.driver is not None - + self.valid_indices = [] + self.redundant_indices = [] + + def find_redundant_conformers(self, threshold=1e-6): + for i in range(self.nconformers): + if i in self.redundant_indices: + continue + self.valid_indices.append(i) + for j in range(i + 1, self.nconformers): + if j in self.redundant_indices: + continue + if np.linalg.norm(self.models[i] - self.models[j]) < threshold: + self.redundant_indices.append(j) + assert len(self.valid_indices) + len(self.redundant_indices) == self.nconformers + + def compute_quadratic_coeffs(self): # minimize 0.5 x.T P x + q.T x # where P = self.quad_obj = ρ_model.T ρ_model # q = self.lin_obj = - ρ_model.T ρ_obs - quad_obj_coeffs = np.inner(self.models, self.models) - lin_obj_coeffs = -np.inner(self.models, self.target) - - # We have to unpack the arrays for CPLEX into CSR format (even though they're dense) - self.quad_obj = [] - for row in quad_obj_coeffs: - idxs, vals = zip(*enumerate(row.tolist())) - self.quad_obj.append(self.driver.SparsePair(ind=idxs, val=vals)) - - # CPLEX requires linear objectives as a list of (idx, val) tuples - self.lin_obj = list(enumerate(lin_obj_coeffs)) - - def solve_miqp( - self, - threshold: Optional[float] = None, - cardinality: Optional[int] = None, - exact: bool = False, - ) -> None: - if TYPE_CHECKING: - self.driver = self.cplex - assert self.driver is not None - - if not (self.quad_obj and self.lin_obj): - self.compute_quadratic_coeffs() - - # Create and configure the cplex object - miqp = self.driver.Cplex() - miqp.set_results_stream(None) - miqp.set_log_stream(None) - miqp.set_warning_stream(None) - miqp.set_error_stream(None) - if self.nthreads is not None: - miqp.parameters.threads.set(self.nthreads) - - # Create variables and set linear constraints - # w_i ≤ 1 - variable_names = [f"w{n}" for n in range(self.nconformers)] - upper_bounds = self.nconformers * [1.0] - miqp.variables.add(names=variable_names, ub=upper_bounds) - - # Σ_i w_i ≤ 1 - ind = [f"w{n}" for n in range(self.nconformers)] - val = self.nconformers * [1.0] - miqp.linear_constraints.add( - lin_expr=[self.driver.SparsePair(ind=ind, val=val)], - senses=["L"], - rhs=[1.0], + # note that ρ_model is the transpose of self.models + self.find_redundant_conformers() + self.quad_obj = ( + self.models[self.valid_indices] @ self.models[self.valid_indices].T ) - - # Setup quadratic objective of the MIQP - miqp.objective.set_quadratic(self.quad_obj) - miqp.objective.set_linear(self.lin_obj) - miqp.objective.set_sense(miqp.objective.sense.minimize) - - # If cardinality or threshold is specified, the problem is a MIQP, - # so we need to add binary integer variables z_i. - if cardinality not in (None, 0) or threshold not in (None, 0): - # z_i ∈ {0, 1} - integer_names = [f"z{n}" for n in range(self.nconformers)] - variable_types = self.nconformers * miqp.variables.type.binary - miqp.variables.add(names=integer_names, types=variable_types) - - # Only count weights for which z_i is 1 - # w_i - z_i ≤ 0 - # (∵ z_i ∈ {0,1} and 0 ≤ w_i ≤ 1, this is only true when z_i = 1) - lin_expr = [ - self.driver.SparsePair(ind=[f"w{n}", f"z{n}"], val=[1, -1]) - for n in range(self.nconformers) - ] - senses = self.nconformers * ["L"] - rhs = self.nconformers * [0.0] - miqp.linear_constraints.add(lin_expr=lin_expr, senses=senses, rhs=rhs) - - # Set the threshold constraint if applicable - if threshold not in (None, 0): - # tdmin z_i - w_i ≤ 0, i.e. w_i ≥ tdmin - lin_expr = [ - self.driver.SparsePair(ind=[f"z{n}", f"w{n}"], val=[threshold, -1]) - for n in range(self.nconformers) - ] - senses = self.nconformers * ["L"] - rhs = self.nconformers * [0.0] - miqp.linear_constraints.add(lin_expr=lin_expr, senses=senses, rhs=rhs) - - # Set the cardinality constraint if applicable - if cardinality not in (None, 0): - # Σ z_i ≤ cardinality - cardinality = cast(int, cardinality) # typing noop - ind = [f"z{n}" for n in range(self.nconformers)] - val = self.nconformers * [1.0] - lin_expr = [self.driver.SparsePair(ind=ind, val=val)] - if exact: - senses = ["E"] - rhs = [min(cardinality, self.nconformers)] + self.lin_obj = -1 * self.models[self.valid_indices] @ self.target + + def construct_weights(self): + self.weights = [] + j = 0 + for i in range(self.nconformers): + if i in self.redundant_indices: + self.weights.append(0) else: - senses = ["L"] - rhs = [cardinality] - miqp.linear_constraints.add(lin_expr=lin_expr, senses=senses, rhs=rhs) - - try: - result = miqp.solve() - except CplexSolverError: - raise SolverError("CPLEX encountered an error: Non-convex objective function") - - - # Store the density residual and the weights - self.objective_value = ( - 2 * miqp.solution.get_objective_value() - + np.inner(self.target, self.target) - ) # fmt:skip - self.weights = np.array(miqp.solution.get_values()[: self.nconformers]) - - # Close the cplex object - miqp.end() - - def solve_qp(self) -> None: - if TYPE_CHECKING: - self.driver = self.cplex - assert self.driver is not None - - # We can re-use the MIQP code above, provided no cardinality or threshold. - self.solve_miqp(threshold=None, cardinality=None, exact=False) - - -class OSQPSolver(QPSolver): - driver_pkg_name = "osqp" - driver = lazy_load_module_if_available(driver_pkg_name) - if TYPE_CHECKING: - import osqp - - OSQP_SETTINGS = { - "eps_abs": 1e-06, - "eps_rel": 1e-06, - "eps_prim_inf": 1e-07, - "verbose": False, - } - - def __init__(self, target: NDArray[np.float_], models: NDArray[np.float_]) -> None: - self.target = target - self.models = models - - self.nconformers = models.shape[0] - - self.quad_obj: sci.sparse.csc_matrix - self.lin_obj: NDArray[np.float_] - self.constraints: sci.sparse.csc_matrix - self.lower_bounds: NDArray[np.float_] - self.upper_bounds: NDArray[np.float_] - - self.weights: Optional[NDArray[np.float_]] = None - self.objective_value: Optional[float] = None - - def compute_quadratic_coeffs(self) -> None: - # minimize 0.5 x.T P x + q.T x - # where P = self.quad_obj = ρ_model.T ρ_model - # q = self.lin_obj = - ρ_model.T ρ_obs - quad_obj_coeffs = np.inner(self.models, self.models) - lin_obj_coeffs = -np.inner(self.models, self.target) - - # OSQP requires quadratic objectives in a scipy CSC sparse matrix - self.quad_obj = sci.sparse.csc_matrix(quad_obj_coeffs) - self.lin_obj = lin_obj_coeffs - - def compute_constraints(self) -> None: - # subject to: - # l ≤ A x ≤ u - # where l = self.lower_bounds - # A = self.constraints - # u = self.upper_bounds - # - # Each weight x falls in the closed interval [0..1], and the sum of all weights is <= 1. - # This corresponds to (1 * nconformers + 1) constraints, imposed on (nconformers) variables: - # the lower and upper bounds account for (nconformers) constraints, - # the summation constraint accounts for 1. - shape = (self.nconformers + 1, self.nconformers) - # We construct A with a sparse matrix. - # constraint idx=[0..N) constraint idx=N - rowidxs = [*range(0, self.nconformers)] + (self.nconformers * [self.nconformers]) # fmt:skip - # 0 ≤ x_i ≤ 1 0 ≤ Σ_i x_i ≤ 1 - coeffs = (self.nconformers * [1.0]) + (self.nconformers * [1.0]) # fmt:skip - colidxs = [*range(0, self.nconformers)] + [*range(0, self.nconformers)] # fmt:skip - lowers = (self.nconformers * [0.0]) + [0.0] # fmt:skip - uppers = (self.nconformers * [1.0]) + [1.0] # fmt:skip - - self.constraints = sci.sparse.csc_matrix( - (coeffs, (rowidxs, colidxs)), - shape=shape, - ) - self.lower_bounds = np.array(lowers) - self.upper_bounds = np.array(uppers) - - def solve_qp(self) -> None: - if TYPE_CHECKING: - self.driver = self.osqp - assert self.driver is not None - - self.compute_quadratic_coeffs() - self.compute_constraints() - - qp = self.driver.OSQP() - qp.setup( - P=self.quad_obj, - q=self.lin_obj, - A=self.constraints, - l=self.lower_bounds, - u=self.upper_bounds, - **self.OSQP_SETTINGS, - ) - result = qp.solve() - - self.weights = np.array(result.x).ravel() - self.objective_value = ( - 2 * result.info.obj_val - + np.inner(self.target, self.target) - ) # fmt:skip - - -class MIOSQPSolver(MIQPSolver): - driver_pkg_name = "miosqp" - driver = lazy_load_module_if_available(driver_pkg_name) - if TYPE_CHECKING: - import miosqp - - MIOSQP_SETTINGS = { - # integer feasibility tolerance - "eps_int_feas": 1e-06, - # maximum number of iterations - "max_iter_bb": 10000, - # tree exploration rule - # [0] depth first - # [1] two-phase: depth first until first incumbent and then best bound - "tree_explor_rule": 0, - # branching rule - # [0] max fractional part - "branching_rule": 0, - "verbose": False, - "print_interval": 1, - } - - OSQP_SETTINGS = { - "eps_abs": 1e-06, - "eps_rel": 1e-06, - "eps_prim_inf": 1e-07, - "verbose": False, - } - - def __init__(self, target: NDArray[np.float_], models: NDArray[np.float_]) -> None: - self.target = target - self.models = models - - self.nconformers = models.shape[0] - - self.quad_obj: Optional[sci.sparse.csc_matrix] = None - self.lin_obj: Optional[NDArray[np.float_]] = None - self.constraints: sci.sparse.csc_matrix - self.lower_bounds: NDArray[np.float_] - self.upper_bounds: NDArray[np.float_] - self.binary_vars: NDArray[np.int_] - self.lower_ints: NDArray[np.int_] - self.upper_ints: NDArray[np.int_] - - self.weights: Optional[NDArray[np.float_]] = None - self.objective_value: Optional[float] = None - - # Append raisable Exceptions to SolverError class in module (global) scope - global SolverError - SolverErrors = (SolverError, CplexSolverError) - - def compute_quadratic_coeffs(self) -> None: - """Precompute the quadratic coefficients (P, q). - - These values don't depend on threshold/cardinality. - Having these objectives pre-computed saves a few cycles when MIQP - is evaluated for multiple values of threshold/cardinality. - """ - # Since this is an MIQP problem, the objective function f(x) - # takes the vector x = [w_0 .. w_i, z_0 .. z_i], - # where w_i are the weights, - # z_i are the integer selections. - - # We wish to minimize 0.5 x.T P x + q.T x - # where P = self.quad_obj = ρ_model.T ρ_model - # q = self.lin_obj = - ρ_model.T ρ_obs - quad_obj_coeffs = np.inner(self.models, self.models) - lin_obj_coeffs = -np.inner(self.models, self.target) - - # This is sufficient for the non-integer weights, - # but we must extend our matrix so that - # P is of shape (2*nconfs, 2*nconfs), and - # q is of len (2*nconfs), - # to match the dimensions of vector x. - P_shape = (2 * self.nconformers, 2 * self.nconformers) - - # MIOSQP requires quadratic objectives in a scipy CSC sparse matrix - # Get an index into the dense _quad_obj_coeffs array, - # and construct P with csc_array((data, (row_ind, col_ind)), [shape=(M, N)]). - rowidx, colidx = np.indices(quad_obj_coeffs.shape) - self.quad_obj = sci.sparse.csc_matrix( - (quad_obj_coeffs.ravel(), (rowidx.ravel(), colidx.ravel())), - shape=P_shape, - ) - # Extend q to appropriate shape - self.lin_obj = np.append(lin_obj_coeffs, np.zeros((self.nconformers,))) - - def compute_mixed_int_constraints( - self, - threshold: Optional[float], - cardinality: Optional[int], - ) -> None: - # subject to: - # l ≤ A x ≤ u - # x[i] ∈ Z, for i in i_idx - # il[i] <= x[i] <= iu[i], for i in i_idx - # where l = self.lower_bounds - # A = self.constraints - # u = self.upper_bounds - # i_idx = self.binary_vars: a vector of indices of which variables are integer - # il = self.lower_ints: the lower bounds on the integer variables - # iu = self.upper_ints: the upper bounds on the integer variables. - - # We will construct A with a sparse matrix. - - # Presently, we have no constraints. - n_constraints = 0 - # Our constraints will be applied over the variables [w_0 .. w_i, z_0 .. z_i], - # where w_i are the weights, - # z_i are the integer selections. - n_vars = 2 * self.nconformers - - # Since cardinality or threshold will be specified, the problem is a MIQP, - # so we need to add binary integer variables z_i. - # z_i ∈ {0, 1} - self.binary_vars = np.arange(self.nconformers, 2 * self.nconformers) - self.lower_ints = np.array(self.nconformers * [0]) - self.upper_ints = np.array(self.nconformers * [1]) - - # fmt:off - - # Each weight w_i falls in the closed interval [0..1], and the sum of all weights is <= 1. - # This corresponds to (1 * nconformers + 1) constraints, imposed on (nconformers) variables: - # the lower and upper bounds account for (nconformers) constraints, - # the summation constraint accounts for 1. - # constraint idx=[0..N) constraint idx=N - rowidx = [*range(0, self.nconformers)] + (self.nconformers * [self.nconformers]) - # 0 ≤ w_i ≤ 1 0 ≤ Σ_i w_i ≤ 1 - coeffs = (self.nconformers * [1.0]) + (self.nconformers * [1.0]) - colidx = [*range(0, self.nconformers)] + [*range(0, self.nconformers)] - lowers = (self.nconformers * [0.0]) + [0.0] - uppers = (self.nconformers * [1.0]) + [1.0] - n_constraints += self.nconformers + 1 - - # Introduce an implicit cardinality constraint - # Only count weights for which z_i is 1 - # 0 <= z_i - w_i <= 1 - # (∵ z_i ∈ {0,1} and 0 ≤ w_i ≤ 1, this is only true when z_i = 1) - # constraint idx=[N+1..2N+1) - rowidx += ( - [*range(n_constraints, n_constraints + self.nconformers)] + - [*range(n_constraints, n_constraints + self.nconformers)] - ) - # 0 <= -w_i + z_i <= 1 - coeffs += ((self.nconformers * [-1.0]) + (self.nconformers * [1.0])) - colidx += ([*range(0, self.nconformers)] + [*range(self.nconformers, 2 * self.nconformers)]) - lowers += self.nconformers * [0.0] - uppers += self.nconformers * [1.0] - n_constraints += self.nconformers - - # Set the threshold constraint if applicable - # tdmin z_i - w_i ≤ 0, i.e. w_i ≥ tdmin - if threshold is not None: - # constraint idx=[2N+1..3N+1) - rowidx += ( - [*range(n_constraints, n_constraints + self.nconformers)] + - [*range(n_constraints, n_constraints + self.nconformers)] - ) - # 0 <= wi - t * zi <= 1 - coeffs += (self.nconformers * [1.0]) + (self.nconformers * [-threshold]) - colidx += [*range(0, self.nconformers)] + [*range(self.nconformers, 2 * self.nconformers)] - lowers += (self.nconformers * [0.0]) - uppers += (self.nconformers * [1.0]) - n_constraints += self.nconformers - - # fmt:on - - # Set the cardinality constraint if applicable - # Σ z_i ≤ cardinality - if cardinality is not None: - # constraint idx=2N+2, if no threshold constraint; or - # constraint idx=3N+2, if a threshold constraint was applied - rowidx += self.nconformers * [n_constraints] - # 0 <= Σ z_i <= cardinality - coeffs += self.nconformers * [1] - colidx += [*range(self.nconformers, 2 * self.nconformers)] - lowers += [0] - uppers += [cardinality] - n_constraints += 1 - - # MIOSQP requires constraints as a sparse matrix - self.constraints = sci.sparse.csc_matrix( - (coeffs, (rowidx, colidx)), - shape=(n_constraints, n_vars), - ) - self.lower_bounds = np.array(lowers) - self.upper_bounds = np.array(uppers) + self.weights.append(self._weights[j]) + j += 1 + self.weights = np.array(self.weights) + assert len(self.weights) == self.nconformers - def solve_miqp( - self, - threshold: Optional[float] = None, - cardinality: Optional[int] = None, - exact: bool = False, - ) -> None: - if TYPE_CHECKING: - self.driver = self.miosqp - assert self.driver is not None - - if cardinality is threshold is None: - raise ValueError("Set either cardinality or threshold.") + def solve_miqp(self, threshold=0, cardinality=0): + if self.quad_obj is None or self.lin_obj is None: + self.compute_quadratic_coeffs() - if not (self.quad_obj and self.lin_obj): + m = len(self.valid_indices) + P = self.quad_obj + q = self.lin_obj + + w = cp.Variable(m) + z = cp.Variable(m, boolean=True) + objective = cp.Minimize(0.5 * cp.quad_form(w, cp.psd_wrap(P)) + q.T @ w) + constraints = [np.ones(m).T @ w <= 1] + if threshold: + constraints += [w - z <= 0, w >= threshold * z] + # The first constraint requires w_i to be zero if z_i is + # The second requires that each non-zero w_i is greater than the threshold + if cardinality: + constraints += [w - z <= 0, np.ones(m).T @ z <= cardinality] + prob = cp.Problem(objective, constraints) + prob.solve(solver="SCIP") + self.objective_value = prob.value + # I'm not sure why objective_values is calculated this way, but doing + # so to be compatible with the former CPLEXSolver class + self.objective_value = 2 * prob.value + self.target.T @ self.target + self._weights = w.value + self.construct_weights() + + def solve_qp(self): + if self.quad_obj is None or self.lin_obj is None: self.compute_quadratic_coeffs() - self.compute_mixed_int_constraints(threshold, cardinality) - - # Construct the MIOSQP solver & solve! - miqp = self.driver.MIOSQP() - miqp.setup( - P=self.quad_obj, - q=self.lin_obj, - A=self.constraints, - l=self.lower_bounds, - u=self.upper_bounds, - i_idx=self.binary_vars, - i_l=self.lower_ints, - i_u=self.upper_ints, - settings=self.MIOSQP_SETTINGS, - qp_settings=self.OSQP_SETTINGS, - ) - try: - result = miqp.solve() - except CplexSolverError: - raise SolverError("CPLEX encountered an error: Non-convex objective function") - - # Destructure results - self.weights = np.array(result.x[0 : self.nconformers]) - self.objective_value = ( - 2 * result.upper_glob - + np.inner(self.target, self.target) - ) # fmt:skip + + m = len(self.valid_indices) + P = self.quad_obj + q = self.lin_obj + w = cp.Variable(m) + objective = cp.Minimize(0.5 * cp.quad_form(w, cp.psd_wrap(P)) + q.T @ w) + constraints = [w >= np.zeros(m), np.ones(m).T @ w <= 1] + prob = cp.Problem(objective, constraints) + prob.solve() + self.objective_value = prob.value + # I'm not sure why objective_values is calculated this way, but doing + # so to be compatible with the former CPLEXSolver class + self.objective_value = 2 * prob.value + self.target.T @ self.target + self._weights = w.value + self.construct_weights() ############################### diff --git a/src/qfit/structure/structure.py b/src/qfit/structure/structure.py index 7787fad90..0126f4bb3 100644 --- a/src/qfit/structure/structure.py +++ b/src/qfit/structure/structure.py @@ -350,7 +350,7 @@ def reorder(self): def normalize_occupancy(self): """This function will scale the occupancy of protein residues to make the sum(occ) equal to 1 for all. - The goal of this function is to determine if the sum(occ) of each residue coming out of qFit residue or segement is < 1 (as it can be in CPLEX), + The goal of this function is to determine if the sum(occ) of each residue coming out of qFit residue or segement is < 1 (as it can be due to QP), and scale each alt conf occupancy for the residue to sum to one. To accomplish this, if the sum(occ) is less than 1, then we will divide each conformer occupancy by the sum(occ). """ diff --git a/tests/test_solvers.py b/tests/test_solvers.py index 9829f813e..c281bd3ed 100644 --- a/tests/test_solvers.py +++ b/tests/test_solvers.py @@ -74,7 +74,7 @@ def test_miqp_solver_with_cardinality_3( expected_weights = np.array([1 / 3, 1 / 3, 1 / 3]) expected_objective = 0.0 assert np.allclose(solver.weights, expected_weights, atol=1e-3) - assert np.isclose(solver.objective_value, expected_objective, atol=1e-6) + assert np.isclose(solver.objective_value, expected_objective, atol=1e-4) def test_miqp_solver_with_cardinality_2( self, solver_class: type[qfit.solvers.MIQPSolver]