Releases: CDDLeiden/DrugEx

v3.4.7

06 May 13:32

Change Log

From v3.4.6 to v3.4.7

Fixes

  • Prevent the QSPRpred scorer from crashing when an empty list of molecules is supplied. The scorer now returns an empty list of scores and outputs a warning.
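
The guard described in this fix can be pictured roughly as follows. This is a minimal sketch, not the actual DrugEx code: `score_molecules` and its `predict` callback are hypothetical names used only for illustration.

```python
import warnings
from typing import Callable, List, Sequence


def score_molecules(
    mols: Sequence[str],
    predict: Callable[[Sequence[str]], List[float]],
) -> List[float]:
    """Hypothetical wrapper: score molecules, tolerating an empty input."""
    if len(mols) == 0:
        # Mirror the behaviour described above: warn and return no scores
        # instead of passing an empty batch on to the model.
        warnings.warn("No molecules supplied; returning an empty list of scores.")
        return []
    return list(predict(mols))
```

Calling `score_molecules([], predict)` then yields `[]` together with a warning, while a non-empty list is passed on to `predict` unchanged.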

Changes

  • Update the rdkit-pypi dependency to rdkit.

Removed Features

None.

New Features

None.

v3.4.6

07 Mar 16:15

Change Log

From v3.4.5 to v3.4.6

Fixes

None.

Changes

  • For the generator CLI, environment variables are now read from the generator meta file automatically. Arguments that are no longer used have been removed from the CLI.
  • Compatibility updates to make the package work with the latest QSPRpred scorers in version 3.0.0 and higher. Older scorers will still work if an older version is installed alongside DrugEx. Only the unit tests will fail in that case, since the models used there assume QSPRpred v3.0.0 or later.

Removed Features

None.

New Features

None.

v3.4.5

21 Sep 07:16

Change Log

From v3.4.4 to v3.4.5

Fixes

  • Fixed a bug in calculation of the Pareto fronts (fronts are now calculated for maximization of objectives instead of objective minimization).
  • Patched a bug that caused a crash when an invalid SMILES was encountered in the fragment generation step. This
    bug was introduced in v3.4.4; invalid SMILES are now skipped and a warning is printed to the log.
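
To illustrate what calculating fronts "for maximization" means, here is a minimal sketch of non-dominated sorting in plain Python. It is not the DrugEx implementation (that lives in get_Pareto_fronts); the function names `dominates` and `pareto_fronts` are assumptions made for this example.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is at least as good as `b` on every objective and
    strictly better on at least one (higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_fronts(scores: Sequence[Sequence[float]]) -> List[List[int]]:
    """Return lists of indices; the first list is the best (non-dominated) front."""
    remaining = list(range(len(scores)))
    fronts: List[List[int]] = []
    while remaining:
        # A point belongs to the current front if no other remaining point dominates it.
        front = [
            i for i in remaining
            if not any(dominates(scores[j], scores[i]) for j in remaining if j != i)
        ]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts
```

With objectives to be maximized, a molecule scoring `(0.9, 0.9)` lands in the first front and one scoring `(0.5, 0.5)` in a later one; under minimization the order would be reversed, which is the behaviour this fix corrects.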

Changes

  • Installation of pip package with pyproject.toml instead of setup.cfg.
  • Methods cpu_non_dominated_sort and gpu_non_dominated_sort have been replaced by get_Pareto_fronts.
  • Improved calculation of crowding distance.
  • The rewards module was refactored and the RankingStrategy class was replaced by the ParetoRankingScheme class.
    • The final reward calculation for ParetoRankingScheme-based methods is now directly the scaled rank of the molecules.
    • The ParetoTanimotoDistance now has an attribute distance_metric, which can be "min", "mean" or "mutual", instead of the attribute ranking.
  • DrugEx is now compatible with the latest version of qsprpred v2.0.1; previous versions of qsprpred are no longer supported.
  • drugex.generate CLI environment arguments are no longer overwritten by environment variables from the generator.
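
For readers unfamiliar with the crowding distance mentioned above, the following is a minimal sketch of the textbook NSGA-II formulation. It is offered as background only and is an assumption about the general technique, not a copy of the improved DrugEx calculation; `crowding_distance` is a hypothetical name.

```python
from typing import List, Sequence


def crowding_distance(front: Sequence[Sequence[float]]) -> List[float]:
    """Textbook crowding distance for the points of a single Pareto front."""
    n = len(front)
    if n == 0:
        return []
    distances = [0.0] * n
    for m in range(len(front[0])):
        # Sort the points along objective m.
        order = sorted(range(n), key=lambda i: front[i][m])
        lo, hi = front[order[0]][m], front[order[-1]][m]
        # Boundary points are always kept, so they get an infinite distance.
        distances[order[0]] = distances[order[-1]] = float("inf")
        if hi == lo:
            continue  # all points are equal on this objective
        for k in range(1, n - 1):
            gap = front[order[k + 1]][m] - front[order[k - 1]][m]
            distances[order[k]] += gap / (hi - lo)
    return distances
```

Points in sparsely populated regions of a front receive larger values, so ranking by this distance favours diversity among equally good molecules.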

Removed Features

None.

New Features

  • When installing package with pip, the commit hash and date of the installation is saved into qsprpred._version
  • Added an automated Docker runner for tests that can run on GPUs. See testing/runner/README.md for more information.
  • When installing package with pip, the commit hash and date of the installation is saved into drugex._version. This information is also used as a basis of a new dynamic versioning scheme for the package. The version number is generated automatically upon installation of the package and saved to drugex.__version__.
  • QSPRPred is now available as an optional dependency that can be installed with DrugEx using the [qsprpred] option.
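
A quick way to inspect the dynamically generated version is to read drugex.__version__ at runtime. The helper below is a small sketch; `installed_drugex_version` is a hypothetical name, and it simply returns None when DrugEx is not installed in the current environment.

```python
import importlib
from typing import Optional


def installed_drugex_version() -> Optional[str]:
    """Return drugex.__version__ if the package can be imported, else None."""
    try:
        module = importlib.import_module("drugex")
    except ImportError:
        return None
    # The attribute name comes from the release note above.
    return getattr(module, "__version__", None)


print(installed_drugex_version())
```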

v3.4.4

15 May 10:27

Change Log

From v3.4.3 to v3.4.4

Fixes

  • Fixed a bug that may have caused the standardizer to return molecules failing in standardization in their original form instead of removing them (14fd58d).

Changes

None.

Removed Features

None.

New Features

None.

v3.4.3

20 Mar 16:47

Change Log

From v3.4.2 to v3.4.3

Fixes

  • The dataset CLI script now captures test, train and unique sets in the right order for the fragment-based methods.
  • Examples in the CLI tutorial were updated and fixed.
  • The Linux command line script was fixed so that all arguments are passed correctly.
  • The train and generate CLI scripts now have empty predictor (-p, --predictor) by default. This makes error messages less confusing.
  • Fixed a bug in the desirability calculation during generation.

Changes

None.

Removed Features

None.

New Features

  • New fragmenter FragmenterWithSelectedFragment, which produces only fragments-molecule pairs in which the input fragments contain the specific fragment given by the user.

Update 3.4.2

03 Mar 13:38

Change Log

From v3.4.1 to v3.4.2

Fixes

  • The QSPRPredScorer now functions properly when presented with rdkit molecules instead of SMILES strings. It also does not modify the input list anymore.

Changes

None.

Removed Features

None.

New Features

None.

Update 3.4.1

02 Mar 18:23

Change Log

From v3.4.0 to v3.4.1

Fixes

  • Content of output files during model training and molecule generation (broken due to refactoring in v3.4.0):
    • During fine-tuning, the training (train_loss) and validation (valid_loss) losses and the ratios of valid (valid_ratio) and accurate (accurate_ratio, only for transformers) molecules are saved in _fit.tsv.
    • During RL, the ratios of valid (valid_ratio), accurate (accurate_ratio, only for transformers), unique (unique_ratio) and desired (desired_ratio) molecules and the average arithmetic (avg_amean) and geometric (avg_gmean) means of the modified scores are saved in _fit.tsv.
  • DrugExEnvironment.getScores() now sets all modified scores to 0 for invalid molecules (fixes a bug resulting from refactoring in v3.4.0).
  • Fixed the CLI so that it supports new QSPRPred models.
  • Fixed the tutorial for scaffold-based generation.
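
As a rough illustration of the RL columns listed above, the sketch below computes valid_ratio, avg_amean and avg_gmean from per-objective modified scores, zeroing the scores of invalid molecules first. The exact aggregation used by DrugEx is not spelled out in these notes, so the per-molecule averaging here is an assumption, and `summarise_scores` is a hypothetical helper.

```python
import math
from typing import Dict, Sequence


def summarise_scores(
    modified_scores: Sequence[Sequence[float]],
    valid: Sequence[bool],
) -> Dict[str, float]:
    """modified_scores[i] holds the per-objective scores of molecule i."""
    n = len(modified_scores)
    if n == 0:
        raise ValueError("at least one molecule is required")
    # Invalid molecules get a score of 0 on every objective.
    cleaned = [
        list(row) if ok else [0.0] * len(row)
        for row, ok in zip(modified_scores, valid)
    ]
    # Assumed aggregation: mean over objectives per molecule, then mean over molecules.
    ameans = [sum(row) / len(row) for row in cleaned]
    gmeans = [math.prod(row) ** (1.0 / len(row)) for row in cleaned]
    return {
        "valid_ratio": sum(valid) / n,
        "avg_amean": sum(ameans) / n,
        "avg_gmean": sum(gmeans) / n,
    }
```

Note how a single zero on any objective pulls the geometric mean of that molecule to 0, which is why the two averages can differ substantially.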

Changes

  • Minimal supported version of QSPRPred compatible with the tutorial and CLI is now v1.3.0.dev0.
  • The train CLI script now uses the '-p', '--predictor' option to specify the QSPRPred model to use. It takes a path to the model's _meta.json file. More models can be specified this way.
    • This changes the original meaning of the '-ta', '--active_targets', '-ti', '--inactive_targets' and '-tw', '--window_targets' options. These now serve to link the models to the particular type of target. The name of the QSPRPred model is used to determine the type of target it represents. For example, if the QSPRPred model is called A2AR_RandomForestClassifier, then the '-ta', '--active_targets' option will be used to link to the A2AR_RandomForestClassifier as a predictor predicting activity towards a target.
  • Standard crowding distance is now the default ranking method for the train script (equiv. to --scheme PRCD, previously was --scheme PRTD).
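
The name-based linking of predictors to target types can be pictured with the sketch below. It assumes exact matching between the model name and the values passed to the target options, which these notes do not confirm; `link_predictors` is a hypothetical helper and A1R_RandomForestClassifier is a made-up second model name.

```python
from typing import Dict, Iterable, List


def link_predictors(
    model_names: Iterable[str],
    active_targets: Iterable[str] = (),
    inactive_targets: Iterable[str] = (),
    window_targets: Iterable[str] = (),
) -> Dict[str, List[str]]:
    """Group model names by the target option that mentions them."""
    roles = {
        "active": set(active_targets),
        "inactive": set(inactive_targets),
        "window": set(window_targets),
    }
    linked: Dict[str, List[str]] = {role: [] for role in roles}
    linked["unassigned"] = []
    for name in model_names:
        matches = [role for role, names in roles.items() if name in names]
        if len(matches) > 1:
            raise ValueError(f"{name} is linked to more than one target type")
        linked[matches[0] if matches else "unassigned"].append(name)
    return linked


print(link_predictors(
    ["A2AR_RandomForestClassifier", "A1R_RandomForestClassifier"],
    active_targets=["A2AR_RandomForestClassifier"],
    inactive_targets=["A1R_RandomForestClassifier"],
))
```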

Removed Features

None.

New Features

None.

DrugEx Version 3.4.0

20 Feb 15:20

Change Log

From v3.3.0 to v3.4.0

Fixes

None.

Changes

Major refactoring of drugex.training

  • Moving generators from drugex.training.models to drugex.training.generators, and harmonizing and renaming them

    • RNN -> SequenceRNN
    • GPT2Model -> SequenceTransformer
    • GraphModel -> GraphTransformer
  • Moving explorers from drugex.training.models to drugex.training.explorers, harmonizing and renaming them

    • SmilesExplorerNoFrag -> SequenceExplorer
    • SmilesExplorer -> FragSequenceExplorer
    • GraphExplorer -> FragGraphExplorer
  • Removal of all obsolete modules related to the two discontinued fragment-based LSTM models from DrugEx v3.

  • The generators' sample_smiles() has been replaced by a generate() function

  • Clarification of the terms qualifying the generated molecules, which now have the following unique and constant definitions (replacing the ambiguous VALID and DESIRE terms)

    • Valid : molecule can be parsed with rdkit
    • Accurate : molecule contains given input fragments
    • Desired : molecule fulfils all given objectives
  • Revise implementation of the Tanimoto distance-based Pareto ranking scheme (SimilarityRanking) to correspond to the method described in DrugEx v2. Add option to use the minimum Tanimoto distance between molecules in a front instead of the mean distance.

  • Remove all references to NN-based RAscore (already discontinued)
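
When updating scripts written against v3.3.0, the class renames listed in this refactoring can be collected in a simple lookup table. The mapping below only restates the renames from this change log; the `new_name` helper is hypothetical and not part of DrugEx.

```python
# Old class name -> new class name, as listed in the v3.4.0 change log.
RENAMED_IN_3_4_0 = {
    # generators (now in drugex.training.generators)
    "RNN": "SequenceRNN",
    "GPT2Model": "SequenceTransformer",
    "GraphModel": "GraphTransformer",
    # explorers (now in drugex.training.explorers)
    "SmilesExplorerNoFrag": "SequenceExplorer",
    "SmilesExplorer": "FragSequenceExplorer",
    "GraphExplorer": "FragGraphExplorer",
}


def new_name(old_name: str) -> str:
    """Return the v3.4.0 name of a class, or the input if it was not renamed."""
    return RENAMED_IN_3_4_0.get(old_name, old_name)
```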

Refactoring of CLI

  • Refactoring dataset.py and train.py to an object-based design
  • Writing a single .txt.vocab file per dataset preprocessing instead of separate (duplicate) files for each subset in dataset.py

Removed

  • --save_voc argument in dataset.py as redundant
  • --pretrained_model argument in train.py (merged with --agent_path)
  • memory parameter and all associated code from SequenceRNN

New Features

  • GRU-based RNN added to the CLI
  • Added another possible implementation of similarity ranking (MutualSimilaritySortRanking), based on the code in the original repository of DrugEx
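
The similarity rankings in this release rely on Tanimoto distances between molecules within a front, using either the mean or the minimum distance. The sketch below shows the difference on fingerprints represented as plain sets of "on" bits. It is a simplified illustration under that assumption, not the SimilarityRanking code, and both function names are hypothetical.

```python
from typing import List, Sequence, Set


def tanimoto_distance(a: Set[int], b: Set[int]) -> float:
    """1 - |A ∩ B| / |A ∪ B| for two sets of fingerprint bits."""
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(a & b) / union


def front_distances(fingerprints: Sequence[Set[int]], metric: str = "mean") -> List[float]:
    """Distance of each molecule to the other molecules in the same front."""
    if metric not in ("min", "mean"):
        raise ValueError("metric must be 'min' or 'mean'")
    result = []
    for i, fp in enumerate(fingerprints):
        others = [tanimoto_distance(fp, o) for j, o in enumerate(fingerprints) if j != i]
        if not others:
            result.append(0.0)  # a front with a single molecule
        elif metric == "min":
            result.append(min(others))
        else:
            result.append(sum(others) / len(others))
    return result
```

The minimum distance reflects only the nearest neighbour in the front, so a molecule with one close analogue scores low even if it is far from everything else, whereas the mean smooths this out.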

DrugEx version 3.3.0

13 Jan 09:13

Change Log

From v3.2.0 to v3.3.0

Fixes

  • Resolved pretraining issues of GPT2Model

Changes

  • Improve scaffold-based encoding. The new dummyMolsFromFragments creates dummy molecules from a set of fragments and is meant to be used as the fragmenter in FragmentCorpusEncoder. This makes the ScaffoldSequenceCorpus, ScaffoldGraphCorpus, SmilesScaffoldDataSet and GraphScaffoldDataSet classes obsolete.
  • The early stopping criterion of reinforcement learning is changed back to the ratio of desired molecules.
  • Renamed GraphModel.sampleFromSmiles to GraphModel.sample_smiles,
    • argument min_samples was renamed to num_samples,
    • exactly num_samples are returned,
    • arguments drop_duplicates, drop_invalid were added,
    • argument keep_frags was added.
  • The sample_smiles method was added to the sequence transformer (GPT2Model) and to the RNN classes.
  • Changed the GPT2Model adaptive learning rate settings to resolve pretraining issues
  • Progress bars were added for models' fitting (pretraining, fine-tuning and reinforcement learning).
  • Tokens _ and . always present in VocSmiles have been removed.
  • RNN models deposited on Zenodo and pretrained on ChEMBL31 and Papyrus 05.5 were updated, while the RNN model pretrained on ChEMBL27 did not need an update.
  • Moved encoding of tokens for SMILES-based models to the parallel preprocessing steps to improve performance
  • All testing code that is not unit tests was moved to testing
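
The "exactly num_samples are returned" behaviour combined with drop_duplicates and drop_invalid can be pictured as a loop that keeps sampling until enough molecules pass the filters. This is a sketch under stated assumptions, not the GraphModel.sample_smiles code: `sample_exactly`, `generate_batch`, `is_valid` and `max_rounds` are all hypothetical.

```python
from typing import Callable, List


def sample_exactly(
    generate_batch: Callable[[int], List[str]],
    num_samples: int,
    is_valid: Callable[[str], bool],
    drop_duplicates: bool = True,
    drop_invalid: bool = True,
    max_rounds: int = 100,
) -> List[str]:
    """Collect exactly `num_samples` SMILES that pass the requested filters."""
    if num_samples <= 0:
        return []
    kept: List[str] = []
    seen = set()
    for _ in range(max_rounds):
        # Only request as many molecules as are still missing.
        for smiles in generate_batch(num_samples - len(kept)):
            if drop_invalid and not is_valid(smiles):
                continue
            if drop_duplicates and smiles in seen:
                continue
            seen.add(smiles)
            kept.append(smiles)
            if len(kept) == num_samples:
                return kept
    raise RuntimeError("could not collect enough molecules within max_rounds")
```

The `max_rounds` cap is a design choice for the sketch: without it, a generator that only produces duplicates or invalid molecules would loop forever.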

New Features

  • Tutorial for scaffold-based generation.
  • Added tests to testing that make it possible to check the consistency of models between versions.

DrugEx version 3.2.0

26 Sep 11:49

From v3.1.0 to v3.2.0

Fixes

  • fixes to SmilesExplorerNoFrag (the wrong best state was saved; a TypeError during logging was also eliminated, see !40)
  • slightly optimized memory usage (!50)
  • fix #55

Changes

  • generated SMILES are now not reported in the logger of SmilesExplorerNoFrag (see !40), but should still be available to the supplied training monitor

  • Training of QSAR models was restructured (see !41): only the CLI remains in environ.py, and the actual functionality moved to environment.
    Unit tests were also added for this part of the code.

  • Early stopping patience is now soft-coded (configurable) for all models, and the early stopping criterion for reinforcement learning was changed from the ratio of desired molecules to the mean average score (see !46)
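
Patience-based early stopping, as referred to above, generally works like the sketch below: training stops once the monitored value has not improved for a given number of consecutive epochs. The `EarlyStopper` class is hypothetical and only illustrates the idea; the monitored value could be a ratio of desired molecules or a mean score.

```python
from typing import Optional


class EarlyStopper:
    """Signal a stop after `patience` consecutive epochs without improvement."""

    def __init__(self, patience: int) -> None:
        self.patience = patience
        self.best: Optional[float] = None
        self.stale_epochs = 0

    def update(self, score: float) -> bool:
        """Record the score of one epoch; return True if training should stop."""
        if self.best is None or score > self.best:
            self.best = score
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
        return self.stale_epochs >= self.patience


stopper = EarlyStopper(patience=2)
for epoch, score in enumerate([0.1, 0.3, 0.2, 0.25]):
    if stopper.update(score):
        print(f"stopping after epoch {epoch}")
        break
```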

New Features

  • add option to remove molecules with tokens not occurring in voc (in dataset.py), see !39.

  • add grid search for DNN QSAR model (see !41)

  • add bayes optimization for DNN QSAR model (see !42)

  • add option to use different environment algorithms during RL

  • add option to use selectivity window predictor for RL

  • add option to specify chunk_size in the dataset.py script to control how data is supplied to parallel processes (bigger chunk size -> more memory used, but more efficient use of multiple CPUs, see !50)
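
The trade-off behind chunk_size can be seen in a minimal chunking helper: larger chunks mean fewer, bigger work items per process. This is a generic sketch, not the dataset.py code, and `chunked` is a hypothetical name.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], chunk_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `items` with at most `chunk_size` elements."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(items), chunk_size):
        yield list(items[start:start + chunk_size])


# Each chunk would be handed to one worker process as a single task.
for chunk in chunked(["CCO", "CCN", "CCC", "c1ccccc1", "CO"], chunk_size=2):
    print(chunk)
```

Each chunk has to be held in memory by the worker that processes it, which is why a bigger chunk size costs more memory while reducing the per-task overhead.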