Releases: CDDLeiden/DrugEx
v3.4.7
Change Log
From v3.4.6 to v3.4.7
Fixes
- Prevent the QSPRpred scorer from crashing when an empty list of molecules is supplied. The scorer now returns an empty list of scores and outputs a warning.
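The behaviour described in this fix can be sketched as a simple guard. This is only an illustration and not the scorer's actual code: `score_molecules` and the `predict` callable are hypothetical names introduced here.

```python
import warnings
from typing import Callable, List, Sequence


def score_molecules(
    mols: Sequence[str],
    predict: Callable[[Sequence[str]], List[float]],
) -> List[float]:
    """Score molecules, short-circuiting on empty input (hypothetical helper)."""
    if len(mols) == 0:
        # Warn instead of calling the model with nothing to score.
        warnings.warn("No molecules supplied; returning an empty list of scores.")
        return []
    return list(predict(mols))
```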
Changes
- Updated the dependency rdkit-pypi to rdkit.
Removed Features
None.
New Features
None.
v3.4.6
Change Log
From v3.4.5 to v3.4.6
Fixes
None.
Changes
- For the generator CLI, environment variables are now read from the generator meta file automatically. Unused arguments were also removed from the CLI.
- Compatibility updates to make the package work with the latest QSPRpred scorers in version 3.0.0 and higher. Older scorers will still work if an older version is installed alongside DrugEx. Only the unit tests will fail since the models used there assume QSPRpred v3.0.0 or later.
Removed Features
None.
New Features
None.
v3.4.5
Change Log
From v3.4.4 to v3.4.5
Fixes
- Fixed a bug in calculation of the Pareto fronts (fronts are now calculated for maximization of objectives instead of objective minimization).
- Patched a bug that caused a crash when an invalid SMILES was encountered in the fragment generation step. This bug was introduced in v3.4.4; invalid SMILES are now skipped and a warning is printed to the log.
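To illustrate what calculating fronts "for maximization" means, here is a minimal non-dominated sorting sketch. It is not the DrugEx implementation: `dominates` and `pareto_fronts` are names made up for this example, and the quadratic pure-Python loop is chosen for readability rather than speed.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` dominates `b` when every objective is maximized."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_fronts(scores: Sequence[Sequence[float]]) -> List[List[int]]:
    """Split row indices of `scores` into successive non-dominated fronts."""
    remaining = list(range(len(scores)))
    fronts: List[List[int]] = []
    while remaining:
        # A point belongs to the current front if nothing left dominates it.
        front = [
            i for i in remaining
            if not any(dominates(scores[j], scores[i]) for j in remaining if j != i)
        ]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts
```

With minimization, the comparisons in `dominates` would be reversed, which puts the worst molecules in the first front; that is the kind of error this fix addresses.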
Changes
- Installation of pip package with pyproject.toml instead of setup.cfg.
- Methods cpu_non_dominated_sort and gpu_non_dominated_sort have been replaced by get_Pareto_fronts.
- Improved the calculation of the crowding distance.
- The rewards module was refactored and the RankingStrategy class was replaced by the ParetoRankingScheme class.
  - The final reward calculation for ParetoRankingScheme-based methods is now directly the scaled rank of the molecules.
  - The ParetoTanimotoDistance now has an attribute distance_metric, which can be "min", "mean" or "mutual", instead of the attribute ranking.
- DrugEx is now compatible with the latest version of qsprpred (v2.0.1); previous versions of qsprpred are no longer supported.
- drugex.generate CLI environment arguments are no longer overwritten by environment variables from the generator.
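The phrase "scaled rank of the molecules" in the rewards change above can be pictured with the sketch below. The exact scaling used by ParetoRankingScheme is not stated in this change log, so the linear mapping (best molecule gets 1.0, worst gets 0.0) and the name `scaled_rank_rewards` are assumptions made purely for illustration.

```python
from typing import List, Sequence


def scaled_rank_rewards(ranking: Sequence[int]) -> List[float]:
    """Map a best-first ranking of molecule indices to rewards in [0, 1].

    Assumption: linear scaling, best molecule -> 1.0, worst -> 0.0.
    """
    n = len(ranking)
    rewards = [0.0] * n
    for position, mol_idx in enumerate(ranking):
        # A single molecule gets the full reward.
        rewards[mol_idx] = 1.0 if n == 1 else (n - 1 - position) / (n - 1)
    return rewards
```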
Removed Features
None.
New Features
- When installing the package with pip, the commit hash and date of the installation are saved into qsprpred._version
- Added an automated Docker runner for tests that can run on GPUs. See testing/runner/README.md for more information.
- When installing the package with pip, the commit hash and date of the installation are saved into drugex._version. This information is also used as the basis of a new dynamic versioning scheme for the package. The version number is generated automatically upon installation of the package and saved to drugex.__version__.
- QSPRPred is now available as an optional dependency that can be installed with DrugEx using the [qsprpred] option.
v3.4.4
v3.4.3
Change Log
From v3.4.2 to v3.4.3
Fixes
- The dataset CLI script now captures test, train and unique sets in the right order for the fragment-based methods.
- Examples in the CLI tutorial were updated and fixed.
- The Linux command line script was fixed so that all arguments are passed correctly.
- The train and generate CLI scripts now have an empty predictor (-p, --predictor) by default. This makes error messages less confusing.
- Fixed a bug in the desirability calculation during generation.
Changes
None.
Removed Features
None.
New Features
- New fragmenter FragmenterWithSelectedFragment, which produces only those fragments-molecule pairs in which the input fragments contain the specific fragment given by the user.
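The selection rule of the new fragmenter can be pictured as a filter over fragments-molecule pairs. The sketch below is not the FragmenterWithSelectedFragment code: the pair layout, the exact string comparison of fragments and the name `keep_pairs_with_fragment` are assumptions for illustration only.

```python
from typing import Iterable, List, Tuple

# (input fragments, parent molecule), both as SMILES strings (assumed layout)
FragmentPair = Tuple[Tuple[str, ...], str]


def keep_pairs_with_fragment(
    pairs: Iterable[FragmentPair], selected: str
) -> List[FragmentPair]:
    """Keep only pairs whose input fragments include `selected`.

    Assumption: fragments are compared as exact strings (e.g. canonical SMILES).
    """
    return [(frags, mol) for frags, mol in pairs if selected in frags]
```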
Update 3.4.2
Change Log
From v3.4.1 to v3.4.2
Fixes
- The QSPRPredScorer now functions properly when presented with rdkit molecules instead of SMILES strings. It also does not modify the input list anymore.
Changes
None.
Removed Features
None.
New Features
None.
Update 3.4.1
Change Log
From v3.4.0 to v3.4.1
Fixes
- Content of output files during model training and molecule generation (broken due to refactoring in v3.4.0):
  - During fine-tuning, the training (train_loss) and the validation (valid_loss) loss, and the ratios of valid (valid_ratio) and accurate (accurate_ratio, only for transformers) molecules are saved in _fit.tsv
  - During RL, the ratios of valid (valid_ratio), accurate (accurate_ratio, only for transformers), unique (unique_ratio) and desired (desired_ratio) molecules and the arithmetic (avg_amean) and geometric (avg_gmean) means of the modified scores are saved in _fit.tsv
- In DrugExEnvironment.getScores(), all modified scores are now set to 0 for invalid molecules (fixes a bug resulting from refactoring in v3.4.0).
) - Fixed the CLI so that it supports new QSPRPred models.
- Fixed the tutorial for scaffold-based generation.
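The getScores() fix listed above amounts to masking the scores of invalid molecules. A minimal sketch of that idea follows; it uses plain dictionaries and lists rather than whatever data structure DrugExEnvironment uses internally, and `zero_invalid_scores` is a hypothetical helper name.

```python
from typing import Dict, List, Sequence


def zero_invalid_scores(
    scores: Dict[str, List[float]], valid: Sequence[bool]
) -> Dict[str, List[float]]:
    """Return a copy of `scores` with every entry of an invalid molecule set to 0."""
    return {
        name: [value if ok else 0.0 for value, ok in zip(values, valid)]
        for name, values in scores.items()
    }
```

Returning a new dictionary leaves the caller's scores untouched, so the raw values remain available for logging.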
Changes
- Minimal supported version of QSPRPred compatible with the tutorial and CLI is now v1.3.0.dev0.
- The train CLI script now uses the '-p', '--predictor' option to specify the QSPRPred model to use. It takes a path to the model's _meta.json file. Multiple models can be specified this way.
  - This changes the original meaning of the '-ta', '--active_targets', '-ti', '--inactive_targets' and '-tw', '--window_targets' options. These now serve to link the models to the particular type of target. The name of the QSPRPred model is used to determine the type of target it represents. For example, if the QSPRPred model is called A2AR_RandomForestClassifier, then the '-ta', '--active_targets' option will be used to link to the A2AR_RandomForestClassifier as a predictor predicting activity towards a target.
- Standard crowding distance is now the default ranking method for the train script (equivalent to --scheme PRCD; previously --scheme PRTD).
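For readers unfamiliar with the crowding distance mentioned above, the sketch below shows the textbook (NSGA-II style) calculation for a single Pareto front. It is an assumption that this matches what --scheme PRCD computes in detail; the function name `crowding_distance` is chosen for this example only.

```python
from typing import List, Sequence


def crowding_distance(front: Sequence[Sequence[float]]) -> List[float]:
    """Standard (NSGA-II style) crowding distance for the points of one front."""
    n = len(front)
    if n == 0:
        return []
    distance = [0.0] * n
    for m in range(len(front[0])):
        order = sorted(range(n), key=lambda i: front[i][m])
        low, high = front[order[0]][m], front[order[-1]][m]
        # Boundary points are always preferred, so they get infinite distance.
        distance[order[0]] = distance[order[-1]] = float("inf")
        if high == low:
            continue  # all values equal: this objective adds nothing
        for k in range(1, n - 1):
            gap = front[order[k + 1]][m] - front[order[k - 1]][m]
            distance[order[k]] += gap / (high - low)
    return distance
```

Molecules with a larger distance sit in sparser regions of the front and are ranked higher within it, which favours diversity in objective space.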
Removed Features
None.
New Features
None.
DrugEx Version 3.4.0
Change Log
From v3.3.0 to v3.4.0
Fixes
None.
Changes
Major refactoring of drugex.training
- Moving generators from drugex.training.models to drugex.training.generators, and harmonizing and renaming them
  - RNN -> SequenceRNN
  - GPT2Model -> SequenceTransformer
  - GraphModel -> GraphTransformer
- Moving explorers from drugex.training.models to drugex.training.explorers, and harmonizing and renaming them
  - SmilesExplorerNoFrag -> SequenceExplorer
  - SmilesExplorer -> FragSequenceExplorer
  - GraphExplorer -> FragGraphExplorer
- Removal of all obsolete modules related to the two discontinued fragment-based LSTM models from DrugEx v3.
- The generators' sample_smiles() has been replaced by a generate() function
- Clarification of the terms qualifying the generated molecules to have the following unique and constant definitions (replacing ambiguous VALID and DESIRE terms)
  - Valid: molecule can be parsed with rdkit
  - Accurate: molecule contains given input fragments
  - Desired: molecule fulfils all given objectives
- Revise implementation of Tanimoto distance-based Pareto ranking scheme (SimilarityRanking) to correspond to the method described in DrugEx v2. Add option to use minimum Tanimoto distance between molecules in a front instead of the mean distance.
- Remove all references to NN-based RAscore (already discontinued)
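When updating older scripts, the renamings listed above can be kept in a small lookup table. The sketch below only restates that list; the generated import lines assume the new classes can be imported directly from drugex.training.generators and drugex.training.explorers, which this change log does not confirm, and `migration_hint` is a hypothetical helper.

```python
# Old class name -> (new module, new class name), as listed in the change log.
# Assumption: the new classes are reachable from these package-level modules.
RENAMED = {
    "RNN": ("drugex.training.generators", "SequenceRNN"),
    "GPT2Model": ("drugex.training.generators", "SequenceTransformer"),
    "GraphModel": ("drugex.training.generators", "GraphTransformer"),
    "SmilesExplorerNoFrag": ("drugex.training.explorers", "SequenceExplorer"),
    "SmilesExplorer": ("drugex.training.explorers", "FragSequenceExplorer"),
    "GraphExplorer": ("drugex.training.explorers", "FragGraphExplorer"),
}


def migration_hint(old_name: str) -> str:
    """Return an import line for the renamed class, or a note if it is unknown."""
    if old_name not in RENAMED:
        return f"# {old_name}: not part of the v3.4.0 renaming"
    module, new_name = RENAMED[old_name]
    return f"from {module} import {new_name}"
```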
Refactoring of CLI
- Refactoring dataset.py and train.py to be object-based
- Writing a single .txt.vocab file per dataset preprocessing instead of separate (duplicate) files for each subset in dataset.py
Removed
- --save_voc argument in dataset.py as redundant
- --pretrained_model argument in train.py (merged with --agent_path)
- memory parameter and all associated code from SequenceRNN
New Features
- GRU-based RNN added to the CLI
- Added another possible implementation of similarity ranking (MutualSimilaritySortRanking), based on the code in the original repository of DrugEx
DrugEx version 3.3.0
Change Log
From v3.2.0 to v3.3.0
Fixes
- Resolved pretraining issues of GPT2Model
Changes
- Improve scaffold-based encoding. New dummyMolsFromFragments to create dummy molecules from a set of fragments, to be called as the fragmenter in FragmentCorpusEncoder. This makes the ScaffoldSequenceCorpus, ScaffoldGraphCorpus, SmilesScaffoldDataSet and GraphScaffoldDataSet classes obsolete.
- The early stopping criterion of reinforcement learning is changed back to the ratio of desired molecules.
- Renamed GraphModel.sampleFromSmiles to GraphModel.sample_smiles,
  - argument min_samples was renamed to num_samples,
  - exactly num_samples are returned,
  - arguments drop_duplicates, drop_invalid were added,
  - argument keep_frags was added.
- The sample_smiles method was added to the sequence transformer GPT2Model and to the RNN classes.
- Changed the GPT2Model adaptive learning rate settings to resolve pretraining issues
- Progress bars were added for models' fitting (pretraining, fine-tuning and reinforcement learning).
- Tokens _ and . always present in VocSmiles have been removed.
- RNN models deposited on Zenodo and pretrained on ChEMBL31 and Papyrus 05.5 were updated, while the RNN model pretrained on ChEMBL27 did not need to be.
- Moved encoding of tokens for SMILES-based models to the parallel preprocessing steps to improve performance
- All testing code that is not unit tests was moved to testing
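The sample_smiles semantics described above (exactly num_samples returned, with optional dropping of duplicates and invalid molecules) suggest a resampling loop. The following is a rough sketch of that idea and not the method's actual code: `sample_exactly`, `draw_batch`, `is_valid` and `max_rounds` are hypothetical names introduced for illustration.

```python
from typing import Callable, List


def sample_exactly(
    draw_batch: Callable[[int], List[str]],
    is_valid: Callable[[str], bool],
    num_samples: int,
    drop_duplicates: bool = True,
    drop_invalid: bool = True,
    max_rounds: int = 100,
) -> List[str]:
    """Draw batches until `num_samples` molecules are kept.

    Returns fewer molecules only if `max_rounds` is exhausted first.
    """
    kept: List[str] = []
    seen = set()
    for _ in range(max_rounds):
        if len(kept) >= num_samples:
            break
        # Ask only for as many molecules as are still missing.
        for smiles in draw_batch(num_samples - len(kept)):
            if drop_invalid and not is_valid(smiles):
                continue
            if drop_duplicates and smiles in seen:
                continue
            seen.add(smiles)
            kept.append(smiles)
    return kept[:num_samples]
```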
New Features
- Tutorial for scaffold-based generation.
- Added tests to testing that allow checking the consistency of models between versions.
DrugEx version 3.2.0
From v3.1.0 to v3.2.0
Fixes
- fixes to SmilesExplorerNoFrag (wrong best state was saved and a TypeError while logging was eliminated, see !40)
- optimized how memory is used a little bit (!50)
- fix #55
Changes
- generated SMILES are now not reported in the logger of SmilesExplorerNoFrag (see !40), but should still be available to the supplied training monitor
- Training of QSAR models was restructured (see !41): only the CLI remains in environ.py, while the actual functionality moved to environment. Unit tests were also added for this part of the code.
- Early stopping patience is now soft-coded for all models, and for reinforcement learning the criterion was changed from the ratio of desired molecules to the mean average score (see !46)
New Features
- add option to remove molecules with tokens not occurring in voc (in dataset.py), see !39.
- add grid search for DNN QSAR model (see !41)
- add bayes optimization for DNN QSAR model (see !42)
- add option to use different environment algorithms during RL
- add option to use selectivity window predictor for RL
- add option to specify chunk_size in the dataset.py script to control how data is supplied to parallel processes (bigger chunk size -> more memory used, but more efficient use of multiple CPUs, see !50)
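The trade-off behind chunk_size can be seen in a minimal chunking helper. This is a generic sketch, not the code in dataset.py: `chunked` is a hypothetical name, and in a real pipeline each yielded chunk would be handed to one worker process.

```python
from typing import Iterator, List, Sequence


def chunked(items: Sequence[str], chunk_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `chunk_size` items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be a positive integer")
    for start in range(0, len(items), chunk_size):
        yield list(items[start:start + chunk_size])
```

Larger chunks mean fewer hand-offs between processes but more data held in memory per worker, which is the balance the option exposes.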