New split type, cleaner predictions file, backward compatibility, bug fixes, and testing improvements
Features
New split type
The split type --split_type cv already existed to perform k-fold cross-validation (where k is set by --num_folds). In each fold, 1/k of the data is put in the test set, 1/k of the data is put in the validation set, and the remaining (k-2)/k of the data is put in the training set.
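To make the fractions concrete, here is a minimal sketch of one way to rotate the test and validation folds across k folds. It is not Chemprop's implementation. The function name cv_split_indices, the round-robin chunking without shuffling, and the choice of fold (fold + 1) % k as the validation fold are all assumptions made for illustration.

```python
from typing import List, Tuple


def cv_split_indices(n: int, k: int, fold: int) -> Tuple[List[int], List[int], List[int]]:
    """Hypothetical sketch of a cv-style split (not Chemprop's code).

    For fold `fold`, one chunk of the data is the test set, the next chunk is the
    validation set, and the remaining k - 2 chunks form the training set.
    """
    if k < 3:
        raise ValueError("cv needs at least 3 folds (test, validation, and training)")
    # Assign datapoint indices to k chunks round-robin (no shuffling, for clarity).
    chunks = [list(range(c, n, k)) for c in range(k)]
    test_chunk = fold % k
    val_chunk = (fold + 1) % k  # assumed rotation rule for the validation fold
    test = chunks[test_chunk]
    val = chunks[val_chunk]
    train = [i for c in range(k) if c not in (test_chunk, val_chunk) for i in chunks[c]]
    return train, val, test


# 10 datapoints, k = 5 folds, first fold: 3/5 train, 1/5 validation, 1/5 test.
train, val, test = cv_split_indices(10, 5, 0)
print(len(train), len(val), len(test))  # 6 2 2
```

Over all k folds, every datapoint lands in the test set exactly once, which is what lets cross-validation report test performance on the full dataset.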
Now, a new split type, --split_type cv-no-test, is essentially identical except that it assigns no data to the test set on each fold (b56ca98). Instead, 1/k of the data is put in the validation set and the remaining (k-1)/k of the data is put in the training set. The purpose of this split type is to maximize the training data in cases where the test performance is already known (or is not important) and doesn't need to be determined. Note that the validation set is still necessary to perform early stopping.
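The following sketch shows where the extra training data comes from: the chunk that the cv split type would reserve for testing goes to training instead, so with k = 5 the training set grows from 3/5 to 4/5 of the data. As before, this is an illustration under simplifying assumptions (round-robin chunking, no shuffling), not Chemprop's code, and the function name cv_no_test_split_indices is hypothetical.

```python
from typing import List, Tuple


def cv_no_test_split_indices(n: int, k: int, fold: int) -> Tuple[List[int], List[int], List[int]]:
    """Hypothetical sketch of a cv-no-test-style split (not Chemprop's code).

    For fold `fold`, one chunk is the validation set (still needed for early
    stopping) and the remaining k - 1 chunks form the training set.
    """
    if k < 2:
        raise ValueError("cv-no-test needs at least 2 folds (validation and training)")
    # Assign datapoint indices to k chunks round-robin (no shuffling, for clarity).
    chunks = [list(range(c, n, k)) for c in range(k)]
    val_chunk = fold % k
    val = chunks[val_chunk]
    train = [i for c in range(k) if c != val_chunk for i in chunks[c]]
    test: List[int] = []  # no data is held out for testing
    return train, val, test


train, val, test = cv_no_test_split_indices(10, 5, 0)
print(len(train), len(val), len(test))  # 8 2 0
```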
Dropping extra columns during prediction
Previously, when using predict.py, all the columns from the test_path file were copied to the preds_path file and the predictions were appended as additional columns at the end. Now there is an option called --drop_extra_columns which does not copy these extra columns to preds_path (83ea4c0 and 0613395). When --drop_extra_columns is used, preds_path contains only the SMILES columns and the prediction columns.
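The difference between the two output layouts can be illustrated in plain Python with the standard csv module. This is only a sketch: the function write_predictions, the column names, and the values are hypothetical, and predict.py writes preds_path with its own code.

```python
import csv
import io
from typing import Dict, List


def write_predictions(rows: List[Dict[str, str]],
                      smiles_columns: List[str],
                      preds: List[Dict[str, float]],
                      drop_extra_columns: bool) -> str:
    """Hypothetical sketch: merge input rows with predictions and return CSV text."""
    pred_columns = list(preds[0].keys()) if preds else []
    if drop_extra_columns:
        # Keep only the SMILES columns from the input file.
        input_columns = list(smiles_columns)
    else:
        # Copy every input column, as in the previous behavior.
        input_columns = list(rows[0].keys()) if rows else list(smiles_columns)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=input_columns + pred_columns, lineterminator="\n")
    writer.writeheader()
    for row, pred in zip(rows, preds):
        out = {col: row[col] for col in input_columns}
        out.update(pred)  # prediction columns are appended after the input columns
        writer.writerow(out)
    return buffer.getvalue()


rows = [{"smiles": "CCO", "name": "ethanol", "source": "db1"}]
preds = [{"solubility": 0.42}]
print(write_predictions(rows, ["smiles"], preds, drop_extra_columns=True))
# smiles,solubility
# CCO,0.42
```

With drop_extra_columns=False, the same call would keep the name and source columns ahead of the prediction column.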
Bug Fixes
Backward compatibility for load_checkpoint
Previously, newer versions of Chemprop incorrectly loaded checkpoints that were trained using older versions of Chemprop due to a change in the names of the parameters. Backward compatibility has now been added to allow this version of Chemprop to load checkpoints with either set of names (5371b29 and 206950c).
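A common way to provide this kind of compatibility is to rename parameters in the loaded state dict before copying them into the model. The sketch below assumes, for illustration only, that the change inserted a module index into the encoder parameter names (encoder.encoder.W_i becoming encoder.encoder.0.W_i); treat the exact names as an assumption rather than a description of Chemprop's checkpoints. It uses a plain dict so it runs without PyTorch, and the function name upgrade_param_names is hypothetical.

```python
import re
from typing import Dict, TypeVar

T = TypeVar("T")

# Assumed rename (illustrative): old "encoder.encoder.<param>" -> new "encoder.encoder.0.<param>".
# The negative lookahead skips names that already contain a numeric module index.
OLD_ENCODER_PATTERN = re.compile(r"^encoder\.encoder\.(?!\d+\.)")


def upgrade_param_names(state_dict: Dict[str, T]) -> Dict[str, T]:
    """Return a copy of state_dict with old-style names mapped to new-style names.

    Names that already follow the new scheme, or that belong to other modules,
    are left unchanged, so checkpoints with either set of names load the same way.
    """
    upgraded: Dict[str, T] = {}
    for name, value in state_dict.items():
        upgraded[OLD_ENCODER_PATTERN.sub("encoder.encoder.0.", name)] = value
    return upgraded


old = {"encoder.encoder.W_i.weight": 1, "ffn.1.weight": 2}
print(upgrade_param_names(old))
# {'encoder.encoder.0.W_i.weight': 1, 'ffn.1.weight': 2}
```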
Saving SMILES splits
Due to new Chemprop features such as the ability to load multiple molecules, the --save_smiles_splits feature, which saves the SMILES corresponding to the train, validation, and test splits, broke (#110). This was fixed in #117.
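Multi-molecule input matters here because each datapoint carries a list of SMILES rather than a single string. The sketch below writes one CSV per split with one column per molecule under that assumption. The function save_smiles_splits, the file names (train_smiles.csv and so on), and the column names are hypothetical and are not meant to reproduce the change in #117.

```python
import csv
import os
import tempfile
from typing import Dict, List


def save_smiles_splits(splits: Dict[str, List[List[str]]],
                       smiles_columns: List[str],
                       save_dir: str) -> None:
    """Hypothetical sketch: write one CSV per split, one column per input molecule.

    Each datapoint is a list of SMILES because a datapoint may contain several molecules.
    """
    os.makedirs(save_dir, exist_ok=True)
    for split_name, datapoints in splits.items():
        path = os.path.join(save_dir, f"{split_name}_smiles.csv")
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(smiles_columns)
            for smiles_list in datapoints:
                # Each list becomes one row, so each molecule lands in its own column.
                writer.writerow(smiles_list)


save_dir = tempfile.mkdtemp()
splits = {
    "train": [["CCO", "O"], ["c1ccccc1", "CC(=O)O"]],
    "val": [["CCN", "O"]],
    "test": [["CCCl", "CO"]],
}
save_smiles_splits(splits, ["solute", "solvent"], save_dir)
```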
Fixing interpret.py
Similar to the issue with saving SMILES splits, interpret.py broke due to the Chemprop feature that enables multiple molecules to be used as input (#107 and #113). This was fixed in #128.
Updating Dockerfile
The Dockerfile has been updated in #131 to address #100 and #129.
Fixing atom descriptors
The atom_descriptors feature did not work in predict.py (#120). This was fixed in #114.
Logging
Logging to the terminal and to files (quiet.log and verbose.log in the save_dir) broke on some operating systems (#106). This was fixed in #118.
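For readers unfamiliar with this setup, the sketch below shows one way to send the same messages to the terminal and to two log files using Python's standard logging module. The function name create_logger and the level assigned to each file (DEBUG for verbose.log, INFO for quiet.log) are assumptions for illustration; the sketch is not meant to reproduce the change made in #118.

```python
import logging
import os
import tempfile


def create_logger(name: str, save_dir: str, quiet: bool = False) -> logging.Logger:
    """Hypothetical sketch: log to the terminal and to verbose.log / quiet.log in save_dir.

    Assumed levels: verbose.log records DEBUG and above, quiet.log records INFO and above.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    logger.handlers.clear()  # avoid duplicate handlers if called more than once

    console = logging.StreamHandler()
    console.setLevel(logging.INFO if quiet else logging.DEBUG)
    logger.addHandler(console)

    os.makedirs(save_dir, exist_ok=True)
    # os.path.join builds the file paths portably across operating systems.
    verbose_handler = logging.FileHandler(os.path.join(save_dir, "verbose.log"))
    verbose_handler.setLevel(logging.DEBUG)
    quiet_handler = logging.FileHandler(os.path.join(save_dir, "quiet.log"))
    quiet_handler.setLevel(logging.INFO)
    logger.addHandler(verbose_handler)
    logger.addHandler(quiet_handler)
    return logger


save_dir = tempfile.mkdtemp()
logger = create_logger("train", save_dir)
logger.debug("Epoch 0: loss = 0.5")   # terminal and verbose.log
logger.info("Validation RMSE = 1.2")  # terminal, verbose.log, and quiet.log
```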
README additions
Some of the relatively new features, like custom atomic features, were missing from the README (#121). This was fixed in #122.
Infrastructure Changes
Migrating from Travis CI to GitHub Actions
Chemprop previously used Travis CI to run automated tests upon pushing to master or creating a pull request, but Travis changed its pricing structure and no longer offers unlimited free testing. For this reason, Chemprop now uses GitHub Actions to run automated tests. The results of the test runs can be seen in the Actions tab of the repo.