
New split type, cleaner predictions file, backward compatibility, bug fixes, and testing improvements

@swansonk14 released this on 26 Jan at 17:23
8258fcb

Features

New split type

The split type --split_type cv already existed to perform k-fold cross-validation (where k is set by --num_folds). In each fold, 1/k of the data is put in the test set, 1/k of the data is put in the validation set, and the remaining (k-2)/k of the data is put in the training set.

Now, a new split type --split_type cv-no-test exists which is essentially identical except that it assigns no data to the test set on each fold (b56ca98). Instead, 1/k of the data is put in the validation set and the remaining (k-1)/k of the data is put in the training set. The purpose of this split type is to maximize the amount of training data in cases where the test performance is already known (or is not important) and does not need to be determined. Note that the validation set is still necessary to perform early stopping.
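The difference between the two split types can be sketched as follows. This is a minimal illustration under stated assumptions, not Chemprop's implementation: the function name cv_split, the round-robin chunking, and the choice of the chunk after the test chunk as the validation chunk are all hypothetical.

```python
import random
from typing import List, Tuple


def cv_split(
    n: int, num_folds: int, fold: int, no_test: bool = False, seed: int = 0
) -> Tuple[List[int], List[int], List[int]]:
    """Return (train, val, test) index lists for one fold (hypothetical sketch).

    no_test=False mimics --split_type cv: one chunk is test, one chunk is
    validation, and the remaining k-2 chunks are training.
    no_test=True mimics --split_type cv-no-test: one chunk is validation
    and the remaining k-1 chunks are training; the test set is empty.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into num_folds roughly equal chunks.
    chunks = [indices[i::num_folds] for i in range(num_folds)]

    if no_test:
        test_chunk, val_chunk = None, fold
    else:
        test_chunk, val_chunk = fold, (fold + 1) % num_folds

    train, val, test = [], [], []
    for i, chunk in enumerate(chunks):
        if i == test_chunk:
            test.extend(chunk)
        elif i == val_chunk:
            val.extend(chunk)
        else:
            train.extend(chunk)
    return train, val, test
```

For example, with 10 data points and 5 folds, cv_split(10, 5, 0) yields 6 training, 2 validation, and 2 test points, while cv_split(10, 5, 0, no_test=True) yields 8 training, 2 validation, and 0 test points. Because every fold reuses the same seed, each data point lands in the validation set exactly once across the 5 folds.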

Dropping extra columns during prediction

Previously, when using predict.py, all the columns from the test_path file were copied to the preds_path file and the predictions were then added as additional columns at the end. Now the option --drop_extra_columns prevents these extra columns from being copied to preds_path (83ea4c0 and 0613395). When --drop_extra_columns is used, preds_path contains only the SMILES columns and the prediction columns.
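The effect of --drop_extra_columns on the output columns can be sketched as below. This is a hypothetical illustration using Python's csv module, not the code in predict.py; the function write_preds and the column names in the example are assumptions.

```python
import csv
import io
from typing import Dict, List


def write_preds(
    rows: List[Dict[str, str]],
    preds: List[List[float]],
    smiles_columns: List[str],
    task_names: List[str],
    drop_extra_columns: bool,
) -> str:
    """Return the predictions file contents as CSV text (hypothetical sketch)."""
    if drop_extra_columns or not rows:
        # Keep only the SMILES columns from the input file.
        input_columns = list(smiles_columns)
    else:
        # Copy over every column from the input file, in its original order.
        input_columns = list(rows[0].keys())

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=input_columns + task_names)
    writer.writeheader()
    for row, pred in zip(rows, preds):
        record = {col: row[col] for col in input_columns}
        # Append one prediction value per task after the input columns.
        record.update(zip(task_names, pred))
        writer.writerow(record)
    return out.getvalue()
```

With an input row {"smiles": "CCO", "id": "1", "note": "x"} and a single hypothetical task "solubility", the header is smiles,id,note,solubility without the option and smiles,solubility with it.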

Bug Fixes

Backward compatibility for load_checkpoint

Previously, newer versions of Chemprop incorrectly loaded checkpoints that were trained using older versions of Chemprop due to a change in the names of the parameters. Backward compatibility has now been added to allow this version of Chemprop to load checkpoints with either set of names (5371b29 and 206950c).
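One common way to provide this kind of backward compatibility is to rename old parameter names to the new scheme before loading the weights. The sketch below assumes, for illustration only, that old checkpoints stored a single encoder under names like "encoder.encoder.W_i.weight" while newer ones index one encoder per molecule as "encoder.encoder.0.W_i.weight". These exact names and the helper upgrade_state_dict are assumptions, and a plain dict stands in for a PyTorch state dict.

```python
import re
from typing import Dict, TypeVar

T = TypeVar("T")

# Assumed old-style prefix: "encoder.encoder." NOT already followed by an index like "0.".
OLD_ENCODER_KEY = re.compile(r"^encoder\.encoder\.(?!\d+\.)")


def upgrade_state_dict(state_dict: Dict[str, T]) -> Dict[str, T]:
    """Map old-style parameter names to the new naming scheme (hypothetical sketch)."""
    upgraded = {}
    for name, value in state_dict.items():
        # "encoder.encoder.W_i.weight" -> "encoder.encoder.0.W_i.weight";
        # names that are already new-style, or unrelated, pass through unchanged.
        upgraded[OLD_ENCODER_KEY.sub("encoder.encoder.0.", name)] = value
    return upgraded
```

Because new-style names are left untouched, the same loading path accepts checkpoints saved with either set of names, and applying the function twice gives the same result as applying it once.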

Saving SMILES splits

Due to new Chemprop features such as the ability to load multiple molecules, the --save_smiles_splits option, which saves the SMILES corresponding to the train, validation, and test splits, was broken (#110). This was fixed in #117.
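The reason multi-molecule input affects this option is that each data point now carries a list of SMILES rather than a single string, so a saved split needs one column per molecule. The sketch below illustrates that data shape only. It is not the fix from #117, and the function splits_to_csv and the column names "solute" and "solvent" are hypothetical.

```python
import csv
import io
from typing import List


def splits_to_csv(smiles: List[List[str]], smiles_columns: List[str]) -> str:
    """Write one split's SMILES with one column per molecule (hypothetical sketch)."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(smiles_columns)
    for datapoint in smiles:
        # Each data point holds a list of SMILES (one per molecule), so it is
        # expanded into several columns instead of being written as one value.
        if len(datapoint) != len(smiles_columns):
            raise ValueError("number of SMILES does not match number of columns")
        writer.writerow(datapoint)
    return out.getvalue()
```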

Fixing interpret.py

Similar to the issue with saving SMILES splits, interpret.py broke due to the Chemprop feature that enables multiple molecules to be used as input (#107 and #113). This was fixed in #128.

Updating Dockerfile

The Dockerfile has been updated in #131 to address #100 and #129.

Fixing atom descriptors

The atom_descriptors feature did not work in predict.py (#120). This was fixed in #114.

Logging

Logging to the terminal and to files (quiet.log and verbose.log in the save_dir) broke on some operating systems (#106). This was fixed in #118.
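A setup that logs to the terminal and to quiet.log and verbose.log in a save directory can be sketched with Python's standard logging module. This is an illustration, not Chemprop's code, and it does not reproduce the OS-specific fix in #118. The function name create_logger, the quiet flag, and the assumption that verbose.log receives all messages while quiet.log receives only INFO and above are hypothetical.

```python
import logging
import os


def create_logger(name: str, save_dir: str, quiet: bool = False) -> logging.Logger:
    """Log to the terminal plus verbose.log and quiet.log in save_dir (hypothetical sketch)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    logger.handlers.clear()  # avoid duplicate handlers if called twice

    # Terminal output: INFO and above when quiet, everything otherwise.
    console = logging.StreamHandler()
    console.setLevel(logging.INFO if quiet else logging.DEBUG)
    logger.addHandler(console)

    os.makedirs(save_dir, exist_ok=True)
    # verbose.log receives every message; quiet.log only INFO and above.
    verbose_handler = logging.FileHandler(os.path.join(save_dir, "verbose.log"))
    verbose_handler.setLevel(logging.DEBUG)
    quiet_handler = logging.FileHandler(os.path.join(save_dir, "quiet.log"))
    quiet_handler.setLevel(logging.INFO)
    logger.addHandler(verbose_handler)
    logger.addHandler(quiet_handler)
    return logger
```

Building paths with os.path.join rather than hard-coded separators is one way to keep file logging portable across operating systems.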

README additions

Some of the relatively new features, like custom atomic features, were missing from the README (#121). This was fixed in #122.

Infrastructure Changes

Migrating from Travis CI to GitHub Actions

Chemprop previously used Travis CI to run automated tests upon pushing to master or creating a pull request, but Travis changed its pricing structure and no longer offers unlimited free testing. For this reason, Chemprop now uses GitHub Actions to run automated tests. The results of the test runs can be seen in the Actions tab of the repo.