Mask calibration data (#295)
* Add masking for calibration and evaluation target gaps

* Add mask for the `ence` evaluator

* Bugfix for multitarget mve_weighting calibration

* Testing updates

* Use positive delta in testing with negative scores

* new test data

* Updated test targets

* Loosen test delta

* Remove dependence on NaN functions

* Fix miscalibration area index error

* Fix the other index error

* Loop over tasks in uncertainty calibration for z- and t-scaling

* Fix dtype error in mask tensors

* Remove unused import

* Fix index error

* Update test values

* Loop through tasks for mve_weighting

* Simplify masking code

* Fix mve_weighting task loop error

* Add a note for masking uncertainty targets

* Correct matrix size annotations

* Mask input data instead of output NLL for calibration NLL functions

* Replace all remaining NLL `where` functions with task loops

* Correct NLL loop variables

* Update NLL test values

* Add missing masking on multiclass NLL evaluator

* Remove unused variable
cjmcgill committed Jun 27, 2022
1 parent 4236b65 commit 2c171a1
Showing 10 changed files with 1,427 additions and 272 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -368,7 +368,7 @@ The uncertainty of predictions made in Chemprop can be estimated by several diff
 
 ### Uncertainty Calibration
 
-Uncertainty predictions may be calibrated to improve their performance on new predictions. Calibration methods are selected using `--calibration_method <method>`, with options provided below. An additional dataset to use in calibration is provided through `--calibration_path <path>`, along with necessary features such as `--calibration_features_path <path>`.
+Uncertainty predictions may be calibrated to improve their performance on new predictions. Calibration methods are selected using `--calibration_method <method>`, with options provided below. An additional dataset to use in calibration is provided through `--calibration_path <path>`, along with necessary features such as `--calibration_features_path <path>`. As with the data used in training, calibration data for multitask models may contain gaps and missing targets.
 
 **Regression** Calibrated regression outputs can be in the form of a standard deviation or an interval, as specified with the argument `--regression_calibrator_metric <"stdev" or "interval">`. The interval can be set using `--calibration_interval_percentile <float>` in the range (1,100).
 * `zscaling` Assumes that errors are normally distributed according to the estimated variance for each prediction. Applies a constant multiple to all stdev or interval outputs in order to minimize the negative log likelihood for the normal distributions. (https://arxiv.org/abs/1905.11659)
@@ -381,7 +381,7 @@ Uncertainty predictions may be calibrated to improve their performance on new pr
 
 ### Uncertainty Evaluation Metrics
 
-The performance of uncertainty predictions (calibrated or uncalibrated) is evaluated on the test set using different evaluation metrics, as specified with `--evaluation_methods <[methods]>`. Evaluation scores will be saved at the path provided with `--evaluation_scores_path <path.csv>`. If no path is provided to save the scores, the results will only appear in the output trace. Multiple evaluation methods can be provided, and they will be calculated separately for each model task. Evaluation is only available when the target values are provided with the data in `--test_path <path.csv>`.
+The performance of uncertainty predictions (calibrated or uncalibrated) is evaluated on the test set using different evaluation metrics, as specified with `--evaluation_methods <[methods]>`. Evaluation scores will be saved at the path provided with `--evaluation_scores_path <path.csv>`. If no path is provided to save the scores, the results will only appear in the output trace. Multiple evaluation methods can be provided, and they will be calculated separately for each model task. Evaluation is only available when the target values are provided with the data in `--test_path <path.csv>`. As with the data used in training, evaluation data for multitask models may contain gaps and missing targets.
 
 * Any valid classification or multiclass metric. Because classification and multiclass outputs are inherently probabilistic, any metric used to assess them during training is appropriate to evaluate the confidences produced after calibration.
 * `nll` Returns the average negative log likelihood of the real target as indicated by the uncertainty predictions. Enabled for regression, classification, and multiclass dataset types.
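For context, a sketch of how the calibration and evaluation flags above might be combined in one run. The file paths are placeholders, and the `chemprop_predict` entry point plus the `--checkpoint_dir`, `--preds_path`, and `--uncertainty_method` flags are assumptions for illustration rather than part of this commit:

```bash
chemprop_predict \
    --test_path data/test_with_targets.csv \
    --checkpoint_dir ckpts \
    --preds_path preds.csv \
    --uncertainty_method mve \
    --calibration_method zscaling \
    --regression_calibrator_metric stdev \
    --calibration_path data/calibration.csv \
    --evaluation_methods nll \
    --evaluation_scores_path evaluation.csv
```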
10 changes: 10 additions & 0 deletions chemprop/data/data.py
@@ -395,6 +395,16 @@ def targets(self) -> List[List[Optional[float]]]:
         :return: A list of lists of floats (or None) containing the targets.
         """
         return [d.targets for d in self._data]
 
+    def mask(self) -> List[List[bool]]:
+        """
+        Returns whether the targets associated with each molecule and task are present.
+
+        :return: A list of lists of booleans associated with targets.
+        """
+        targets = self.targets()
+
+        return [[t is not None for t in dt] for dt in targets]
+
     def gt_targets(self) -> List[np.ndarray]:
         """
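As a quick illustration of the contract the new `mask()` method provides (the literal values below are invented for this sketch; in Chemprop the targets come from a `MoleculeDataset`):

```python
# Hypothetical output of MoleculeDataset.targets(); None marks a missing
# target for that molecule/task pair.
targets = [[0.5, None], [None, 1.2], [0.3, 0.7]]

# MoleculeDataset.mask() mirrors targets() element-for-element.
mask = [[t is not None for t in dt] for dt in targets]

assert mask == [[True, False], [False, True], [True, True]]
```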
2 changes: 1 addition & 1 deletion chemprop/train/make_predictions.py
@@ -213,7 +213,7 @@ def predict_and_save(
         print(f"Evaluating uncertainty for tasks {task_names}")
         for evaluator in evaluators:
             evaluation = evaluator.evaluate(
-                targets=evaluation_data.targets(), preds=preds, uncertainties=unc
+                targets=evaluation_data.targets(), preds=preds, uncertainties=unc, mask=evaluation_data.mask()
             )
             evaluations.append(evaluation)
             print(
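To see why the new `mask` argument matters, here is a minimal sketch of a masked per-task Gaussian NLL in the spirit of this change; the function name and signature are illustrative, not Chemprop's actual evaluator API:

```python
import numpy as np

def masked_regression_nll(targets, preds, uncertainties, mask):
    """Average Gaussian NLL per task, using only entries where mask is True.

    All arguments are (n_molecules, n_tasks) array-likes; `uncertainties`
    holds predicted variances. Illustrative only, not Chemprop's evaluator.
    """
    targets = np.asarray(targets, dtype=float)  # None entries become NaN
    preds = np.asarray(preds, dtype=float)
    var = np.asarray(uncertainties, dtype=float)
    mask = np.asarray(mask, dtype=bool)

    nll_per_task = []
    for task in range(targets.shape[1]):  # task loop, as in this commit
        m = mask[:, task]
        t, p, v = targets[m, task], preds[m, task], var[m, task]
        nll = 0.5 * np.log(2 * np.pi * v) + (t - p) ** 2 / (2 * v)
        nll_per_task.append(float(nll.mean()))
    return nll_per_task
```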
6 changes: 3 additions & 3 deletions chemprop/train/train.py
@@ -45,11 +45,11 @@ def train(model: MoleculeModel,
     for batch in tqdm(data_loader, total=len(data_loader), leave=False):
         # Prepare batch
         batch: MoleculeDataset
-        mol_batch, features_batch, target_batch, atom_descriptors_batch, atom_features_batch, bond_features_batch, data_weights_batch = \
-            batch.batch_graph(), batch.features(), batch.targets(), batch.atom_descriptors(), \
+        mol_batch, features_batch, target_batch, mask_batch, atom_descriptors_batch, atom_features_batch, bond_features_batch, data_weights_batch = \
+            batch.batch_graph(), batch.features(), batch.targets(), batch.mask(), batch.atom_descriptors(), \
             batch.atom_features(), batch.bond_features(), batch.data_weights()
 
-        mask = torch.tensor([[x is not None for x in tb] for tb in target_batch], dtype=torch.bool)  # shape(batch, tasks)
+        mask = torch.tensor(mask_batch, dtype=torch.bool)  # shape(batch, tasks)
         targets = torch.tensor([[0 if x is None else x for x in tb] for tb in target_batch])  # shape(batch, tasks)
 
         if args.target_weights is not None:
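A standalone sketch of the masking pattern in this hunk (batch values invented for illustration): missing targets are zero-filled so the tensors stay dense, and the boolean mask removes their contribution from the loss.

```python
import torch

# Hypothetical batch with target gaps (None = missing), shape (batch=3, tasks=2).
target_batch = [[0.5, None], [None, 1.2], [0.3, 0.7]]
mask_batch = [[x is not None for x in tb] for tb in target_batch]  # as batch.mask()

mask = torch.tensor(mask_batch, dtype=torch.bool)  # shape(batch, tasks)
targets = torch.tensor([[0 if x is None else x for x in tb] for tb in target_batch])  # shape(batch, tasks)

# Downstream, the mask zeroes out loss terms for missing targets, e.g.:
preds = torch.zeros_like(targets)
loss = (preds - targets) ** 2 * mask  # masked entries contribute nothing
```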
