Train-time data-augmentation for parameterised learning #68

GilesStrong · 2020-06-08T09:56:27Z

Overview

Parameterised learning is useful in HEP, for example when a classifier should learn multiple signal hypotheses (e.g. a heavy Higgs of several possible masses); see Baldi et al., 2016.

In this example the signal would have a parameterised input equal to the true resonant mass, and the background would be randomly assigned resonant masses. Once trained, the entire dataset can be set to a particular resonant mass in order to perform inference for a given hypothesis. This last part is already possible with the ParametrisedPrediction class.
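As a rough illustration of the scheme described above, here is a minimal NumPy sketch. It assumes a binary target (1 = signal, 0 = background) and a single mass feature; the function names `assign_mass_feature` and `set_hypothesis` are hypothetical and are not part of LUMIN or of the ParametrisedPrediction class.

```python
import numpy as np


def assign_mass_feature(targets, true_mass, signal_masses, rng):
    """Build the parameterised input (hypothetical helper, not a LUMIN function).

    Signal rows keep their true resonant mass; background rows are assigned
    a mass drawn at random from the list of signal hypotheses.
    """
    targets = np.asarray(targets)
    param = np.asarray(true_mass, dtype=float).copy()
    bkg = targets == 0  # assumption: background is labelled 0
    param[bkg] = rng.choice(np.asarray(signal_masses, dtype=float), size=int(bkg.sum()))
    return param


def set_hypothesis(param, mass):
    """At inference, set every event to one resonant-mass hypothesis."""
    return np.full_like(param, mass)


rng = np.random.default_rng(0)
targets = np.array([1, 0, 1, 0, 0])
# Background has no true resonant mass, so it is left as NaN before assignment
true_mass = np.array([300.0, np.nan, 500.0, np.nan, np.nan])
signal_masses = [300.0, 400.0, 500.0]

train_param = assign_mass_feature(targets, true_mass, signal_masses, rng)
infer_param = set_hypothesis(train_param, 400.0)
```

Drawing background masses from the same set as the signal hypotheses is one possible choice; drawing them to match the signal mass distribution is another.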

Data augmentation for parameterised learning

Currently, the random assignment of parameterised-feature values for background (in the example above) is performed once, when preparing the data for training. It may be useful to perform this random assignment during training instead, which could provide some of the benefits of train-time data augmentation.

Implementation

To avoid conflicts with HEPAugFoldYielder, and because it should only be performed during training, this secondary form of augmentation should probably be implemented as a callback. It also needs to account for the possibility that multiple parameterisation features may be used, and that only a subset of the data may need to be changed.
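One way such a callback might look is sketched below in plain NumPy. Everything here is an assumption made for illustration: the class name `ParamResampleCallback`, the hook names (`on_train_begin`, `on_train_end`, `on_epoch_begin`), and the convention that background is labelled 0 are hypothetical and do not correspond to LUMIN's actual callback interface.

```python
import numpy as np


class ParamResampleCallback:
    """Hypothetical callback: redraws parameterised features for background
    rows at the start of each training epoch. Not LUMIN's real callback API."""

    def __init__(self, param_idxs, allowed_values, rng=None):
        # One list of allowed values per parameterised feature column
        self.param_idxs = list(param_idxs)
        self.allowed_values = [np.asarray(v, dtype=float) for v in allowed_values]
        self.rng = rng if rng is not None else np.random.default_rng()
        self.training = False

    def on_train_begin(self):
        self.training = True

    def on_train_end(self):
        self.training = False

    def on_epoch_begin(self, inputs, targets):
        # Only augment during training; validation and inference are untouched
        if not self.training:
            return inputs
        out = np.array(inputs, dtype=float)  # copy, so the stored data is unchanged
        mask = np.asarray(targets).ravel() == 0  # assumption: background is 0
        n_bkg = int(mask.sum())
        # Each parameterised feature is resampled independently (an assumption)
        for idx, vals in zip(self.param_idxs, self.allowed_values):
            out[mask, idx] = self.rng.choice(vals, size=n_bkg)
        return out


# Usage: column 3 holds the resonant mass; rows 0 and 3 are signal
X = np.zeros((6, 4))
X[:, 3] = 300.0
y = np.array([1, 0, 0, 1, 0, 0])
cb = ParamResampleCallback([3], [[300.0, 400.0, 500.0]], rng=np.random.default_rng(1))

untouched = cb.on_epoch_begin(X, y)  # not training yet, so nothing changes
cb.on_train_begin()
augmented = cb.on_epoch_begin(X, y)  # background masses are redrawn
cb.on_train_end()
```

Restricting the change to a boolean mask covers the "subset of the data" requirement, and passing several column indices covers the multiple-feature case. If the parameterised features are correlated, sampling them jointly rather than independently would be needed.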

GilesStrong added the enhancement (New feature or request) and low priority (Not urgent and won't degrade with time) labels on Jun 8, 2020

GilesStrong mentioned this issue Jun 8, 2020

Change HEPAugFoldYielder to callback? #73

Open
