Change HEPAugFoldYielder to callback? #73

Open
GilesStrong opened this issue Jun 8, 2020 · 0 comments
Labels
disruptive: Something which will likely cause large or breaking changes
improvement: Something which would improve current status, but not add anything new
investigation: Something which might require a careful study
medium priority: Not urgent but should be dealt with sooner rather than later

Comments

@GilesStrong
Owner

Current status

HEPAugFoldYielder applies train-time and test-time data augmentations to HEP data (phi rotations, transverse & longitudinal flips). This is performed when loading the data since, originally, this was the last point at which the feature names for the data were known to the model. Later changes to LUMIN mean that the model now has a list of named features and how they map to the input features. This means that the data augmentation could instead be performed by a callback during training (similar to the suggestion of issue #68).

Discussion

It seems a bit strange that the choice of whether or not to augment the data is made by changing how the data is loaded from file. Specifying the choice as a callback makes a bit more sense (to me). This also avoids complications once additional forms of augmentation are added, which might otherwise require their own FoldYielder classes, and we would then have to account for all possible combinations of the different types of augmentation.

Depending on the choices made in issue #50, this may reduce the efficiency of augmentation, but it's possible that augmenting the data in place on device may actually be more efficient, since it could be done multithreaded. This would perhaps avoid the need to augment as a pandas.DataFrame, and maybe pre-cached rotation matrices could be used, in some part, to speed things up. Since the data is already on device, this would actually be quicker than loading from disc, augmenting, and then loading to device; that sequence is known to cause a particular slow-down when working on GPU.
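To make the pre-cached rotation matrix idea concrete, here is a rough sketch. It uses NumPy on CPU purely for illustration; an on-device version would presumably use tensors instead. The names `make_rotation_bank` and `augment_batch`, the `(batch, objects, 3)` layout of `(px, py, pz)`, and the choice to flip by negating `py` and `pz` are all assumptions, not LUMIN code.

```python
import numpy as np


def make_rotation_bank(n_angles):
    """Pre-compute n_angles rotation matrices about the z (beam) axis."""
    phi = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    cos, sin = np.cos(phi), np.sin(phi)
    bank = np.zeros((n_angles, 3, 3))
    bank[:, 0, 0] = cos
    bank[:, 0, 1] = -sin
    bank[:, 1, 0] = sin
    bank[:, 1, 1] = cos
    bank[:, 2, 2] = 1.0
    return bank


def augment_batch(vecs, bank, rng):
    """Rotate and flip a batch of 3-momenta.

    vecs: array of shape (batch, objects, 3) holding (px, py, pz).
    Each event gets one cached rotation and independent random
    sign flips of py (transverse) and pz (longitudinal).
    """
    n_events = len(vecs)
    rot = bank[rng.integers(len(bank), size=n_events)]  # (batch, 3, 3)
    signs = rng.choice([-1.0, 1.0], size=(n_events, 1, 3))
    signs[..., 0] = 1.0  # px is never flipped in this sketch
    # out[b, o, i] = sum_j rot[b, i, j] * vecs[b, o, j]
    rotated = np.einsum('bij,boj->boi', rot, vecs)
    return rotated * signs


rng = np.random.default_rng(0)
bank = make_rotation_bank(16)
batch = rng.normal(size=(4, 2, 3))
augmented = augment_batch(batch, bank, rng)
```

Because the whole batch is transformed in one vectorised call, no per-event loop or DataFrame is needed. Transverse momentum and the magnitude of `pz` are unchanged, which is a cheap sanity check for any real implementation.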

Possible change

The callback would need to mimic the behaviour of HEPAugFoldYielder, i.e. provide random augmentation during training, and a choice of either a fixed set of transformations or random ones during testing. It would need to be passed as a callback during both training and prediction.
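One possible shape for such a callback is sketched below. Everything here is hypothetical: the class name `HEPAugCallback`, the hook names `on_train_batch` and `on_test_batch`, and the helper `predict_tta` are invented for illustration and do not correspond to LUMIN's actual callback interface. For brevity the sketch draws one set of parameters per batch rather than per event, and treats each input row as a single `(px, py, pz)` vector.

```python
import numpy as np


class HEPAugCallback:
    """Hypothetical augmentation callback (not LUMIN's API)."""

    def __init__(self, n_rot=4, random_test=False, seed=0):
        # Fixed test-time set: every rotation x y-flip x z-flip combination
        self.fixed = [(2.0 * np.pi * k / n_rot, fy, fz)
                      for k in range(n_rot)
                      for fy in (1.0, -1.0)
                      for fz in (1.0, -1.0)]
        self.random_test = random_test
        self.rng = np.random.default_rng(seed)

    @property
    def n_test_augs(self):
        return len(self.fixed)

    def _random_params(self):
        phi = self.rng.uniform(0.0, 2.0 * np.pi)
        fy, fz = self.rng.choice([-1.0, 1.0], size=2)
        return phi, fy, fz

    @staticmethod
    def _apply(x, phi, fy, fz):
        px, py, pz = x[..., 0], x[..., 1], x[..., 2]
        cos, sin = np.cos(phi), np.sin(phi)
        return np.stack([px * cos - py * sin,
                         fy * (px * sin + py * cos),
                         fz * pz], axis=-1)

    def on_train_batch(self, x):
        # Training: always a random transformation
        return self._apply(x, *self._random_params())

    def on_test_batch(self, x, aug_idx):
        # Testing: either random, or the aug_idx-th fixed transformation
        params = self._random_params() if self.random_test else self.fixed[aug_idx]
        return self._apply(x, *params)


def predict_tta(model, x, callback):
    """Average the model's predictions over all test-time augmentations."""
    preds = [model(callback.on_test_batch(x, i))
             for i in range(callback.n_test_augs)]
    return np.mean(preds, axis=0)


callback = HEPAugCallback(n_rot=2)
events = np.array([[1.0, 2.0, 3.0]])
toy_model = lambda v: np.hypot(v[:, 0], v[:, 1])  # stand-in for a real model
tta_pred = predict_tta(toy_model, events, callback)
```

The same object serves both phases, so the decision to augment lives with the training and prediction calls rather than with the data loader.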

Additionally, tests should be done to compare the speed and memory usage of the callback to HEPAugFoldYielder.
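A comparison like this could start from a small harness along the following lines. This is only a sketch: `profile` is a hypothetical helper, and `tracemalloc` only sees host-side allocations made through Python, so GPU memory would have to be measured with a separate tool. Time and memory are measured in separate passes because tracing allocations slows execution.

```python
import time
import tracemalloc


def profile(fn, *args, repeats=5):
    """Return (mean seconds per call, peak traced bytes) for fn(*args)."""
    # Timing pass, without allocation tracing
    start = time.perf_counter()
    for _ in range(repeats):
        fn(*args)
    mean_time = (time.perf_counter() - start) / repeats

    # Memory pass: a single call with tracing enabled
    tracemalloc.start()
    fn(*args)
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return mean_time, peak_bytes


# Stand-in workload; in practice this would be one pass over a fold
# using either HEPAugFoldYielder or the callback.
mean_time, peak_bytes = profile(lambda: [0] * 10000)
```

Running the same harness over both approaches with identical data and batch sizes would give a like-for-like comparison.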

If successful, this would deprecate HEPAugFoldYielder.

@GilesStrong added the improvement, medium priority, investigation, and disruptive labels Jun 8, 2020