A user may want to derive a target value, for example dst.patient.sex, from the distribution of the corresponding source column, src.patient.sex. We could either add noise to the existing data (anonymisation) or sample from the source distribution (synthetic data). Either way, the amount of noise should be customisable.
Longer term, it would be nice to specify an epsilon or privacy budget in some form. For now, we shall simply:
- Add a convenient way to specify that the source data should be used to populate the target column.
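
As a rough illustration of the sampling option, here is a minimal sketch in plain Python. The names `mixed_weights` and `sample_from_source` and the `noise` parameter are hypothetical and not part of any existing API. It assumes one possible meaning of "noise": blending the observed category frequencies with a uniform distribution over the observed categories. This is not a differential-privacy mechanism and gives no epsilon guarantee.

```python
import random
from collections import Counter


def mixed_weights(source_values, noise=0.0):
    """Return {category: probability} for a source column.

    Assumption: `noise` in [0, 1] blends the empirical frequencies with a
    uniform distribution over the observed categories. 0 reproduces the
    source distribution; 1 ignores it entirely.
    """
    if not 0.0 <= noise <= 1.0:
        raise ValueError("noise must be between 0 and 1")
    counts = Counter(source_values)
    if not counts:
        raise ValueError("source column is empty")
    total = sum(counts.values())
    uniform = 1.0 / len(counts)
    return {
        category: (1.0 - noise) * count / total + noise * uniform
        for category, count in counts.items()
    }


def sample_from_source(source_values, n, noise=0.0, seed=None):
    """Draw n synthetic values for the target column (hypothetical helper)."""
    weights = mixed_weights(source_values, noise)
    rng = random.Random(seed)  # seeded only to make runs reproducible
    return rng.choices(list(weights), weights=list(weights.values()), k=n)


# Example: populate dst.patient.sex from the values seen in src.patient.sex.
src_patient_sex = ["F", "F", "F", "M"]
dst_patient_sex = sample_from_source(src_patient_sex, n=10, noise=0.2, seed=42)
```

A single `noise` knob like this would keep the short-term configuration simple while leaving room to replace it with an epsilon or privacy budget later.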