This repo is a study of the significance of the entropy of a classification model's output vector in relation to membership inference attacks (MIA), as described in this paper: He, Xinlei, et al. "Membership-Doctor: Comprehensive Assessment of Membership Inference Against Machine Learning Models." arXiv preprint arXiv:2208.10445 (2022).
More specifically, this repo compares several defense variants that work by increasing the entropy of a given classification model's final output.
Currently considering an MNIST classification MLP, the following is a list of new defenses that attempt to increase the entropy of the final output.
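As a minimal sketch of the core idea, the snippet below flattens a model's output distribution with temperature scaling, one simple way to raise output entropy without changing the predicted class. The logits and temperature value are illustrative assumptions, not taken from this repo's models.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Softmax with temperature; a higher temperature yields a flatter,
    higher-entropy distribution."""
    z = logits / temperature
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def entropy(p):
    """Shannon entropy (in nats) of a probability vector."""
    p = np.clip(p, 1e-12, 1.0)
    return float(-(p * np.log(p)).sum())

# Illustrative logits (assumed, not from the actual MNIST MLP)
logits = np.array([4.0, 1.0, 0.5, 0.2])
p_base = softmax(logits)                        # base model output
p_defended = softmax(logits, temperature=4.0)   # flattened, defended output
```

Note that the argmax (and hence accuracy) is unchanged; only the confidence profile that an attacker observes is flattened.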
It is well known that a defense must be judged on utility, cost, and effectiveness. In terms of cost, the following compares inference latency against the base model.
In terms of utility, the following compares the accuracy of the base model and the defended model (refined model).
In terms of effectiveness, the following compares the MIA attack AUC and attacker's advantage for the above defenses. The results are evaluated using TensorFlow Privacy, and the attacks considered are threshold based or shadow-training based, where the attacker model is a logistic classifier, an MLP, or a KNN classifier. Furthermore, the attack dataset includes the original dataset's training and validation data, so the attack premise is strong. The attack is evaluated over the whole dataset. For conciseness, only the best attack AUC and advantage across these parameters are shown below.
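For reference, the two reported metrics can be computed from per-example attack scores as follows: AUC is the probability that a random member outscores a random non-member, and attacker's advantage is the maximum of TPR − FPR over all thresholds (the definitions TF Privacy also uses). The scores below are made-up confidence values for illustration.

```python
import numpy as np

def attack_auc(member_scores, nonmember_scores):
    """AUC of a threshold attack = P(member score > non-member score),
    counting ties as half."""
    m = np.asarray(member_scores)[:, None]
    n = np.asarray(nonmember_scores)[None, :]
    return float((m > n).mean() + 0.5 * (m == n).mean())

def attacker_advantage(member_scores, nonmember_scores):
    """Attacker's advantage = max over thresholds of (TPR - FPR)."""
    member_scores = np.asarray(member_scores)
    nonmember_scores = np.asarray(nonmember_scores)
    thresholds = np.unique(np.concatenate([member_scores, nonmember_scores]))
    best = 0.0
    for t in thresholds:
        tpr = (member_scores >= t).mean()
        fpr = (nonmember_scores >= t).mean()
        best = max(best, tpr - fpr)
    return float(best)

# Illustrative (made-up) confidence scores
members = np.array([0.9, 0.8, 0.95, 0.7])
nonmembers = np.array([0.6, 0.5, 0.4, 0.75])
auc = attack_auc(members, nonmembers)
adv = attacker_advantage(members, nonmembers)
```

A perfectly entropy-flattened defense would push both scores toward the random-guess baseline (AUC 0.5, advantage 0).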
Evaluating over three slices (the entire dataset, per class, and by classification correctness), the following compares the TP vs. FP plots.
Furthermore, the following is a comparison plot of threshold, training, and LiRA based evaluation of MIA over the defenses, with the number of shadow models set to 2. With so few shadow models the attack scores are likely underestimates; increasing the number of shadow models should strengthen the attack.
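For intuition on the LiRA evaluation: it fits Gaussians to the target example's confidence under shadow models trained with ("in") and without ("out") that example, then scores membership by log-likelihood ratio. The sketch below is a heavily simplified version of that idea with made-up shadow confidences; the real attack works per-example with many shadow models.

```python
import numpy as np

def gaussian_logpdf(x, mu, sigma):
    """Log-density of a univariate Gaussian."""
    sigma = max(sigma, 1e-8)
    return -0.5 * np.log(2 * np.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def lira_score(target_conf, shadow_in_confs, shadow_out_confs):
    """Simplified LiRA: log-likelihood ratio of the target's confidence
    under 'in' vs 'out' shadow distributions; higher => more likely a member."""
    mu_in, sd_in = np.mean(shadow_in_confs), np.std(shadow_in_confs)
    mu_out, sd_out = np.mean(shadow_out_confs), np.std(shadow_out_confs)
    return (gaussian_logpdf(target_conf, mu_in, sd_in)
            - gaussian_logpdf(target_conf, mu_out, sd_out))

# Made-up shadow-model confidences for one target example
shadow_in = [2.5, 2.7, 2.6, 2.4]    # confidences when example was trained on
shadow_out = [1.0, 1.2, 0.9, 1.1]   # confidences when it was held out
score_member = lira_score(2.55, shadow_in, shadow_out)
score_nonmember = lira_score(1.0, shadow_in, shadow_out)
```

With only 2 shadow models the fitted Gaussians are very noisy, which is why the scores in the plot above are expected to be low.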
One notable observation is that directly increasing the entropy of the probability vectors during training (by modifying the loss), as per M-17, does not mitigate the MIA at all.
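For concreteness, a loss modification of this kind typically subtracts an entropy bonus from the cross-entropy, rewarding flatter outputs during training. The function below is a hypothetical sketch of such a loss (the name, the regularization weight `lam`, and the example distributions are all assumptions, not the repo's actual M-17 implementation).

```python
import numpy as np

def entropy_regularized_ce(probs, label, lam=0.1):
    """Cross-entropy minus lam * entropy: the entropy bonus rewards
    flatter output distributions (hypothetical sketch; lam is an
    assumed hyperparameter)."""
    p = np.clip(np.asarray(probs, dtype=float), 1e-12, 1.0)
    ce = -np.log(p[label])
    ent = -(p * np.log(p)).sum()
    return float(ce - lam * ent)

flat = np.array([0.7, 0.1, 0.1, 0.1])   # higher-entropy output
loss_reg = entropy_regularized_ce(flat, label=0, lam=0.1)
loss_plain = entropy_regularized_ce(flat, label=0, lam=0.0)
```

With `lam > 0` the flatter distribution is rewarded (lower loss than plain cross-entropy), yet per the observation above this does not translate into MIA mitigation.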
Of course, further conclusions can be drawn, but this concludes my investigation for now.