labels: experimental, interpretability, regularization, differentiable_permutation
Apparently something I've been thinking about on and off, in the context of learning functional regions for interpretability and evolutionary strategies.
- Learning Representations of Sets through Optimized Permutations - Has a cost function for scoring permutation matrices
- Learning Permutations with Sinkhorn Policy Gradient - "Sinkhorn layer produces continuous relaxations of permutation matrices"
- https://paperswithcode.com/paper/git-re-basin-merging-models-modulo
- Learning Latent Permutations with Gumbel-Sinkhorn Networks
- assign a learnable parameter that will serve as a permutation matrix
- segment weights into (non-overlapping?) tiles, post-permutation
- compute per-tile activation variances (sampled tiles?)
- define the permutation score as a statistic over the per-tile variances (e.g. sum, mean, p90...)
- learn the permutation matrix which minimizes the permutation score
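
Rough sketch of the steps above in plain Python, just to pin down the shapes. Everything named here (`sinkhorn`, `tile_variance_score`, `best_permutation`, the toy `acts`) is made up for illustration. Assumptions: the permutation reorders units (columns of an activation matrix), tiles are contiguous and non-overlapping, and a tile's variance is taken over all activation values in it (samples x units). Brute-force search over hard permutations stands in for the learning step; `sinkhorn` only shows the kind of continuous relaxation the papers above use to make the permutation parameter differentiable, and is not wired into an optimizer here.

```python
import math
from itertools import permutations
from statistics import pvariance


def sinkhorn(logits, n_iters=50, tau=1.0):
    """Relax square logits toward a doubly stochastic matrix by
    alternately normalizing rows and columns (Sinkhorn iterations)."""
    n = len(logits)
    # Subtract the global max before exponentiating, for numerical stability.
    m = max(max(row) for row in logits)
    P = [[math.exp((x - m) / tau) for x in row] for row in logits]
    for _ in range(n_iters):
        # Row normalization: each row sums to 1.
        P = [[x / sum(row) for x in row] for row in P]
        # Column normalization: each column sums to 1.
        col_sums = [sum(P[i][j] for i in range(n)) for j in range(n)]
        P = [[P[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return P


def tile_variance_score(acts, perm, tile_size, stat=sum):
    """Permutation score: a statistic (sum by default) over per-tile variances.

    acts: list of samples, each a list of per-unit activations.
    perm: tuple giving the new order of unit indices.
    """
    variances = []
    for start in range(0, len(perm), tile_size):
        tile_units = perm[start:start + tile_size]
        # Assumption: pool every activation value that falls inside the tile.
        values = [sample[u] for sample in acts for u in tile_units]
        variances.append(pvariance(values))
    return stat(variances)


def best_permutation(acts, tile_size, stat=sum):
    """Exhaustive stand-in for 'learn the permutation': only viable for tiny n."""
    n_units = len(acts[0])
    return min(
        permutations(range(n_units)),
        key=lambda p: tile_variance_score(acts, p, tile_size, stat),
    )


# Toy data: units 0 and 2 sit near 0, units 1 and 3 sit near 10.
acts = [
    [0.0, 10.0, 0.1, 10.1],
    [0.2, 9.8, 0.0, 10.0],
    [0.1, 10.2, 0.2, 9.9],
]
identity = (0, 1, 2, 3)
best = best_permutation(acts, tile_size=2)
print("identity score:", tile_variance_score(acts, identity, 2))
print("best perm:", best, "score:", tile_variance_score(acts, best, 2))

# Soft permutation from arbitrary logits (rows and columns each sum to ~1).
soft = sinkhorn([[2.0, 0.0, 1.0], [0.5, 1.5, 0.0], [0.0, 1.0, 3.0]])
```

Swapping `stat` for `max` or a mean gives the other score statistics mentioned above. A gradient-based version would presumably score activations multiplied by the Sinkhorn output instead of indexing with a hard permutation, but that part is untested here.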