Corrector Network

How it works

It uses multi-head similarity: several heads each predict similarity with a different method, and the resulting vectors are concatenated into a matrix of size N×H.

Back propagation
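The multi-head similarity step described under How it works can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the network's actual code. It assumes the network compares one query vector against N candidate vectors. The three heads (dot product, cosine similarity, inverse Euclidean distance) and all function names are hypothetical stand-ins for the unspecified "different methods". It covers only the forward similarity step, not back propagation.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
# A "head" is any function scoring two vectors; the heads below are hypothetical choices.
Head = Callable[[Vector, Vector], float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Vector, b: Vector) -> float:
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / norm if norm else 0.0


def inv_euclidean(a: Vector, b: Vector) -> float:
    # Maps Euclidean distance into (0, 1]; identical vectors score 1.0.
    return 1.0 / (1.0 + math.dist(a, b))


def multi_head_similarity(
    query: Vector, candidates: List[Vector], heads: List[Head]
) -> List[List[float]]:
    """Return an N x H matrix: row i holds every head's score for candidates[i]."""
    # Each head produces a length-N vector of scores...
    per_head = [[head(query, c) for c in candidates] for head in heads]
    # ...and the H vectors are concatenated column-wise into N rows of H scores.
    return [list(row) for row in zip(*per_head)]


HEADS: List[Head] = [dot, cosine, inv_euclidean]

if __name__ == "__main__":
    q = [1.0, 0.0]
    cands = [[1.0, 0.0], [0.0, 2.0], [3.0, 4.0]]
    m = multi_head_similarity(q, cands, HEADS)
    print(len(m), "x", len(m[0]))  # prints "3 x 3"
```

Keeping one column per head means each method's score stays separate, so a later layer could learn how much to trust each method.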