IFN702SpamDetection

To mitigate the vanishing-gradient problem, I changed the RNN output pooling from taking only the last time step:

    out = self.rnn(out)[0][:, -1, :]

to summing the outputs over all time steps:

    out = self.rnn(out)[0].sum(dim=1)

Best tuned results:

SSCL                    0.9384
GatedCNN                0.9476
SelfAttn (Transformer)  0.9345
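
The effect of the pooling change described above can be checked with a small experiment. The following is a minimal sketch, not the project's actual model: it assumes self.rnn is a batch-first PyTorch recurrent layer (a plain nn.RNN is used as a stand-in), and the tensor sizes and the helper names pool and first_step_grad_norm are hypothetical. It compares how much gradient reaches the first time step under each pooling choice.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes; the source does not state the real ones.
batch, seq_len, in_dim, hidden = 4, 200, 8, 16
rnn = nn.RNN(in_dim, hidden, batch_first=True)  # stand-in for self.rnn
x0 = torch.randn(batch, seq_len, in_dim)        # same input for both runs


def pool(states, mode):
    """Reduce (batch, seq_len, hidden) to (batch, hidden)."""
    if mode == "last":
        return states[:, -1, :]   # original: final time step only
    return states.sum(dim=1)      # changed: sum over all time steps


def first_step_grad_norm(mode):
    """Return the pooled output and the gradient norm at time step 0."""
    x = x0.clone().requires_grad_(True)
    pooled = pool(rnn(x)[0], mode)  # rnn(x)[0] is the per-step output
    pooled.sum().backward()
    return pooled.detach(), x.grad[:, 0, :].norm().item()


last_pooled, grad_last = first_step_grad_norm("last")
sum_pooled, grad_sum = first_step_grad_norm("sum")

print(f"last-step pooling, grad norm at t=0: {grad_last:.3e}")
print(f"sum pooling,       grad norm at t=0: {grad_sum:.3e}")
```

With last-step pooling, the gradient for early tokens has to travel back through every recurrent step. With sum pooling, each time step's output feeds the pooled vector directly, so early steps get a short gradient path. One thing to keep in mind is that a plain sum grows with sequence length and also includes padded positions if the batch is padded, so masking or averaging may be worth trying.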