PixelSelfAttention

An attempt to adapt BERT for images. Work in progress. Although it works much better than an MLP, it is still worse than a vanilla CNN. I think it will need a CNN to learn higher-dimensional embeddings for each pixel; otherwise it is not powerful enough.
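
To make the idea of feeding CNN-learned embeddings into self-attention concrete, here is a minimal NumPy sketch. It is not this project's implementation. The function names (`conv3x3_embed`, `self_attention`), the 3x3 kernel size, the embedding width of 16, the single attention head, and the random weights are all assumptions chosen for illustration. Each pixel becomes one token whose embedding comes from its 3x3 neighbourhood, a position embedding is added (as BERT does for word positions), and the tokens then attend to one another.

```python
import numpy as np


def conv3x3_embed(image, kernels):
    """Embed every pixel from its zero-padded 3x3 neighbourhood (one conv layer).

    image:   (H, W) grayscale array
    kernels: (9, D) weights mapping a flattened 3x3 patch to a D-dim embedding
    returns: (H*W, D) array, one token per pixel
    """
    h, w = image.shape
    padded = np.pad(image, 1)  # zero padding keeps the output at H x W
    # Collect the 3x3 patch around each pixel: shape (H, W, 9).
    patches = np.stack(
        [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)],
        axis=-1,
    )
    return patches.reshape(h * w, 9) @ kernels


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over pixel tokens."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ v, weights


# Hypothetical sizes and random (untrained) weights, for illustration only.
rng = np.random.default_rng(0)
height, width, dim = 8, 8, 16
image = rng.random((height, width))

tokens = conv3x3_embed(image, rng.normal(size=(9, dim)) * 0.1)
tokens = tokens + rng.normal(size=(height * width, dim)) * 0.1  # position embeddings

w_q = rng.normal(size=(dim, dim)) * 0.1
w_k = rng.normal(size=(dim, dim)) * 0.1
w_v = rng.normal(size=(dim, dim)) * 0.1
out, attn = self_attention(tokens, w_q, w_k, w_v)
```

Here `out` holds one attended vector per pixel and `attn` holds the pixel-to-pixel attention weights. Note that the attention matrix grows quadratically with the number of pixels, so a sketch like this is only practical for small images or after downsampling.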