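Paper notes, project experiment logs, and later code walkthroughs and study notes.
Most articles are published on Zhihu; this repo is only a backup.
Some articles have been converted to PDF for easier reading and are stored in the pdf folder.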
- ViViT: A Video Vision Transformer
- TimeSFormer: Is Space-Time Attention All You Need for Video Understanding?
- MViT: Improved Multiscale Vision Transformers for Classification and Detection
- Mformer: Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers
- COVER: Co-training Transformer with Videos and Images Improves Action Recognition
- CMT: Convolutional Neural Networks Meet Vision Transformers
- CrossFormer: A Versatile Vision Transformer Based on Cross-scale Attention
- CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows
- [First Order Motion Model for Image Animation](Image_Animation/First%20Order.MD)