TODO LIST #15
PyramidTNT: Improved Transformer-in-Transformer Baselines with Pyramid Architecture (https://arxiv.org/pdf/2201.00978.pdf)
ELSA: Enhanced Local Self-Attention for Vision Transformer (https://arxiv.org/pdf/2112.12786v1.pdf)
Unicorn (Crossing the Format Boundary of Text and Boxes: Towards Unified Vision-Language Modeling)
DeepSpeed-MoE
15-second NeRF (Instant Neural Graphics Primitives with a Multiresolution Hash Encoding)
True Few-Shot Learning with Language Models
FLAN
T0
ZeroPrompt: Scaling Prompt-Based Pretraining to 1,000 Tasks Improves Zero-Shot Generalization (https://arxiv.org/abs/2201.06910)
Transformer Quality in Linear Time
Hyperparameter search
Warm Starting CMA-ES for Hyperparameter Optimization
SSL
Understanding Dimensional Collapse in Contrastive Self-Supervised Learning
Early phase of (OCR) training?
On Warm-Starting Neural Network Training
The Break-Even Point on Optimization Trajectories of Deep Neural Networks
Catastrophic Fisher Explosion: Early Phase Fisher Matrix Impacts Generalization
On the Origin of Implicit Regularization in Stochastic Gradient Descent
Loss landscape
Sharpness-Aware Minimization for Efficiently Improving Generalization (minimal SAM sketch below)
ASAM: Adaptive Sharpness-Aware Minimization for Scale-Invariant Learning of Deep Neural Networks
Towards Efficient and Scalable Sharpness-Aware Minimization (LookSAM)
A Loss Curvature Perspective on Training Instability in Deep Learning
Surrogate Gap Minimization Improves Sharpness-Aware Training
What is the role of augmentation?
When Vision Transformers Outperform ResNets Without Pre-training or Strong Data Augmentations (ICLR Oral)
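Several of the loss-landscape papers above build on SAM, so as a reference point here is a minimal sketch of the SAM two-step update in plain PyTorch. The function name `sam_step`, the default `rho=0.05`, and the training-loop wiring are illustrative assumptions, not code from any of the listed papers.

```python
import torch

def sam_step(model, loss_fn, batch, base_optimizer, rho=0.05):
    """One SAM update: perturb weights toward higher loss, take the gradient there."""
    inputs, targets = batch

    # First pass: gradients at the current weights w
    loss_fn(model(inputs), targets).backward()

    # Perturb each weight by epsilon = rho * g / ||g|| (ascent direction)
    grad_norm = torch.norm(torch.stack(
        [p.grad.norm(p=2) for p in model.parameters() if p.grad is not None]), p=2)
    eps = {}
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                continue
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)
            eps[p] = e

    # Second pass: gradients at the perturbed weights w + epsilon
    model.zero_grad()
    loss_fn(model(inputs), targets).backward()

    # Restore the original weights, then step with the sharpness-aware gradients
    with torch.no_grad():
        for p, e in eps.items():
            p.sub_(e)
    base_optimizer.step()
    base_optimizer.zero_grad()
```

Note the two forward/backward passes per update; LookSAM (listed above) is one attempt to amortize that cost.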
prompt
Calibrate Before Use: Improving Few-Shot Performance of Language Models (https://arxiv.org/abs/2102.09690); see the calibration sketch after this list
Prompt tuning (The Power of Scale for Parameter-Efficient Prompt Tuning, https://arxiv.org/abs/2104.08691)
Do Prompt-Based Models Really Understand the Meaning of their Prompts? (https://arxiv.org/abs/2109.01247)
An Empirical Study on Few-shot Knowledge Probing for Pretrained Language Models (https://arxiv.org/pdf/2109.02772.pdf)
FLAN (https://arxiv.org/pdf/2109.01652.pdf)
Text Style Transfer (https://arxiv.org/abs/2109.03910)
Prompt generation for NMT (https://arxiv.org/abs/2110.05448)
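Calibrate Before Use (first item in this list) proposes contextual calibration: estimate the model's label bias from a content-free input (e.g. "N/A") and divide it out before predicting. A minimal NumPy sketch of that correction; the function name and the example probabilities are made up for illustration.

```python
import numpy as np

def calibrate(label_probs, content_free_probs):
    """Rescale label probabilities by the content-free baseline (W = diag(p_cf)^-1, b = 0)."""
    calibrated = np.asarray(label_probs) / np.asarray(content_free_probs)
    return calibrated / calibrated.sum()

# Example: the model prefers "positive" even on a content-free prompt.
p_cf = np.array([0.7, 0.3])   # P(label | content-free input)
p_x  = np.array([0.6, 0.4])   # P(label | real input), biased toward "positive"
print(calibrate(p_x, p_cf))   # after calibration "negative" wins: ~[0.39, 0.61]
```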
LM
BART (https://arxiv.org/abs/1910.13461)
Primer (https://arxiv.org/abs/2109.08668)
NormFormer (https://arxiv.org/abs/2110.09456)
HTLM (https://arxiv.org/abs/2107.06955)
KIE Pretraining
LayoutLM (https://arxiv.org/abs/1912.13318)
LayoutLMv2 (https://arxiv.org/abs/2012.14740)
StructuralLM (https://arxiv.org/abs/2105.11210)
MarkupLM (https://arxiv.org/abs/2110.08518)