video-action-and-event-understanding.md

CVPR-2023-Papers


Video: Action and Event Understanding


| Title | Repo | Paper | Video |
|---|---|---|---|
| Open Set Action Recognition via Multi-Label Evidential Learning | GitHub | thecvf, arXiv | YouTube |
| FLAG3D: A 3D Fitness Activity Dataset with Language Instruction | GitHub Page | thecvf, arXiv | YouTube |
| MoLo: Motion-Augmented Long-Short Contrastive Learning for Few-Shot Action Recognition | GitHub | thecvf, arXiv | YouTube |
| The Wisdom of Crowds: Temporal Progressive Attention for Early Action Prediction | GitHub Page, GitHub | thecvf, arXiv | YouTube |
| Use Your Head: Improving Long-Tail Video Recognition | GitHub Page, GitHub | thecvf, arXiv | YouTube |
| Decomposed Cross-Modal Distillation for RGB-based Temporal Action Detection | | thecvf, arXiv | YouTube |
| Video Test-Time Adaptation for Action Recognition | GitHub Page, GitHub | thecvf, arXiv | YouTube |
| How Can Objects Help Action Recognition? | GitHub | thecvf, arXiv | YouTube |
| Text-Visual Prompting for Efficient 2D Temporal Video Grounding | GitHub | thecvf, arXiv | YouTube |
| Enlarging Instance-Specific and Class-Specific Information for Open-Set Action Recognition | GitHub | thecvf, arXiv | YouTube |
| TimeBalance: Temporally-Invariant and Temporally-Distinctive Video Representations for Semi-Supervised Action Recognition | GitHub | thecvf, arXiv | YouTube |
| Learning Video Representations from Large Language Models (CVPR - Highlight) | GitHub Page, GitHub | thecvf, arXiv | YouTube |
| Fine-tuned CLIP Models are Efficient Video Learners | GitHub Page, GitHub | thecvf, arXiv | YouTube |
| Efficient Movie Scene Detection Using State-Space Transformers | GitHub | thecvf, arXiv | YouTube |
| AdamsFormer for Spatial Action Localization in the Future | | thecvf | YouTube |
| A Light Weight Model for Active Speaker Detection | GitHub | thecvf, arXiv | YouTube |
| System-Status-Aware Adaptive Network for Online Streaming Video Understanding | | thecvf, arXiv | YouTube |
| STMixer: A One-Stage Sparse Action Detector | GitHub | thecvf, arXiv | YouTube |
| Revisiting Temporal Modeling for CLIP-Based Image-to-Video Knowledge Transferring | GitHub | thecvf, arXiv | YouTube |
| Distilling Vision-Language Pre-Training To Collaborate With Weakly-Supervised Temporal Action Localization | GitHub Page, GitHub | thecvf, arXiv | |
| Real-Time Multi-Person Eyeblink Detection in the Wild for Untrimmed Video | GitHub | thecvf, arXiv | YouTube |
| Modeling Video As Stochastic Processes for Fine-Grained Video Representation Learning (CVPR - Highlight) | GitHub | thecvf | YouTube |
| Re2TAL: Rewiring Pretrained Video Backbones for Reversible Temporal Action Localization | GitHub | thecvf, arXiv | YouTube |
| Learning Discriminative Representations for Skeleton based Action Recognition | GitHub | thecvf, arXiv | YouTube |
| Learning Procedure-Aware Video Representation from Instructional Videos and their Narrations | GitHub | thecvf, arXiv | YouTube |
| Collecting Cross-Modal Presence-Absence Evidence for Weakly-Supervised Audio-Visual Event Perception | GitHub | thecvf | |
| PivoTAL: Prior-Driven Supervision for Weakly-Supervised Temporal Action Localization | | thecvf | YouTube |
| Cascade Evidential Learning for Open-World Weakly-Supervised Temporal Action Localization | | thecvf | |
| Soft-Landing Strategy for Alleviating the Task Discrepancy Problem in Temporal Action Localization Tasks | | thecvf, arXiv | |
| SVFormer: Semi-Supervised Video Transformer for Action Recognition | GitHub | thecvf, arXiv | YouTube |
| AutoAD: Movie Description in Context (CVPR - Highlight) | WEB Page | thecvf, arXiv | YouTube |
| STMT: A Spatial-Temporal Mesh Transformer for MoCap-based Action Recognition | GitHub | thecvf, arXiv | YouTube |
| Boosting Weakly-Supervised Temporal Action Localization with Text Information | GitHub | thecvf, arXiv | YouTube |
| Aligning Step-by-Step Instructional Diagrams to Video Demonstrations | WEB Page, GitHub | thecvf, arXiv | YouTube |
| Improving Weakly Supervised Temporal Action Localization by Bridging Train-Test Gap in Pseudo Labels | GitHub | thecvf, arXiv | |
| Weakly Supervised Video Representation Learning with Unaligned Text for Sequential Videos | GitHub | thecvf, arXiv | YouTube |
| Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline | GitHub Page, GitHub | thecvf, arXiv | YouTube |
| LOGO: A Long-Form Video Dataset for Group Action Quality Assessment | GitHub | thecvf | |
| Search-Map-Search: A Frame Selection Paradigm for Action Recognition | | thecvf, arXiv | YouTube |
| 3Mformer: Multi-Order Multi-Mode Transformer for Skeletal Action Recognition | | thecvf, arXiv | YouTube |
| ProTeGe: Untrimmed Pretraining for Video Temporal Grounding by Video Temporal Grounding | | thecvf | |
| Egocentric Video Task Translation (CVPR - Highlight) | WEB Page, GitHub | thecvf, arXiv | YouTube |
| Look Around for Anomalies: Weakly-Supervised Anomaly Detection via Context-Motion Relational Learning | | thecvf | YouTube |
| Proposal-based Multiple Instance Learning for Weakly-Supervised Temporal Action Localization | GitHub | thecvf, arXiv | YouTube |
| TriDet: Temporal Action Detection with Relative Boundary Modeling | GitHub | thecvf, arXiv | YouTube |
| Actionlet-Dependent Contrastive Learning for Unsupervised Skeleton-based Action Recognition (CVPR - Highlight) | GitHub Page, GitHub | thecvf, arXiv | YouTube |
| EVAL: Explainable Video Anomaly Localization | | thecvf, arXiv | YouTube |
| Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning | GitHub | thecvf, arXiv | |
| Weakly Supervised Temporal Sentence Grounding with Uncertainty-guided Self-Training | | thecvf | YouTube |
| Leveraging Temporal Context in Low Representational Power Regimes | WEB Page | thecvf | YouTube |
| PIVOT: Prompting for Video Continual Learning | | thecvf, arXiv | |
| On the Benefits of 3D Pose and Tracking for Human Action Recognition | WEB Page, GitHub | thecvf, arXiv | |
| NaQ: Leveraging Narrations as Queries to Supervise Episodic Memory | WEB Page, GitHub | thecvf, arXiv | |
| Selective Structured State-Spaces for Long-Form Video Understanding | | thecvf, arXiv | YouTube |
| Frame Flexible Network | GitHub | thecvf, arXiv | YouTube |
| ASPnet: Action Segmentation with Shared-Private Representation of Multiple Data Sources | | thecvf | |
| Unified Keypoint-based Action Recognition Framework via Structured Keypoint Pooling | | thecvf, arXiv | |
| Learning Transferable Spatiotemporal Representations from Natural Script Knowledge | GitHub | thecvf, arXiv | YouTube |
| Masked Video Distillation: Rethinking Masked Feature Modeling for Self-Supervised Video Representation Learning | GitHub | thecvf, arXiv | YouTube |
| Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models | GitHub | thecvf, arXiv | YouTube |
| Procedure-Aware Pretraining for Instructional Video Understanding | GitHub | thecvf, arXiv | YouTube |
| Latency Matters: Real-Time Action Forecasting Transformer (CVPR - Highlight) | GitHub Page | thecvf | YouTube |
| Generating Anomalies for Video Anomaly Detection with Prompt-based Feature Mapping | | thecvf | YouTube |
| HierVL: Learning Hierarchical Video-Language Embeddings (CVPR - Highlight) | WEB Page, GitHub | thecvf, arXiv | YouTube |
| Two-Stream Networks for Weakly-Supervised Temporal Action Localization with Semantic-Aware Mechanisms | | thecvf | YouTube |
| Hybrid Active Learning via Deep Clustering for Video Action Detection | GitHub | thecvf | YouTube |
| Prompt-guided Zero-Shot Anomaly Action Recognition using Pretrained Deep Skeleton Features | | thecvf, arXiv | |
| Unbiased Multiple Instance Learning for Weakly Supervised Video Anomaly Detection | GitHub | thecvf, arXiv | YouTube |
| VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking | GitHub | thecvf, arXiv | |
| PDPP: Projected Diffusion for Procedure Planning in Instructional Videos (CVPR - Highlight) | GitHub | thecvf, arXiv | YouTube |
| Learning Action Changes by Measuring Verb-Adverb Textual Relationships | GitHub | thecvf, arXiv | YouTube |
| Reducing the Label Bias for Timestamp Supervised Temporal Action Segmentation | | thecvf | YouTube |
| Video Event Restoration based on Keyframes for Video Anomaly Detection | | thecvf, arXiv | YouTube |
| Active Exploration of Multimodal Complementarity for Few-Shot Action Recognition | | thecvf | YouTube |
| Vita-CLIP: Video and Text Adaptive CLIP via Multimodal Prompting | GitHub | thecvf, arXiv | |
| Post-Processing Temporal Action Detection | GitHub Page, GitHub | thecvf, arXiv | YouTube |
| Relational Space-Time Query in Long-Form Videos (CVPR - Highlight) | | thecvf | YouTube |
| Therbligs in Action: Video Understanding through Motion Primitives | | thecvf, arXiv | |
| Dual-Path Adaptation from Image to Video Transformers | GitHub | thecvf, arXiv | |
| Hierarchical Semantic Contrast for Scene-Aware Video Anomaly Detection | GitHub | thecvf, arXiv | |
| Exploiting Completeness and Uncertainty of Pseudo Labels for Weakly Supervised Video Anomaly Detection | GitHub | thecvf, arXiv | YouTube |
| Unbiased Scene Graph Generation in Videos | GitHub | thecvf, arXiv | YouTube |