Visual & Acoustic Discriminative Relevance for Activity Recognition

This repository holds experiment code for Visual & Acoustic Discriminative Relevance (VADR) demonstrated on the UCF-101 action recognition dataset.

The model architecture used is a concatenated fusion of VGGish and C3D at the layer before the bottleneck, with a simple MLP for classification of actions.

UCF-101

https://www.crcv.ucf.edu/papers/UCF101_CRCV-TR-12-01.pdf

Unfortunately the original release of UCF-50 included videos without corresponding audio tracks, therefore for the visual-audio aspect of this work, we are using a subset of UCF-101 that only contains classes with audio tracks.

Architecture

We use a C3D model pretrained on UCF-101 as the video subnetwork, and VGGish, trained on AudioSet for the audio subnetwork. Both architectures are based off VGG. They are concatenated at the feature embedding layer.

Explainability

We use discriminative relevance (https://arxiv.org/abs/1908.01536), a modification of layerwise relevance backpropagation, to show relevant components in a four tuple consisting of .

Name		Name	Last commit message	Last commit date
Latest commit History 45 Commits
docs		docs
src		src
.gitattributes		.gitattributes
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs

docs

src

src

.gitattributes

.gitattributes

.gitignore

.gitignore

README.md

README.md

Repository files navigation

Visual & Acoustic Discriminative Relevance for Activity Recognition

UCF-101

Architecture

Explainability

About

Releases

Packages

Contributors 2

Languages

dais-ita/vadr

Folders and files

Latest commit

History

Repository files navigation

Visual & Acoustic Discriminative Relevance for Activity Recognition

UCF-101

Architecture

Explainability

About

Resources

Stars

Watchers

Forks

Languages