
Multi-Perspective Long Short Term Memory

Teaser image

Multi-Perspective LSTM for Joint Visual Representation Learning
Alireza Sepas-Moghaddam, Fernando Pereira, Paulo Lobato Correia, Ali Etemad

CVPR'21 Paper

Abstract: We present a novel LSTM cell architecture, Multi-Perspective LSTM (MP-LSTM), capable of learning both intra- and inter-perspective relationships available in visual sequences captured from multiple perspectives. Our architecture adopts a novel recurrent joint learning strategy that uses additional gates and memories at the cell level. We demonstrate that by using the proposed cell to create a network, more effective and richer visual representations are learned for recognition tasks. We validate the performance of our proposed architecture in the context of two multi-perspective visual recognition tasks, namely lip reading and face recognition. Three relevant datasets are considered and the results are compared against fusion strategies, other existing multi-input LSTM architectures, and alternative recognition solutions. The experiments show the superior performance of our solution over the considered benchmarks, both in terms of recognition accuracy and computational complexity.

Requirements

  • Both Linux and Windows are supported. Linux is recommended for performance and compatibility reasons.
  • 64-bit Python 3.6 installation. We recommend Anaconda3 with numpy 1.19.5 or newer.
  • We recommend TensorFlow 1.14, which we used for all experiments in the paper, but newer versions of TensorFlow might work as well.
  • Keras 2.1.5 is required.
  • The keras-vggface package is required to extract ResNet-50 spatial embeddings (see the sketch after this list).
  • One or more high-end NVIDIA GPUs, NVIDIA drivers, and CUDA 10.0 toolkit.
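
The snippet below is a minimal sketch of per-frame spatial embedding extraction with keras-vggface; the 224x224 input size, the global-average-pooling choice, and the file path are illustrative assumptions and are not taken from this repository.

import numpy as np
from keras.preprocessing import image
from keras_vggface.vggface import VGGFace
from keras_vggface.utils import preprocess_input

# ResNet-50 backbone without the classification head; global average pooling
# gives one 2048-D embedding per frame (pooling choice is an assumption).
extractor = VGGFace(model='resnet50', include_top=False,
                    input_shape=(224, 224, 3), pooling='avg')

def frame_embedding(frame_path):
    # Load a single video frame and return its 2048-D spatial embedding.
    img = image.load_img(frame_path, target_size=(224, 224))
    x = image.img_to_array(img)[None, ...]
    x = preprocess_input(x, version=2)  # version=2 selects ResNet-50 preprocessing
    return extractor.predict(x)[0]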

Preparing Datasets

The OuluVS2, Light Field Faces in the Wild (LFFW), and Light Field Face Constrained (LFFC) datasets are used to evaluate the performance of MP-LSTM. After downloading the datasets, split the data into training, validation, and testing sets as discussed in the OuluVS2 paper and the LFFW and LFFC paper. The files should be organized as follows:

OuluVS2 Dataset  
├  Test Test folder
├  Train Train folder
├  Validation Validation folder
   ├  CAM1 Camera 1 folder
   ├  CAM2 Camera 2 folder
   ├  CAM3 Camera 3 folder
      ├  01 Utterance 1 folder containing speech videos
      ├  02 Utterance 2 folder containing speech videos
      ├  . .
      ├  . .
      ├  . .
      ├  20 Utterance 20 folder containing speech videos
LFFW and LFFC Datasets  
├  Test Test folder
├  Train Train folder
├  Validation Validation folder
   ├  Hor Horizontal viewpoint sequences folder
   ├  Ver Vertical viewpoint sequences folder
      ├  01 Subject 1 folder containing horizontal/vertical videos
      ├  02 Subject 2 folder containing horizontal/vertical videos
      ├  . .
      ├  . .
      ├  . .
      ├  53 Subject 53 folder containing horizontal/vertical videos
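
As a quick sanity check of the layout above, the following sketch walks the assumed OuluVS2 directory tree and counts the utterance folders per camera; the root path and folder names simply mirror the structure shown and may need adjusting to your local copy.

import os

ROOT = 'OuluVS2'                      # illustrative root path
SPLITS = ['Train', 'Validation', 'Test']
CAMERAS = ['CAM1', 'CAM2', 'CAM3']

for split in SPLITS:
    for cam in CAMERAS:
        cam_dir = os.path.join(ROOT, split, cam)
        # Each camera folder is expected to hold utterance folders 01..20.
        utterances = sorted(os.listdir(cam_dir)) if os.path.isdir(cam_dir) else []
        print('%s/%s: %d utterance folders' % (split, cam, len(utterances)))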

Training and Testing

Demo code for training and testing with the 3-perspective combination is available in Training_3Views.py and Testing_3Views.py, respectively. The source code of MP-LSTM for 2 and 3 perspectives is available in Library\MPLSTM_2inputs and Library\MPLSTM_3inputs, respectively. An illustrative sketch of how three per-view embedding sequences feed a joint Keras model is given below.
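
The following is a generic three-input wiring in the spirit of Training_3Views.py, not the repository's actual code: the per-view branches use plain Keras LSTMs as stand-ins for the MP-LSTM cell in Library\MPLSTM_3inputs, and all shapes, layer sizes, and class counts are assumptions.

from keras.models import Model
from keras.layers import Input, LSTM, Dense, concatenate

SEQ_LEN, EMB_DIM, NUM_CLASSES = 30, 2048, 20   # assumed: 20 OuluVS2 utterance classes

view1 = Input(shape=(SEQ_LEN, EMB_DIM))   # CAM1 embedding sequence
view2 = Input(shape=(SEQ_LEN, EMB_DIM))   # CAM2 embedding sequence
view3 = Input(shape=(SEQ_LEN, EMB_DIM))   # CAM3 embedding sequence

# Stand-in recurrent branches: MP-LSTM processes the views jointly inside a
# single cell, whereas this sketch fuses three independent LSTM branches.
h1 = LSTM(256)(view1)
h2 = LSTM(256)(view2)
h3 = LSTM(256)(view3)

joint = concatenate([h1, h2, h3])
probs = Dense(NUM_CLASSES, activation='softmax')(joint)

model = Model(inputs=[view1, view2, view3], outputs=probs)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])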

Inquiries

For inquiries, please contact alireza.sepasmoghaddam@queensu.ca

Citation

@inproceedings{Sepas2021MPLSTM,
  title     = {Multi-Perspective {LSTM} for Joint Visual Representation Learning},
  author    = {Alireza Sepas-Moghaddam and Fernando Pereira and Paulo Lobato Correia and Ali Etemad},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2021}
}
