SpectData/MONAH

MONAH: Multi-Modal Narratives for Humans

Problem

Analyzing conversations in video format (visual + audio + text) requires substantial human expertise, and end-to-end deep learning methods are less interpretable.

Solution

Inspired by how the linguistics community analyzes conversations using the Jefferson transcription system, MONAH creates a multi-modal text narrative for dyadic (two-person) video-recorded conversations by weaving what is being said with *how* it is being said.
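
The weaving idea can be sketched in a few lines. The data structures below (utterance tuples and a `cues` dict) and the `weave` function are illustrative assumptions for this README, not MONAH's actual API; the real pipeline derives the cues from the audio and video streams.

```python
def weave(utterances, cues):
    """Weave prosodic cues into a verbatim transcript.

    utterances: list of (speaker, text) pairs from the transcript.
    cues: dict mapping utterance index -> how it was said (e.g. "loudly").
    """
    lines = []
    for i, (speaker, text) in enumerate(utterances):
        # Insert the "how" annotation, when one exists, before the verb.
        how = f" {cues[i]}" if i in cues else ""
        lines.append(f"{speaker}{how} says: {text}")
    return "\n".join(lines)


# Example: one annotated and one plain utterance.
narrative = weave([("Alice", "hello"), ("Bob", "hi")], {0: "loudly"})
```

The output reads as a plain-text narrative ("Alice loudly says: hello"), which is what makes the representation human-interpretable.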

ScreenCast

To add later

Required Inputs

Two videos, one per speaker. Works best when the camera faces the speaker directly rather than filming from an angle. A verbatim transcript from YouTube.

User Interface

A text-menu-based interface for easy configuration.

Supported modalities in the narratives

Output - MONAH Narrative

To add later

Dependencies (Technology Stack)

To add as we build this repo up.

Fine Narratives

Actions

Prosody

Coarse Narratives

Demographics

Semantics

  • Sentiment
  • Questions
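
The two semantic features above can be sketched minimally as follows. The word lists, thresholds, and function names are hypothetical placeholders for illustration, not MONAH's actual sentiment or question models.

```python
# Toy lexicons; a real system would use a trained sentiment model.
POSITIVE = {"good", "great", "happy"}
NEGATIVE = {"bad", "sad", "terrible"}


def sentiment(utterance):
    """Crude lexicon lookup: count positive minus negative words."""
    words = utterance.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def is_question(utterance):
    """Surface heuristic: treat utterances ending in '?' as questions."""
    return utterance.strip().endswith("?")
```

Each utterance's sentiment and question tags can then be woven into the coarse narrative alongside the transcript text.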

Mimicry

  • Dynamic Time Warping
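
Dynamic Time Warping aligns two time series that may unfold at different speeds, which is one way to quantify mimicry between the two speakers' signals. A minimal pure-Python sketch (the `dtw_distance` name and the absolute-difference cost are assumptions for this example):

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = DTW distance between a[:i] and b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local cost of pairing a[i-1], b[j-1]
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]
```

A low distance between, say, the two speakers' loudness contours suggests prosodic mimicry; in practice a library such as `dtaidistance` would replace this O(nm) reference implementation.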

Contributions

MONAH is meant to be a modular system that makes additions simple. Joshua to add an architectural diagram.

Pipeline (Intermediate Artifacts)

To add later

Continuous Integration

  • Joshua to add PyLint Python style tests
  • Joshua to add compulsory unit tests

Citation

If you find MONAH useful in any of your publications, we ask that you cite the following:

Features introduced in Paper 1 are in white; features introduced in Paper 2 are in blue.

  • Paper 1 (white features) Kim, J. Y., Kim, G. Y., & Yacef, K. (2019). Detecting depression in dyadic conversations with multimodal narratives and visualizations. In Australasian Joint Conference on Artificial Intelligence (pp. 303-314). Springer, Cham.
  • Paper 2 (blue features) Kim, J. Y., Yacef, K., Kim, G., Liu, C., Calvo, R., & Taylor, S. (2021, April). MONAH: Multi-Modal Narratives for Humans to analyze conversations. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume (pp. 466-479).
