[2024.2.20] MMAD has been accepted at COLING 2024! Watch 👀 this repository for the latest updates.
MMAD exhibits remarkable audio description (AD) generation capabilities for movies by utilizing multimodal inputs.
- Install the dependencies. If you have conda installed, you can run:

```shell
git clone https://github.com/Daria8976/MMAD.git
cd MMAD
bash environment.sh
```
- Download the pretrained model weights:
  - `checkpoint_step_50000.pth` under the `checkpoint` folder
  - `base.pth` under the `AudioEnhancing/configs` folder
  - `LanguageBind/Video-LLaVA-7B` under the `VideoCaption` folder
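After downloading, it can help to verify that the weights landed where the pipeline expects them. Below is a minimal sketch; the exact paths (in particular where `Video-LLaVA-7B` sits under `VideoCaption`) are assumptions based on the list above, and `missing_weights` is a hypothetical helper, not part of the repository:

```python
from pathlib import Path

# Assumed weight locations, inferred from the download list above;
# adjust if your checkout uses a different layout.
EXPECTED_WEIGHTS = [
    "checkpoint/checkpoint_step_50000.pth",
    "AudioEnhancing/configs/base.pth",
    "VideoCaption/Video-LLaVA-7B",
]

def missing_weights(root="."):
    """Return the expected weight paths that are not present under root."""
    root = Path(root)
    return [p for p in EXPECTED_WEIGHTS if not (root / p).exists()]
```

Running `missing_weights()` from the repository root before inference gives a quick list of anything still to download.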
- Prepare `REPLICATE_API_TOKEN` in `llama.py`
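Since a missing token typically only fails once the API is first called, checking it up front gives a clearer error. A small sketch, assuming the token is read from the environment; `get_replicate_token` is a hypothetical helper, not part of the repository:

```python
import os

def get_replicate_token(env=None):
    """Fail fast with a clear message if the Replicate API token is missing."""
    env = os.environ if env is None else env
    token = env.get("REPLICATE_API_TOKEN")
    if not token:
        raise RuntimeError(
            "REPLICATE_API_TOKEN is not set; export it (or set it in llama.py) "
            "before running inference"
        )
    return token
```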
- Prepare demo data (we provide four demo videos here):
  - Put `demo.mp4` under `Example/Video`
  - Put character photos (each photo should be named with the corresponding character's name) under `Example/ActorCandidate`
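Because photos are matched to characters by filename, the candidate folder effectively defines a name-to-photo mapping. A minimal sketch of how such a mapping could be built; `load_actor_candidates` is a hypothetical helper and the accepted image extensions are assumptions:

```python
from pathlib import Path

def load_actor_candidates(root="Example/ActorCandidate"):
    """Map character name (photo filename stem) -> photo path."""
    exts = {".jpg", ".jpeg", ".png"}  # assumed photo formats
    return {
        p.stem: p
        for p in sorted(Path(root).iterdir())
        if p.suffix.lower() in exts
    }
```

For example, a file `Example/ActorCandidate/Alice.jpg` would be keyed under the character name `Alice`.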
- Run inference:

```shell
python infer.py
```
Finally, we recruited 10 sighted volunteers and 10 blind and visually impaired (BVI) participants (including 3 totally blind and 7 partially sighted) for human evaluation via a Likert scale, and we merged the statistical results into the results table of the paper.
Here are some great resources we benefit from or utilize:
- Video-LLaVA and Pengi for our code base.