[CVPR 2024] EmoVIT: Revolutionizing Emotion Insights with Visual Instruction Tuning

aimmemotion/EmoVIT

EmoVIT

Official code for the paper "EmoVIT: Revolutionizing Emotion Insights with Visual Instruction Tuning" | CVPR 2024

EmoSet/
|
+--LAVIS
|
+--emo
    |
    +--annotation (extracted from the EmoSet archive)
    |
    +--cap-ano (create this folder before running the scripts)
    |
    +--caption (create this folder before running the scripts)
    |
    +--reasoning (create this folder before running the scripts)
    |
    +--conversation_new100 (create this folder before running the scripts)
    |
    +--prompt
    |
    +--image
        +--amusement (extracted from the EmoSet archive)
        |
        +--anger (extracted from the EmoSet archive)
        |
        .
        .
        .
        |
        +--train_image (EmoVIT does not need every image; place the images used for training here)
                |
                ........

The project structure has two main folders: emo and LAVIS.

  • The LAVIS folder can be obtained from here.
  • Arrange the image data in the locations shown above. The EmoSet images can be obtained from EmoSet's official website.
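The empty working folders marked in the tree above can be created in one go. The following is a minimal sketch, assuming the emo folder sits at ./emo relative to where you run it; create_working_dirs is a hypothetical helper and is not part of this repository.

```python
from pathlib import Path

# Folder names are taken from the project tree above.
# The location of the emo folder ("./emo") is an assumption.
REQUIRED_DIRS = ["cap-ano", "caption", "reasoning", "conversation_new100"]


def create_working_dirs(emo_root):
    """Create the working folders under emo_root and return their paths."""
    root = Path(emo_root)
    created = []
    for name in REQUIRED_DIRS:
        folder = root / name
        # exist_ok=True makes the call safe to repeat.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


if __name__ == "__main__":
    for folder in create_working_dirs("./emo"):
        print(folder)
```

Running it twice is harmless, since existing folders are left untouched.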

Install Related Packages

conda create --name emovit python=3.8
conda activate emovit
cd emovit
pip install -r requirements.txt

Install LAVIS

pip install salesforce-lavis
# If that does not work, install from source as follows.
cd ..
git clone https://github.com/salesforce/LAVIS.git
cd LAVIS
# Before installing, remove 'open3d' from 'requirements.txt' to avoid version conflicts.
pip install -e .
# Then move the 'lavis' folder into the 'lib' folder.

Emotion Instruction Data Generation

  1. Run python ./emo/caption.py to obtain image captions. Set 'path' according to the class to be processed.
  2. Run python ./emo/cap-anno.py to write the attributes and captions of each image into a file. Set 'path' according to the class to be processed.
  3. Run python ./emo/gpt4_reasoning.py or python ./emo/gpt4_conversation.py to have GPT-4 generate questions from the file produced in step 2.
    • Remember to replace the API key with your own.
    • To adjust the prompt, edit the files in the 'prompt' folder.
  4. Run python ./emo/all.py to merge the reasoning, conversation, and classification results.
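Step 3 requires your own API key. One way to avoid hard-coding it in a script is to read it from an environment variable. The sketch below is only a suggestion: load_api_key is a hypothetical helper, not something the repository provides, and the variable name OPENAI_API_KEY is an assumption about how you choose to store the key.

```python
import os


def load_api_key(env_var="OPENAI_API_KEY"):
    """Return the API key stored in the given environment variable.

    Raises RuntimeError with a readable message if the variable is
    unset or empty, so the GPT-4 scripts fail early rather than mid-run.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before running the GPT-4 scripts."
        )
    return key
```

The returned string can then be assigned wherever gpt4_reasoning.py or gpt4_conversation.py currently expects the key.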

These steps produce the instruction data. To skip them, you can use the instructions we created using EmoSet. (However, the image data must still be downloaded from EmoSet's official website.)

Categorical data does not rely on GPT; it is produced directly (see the prompt in all.py).
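To illustrate why categorical data needs no GPT call: the label is already given by the class folder each image sits in, so a question/answer pair can be filled in from a fixed prompt. The sketch below is not the code in all.py. The prompt wording, the record fields, and the .jpg extension are all assumptions made for illustration.

```python
from pathlib import Path

# Hypothetical prompt wording; the actual prompt is defined in all.py.
QUESTION = "Which emotion does this image convey?"


def build_classification_records(image_root):
    """Build one question/answer record per image.

    The class folder name (e.g. 'amusement', 'anger') is used as the answer.
    Record field names are illustrative, not the repository's format.
    """
    records = []
    for class_dir in sorted(Path(image_root).iterdir()):
        # Skip stray files and the training-subset folder, which is not a class.
        if not class_dir.is_dir() or class_dir.name == "train_image":
            continue
        for image_path in sorted(class_dir.glob("*.jpg")):  # extension assumed
            records.append(
                {
                    "image": f"{class_dir.name}/{image_path.name}",
                    "question": QUESTION,
                    "answer": class_dir.name,
                }
            )
    return records
```

For example, build_classification_records("./emo/image") would return one record per image found under the class folders.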

Train EmoVIT

Prepare Weights

You can obtain the weights for Vicuna from this page. We are using version 1.1. Place the downloaded file into LAVIS/lavis/weight/vicuna-7b-2/.

Run

Training

cd LAVIS
python train.py --cfg-path FT.yaml

Parameter Settings

  • LAVIS/FT.yaml: Setting of hyperparameters
  • LAVIS/lavis/configs/models/blip2/blip2_instruct_vicuna7b.yaml: Select the location of LLM weight
  • LAVIS/lavis/configs/datasets/coco/defaults_vqa.yaml: Select the location of your data
  • LAVIS/lavis/runners/runner_base.py: Change the name of the weight file to be saved
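Before launching training, it can help to confirm that the files and folders named above are where the scripts expect them. The following is a minimal sketch, assuming it is run from inside the LAVIS folder; missing_paths is a hypothetical helper, and it checks only that the paths exist, not their contents.

```python
from pathlib import Path

# Paths are taken from the sections above, relative to the LAVIS folder.
REQUIRED_PATHS = [
    "FT.yaml",
    "lavis/configs/models/blip2/blip2_instruct_vicuna7b.yaml",
    "lavis/configs/datasets/coco/defaults_vqa.yaml",
    "lavis/runners/runner_base.py",
    "lavis/weight/vicuna-7b-2",
]


def missing_paths(lavis_root="."):
    """Return the expected paths that do not exist under lavis_root."""
    root = Path(lavis_root)
    return [p for p in REQUIRED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_paths(".")
    if missing:
        print("Missing before training:")
        for path in missing:
            print("  " + path)
    else:
        print("All expected files and folders were found.")
```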

Inference EmoVIT

If you haven't trained your own weights yet, you can use the model_weights1.pth provided in the LAVIS folder.

python ./LAVIS/test.py  

Citation

If you find this paper helpful, please consider citing it:

@inproceedings{Xie2024EmoVIT,
  title={EmoVIT: Revolutionizing Emotion Insights with Visual Instruction Tuning},
  author={Hongxia Xie and Chu-Jun Peng and Yu-Wen Tseng and Hung-Jen Chen and Chan-Feng Hsu and Hong-Han Shuai and Wen-Huang Cheng},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2024}
}
