[Paper] [Model Card] [Deployment Demo]
In this work, we propose a narrative generation pipeline for co-creating visual stories with users. The pipeline lets the user control the events and emotions in the generated content. It has two parts: narrative generation and image generation. In narrative generation, we plan the narrative based on the keywords and emotional trends in the sentences and generate the next story sentence. In image generation, we use both Disco Diffusion and Stable Diffusion to create a visually appealing image that captures the story's main plot; we further apply object recognition so that objects in the images can be mentioned in later story development.
Domain | Name | Description | Language model type | Model Card | 🤗 link |
---|---|---|---|---|---|
Suggester | Emotion Suggester | Fine-tuned on StoryCommonsense; suggests the sentiment of the next sentence | DeBERTa-v2-xlarge | Yuetian/deberta-finetuned-next-sentence-emotion | Hugging Face |
Suggester | Emotion Suggester | Fine-tuned on StoryCommonsense; suggests the sentiment of the next sentence | BERT-base-uncased | Yuetian/bert-base-uncased-finetuned-plutchik-emotion | Hugging Face |
Suggester | Keyword Suggester | Fine-tuned on ROCStories; suggests named entities for the next sentence | OPT-1.3B | Stay tuned | Stay tuned |
Text pipe | Next-sentence generator | Takes the context, keyword, and sentiment together and generates the next sentence in the ROCStories style | T5-base-finetuned-CommonGen | Yuetian/T5-finetuned-storyCommonsense | Hugging Face |
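
The way these components fit together in one co-creation turn can be illustrated with a minimal sketch. It is not the repository's code: `StoryState`, `co_create_step`, and every parameter name are hypothetical, and the models are passed in as plain callables so that the control flow (suggest, let the user override, generate, render, recognize objects, feed them back) is visible without assuming any particular model API or prompt format.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


@dataclass
class StoryState:
    """Story text so far plus object labels recognized in earlier images."""
    sentences: List[str] = field(default_factory=list)
    seen_objects: List[str] = field(default_factory=list)


def _first(options: List[str], kind: str) -> str:
    """Return the top suggestion, or fail loudly if there is none."""
    if not options:
        raise ValueError(f"no {kind} suggested and none supplied by the user")
    return options[0]


def co_create_step(
    state: StoryState,
    suggest_emotions: Callable[[List[str]], List[str]],   # stand-in for an emotion suggester
    suggest_keywords: Callable[[List[str]], List[str]],   # stand-in for a keyword suggester
    generate: Callable[[List[str], str, str], str],       # (context, keyword, emotion) -> sentence
    render_image: Callable[[str], Any],                   # stand-in for a diffusion model
    detect_objects: Callable[[Any], List[str]],           # stand-in for object recognition
    user_emotion: Optional[str] = None,
    user_keyword: Optional[str] = None,
) -> str:
    """Run one turn: pick an emotion and keyword, write a sentence, illustrate it."""
    context = list(state.sentences)

    # Model suggestions; objects seen in earlier images are offered as extra keywords.
    emotions = suggest_emotions(context)
    keywords = list(suggest_keywords(context))
    for obj in state.seen_objects:
        if obj not in keywords:
            keywords.append(obj)

    # The user's choice, when given, overrides the top suggestion.
    emotion = user_emotion if user_emotion is not None else _first(emotions, "emotion")
    keyword = user_keyword if user_keyword is not None else _first(keywords, "keyword")

    sentence = generate(context, keyword, emotion)
    state.sentences.append(sentence)

    # Illustrate the new sentence and remember what the recognizer finds in the image.
    image = render_image(sentence)
    for label in detect_objects(image):
        if label not in state.seen_objects:
            state.seen_objects.append(label)
    return sentence
```

In practice each callable would wrap one of the models listed in the table above (or a diffusion and object-recognition backend); for experimentation, simple lambdas returning fixed values are enough to exercise the loop.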
We provide a simple demo showing the deployed version of our framework here. Please refer to the Q&A section for more information.
We show the performance distributions of the baseline model and the prompt-optimized model across 3,748 sets of experiments under different metrics. In each figure, the blue box on the left represents our method and the orange box on the right represents the baseline model.
Here are several example stories you can generate using this framework.
@misc{chen2023visual,
title={Visual Story Generation Based on Emotion and Keywords},
author={Yuetian Chen and Ruohua Li and Bowen Shi and Peiru Liu and Mei Si},
year={2023},
eprint={2301.02777},
archivePrefix={arXiv},
primaryClass={cs.AI}
}