We propose JESTER, a text-to-meme generation engine. Enter any text and get a relevant meme in seconds, all from your browser!
JESTER consists of two parts as shown in the figure below:
- A meme template retrieval system, which uses a RoBERTa model fine-tuned with a contrastive loss. The model is trained to produce meme embeddings in a high-dimensional space that capture the inherent sarcasm and linguistic context of a meme. It retrieves the templates whose embeddings are closest to the user input's embedding by cosine similarity (see the retrieval sketch after this list).
- A caption generation system, which uses GPT-3 to generate a creative caption for the meme based on the user input and the meme template context. A few manually labelled examples are sent as QA pairs along with the user input in the prompt to GPT-3 (see the prompting sketch after this list).
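Below is a minimal sketch of the retrieval step, assuming a Sentence-Transformers checkpoint and a precomputed matrix of template embeddings; the function and variable names are illustrative, not the repo's actual code:

```python
from sentence_transformers import SentenceTransformer, util

# Load the fine-tuned Sentence-RoBERTa checkpoint (the one tracked via Git-LFS).
model = SentenceTransformer("model/sentence_transformer_roberta_samples_100_epochs_5/")

def retrieve_templates(user_input, template_uuids, template_embeddings, top_k=5):
    """Rank meme templates by cosine similarity to the user input."""
    query_embedding = model.encode(user_input, convert_to_tensor=True)
    # Cosine similarity between the query and every template embedding.
    scores = util.cos_sim(query_embedding, template_embeddings)[0]
    top = scores.topk(k=top_k)
    return [(template_uuids[i], float(s)) for s, i in zip(top.values, top.indices)]
```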
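Similarly, a hedged sketch of the caption step, using the legacy `openai` completion client that was current for GPT-3; the example pairs and engine name here are placeholders rather than the curated examples we actually ship:

```python
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

# Placeholder QA pairs; the real ones are manually labelled and screened for safety.
FEW_SHOT_EXAMPLES = [
    ("exams are coming up", "not sure if i should study or just accept my fate"),
    ("my code finally compiled", "one does not simply compile on the first try"),
]

def generate_caption(user_input, template_label):
    """Build a few-shot QA prompt and ask GPT-3 for a caption."""
    blocks = [f"Q: {q}\nA: {a}" for q, a in FEW_SHOT_EXAMPLES]
    blocks.append(f"Q: {user_input} (template: {template_label})\nA:")
    response = openai.Completion.create(
        engine="text-davinci-002",  # assumed engine; any GPT-3 completion engine works
        prompt="\n\n".join(blocks),
        max_tokens=60,
        temperature=0.8,
    )
    return response.choices[0].text.strip()
```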
In the figure below, we show some samples generated with JESTER. Both the user's input prompt and the final generated image are shown.
We use the Deep Humour dataset. Due to our limited computational budget, we restrict ourselves to only 100 templates, with a total of 300,000 captions. Since we're running our Streamlit app right from GitHub, we've put all the data in the repo under the `data/` folder. The cleaned and preprocessed data used for training is the `data/meme_900k_cleaned_data_v2.pkl` file. We use UUIDs to address each template. The `pkl` file contains the following dictionaries:
| Dictionary | Mapping |
|---|---|
| `label_uuid_dic` | Template label (like "not sure if") to UUID |
| `uuid_label_dic` | UUID to template label |
| `uuid_caption_dic` | UUID to list of captions (for that template) |
| `uuid_image_path_dic` | UUID to template image path |
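A minimal sketch of loading and querying the file, assuming the pickle deserializes to a single dict keyed by the names above (if it is stored as a tuple of dicts instead, adjust the unpacking):

```python
import pickle

with open("data/meme_900k_cleaned_data_v2.pkl", "rb") as f:
    data = pickle.load(f)

# Look up one template end to end: label -> UUID -> captions and image path.
uuid = data["label_uuid_dic"]["not sure if"]
print(data["uuid_label_dic"][uuid])         # "not sure if"
print(len(data["uuid_caption_dic"][uuid]))  # number of captions for this template
print(data["uuid_image_path_dic"][uuid])    # path to the template image
```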
If you wish to train the model yourself, you can use the following notebooks (a minimal training sketch follows the list):

- `notebooks/transformer_training.ipynb`: fine-tuning a softmax-based vanilla RoBERTa model.
- `notebooks/sentencebert-finetuning.ipynb`: training the Sentence-RoBERTa model (used in the final demo) with a contrastive loss.
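For reference, here is roughly what the contrastive fine-tuning boils down to, sketched with the `sentence-transformers` training API. The pairing scheme (caption vs. template label, with label 1 for matching pairs and 0 otherwise) is our summary of the approach, not a verbatim excerpt from the notebook:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses, models

# Sentence-RoBERTa: a RoBERTa encoder with mean pooling on top.
word_embedding = models.Transformer("roberta-base")
pooling = models.Pooling(word_embedding.get_word_embedding_dimension())
model = SentenceTransformer(modules=[word_embedding, pooling])

# Positive pairs (label=1): a caption with its own template's label.
# Negative pairs (label=0): a caption with a different template's label.
train_examples = [
    InputExample(texts=["when you study all night and still fail", "not sure if"], label=1),
    InputExample(texts=["when you study all night and still fail", "one does not simply"], label=0),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
train_loss = losses.ContrastiveLoss(model=model)

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=5, warmup_steps=100)
model.save("model/sentence_transformer_roberta_samples_100_epochs_5/")
```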
The final template embeddings are all stored in `pkl` files in `models/model_utils`. We use Git-LFS to store the model checkpoints, referenced at `model/sentence_transformer_roberta_samples_100_epochs_5/`.
If for some reason (why??) you wish to use a notebook demo instead of the web demo, it's available at `notebooks/Final-Demo.ipynb`.
Memes push the boundaries of what is comfortable. Every dataset we considered, including the Deep Humour dataset, contained a significant amount of hate speech in various forms, and it was simply impossible for us to filter it out completely. All the labelled examples we feed into GPT-3 have been carefully chosen to weed out sensitive content. However, appropriate use of the application is still left to the user.
The model should not be used to spread messages or ideas that are in any way unlawful, defamatory, obscene, or otherwise objectionable.