GitHub - HITsz-TMG/Multimodal-In-Context-Tuning

A Multimodal In-Context Tuning Approach for E-Commerce Product Description Generation

If you have any questions, please feel free to contact me by e-mail (liyunxin987@163.com) or on Twitter (@LyxTg).

The main contributions are:

We present a product description generation paradigm that is based only on a product image and several marketing keywords. For this new setting, we propose a straightforward and effective multimodal in-context tuning approach, named ModICT, which combines the strengths of a frozen language model and a visual encoder.
Our work is the first to investigate using the in-context learning and text generation capabilities of various frozen language models for multimodal E-commerce product description generation. ModICT can be plugged into various types of language models, and its training process is parameter-efficient.
We conduct extensive experiments on our newly built product datasets, which cover three categories. The experimental results indicate that the proposed method achieves state-of-the-art performance on a wide range of evaluation metrics. With the proposed multimodal in-context tuning technique, small models also achieve performance competitive with LLMs.

🚀 Our Training Approach: ModICT

The overall workflow of ModICT. The left part depicts the process of in-context reference construction. The right part shows the efficient multimodal in-context tuning for a sequence-to-sequence language model (1) and an autoregressive language model (2). Blocks with red lines are learnable.
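
To make the idea of in-context reference construction more concrete, here is a minimal sketch in plain Python. It is not the released implementation: this README does not say how a reference is selected or how the prompt is laid out, so the cosine-similarity retrieval over image features, the helper names (`retrieve_reference`, `build_prompt`), the `<img>` placeholder, and the prompt template are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve_reference(query_feat, pool):
    """Pick the pool item whose image feature is closest to the query (assumed criterion)."""
    return max(pool, key=lambda item: cosine(query_feat, item["feat"]))


def build_prompt(reference, keywords, image_token="<img>"):
    """Place one retrieved example before the query (hypothetical template)."""
    ref_part = (
        f"{image_token} Keywords: {', '.join(reference['keywords'])}. "
        f"Description: {reference['description']}"
    )
    query_part = f"{image_token} Keywords: {', '.join(keywords)}. Description:"
    return ref_part + "\n" + query_part


# Toy pool with made-up 2-d "image features"; real features would come from a visual encoder.
pool = [
    {"feat": [1.0, 0.0], "keywords": ["leather", "tote"], "description": "A roomy leather tote."},
    {"feat": [0.0, 1.0], "keywords": ["cotton", "shirt"], "description": "A soft cotton shirt."},
]
ref = retrieve_reference([0.9, 0.1], pool)
prompt = build_prompt(ref, ["canvas", "backpack"])
print(prompt)
```

In this sketch the language model would receive the prompt as its in-context input, with each `<img>` position standing in for the visual features of the corresponding product image.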

🤗 Our Proposed Dataset: MD2T

MD2T is a new setting for multimodal E-commerce Description generation based on structured keywords and images.

MD2T Dataset Statistics

MD2T	Cases&Bags	Clothing	Home Appliances
#Train	18,711	200,000	86,858
#Dev	983	6,120	1,794
#Test	1,000	8,700	2,200
Avg_N #MP	5.41	6.57	5.48
Avg_L #MP	13.50	20.34	18.30
Avg_L #Desp	80.05	79.03	80.13

Table: The detailed statistics of MD2T. Avg_N and Avg_L represent the average number and length respectively. MP and Desp indicate the marketing keywords and description.
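
As a quick sanity check on the table above, the following sketch sums the split sizes per category and reports the share of training samples. The numbers are copied from the table; the dictionary layout and the helper name `category_total` are only illustrative.

```python
# Split sizes copied from the MD2T statistics table.
MD2T_SPLITS = {
    "Cases&Bags": {"train": 18711, "dev": 983, "test": 1000},
    "Clothing": {"train": 200000, "dev": 6120, "test": 8700},
    "Home Appliances": {"train": 86858, "dev": 1794, "test": 2200},
}


def category_total(splits):
    """Total number of samples across train, dev, and test for one category."""
    return sum(splits.values())


totals = {name: category_total(splits) for name, splits in MD2T_SPLITS.items()}
grand_total = sum(totals.values())

for name, total in totals.items():
    train_share = MD2T_SPLITS[name]["train"] / total
    print(f"{name}: {total} samples, {train_share:.1%} in train")
print(f"All categories: {grand_total} samples")
```

Clothing is by far the largest category, so results averaged over all samples would be dominated by it; reporting metrics per category avoids that.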

Our preprocessed data (Text + Images) can be downloaded from https://huggingface.co/datasets/YunxinLi/MD2T.
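
One possible way to fetch the data programmatically is through the `huggingface_hub` package. This is a sketch under assumptions: it requires `pip install huggingface_hub`, the helper name `download_md2t` and the default target directory are hypothetical, and the layout of the downloaded files is not described in this README, so inspect the folder after downloading.

```python
REPO_ID = "YunxinLi/MD2T"  # dataset repository named in the URL above


def download_md2t(local_dir="MD2T"):
    """Download a snapshot of the MD2T dataset repository and return its local path."""
    # Imported lazily so the module can be loaded without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, repo_type="dataset", local_dir=local_dir)


# Example usage (performs a network download, so it is left commented out):
# path = download_md2t()
# print("MD2T files saved under", path)
```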

✏️ Citation

If you find our paper and code useful in your research, please consider giving the repository a star ⭐ and citing our paper 📝.

@article{li2024multimodal,
  title={A Multimodal In-Context Tuning Approach for E-Commerce Product Description Generation},
  author={Li, Yunxin and Hu, Baotian and Luo, Wenhan and Ma, Lin and Ding, Yuxin and Zhang, Min},
  journal={LREC-COLING},
  year={2024}
}
