
K3M: A Model of Knowledge Perceived Multi-modal Pretraining in E-commerce

This repository is the official implementation of our multi-modal pretraining model K3M, proposed in the paper Knowledge Perceived Multi-modal Pretraining in E-commerce, accepted to the ACM MM 2021 main conference. This project is built on ViLBERT, an open-source framework for multi-modal pretraining.

(Due to data access restrictions of Alibaba Group, we cannot release all of the data used in the paper. In the "data" folder, we provide a small data sample for training K3M.)

Brief Introduction

Modality-missing and modality-noise are two pervasive problems of multi-modal product data in real e-commerce scenarios. K3M corrects noisy modalities and supplements missing image and text modalities by introducing a knowledge modality into multi-modal pretraining. K3M learns the multi-modal information of products in three steps: (1) encode the independent information of each modality, corresponding to the modal-encoding layer; (2) model the interaction between modalities, corresponding to the modal-interaction layer; (3) optimize the model with the supervision information of each modality, corresponding to the modal-task layer.
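To make this three-layer design concrete, below is a minimal, hypothetical PyTorch-style sketch. The class and layer names are illustrative assumptions only and do not correspond to the actual implementation in this repository (see train_concap_struc.py and the ViLBERT-based model code for the real architecture).

import torch
import torch.nn as nn

class K3MSketch(nn.Module):
    """Illustrative sketch of K3M's three layers (not the real implementation)."""
    def __init__(self, hidden=768, heads=12):
        super().__init__()
        # Modal-encoding layer: one encoder per modality (text, image, knowledge).
        self.text_enc = nn.TransformerEncoderLayer(hidden, heads)
        self.image_enc = nn.TransformerEncoderLayer(hidden, heads)
        self.kg_enc = nn.TransformerEncoderLayer(hidden, heads)
        # Modal-interaction layer: models the interaction between modalities.
        self.fusion = nn.TransformerEncoderLayer(hidden, heads)
        # Modal-task layer: one supervision head per modality.
        self.text_head = nn.Linear(hidden, hidden)
        self.image_head = nn.Linear(hidden, hidden)
        self.kg_head = nn.Linear(hidden, hidden)

    def forward(self, text_feats, image_feats, kg_feats):
        # Each input is a (sequence_length, batch, hidden) tensor for one modality.
        t = self.text_enc(text_feats)
        v = self.image_enc(image_feats)
        k = self.kg_enc(kg_feats)
        fused = self.fusion(torch.cat([t, v, k], dim=0))
        return self.text_head(fused), self.image_head(fused), self.kg_head(fused)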

Environment requirements and how to run our code

Note: The environment configuration for this project is complex; the data-processing part and the pre-training part require different environments. We introduce the two parts separately below. Please configure each environment and run the code step by step exactly according to the following commands.

Part 1: Data Processing (requires CUDA 10.0-10.2 and torch 1.4.0)

conda create -n K3M_data python=3.6
conda activate K3M_data
cd K3M

Step 1: Process the raw data and download the product images. (Running the following command will generate two files in the "data" folder, "id_title_pvs_cls.txt0" and "id_title_pvs_cls.txt1"; the downloaded product images will be saved in the "data/image" folder.)

python 0_deal_raw_data_segment.py
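As an optional sanity check (a minimal sketch that only assumes the output paths stated above), you can confirm that Step 1 produced the expected files:

import os

# Check the two text files produced by 0_deal_raw_data_segment.py.
for path in ["data/id_title_pvs_cls.txt0", "data/id_title_pvs_cls.txt1"]:
    print(path, "exists:", os.path.exists(path))

# Count the downloaded product images.
if os.path.isdir("data/image"):
    print("images downloaded:", len(os.listdir("data/image")))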

Step 2: Generate the corresponding JSON files. (Running the following command will generate two files, "df_train.csv" and "df_val.csv", together with the corresponding JSON files in the "data/image_lmdb_json" folder.)

python 1_generate_json_ali.py
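Optionally, you can inspect the generated splits. This sketch only assumes the file locations stated above; the exact columns and JSON layout depend on 1_generate_json_ali.py.

import glob
import pandas as pd

# Inspect the CSV splits written to data/image_lmdb_json.
df_train = pd.read_csv("data/image_lmdb_json/df_train.csv")
df_val = pd.read_csv("data/image_lmdb_json/df_val.csv")
print("train rows:", len(df_train), "val rows:", len(df_val))
print("columns:", list(df_train.columns))

# List the accompanying JSON files (names depend on the script).
print(glob.glob("data/image_lmdb_json/*.json*"))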

Step 3: Extract image features. (Run the following commands to install the image feature extraction tool py-bottom-up-attention.)

pip install pycocotools
pip install jsonlines
pip install -U fvcore
pip install torch==1.4.0
pip install torchvision==0.5.0
pip install cython
pip install opencv-python

git clone https://github.com/cocodataset/cocoapi.git
cd cocoapi/PythonAPI
make
cd ../..

git clone https://github.com/airsplay/py-bottom-up-attention.git
cd py-bottom-up-attention
sudo python setup.py build develop
cd ..

Download the pre-trained bottom-up-attention model faster_rcnn_from_caffe.pkl and put it in the "faster-rcnn-pkl" folder.

cd faster-rcnn-pkl
wget http://nlp.cs.unc.edu/models/faster_rcnn_from_caffe.pkl
cd ..

Extract image features and store them in TSV files. (Running the following command will generate two files in the "data/image_features" folder: "train.tsv.0" and "dev.tsv.0".)

python 2_generate_tsv_ali.py
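If you want to verify the extracted features, the sketch below reads the first row of the TSV. It assumes the conventional bottom-up-attention TSV layout (image id, image size, number of boxes, base64-encoded boxes and features); adjust the field names if 2_generate_tsv_ali.py writes a different order.

import base64
import csv
import sys

import numpy as np

csv.field_size_limit(sys.maxsize)  # the feature columns are very large

# Assumed field order; check 2_generate_tsv_ali.py if decoding fails.
FIELDS = ["image_id", "image_w", "image_h", "num_boxes", "boxes", "features"]

with open("data/image_features/train.tsv.0") as f:
    row = next(csv.DictReader(f, delimiter="\t", fieldnames=FIELDS))
    num_boxes = int(row["num_boxes"])
    feats = np.frombuffer(base64.b64decode(row["features"]), dtype=np.float32)
    print(row["image_id"], num_boxes, feats.reshape(num_boxes, -1).shape)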

Step 4: Convert the data to the LMDB format required for pre-training. Run the following commands to install the TensorPack and LMDB packages.

pip install tensorpack==0.9.4
pip install lmdb==0.94

Generate LMDB files. (Running the following command will generate two files in the "data/image_lmdb_json" folder: "training_feat_all.lmdb" and "validation_feat_all.lmdb".)

python 3_generate_lmdb_ali.py
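To confirm that the conversion succeeded, here is a minimal check that only assumes the output paths above (the key/value layout is whatever 3_generate_lmdb_ali.py and TensorPack wrote):

import os
import lmdb

# Open each generated LMDB read-only and report how many entries it holds.
for path in ["data/image_lmdb_json/training_feat_all.lmdb",
             "data/image_lmdb_json/validation_feat_all.lmdb"]:
    env = lmdb.open(path, readonly=True, lock=False, subdir=os.path.isdir(path))
    with env.begin() as txn:
        print(path, "entries:", txn.stat()["entries"])
    env.close()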

Part 2: Run Pre-training (requires CUDA 10.0-10.2 and torch 1.3.0)

conda create -n K3M_train python=3.6
conda activate K3M_train
cd K3M

Step 1: Install torch 1.3.0.

pip install torch==1.3.0

Step 2: Install libpcap-1.10.0.

sudo apt-get update
sudo apt-get install m4
sudo apt-get install flex
sudo apt-get install bison
cd libpcap-1.10.0 
./configure
make
sudo make install
sudo apt-get install build-essential libcap-dev
cd ..

Step 3: Install other dependency packages.

pip install -r requirements.txt
pip install pytorch_transformers==1.1.0
pip install pycocotools
pip uninstall tensorboard

Step 4: Download the pre-trained weight file pytorch_model.bin of the pre-trained language model bert-base-chinese into the "bert-base-chinese" folder.
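One way to obtain the weights (a sketch, assuming the machine has internet access and the pytorch_transformers package from Step 3 is installed) is to let pytorch_transformers download bert-base-chinese and save it into the expected folder:

import os
from pytorch_transformers import BertModel, BertTokenizer

# Download bert-base-chinese and save pytorch_model.bin (plus config and vocab)
# into the "bert-base-chinese" folder expected by the training code.
model = BertModel.from_pretrained("bert-base-chinese")
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
os.makedirs("bert-base-chinese", exist_ok=True)
model.save_pretrained("bert-base-chinese")
tokenizer.save_pretrained("bert-base-chinese")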

Step 5: Run the pre-training code.

python train_concap_struc.py

References

Parts of our code are based on ViLBERT and py-bottom-up-attention; we thank the authors of these open-source projects.

Papers for the Project & How to Cite

If you use or extend our work, please cite the following paper:

@inproceedings{DBLP:conf/mm/ZhuZZYCZC21,
  author    = {Yushan Zhu and
               Huaixiao Zhao and
               Wen Zhang and
               Ganqiang Ye and
               Hui Chen and
               Ningyu Zhang and
               Huajun Chen},
  editor    = {Heng Tao Shen and
               Yueting Zhuang and
               John R. Smith and
               Yang Yang and
               Pablo Cesar and
               Florian Metze and
               Balakrishnan Prabhakaran},
  title     = {Knowledge Perceived Multi-modal Pretraining in E-commerce},
  booktitle = {{MM} '21: {ACM} Multimedia Conference, Virtual Event, China, October
               20 - 24, 2021},
  pages     = {2744--2752},
  publisher = {{ACM}},
  year      = {2021},
  url       = {https://doi.org/10.1145/3474085.3475648},
  doi       = {10.1145/3474085.3475648},
  timestamp = {Mon, 03 Jan 2022 22:17:05 +0100},
  biburl    = {https://dblp.org/rec/conf/mm/ZhuZZYCZC21.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
