GL-CLeF: A Global-Local Contrastive Learning Framework for Cross-lingual Spoken Language Understanding

License: MIT

This repository contains the PyTorch implementation and the data of the paper: GL-CLeF: A Global-Local Contrastive Learning Framework for Cross-lingual Spoken Language Understanding. Libo Qin, Qiguang Chen, Tianbao Xie, Qixin Li, Jian-Guang Lou, Wanxiang Che, Min-Yen Kan. ACL 2022. [PDF]

This code has been written using PyTorch >= 1.9. If you use any source code or the datasets included in this toolkit in your work, please cite the following paper. The BibTeX entry is listed below:

@misc{qin2022glclef,
      title={GL-CLeF: A Global-Local Contrastive Learning Framework for Cross-lingual Spoken Language Understanding}, 
      author={Libo Qin and Qiguang Chen and Tianbao Xie and Qixin Li and Jian-Guang Lou and Wanxiang Che and Min-Yen Kan},
      year={2022},
      eprint={2204.08325},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Abstract

Due to the high data demands of current methods, attention to zero-shot cross-lingual spoken language understanding (SLU) has grown, as such approaches greatly reduce human annotation effort. However, existing models rely solely on shared parameters, which can only perform implicit alignment across languages. We present the Global-Local Contrastive Learning Framework (GL-CLeF) to address this shortcoming. Specifically, we employ contrastive learning, leveraging bilingual dictionaries to construct multilingual views of the same utterance, and then encourage their representations to be more similar than negative example pairs, which explicitly aligns representations of similar sentences across languages. In addition, a key step in GL-CLeF is its proposed Local and Global components, which achieve fine-grained cross-lingual transfer (i.e., sentence-level Local intent transfer, token-level Local slot transfer, and semantic-level Global transfer across intent and slot). Experiments on MultiATIS++ show that GL-CLeF achieves state-of-the-art performance and successfully pulls representations of similar sentences across languages closer.
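To make the core idea concrete, here is a minimal, generic InfoNCE-style contrastive loss in PyTorch. This is an illustrative sketch of explicit cross-lingual alignment only, not the paper's exact Local/Global objectives; the function name, tensor names, and temperature value are our assumptions.

import torch
import torch.nn.functional as F

def contrastive_alignment_loss(src_repr, multiview_repr, temperature=0.07):
    # src_repr:       [batch, dim] representations of source-language utterances
    # multiview_repr: [batch, dim] representations of their code-switched
    #                 (bilingual-dictionary) views; other in-batch pairs act
    #                 as negatives, as in standard InfoNCE
    src = F.normalize(src_repr, dim=-1)
    view = F.normalize(multiview_repr, dim=-1)
    logits = src @ view.t() / temperature           # pairwise cosine similarities
    labels = torch.arange(src.size(0), device=src.device)
    return F.cross_entropy(logits, labels)          # diagonal entries are the positives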

Dataset

Please visit https://github.com/amazon-research/multiatis to get the dataset and put it into the ./MultiATIS++/data folder.

You can set aside a portion of that train set as a dev set to tune hyperparameters.
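For example, a simple random hold-out split could look like the sketch below (the function name and dev_ratio value are ours, not from the repository):

import random

def split_train_dev(examples, dev_ratio=0.1, seed=42):
    # examples: list of (utterance, slots, intent) training samples
    examples = examples[:]                       # avoid shuffling the caller's list
    random.Random(seed).shuffle(examples)
    n_dev = int(len(examples) * dev_ratio)
    return examples[n_dev:], examples[:n_dev]    # (train, dev)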

Tip: the complete MultiATIS++ dataset is the concatenation of MultiATIS and MultiATIS++.

Model

Preparation

The packages we used are listed below:

- scikit-learn==0.24.2
- numpy==1.21.1
- pytorch==1.9.0
- fitlog==0.9.13
- tqdm==4.61.2
- transformers==4.8.2
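If you prefer pip, the same pins can be installed directly (note that PyTorch is published on PyPI as torch; this one-liner is a sketch and may need adjusting for your CUDA setup):

pip install scikit-learn==0.24.2 numpy==1.21.1 torch==1.9.0 fitlog==0.9.13 tqdm==4.61.2 transformers==4.8.2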

We highly suggest using Anaconda to manage your Python environment. If so, you can run the following command directly in the terminal to create the environment:

conda env create -f py37.yaml

How to run it

The script train.py is the entry point of the project. You can run the experiments with the following command:

python -u train.py

Because environments differ, you may need to adjust some parameters to reproduce the desired results.

The parameters we use are configured in the main function of train.py. If you need to adjust them, you can modify them in the relevant files or append parameters to the command, as in the example below.
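For example (the flag names below are hypothetical; check the argument setup in train.py for the actual ones):

python -u train.py --batch_size 32 --learning_rate 5e-6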

Finally, you can check the results in the log folder. You can also run the fitlog command to visualize the results:

fitlog log log/

How to use checkpoint

We provide our fine-tuned checkpoint for deeper study. The results in the paper can be reproduced using this checkpoint. The running command is shown below:

python -u train.py --load_weights True --load_model_path [MODEL_PATH] --train False

Contact us

  • Feel free to open an issue or send us an email (Libo, Qiguang) if you have any problems or find any mistakes in this work.
