Improved Cross-Lingual Question Retrieval for Community Question Answering

This repository contains the data and code to reproduce the results of our WWW 2019 paper, "Improved Cross-Lingual Question Retrieval for Community Question Answering".

Please use the following citation:

  title = {Improved Cross-Lingual Question Retrieval for Community Question Answering},
  author = {R{\"u}ckl{\'e}, Andreas and Swarnkar, Krishnkant and Gurevych, Iryna},
  publisher = {ACM},
  booktitle = {The World Wide Web Conference (WWW 2019)},
  pages = {3179--3186},
  year = {2019},
  location = {San Francisco, California, USA},
  doi = {10.1145/3308558.3313502},
  url = {},

Abstract: We perform cross-lingual question retrieval in community question answering (cQA), i.e., we retrieve similar questions for queries that are given in another language. The standard approach to cross-lingual information retrieval, which is to automatically translate the query to the target language and continue with a monolingual retrieval model, typically falls short in cQA due to translation errors. This is even more the case for specialized domains such as in technical cQA, which we explore in this work. To remedy, we propose two extensions to this approach that improve cross-lingual question retrieval: (1) we enhance an NMT model with monolingual cQA data to improve the translation quality, and (2) we improve the robustness of a state-of-the-art neural question retrieval model to common translation errors by adding back-translations during training. Our results show that we achieve substantial improvements over the baseline approach and considerably close the gap to a setup where we have access to an external commercial machine translation service (i.e., Google Translate), which is often not the case in many practical scenarios.

Contact person: Andreas Rücklé

This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.


The data is available on our public fileserver.

  • askubuntu-human-translations: This folder contains the human translations of the AskUbuntu dev/test queries
  • backtranslations-nmt-training: Contains the parallel sentences obtained by translating titles from AskUbuntu and StackOverflow (java+python splits) to German using the standard en->de Transformer model
  • rcnn-data
    • StackExchange-Monolingual: The StackOverflow dataset (monolingual)
    • StackExchange-Monolingual-Paraphrases: The StackOverflow dataset with paraphrases obtained by backtranslating titles of query questions from en to de and back to en (GT)
    • AskUbuntu-Monolingual-Paraphrases: The AskUbuntu dataset with paraphrases
    • Translations
      • AskUbuntu-de-en(GT), AskUbuntu-de-en(TR-CQA), etc.: Titles of questions that were translated from German back to English using GT, TR-CQA, etc.
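
The paraphrase folders above pair each query title with a back-translated variant. As a rough illustration of how such paraphrases could be mixed into training, the following minimal sketch randomly swaps a query for its paraphrase. It is not the implementation used in this repository: the function name `augment_with_paraphrases`, the in-memory data layout (a list of query/candidate pairs plus a dictionary of paraphrases), and the `swap_probability` parameter are all assumptions made for this example, and the actual file formats on the fileserver may differ.

```python
import random
from typing import Dict, List, Tuple


def augment_with_paraphrases(
    training_pairs: List[Tuple[str, str]],
    paraphrases: Dict[str, str],
    swap_probability: float = 0.5,
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Replace query titles with back-translated paraphrases at random.

    Hypothetical helper: each (query, candidate) pair keeps its candidate,
    while the query is swapped for its paraphrase with the given probability.
    Queries without a known paraphrase are left unchanged.
    """
    rng = random.Random(seed)  # fixed seed keeps the augmentation reproducible
    augmented = []
    for query, candidate in training_pairs:
        paraphrase = paraphrases.get(query)
        if paraphrase is not None and rng.random() < swap_probability:
            query = paraphrase
        augmented.append((query, candidate))
    return augmented


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    pairs = [("how to install java", "installing the jdk on ubuntu")]
    backtranslated = {"how to install java": "how do i install java"}
    print(augment_with_paraphrases(pairs, backtranslated, swap_probability=1.0))
```

Swapping at random, rather than always using the paraphrase, keeps the original titles in the training data, so a model sees both clean and translation-like inputs.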


The source code of our RCNN adaptation is available here: RCNN-adaptation

