Note

Some of the contents (including datasets) are subject to copyright. Please refer to their terms and conditions before using them.

Reference

@inproceedings{gain-etal-2022-low,
    title = "Low Resource Chat Translation: A Benchmark for {H}indi{--}{E}nglish Language Pair",
    author = "Gain, Baban  and
      Appicharla, Ramakrishna  and
      Chennabasavraj, Soumya  and
      Garera, Nikesh  and
      Ekbal, Asif  and
      Chelliah, Muthusamy",
    booktitle = "Proceedings of the 15th biennial conference of the Association for Machine Translation in the Americas (Volume 1: Research Track)",
    month = sep,
    year = "2022",
    address = "Orlando, USA",
    publisher = "Association for Machine Translation in the Americas",
    url = "https://aclanthology.org/2022.amta-research.7",
    pages = "83--96",
    abstract = "Chatbots or conversational systems are used in various sectors such as banking, healthcare, e-commerce, customer support, etc. These chatbots are mainly available for resource-rich languages like English, often limiting their widespread usage to multilingual users. Therefore, making these services or agents available in non-English languages has become essential for their broader applicability. Machine Translation (MT) could be an effective way to develop multilingual chatbots. Further, to help users be confident about a product, feedback and recommendation from the end-user community are essential. However, these question-answers (QnA) can be in a different language than the users. The use of MT systems can reduce these issues to a large extent. In this paper, we provide a benchmark setup for Chat and QnA translation for English-Hindi, a relatively low-resource language pair. We first create the English-Hindi parallel corpus comprising of synthetic and gold standard parallel sentences. Thereafter, we develop several sentence-level and context-level neural machine translation (NMT) models, and measure their effectiveness on the newly created datasets. We achieve a BLEU score of 58.7 and 62.6 on the English-Hindi and Hindi-English subset of the gold-standard version of the WMT20 Chat dataset. Further, we achieve BLEU scores of 52.9 and 76.9 on the gold-standard Multi-modal Dialogue Dataset (MMD) English-Hindi and Hindi-English datasets. For QnA, we achieve a BLEU score of 49.9. Further, we achieve BLEU scores of 50.3 and 50.4 on question and answers subsets, respectively. We also perform thorough qualitative analysis of the outputs by the real users.",
}

babangain/en_hi_chat_qna_translation