A multilingual linked idioms data set.



LIdioms - A Multilingual Linked Idioms Data Set

The data set is intended to support natural language processing applications by providing links between idioms across languages. The underlying data was crawled and integrated from various sources. To ensure the quality of the crawled data, every idiom was evaluated by at least two native speakers. The resulting data set complies with the best practices of the Linguistic Linked Open Data (LLOD) community.

Current Languages:

American and British English, Brazilian and European Portuguese, Italian, German, Russian.

Further Languages (we are looking for native speakers of these languages; if you are one, please send me a message):

Bulgarian, Chinese, Korean, Czech, Finnish, French, Japanese, Arabic, Polish and Swedish.

LIdioms SPARQL endpoint: http://lid.aksw.org/sparql

SPARQL examples can be found in our Wiki.
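A minimal Python sketch of sending a query to the endpoint above. It assumes the endpoint accepts standard SPARQL 1.1 Protocol GET requests and can return results in the SPARQL JSON format; this README does not say which result formats it supports, or whether the endpoint is still online. The query is deliberately vocabulary-agnostic: it only lists the predicates and objects attached to one example resource, so it assumes nothing about the LIdioms schema. The helper names `build_query_url`, `parse_bindings` and `run_query` are hypothetical and not part of any LIdioms tooling.

```python
import json
import urllib.parse
import urllib.request

# Endpoint URL from this README; availability is not guaranteed.
ENDPOINT = "http://lid.aksw.org/sparql"

# Vocabulary-agnostic query: list every predicate/object pair attached to
# one example idiom resource (URI taken from the examples in this README).
QUERY = """
SELECT ?p ?o WHERE {
  <http://lid.aksw.org/en/kill_two_birds_with_one_stone> ?p ?o .
} LIMIT 50
"""


def build_query_url(endpoint: str, query: str) -> str:
    """Encode a query as a SPARQL 1.1 Protocol GET request URL."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def parse_bindings(results: dict) -> list[dict[str, str]]:
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in results["results"]["bindings"]
    ]


def run_query(endpoint: str, query: str, timeout: float = 30.0) -> list[dict[str, str]]:
    """Send the query over HTTP GET and parse the JSON result bindings.

    Assumption: the endpoint honours the Accept header below and returns
    the standard SPARQL JSON results format.
    """
    request = urllib.request.Request(
        build_query_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_bindings(json.load(response))
```

If the endpoint is reachable, `run_query(ENDPOINT, QUERY)` returns one `{"p": ..., "o": ...}` dict per triple. For queries that use the actual LIdioms vocabulary, the SPARQL examples in the wiki are the better starting point.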

Examples:

  1. http://lid.aksw.org/en/kill_two_birds_with_one_stone
  2. http://lid.aksw.org/en/when_pigs_fly_sense (translation examples)
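The example URIs above identify Linked Data resources. The following Python sketch fetches one as RDF rather than as an HTML page. It assumes the server honours HTTP content negotiation on the `Accept` header, which is common Linked Data practice but is not stated in this README. The function names `build_rdf_request` and `fetch_rdf` are hypothetical.

```python
import urllib.request

# Example resource URIs copied from this README.
EXAMPLES = [
    "http://lid.aksw.org/en/kill_two_birds_with_one_stone",
    "http://lid.aksw.org/en/when_pigs_fly_sense",
]


def build_rdf_request(uri: str, media_type: str = "text/turtle") -> urllib.request.Request:
    """Prepare a GET request that asks for an RDF serialization instead of HTML.

    Assumption: the server performs content negotiation on Accept.
    """
    return urllib.request.Request(uri, headers={"Accept": media_type})


def fetch_rdf(uri: str, media_type: str = "text/turtle", timeout: float = 30.0) -> str:
    """Dereference a resource URI and return the response body as text."""
    request = build_rdf_request(uri, media_type)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the server declares no charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)
```

If content negotiation is supported, `fetch_rdf(EXAMPLES[0])` returns a Turtle description of the first example. If the server ignores the header, the body will most likely be the HTML page instead.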

How to cite

  author = {Diego Moussallem and Mohamed Ahmed Sherif and Diego Esteves and Marcos Zampieri and Axel-Cyrille Ngonga Ngomo},
  title = {LIdioms: A Multilingual Linked Idioms Data Set},
  booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)},
  year = {2018},
  month = {may},
  date = {7-12},
  location = {Miyazaki, Japan},
  editor = {Nicoletta Calzolari (Conference chair) and Khalid Choukri and Christopher Cieri and Thierry Declerck and Sara Goggi and Koiti Hasida and Hitoshi Isahara and Bente Maegaard and Joseph Mariani and Hélène Mazo and Asuncion Moreno and Jan Odijk and Stelios Piperidis and Takenobu Tokunaga},
  publisher = {European Language Resources Association (ELRA)},
  address = {Paris, France},
  isbn = {979-10-95546-00-9},
  language = {english}


Licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported license (CC BY-NC-SA 3.0).