Public repository for the paper REGIS: A Test Collection for Geoscientific Documents in Portuguese
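Lucas Lima de Oliveira, Regis Kruel Romeu, Viviane Pereira Moreira. "REGIS: A Test Collection for Geoscientific Documents in Portuguese". In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '21), Association for Computing Machinery, 2021, pp. 2363-2368. DOI: 10.1145/3404835.3463256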
Paper link: https://doi.org/10.1145/3404835.3463256
Video presentation: https://www.youtube.com/watch?v=WkwszoWEqLY
Presentation slides: https://drive.google.com/file/d/15HLmChpmTI7N2YsBy73gaY_Lfrja5O6f/view
Experimental validation is key to the development of Information Retrieval (IR) systems. The standard evaluation paradigm requires a test collection with documents, queries, and relevance judgments. Creating test collections requires significant human effort, mainly for providing relevance judgments. As a result, there are still many domains and languages that, to this day, lack a proper evaluation testbed. Portuguese is an example of a major world language that has been overlooked in terms of IR research -- the only test collection available is composed of news articles from 1994 and a hundred queries. With the aim of bridging this gap, in this paper, we developed REGIS (Retrieval Evaluation for Geoscientific Information Systems), a test collection for the geoscientific domain in Portuguese. REGIS contains 20K documents and 34 query topics along with relevance assessments. We describe the procedures for document collection, topic creation, and relevance assessment. In addition, we report on results of standard IR techniques on REGIS so that they can serve as a baseline for future research.
@inproceedings{oliveira2021,
author = {Lima de Oliveira, Lucas and Romeu, Regis Kruel and Moreira, Viviane Pereira},
title = {REGIS: A Test Collection for Geoscientific Documents in Portuguese},
year = {2021},
isbn = {9781450380379},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3404835.3463256},
doi = {10.1145/3404835.3463256},
booktitle = {Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval},
pages = {2363--2368},
numpages = {6},
keywords = {test collection, information retrieval, geoscientific data},
location = {Virtual Event, Canada},
series = {SIGIR '21}
}