ArgIR repository, a tool for annotation and retrieval of argumentative information from textual content. A case study in the Decide Madrid database.
We present a tool that supports not only retrieving argumentative information, but also annotating new arguments and validating them (in terms of their topical relevance and rhetorical quality). The search runs on Apache Lucene, and the results (proposals and comments) are re-ranked according to their level of controversy or the number and quality of their arguments.
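The re-ranking step can be illustrated with a minimal sketch. It is not the tool's implementation: the `Hit` record, its field names, and the scoring rule (number of arguments multiplied by their mean quality, with the retrieval score as tie-breaker) are all assumptions made for illustration, and the mapping of quality labels to numbers is hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical view of one search hit; field names are illustrative, not the tool's schema.
record Hit(String id, double luceneScore, int argumentCount, double meanQuality) {
    // Assumed argumentative score: number of arguments weighted by their mean quality.
    double argumentScore() {
        return argumentCount * meanQuality;
    }
}

class ArgumentReRanker {
    // Returns a new list ordered by argument score (descending),
    // falling back to the original retrieval score on ties.
    static List<Hit> reRank(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble(Hit::argumentScore)
                .thenComparingDouble(Hit::luceneScore)
                .reversed());
        return sorted;
    }

    public static void main(String[] args) {
        // Toy hits; quality is assumed to be encoded as 1 (low) to 3 (high).
        List<Hit> hits = List.of(
                new Hit("p1", 9.1, 0, 0.0),
                new Hit("p2", 7.4, 3, 2.5),
                new Hit("p3", 8.0, 2, 3.0));
        reRank(hits).forEach(h -> System.out.println(h.id() + " " + h.argumentScore()));
    }
}
```

Keeping the retrieval score as a secondary key means hits with equally strong argumentation still appear in their original relevance order.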
This project takes advantage of the arguments previously extracted (from the citizen proposals of the Decide Madrid platform) in the argrecsys/arg-miner repository.
This work (v1.0) was presented as a long paper at CIRCLE (Joint Conference of the Information Retrieval Communities in Europe) 2022, hosted by the Université de Toulouse, France, on 4-7 July 2022. The paper can be found here.
Argument-enhanced Information Retrieval tool: allows the retrieval of argumentative information from textual content.
Arguments Annotation form: allows manual annotation and validation of arguments.
The tool allows you to annotate/edit arguments, as well as validate their relevance and quality. Below is an example of the generated validation file.
proposal_id | argument_id | relevance | quality | timestamp | username |
---|---|---|---|---|---|
7 | 7-85675-1-1 | VERY_RELEVANT | SUFFICIENT | 10/3/2022 20:53:00 | andres.segura |
1419 | 1419-30381-1-1 | RELEVANT | SUFFICIENT | 17/02/2022 23:04 | andres.segura |
2576 | 2576-0-1-1 | VERY_RELEVANT | HIGH_QUALITY | 16/02/2022 17:31 | andres.segura |
10996 | 10996-0-1-1 | VERY_RELEVANT | HIGH_QUALITY | 24/02/2022 20:12 | andres.segura |
26787 | 26787-204339-1-1 | NOT_RELEVANT | LOW_QUALITY | 2022-03-09 16:39:43 | andres.segura |
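To consume such a file programmatically, a row can be mapped onto a small typed record. The sketch below is an assumption-laden illustration: the `Validation` record and its method names are hypothetical, the `|` delimiter is taken from the table as displayed here (the file on disk may use another separator), and the `SPAM` constant is an assumed spelling. Because the sample rows mix day-first and ISO-like timestamps, the parser tries both layouts.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;

enum Relevance { SPAM, NOT_RELEVANT, RELEVANT, VERY_RELEVANT } // SPAM spelling is assumed
enum Quality { LOW_QUALITY, SUFFICIENT, HIGH_QUALITY }

// Hypothetical typed view of one validation row.
record Validation(int proposalId, String argumentId, Relevance relevance,
                  Quality quality, LocalDateTime timestamp, String username) {

    // Candidate layouts observed in the sample rows; seconds are optional in the first.
    private static final List<DateTimeFormatter> FORMATS = List.of(
            DateTimeFormatter.ofPattern("d/M/yyyy H:mm[:ss]"),
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss"));

    static LocalDateTime parseTimestamp(String text) {
        for (DateTimeFormatter format : FORMATS) {
            try {
                return LocalDateTime.parse(text, format);
            } catch (DateTimeParseException ignored) {
                // Try the next candidate layout.
            }
        }
        throw new IllegalArgumentException("Unrecognized timestamp: " + text);
    }

    // Assumes '|' as the field delimiter, as in the table above.
    static Validation parse(String row) {
        String[] f = row.split("\\|");
        if (f.length < 6) {
            throw new IllegalArgumentException("Expected 6 fields: " + row);
        }
        return new Validation(Integer.parseInt(f[0].trim()), f[1].trim(),
                Relevance.valueOf(f[2].trim()), Quality.valueOf(f[3].trim()),
                parseTimestamp(f[4].trim()), f[5].trim());
    }
}

class ValidationDemo {
    public static void main(String[] args) {
        System.out.println(Validation.parse(
                "1419 | 1419-30381-1-1 | RELEVANT | SUFFICIENT | 17/02/2022 23:04 | andres.segura |"));
    }
}
```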
As a preliminary offline evaluation, using the developed tool, we manually validated 20% of the arguments extracted by the simple syntactic pattern-based method. For the topical relevance metric, 8.6% of the arguments were labeled as spam, 36.9% as not relevant, 39.9% as relevant, and 14.6% as very relevant, whereas for the rhetorical quality metric, 42.3% of the arguments were of low quality, 40.6% of sufficient quality, and 17.1% of high quality. Although these results are modest, they can be considered acceptable as baseline values, given that they were obtained with a heuristic method that requires neither training data nor parameter tuning.
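Label shares like those reported above can be derived from a list of validated labels by counting each label and dividing by the total. The following is a minimal sketch; the class and method names are hypothetical, and the labels in `main` are toy values, not the evaluation data.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class LabelDistribution {
    // Percentage of items carrying each label, keyed by label name (sorted alphabetically).
    static Map<String, Double> percentages(List<String> labels) {
        if (labels.isEmpty()) {
            return Map.of();
        }
        double total = labels.size();
        return labels.stream().collect(Collectors.groupingBy(
                label -> label,
                TreeMap::new,
                Collectors.collectingAndThen(Collectors.counting(), n -> 100.0 * n / total)));
    }

    public static void main(String[] args) {
        // Toy labels for illustration only.
        List<String> labels = List.of("RELEVANT", "RELEVANT", "NOT_RELEVANT", "SPAM");
        percentages(labels).forEach((label, pct) ->
                System.out.printf("%s: %.1f%%%n", label, pct));
    }
}
```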
The implemented solutions depend on or make use of the following libraries and .jar files:
- JDK 16
- Apache Lucene 9.0
- MySQL Connector 8.0.22
- MongoDB Java Driver 3.12.10
- Snake YAML 1.9
- JSON Java 20210307
- OpenCSV 4.1
The project has an executable package in the \jar folder, called ArgumentIR.jar. To run the tool from the Command Prompt (CMD), execute the following commands:
cd "arg-ir-tool\jar\"
java -jar ArgumentIR.jar
Please read the contributing and code of conduct documentation.
Created on Jan 25, 2022
Created by:
This project is licensed under the terms of the Apache License 2.0.
This work was supported by the Spanish Ministry of Science and Innovation (PID2019-108965GB-I00).