Codebase for the server and user interface of the annotation platform used in the publication: The GDN-CC Dataset: Automatic Corpus Clarification for AI-enhanced Democratic Citizen Consultations.
This repository provides a complete environment to replicate or extend the annotation process, featuring a Next.js frontend for annotators and a Flask backend for handling data and interacting with LLMs.
Do not hesitate to contact me by email at lequeu (at) isir.upmc.fr, or to open an issue on this repository, for any questions or help.
The main annotation interface. On the left is the citizen contribution to annotate. Each colored rectangle is an argumentative unit. Within each argumentative unit, annotators can segment "affirmations" (statements), "arguments" (premises), and solutions. On the right are the clarifications produced by the LLM, which can be modified by the annotator.
The platform is split into two main components:
- `platformServer/`: A Flask server handling data persistence, endpoints for LLM models, and summary generation.
- `platformUI/`: A Next.js web application serving the user and admin interfaces.
- Node.js and npm
- Python (managed via `uv`)
Navigate to the server directory and install dependencies:
```
cd platformServer
uv sync
```

Environment Variables: You need three environment variables:

- `GROQ_API_KEY`: Your Groq API key.
- `OPENAI_API_KEY`: Your OpenAI API key.
- `ANNOTATION_DATA_FILE`: Path to your target data file.
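A minimal shell sketch for setting these variables before starting the server (all values below are placeholders, and the data-file path is an assumption):

```shell
# Placeholder values -- substitute your own API keys and data path.
export GROQ_API_KEY="gsk_your_groq_key"
export OPENAI_API_KEY="sk-your_openai_key"
export ANNOTATION_DATA_FILE="/path/to/annotation_data.json"
```

These can also be placed in your shell profile or a local `.env` file if you prefer not to export them per session.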
Navigate to the frontend directory and install dependencies:
```
cd ../platformUI
npm install
```

You need to run both the server and the UI concurrently, in separate terminal instances.

Start the Backend Server:

```
cd platformServer
uv run app.py --port 3002
```

The server will run on `localhost:3002`.
Start the Frontend Interface:
```
cd platformUI
npm run dev
```

The interface will run on `localhost:3000`.
The admin interface is intentionally unlinked from the main navigation. You must access it directly by navigating to http://localhost:3000/admin in your browser.
To aggregate and export all completed annotations, run the following command from the platformServer/ directory:
```
uv run adminPower.py --save-all
```

This generates the `all_annotations.jsonl` file.

JSONL Schema

Each line in the exported file follows this structure:

```json
{
  "opinion": {
    "authorName": "str",
    "len": "int",
    "opinionId": "int",
    "text": "str"
  },
  "results": [
    {
      "color": "str",
      "segments": {
        "segmentid": {
          "color": "str",
          "start": "int",
          "end": "int",
          "type": "str",
          "hex": "str",
          "text": "str"
        }
      },
      "LLMtext": "str",
      "text": "str"
    }
  ],
  "llm": "str",
  "annotator": "str",
  "time": "float",
  "date": "str"
}
```

| Field | Type | Description |
|---|---|---|
| `opinion.authorName` | String | Represents the theme of the opinion, not the actual author (fixed in the final dataset). |
| `opinion.len` | Integer | Total character length of the source text. |
| `opinion.opinionId` | Integer | Unique identifier for the opinion. |
| `results.color` | String | Index identifier for the Argumentative Unit (AU). |
| `segment.type` | String | Classification of the segment (`solution`, `claim`, or `premise`). |
| `results.LLMtext` | String | Raw text output generated by the LLM. |
| `results.text` | String | Final text validated/edited by the human annotator. |
| `llm` | String | LLM used for the clarification. |
| `annotator` | String | Annotator ID. |
| `time` | Float | Total time spent (in seconds) from loading the opinion to accepting the clarification. |
| `date` | String | Datetime of the annotation. |
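As a quick sanity check after export, the JSONL file can be read one record per line with the standard library. The sketch below assumes only the schema above; the `load_annotations` helper name and the record values are illustrative, not part of the platform:

```python
import json

def load_annotations(path):
    """Yield one annotation record per non-empty line of the exported JSONL file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)

# Minimal illustration on a single synthetic record (placeholder values):
record = json.loads(
    '{"opinion": {"authorName": "Theme A", "len": 42, "opinionId": 1, '
    '"text": "..."}, "results": [], "llm": "gpt-4o", '
    '"annotator": "A1", "time": 12.5, "date": "2026-01-01"}'
)
print(record["opinion"]["opinionId"], record["llm"])  # → 1 gpt-4o
```

In practice you would call `load_annotations("all_annotations.jsonl")` and iterate over the records, e.g. to count annotations per annotator.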
Preprint:

```bibtex
@article{lequeu2026gdn,
  title={The GDN-CC Dataset: Automatic Corpus Clarification for AI-enhanced Democratic Citizen Consultations},
  author={Lequeu, Pierre-Antoine and Labat, L{\'e}o and Cave, Laur{\`e}ne and Lejeune, Ga{\"e}l and Yvon, Fran{\c{c}}ois and Piwowarski, Benjamin},
  journal={arXiv preprint arXiv:2601.14944},
  year={2026}
}
```