Hi, we stumbled on this inconsistency while auditing Stanza's morphological tagging quality across languages.
Summary
Across the five Italian UD treebanks I checked (ISDT, VIT, ParTUT, PoSTWITA, TWITTIRO),
the pronouns nulla, niente, and nessuno are almost universally tagged PronType=Ind. The
other languages I checked (Russian, Spanish, French, German) tag their exact equivalents
as PronType=Neg. I believe this is an annotation inconsistency rather than a deliberate
convention, and propose harmonizing to PronType=Neg.
The UD definition
From the UD feature documentation:
Neg: negative pronoun or determiner. In some languages, weights between negative and
indefinite are very fuzzy. [...] Negative pronouns are used to express the non-existence
of something.
This describes nulla, niente, and nessuno exactly. They inherently denote
non-existence — "nothing", "nobody" — without requiring an additional negation particle.
Compare with genuinely indefinite pronouns like qualcosa (something), qualcuno
(someone), chiunque (whoever), where PronType=Ind is appropriate.
Cross-linguistic evidence
I checked the dev/train splits of one major UD treebank per comparison language, plus five
Italian treebanks. Every language in this sample except Italian uses PronType=Neg for the
same lexical class:
| Language | Treebank | "nothing" | "nobody" | PronType | Count |
|---|---|---|---|---|---|
| Russian | SynTagRus | ничто (ничего) | никто | Neg | 168 PRON, 100% Neg |
| Spanish | AnCora | nada | nadie | Neg | 34 PRON, 100% Neg |
| French | GSD | rien | personne | Neg | 5 PRON, 100% Neg |
| German | GSD | nichts | niemand | Neg | 12 PRON, 100% Neg |
| Italian | ISDT/VIT/ParTUT/PoSTWITA/TWITTIRO | nulla/niente | nessuno | Ind | ~110 PRON, 99% Ind |
Zero variation in RU/ES/FR/DE — every single instance is PronType=Neg. Italian is the
sole outlier.
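For anyone who wants to reproduce counts like these, here is a minimal sketch of how such a tally could be computed from a CoNLL-U file. It is not the script used for the numbers above. The function names, the choice to match on the LEMMA column, and the sample sentence (with illustrative feature values, not copied from any treebank) are all assumptions.

```python
from collections import Counter

# Lemmas of interest; matching on LEMMA rather than FORM is an assumption.
TARGET_LEMMAS = {"nulla", "niente", "nessuno"}


def parse_feats(feats):
    """Turn a CoNLL-U FEATS string such as 'Number=Sing|PronType=Ind' into a dict."""
    if not feats or feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def tally_prontype(lines, lemmas=TARGET_LEMMAS):
    """Count (lemma, UPOS, PronType) triples; '_' means PronType is absent."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip sentence boundaries and comment lines
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed CoNLL-U token line
        tok_id, lemma, upos, feats = cols[0], cols[2].lower(), cols[3], cols[5]
        if "-" in tok_id or "." in tok_id:
            continue  # skip multiword-token ranges and empty nodes
        if lemma in lemmas:
            prontype = parse_feats(feats).get("PronType", "_")
            counts[(lemma, upos, prontype)] += 1
    return counts


# Hypothetical sample sentence in CoNLL-U format (illustrative annotation only).
SAMPLE = """\
# text = Nessuno ha detto nulla
1\tNessuno\tnessuno\tPRON\tPI\tGender=Masc|Number=Sing|PronType=Ind\t3\tnsubj\t_\t_
2\tha\tavere\tAUX\tVA\tMood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin\t3\taux\t_\t_
3\tdetto\tdire\tVERB\tV\tGender=Masc|Number=Sing|Tense=Past|VerbForm=Part\t0\troot\t_\t_
4\tnulla\tnulla\tPRON\tPI\tGender=Masc|Number=Sing|PronType=Ind\t3\tobj\t_\t_
"""

counts = tally_prontype(SAMPLE.splitlines())
for key, n in sorted(counts.items()):
    print(key, n)
```

To run it on a real treebank, pass an open file handle (opened with `encoding="utf-8"`) to `tally_prontype` instead of the sample lines.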
Detailed Italian treebank breakdown
nulla (113 occurrences across 5 treebanks)
| Treebank | PRON (Ind) | ADV | NOUN | Notes |
|---|---|---|---|---|
| ISDT | 36 | 0 | 2 | NOUN = "il nulla" (the nothingness), "nulla osta" (permit); both legitimate |
| VIT | 27 | 5 | 1 | 2 ADV tokens have PronType=Neg; 3 ADV have no PronType |
| ParTUT | 7 | 0 | 0 | |
| PoSTWITA | 30 | 0 | 1 | |
| TWITTIRO | 4 | 0 | 0 | |
The NOUN cases are legitimate (substantivized "the nothingness"). The ADV cases in VIT are
a mix: one genuine adverbial use ("per nulla comparabile" = "not at all comparable"), the
rest appear to be mistagged pronominal objects (e.g., "Nulla dice la normativa" = fronted
obj, not adverb).
Only 3 tokens total (2 nulla + 1 niente, all VIT) use PronType=Neg — suggesting
someone recognized the issue but the fix wasn't applied systematically.
niente (145 occurrences)
| Treebank | PRON (Ind) | DET (Ind) | ADV | NOUN | Notes |
|---|---|---|---|---|---|
| ISDT | 19 | 1 | 1 | 0 | |
| VIT | 25 | 4 | 17 | 6 | NOUN cases look like PRON errors; ADV mixed |
| ParTUT | 2 | 0 | 0 | 0 | |
| PoSTWITA | 41 | 17 | 1 | 0 | DET = "niente politici" (no politicians); genuine |
| TWITTIRO | 8 | 1 | 1 | 0 | |
The DET usage ("niente + noun" = "no X") is a genuine syntactic function, but even there,
the semantics are negative ("no politicians"), not indefinite ("some politicians").
Linguistic argument
Nulla/niente/nessuno carry inherent negation. "Nessuno ha detto nulla" doesn't need
non before nulla for it to mean "nothing" — the pronoun itself negates. This is the
defining property of PronType=Neg per UD.
PronType=Ind describes a different class. Indefinite pronouns refer to an unspecified
entity: qualcosa (something), qualcuno (someone), uno (one). Nulla doesn't refer to
an unspecified entity — it denotes the absence of any entity. That's negation, not
indefiniteness.
Italian negative pronouns behave identically to their cross-linguistic equivalents:
Italian nessuno = Spanish nadie = French personne = German niemand = Russian никто
Same lexical class, same semantics, same syntactic distribution. The only difference is the
PronType label.
Downstream impact. Any NLP task relying on PronType for negation detection, sentiment
analysis, or scope resolution gets the wrong signal from Ind. Neg correctly marks these
as negation-bearing elements.
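To make the downstream effect concrete, here is a small sketch of a hypothetical consumer that keys negation detection on PronType=Neg. The function name and the token lists are assumptions for illustration, with made-up feature strings in UD style. This is not Stanza's API or any real pipeline.

```python
def negative_pronouns(tokens):
    """Return the forms whose FEATS string contains the exact feature PronType=Neg."""
    # Split on '|' so that unrelated values such as Mood=Ind are never confused
    # with PronType values.
    return [form for form, feats in tokens if "PronType=Neg" in feats.split("|")]


# "Nessuno ha detto nulla" with the annotation currently found in the Italian treebanks.
current = [
    ("Nessuno", "Gender=Masc|Number=Sing|PronType=Ind"),
    ("ha", "Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin"),
    ("detto", "Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part"),
    ("nulla", "Gender=Masc|Number=Sing|PronType=Ind"),
]

# The same sentence with the proposed PronType=Neg label on the two pronouns.
proposed = [
    (form, feats.replace("PronType=Ind", "PronType=Neg")) for form, feats in current
]

print(negative_pronouns(current))
print(negative_pronouns(proposed))
```

With the current labels the detector finds no negation-bearing pronoun in a sentence that contains two of them. With the proposed labels it finds both.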
Proposal
Change PronType=Ind → PronType=Neg on nulla, niente, and nessuno (and their
inflected forms like nessuna, nessun) when functioning as pronouns (UPOS=PRON) across
all Italian treebanks. This would bring Italian in line with the UD feature definition and
with the practice of the other languages surveyed above.
The DET uses of niente/nessuno ("nessun problema", "niente politici") could also be
updated to PronType=Neg, since the determiner is semantically negative ("no problem", not
"some problem"), but I'd defer to the Italian treebank maintainers on that.
Happy to help with the actual data changes if this proposal is accepted.
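As a starting point, the relabeling could look roughly like the sketch below. It assumes the change is applied line by line to CoNLL-U files and matches on the LEMMA column. The function name `relabel_line` and the `include_det` switch are hypothetical, and the example lines carry illustrative annotation rather than real treebank rows. The DET case is off by default, pending the maintainers' decision.

```python
NEG_LEMMAS = {"nulla", "niente", "nessuno"}


def relabel_line(line, include_det=False):
    """Change PronType=Ind to PronType=Neg on target lemmas tagged PRON
    (and DET, if include_det is True). All other lines are returned unchanged."""
    if not line.strip() or line.startswith("#"):
        return line  # blank line or comment
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 10:
        return line  # not a well-formed CoNLL-U token line
    upos_ok = cols[3] == "PRON" or (include_det and cols[3] == "DET")
    if cols[2].lower() in NEG_LEMMAS and upos_ok:
        feats = cols[5].split("|")
        # Replace only the exact feature; the feature order is preserved.
        cols[5] = "|".join("PronType=Neg" if f == "PronType=Ind" else f for f in feats)
    ending = "\n" if line.endswith("\n") else ""
    return "\t".join(cols) + ending


# Hypothetical token lines for illustration.
pron = "1\tNessuno\tnessuno\tPRON\tPI\tGender=Masc|Number=Sing|PronType=Ind\t3\tnsubj\t_\t_\n"
det = "1\tNessun\tnessuno\tDET\tDI\tGender=Masc|Number=Sing|PronType=Ind\t2\tdet\t_\t_\n"
other = "1\tQualcosa\tqualcosa\tPRON\tPI\tNumber=Sing|PronType=Ind\t2\tnsubj\t_\t_\n"

print(relabel_line(pron), end="")
print(relabel_line(det), end="")
print(relabel_line(det, include_det=True), end="")
print(relabel_line(other), end="")
```

Genuinely indefinite pronouns such as qualcosa are left untouched because they are not in the lemma set. Tokens that are arguably mistagged as ADV or NOUN would still need manual review.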