Italian negative pronouns tagged PronType=Ind instead of PronType=Neg #1207

@fingoldo

Hi, we stumbled on this inconsistency while auditing Stanza's morphological tagging quality across languages.

Summary

Across all Italian UD treebanks, the pronouns nulla, niente, and nessuno are almost
universally tagged PronType=Ind. Every other language I checked (Russian, Spanish, French,
German) tags the exact equivalents as PronType=Neg. I believe this is an annotation
inconsistency rather than a deliberate convention, and propose harmonizing to PronType=Neg.

The UD definition

From the UD feature documentation:

Neg: negative pronoun or determiner. In some languages, weights between negative and
indefinite are very fuzzy. [...] Negative pronouns are used to express the non-existence
of something.

This describes nulla, niente, and nessuno exactly. They inherently denote
non-existence — "nothing", "nobody" — without requiring an additional negation particle.
Compare with genuinely indefinite pronouns like qualcosa (something), qualcuno
(someone), chiunque (whoever), where PronType=Ind is appropriate.

Cross-linguistic evidence

I checked the dev/train splits of the major UD treebanks for each language. Every language
except Italian uses PronType=Neg for the same lexical class:

| Language | Treebank | "nothing" | "nobody" | PronType | Count |
|---|---|---|---|---|---|
| Russian | SynTagRus | ничто (ничего) | никто | Neg | 168 PRON, 100% Neg |
| Spanish | AnCora | nada | nadie | Neg | 34 PRON, 100% Neg |
| French | GSD | rien | personne | Neg | 5 PRON, 100% Neg |
| German | GSD | nichts | niemand | Neg | 12 PRON, 100% Neg |
| Italian | ISDT/VIT/ParTUT/PoSTWITA/TWITTIRO | nulla/niente | nessuno | Ind | ~110 PRON, 99% Ind |

Zero variation in RU/ES/FR/DE — every single instance is PronType=Neg. Italian is the
sole outlier.
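
Counts like the ones above can be reproduced with a few lines of Python over the CoNLL-U files. This is a minimal sketch, not the script used for the audit: the names `parse_feats` and `count_prontype` and the toy sentence are illustrative assumptions. It relies only on the standard 10-column CoNLL-U layout (LEMMA in column 3, UPOS in column 4, FEATS in column 6).

```python
from collections import Counter

# Lemmas to audit; swap in the equivalents for other languages.
TARGET_LEMMAS = {"nulla", "niente", "nessuno"}


def parse_feats(feats):
    """Turn a FEATS string such as 'Number=Sing|PronType=Ind' into a dict."""
    if not feats or feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def count_prontype(lines, lemmas=TARGET_LEMMAS, upos="PRON"):
    """Count PronType values over tokens with a target lemma and the given UPOS."""
    counts = Counter()
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        # Skip comments, blank lines, multiword ranges (1-2) and empty nodes (1.1).
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        if cols[2].lower() in lemmas and cols[3] == upos:
            counts[parse_feats(cols[5]).get("PronType", "(none)")] += 1
    return counts


# Toy sentence written for this example (hypothetical, not copied from a treebank).
sample = [
    "# text = Nessuno ha detto nulla",
    "\t".join(["1", "Nessuno", "nessuno", "PRON", "_", "Number=Sing|PronType=Ind", "3", "nsubj", "_", "_"]),
    "\t".join(["2", "ha", "avere", "AUX", "_", "_", "3", "aux", "_", "_"]),
    "\t".join(["3", "detto", "dire", "VERB", "_", "_", "0", "root", "_", "_"]),
    "\t".join(["4", "nulla", "nulla", "PRON", "_", "Number=Sing|PronType=Ind", "3", "obj", "_", "_"]),
]

print(count_prontype(sample))  # Counter({'Ind': 2})
```

To run it on a real treebank, pass an open file handle instead of the list, e.g. `count_prontype(open(path, encoding="utf-8"))`, where `path` is a placeholder for a `.conllu` file.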

Detailed Italian treebank breakdown

nulla (113 occurrences across 5 treebanks)

| Treebank | PRON (Ind) | ADV | NOUN | Notes |
|---|---|---|---|---|
| ISDT | 36 | 0 | 2 | NOUN = "il nulla" (the nothingness), "nulla osta" (permit); legitimate |
| VIT | 27 | 5 | 1 | 2 ADV tokens have PronType=Neg; 3 ADV have no PronType |
| ParTUT | 7 | 0 | 0 | |
| PoSTWITA | 30 | 0 | 1 | |
| TWITTIRO | 4 | 0 | 0 | |

The NOUN cases are legitimate (substantivized "the nothingness"). The ADV cases in VIT are
a mix: one genuine adverbial use ("per nulla comparabile" = "not at all comparable"), the
rest appear to be mistagged pronominal objects (e.g., "Nulla dice la normativa" = fronted
obj, not adverb).

Only 3 tokens total (2 nulla + 1 niente, all VIT) use PronType=Neg — suggesting
someone recognized the issue but the fix wasn't applied systematically.
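
A per-treebank breakdown like this one is a cross-tabulation of UPOS against PronType for a single lemma. The sketch below shows one way to build it. The function name `upos_prontype_table` and the sample rows are hypothetical, and tokens without a PronType feature are bucketed under `(none)` so that untyped ADV or NOUN cases stay visible.

```python
from collections import Counter, defaultdict


def upos_prontype_table(lines, lemma):
    """Map each UPOS seen for `lemma` to a Counter of its PronType values."""
    table = defaultdict(Counter)
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        # Skip comments, blank lines, multiword ranges and empty nodes.
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        if cols[2].lower() != lemma:
            continue
        prontype = "(none)"
        for feat in cols[5].split("|"):
            if feat.startswith("PronType="):
                prontype = feat.split("=", 1)[1]
        table[cols[3]][prontype] += 1
    return table


# Hypothetical rows covering the tagging patterns described above.
rows = [
    "\t".join(["1", "nulla", "nulla", "PRON", "_", "Number=Sing|PronType=Ind", "2", "obj", "_", "_"]),
    "\t".join(["1", "nulla", "nulla", "ADV", "_", "_", "2", "advmod", "_", "_"]),
    "\t".join(["1", "nulla", "nulla", "ADV", "_", "PronType=Neg", "2", "advmod", "_", "_"]),
    "\t".join(["2", "nulla", "nulla", "NOUN", "_", "Gender=Masc|Number=Sing", "0", "root", "_", "_"]),
]

for upos, counts in sorted(upos_prontype_table(rows, "nulla").items()):
    print(upos, dict(counts))
```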

niente (145 occurrences)

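| Treebank | PRON (Ind) | DET (Ind) | ADV | NOUN | Notes |
|---|---|---|---|---|---|
| ISDT | 19 | 1 | 1 | 0 | |
| VIT | 25 | 4 | 17 | 6 | NOUN cases look like PRON errors; ADV mixed |
| ParTUT | 2 | 0 | 0 | 0 | |
| PoSTWITA | 41 | 17 | 1 | 0 | DET = "niente politici" (no politicians); genuine |
| TWITTIRO | 8 | 1 | 1 | 0 | |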

The DET usage ("niente + noun" = "no X") is a genuine syntactic function, but even there,
the semantics are negative ("no politicians"), not indefinite ("some politicians").

Linguistic argument

Nulla/niente/nessuno carry inherent negation. "Nessuno ha detto nulla" doesn't need
non before nulla for it to mean "nothing" — the pronoun itself negates. This is the
defining property of PronType=Neg per UD.

PronType=Ind describes a different class. Indefinite pronouns refer to an unspecified
entity: qualcosa (something), qualcuno (someone), uno (one). Nulla doesn't refer to
an unspecified entity — it denotes the absence of any entity. That's negation, not
indefiniteness.

Italian negative pronouns behave identically to their cross-linguistic equivalents:

Italian nessuno = Spanish nadie = French personne = German niemand = Russian никто

Same lexical class, same semantics, same syntactic distribution. The only difference is the
PronType label.

Downstream impact. Any NLP task relying on PronType for negation detection, sentiment
analysis, or scope resolution gets the wrong signal from Ind. Neg correctly marks these
as negation-bearing elements.
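
To make the downstream effect concrete, here is a deliberately naive negation-cue filter that keys on PronType=Neg. It is a sketch only: `negation_cues` is a hypothetical helper, and the FEATS strings are simplified for illustration rather than taken from a treebank or from Stanza output.

```python
def negation_cues(tokens):
    """Return the forms of tokens whose FEATS contain PronType=Neg."""
    return [form for form, feats in tokens if "PronType=Neg" in feats.split("|")]


# "Nessuno ha detto nulla" with the current Italian labels and with the proposed ones.
current = [
    ("Nessuno", "Number=Sing|PronType=Ind"),
    ("ha", "_"),
    ("detto", "_"),
    ("nulla", "Number=Sing|PronType=Ind"),
]
proposed = [
    ("Nessuno", "Number=Sing|PronType=Neg"),
    ("ha", "_"),
    ("detto", "_"),
    ("nulla", "Number=Sing|PronType=Neg"),
]

print(negation_cues(current))   # []
print(negation_cues(proposed))  # ['Nessuno', 'nulla']
```

With the current labels the filter finds no cue in a sentence that is negative on any reading; with the proposed labels it finds both.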

Proposal

Change PronType=Ind to PronType=Neg on nulla, niente, and nessuno (and their
inflected forms like nessuna, nessun) when functioning as pronouns (UPOS=PRON) across
all Italian treebanks. This would bring Italian in line with the UD feature definition and
with the practice of the other languages surveyed above.
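
As a rough illustration of how small the change is mechanically, the relabeling could be done line by line over each CoNLL-U file. This is a sketch under assumptions, not a tested migration script: `retag_line` and its `upos_set` parameter are hypothetical names, it matches on the LEMMA column (so inflected forms are covered only if they are lemmatized to one of the three targets), and it touches only PRON tokens unless told otherwise.

```python
TARGET_LEMMAS = {"nulla", "niente", "nessuno"}


def retag_line(line, upos_set=frozenset({"PRON"})):
    """Return a CoNLL-U line with PronType=Ind replaced by PronType=Neg
    when the token has a target lemma and a UPOS in `upos_set`."""
    cols = line.split("\t")
    # Leave comments, blank lines, multiword ranges and empty nodes untouched.
    if len(cols) != 10 or not cols[0].isdigit():
        return line
    if cols[2].lower() in TARGET_LEMMAS and cols[3] in upos_set:
        feats = cols[5].split("|")
        cols[5] = "|".join("PronType=Neg" if f == "PronType=Ind" else f for f in feats)
    return "\t".join(cols)


# Hypothetical token lines for illustration.
pron = "\t".join(["4", "nulla", "nulla", "PRON", "_", "Number=Sing|PronType=Ind", "3", "obj", "_", "_"])
det = "\t".join(["1", "niente", "niente", "DET", "_", "PronType=Ind", "2", "det", "_", "_"])
noun = "\t".join(["2", "nulla", "nulla", "NOUN", "_", "Gender=Masc|Number=Sing", "0", "root", "_", "_"])

print(retag_line(pron).split("\t")[5])  # Number=Sing|PronType=Neg
print(retag_line(det).split("\t")[5])   # PronType=Ind
print(retag_line(noun) == noun)         # True
```

Because only the value of an existing feature changes, the order of features in the FEATS column is preserved. Passing `upos_set={"PRON", "DET"}` would extend the change to determiners.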

The DET uses of niente/nessuno ("nessun problema", "niente politici") could also be
updated to PronType=Neg, since the determiner is semantically negative ("no problem", not
"some problem"), but I'd defer to the Italian treebank maintainers on that.

Happy to help with the actual data changes if this proposal is accepted.
