agnesedaff/Implicit_obj_completion: material for a master's thesis project at the University of Pavia, "Automatic Implicit Object Completion in Italian: an exploration with BERT"

This repository contains the complete annotated corpus, results and scripts used for my master's thesis project on the automatic completion of Implicit Arguments in Italian at the University of Pavia (academic year 2022/2023).

TITLE: "Automatic Implicit Object Completion in Italian: an exploration with BERT".

ABSTRACT: This thesis describes an experiment on automatic Implicit Object completion in Italian. The task is framed as a fill-mask (cloze) task and applied to five Italian BERT models, fully exploiting their bidirectional capabilities. First, starting from a selected Ontology of 30 verbs (37 semantic patterns from the T-PAS resource), a corpus of 1,200 sentences is created. The corpus is divided into two datasets, called EXPLICIT and IMPLICIT. The second dataset, which contains Implicit Objects, is manually annotated by two experts with both a Gold Standard (GS) Noun and the type of omission, understood as a Defaulting strategy that can apply either lexically or pragmatically (Jezek, 2018). The manual annotation shows a significant correlation between the type of Defaulting and the range of possible completions for each verb. The experiment is then run, and the results are evaluated by computing the cosine similarity between the model's output and the manual GS completion. The model bert-base-italian-xxl-cased outperforms lighter models on the task, thanks to its ability to guess the most frequent collocations in Lexical Defaulting contexts. The results confirm previous observations that BERT models tend to favor frequent n-grams and have some difficulty completing the Object when a deeper understanding of semantic relationships is required (e.g., output = "Il postino suona [il pianoforte] sempre due volte", 'The postman always plays [the piano] twice'). Furthermore, the models tend to return words in a metonymic relation to the GS, replicating the mechanism of semantic coercion (Pustejovsky and Jezek, 2008), and show limited sensitivity to linguistic boundaries in the explicitation of Shadow Arguments.
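The two technical steps named in the abstract, turning a sentence into a cloze item and scoring a model's filler against the GS noun with cosine similarity, can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis code: the position of the omitted object is assumed to be known in advance, the mask token is assumed to be `[MASK]` (the usual BERT convention), and the names `make_cloze` and `score_prediction` as well as the 3-dimensional vectors are hypothetical. In the real experiment the filler would come from an Italian BERT model and the vectors from a word-embedding model; here "campanello" merely stands in as an assumed Gold Standard noun.

```python
import math
from typing import Dict, List, Sequence

# Assumed mask token (standard BERT convention); the thesis setup may differ.
MASK_TOKEN = "[MASK]"


def make_cloze(tokens: Sequence[str], object_index: int, mask_token: str = MASK_TOKEN) -> str:
    """Insert a mask token at the (assumed known) position of the omitted object."""
    if not 0 <= object_index <= len(tokens):
        raise ValueError("object_index out of range")
    masked = list(tokens[:object_index]) + [mask_token] + list(tokens[object_index:])
    return " ".join(masked)


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def score_prediction(predicted: str, gold: str, embeddings: Dict[str, List[float]]) -> float:
    """Hypothetical helper: similarity between the model's filler and the GS noun."""
    return cosine_similarity(embeddings[predicted], embeddings[gold])


if __name__ == "__main__":
    sentence = ["Il", "postino", "suona", "sempre", "due", "volte"]
    # prints: Il postino suona [MASK] sempre due volte
    print(make_cloze(sentence, 3))

    # Toy vectors invented purely for illustration, not real embeddings.
    toy_embeddings = {
        "campanello": [0.9, 0.1, 0.0],  # assumed GS noun
        "pianoforte": [0.2, 0.9, 0.1],  # model output from the abstract's example
    }
    print(round(score_prediction("pianoforte", "campanello", toy_embeddings), 3))
```

Because cosine similarity is graded rather than all-or-nothing, a semantically related filler (for instance a metonym of the GS noun) can still receive partial credit instead of being scored as simply wrong.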

The corpus was also presented at the CLiC-it 2023 conference in the paper:

Daffara, Agnese and Jezek, Elisabetta (2023). Towards an Italian Corpus for Implicit Object Completion. In Proceedings of the Ninth Italian Conference on Computational Linguistics (CLiC-it 2023), Venice, Italy. https://ceur-ws.org/Vol-3596/paper19.pdf

Files in the repository: README.md, Tesi.pdf