SEMarkup-2023

A shared task on automatic semantic markup.

Overview

The shared task contains 2 tracks:

base (CodaLab): build a system that produces semantic markup together with dependency heads (using the morphosyntactic markup, if possible).
hard (CodaLab): build a system that produces morphological, syntactic and semantic markup simultaneously.

Both tracks involve, among other things, solving the all-words WSD (word sense disambiguation) problem, i.e. disambiguating every polysemous word or homonym, since participants have to assign a semantic class to every word.
Because the training dataset includes morphosyntactic markup, participants can take these data into account and also explore how the different levels of markup are connected.

Markup example

Let us look at the markup using this example sentence:

Еду готовили на костре. (The food was cooked on a fire.)

The markup of the base track consists of 3 types of tags: dependency heads, semantic slots and semantic classes.

Dependency heads: the words in a sentence are related to one another. In each relation, one word is the dependent and the other is its head, i.e. the word that governs it. This dependency is both semantic and syntactic. For instance, the token еду (food) depends on the token готовили (was cooked).
Semantic slots (глубинные позиции, ГП): the semantic roles that specific words occupy in a sentence. In the example sentence, еда (food) is the Object of cooking, and костер (fire) is the place where the cooking happens (Locative).
Semantic classes (семантические классы, СК): semantic categories, i.e. particular interpretations of words. For example, еда (food) has the semantic class FOOD, and готовить (to cook) has the class TO_PREPARE_FOOD_SUBSTANCE.
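The head relation can be made concrete with a minimal Python sketch. It assumes the head indices from the example sentence's markup (0 marks the root) and groups the tokens by their head. The names TOKENS and dependents_by_head are illustrative and not part of the shared task tooling.

```python
from collections import defaultdict

# (index, form, head index) for "Еду готовили на костре."; head 0 marks the root.
TOKENS = [
    (1, "Еду", 2),
    (2, "готовили", 0),
    (3, "на", 4),
    (4, "костре", 2),
    (5, ".", 2),
]


def dependents_by_head(tokens):
    """Map each head index to the forms that depend on it, in sentence order."""
    children = defaultdict(list)
    for _index, form, head in tokens:
        children[head].append(form)
    return dict(children)


tree = dependents_by_head(TOKENS)
print(tree[2])  # dependents of token 2, "готовили"
```

Looking up index 2 returns the tokens governed by готовили, and index 0 returns the root of the sentence.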

The full base-track markup of this sentence is as follows:

# text = Еду готовили на костре.
1	Еду	_	_	_	_	2	_	Object	FOOD
2	готовили	_	_	_	_	0	_	Predicate	TO_PREPARE_FOOD_SUBSTANCE
3	на	_	_	_	_	4	_	_	PREPOSITION
4	костре	_	_	_	_	2	_	Locative	OBJECT_BY_FUNCTION_AND_PROPERTY
5	.	_	_	_	_	2	_	_	_

Note the homonyms еду (food) and готовили (was cooked) in this sentence. Apart from the semantic tags Object FOOD, a lexicon can also interpret the token еду as Predicate TO_GO_AND_TRANSFER (the first-person singular form of the verb ехать, to go), whereas готовили may also carry the tags Predicate READINESS.
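As a rough illustration, the base-track rows can be read with a few lines of Python. This sketch assumes each row has exactly 10 tab-separated columns, with the token index, form, head, semantic slot and semantic class in columns 1, 2, 7, 9 and 10, as in the example above. BaseToken, parse_base_row and LEXICON are hypothetical names, and the toy lexicon holds only the alternative readings mentioned in the text.

```python
from dataclasses import dataclass

BASE_ROWS = """\
1\tЕду\t_\t_\t_\t_\t2\t_\tObject\tFOOD
2\tготовили\t_\t_\t_\t_\t0\t_\tPredicate\tTO_PREPARE_FOOD_SUBSTANCE
3\tна\t_\t_\t_\t_\t4\t_\t_\tPREPOSITION
4\tкостре\t_\t_\t_\t_\t2\t_\tLocative\tOBJECT_BY_FUNCTION_AND_PROPERTY
5\t.\t_\t_\t_\t_\t2\t_\t_\t_
"""


@dataclass
class BaseToken:
    index: int
    form: str
    head: int
    slot: str
    sem_class: str


def parse_base_row(row: str) -> BaseToken:
    """Parse one base-track row (assumed layout: 10 tab-separated columns)."""
    cols = row.split("\t")
    if len(cols) != 10:
        raise ValueError(f"expected 10 columns, got {len(cols)}")
    return BaseToken(int(cols[0]), cols[1], int(cols[6]), cols[8], cols[9])


tokens = [
    parse_base_row(row)
    for row in BASE_ROWS.splitlines()
    if row and not row.startswith("#")  # skip blank lines and "# text = ..." comments
]

# Hypothetical toy lexicon: only the candidate classes named in the text.
LEXICON = {
    "еду": {"FOOD", "TO_GO_AND_TRANSFER"},
    "готовили": {"TO_PREPARE_FOOD_SUBSTANCE", "READINESS"},
}

# Tokens for which the lexicon offers more than one semantic class.
ambiguous = [t.form for t in tokens if len(LEXICON.get(t.form.lower(), ())) > 1]
print(ambiguous)
```

The list of ambiguous forms shows where a system has to choose between competing semantic classes, which is the all-words WSD part of the task.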

The full hard-track markup of the same sentence is as follows:

# text = Еду готовили на костре.
1	Еду	еда	NOUN	_	Animacy=Inan|Case=Acc|Gender=Fem|Number=Sing	2	obj	Object	FOOD	_
2	готовили	готовить	VERB	_	Aspect=Imp|Mood=Ind|Tense=Past|VerbForm=Fin|Voice=Act	0	root	Predicate	TO_PREPARE_FOOD_SUBSTANCE	_
3	на	на	ADP	_	_	4	case	_	PREPOSITION	_
4	костре	костёр	NOUN	_	Animacy=Inan|Case=Loc|Gender=Masc|Number=Sing	2	obl	Locative	OBJECT_BY_FUNCTION_AND_PROPERTY	_
5	.	.	PUNCT	_	_	2	punct	_	punct	_

In addition to dependency heads, semantic slots and semantic classes, participants in the hard track are asked to label lemmas, PoS tags, grammatical features and dependency relations according to UD (Universal Dependencies).
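A hard-track row can be unpacked in a similar way. The sketch below assumes 11 tab-separated columns in the order shown in the example. The column names in COLUMNS are illustrative labels, and "xpos" and "misc" in particular are guesses for the two columns that hold "_" in the example. The grammatical features are split on "|" and "=" as in the UD FEATS notation.

```python
HARD_ROW = (
    "1\tЕду\tеда\tNOUN\t_\t"
    "Animacy=Inan|Case=Acc|Gender=Fem|Number=Sing\t2\tobj\tObject\tFOOD\t_"
)

# Illustrative column labels; "xpos" and "misc" are assumptions.
COLUMNS = [
    "id", "form", "lemma", "upos", "xpos", "feats",
    "head", "deprel", "slot", "sem_class", "misc",
]


def parse_feats(feats: str) -> dict:
    """Turn 'Case=Acc|Number=Sing' into {'Case': 'Acc', 'Number': 'Sing'}."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def parse_hard_row(row: str) -> dict:
    """Parse one hard-track row (assumed layout: 11 tab-separated columns)."""
    values = row.split("\t")
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(values)}")
    token = dict(zip(COLUMNS, values))
    token["id"] = int(token["id"])
    token["head"] = int(token["head"])
    token["feats"] = parse_feats(token["feats"])
    return token


token = parse_hard_row(HARD_ROW)
print(token["lemma"], token["upos"], token["feats"]["Case"], token["sem_class"])
```

Keeping the features as a dictionary makes it easy to compare individual categories such as Case or Number when relating the morphological level to the semantic one.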

Dataset

Train dataset

We have created and published the first open corpus for Russian that contains three-level markup:

Morphology (UD)
Syntax (UD)
Semantics (Simplified Compreno format)

We believe that marking up these three language levels simultaneously is an even harder challenge than the Dialogue GramEval-2020 competition, which covered two language levels: morphology and syntax.
The dataset for this task is based on news texts from the NewsRU site. It was labelled automatically by the Compreno system; this markup was checked manually and then converted automatically to the UD format. The conversion was also partially checked by hand.

Important links

Tagsets and other useful information

Detailed Description - a detailed description of the corpus, formats and conversion features
Semantic Slots - a list of semantic slots with their unsimplified counterparts
Semantic Classes - a list of (unsimplified) semantic classes with the hypernyms used in the simplified version of the format
UD Morphology tagset - PoS tags and grammatical features (the link points to the tagset published for the GramEval-2020 competition)
UD Dependency relations (syntax) - the list of UD dependency relations
Acknowledgements - project participants

Timeline

20 January - train dataset is published;
6 February - test dataset and CodaLab competitions are published;
20 March - shared task deadline, results publication;
1 April - paper submission deadline.

Organizers

Maria Petrova
Alexandra Ivoylova (RSUH)
Ilya Bayuk
Darya Dyachkova (RSUH)
Mariia Michurina (RSUH)
Angela Shumilova (RSUH)
