About

This repository contains the code used to train and evaluate improved trankit models. The models were trained on larger datasets (UD Slovenian-SSJ r2.12 and UD Slovenian-SST r2.12) than those used by the trankit authors. A version trained on UD Slovenian-SSJ data outperformed the original trankit model on all metrics on the SloBench leaderboard.

For details on how trankit works and the options it offers, please refer to the original trankit documentation. This repository shows how to use the improved models developed in this project. The models are available from the CLARIN.SI repository.

Usage example

Below, we provide a step-by-step guide on how to use our models with the trankit tool.

Step 1: Initialization

from trankit import Pipeline, trankit2conllu

# Initialize trankit with the customized models stored under cache_dir
p = Pipeline(lang='customized', cache_dir='<PATH TO DOWNLOADED MODELS>', embedding='xlm-roberta-large')

Step 2: Process Input

There are two options for processing input:

Option 1 - Using Text Input:

text = 'Example text!'
dict_output = p(text)
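
To see what the call above hands back, the following is a minimal sketch that walks a result dictionary and collects one row per token. The sample dictionary is hand-written and only imitates the layout described in the trankit documentation (`sentences`, `tokens`, and per-token keys such as `text`, `lemma`, `upos`, `head`, `deprel`). Both the sample values and the helper name `token_rows` are assumptions for illustration, not output produced by our models.

```python
# Hand-written stand-in for `dict_output`; key names are assumed from the trankit docs.
sample_output = {
    "text": "Example text!",
    "sentences": [
        {
            "id": 1,
            "text": "Example text!",
            "tokens": [
                {"id": 1, "text": "Example", "lemma": "example",
                 "upos": "NOUN", "head": 2, "deprel": "compound"},
                {"id": 2, "text": "text", "lemma": "text",
                 "upos": "NOUN", "head": 0, "deprel": "root"},
                {"id": 3, "text": "!", "lemma": "!",
                 "upos": "PUNCT", "head": 2, "deprel": "punct"},
            ],
        }
    ],
}


def token_rows(doc):
    """Return (sentence id, token id, form, lemma, UPOS, head, deprel) tuples."""
    rows = []
    for sentence in doc.get("sentences", []):
        for token in sentence.get("tokens", []):
            # Multi-word tokens may list their words under "expanded" (assumed);
            # fall back to the token itself when that key is absent.
            for word in token.get("expanded", [token]):
                rows.append((
                    sentence.get("id"),
                    word.get("id"),
                    word.get("text"),
                    word.get("lemma"),
                    word.get("upos"),
                    word.get("head"),
                    word.get("deprel"),
                ))
    return rows


for row in token_rows(sample_output):
    print(*row, sep="\t")
```

With a real pipeline, `dict_output` would take the place of `sample_output`. Keys are read with `.get` so that a missing annotation yields `None` instead of raising an error.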

Option 2 - Using a Pre-tokenized List:

pretokenized_list = [['Example', 'pre-tokenized', 'list', '!']]
dict_output = p(pretokenized_list)
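
If the tokens come from another tool, the pre-tokenized list can be assembled as a list of sentences, each a list of token strings, matching the shape shown above. A minimal sketch, assuming the input holds one sentence per line with tokens separated by spaces; the helper name `lines_to_pretokenized` is hypothetical and not part of trankit.

```python
def lines_to_pretokenized(raw):
    """Turn 'one sentence per line, space-separated tokens' into a list of token lists."""
    sentences = []
    for line in raw.splitlines():
        tokens = line.split()
        if tokens:  # skip blank lines so no empty sentence is passed on
            sentences.append(tokens)
    return sentences


raw_input_text = "Example pre-tokenized list !\n\nA second sentence .\n"
pretokenized_list = lines_to_pretokenized(raw_input_text)
print(pretokenized_list)
```

The result has the same nested-list shape as the example above and would be passed to the pipeline in the same way, as `p(pretokenized_list)`.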

Step 3: Convert Output to CoNLL-U Format

# Convert the output dictionary to CoNLL-U format
conllu_output = trankit2conllu(dict_output)
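
Once converted, the result can be saved and read back with plain Python. The sketch below assumes that `conllu_output` is a single string in the standard ten-column CoNLL-U layout. The sample text is hand-written (not produced by our models), and the helper name `read_conllu` and the file name `output.conllu` are hypothetical.

```python
import os
import tempfile

# Hand-written CoNLL-U text standing in for `conllu_output` (values are illustrative).
sample_conllu = (
    "1\tExample\texample\tNOUN\t_\t_\t2\tcompound\t_\t_\n"
    "2\ttext\ttext\tNOUN\t_\t_\t0\troot\t_\t_\n"
    "3\t!\t!\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "\n"
)

CONLLU_FIELDS = ("id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc")


def read_conllu(text):
    """Split CoNLL-U text into sentences, each a list of field dictionaries."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
        elif not line.startswith("#"):  # lines starting with '#' are comments
            columns = line.split("\t")
            if len(columns) == len(CONLLU_FIELDS):
                current.append(dict(zip(CONLLU_FIELDS, columns)))
    if current:
        sentences.append(current)
    return sentences


# Save the string to disk, then read it back to check the round trip.
out_path = os.path.join(tempfile.mkdtemp(), "output.conllu")
with open(out_path, "w", encoding="utf-8") as handle:
    handle.write(sample_conllu)

with open(out_path, encoding="utf-8") as handle:
    parsed = read_conllu(handle.read())

print(len(parsed), "sentence(s);", len(parsed[0]), "token(s)")
```

All fields are kept as strings here, so `head` would need an explicit `int(...)` conversion before any numeric use.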
