SloT5-tools

Scripts used for training and evaluating the SloT5 models.

Training

The corpora used for training the SloT5 models are the same as those used for the SloBERTa model. For pre-processing the corpora, please refer to the Slovene-BERT-Tool repository (https://github.com/clarinsi/Slovene-BERT-Tool). For SloT5, we only reformat the .txt files (before SentencePiece tokenization) into TSV format, using training/txt2tsv.py.
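
The exact column layout produced by training/txt2tsv.py is not described here. As a minimal sketch, assuming the target is a single-column TSV with one text segment per non-empty input line, the conversion could look like this (the function name txt_to_tsv is hypothetical):

```python
import io
from typing import Iterable, TextIO


def txt_to_tsv(lines: Iterable[str], out: TextIO) -> int:
    """Write every non-empty input line as one single-column TSV row.

    Hypothetical sketch; the real training/txt2tsv.py may use a different layout.
    Returns the number of rows written.
    """
    rows = 0
    for line in lines:
        text = line.strip()
        if not text:
            continue  # blank lines carry no training text
        # A literal tab would split the row into extra columns, so replace it.
        out.write(text.replace("\t", " ") + "\n")
        rows += 1
    return rows


if __name__ == "__main__":
    sample = ["Ljubljana je glavno mesto Slovenije.\n", "\n", "Prvi\tdrugi stolpec\n"]
    buf = io.StringIO()
    print(txt_to_tsv(sample, buf))
    print(buf.getvalue())
```

For real corpora, the input would be an open file handle (e.g. `open(path, encoding="utf-8")`), which iterates line by line without loading the whole file into memory.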

If we have an NVIDIA enroot container with text-to-text-transfer-transformer installed, we can run the pre-training with the training/t5_pretraining.sh script, providing the desired .gin files that contain the model architecture and other parameters.
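
The contents of training/t5_pretraining.sh are not shown here. Assuming it wraps the t5_mesh_transformer command-line tool from the T5 library (an assumption, not confirmed by this repository), the .gin files are passed as repeated --gin_file flags and GPU placement is set through gin parameters, as in the GPU example of the T5 README. The sketch below only assembles the command; build_pretraining_command and all paths are hypothetical:

```python
import shlex
from typing import List, Sequence


def build_pretraining_command(model_dir: str, gin_files: Sequence[str],
                              num_gpus: int = 1) -> List[str]:
    """Assemble a t5_mesh_transformer command line for GPU pre-training.

    Hypothetical sketch: assumes training/t5_pretraining.sh calls the
    t5_mesh_transformer CLI; model_dir and gin_files are placeholders.
    """
    cmd = ["t5_mesh_transformer", f"--model_dir={model_dir}"]
    # Each .gin file contributes model architecture and training settings.
    for gin_file in gin_files:
        cmd.append(f"--gin_file={gin_file}")
    # Data parallelism only: one model shard, batch split over the local GPUs.
    devices = ", ".join(f"'gpu:{i}'" for i in range(num_gpus))
    cmd.append(f"--gin_param=utils.run.mesh_shape = 'model:1,batch:{num_gpus}'")
    cmd.append(f"--gin_param=utils.run.mesh_devices = [{devices}]")
    return cmd


if __name__ == "__main__":
    cmd = build_pretraining_command("/models/slot5-small",
                                    ["dataset.gin", "t5_small.gin"], num_gpus=2)
    print(shlex.join(cmd))
```

Printing the command with shlex.join before running it (e.g. with subprocess.run inside the container) makes it easy to check which .gin files and device settings were picked up.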

Evaluation

For evaluation, we use the evaluation/run_summarization.py script provided by Hugging Face. For each evaluation task, the evaluation folder contains a bash script with the parameters used for fine-tuning the T5 models.
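
Because run_summarization.py is a sequence-to-sequence script, classification data has to be cast as text-to-text pairs, with the label written out as the target string. A minimal sketch, assuming JSON-lines input with the hypothetical field names "text" and "summary" (write_seq2seq_jsonl is also hypothetical):

```python
import io
import json
from typing import Iterable, TextIO, Tuple


def write_seq2seq_jsonl(examples: Iterable[Tuple[str, str]], out: TextIO) -> int:
    """Write (input text, label) pairs as JSON lines with "text"/"summary" fields.

    Hypothetical sketch: the field names are illustrative and must match the
    columns the fine-tuning script is told to read.
    Returns the number of examples written.
    """
    n = 0
    for text, label in examples:
        record = {"text": text, "summary": label}
        # ensure_ascii=False keeps Slovene characters (č, š, ž) readable.
        out.write(json.dumps(record, ensure_ascii=False) + "\n")
        n += 1
    return n


if __name__ == "__main__":
    buf = io.StringIO()
    write_seq2seq_jsonl([("Super film!", "pozitivno"), ("Dolgočasno.", "negativno")], buf)
    print(buf.getvalue())
```

In the upstream Hugging Face script, the input and target fields are selected with --text_column and --summary_column, and a T5-style task prefix can be added with --source_prefix. The output can be saved with a .json extension, since the Hugging Face datasets JSON loader also reads JSON lines. The bash scripts in the evaluation folder remain the record of the parameters actually used.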

After fine-tuning, we can calculate the F1 and accuracy scores for each classification task using the evaluation/t5-predictions-analysis.py script.
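
The scoring done by evaluation/t5-predictions-analysis.py is not detailed here. A minimal sketch, assuming the generated predictions and gold labels are available as lists of strings and that F1 is macro-averaged over the gold labels (the averaging in the actual script may differ; accuracy_and_macro_f1 is hypothetical):

```python
from collections import Counter
from typing import Dict, Sequence


def accuracy_and_macro_f1(gold: Sequence[str], pred: Sequence[str]) -> Dict[str, float]:
    """Score generated label strings against gold labels.

    Hypothetical sketch: labels are compared as stripped strings, and F1 is
    macro-averaged over the set of gold labels.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    gold = [g.strip() for g in gold]
    pred = [p.strip() for p in pred]
    correct = sum(g == p for g, p in zip(gold, pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[g] += 1  # gold label g was missed
    # Per-label F1 = 2*TP / (2*TP + FP + FN).
    f1_scores = []
    for label in sorted(set(gold)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        f1_scores.append(2 * tp[label] / denom if denom else 0.0)
    return {
        "accuracy": correct / len(gold) if gold else 0.0,
        "macro_f1": sum(f1_scores) / len(f1_scores) if f1_scores else 0.0,
    }


if __name__ == "__main__":
    print(accuracy_and_macro_f1(["pos", "neg", "pos", "neg"],
                                ["pos", "pos", "pos", "neg"]))
```

Because T5 generates free text, a prediction that matches no valid label simply counts as an error here; it lowers accuracy and the recall of the true label without adding a new class to the average.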

MatejUlcar/SloT5-tools