ToTTo Fine-tuning in colab

ToTTo is an open-domain English table-to-text dataset with over 120,000 training examples that proposes a controlled generation task: given a Wikipedia table and a set of highlighted table cells, produce a one-sentence description.

This repository serves as the hub for the ToTTo dataset, offering a collection of diverse T5-base models fine-tuned on the ToTTo dataset. We present a comparative analysis of various T5-base models and state-of-the-art (SOTA) models to assess their performance on the controlled table-to-text generation task proposed by ToTTo.

In addition, for beginners to learn easily, all tasks come with Colab notebooks for seamless execution.

Dataset

!wget https://storage.googleapis.com/totto-public/totto_data.zip
!unzip totto_data.zip

Models

Model	Train Colab Link	Evaluation Link
t5-small [Full fine-tuning]
t5-small [LoRA fine-tuning]
t5-base [Full fine-tuning]
t5-base [LoRA fine-tuning]
LATTICE(t5-small)		-
LATTICE(t5-base)		-

Metrics

BLEU
PARENT

BLEURT

Requirements

!git clone https://github.com/Song-Joo-Young/language.git language_repo
!pip install git+https://github.com/google-research/bleurt.git
%cd language_repo
# Downloads the BLEURT-base checkpoint.
!wget https://storage.googleapis.com/bleurt-oss-21/BLEURT-20.zip .
!unzip BLEURT-20.zip
!pip3 install -r language/totto/eval_requirements.txt

To evaluate

!bash language/totto/totto_eval.sh --prediction_path /content/drive/MyDrive/ToTTo_T5-base/generation_dev_epoch.txt --target_path /content/drive/MyDrive/ToTTo_T5-base/totto_dev_data.jsonl

Results

Model	BLEU	PARENT	BLEURT	BLEU (Overlap)	PARENT (Overlap)	BLEURT (Overlap)	BLEU (Non-Overlap)	PARENT (Non-Overlap)	BLEURT (Non-Overlap)
T5-small	44.9	55.96	0.6514	52.0	59.91	0.6908	38.0	52.15	0.6134
T5-small (LoRA)	42.2	53.96	0.6340	48.9	57.50	0.6721	35.7	50.55	0.5973
LATTICE (T5-small)	47.4	57.8	-	55.6	62.3	-	39.1	53.3	-
T5-base	47.3	57.73	0.6677	54.9	61.52	0.7050	40.0	54.06	0.6318
T5-base (LoRA)	44.7	56.08	0.6530	51.6	59.82	0.6893	38.0	52.47	0.6180
LATTICE (T5-base)	48.4	58.1	-	56.1	62.4	-	40.4	53.9	-

LATTICE result : ToTTo Official Leaderboard refer to Results

Model	t5-small Full fine-tuning	t5-small LoRA	t5-small LATTICE	t5-base Full fine-tuning	t5-base LoRA	t5-base LATTICE
Epoch	10	10	-	5	3	-
Learning rate	0.0001	0.001	-	0.0001	0.001	-
Batch size	16	auto find	-	8	auto find	-
Learning Time	12:44:05	9:40:07	-	18:19:41	10:09:47	-

Training parameter: LoRA Tuning Seq2SeqTrainingArguments - auto_find_batch_size = True. LATTICE provides code only due to a lack of GPU tokens. ToTTo Official Leaderboard refer to Results

Appendix

Full Fine-tuning Visualization
Report

Reference

T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (T5 Paper)
LATTICE: Robust (Controlled) Table-to-Text Generation with Structure-Aware Equivariance Learning (LATTICE Paper)
ToTTo: A Controlled Table-To-Text Generation Dataset (ToTTo Paper / ToTTo GitHub Repository)
ToTTo Evaluation supplementary repository (GitHub Repository)
BLEURT: a Transfer Learning-Based Metric for Natural Language Generation (GitHub Repository)
BLUERT Cheackpoints (GitHub Repository)

Name		Name	Last commit message	Last commit date
Latest commit History 102 Commits
LATTICE		LATTICE
figures		figures
t5-base_Full-fine-tuning		t5-base_Full-fine-tuning
t5-base_LoRA-fine-tuning		t5-base_LoRA-fine-tuning
t5-small_Full-fine-tuning		t5-small_Full-fine-tuning
t5-small_LoRA-fine-tuning		t5-small_LoRA-fine-tuning
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LATTICE

LATTICE

figures

figures

t5-base_Full-fine-tuning

t5-base_Full-fine-tuning

t5-base_LoRA-fine-tuning

t5-base_LoRA-fine-tuning

t5-small_Full-fine-tuning

t5-small_Full-fine-tuning

t5-small_LoRA-fine-tuning

t5-small_LoRA-fine-tuning

README.md

README.md

Repository files navigation

ToTTo Fine-tuning in colab

Dataset

Models

Metrics

Results

Appendix

Reference

About

Releases

Packages

Languages

Song-Joo-Young/ToTTo-Fine-tuning-in-colab

Folders and files

Latest commit

History

Repository files navigation

ToTTo Fine-tuning in colab

Dataset

Models

Metrics

Results

Appendix

Reference

About

Resources

Stars

Watchers

Forks

Languages