
T5 for Sentence Split in English:

Sentence splitting is the task of dividing a complex sentence into two simpler sentences. For example, the complex sentence

Mary likes to play football in her free time whenever she meets with her friends that are very nice people.

can be split into

Mary likes to play football in her free time whenever she meets with her friends.

and

Her friends are very nice people.

Goal:

To build the best sentence split model available to date.

Demo:

Check out the Demo

(image: demo UI screenshot)

How to use it in your Python code:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the fine-tuned checkpoint from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("flax-community/t5-base-wikisplit")
model = AutoModelForSeq2SeqLM.from_pretrained("flax-community/t5-base-wikisplit")

complex_sentence = "This comedy drama is produced by Tidy , the company she co-founded in 2008 with her husband David Peet , who is managing director ."
sample_tokenized = tokenizer(complex_sentence, return_tensors="pt")

# Generate the split with beam search (num_beams=5).
answer = model.generate(
    sample_tokenized["input_ids"],
    attention_mask=sample_tokenized["attention_mask"],
    max_length=256,
    num_beams=5,
)
gene_sentence = tokenizer.decode(answer[0], skip_special_tokens=True)
print(gene_sentence)

"""
Output:
This comedy drama is produced by Tidy. She co-founded Tidy in 2008 with her husband David Peet, who is managing director.
"""

Applications:

  • Sentence Simplification
  • Data Augmentation (see the sketch after this list)
  • Sentence Rephrasing
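
For data augmentation in particular, one simple recipe (an illustrative sketch under our own assumptions, not code from this project) is to keep several beam-search candidates per input via num_return_sequences and treat each candidate split as an extra training example:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("flax-community/t5-base-wikisplit")
model = AutoModelForSeq2SeqLM.from_pretrained("flax-community/t5-base-wikisplit")

sentence = "Mary likes to play football in her free time whenever she meets with her friends that are very nice people."
inputs = tokenizer(sentence, return_tensors="pt")

# Keep the top 3 beams (num_return_sequences must not exceed num_beams).
answers = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    max_length=256,
    num_beams=5,
    num_return_sequences=3,
)
for candidate in tokenizer.batch_decode(answers, skip_special_tokens=True):
    print(candidate)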

Current baseline from the paper:

(image: baseline results table from the paper)

Our Results:

Model                    Exact match  SARI     BLEU
t5-base-wikisplit        17.93        67.5438  76.9
t5-v1_1-base-wikisplit   18.1207      67.4873  76.9478
byt5-base-wikisplit      11.3582      67.2685  73.1682
t5-large-wikisplit       18.6632      68.0501  77.1881

Accomplishments:

  • All of our models beat the baseline models on two metrics (exact match and SARI).
  • Our t5-base-wikisplit and t5-v1_1-base-wikisplit models achieve comparable results at roughly half the model size, which enables faster inference.
  • We added a WikiSplit metric, freely available in Hugging Face Datasets, which makes it easy to compute the relevant scores for this task from now on (see the sketch below).
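
As a rough guide to reproducing such scores, the sketch below loads the combined WikiSplit metric through the Hugging Face evaluate library; the metric name matches the contribution described above, but the loading API and the toy example strings are our own assumptions.

import evaluate

# Complex source sentences, model predictions, and gold reference splits
# (one list of references per source). The strings here are illustrative only.
sources = ["Mary likes to play football in her free time whenever she meets with her friends that are very nice people."]
predictions = ["Mary likes to play football in her free time whenever she meets with her friends. Her friends are very nice people."]
references = [["Mary likes to play football in her free time whenever she meets with her friends. Her friends are very nice people."]]

# wiki_split combines exact match, SARI, and a BLEU score in a single metric.
wiki_split = evaluate.load("wiki_split")
print(wiki_split.compute(sources=sources, predictions=predictions, references=references))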

Contributors:

To Do:

  • t5-base training on WikiSplit
  • t5-v1_1-base training on WikiSplit
  • byt5-base training on WikiSplit
  • t5-large training on WikiSplit
  • Streamlit UI for the app
  • Add a single WikiSplit evaluation metric to Hugging Face Datasets
  • Challenge: beat the performance of roberta2roberta_L-24_wikisplit
  • Improve performance through further research
  • Tackle gender bias and fairness in text generation
  • Benchmarking and experimenting with WebSplit