# Text Summarisation with Text-to-Text Transfer Transformer (T5) Model

## 1.0. Introduction

In today's digital age, news flows in an endless stream from various sources, and a great number of articles are published every day. Only a small part of each article is useful to a given reader, however, and picking that part out by hand is slow. As a result, it is hard to read every article and find the informative news manually. One solution to this problem is to summarise the text of each article.

<p align='center'>
    <img src="https://blog.fpt-software.com/hs-fs/hubfs/image-8.png?width=376&name=image-8.png" alt="Text Summarisation Visual" />
</p>

### 1.1. Problem Statement
Text summarisation automatically gives the reader a summary containing important sentences and relevant information about an article. This is highly useful because it shortens the time needed to capture the meaning and main events of an article. Broadly, there are 2 ways of performing text summarisation - abstractive and extractive. 

**Abstractive.** Abstractive methods analyse input texts and generate new texts that capture the essence of the original text. If trained correctly, they convey the same meaning as the original text, yet are more concise.

**Extractive.** Extractive methods, on the other hand, pick out the important sentences from the original text and join them to form a summary. Hence, they do not generate any new text.
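
To make the contrast concrete, here is a minimal sketch of an extractive summariser. It is not the method used in this assignment, and the word-frequency scoring and the `extractive_summary` helper are illustrative assumptions. The point is that every sentence it returns is copied verbatim from the input, whereas an abstractive model is free to generate new wording.

```python
import re
from collections import Counter


def extractive_summary(text: str, num_sentences: int = 1) -> str:
    """Return the highest-scoring sentences, copied verbatim from `text`."""
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Word frequencies over the whole text act as a crude importance signal.
    freqs = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence: str) -> float:
        words = re.findall(r"[a-z']+", sentence.lower())
        return sum(freqs[w] for w in words) / max(len(words), 1)

    # Rank sentences by score, keep the best ones, then restore document order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in chosen)


article = (
    "Floods hit the town on Monday. "
    "The floods damaged shops and the floods closed roads in the town. "
    "A cafe owner praised the response."
)
summary = extractive_summary(article)
print(summary)
```

Because the output is stitched together from existing sentences, it can never be shorter than the shortest sentence it selects, which is one reason abstractive methods suit one-sentence summaries better.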

In this assignment, we'll use the abstractive method to solve the following problem - **given a news article, can we return a succinct summary of the article?**

### 1.2. Abstractive Text Summarisation
Abstractive text summarisation can be achieved with transformer models. Specifically, we will apply transfer learning: a pre-trained model is fine-tuned on a downstream task.

Transfer learning, in the context of transformers, is very helpful in improving model performance. Fortunately, pre-trained transformers are readily available. Notable examples include Bidirectional Encoder Representations from Transformers (BERT) and the Text-to-Text Transfer Transformer (T5). For our assignment, we will use the T5 model. It is an encoder-decoder model that has been pre-trained on multiple types of tasks. As a result, it works well on a variety of tasks.

But before we use our pre-trained T5 model, we will want to fine-tune it on a supervised task: the model reads news articles as inputs and is trained against their respective authored summaries. Fortunately, there are datasets that offer such news summaries, e.g. CNN/Daily Mail and XSum. By fine-tuning the model, we ensure that it is better suited to our task of producing a succinct summary from a news article. **We'll be using XSum.**

**Why do we use the XSum dataset?** XSum stands for 'Extreme Summarisation' and it is a dataset for evaluating single-document summarisation systems. Each summary answers the question 'What is the article about?'. The dataset comprises 226,711 news articles, each accompanied by a one-sentence summary. The articles were collected from the BBC (2010 to 2017) and cover a wide variety of genres such as general news, politics, sports, weather, business, technology, science, health, family, education, entertainment and arts. With such a wide span of genres, it is a suitable dataset for fine-tuning our pre-trained model.

### 1.3. Environment
AWS EC2 Instance - Deep Learning AMI GPU TensorFlow 2.7.3 (Ubuntu 20.04). Instance type: c5.2xlarge

(Learn how to set up your deep learning workstation with AWS [here](https://medium.com/@bobbycxy/detailed-guide-to-connect-ec2-with-vscode-2c084c265e36?source=your_stories_page))


***

## 2.0. Data Preprocessing
### 2.1. Import Libraries

In [1]:
## import the needed libraries
import os
import logging

import nltk
import numpy as np
import tensorflow as tf
from tensorflow import keras

# Only log error messages
tf.get_logger().setLevel(logging.ERROR)

os.environ["TOKENIZERS_PARALLELISM"] = "false"

2023-10-25 08:29:04.189021: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


### 2.2. Prepare Key Parameters


In [2]:
# Fraction of the dataset to sample for each of the train and test splits
TRAIN_TEST_SPLIT = 0.1

# Training Parameters
MAX_INPUT_LENGTH = 1024  # Maximum length of the input to the model
BATCH_SIZE = 8  # Batch-size for training our model
LEARNING_RATE = 2e-5  # Learning-rate for training our model
MAX_EPOCHS = 1  # Maximum number of epochs we will train the model for

# Inference Parameters
MIN_TARGET_LENGTH = 5  # Minimum length of the output by the model
MAX_TARGET_LENGTH = 128  # Maximum length of the output by the model

# What type of model? We'll use the t5-small
MODEL = "t5-small"

### 2.3. Import Data
As mentioned in Section 1.2, we will use the XSum dataset. This dataset is available through Hugging Face's datasets library.

In [3]:
from datasets import load_dataset
raw_datasets = load_dataset("xsum", split="train")



Here, we'll explore how the dataset holds the article text and the article summary. We'll learn that the dataset holds the data in a tidy fashion that allows us to read and get the data easily.

In [4]:
import pandas as pd

df = [doc for doc in raw_datasets]
df = pd.DataFrame(df)

i = 0

print("DOCUMENT:\n", df.loc[i,'document'])
print('----------------------------------')
print("SUMMARY:\n", df.loc[i,'summary'])
print('----------------------------------')
print("Number of observations",len(df))

DOCUMENT:
 The full cost of damage in Newton Stewart, one of the areas worst affected, is still being assessed.
Repair work is ongoing in Hawick and many roads in Peeblesshire remain badly affected by standing water.
Trains on the west coast mainline face disruption due to damage at the Lamington Viaduct.
Many businesses and householders were affected by flooding in Newton Stewart after the River Cree overflowed into the town.
First Minister Nicola Sturgeon visited the area to inspect the damage.
The waters breached a retaining wall, flooding many commercial properties on Victoria Street - the main shopping thoroughfare.
Jeanette Tate, who owns the Cinnamon Cafe which was badly affected, said she could not fault the multi-agency response once the flood hit.
However, she said more preventative work could have been carried out to ensure the retaining wall did not fail.
"It is difficult but I do think there is so much publicity for Dumfries and the Nith - and I totally appreciate that - b

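Next, we take a small random sample of the data for training and testing. The train_test_split method shuffles the rows and draws 10% of them for the train set and another 10% for the test set. Note that this is a random split rather than a stratified one, since no stratification column is passed.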

In [5]:
raw_datasets = raw_datasets.train_test_split(
    train_size=TRAIN_TEST_SPLIT, test_size=TRAIN_TEST_SPLIT
)
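
As a rough illustration of what a fractional random split does, the sketch below shuffles row indices and carves out two disjoint subsets. It is a plain-Python stand-in, not the library's implementation. The rounding rule (floor for train, ceil for test) is an assumption, chosen because it reproduces the 20404 and 20405 row counts reported by the Map progress output later in this notebook. The helper names and the 204,045-row size of the XSum train split are likewise assumptions made for the example.

```python
import math
import random
from typing import List, Tuple


def split_sizes(n_samples: int, train_size: float, test_size: float) -> Tuple[int, int]:
    """Convert fractional split sizes to row counts (floor for train, ceil for test)."""
    n_train = math.floor(train_size * n_samples)
    n_test = math.ceil(test_size * n_samples)
    return n_train, n_test


def random_split(
    n_samples: int, train_size: float, test_size: float, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Shuffle row indices once, then take disjoint test and train index lists."""
    n_train, n_test = split_sizes(n_samples, train_size, test_size)
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    test_idx = indices[:n_test]
    train_idx = indices[n_test:n_test + n_train]
    return train_idx, test_idx


N_XSUM_TRAIN = 204_045  # assumed number of rows in the XSum "train" split
train_idx, test_idx = random_split(N_XSUM_TRAIN, 0.1, 0.1)
print(len(train_idx), len(test_idx))  # 20404 20405
```

Nothing here looks at the content of the rows, which is why the result is a random sample and not a stratified one. The remaining 80% of the rows are simply left unused.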

## 3.0. Data Preprocessing

Before we train our model, we need to pre-process our inputs. A key step is tokenising the inputs, as well as converting these strings into their respective IDs. We can do this easily by taking a pre-trained tokenizer from the Hugging Face Model Hub.

In [6]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(MODEL)

In [7]:
## what does the tokenizer return?
## it returns a dictionary of 2 keys - input ids, and the attention mask.

tokenizer('What does a tokenizer object return when you feed it a string?')

{'input_ids': [363, 405, 3, 9, 14145, 8585, 3735, 1205, 116, 25, 3305, 34, 3, 9, 6108, 58, 1], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}

**Note:** When using T5, we should prepend the task prefix 'summarize: ' to each input. If the model were meant for another task such as translation, we would adjust the prefix accordingly.

In [10]:
if MODEL in ["t5-small", "t5-base", "t5-large", "t5-3b", "t5-11b"]:
    prefix = "summarize: "
else:
    prefix = ""

### 3.1. Helper Function

Here, we want to have a function to pre-process the huggingface data. Concretely, we want it to tokenise the inputs and the targets, and return a dictionary of keys - input_ids, attention_mask and labels.

In [11]:
def preprocess(examples):
    # tokenise inputs
    inputs = [prefix + doc for doc in examples["document"]]
    model_inputs = tokenizer(inputs, max_length=MAX_INPUT_LENGTH, truncation=True)

    # tokenise targets (text_target replaces the deprecated as_target_tokenizer())
    labels = tokenizer(
        text_target=examples["summary"], max_length=MAX_TARGET_LENGTH, truncation=True
    )

    model_inputs["labels"] = labels["input_ids"]

    return model_inputs

In [12]:
tokenized_datasets = raw_datasets.map(preprocess, batched=True)


Map: 100%|██████████| 20404/20404 [00:31<00:00, 643.21 examples/s]
Map: 100%|██████████| 20405/20405 [00:31<00:00, 637.66 examples/s]


In [13]:
train_df = [doc for doc in tokenized_datasets['train']]
train_df = pd.DataFrame(train_df)
train_df.head()

| | document | summary | id | input_ids | attention_mask | labels |
|---|---|---|---|---|---|---|
| 0 | The 22-year-old was part of the England Under-... | Watford have signed midfielder Nathaniel Chalo... | 40603257 | [21603, 10, 37, 1630, 18, 1201, 18, 1490, 47, ... | [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... | [3129, 17, 2590, 43, 3814, 2076, 1846, 49, 180... |
| 1 | Garcia started slowly but controlled the later... | Danny Garcia won the vacant WBC welterweight t... | 35394450 | [21603, 10, 22373, 708, 5665, 68, 6478, 8, 865... | [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... | [19445, 22373, 751, 8, 14333, 549, 7645, 3, 93... |
| 2 | Samuel Hertz will spend a year working to crea... | A composer who plans to create a work of music... | 38962932 | [21603, 10, 15718, 216, 25075, 56, 1492, 3, 9,... | [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... | [71, 13075, 113, 1390, 12, 482, 3, 9, 161, 13,... |
| 3 | The message from Craig was displayed for Linsa... | A couple are getting hitched after a proposal ... | 39337294 | [21603, 10, 37, 1569, 45, 12870, 47, 6099, 21,... | [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... | [71, 1158, 33, 652, 1560, 4513, 227, 3, 9, 638... |
| 4 | Property owners maintain that cracks appearing... | Residents in part of Swindon have said they fe... | 26291708 | [21603, 10, 8648, 2713, 1961, 24, 5261, 7, 160... | [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... | [24998, 16, 294, 13, 180, 5165, 106, 43, 243, ... |


## 4.0. Model Building

Auto Classes help users to retrieve relevant models in an intuitive manner. In addition, our text summarisation fine tuning exercise involves the use of sequences for the input and the output. Hence, we'll want to use the TFAutoModelForSeq2SeqLM. Loading model weights is simple with the '.from_pretrained()' method.

In [10]:
from transformers import TFAutoModelForSeq2SeqLM

model = TFAutoModelForSeq2SeqLM.from_pretrained(MODEL)

2023-10-24 14:43:40.504782: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:268] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
All model checkpoint layers were used when initializing TFT5ForConditionalGeneration.

All the layers of TFT5ForConditionalGeneration were initialized from the model checkpoint at t5-small.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFT5ForConditionalGeneration for predictions without further training.


Another important step is padding our inputs, so that all inputs and targets within a batch share the same length. To achieve this efficiently, we can use the DataCollatorForSeq2Seq.

In [11]:
from transformers import DataCollatorForSeq2Seq

data_collator = DataCollatorForSeq2Seq(tokenizer, model=model, return_tensors="tf")
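
To show what per-batch padding amounts to, here is a simplified pure-Python sketch. It is not the collator's actual code, and `pad_batch` is a hypothetical helper. It pads inputs with T5's pad token id (0) and pads labels with -100, the collator's default label padding value, which the loss ignores. The real collator can also build decoder inputs when a model is passed; that step is omitted here.

```python
from typing import Dict, List

PAD_TOKEN_ID = 0     # T5's pad token id
LABEL_PAD_ID = -100  # label padding value that the loss is expected to ignore


def pad_batch(features: List[Dict[str, List[int]]]) -> Dict[str, List[List[int]]]:
    """Right-pad every field to the longest sequence found in this batch."""
    max_in = max(len(f["input_ids"]) for f in features)
    max_lab = max(len(f["labels"]) for f in features)
    batch: Dict[str, List[List[int]]] = {"input_ids": [], "attention_mask": [], "labels": []}
    for f in features:
        in_pad = max_in - len(f["input_ids"])
        lab_pad = max_lab - len(f["labels"])
        batch["input_ids"].append(f["input_ids"] + [PAD_TOKEN_ID] * in_pad)
        # 1 marks a real token, 0 marks padding the model should not attend to.
        batch["attention_mask"].append([1] * len(f["input_ids"]) + [0] * in_pad)
        batch["labels"].append(f["labels"] + [LABEL_PAD_ID] * lab_pad)
    return batch


# Two toy examples of different lengths (token ids are illustrative).
features = [
    {"input_ids": [21603, 10, 37, 1], "labels": [71, 1]},
    {"input_ids": [21603, 10, 1], "labels": [71, 1158, 33, 1]},
]
batch = pad_batch(features)
print(batch["input_ids"])       # [[21603, 10, 37, 1], [21603, 10, 1, 0]]
print(batch["attention_mask"])  # [[1, 1, 1, 1], [1, 1, 1, 0]]
print(batch["labels"])          # [[71, 1, -100, -100], [71, 1158, 33, 1]]
```

Padding to the longest sequence in each batch, instead of to a fixed global length, keeps batches as short as possible. The negative label padding is also why the metric function later replaces negative label values with the pad token id before decoding.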

Finally, we can prepare the train and test datasets. The generation dataset, a random sample of 200 test examples, is what we'll use to calculate our evaluation metric score while the model is training.

In [12]:
train_dataset = tokenized_datasets["train"].to_tf_dataset(
    batch_size=BATCH_SIZE,
    columns=["input_ids", "attention_mask", "labels"],
    shuffle=True,
    collate_fn=data_collator,
)

test_dataset = tokenized_datasets["test"].to_tf_dataset(
    batch_size=BATCH_SIZE,
    columns=["input_ids", "attention_mask", "labels"],
    shuffle=False,
    collate_fn=data_collator,
)

## to calculate our ROUGE score
generation_dataset = (
    tokenized_datasets["test"].shuffle().select(list(range(200))).to_tf_dataset(
        batch_size=BATCH_SIZE,
        columns=["input_ids", "attention_mask", "labels"],
        shuffle=False,
        collate_fn=data_collator,
    )
)
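
As a quick sanity check on these dataset sizes, the arithmetic below estimates how many batches each dataset yields per pass. The example counts 20404 and 20405 come from the Map output above. Keeping the final partial batch (`drop_remainder=False`) is an assumption about the dataset conversion, and `batches_per_epoch` is a hypothetical helper.

```python
import math

BATCH_SIZE = 8  # same value as in Section 2.2


def batches_per_epoch(num_examples: int, batch_size: int, drop_remainder: bool = False) -> int:
    """Number of batches in one pass over the data."""
    if drop_remainder:
        # The last, smaller batch is discarded.
        return num_examples // batch_size
    # The last, smaller batch is kept.
    return math.ceil(num_examples / batch_size)


print(batches_per_epoch(20404, BATCH_SIZE))  # train: 2551
print(batches_per_epoch(20405, BATCH_SIZE))  # test: 2551
print(batches_per_epoch(200, BATCH_SIZE))    # generation: 25
```

The generation dataset is kept small because scoring it requires running text generation for every example, which is much slower than a plain forward pass.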

## 5.0. Compiling and Training the Model

In [13]:
optimizer = keras.optimizers.Adam(learning_rate=LEARNING_RATE)
model.compile(optimizer=optimizer)

No loss specified in compile() - the model's internal loss computation will be used as the loss. Don't panic - this is a common way to train TensorFlow models in Transformers! To disable this behaviour please pass a loss argument, or explicitly pass `loss=None` if you do not want your model to compute a loss.


Here, we'll use the ROUGE-L metric to evaluate the model. The ROUGE-L metric is a score from 0 to 1 indicating how similar two sequences are, based on the length of the longest common subsequence (LCS). In particular, ROUGE-L is the weighted harmonic mean (or F-measure) combining the LCS precision (the percentage of the hypothesis sequence covered by the LCS) and the LCS recall (the percentage of the reference sequence covered by the LCS). For more information, use this [link](https://www.tensorflow.org/text/tutorials/text_similarity#rouge-l).

In [14]:
import keras_nlp

rouge_l = keras_nlp.metrics.RougeL()


def metric_fn(eval_predictions):
    predictions, labels = eval_predictions
    decoded_predictions = tokenizer.batch_decode(predictions, skip_special_tokens=True)
    for label in labels:
        label[label < 0] = tokenizer.pad_token_id  # Replace masked label tokens
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    result = rouge_l(decoded_labels, decoded_predictions)
    # We will return only the F1 score; you can use other aggregation metrics as well
    result = {"RougeL": result["f1_score"]}

    return result
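
To make the ROUGE-L definition concrete, here is a minimal pure-Python sketch that computes the LCS length with dynamic programming and combines precision and recall into an F1 score. It assumes lowercased whitespace tokenisation and equal weighting of precision and recall. The keras_nlp metric may tokenise differently, so its scores need not match these exactly.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence, via dynamic programming."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, hypothesis: str) -> float:
    """ROUGE-L F1: harmonic mean of LCS precision and LCS recall."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    lcs = lcs_length(ref, hyp)
    if lcs == 0:
        return 0.0
    precision = lcs / len(hyp)  # share of the hypothesis covered by the LCS
    recall = lcs / len(ref)     # share of the reference covered by the LCS
    return 2 * precision * recall / (precision + recall)


reference = "the cat sat on the mat"
hypothesis = "the cat is on the mat"
score = rouge_l_f1(reference, hypothesis)
print(round(score, 4))  # 0.8333
```

In this example the LCS is "the cat on the mat" (5 tokens), and both sequences have 6 tokens, so precision and recall are both 5/6. Because a subsequence need not be contiguous, ROUGE-L rewards summaries that keep the reference's words in the same order even when other words are inserted in between.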

In [15]:
from transformers.keras_callbacks import KerasMetricCallback

metric_callback = KerasMetricCallback(
    metric_fn, eval_dataset=generation_dataset, predict_with_generate=True
)

callbacks = [metric_callback]

# For now we will use our test set as our validation_data
model.fit(
    train_dataset, validation_data=test_dataset, epochs=MAX_EPOCHS, callbacks=callbacks
)





<keras.src.callbacks.History at 0x7faa649978b0>

In [16]:
model.fit(
    train_dataset, validation_data=test_dataset, epochs=2, callbacks=callbacks
)

Epoch 1/2
Epoch 2/2


<keras.src.callbacks.History at 0x7faa6503ddf0>

### 5.1. Save the Model and Tokenizer for reusing

In [21]:
## Save the model and tokenizer
model.save_pretrained("t5_xsum_trained_model_ec2")
tokenizer.save_pretrained("t5_xsum_trained_model_ec2_tokenizer")

('t5_xsum_trained_model_ec2_tokenizer/tokenizer_config.json',
 't5_xsum_trained_model_ec2_tokenizer/special_tokens_map.json',
 't5_xsum_trained_model_ec2_tokenizer/tokenizer.json')

## 6.0. Inferencing

In [17]:
from transformers import pipeline

summarizer = pipeline("summarization", model=model, tokenizer=tokenizer, framework="tf")

index = 0
print(raw_datasets["test"][index]["document"])
print('----------------------------------------------------------------')

result = summarizer(raw_datasets["test"][index]["document"],
                    min_length=MIN_TARGET_LENGTH,
                    max_length=MAX_TARGET_LENGTH)

print(result[0]['summary_text'])

The Prince has been visiting Queen's University's cyber security unit at the  Science Park in Belfast's Titanic Quarter.
Among the cyber security on show is a system that prevents hackers from accessing water and electricity supplies.
Prince Charles will be joined by the Duchess of Cornwall on Tuesday.
Among those who met Prince Charles during his visit to the cyber security unit were First Minister Arlene Foster, East Belfast MP Gavin Robinson and Deputy Lord Mayor Guy Spence.
The Prince was accompanied by Lord Lieutenant Fionnuala Jay O'Boyle and Secretary of State Theresa Villiers.
----------------------------------------------------------------
Prince Charles will be joined by the Duchess of Cornwall on a visit to the Queen's University cyber security unit.


### 6.1. Test it Out

In [18]:
text_to_summarise = """SINGAPORE – In an attempt to evade arrest, a doctor who drove a car after drinking beer tried to change seats with his passenger when he spotted a police roadblock.

The passenger refused to do so and Nah Kwang Meng, who practises at Dr Nah & Lee Family Clinic in Woodlands, initially failed a breathalyser test after he stepped out of the vehicle.

He was later found to have 32 micrograms (mcg) of alcohol in 100ml of breath – below the prescribed legal limit of 35mcg.

Even though he had not been drink driving, Nah, 41, was fined $4,000 on Friday after he pleaded guilty to one count of attempting to perform an act that could pervert the course of justice.

Assistant Public Prosecutor Chye Jer Yuan told the court that before going behind the wheel on July 14, 2022, Nah had dinner and consumed about three to four glasses of beer.

He was driving along Sophia Road towards Upper Wilkie Road shortly before 11.30pm when he spotted a police roadblock.

The prosecutor said: “The accused requested his front-seat passenger to swop seats with him, so that he would not be presented as the driver of the vehicle at the roadblock."""

print(text_to_summarise)
print('----------------------------------------------------------------')

result = summarizer(text_to_summarise,
                    min_length=MIN_TARGET_LENGTH,
                    max_length=MAX_TARGET_LENGTH)

print(result[0]['summary_text'])

SINGAPORE – In an attempt to evade arrest, a doctor who drove a car after drinking beer tried to change seats with his passenger when he spotted a police roadblock.

The passenger refused to do so and Nah Kwang Meng, who practises at Dr Nah & Lee Family Clinic in Woodlands, initially failed a breathalyser test after he stepped out of the vehicle.

He was later found to have 32 micrograms (mcg) of alcohol in 100ml of breath – below the prescribed legal limit of 35mcg.

Even though he had not been drink driving, Nah, 41, was fined $4,000 on Friday after he pleaded guilty to one count of attempting to perform an act that could pervert the course of justice.

Assistant Public Prosecutor Chye Jer Yuan told the court that before going behind the wheel on July 14, 2022, Nah had dinner and consumed about three to four glasses of beer.

He was driving along Sophia Road towards Upper Wilkie Road shortly before 11.30pm when he spotted a police roadblock.

The prosecutor said: “The accused request

In [19]:
text_to_summarise = """Oct 19 (Reuters) - Three Palestinians, including two teenagers, were killed by Israeli forces in separate incidents in the occupied West Bank early on Thursday, Palestinian official news agency WAFA said.

Israeli forces stormed the village of Budrus, west of Ramallah, shooting dead a young man, Gebriel Awad, and wounding another, WAFA said.

In other incidents, a 14-year-old was killed by a bullet wound in the head in a refugee camp south of Bethlehem and a 16-year-old succumbed to his wounds after being shot in the town of Tulkarm, the news agency added.

There was no immediate comment from Israel.

Dozens of Palestinians have been killed in the West Bank in the latest flare-up of Israeli-Palestinian violence.

Israel is preparing a ground assault in the Gaza Strip in response to a deadly attack by Palestinian militant group Hamas that killed at least 1,400 Israelis, mostly civilians, on Oct. 7.

Israeli forces have carried out their fiercest bombardment of Gaza in response, killing more than 3,000 Palestinians and imposing a total siege on the blockaded enclave that Hamas controls, fuelling anger among Palestinians in the West Bank."""

print(text_to_summarise)
print('----------------------------------------------------------------')

result = summarizer(text_to_summarise,
                    min_length=MIN_TARGET_LENGTH,
                    max_length=MAX_TARGET_LENGTH)

print(result[0]['summary_text'])

Oct 19 (Reuters) - Three Palestinians, including two teenagers, were killed by Israeli forces in separate incidents in the occupied West Bank early on Thursday, Palestinian official news agency WAFA said.

Israeli forces stormed the village of Budrus, west of Ramallah, shooting dead a young man, Gebriel Awad, and wounding another, WAFA said.

In other incidents, a 14-year-old was killed by a bullet wound in the head in a refugee camp south of Bethlehem and a 16-year-old succumbed to his wounds after being shot in the town of Tulkarm, the news agency added.

There was no immediate comment from Israel.

Dozens of Palestinians have been killed in the West Bank in the latest flare-up of Israeli-Palestinian violence.

Israel is preparing a ground assault in the Gaza Strip in response to a deadly attack by Palestinian militant group Hamas that killed at least 1,400 Israelis, mostly civilians, on Oct. 7.

Israeli forces have carried out their fiercest bombardment of Gaza in response, killing 

In [20]:
text_to_summarise = """Get ready to look perfect as you're thinking out loud on Feb 16, 2024 at Ed Sheeran's concert. The Grammy Award-winning singer is heading to Singapore for a one-night show at the National Stadium. Plus, he's bringing along English singer Calum Scott as a guest.

Tickets for the concert will cost between S$88 and S$488 and can be purchased via Ticketmaster and at SingPost outlets.

If you signed up for a UOB card for Taylor Swift's concert and didn't cancel your membership afterward, here's some great news. UOB cardholders can enjoy a presale from 10am on Oct 27 till 9.59 am on Oct 29.

A second presale will be held for KrisFlyer members from 10am on Oct 30 to 9.59am on Oct 31. To get in on this presale, KrisFlyer UOB credit and debit cardholders will need to subscribe to receive KrisFlyer and SIA Group promotional emails via their KrisFlyer account preferences. They will then receive a unique access code from KrisFlyer via email on Oct 27. 

Members who are not KrisFlyer UOB credit or debit cardholders can download Kris+, the SIA Group’s lifestyle rewards app and spend 150 miles between Oct 20 and 25 to redeem a unique access code. Do note that redemptions are limited to the first 110,000 customers.

Alternatively, those with loads of miles to spare can opt to redeem Categories 1 to 4 concert tickets using their miles via KrisFlyer Experiences from Oct 30. Tickets from Categories 1 to 4 may be redeemed with 49,000; 38,000; 29,000 and 19,000 miles, respectively.

General sale will commence from 11am on Oct 31."""

print(text_to_summarise)
print('----------------------------------------------------------------')

result = summarizer(text_to_summarise,
                    min_length=MIN_TARGET_LENGTH,
                    max_length=MAX_TARGET_LENGTH)

print(result[0]['summary_text'])

Get ready to look perfect as you're thinking out loud on Feb 16, 2024 at Ed Sheeran's concert. The Grammy Award-winning singer is heading to Singapore for a one-night show at the National Stadium. Plus, he's bringing along English singer Calum Scott as a guest.

Tickets for the concert will cost between S$88 and S$488 and can be purchased via Ticketmaster and at SingPost outlets.

If you signed up for a UOB card for Taylor Swift's concert and didn't cancel your membership afterward, here's some great news. UOB cardholders can enjoy a presale from 10am on Oct 27 till 9.59 am on Oct 29.

A second presale will be held for KrisFlyer members from 10am on Oct 30 to 9.59am on Oct 31. To get in on this presale, KrisFlyer UOB credit and debit cardholders will need to subscribe to receive KrisFlyer and SIA Group promotional emails via their KrisFlyer account preferences. They will then receive a unique access code from KrisFlyer via email on Oct 27. 

Members who are not KrisFlyer UOB credit or 

## 7.0. Loading and Inferencing from Saved Model and Tokenizer

In [16]:
from transformers import TFAutoModelForSeq2SeqLM

model_load_inf = TFAutoModelForSeq2SeqLM.from_pretrained('t5_xsum_trained_model_ec2')

All model checkpoint layers were used when initializing TFT5ForConditionalGeneration.

All the layers of TFT5ForConditionalGeneration were initialized from the model checkpoint at t5_xsum_trained_model_ec2.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFT5ForConditionalGeneration for predictions without further training.


In [17]:
from transformers import AutoTokenizer

tokenizer_load_inf = AutoTokenizer.from_pretrained('t5_xsum_trained_model_ec2_tokenizer')

In [18]:
from transformers import pipeline

summarizer_load_inf = pipeline("summarization", model=model_load_inf, tokenizer=tokenizer_load_inf, framework="tf")

In [19]:
text_to_summarise = """SINGAPORE – In an attempt to evade arrest, a doctor who drove a car after drinking beer tried to change seats with his passenger when he spotted a police roadblock.

The passenger refused to do so and Nah Kwang Meng, who practises at Dr Nah & Lee Family Clinic in Woodlands, initially failed a breathalyser test after he stepped out of the vehicle.

He was later found to have 32 micrograms (mcg) of alcohol in 100ml of breath – below the prescribed legal limit of 35mcg.

Even though he had not been drink driving, Nah, 41, was fined $4,000 on Friday after he pleaded guilty to one count of attempting to perform an act that could pervert the course of justice.

Assistant Public Prosecutor Chye Jer Yuan told the court that before going behind the wheel on July 14, 2022, Nah had dinner and consumed about three to four glasses of beer.

He was driving along Sophia Road towards Upper Wilkie Road shortly before 11.30pm when he spotted a police roadblock.

The prosecutor said: “The accused requested his front-seat passenger to swop seats with him, so that he would not be presented as the driver of the vehicle at the roadblock."""

print(text_to_summarise)
print('----------------------------------------------------------------')

result = summarizer_load_inf(text_to_summarise,
                    min_length=MIN_TARGET_LENGTH,
                    max_length=MAX_TARGET_LENGTH)

print(result[0]['summary_text'])

SINGAPORE – In an attempt to evade arrest, a doctor who drove a car after drinking beer tried to change seats with his passenger when he spotted a police roadblock.

The passenger refused to do so and Nah Kwang Meng, who practises at Dr Nah & Lee Family Clinic in Woodlands, initially failed a breathalyser test after he stepped out of the vehicle.

He was later found to have 32 micrograms (mcg) of alcohol in 100ml of breath – below the prescribed legal limit of 35mcg.

Even though he had not been drink driving, Nah, 41, was fined $4,000 on Friday after he pleaded guilty to one count of attempting to perform an act that could pervert the course of justice.

Assistant Public Prosecutor Chye Jer Yuan told the court that before going behind the wheel on July 14, 2022, Nah had dinner and consumed about three to four glasses of beer.

He was driving along Sophia Road towards Upper Wilkie Road shortly before 11.30pm when he spotted a police roadblock.

The prosecutor said: “The accused request

In [20]:
text_to_summarise = """Oct 19 (Reuters) - Three Palestinians, including two teenagers, were killed by Israeli forces in separate incidents in the occupied West Bank early on Thursday, Palestinian official news agency WAFA said.

Israeli forces stormed the village of Budrus, west of Ramallah, shooting dead a young man, Gebriel Awad, and wounding another, WAFA said.

In other incidents, a 14-year-old was killed by a bullet wound in the head in a refugee camp south of Bethlehem and a 16-year-old succumbed to his wounds after being shot in the town of Tulkarm, the news agency added.

There was no immediate comment from Israel.

Dozens of Palestinians have been killed in the West Bank in the latest flare-up of Israeli-Palestinian violence.

Israel is preparing a ground assault in the Gaza Strip in response to a deadly attack by Palestinian militant group Hamas that killed at least 1,400 Israelis, mostly civilians, on Oct. 7.

Israeli forces have carried out their fiercest bombardment of Gaza in response, killing more than 3,000 Palestinians and imposing a total siege on the blockaded enclave that Hamas controls, fuelling anger among Palestinians in the West Bank."""

print(text_to_summarise)
print('----------------------------------------------------------------')

result = summarizer_load_inf(text_to_summarise,
                    min_length=MIN_TARGET_LENGTH,
                    max_length=MAX_TARGET_LENGTH)

print(result[0]['summary_text'])

Oct 19 (Reuters) - Three Palestinians, including two teenagers, were killed by Israeli forces in separate incidents in the occupied West Bank early on Thursday, Palestinian official news agency WAFA said.

Israeli forces stormed the village of Budrus, west of Ramallah, shooting dead a young man, Gebriel Awad, and wounding another, WAFA said.

In other incidents, a 14-year-old was killed by a bullet wound in the head in a refugee camp south of Bethlehem and a 16-year-old succumbed to his wounds after being shot in the town of Tulkarm, the news agency added.

There was no immediate comment from Israel.

Dozens of Palestinians have been killed in the West Bank in the latest flare-up of Israeli-Palestinian violence.

Israel is preparing a ground assault in the Gaza Strip in response to a deadly attack by Palestinian militant group Hamas that killed at least 1,400 Israelis, mostly civilians, on Oct. 7.

Israeli forces have carried out their fiercest bombardment of Gaza in response, killing 

In [21]:
text_to_summarise = """Get ready to look perfect as you're thinking out loud on Feb 16, 2024 at Ed Sheeran's concert. The Grammy Award-winning singer is heading to Singapore for a one-night show at the National Stadium. Plus, he's bringing along English singer Calum Scott as a guest.

Tickets for the concert will cost between S$88 and S$488 and can be purchased via Ticketmaster and at SingPost outlets.

If you signed up for a UOB card for Taylor Swift's concert and didn't cancel your membership afterward, here's some great news. UOB cardholders can enjoy a presale from 10am on Oct 27 till 9.59 am on Oct 29.

A second presale will be held for KrisFlyer members from 10am on Oct 30 to 9.59am on Oct 31. To get in on this presale, KrisFlyer UOB credit and debit cardholders will need to subscribe to receive KrisFlyer and SIA Group promotional emails via their KrisFlyer account preferences. They will then receive a unique access code from KrisFlyer via email on Oct 27. 

Members who are not KrisFlyer UOB credit or debit cardholders can download Kris+, the SIA Group’s lifestyle rewards app and spend 150 miles between Oct 20 and 25 to redeem a unique access code. Do note that redemptions are limited to the first 110,000 customers.

Alternatively, those with loads of miles to spare can opt to redeem Categories 1 to 4 concert tickets using their miles via KrisFlyer Experiences from Oct 30. Tickets from Categories 1 to 4 may be redeemed with 49,000; 38,000; 29,000 and 19,000 miles, respectively.

General sale will commence from 11am on Oct 31."""

print(text_to_summarise)
print('----------------------------------------------------------------')

result = summarizer_load_inf(text_to_summarise,
                    min_length=MIN_TARGET_LENGTH,
                    max_length=MAX_TARGET_LENGTH)

print(result[0]['summary_text'])

Get ready to look perfect as you're thinking out loud on Feb 16, 2024 at Ed Sheeran's concert. The Grammy Award-winning singer is heading to Singapore for a one-night show at the National Stadium. Plus, he's bringing along English singer Calum Scott as a guest.

Tickets for the concert will cost between S$88 and S$488 and can be purchased via Ticketmaster and at SingPost outlets.

If you signed up for a UOB card for Taylor Swift's concert and didn't cancel your membership afterward, here's some great news. UOB cardholders can enjoy a presale from 10am on Oct 27 till 9.59 am on Oct 29.

A second presale will be held for KrisFlyer members from 10am on Oct 30 to 9.59am on Oct 31. To get in on this presale, KrisFlyer UOB credit and debit cardholders will need to subscribe to receive KrisFlyer and SIA Group promotional emails via their KrisFlyer account preferences. They will then receive a unique access code from KrisFlyer via email on Oct 27. 

Members who are not KrisFlyer UOB credit or 

## 8.0. Conclusion

Our methodology of fine-tuning a pre-trained T5 model on the supervised task of news summarisation has produced rather decent summaries of the 3 given news articles: article1, article2 and article3. Article1's summary missed the celebrity details, but there was notable agreement between article2 and article3 and their respective summaries. All of this was achieved with 3 training epochs, fine-tuning a small T5 model on AWS EC2.

However, the average ROUGE-L score of 0.20 is low, which means the generated summaries share only a small overlap of words with the reference summaries. This suggests that there is room to improve the existing model. Some ways to do so are using a bigger T5 model, adjusting the learning rate, and training for more epochs on the entire training dataset.