<h1><b><p style="background-image: url(https://i.postimg.cc/0Qwf8YX6/2927262.jpg);font-family:camtasia;font-size:110%;color:white;text-align:center;border-radius:15px 50px; padding:7px; border:solid 2px #09375b; box-shadow: 10px 10px 10px #042b4c">Project Title: Text Summarization Using T5-FineTuning</b></h1>



### Tabel of Contents:



* [Import Libraries](#1)

* [Load Dataset](#2)

* [T5 Model](#4)

* [Evaluation](#3)

* [Predictive for Test](#8)

* [Like this? Upvote and comment! 🌊 End](#6)

**<a id="1"></a>

<h1><b><p style="background-image: url(https://i.postimg.cc/0Qwf8YX6/2927262.jpg);font-family:camtasia;font-size:110%;color:white;text-align:center;border-radius:15px 50px; padding:7px; border:solid 2px #09375b; box-shadow: 10px 10px 10px #042b4c">Import Libraries</p></b></h1>

<a class="btn" href="#home">Tabel of Contents</a>

In [1]:
!pip install transformers datasets wandb



In [2]:
import pandas as pd
import torch
import re
from datasets import load_dataset, Dataset
from transformers import T5Tokenizer, T5ForConditionalGeneration, Trainer, TrainingArguments
from huggingface_hub import login
import wandb
wandb.init(mode = "disabled")
import warnings
warnings.filterwarnings('ignore')

**<a id="2"></a>

<h1><b><p style="background-image: url(https://i.postimg.cc/0Qwf8YX6/2927262.jpg);font-family:camtasia;font-size:110%;color:white;text-align:center;border-radius:15px 50px; padding:7px; border:solid 2px #09375b; box-shadow: 10px 10px 10px #042b4c">Load Dataset</p></b></h1>

<a class="btn" href="#home">Tabel of Contents</a>

In [3]:
# Load dataset (example, adjust path as needed)
train_data = pd.read_csv("/kaggle/input/samsum-dataset-text-summarization/samsum-train.csv")
validation_data = pd.read_csv("/kaggle/input/samsum-dataset-text-summarization/samsum-validation.csv")
test_data = pd.read_csv("/kaggle/input/samsum-dataset-text-summarization/samsum-test.csv")
# Display a sample
train_data.head()

Unnamed: 0,id,dialogue,summary
0,13818513,Amanda: I baked cookies. Do you want some?\r\...,Amanda baked cookies and will bring Jerry some...
1,13728867,Olivia: Who are you voting for in this electio...,Olivia and Olivier are voting for liberals in ...
2,13681000,"Tim: Hi, what's up?\r\nKim: Bad mood tbh, I wa...",Kim may try the pomodoro technique recommended...
3,13730747,"Edward: Rachel, I think I'm in ove with Bella....",Edward thinks he is in love with Bella. Rachel...
4,13728094,Sam: hey overheard rick say something\r\nSam:...,"Sam is confused, because he overheard Rick com..."


In [4]:
test_data.head()

Unnamed: 0,id,dialogue,summary
0,13862856,"Hannah: Hey, do you have Betty's number?\nAman...",Hannah needs Betty's number but Amanda doesn't...
1,13729565,Eric: MACHINE!\r\nRob: That's so gr8!\r\nEric:...,Eric and Rob are going to watch a stand-up on ...
2,13680171,"Lenny: Babe, can you help me with something?\r...",Lenny can't decide which trousers to buy. Bob ...
3,13729438,"Will: hey babe, what do you want for dinner to...",Emma will be home soon and she will let Will k...
4,13828600,"Ollie: Hi , are you in Warsaw\r\nJane: yes, ju...",Jane is in Warsaw. Ollie and Jane has a party....


In [5]:
validation_data.head()

Unnamed: 0,id,dialogue,summary
0,13817023,"A: Hi Tom, are you busy tomorrow’s afternoon?\...",A will go to the animal shelter tomorrow to ge...
1,13716628,Emma: I’ve just fallen in love with this adven...,Emma and Rob love the advent calendar. Lauren ...
2,13829420,Jackie: Madison is pregnant\r\nJackie: but she...,Madison is pregnant but she doesn't want to ta...
3,13819648,Marla: <file_photo>\r\nMarla: look what I foun...,Marla found a pair of boxers under her bed.
4,13728448,Robert: Hey give me the address of this music ...,Robert wants Fred to send him the address of t...


In [6]:
def clean_text(text):
    text = str(text)
    text = re.sub(r'\r\n', ' ', text)
    text = re.sub(r'\s+', ' ', text)
    text = re.sub(r'<.*?>', '', text)
    text = text.strip().lower()
    return text

# Apply cleaning to dialogue and summary columns
train_data['dialogue'] = train_data['dialogue'].apply(clean_text)
train_data['summary'] = train_data['summary'].apply(clean_text)

test_data['dialogue'] = test_data['dialogue'].apply(clean_text)
test_data['summary'] = test_data['summary'].apply(clean_text)

validation_data['dialogue'] = validation_data['dialogue'].apply(clean_text)
validation_data['summary'] = validation_data['summary'].apply(clean_text)
train_data.head()

Unnamed: 0,id,dialogue,summary
0,13818513,amanda: i baked cookies. do you want some? jer...,amanda baked cookies and will bring jerry some...
1,13728867,olivia: who are you voting for in this electio...,olivia and olivier are voting for liberals in ...
2,13681000,"tim: hi, what's up? kim: bad mood tbh, i was g...",kim may try the pomodoro technique recommended...
3,13730747,"edward: rachel, i think i'm in ove with bella....",edward thinks he is in love with bella. rachel...
4,13728094,sam: hey overheard rick say something sam: i d...,"sam is confused, because he overheard rick com..."


In [7]:
train_dataset = Dataset.from_pandas(train_data)
test_dataset = Dataset.from_pandas(test_data)
validation_dataset = Dataset.from_pandas(validation_data)
train_dataset

Dataset({
    features: ['id', 'dialogue', 'summary'],
    num_rows: 14732
})

<a id="4"></a>

<h1><b><p style="background-image: url(https://i.postimg.cc/0Qwf8YX6/2927262.jpg);font-family:camtasia;font-size:110%;color:white;text-align:center;border-radius:15px 50px; padding:7px; border:solid 2px #09375b; box-shadow: 10px 10px 10px #042b4c">T5 Model</p></b></h1>

<a class="btn" href="#home">Tabel of Contents</a>

In [8]:
# Initialize tokenizer and model
model_name = "t5-small"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

tokenizer_config.json:   0%|          | 0.00/2.32k [00:00<?, ?B/s]

spiece.model:   0%|          | 0.00/792k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.39M [00:00<?, ?B/s]

You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565


config.json:   0%|          | 0.00/1.21k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/242M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/147 [00:00<?, ?B/s]

In [9]:
input_max_len = max(len(tokenizer.encode(text)) for text in train_data['dialogue'])
output_max_len = max(len(tokenizer.encode(text)) for text in train_data['summary'])
print(f"Calculated Input max_length: {input_max_len}")
print(f"Calculated Output max_length: {output_max_len}")

Token indices sequence length is longer than the specified maximum sequence length for this model (594 > 512). Running this sequence through the model will result in indexing errors


Calculated Input max_length: 1224
Calculated Output max_length: 108


In [10]:
# Tokenization function
def tokenize_function(examples):
    # Tokenize the dialogue and summary
    inputs = tokenizer(examples["dialogue"], padding="max_length", truncation=True, max_length=512)
    targets = tokenizer(examples["summary"], padding="max_length", truncation=True, max_length=150)
    inputs["labels"] = targets["input_ids"]
    return inputs
# Tokenize datasets
train_dataset = train_dataset.map(tokenize_function, batched=True)
test_dataset = test_dataset.map(tokenize_function, batched=True)
validation_dataset = validation_dataset.map(tokenize_function, batched=True)
print(train_dataset[0])

Map:   0%|          | 0/14732 [00:00<?, ? examples/s]

Map:   0%|          | 0/819 [00:00<?, ? examples/s]

Map:   0%|          | 0/818 [00:00<?, ? examples/s]

{'id': '13818513', 'dialogue': "amanda: i baked cookies. do you want some? jerry: sure! amanda: i'll bring you tomorrow :-)", 'summary': 'amanda baked cookies and will bring jerry some tomorrow.', 'input_ids': [183, 232, 9, 10, 3, 23, 13635, 5081, 5, 103, 25, 241, 128, 58, 3, 12488, 651, 10, 417, 55, 183, 232, 9, 10, 3, 23, 31, 195, 830, 25, 5721, 3, 10, 18, 61, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,

In [11]:
training_args = TrainingArguments(
    output_dir="./Finetuning_T5_Text_Summarization",
    evaluation_strategy="epoch",
    save_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    num_train_epochs=1,
    weight_decay=0.01,
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=validation_dataset
)

In [12]:
trainer.train()
trainer.save_model()

Passing a tuple of `past_key_values` is deprecated and will be removed in Transformers v4.48.0. You should pass an instance of `EncoderDecoderCache` instead, e.g. `past_key_values=EncoderDecoderCache.from_legacy_cache(past_key_values)`.


Epoch,Training Loss,Validation Loss
1,0.3967,0.353033



KeyboardInterrupt



In [14]:
tokenizer.save_pretrained(training_args.output_dir)

('./Finetuning_T55_Text_Summarization/tokenizer_config.json',
 './Finetuning_T55_Text_Summarization/special_tokens_map.json',
 './Finetuning_T55_Text_Summarization/spiece.model',
 './Finetuning_T55_Text_Summarization/added_tokens.json')

In [None]:
login(token="HUGGINGFACE_TOKEN")

In [16]:
repo_name = "ahmed792002/Finetuning_T5_Text_Summarization"
trainer.push_to_hub(repo_name)
tokenizer.push_to_hub(repo_name)

training_args.bin:   0%|          | 0.00/5.30k [00:00<?, ?B/s]

spiece.model:   0%|          | 0.00/792k [00:00<?, ?B/s]

events.out.tfevents.1733348814.90dd88cf6c62.23.0:   0%|          | 0.00/7.69k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/242M [00:00<?, ?B/s]

Upload 4 LFS files:   0%|          | 0/4 [00:00<?, ?it/s]

spiece.model:   0%|          | 0.00/792k [00:00<?, ?B/s]

CommitInfo(commit_url='https://huggingface.co/ahmed792002/Finetuning_T5_Text_Summarization/commit/2cf829726451d9c0861514d5a67bd79cf0dd2d5f', commit_message='Upload tokenizer', commit_description='', oid='2cf829726451d9c0861514d5a67bd79cf0dd2d5f', pr_url=None, repo_url=RepoUrl('https://huggingface.co/ahmed792002/Finetuning_T5_Text_Summarization', endpoint='https://huggingface.co', repo_type='model', repo_id='ahmed792002/Finetuning_T5_Text_Summarization'), pr_revision=None, pr_num=None)

**<a id="1"></a>

<h1><b><p style="background-image: url(https://i.postimg.cc/0Qwf8YX6/2927262.jpg);font-family:camtasia;font-size:110%;color:white;text-align:center;border-radius:15px 50px; padding:7px; border:solid 2px #09375b; box-shadow: 10px 10px 10px #042b4c">Evaluation</p></b></h1>

<a class="btn" href="#home">Tabel of Contents</a>

In [17]:
results = trainer.evaluate(test_dataset)
print("Evaluation results:")
print("Test Loss",results["eval_loss"])

Evaluation results:
Test Loss 0.3530331254005432


<a id="8"></a>

<h1><b><p style="background-image: url(https://i.postimg.cc/0Qwf8YX6/2927262.jpg);font-family:camtasia;font-size:110%;color:white;text-align:center;border-radius:15px 50px; padding:7px; border:solid 2px #09375b; box-shadow: 20px 10px 10px #042b4c">Predictive for Test</p></b></h1>

In [18]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
def summarize_dialogue(dialogue):
    dialogue = clean_text(dialogue)
    inputs = tokenizer(dialogue, return_tensors="pt", truncation=True, padding="max_length"
                       , max_length=input_max_len)
    # Move input tensors to the same device as the model
    inputs = {key: value.to(device) for key, value in inputs.items()}
    # Generate summary
    outputs = model.generate(
        inputs["input_ids"], 
        max_length=output_max_len,  
        num_beams=4, 
        early_stopping=True
    )
    # Decode the generated summary
    summary = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return summary

In [26]:
test_sentence = test_dataset["dialogue"][50]
test_summary = test_dataset["summary"][50]
summary = summarize_dialogue(test_sentence)
# Display results
print(f"Original :\n{test_sentence}")
print("\n","*"*50,"\n")
print(f"Original Summary :\n{test_summary}")
print("\n","*"*50,"\n")
print(f"Summary :\n{summary}")

Original :
nick: you look absolutely gorgeous and have a lovely smile. nick: would love to get to know you a bit more. how about we meet up for a drink sometime? jane: hmmm... you're shooting a bit above your range aren't you? nick: why would you think that hon? jane: because i'm not that desperate. nick: that was a bit below the belt. nick: you're nice but you're not that hot. jane: oh is your poor little dick shriveling at the thought? nick: actually i'll take it back. forget about the drink. nick: forget i ever wrote to you. jane: bye loser! nick: fucking bitch! jane: you're welcome!

 ************************************************** 

Original Summary :
nick finds jane pretty and invites her for a drink to get to know her better. jane rejects nick and is unpleasant to him. nick suggests jane to forget about their conversation.

 ************************************************** 

Summary :
nick and jane are meeting up for a drink. they're shooting a bit above their range.


In [27]:
test_sentence = test_dataset["dialogue"][200]
test_summary = test_dataset["summary"][200]
summary = summarize_dialogue(test_sentence)
# Display results
print(f"Original :\n{test_sentence}")
print("\n","*"*50,"\n")
print(f"Original Summary :\n{test_summary}")
print("\n","*"*50,"\n")
print(f"Summary :\n{summary}")

Original :
abdellilah: where are you? sam: work abdellilah: what time you finish? sam: not til 5 abdellilah: are your bringing him over tonight: sam: no in the morning: abdellilah: ok, what time? sam: about 9. is that ok? abdellilah: ok - see you then

 ************************************************** 

Original Summary :
sam won't finish work till 5. sam is bringing him over about 9 am. sam will see abdellilah in the morning.

 ************************************************** 

Summary :
sam and abdellilah are bringing abdellilah over to work tonight.


In [29]:
test_sentence = test_dataset["dialogue"][500]
test_summary = test_dataset["summary"][500]
summary = summarize_dialogue(test_sentence)
# Display results
print(f"Original :\n{test_sentence}")
print("\n","*"*50,"\n")
print(f"Original Summary :\n{test_summary}")
print("\n","*"*50,"\n")
print(f"Summary :\n{summary}")

Original :
helen: hey, simo, are you there? simon: yep babe, what's up? helen: i was calling you before... simon: sorry i was on the phone, i didn't hear you... tell me. helen: it's a bit embarrassing... the toilet paper is finished, could you fetch me some tissues, please? simon: hahaha sure, no worries!

 ************************************************** 

Original Summary :
simon was on the phone before so he didn't hear helen calling. simon will fetch helen some tissues as they're out of toilet paper.

 ************************************************** 

Summary :
helen was on the phone when he was on the phone. helen's toilet paper is finished.


In [30]:
test_sentence = """
Violet: Hey Claire! I was reading an article about Austin and thought you might find it interesting! 
Violet: It's about the current trends in urban development and how cities are planning for the future.
Violet: Here, let me share the link: <file_other>
Claire: Oh wow, that sounds like an insightful read. But I've actually already read that one last week. 
Claire: It was really interesting though, especially the part about sustainable architecture in cities. 
Claire: You know, I've been following these urban planning discussions for a while now.
Violet: Oh, I didn’t know that! Well, I’ll look for something else then, maybe something about eco-friendly cities or tech innovations.
Claire: That would be awesome! Let me know if you find something cool.
Violet: Sure, I’ll keep you posted. Thanks for the feedback!
"""
summary = summarize_dialogue(test_sentence)
# Display results
print(f"Original :\n{test_sentence}")
print("\n","*"*50,"\n")
print(f"Summary :\n{summary}")

Original :

Violet: Hey Claire! I was reading an article about Austin and thought you might find it interesting! 
Violet: It's about the current trends in urban development and how cities are planning for the future.
Violet: Here, let me share the link: <file_other>
Claire: Oh wow, that sounds like an insightful read. But I've actually already read that one last week. 
Claire: It was really interesting though, especially the part about sustainable architecture in cities. 
Claire: You know, I've been following these urban planning discussions for a while now.
Violet: Oh, I didn’t know that! Well, I’ll look for something else then, maybe something about eco-friendly cities or tech innovations.
Claire: That would be awesome! Let me know if you find something cool.
Violet: Sure, I’ll keep you posted. Thanks for the feedback!


 ************************************************** 

Summary :
claire was reading an article about austin and thought she might find it interesting. he's already re

In [32]:
test_sentence = """
Violet: Hey Claire! I was reading an article about Austin and thought you might find it interesting! 
Violet: It's about the current trends in urban development and how cities are planning for the future.
Violet: Here, let me share the link: <file_other>
Claire: Oh wow, that sounds like an insightful read. But I've actually already read that one last week. 
Claire: It was really interesting though, especially the part about sustainable architecture in cities. 
Claire: You know, I've been following these urban planning discussions for a while now.
Violet: Oh, I didn’t know that! Well, I’ll look for something else then, maybe something about eco-friendly cities or tech innovations.
Claire: That would be awesome! Let me know if you find something cool.
Violet: Sure, I’ll keep you posted. Thanks for the feedback!
"""
summary = summarize_dialogue(test_sentence)
# Display results
print(f"Original :\n{test_sentence}")
print("\n","*"*50,"\n")
print(f"Summary :\n{summary}")

Original :

Violet: Hey Claire! I was reading an article about Austin and thought you might find it interesting! 
Violet: It's about the current trends in urban development and how cities are planning for the future.
Violet: Here, let me share the link: <file_other>
Claire: Oh wow, that sounds like an insightful read. But I've actually already read that one last week. 
Claire: It was really interesting though, especially the part about sustainable architecture in cities. 
Claire: You know, I've been following these urban planning discussions for a while now.
Violet: Oh, I didn’t know that! Well, I’ll look for something else then, maybe something about eco-friendly cities or tech innovations.
Claire: That would be awesome! Let me know if you find something cool.
Violet: Sure, I’ll keep you posted. Thanks for the feedback!


 ************************************************** 

Summary :
claire was reading an article about austin and thought she might find it interesting. he's already re

In [33]:
test_sentence = """
Reporter: In today's news, the latest climate change report reveals alarming global temperature rises. According to the Intergovernmental Panel on Climate Change (IPCC), the Earth’s temperature is on track to rise by 1.5°C within the next two decades.
Reporter: This is expected to lead to more frequent and severe heatwaves, flooding, and extreme weather events. Coastal cities are at particular risk due to rising sea levels.
Expert: The report emphasizes that immediate action is needed to prevent catastrophic consequences. We need to significantly reduce carbon emissions and transition to renewable energy sources.
Expert: If global temperatures increase by more than 1.5°C, we could face irreversible damage to ecosystems, agriculture, and water supply. It will have a devastating impact on biodiversity as well.
Reporter: The IPCC also stresses the importance of individual action. Governments must set stronger policies, but individuals can help by reducing waste, conserving water, and supporting green initiatives.
Expert: It's not just about the big changes; small actions like using public transportation, reducing meat consumption, and recycling can collectively make a significant difference.
Reporter: With the next UN Climate Summit coming up next month, world leaders will need to prioritize climate action. The stakes have never been higher for our planet’s future.
"""
summary = summarize_dialogue(test_sentence)
# Display results
print(f"Original :\n{test_sentence}")
print("\n","*"*50,"\n")
print(f"Summary :\n{summary}")

Original :

Reporter: In today's news, the latest climate change report reveals alarming global temperature rises. According to the Intergovernmental Panel on Climate Change (IPCC), the Earth’s temperature is on track to rise by 1.5°C within the next two decades.
Reporter: This is expected to lead to more frequent and severe heatwaves, flooding, and extreme weather events. Coastal cities are at particular risk due to rising sea levels.
Expert: The report emphasizes that immediate action is needed to prevent catastrophic consequences. We need to significantly reduce carbon emissions and transition to renewable energy sources.
Expert: If global temperatures increase by more than 1.5°C, we could face irreversible damage to ecosystems, agriculture, and water supply. It will have a devastating impact on biodiversity as well.
Reporter: The IPCC also stresses the importance of individual action. Governments must set stronger policies, but individuals can help by reducing waste, conserving wat

<center><span style="font-family:Palatino; font-size:22px;"><i>Like this? <span style="color:#DC143C;">Upvote and Comment!</span> </i>🌊 End</span> </center>