In [1]:
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration, Trainer, TrainingArguments
from datasets import load_dataset

In [2]:
# Load dataset
dataset = load_dataset("cnn_dailymail", "3.0.0")
train_dataset = dataset['train'].select(range(10000))
test_dataset = dataset['test'].select(range(100))

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


In [3]:
print("Article:", train_dataset[0]['article'])
print("Summary:", train_dataset[0]['highlights'])

Article: LONDON, England (Reuters) -- Harry Potter star Daniel Radcliffe gains access to a reported Â£20 million ($41.1 million) fortune as he turns 18 on Monday, but he insists the money won't cast a spell on him. Daniel Radcliffe as Harry Potter in "Harry Potter and the Order of the Phoenix" To the disappointment of gossip columnists around the world, the young actor says he has no plans to fritter his cash away on fast cars, drink and celebrity parties. "I don't plan to be one of those people who, as soon as they turn 18, suddenly buy themselves a massive sports car collection or something similar," he told an Australian interviewer earlier this month. "I don't think I'll be particularly extravagant. "The things I like buying are things that cost about 10 pounds -- books and CDs and DVDs." At 18, Radcliffe will be able to gamble in a casino, buy a drink in a pub or see the horror film "Hostel: Part II," currently six places below his number one movie on the UK box office chart. Deta

In [4]:
# Load model and tokenizer
model_name = "t5-small"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [6]:
# Preprocess data
def preprocess_data(examples):
    inputs = [doc for doc in examples['article']]
    targets = [summary for summary in examples['highlights']]
    model_inputs = tokenizer(inputs, max_length=512, truncation=True, padding="max_length")

    with tokenizer.as_target_tokenizer():
        labels = tokenizer(targets, max_length=150, truncation=True, padding="max_length")

    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

train_dataset = train_dataset.map(preprocess_data, batched=True)
test_dataset = test_dataset.map(preprocess_data, batched=True)

Map:   0%|          | 0/10000 [00:00<?, ? examples/s]



Map:   0%|          | 0/100 [00:00<?, ? examples/s]

In [7]:
# Training arguments
training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=5,
    per_device_eval_batch_size=5,
    num_train_epochs=5,
    weight_decay=0.01,
)



In [8]:
# Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
)

In [9]:
# Train the model
trainer.train()

Epoch,Training Loss,Validation Loss
1,0.9523,0.701795
2,0.9218,0.700065
3,0.9208,0.700321
4,0.9195,0.699634
5,0.9124,0.699439


TrainOutput(global_step=10000, training_loss=0.9839768341064453, metrics={'train_runtime': 2770.4004, 'train_samples_per_second': 18.048, 'train_steps_per_second': 3.61, 'total_flos': 6767090073600000.0, 'train_loss': 0.9839768341064453, 'epoch': 5.0})

In [10]:
# Evaluate the model
trainer.evaluate()

{'eval_loss': 0.6994388103485107,
 'eval_runtime': 1.6248,
 'eval_samples_per_second': 61.548,
 'eval_steps_per_second': 12.31,
 'epoch': 5.0}

In [12]:
model.save_pretrained("/content/drive/MyDrive/summarization")

In [13]:
tokenizer.save_pretrained("/content/drive/MyDrive/tokenizer")

('/content/drive/MyDrive/tokenizer/tokenizer_config.json',
 '/content/drive/MyDrive/tokenizer/special_tokens_map.json',
 '/content/drive/MyDrive/tokenizer/spiece.model',
 '/content/drive/MyDrive/tokenizer/added_tokens.json')

In [17]:
def summarize(text):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)
    inputs = tokenizer.encode("summarize: " + text, return_tensors="pt", max_length=512, truncation=True).to(device)
    outputs = model.generate(inputs, max_length=150, min_length=40, length_penalty=2.0, num_beams=4, early_stopping=True)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

In [20]:
# Example usage
text= "Climate change poses a significant threat to global ecosystems and human societies. The rise in greenhouse gas emissions, primarily from human activities such as fossil fuel combustion and deforestation, has led to unprecedented shifts in temperature patterns, sea level rise, extreme weather events, and biodiversity loss. Addressing this complex issue requires coordinated efforts across sectors, including policy interventions, technological innovations, and societal behavior changes. Renewable energy sources such as solar, wind, and hydroelectric power offer sustainable alternatives to fossil fuels, reducing carbon emissions and mitigating climate impacts. Additionally, advancements in carbon capture and storage technologies aim to sequester carbon dioxide from industrial processes and power generation, further curbing greenhouse gas concentrations in the atmosphere. Climate adaptation strategies are also crucial, involving measures to protect vulnerable communities, enhance resilience to extreme weather events, and promote sustainable land and water management practices. International cooperation through agreements like the Paris Agreement plays a pivotal role in fostering global consensus and collective action towards achieving climate goals. As efforts intensify to combat climate change, fostering innovation, scaling up sustainable practices, and mobilizing resources effectively are critical for building a resilient and low-carbon future for generations to come."
summary = summarize(text)
print(summary)

Climate adaptation strategies are crucial, involving measures to protect vulnerable communities, enhance resilience to extreme weather events, and promote sustainable land and water management practices. Climate adaptation strategies are also crucial, involving measures to protect vulnerable communities and enhance resilience to extreme weather events.


In [1]:
from transformers import T5Tokenizer, T5ForConditionalGeneration, Trainer, TrainingArguments

  from .autonotebook import tqdm as notebook_tqdm


In [5]:
import torch

In [2]:
model = T5ForConditionalGeneration.from_pretrained('summarization')
tokenizer = T5Tokenizer.from_pretrained('tokenizer')


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [3]:
def summarize(text):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)
    inputs = tokenizer.encode("summarize: " + text, return_tensors="pt", max_length=512, truncation=True).to(device)
    outputs = model.generate(inputs, max_length=150, min_length=40, length_penalty=2.0, num_beams=4, early_stopping=True)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

In [6]:
# Example usage
text= "Climate change poses a significant threat to global ecosystems and human societies. The rise in greenhouse gas emissions, primarily from human activities such as fossil fuel combustion and deforestation, has led to unprecedented shifts in temperature patterns, sea level rise, extreme weather events, and biodiversity loss. Addressing this complex issue requires coordinated efforts across sectors, including policy interventions, technological innovations, and societal behavior changes. Renewable energy sources such as solar, wind, and hydroelectric power offer sustainable alternatives to fossil fuels, reducing carbon emissions and mitigating climate impacts. Additionally, advancements in carbon capture and storage technologies aim to sequester carbon dioxide from industrial processes and power generation, further curbing greenhouse gas concentrations in the atmosphere. Climate adaptation strategies are also crucial, involving measures to protect vulnerable communities, enhance resilience to extreme weather events, and promote sustainable land and water management practices. International cooperation through agreements like the Paris Agreement plays a pivotal role in fostering global consensus and collective action towards achieving climate goals. As efforts intensify to combat climate change, fostering innovation, scaling up sustainable practices, and mobilizing resources effectively are critical for building a resilient and low-carbon future for generations to come."
summary = summarize(text)
print(summary)

Climate adaptation strategies are crucial, involving measures to protect vulnerable communities, enhance resilience to extreme weather events, and promote sustainable land and water management practices. Climate adaptation strategies are also crucial, involving measures to protect vulnerable communities and enhance resilience to extreme weather events.
