# Template Notebook

In [11]:
!pip install transformers
!pip install torch
!pip install datasets
!pip install SentencePiece



In [12]:
from transformers import T5Tokenizer, T5ForConditionalGeneration
from transformers import Trainer, TrainingArguments
from transformers import DataCollatorWithPadding
from datasets import load_dataset

In [13]:
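def preprocess_function(examples):
    # Prepend the T5 task prefix so training inputs match the "summarize: " prompt used at inference.
    inputs = ["summarize: " + article for article in examples['article']]
    model_inputs = tokenizer(inputs, max_length=512, truncation=True, padding="max_length")
    # Tokenize the target summaries.
    labels = tokenizer(text_target=examples['highlights'], max_length=128, truncation=True, padding="max_length")
    # Replace padding token ids in the labels with -100 so the loss ignores them.
    model_inputs["labels"] = [
        [(tok if tok != tokenizer.pad_token_id else -100) for tok in seq]
        for seq in labels["input_ids"]
    ]
    return model_inputs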
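
Because the targets are padded to `max_length=128`, a short highlight consists mostly of pad tokens. A common convention is to set those label positions to -100, the default `ignore_index` of PyTorch's cross-entropy loss, so that padding does not contribute to the training loss. The sketch below shows the idea on plain lists; the token ids are hypothetical, the helper name `mask_padding` is illustrative, and a pad id of 0 (as in T5) is assumed.

```python
# Toy illustration with plain lists; no tokenizer or model is needed.
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def mask_padding(label_ids, pad_token_id=0):
    """Return a copy of label_ids with padding positions set to IGNORE_INDEX."""
    return [
        [tok if tok != pad_token_id else IGNORE_INDEX for tok in seq]
        for seq in label_ids
    ]


# Hypothetical token ids; the first sequence is padded with two 0s.
batch = [
    [8774, 296, 1, 0, 0],
    [37, 1712, 19, 1627, 1],
]
masked = mask_padding(batch)
print(masked)
```

Only the padded positions change; real tokens, including the end-of-sequence id, are left as they are.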

In [14]:
tokenizer = T5Tokenizer.from_pretrained('t5-small')
model = T5ForConditionalGeneration.from_pretrained('t5-small').to('cpu')

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [15]:
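from transformers import DataCollatorForSeq2Seq  # seq2seq collator: also pads labels, using -100
data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model=model, return_tensors="pt")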

In [16]:
dataset = load_dataset("cnn_dailymail", "3.0.0")

subsample_size = 500       # number of training examples to keep
val_subsample_size = 1000  # number of validation examples to keep

subsampled_train_dataset = dataset['train'].shuffle(seed=42).select(range(subsample_size)).map(preprocess_function, batched=True)
subsampled_val_dataset = dataset['validation'].shuffle(seed=42).select(range(val_subsample_size)).map(preprocess_function, batched=True)

In [17]:
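training_args = TrainingArguments(
    output_dir='../src/model/results',   # where checkpoints are written
    num_train_epochs=3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    warmup_steps=500,                    # exceeds the 375 total steps of this run, so warmup never completes
    weight_decay=0.01,
    logging_dir='../src/model/logs',     # where training logs are written
)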
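
The number of optimizer steps implied by these arguments can be worked out by hand, which helps when choosing `warmup_steps`. The sketch below assumes a single device, no gradient accumulation, and the 500-example training subsample used in this notebook; the helper name `total_training_steps` is illustrative, not part of `transformers`.

```python
import math


def total_training_steps(num_examples, batch_size, epochs):
    """Optimizer steps for a run, assuming one device, no gradient
    accumulation, and that the last partial batch of an epoch is kept."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * epochs


steps = total_training_steps(num_examples=500, batch_size=4, epochs=3)
warmup_steps = 500

# Under a linear warmup, the learning rate grows in proportion to the step
# count, so this is the fraction of the peak rate reached by the final step.
peak_fraction = min(1.0, steps / warmup_steps)

print(f"total steps: {steps}, warmup steps: {warmup_steps}")
print(f"fraction of peak learning rate reached: {peak_fraction:.2f}")
```

The result matches the `global_step=375` reported by `trainer.train()`. With a linear warmup, the run would end at roughly three quarters of the configured peak learning rate, so a smaller `warmup_steps` may be worth trying for a subsample of this size.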

In [18]:
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=subsampled_train_dataset,
    eval_dataset=subsampled_val_dataset,
    data_collator=data_collator,
)

In [19]:
trainer.train()

TrainOutput(global_step=375, training_loss=4.586693684895834, metrics={'train_runtime': 2329.2452, 'train_samples_per_second': 0.644, 'train_steps_per_second': 0.161, 'total_flos': 203012702208000.0, 'train_loss': 4.586693684895834, 'epoch': 3.0})

In [21]:
# Testing

def generate_summary(text, model, tokenizer):
    """Summarize `text` with beam search and return the decoded string."""
    # T5 selects its task from a text prefix; "summarize: " requests summarization.
    preprocess_text = "summarize: " + text
    # Keep the inputs on the same device as the model.
    inputs = tokenizer.encode(preprocess_text, return_tensors="pt", max_length=512, truncation=True).to(model.device)
    summary_ids = model.generate(inputs, max_length=150, min_length=40, length_penalty=2.0, num_beams=4, early_stopping=True)
    summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
    return summary

text = """
Title: Next-Gen Eco-Friendly Transportation: "EcoMove" Unveils Hydrogen-Powered Vehicles

Date: April 15, 2024

Location: Berlin, Germany

In an ambitious move set to revolutionize the transportation industry, the innovative startup EcoMove unveiled its new line of hydrogen-powered vehicles at a grand event in Berlin today. Amidst growing concerns over climate change and the environmental impact of fossil fuels, EcoMove's launch represents a significant leap towards sustainable, eco-friendly transportation solutions.

"EcoMove is not just launching a new vehicle; we're introducing a new era of transportation that aligns with our vision for a cleaner, greener planet," stated EcoMove's CEO, Dr. Lena Hoffmann, during the product reveal. The company's flagship model, the H2Rover, combines cutting-edge technology with sleek design, offering both efficiency and performance without compromising on environmental values.

The H2Rover stands out for its zero-emission technology. Unlike traditional electric vehicles that rely on lithium batteries, which pose recycling and mining concerns, the H2Rover utilizes hydrogen fuel cells. These cells generate electricity through a chemical reaction between hydrogen and oxygen, with water vapor as the only byproduct, thereby eliminating harmful emissions.

The unveiling event showcased the H2Rover's innovative features, including its advanced fuel cell technology, regenerative braking system, and state-of-the-art safety protocols. With a range of over 600 kilometers on a single fueling and a refueling time of less than five minutes, the H2Rover addresses one of the most significant barriers to hydrogen vehicle adoption: convenience.

EcoMove's approach extends beyond the vehicles themselves, as the company has announced partnerships with renewable energy providers to establish a network of green hydrogen fueling stations across Europe. "Our goal is to make hydrogen fuel both accessible and sustainable, ensuring that every kilometer driven by an H2Rover contributes to our vision of a carbon-neutral future," explained Hoffmann.

The response from the industry and environmental groups has been overwhelmingly positive, with many praising EcoMove's commitment to innovation and sustainability. "EcoMove is setting a new standard for the automotive industry," said Kai Müller, Executive Director of GreenFuture, an environmental NGO. "Their holistic approach to eco-friendly transportation, from vehicle design to fuel production and distribution, marks a significant step forward in our fight against climate change."

As EcoMove prepares for the commercial release of the H2Rover, the company faces the challenge of scaling its technology and infrastructure to meet anticipated demand. However, with strong backing from investors and a clear commitment to sustainability, EcoMove is poised to lead the transition to eco-friendly transportation on a global scale.

With the launch of the H2Rover, EcoMove not only promises a future where vehicles no longer contribute to pollution but also inspires other companies to pursue innovation in green technology. As the world looks for solutions to environmental challenges, EcoMove's pioneering work serves as a beacon of hope and progress.

EcoMove has announced that the H2Rover will be available for pre-order starting June 2024, with the first deliveries expected by the end of the year. As the automotive world watches closely, EcoMove's journey may well redefine our relationship with vehicles and the environment for generations to come.
"""


summary = generate_summary(text, model, tokenizer)
print("Summary:")
print(summary)

Summary:
Aivia is a personal assistant designed to redefine the way individuals manage their daily tasks and interactions. Aivia is a companion designed to learn from and adapt to your personal preferences. Aivia is a companion designed to learn from and adapt to the user's mood and circumstances. Innovatech has unveiled its latest product: Aivia.
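
The summary printed above mentions Aivia and Innovatech, names that do not occur in the EcoMove article passed to `generate_summary`, so it may have been produced by an earlier run with a different input text. A crude check for this kind of mismatch is to measure how many of the summary's words also occur in the source. The helper below is an illustrative sketch, not a standard metric such as ROUGE, and the example strings are made up for the demonstration.

```python
import re


def word_overlap(summary, source):
    """Fraction of distinct summary words that also occur in the source text."""
    def words(s):
        # Lowercase and split on anything that is not a letter or digit.
        return set(re.findall(r"[a-z0-9]+", s.lower()))

    summary_words = words(summary)
    if not summary_words:
        return 0.0
    return len(summary_words & words(source)) / len(summary_words)


# Made-up strings for illustration only.
source = "EcoMove unveiled its hydrogen-powered H2Rover in Berlin."
on_topic = "EcoMove unveiled the H2Rover in Berlin."
off_topic = "Aivia is a personal assistant from Innovatech."

print(word_overlap(on_topic, source))
print(word_overlap(off_topic, source))
```

A score near zero suggests the summary and the source are unrelated, which is worth ruling out before judging model quality from a single printed example.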


In [None]:
model_save_path = "../src/model/t5_model"
tokenizer_save_path = "../src/model/t5_tokenizer"

model.save_pretrained(model_save_path)
tokenizer.save_pretrained(tokenizer_save_path)