<a href="https://colab.research.google.com/github/Mohammadhsiavash/DeepL-Training/blob/main/Unsupervised%2BSemi-Supervised/Text_Summarizer_with_T5.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Use the pretrained T5 model to summarize long passages of text by framing
summarization as a text-to-text task.
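
In T5, "text-to-text" means the task is selected by a plain-text prefix on the input string rather than by a task-specific model head. The sketch below illustrates that idea only; `T5_PREFIXES` and `to_text_to_text` are hypothetical names for this example and are not part of the `transformers` API.

```python
# Task prefixes used by the original T5 checkpoints.
# The dictionary and helper names are illustrative assumptions.
T5_PREFIXES = {
    "summarize": "summarize: ",
    "translate_en_de": "translate English to German: ",
}


def to_text_to_text(task, text):
    """Build the prefixed input string that a T5 model would receive."""
    return T5_PREFIXES[task] + text.strip()


print(to_text_to_text("summarize", "Deep learning is a subfield of machine learning."))
```

The same model weights serve every task; only the prefix changes.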


Load or Input Long Text

In [3]:
text = '''
Introduction to Deep Learning

Licensed by Google
Deep learning is a subfield of machine learning that focuses on algorithms inspired by the structure and function of the human brain, known as artificial neural networks. The "deep" in deep learning refers to the use of multiple layers in these networks. Unlike traditional machine learning, which often requires manual feature engineering, deep learning models can automatically learn hierarchical representations of data. This allows them to identify increasingly complex patterns and relationships, leading to breakthroughs in fields like computer vision, natural language processing, and speech recognition.




Key Concepts and Principles
Neural Networks
At its core, deep learning is built on the concept of neural networks, which are composed of interconnected nodes or "neurons." These networks are organized into layers: an input layer to receive data, one or more hidden layers to process the data, and an output layer to produce the final result. Each connection between neurons has a weight and a bias associated with it. The network "learns" by adjusting these weights and biases to minimize the difference between its predictions and the actual values.




Backpropagation and Gradient Descent
The learning process in deep neural networks is primarily driven by two key algorithms:

Forward Propagation: The process of data flowing from the input layer, through the hidden layers, to the output layer to generate a prediction.

Backpropagation: An algorithm that calculates the "error" or "loss" of the network's prediction. It then propagates this error backward through the network, layer by layer, to determine how much each weight and bias contributed to the error.

Gradient Descent: An optimization algorithm that uses the error information from backpropagation to iteratively adjust the weights and biases in the direction that will reduce the error. This is analogous to a person walking downhill and taking small steps to find the lowest point in a valley.

Activation Functions
Activation functions introduce non-linearity into the network. Without them, a neural network would only be able to learn linear relationships, severely limiting its capability. Common activation functions include ReLU, sigmoid, and tanh, each serving a specific purpose in different network architectures.


Major Deep Learning Architectures
Deep learning has given rise to several powerful architectures, each designed to excel at specific types of tasks.

Convolutional Neural Networks (CNNs): Primarily used for image and video analysis. CNNs use a "convolutional" layer to automatically extract features like edges and textures from images, followed by "pooling" layers to downsample the data and reduce complexity. This makes them highly effective for tasks such as image classification, object detection, and facial recognition.


Recurrent Neural Networks (RNNs): Built to handle sequential data like text, speech, and time-series data. RNNs have a feedback loop that allows information from previous steps in a sequence to influence the current step. This "memory" makes them suitable for tasks like natural language processing, language translation, and speech recognition. A significant advancement in this area is the Long Short-Term Memory (LSTM) network, a type of RNN designed to overcome the vanishing gradient problem, enabling them to learn long-term dependencies in data.




Transformers: A more recent and highly influential architecture, especially in natural language processing (NLP). Transformers use a mechanism called self-attention, which allows the model to weigh the importance of different words in a sentence relative to each other, regardless of their position. This parallel processing capability makes them more efficient and effective than RNNs for many sequence-to-sequence tasks. They form the foundation of modern large language models (LLMs) like GPT and BERT.



Generative Adversarial Networks (GANs): Consist of two competing neural networks: a generator and a discriminator. The generator creates new data (e.g., images), and the discriminator tries to distinguish between the real data and the data created by the generator. Through this adversarial process, GANs can produce highly realistic synthetic data, with applications in image generation, video creation, and data augmentation.



Applications of Deep Learning
Deep learning has revolutionized a wide range of industries and applications.

Computer Vision: Self-driving cars rely on CNNs to detect pedestrians, other vehicles, and road signs. Medical imaging uses deep learning for automated tumor detection and disease diagnosis.


Natural Language Processing (NLP): Deep learning powers virtual assistants like Siri and Alexa, provides real-time language translation, and enables sentiment analysis for understanding customer feedback.

Speech Recognition: Deep learning models convert spoken language to text with high accuracy, a technology used in voice dictation, phone assistants, and automated customer service.

Recommender Systems: Platforms like Netflix, Amazon, and YouTube use deep learning to analyze user behavior and recommend personalized content, products, and videos.

Robotics: Deep reinforcement learning allows robots to learn how to perform complex tasks, such as grasping objects and navigating in dynamic environments, through trial and error.

Drug Discovery: Deep learning algorithms can analyze vast biological datasets to predict potential drug targets and accelerate the drug development process.

Future Trends and Challenges
The field of deep learning continues to evolve rapidly, with several key trends shaping its future.

Democratization of AI: Tools and frameworks are becoming more accessible, allowing more people to build and deploy deep learning models without extensive expertise.

Explainable AI (XAI): As deep learning models are used for high-stakes decisions in fields like finance and medicine, there's a growing demand to understand "why" a model made a particular decision. XAI aims to make these complex models more transparent and interpretable.

Edge AI and Federated Learning: Running AI models directly on devices like smartphones and IoT devices (Edge AI) reduces latency and enhances privacy. Federated learning takes this a step further by training a shared model across multiple devices without exchanging the raw data, thereby preserving privacy.


Multimodal AI: Future models will likely be able to process and generate data across multiple modalities simultaneously—for example, understanding and generating text, images, and audio in a single model.

Ethical AI: With the increasing power of deep learning, there's a critical focus on ensuring models are fair, unbiased, and used responsibly. Research is ongoing to address issues like algorithmic bias and data privacy.
'''

Load the T5 Model and Tokenizer

In [1]:
from transformers import T5Tokenizer, T5ForConditionalGeneration
# Load the T5 tokenizer and model
tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")


Prepare the Input Text

In [4]:
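# T5 selects the task from a text prefix; "summarize: " requests a summary
input_text = "summarize: " + text
# Tokenize the input. truncation=True keeps only the first 512 tokens,
# so for a text this long only the opening sections reach the model.
input_ids = tokenizer.encode(input_text, return_tensors="pt", max_length=512, truncation=True)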
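
Because the input is cut off at 512 tokens, most of a long document never reaches the model. One common workaround, which is not part of the original notebook, is to split the text into smaller pieces and summarize each piece separately. The sketch below is a minimal version of that idea: `split_into_chunks` and `max_words` are hypothetical names, and word count is only a rough proxy for T5's token count, so the default of 350 words is an assumed safety margin rather than a guaranteed fit.

```python
def split_into_chunks(text, max_words=350):
    """Split text into chunks of at most max_words words.

    Chunks break on line boundaries where possible; a single line longer
    than max_words is split by word count.
    """
    chunks, current, count = [], [], 0
    for paragraph in text.split("\n"):
        words = paragraph.split()
        if not words:
            continue  # skip blank lines
        # Start a new chunk if this paragraph would overflow the current one.
        if current and count + len(words) > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        # Break up a paragraph that is itself longer than the budget.
        while len(words) > max_words:
            chunks.append(" ".join(words[:max_words]))
            words = words[max_words:]
        current.extend(words)
        count += len(words)
    if current:
        chunks.append(" ".join(current))
    return chunks


sample = "First paragraph about neural networks.\n\nSecond paragraph about transformers."
for i, chunk in enumerate(split_into_chunks(sample, max_words=5)):
    print(i, chunk)
```

Each chunk can then be prefixed with "summarize: " and passed through the same tokenize and generate steps, with the partial summaries joined at the end.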

Generate the Summary

In [5]:
# Generate the summary with beam search (4 beams), constrained to 20-100 tokens
summary_ids = model.generate(input_ids, max_length=100, min_length=20, length_penalty=2.0, num_beams=4, early_stopping=True)
# Decode the summary
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
print("Summary:\n", summary)

Summary:
 deep learning is a subfield of machine learning that focuses on algorithms inspired by the structure and function of the human brain. deep learning models can automatically learn hierarchical representations of data. deep learning is built on the concept of neural networks, which are composed of interconnected nodes or "neurons"
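
The prefix, tokenize, generate, and decode steps above can be collected into one reusable function. This is a minimal sketch that reuses the same calls and generation settings as the cells above; the function name `summarize` and its parameter names are assumptions made for this example, and the tokenizer and model are passed in as arguments so the function does not depend on notebook globals.

```python
def summarize(text, tokenizer, model, max_input_tokens=512,
              max_length=100, min_length=20):
    """Summarize text with a T5-style tokenizer and model.

    Input longer than max_input_tokens is truncated, as in the cells above.
    """
    # Prefix the task name and tokenize, truncating over-long input.
    input_ids = tokenizer.encode(
        "summarize: " + text,
        return_tensors="pt",
        max_length=max_input_tokens,
        truncation=True,
    )
    # Beam search with the same settings used in the notebook.
    summary_ids = model.generate(
        input_ids,
        max_length=max_length,
        min_length=min_length,
        length_penalty=2.0,
        num_beams=4,
        early_stopping=True,
    )
    # Decode the best beam back to a string.
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)


# Usage with the objects loaded earlier in the notebook:
# print(summarize(text, tokenizer, model))
```

Note that the summary shown above covers only the introductory sections of the text, which is consistent with the 512-token truncation of the input.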
