## https://blog.devgenius.io/transformers-for-text-summarization-a-step-by-step-tutorial-in-python-9d8e2c74233e
### pip install transformers
### pip install torch 
### pip install sentencepiece

## Hugging Face provides a wide range of pre-trained models, including BERT, GPT-2, and T5.

In [1]:
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained('t5-small')
model = T5ForConditionalGeneration.from_pretrained('t5-small')


You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.



In [2]:
input_text = """
Machine learning is a branch of artificial intelligence that allows computers to learn and improve from experience without being explicitly programmed. It is the process of using algorithms and statistical models to analyze and draw insights from large amounts of data, and then use those insights to make predictions or decisions. Machine learning has become increasingly popular in recent years, as the amount of available data has grown and computing power has increased. There are three main types of machine learning: supervised learning, unsupervised learning, and reinforcement learning. In supervised learning, the algorithm is given a labeled dataset and learns to make predictions based on that data. In unsupervised learning, the algorithm is given an unlabeled dataset and must find patterns and relationships within the data on its own. In reinforcement learning, the algorithm learns by trial and error, receiving feedback in the form of rewards or punishments for certain actions. Machine learning is used in a wide range of applications, including image recognition, natural language processing, autonomous vehicles, fraud detection, and recommendation systems. As the technology continues to improve, it is likely that machine learning will become even more prevalent in our daily lives.
"""

### Tokenization 
##### We tokenize the input with the T5 tokenizer loaded earlier, calling its encode() method with the following arguments:

##### return_tensors='pt': returns a PyTorch tensor instead of a plain list of integers.
##### max_length=512: caps the encoded input at 512 tokens.
##### truncation=True: truncates the input text if it exceeds that maximum length.
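To make the effect of max_length and truncation concrete, here is a minimal pure-Python sketch of what truncation does to a list of token ids. The helper `truncate_with_eos` and the example ids are hypothetical and only illustrate the idea; the sketch assumes that the end-of-sequence token (id 1 in the T5 vocabulary) counts toward the limit and is kept as the last token.

```python
def truncate_with_eos(token_ids, max_length=512, eos_id=1):
    """Illustrative only: keep at most max_length ids in total,
    reserving the final slot for the end-of-sequence id."""
    if max_length < 1:
        raise ValueError("max_length must be at least 1")
    body = list(token_ids)[: max_length - 1]
    return body + [eos_id]


ids = list(range(100, 110))  # ten made-up token ids

# Shorter than the limit: nothing is dropped, EOS is appended.
untouched = truncate_with_eos(ids, max_length=512)

# Longer than the limit: the tail is cut so the total length is 5.
clipped = truncate_with_eos(ids, max_length=5)

print(len(untouched), clipped)
```

Anything past the limit is simply discarded, so for long documents the model never sees the end of the text.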

In [3]:
inputs = tokenizer.encode(input_text, return_tensors='pt', max_length=512, truncation=True)
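T5 is trained in a text-to-text setup where a short task prefix tells the model which task to perform, and for summarization the conventional prefix is `summarize: `. The cell above encodes the raw text without a prefix. A possible variant, sketched below, prepends the prefix and collapses stray whitespace before encoding. The helper name `build_t5_summarization_input` is hypothetical, and how much the prefix changes the result for this particular text is not something the notebook measures.

```python
def build_t5_summarization_input(text: str, prefix: str = "summarize: ") -> str:
    """Collapse runs of whitespace (including newlines) and prepend the task prefix."""
    cleaned = " ".join(text.split())
    return prefix + cleaned


example = build_t5_summarization_input(
    "\nMachine learning is a branch\nof artificial intelligence.\n"
)
print(example)

# With the tokenizer loaded in the first cell, the prefixed text could be
# encoded exactly like the original input (not executed here):
# inputs = tokenizer.encode(build_t5_summarization_input(input_text),
#                           return_tensors='pt', max_length=512, truncation=True)
```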

In [4]:
summary_ids = model.generate(inputs,
                              max_length=150,
                              min_length=40,
                              length_penalty=2.0,
                              num_beams=4,
                              early_stopping=True)
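The arguments in this call shape the search: num_beams=4 keeps four candidate summaries alive at each step, min_length and max_length bound the number of generated tokens, and early_stopping=True ends the search once enough finished candidates exist. The role of length_penalty is less obvious. As far as I understand the Hugging Face beam search, finished candidates are ranked by the sum of their token log-probabilities divided by `length ** length_penalty`. The sketch below illustrates that formula under this assumption, with made-up numbers; `beam_score` and `best_candidate` are hypothetical helpers, not library functions.

```python
def beam_score(sum_logprobs: float, length: int, length_penalty: float) -> float:
    """Length-normalized score; log-probabilities are negative, so dividing
    by a larger number moves the score closer to zero (i.e. better)."""
    return sum_logprobs / (length ** length_penalty)


# Two made-up finished candidates: a short one and a longer one.
candidates = {
    "short": {"sum_logprobs": -4.0, "length": 5},
    "long": {"sum_logprobs": -6.0, "length": 10},
}


def best_candidate(length_penalty: float) -> str:
    """Return the name of the candidate with the highest normalized score."""
    return max(
        candidates,
        key=lambda name: beam_score(
            candidates[name]["sum_logprobs"],
            candidates[name]["length"],
            length_penalty,
        ),
    )


print(best_candidate(0.0))  # no normalization: raw log-probability decides
print(best_candidate(2.0))  # the value used in the generate() call above
```

With no normalization the shorter candidate wins because it has accumulated less negative log-probability. With length_penalty=2.0 the longer candidate wins, which is why values above 1 tend to push beam search toward longer summaries.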

#### The generate() method returns a tensor of token ids for the generated summary, which we convert back to text with the tokenizer's decode() method.
#### Passing skip_special_tokens=True removes special tokens (such as padding and end-of-sequence markers) from the decoded summary.

In [5]:
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)

In [6]:
summary

', and reinforcement learning. In reinforcement learning, the algorithm learns by trial and error, receiving feedback in form of rewards or punishments for certain actions. Machine learning is used in a wide range of applications, including image recognition, natural language processing, autonomous vehicles, fraud detection, and recommendation systems.'