## https://blog.devgenius.io/transformers-for-text-summarization-a-step-by-step-tutorial-in-python-9d8e2c74233e
### pip install transformers
### pip install torch 
### pip install sentencepiece

## Hugging Face provides a wide range of pre-trained models, including BERT, GPT-2, and T5.

In [1]:
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained('t5-small')
model = T5ForConditionalGeneration.from_pretrained('t5-small')

You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [2]:
input_text = """
Machine learning is a branch of artificial intelligence that allows computers to learn and improve from experience without being explicitly programmed. It is the process of using algorithms and statistical models to analyze and draw insights from large amounts of data, and then use those insights to make predictions or decisions. Machine learning has become increasingly popular in recent years, as the amount of available data has grown and computing power has increased. There are three main types of machine learning: supervised learning, unsupervised learning, and reinforcement learning. In supervised learning, the algorithm is given a labeled dataset and learns to make predictions based on that data. In unsupervised learning, the algorithm is given an unlabeled dataset and must find patterns and relationships within the data on its own. In reinforcement learning, the algorithm learns by trial and error, receiving feedback in the form of rewards or punishments for certain actions. Machine learning is used in a wide range of applications, including image recognition, natural language processing, autonomous vehicles, fraud detection, and recommendation systems. As the technology continues to improve, it is likely that machine learning will become even more prevalent in our daily lives.
"""

### Tokenization
##### We use the T5 tokenizer loaded earlier. Its encode() method takes the following arguments:

##### return_tensors='pt': return a PyTorch tensor instead of a list of integers.
##### max_length=512: cap the encoded input at 512 tokens.
##### truncation=True: truncate the input if it exceeds the maximum length.
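
To make the effect of `max_length` and `truncation` concrete, here is a toy sketch that works on a plain list of token ids. The helper `truncate_ids` is hypothetical and is not part of `transformers`; the real tokenizer operates on SentencePiece pieces and its internals differ. The sketch assumes the last slot is reserved for an end-of-sequence id (1 for T5).

```python
def truncate_ids(token_ids, max_length, eos_id=1):
    """Toy model of truncation: keep at most max_length ids in total,
    reserving the final position for the end-of-sequence id."""
    if max_length < 1:
        raise ValueError("max_length must be at least 1")
    body = token_ids[: max_length - 1]  # drop anything past the limit
    return body + [eos_id]


ids = list(range(100, 110))               # ten made-up token ids
print(truncate_ids(ids, max_length=512))  # short input: nothing is dropped
print(truncate_ids(ids, max_length=4))    # long input: cut to 3 ids plus EOS
```

Anything past the limit is silently discarded, so a long article may lose its ending before the model ever sees it.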

In [3]:
inputs = tokenizer.encode(input_text, return_tensors='pt', max_length=512, truncation=True)

In [4]:
summary_ids = model.generate(inputs,
                              max_length=150,
                              min_length=40,
                              length_penalty=2.0,
                              num_beams=4,
                              early_stopping=True)
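
For intuition about `num_beams` and `length_penalty`: beam search keeps the `num_beams` best candidate sequences, and finished candidates are ranked by a length-normalised score. The sketch below assumes the score is the summed log-probability divided by `length ** length_penalty`, which is a simplified reading of the Hugging Face beam scorer. `beam_score` is a hypothetical helper and the log-probabilities are made up.

```python
def beam_score(token_logprobs, length_penalty=1.0):
    """Summed log-probability normalised by length ** length_penalty."""
    if not token_logprobs:
        raise ValueError("need at least one token")
    return sum(token_logprobs) / (len(token_logprobs) ** length_penalty)


short = [-0.5, -0.5]              # 2 tokens, total log-prob -1.0
long = [-0.5, -0.5, -0.5, -0.5]   # 4 tokens, total log-prob -2.0

for penalty in (0.0, 2.0):
    winner = "long" if beam_score(long, penalty) > beam_score(short, penalty) else "short"
    print(f"length_penalty={penalty}: {winner} candidate ranks higher")
```

Because log-probabilities are negative, a larger `length_penalty` shrinks the magnitude of the score for longer sequences, which is why a value such as 2.0 tends to favour longer summaries.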

#### The generate() method returns a tensor of token ids for the generated summary, which we convert back to text with the tokenizer's decode() method.
#### Passing skip_special_tokens=True removes special tokens (such as padding and end-of-sequence markers) from the decoded summary.
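
As a toy illustration of what `skip_special_tokens=True` does, the sketch below filters special tokens out of a list of string pieces before joining them. `toy_decode` is a hypothetical stand-in, not the tokenizer's real `decode()`, and the token list is invented. T5 output typically starts with `<pad>` and ends with `</s>`.

```python
SPECIAL_TOKENS = {"<pad>", "</s>"}


def toy_decode(tokens, skip_special_tokens=False):
    """Join tokens into text, optionally dropping special tokens first."""
    kept = [t for t in tokens if not (skip_special_tokens and t in SPECIAL_TOKENS)]
    return " ".join(kept)


generated = ["<pad>", "machine", "learning", "is", "useful", "</s>"]
print(toy_decode(generated))                            # markers left in
print(toy_decode(generated, skip_special_tokens=True))  # markers removed
```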

In [5]:
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)

In [6]:
summary

', and reinforcement learning. In reinforcement learning, the algorithm learns by trial and error, receiving feedback in form of rewards or punishments for certain actions. Machine learning is used in a wide range of applications, including image recognition, natural language processing, autonomous vehicles, fraud detection, and recommendation systems.'
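
The summary above starts mid-sentence. One thing worth trying is the `summarize: ` task prefix that the T5 checkpoints were trained with for summarization, since `input_text` is encoded without it. Whether this improves the output here is an assumption and has not been verified in this notebook. `with_task_prefix` is a hypothetical helper that also collapses stray newlines and spaces.

```python
def with_task_prefix(text, prefix="summarize: "):
    """Collapse whitespace and prepend the T5 task prefix if it is missing."""
    cleaned = " ".join(text.split())
    if cleaned.startswith(prefix.strip()):
        return cleaned
    return prefix + cleaned


print(with_task_prefix("\nMachine  learning\nis fun.\n"))
```

The result can be passed to the tokenizer in place of the raw text, for example `tokenizer.encode(with_task_prefix(input_text), return_tensors='pt', max_length=512, truncation=True)`.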

In [3]:
article = """Like so many problem-solving experiments conducted over the years, the answer depends not on 
 intelligence or skill but perspective, feedback, and circumstance. Small manipulations, such as offering money to 
 participants for generating more anagrams or revealing the average number of anagrams other participants completed, 
 can drastically sway results. 
 
 In recent years, social psychologist Heidi Grant Halvorson, the associate director of the Motivation Science Centre 
 at Columbia Business School, has used the anagram puzzle to study how people focus. Across dozens of experiments and 
 articles, Halvorson has revealed that we approach problems in one of two ways. "If you are promotion-focused, 
 you want to advance and avoid missed opportunities. If you are prevention-focused, you want to minimize losses and 
 keep things working." Which one is more effective? 
 
 Halvorson teamed with Jens Forster and Lorraine Chen Idson to find out. In an experiment conducted several years ago, 
 they gathered 109 participants and divided them into two groups. Those in the promotion condition received four 
 dollars and the chance to earn an extra dollar if they scored above the 70th percentile on the anagram task. Their 
 peers in the prevention condition received five dollars, but if their performance dropped below the 70th percentile, 
 they risked losing a dollar. 
 
 On paper, each condition was the same: every participant would receive at least four dollars. The difference is how 
 Halvorson and her team framed the experiment. In the promotion condition, success meant gaining a dollar; in the 
 prevention condition, it meant avoiding losing a dollar. 
 
 This subtle manipulation had a big impact. Halfway through the experiment, Halvorson told every participant that they 
 were performing either above or below the 70th percentile (the researchers randomly assigned the feedback). The 
 participants in the promotion-focused group took positive feedback well--it boosted their expectations and 
 motivation--while those in the prevention-focused group did not. Their motivation decreased. When the news 
 was bad, the responses flipped. In the promotion-focused group, expectations of success and motivation went down. 
 Expectations in the prevention-focused group dropped dramatically but motivation surged. 
 
 This finding suggests that when we focus on gaining something, positive feedback helps us persist until we complete a 
 problem. If, on the other hand, we dwell on the possibility of failure, negative feedback can also stimulate 
 motivation and boost performance. We're more willing to stick with it when we think there's something to lose. 
 
 "Aren't we supposed to banish negative thoughts if we want to succeed?" Halvorson asks in her book Focus: Use 
 Different Ways of Seeing the World for Success and Influence (co-authored with E. Tory Higgins). 
 
 "Not if you are prevention-focused or are pursuing a prevention-focused goal. Because if you are, optimism not only 
 feels wrong--it will disrupt and dampen your motivation. If you are sure that everything is going to work out 
 for you, then why would you go out of your way to avoid mistakes or to plan your way around obstacles or to come up 
 with plan B?"
"""

In [2]:
from transformers import (
    T5Tokenizer, T5Config, T5ForConditionalGeneration,
    GPT2Tokenizer, GPT2LMHeadModel,
    BartForConditionalGeneration, BartTokenizer, BartConfig,
)

In [None]:
def sumT5(article):
    # Load the model and tokenizer for t5-small
    my_model = T5ForConditionalGeneration.from_pretrained('t5-small')
    tokenizer = T5Tokenizer.from_pretrained('t5-small')

    # T5 expects a task prefix; note the space after the colon
    text = "summarize: " + article

    # Encode the input text, truncating to at most 512 tokens
    input_ids = tokenizer.encode(text, return_tensors='pt', max_length=512, truncation=True)
    summary_ids = my_model.generate(input_ids)

    # Decode and print the summary, dropping special tokens such as <pad> and </s>
    t5_summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
    print("T-5 Summary: " + t5_summary)

def sumbart(article):
    # Load the model and tokenizer for bart-large-cnn
    tokenizer = BartTokenizer.from_pretrained('facebook/bart-large-cnn')
    model = BartForConditionalGeneration.from_pretrained('facebook/bart-large-cnn')

    # Encode the input (truncated to BART's 1024-token limit) and pass it to model.generate()
    inputs = tokenizer([article], return_tensors='pt', max_length=1024, truncation=True)
    summary_ids = model.generate(inputs['input_ids'])

    # Decode and print the summary
    bart_summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
    print("BART Summary: " + bart_summary)

#### https://www.analyticsvidhya.com/blog/2022/01/youtube-summariser-mini-nlp-project/

In [None]:
from transformers import BartTokenizer, BartForConditionalGeneration

tokenizer = BartTokenizer.from_pretrained('facebook/bart-large-cnn')
model = BartForConditionalGeneration.from_pretrained('facebook/bart-large-cnn')

# `subtitle` is assumed to hold the video transcript as a string; it is not defined in this notebook
input_tensor = tokenizer.encode(subtitle, return_tensors="pt", max_length=512, truncation=True)

outputs_tensor = model.generate(input_tensor, max_length=160, min_length=120, length_penalty=2.0, num_beams=4, early_stopping=True)

In [None]:
print(tokenizer.decode(outputs_tensor[0], skip_special_tokens=True))
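
Because the encoded input is capped at 512 tokens, anything beyond that point in a long transcript never reaches the model. One possible workaround, which is not part of the original notebook, is to split the text into smaller pieces and summarize each piece separately. The sketch below only does the splitting. `chunk_words` is a hypothetical helper, and the default of 300 words per chunk is an arbitrary guess meant to stay under the token cap, since word count is only a rough proxy for token count.

```python
def chunk_words(text, words_per_chunk=300):
    """Split text into consecutive chunks of at most words_per_chunk words."""
    if words_per_chunk < 1:
        raise ValueError("words_per_chunk must be at least 1")
    words = text.split()
    return [
        " ".join(words[i : i + words_per_chunk])
        for i in range(0, len(words), words_per_chunk)
    ]


for piece in chunk_words("one two three four five", words_per_chunk=2):
    print(piece)
```

Each chunk could then be passed through `tokenizer.encode` and `model.generate` as in the cell above, and the partial summaries joined together.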