<a href="https://colab.research.google.com/github/JPP-J/deep-_learning_project/blob/main/DL_6_summarize_gen_text.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Install required libraries

In [None]:
pip install transformers



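# Models used in this notebook
* General text generation: GPT-2

* Summarization: BART (Bidirectional and Auto-Regressive Transformers)

Task prefixes such as the two below are a T5 convention. The BART checkpoints listed later in this notebook are each fine-tuned for a single task and do not require a prefix.

```
"summarize: " + text
```

```
translate English to French:
```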

# Generative task
Using the GPT-2 model

In [None]:
from transformers import GPT2Tokenizer, GPT2LMHeadModel

# Load the GPT-2 tokenizer and model
model_name_g = "gpt2"
tokenizer_g = GPT2Tokenizer.from_pretrained(model_name_g)
model_g = GPT2LMHeadModel.from_pretrained(model_name_g)

# Function to generate text from a prompt
def generate_text(text, max_length=512, num_beams=5):
    """
    Continue the input text using GPT-2.
    :param text: str, prompt to continue
    :param max_length: int, maximum total length in tokens (prompt + continuation)
    :param num_beams: int, number of beams for beam search
    :return: str, the prompt followed by the generated continuation
    """
    # Encode the prompt; GPT-2 has 1024 positions, so longer prompts are truncated
    inputs = tokenizer_g.encode(text, return_tensors="pt", truncation=True, max_length=1024)

    # Generate using beam search
    output_ids = model_g.generate(
        inputs,
        max_length=max_length,
        num_beams=num_beams,
        early_stopping=True,
        no_repeat_ngram_size=2  # never emit the same bigram twice
    )

    # Decode generated tokens (the decoded text includes the prompt)
    return tokenizer_g.decode(output_ids[0], skip_special_tokens=True)
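
The `no_repeat_ngram_size=2` argument in the `generate` call above prevents any bigram from appearing twice in the output. The sketch below illustrates the idea in plain Python. `banned_next_tokens` is a hypothetical helper written for this note, not the function `transformers` uses internally, and it works on any list of token IDs or strings.

```python
def banned_next_tokens(generated, n):
    """Return the tokens that would complete an n-gram already present in `generated`."""
    if n < 1 or len(generated) < n:
        return set()
    # The last n-1 tokens are the prefix of the n-gram that the next token would complete.
    prefix = tuple(generated[len(generated) - (n - 1):])
    banned = set()
    # Look at every n-gram produced so far; if it starts with the same prefix,
    # its final token must not be produced next.
    for i in range(len(generated) - n + 1):
        ngram = tuple(generated[i:i + n])
        if ngram[:-1] == prefix:
            banned.add(ngram[-1])
    return banned


# "the house" already occurred, so after the second "the" the token "house" is blocked.
print(banned_next_tokens(["the", "house", "is", "the"], n=2))
```

During decoding, the scores of the banned tokens would be set to negative infinity before the next token is chosen, which is why the generated description rarely loops on the same phrase.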

## Model GPT-2 Details

In [None]:
model_g

GPT2LMHeadModel(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2SdpaAttention(
          (c_attn): Conv1D(nf=2304, nx=768)
          (c_proj): Conv1D(nf=768, nx=768)
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D(nf=3072, nx=768)
          (c_proj): Conv1D(nf=768, nx=3072)
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=768, out_features=50257, bias=False)
)
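
The shapes in the printout above are enough to work out the model's parameter count by hand. The following is a rough sketch of that arithmetic, assuming the default GPT-2 configuration in which `lm_head` shares its weight matrix with `wte` and therefore adds no parameters of its own. The helper name `conv1d_params` is made up for illustration.

```python
# Sizes read from the printed model structure
vocab_size, n_positions, d_model, n_layers = 50257, 1024, 768, 12


def conv1d_params(nx, nf):
    # Conv1D(nf, nx) stores an (nx, nf) weight matrix plus nf biases
    return nx * nf + nf


layer_norm = 2 * d_model  # one weight and one bias per feature

block = (
    layer_norm                      # ln_1
    + conv1d_params(768, 2304)      # attn.c_attn (query, key, value in one matrix)
    + conv1d_params(768, 768)       # attn.c_proj
    + layer_norm                    # ln_2
    + conv1d_params(768, 3072)      # mlp.c_fc
    + conv1d_params(3072, 768)      # mlp.c_proj
)

embeddings = vocab_size * d_model + n_positions * d_model  # wte + wpe
total = embeddings + n_layers * block + layer_norm          # + ln_f

print(f"{total:,}")  # 124,439,808
```

The result can be checked against the loaded model with `sum(p.numel() for p in model_g.parameters())`, which counts the shared embedding matrix only once.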

## Example usage GPT-2

In [None]:
# Example usage GPT-2
input_text_g = 'write decription of house'
generated_text = generate_text(input_text_g, max_length=1024)
print("Original Text:\n", input_text_g)
print("\nGenerated Text:\n", generated_text)
print("input length: ", len(input_text_g))
print("output length: ", len(generated_text))

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
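
The two warnings above appear because `generate` receives only the input IDs. One way to silence them, sketched here under the assumption that a single unpadded prompt is generated at a time, is to call the tokenizer directly so that it also returns an attention mask, and to pass `pad_token_id` explicitly. `generate_with_mask` is a hypothetical wrapper, not part of `transformers`.

```python
def generate_with_mask(model, tokenizer, text, **generate_kwargs):
    """Generate from `text`, passing the attention mask and pad token explicitly."""
    # Calling the tokenizer (rather than .encode) returns input_ids and attention_mask
    encoded = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(
        encoded["input_ids"],
        attention_mask=encoded["attention_mask"],
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no dedicated pad token
        **generate_kwargs,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Usage with the real model (downloads weights, so it is left commented out):
# from transformers import GPT2Tokenizer, GPT2LMHeadModel
# tok = GPT2Tokenizer.from_pretrained("gpt2")
# lm = GPT2LMHeadModel.from_pretrained("gpt2")
# print(generate_with_mask(lm, tok, "write decription of house",
#                          max_length=60, num_beams=5, no_repeat_ngram_size=2))
```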


Original Text:
 write decription of house

Generated Text:
 write decription of house.

The house was built in the late 19th century, and it is believed to have been built on the site of one of the most famous houses in London. The house is said to be the oldest house in Britain, dating back to the 17th Century. It is thought that it was used as a boarding house for the nobility, as well as being used by the royal family as an entertainment centre.


It is estimated that the house cost £1.5m to build, with a total cost of £2.3m.
input length:  25
output length:  474
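
The two lengths printed above come from `len()` on Python strings, so they count characters. The `max_length` argument of `generate`, by contrast, counts tokens. The sketch below shows the difference. `length_report` is a hypothetical helper, and whitespace splitting stands in for a real tokenizer so that the block runs without downloading a model.

```python
def length_report(text, tokenize=str.split):
    """Return the character count and the token count of `text`."""
    return {"characters": len(text), "tokens": len(tokenize(text))}


print(length_report("write decription of house"))
# {'characters': 25, 'tokens': 4}
```

Passing the GPT-2 tokenizer's `tokenize` method as the second argument gives the count that `max_length` is actually compared against. That count will generally differ from the whitespace count because GPT-2 splits text into byte-pair-encoded subword units.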


# Summarize task
Using the BART model

* facebook/bart-large-cnn: Fine-tuned for summarization tasks using the CNN/DailyMail dataset.

* facebook/bart-large-mnli: Fine-tuned for natural language inference (NLI) tasks using the Multi-Genre Natural Language Inference (MNLI) dataset.

* facebook/bart-large-xsum: Fine-tuned for extreme summarization tasks using the XSum dataset.

* facebook/bart-large-squad2: Fine-tuned for question answering using the SQuAD 2.0 dataset.

In [None]:
from transformers import BartForConditionalGeneration, BartTokenizer

# Load the BART tokenizer and model
model_name = "facebook/bart-large-cnn"
tokenizer = BartTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name)

# Function to summarize text
def summarize_text(text, max_length=130, min_length=30, num_beams=4):
    """
    Summarize the input text using BART.
    :param text: str, input text to summarize
    :param max_length: int, maximum length of the summary in tokens
    :param min_length: int, minimum length of the summary in tokens
    :param num_beams: int, number of beams for beam search
    :return: str, summarized text
    """
    # bart-large-cnn is already fine-tuned for summarization, so no T5-style
    # "summarize: " prefix is needed; a prefix would be read as article text.
    # Inputs longer than 1024 tokens are truncated.
    inputs = tokenizer.encode(text, return_tensors="pt", max_length=1024, truncation=True)
    summary_ids = model.generate(
        inputs,
        max_length=max_length,
        min_length=min_length,
        num_beams=num_beams,
        early_stopping=True
    )
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)


## Model facebook/bart-large-cnn Details

In [None]:
model

BartForConditionalGeneration(
  (model): BartModel(
    (shared): BartScaledWordEmbedding(50264, 1024, padding_idx=1)
    (encoder): BartEncoder(
      (embed_tokens): BartScaledWordEmbedding(50264, 1024, padding_idx=1)
      (embed_positions): BartLearnedPositionalEmbedding(1026, 1024)
      (layers): ModuleList(
        (0-11): 12 x BartEncoderLayer(
          (self_attn): BartSdpaAttention(
            (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
            (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
            (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
            (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
          )
          (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
          (activation_fn): GELUActivation()
          (fc1): Linear(in_features=1024, out_features=4096, bias=True)
          (fc2): Linear(in_features=4096, out_features=1024, bias=True)
    

## Example usage BART model

In [None]:
input_text = (
  """Meta CEO Mark Zuckerberg has announced layoffs of what he refers to as "low-performers" at his empire.

According to a company-wide memo obtained by Bloomberg, the Facebook owner is cutting around five percent of its staff. And interestingly, the directive is already in tension with what Zuckerberg told podcaster Joe Rogan last week about how the company was looking to replace "midlevel engineers" with AI. Instead — in a likely concession to AI just not quite being up to snuff yet — he says employees "who aren't meeting expectations" will be replaced in order to "bring new people in" (emphasis on the "people," for any AI zealots.)

"I’ve decided to raise the bar on performance management and move out low-performers faster," he wrote in the message, adding that terminated employees would be provided with "generous severance."

Zuckerberg wrote that 2025 will be an "intense year" that will require the "strongest talent." But what exactly he means by that remains unclear as the billionaire makes sweeping changes to the company's operations.

The CEO appears to be taking yet another page out of the playbook of X-former-Twitter owner Elon Musk, who has long led his companies with an iron fist — demanding in 2022 that Twitter staff be "extremely hardcore" or risk immediate termination, for instance.

Race to the Bottom

Zuckerberg already raised eyebrows this month by giving up the pretense of serious content moderation on his sites. Earlier this month, he introduced new measures that would allow hate speech and misinformation to proliferate unchecked on the company's platforms, including Facebook, Instagram, and Threads.

The straightforward reading is that it was a thinly veiled attempt by Zuckerberg to get in the good graces of president-elect Donald Trump, who has formed a tight relationship with Musk and will be sworn in next week (Trump previously threatened to imprison Zuckerberg, which may also be weighing on the founder's mind.)

How exactly Meta's latest efforts to weed out "low-performers" fits into the ongoing groveling remains to be seen. It's not just Meta, either; tech companies across the board are looking to tighten up their operations. Microsoft is also targeting underperforming employees as part of major headcount reductions across the company.

During his chat with Rogan last week, Zuckerberg also whined that companies were being "culturally neutered" by purportedly distancing themselves from "masculine energy."

Could his latest attempt to push out "low-performers" be symptomatic of his deranged desire to inject some machismo into Meta? Judging by the company's willingness to throw out the rulebook and double down on Musk-inspired meritocracy, anything seems possible.
"""
)

In [None]:
# Example usage "facebook/bart-large-cnn"

summarized_text = summarize_text(input_text, max_length=1024)
print("Original Text:\n", input_text)
print("\nSummarized Text:\n", summarized_text)
print("input length: ", len(input_text))
print("output length: ", len(summarized_text))


Original Text:
 Meta CEO Mark Zuckerberg has announced layoffs of what he refers to as "low-performers" at his empire.

According to a company-wide memo obtained by Bloomberg, the Facebook owner is cutting around five percent of its staff. And interestingly, the directive is already in tension with what Zuckerberg told podcaster Joe Rogan last week about how the company was looking to replace "midlevel engineers" with AI. Instead — in a likely concession to AI just not quite being up to snuff yet — he says employees "who aren't meeting expectations" will be replaced in order to "bring new people in" (emphasis on the "people," for any AI zealots.)

"I’ve decided to raise the bar on performance management and move out low-performers faster," he wrote in the message, adding that terminated employees would be provided with "generous severance."

Zuckerberg wrote that 2025 will be an "intense year" that will require the "strongest talent." But what exactly he means by that remains unclear a