In [None]:
%pip install torch numpy transformers datasets huggingface_hub

### Importing the Required Libraries

A Hugging Face access token is required.

[Hugging Face](https://huggingface.co/)

Use the link above to sign up and create an access token free of charge.
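
One way to avoid pasting the token directly into the notebook is to read it from an environment variable. The sketch below assumes the token has been exported as `HF_TOKEN` before the notebook starts; the helper name `read_hf_token` is hypothetical and not part of `huggingface_hub`.

```python
import os

def read_hf_token(var_name="HF_TOKEN"):
    """Return the token stored in the given environment variable, or raise a clear error."""
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {var_name} environment variable to your Hugging Face token."
        )
    return token

# Usage (not run here because it needs a real token and network access):
# from huggingface_hub import login
# login(token=read_hf_token())
```

Keeping the token out of the notebook source also keeps it out of version control and shared copies of the notebook.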


In [None]:
from huggingface_hub import login
import torch
from transformers import BartForConditionalGeneration, BartTokenizer, BertTokenizer, BertModel
from datasets import load_dataset
# Placeholder: replace with your own Hugging Face access token
huggingfacekey = "hugging-face api key"
login(token=huggingfacekey)

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: fineGrained).
Your token has been saved to /root/.cache/huggingface/token
Login successful


## Random Index Generation and Dataset Loading Example

### Random Index Generation:
- Uses `np.random.randint()` to generate random integers.
- Generates n = 10 random integers from 0 to 29999 (the upper bound 30000 is exclusive); the same index can be drawn more than once.
### CNN/DailyMail Dataset:
- Loaded using `load_dataset("cnn_dailymail", "3.0.0")`.
- Contains news articles and their highlights (reference summaries), used for NLP tasks such as summarization.
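
If repeatable and duplicate-free indices are preferred, a seeded NumPy generator can be used instead. This is a minimal sketch of an alternative, not what the notebook itself does; the names `upper` and `unique_index` and the seed value are assumptions chosen for illustration.

```python
import numpy as np

n = 10         # number of articles to sample
upper = 30000  # exclusive upper bound, matching the range used in this notebook

# A seeded generator makes the selection repeatable across runs.
rng = np.random.default_rng(seed=0)

# replace=False guarantees that no index is drawn twice.
unique_index = rng.choice(upper, size=n, replace=False)

print(unique_index)
```

Sampling without replacement matters mostly for small ranges, but it rules out summarizing the same article twice.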


In [None]:
import numpy as np
n = 10  # number of random articles to select
randindex = np.random.randint(30000, size=n)  # n integers in [0, 30000)
print(randindex)
dataset = load_dataset("cnn_dailymail", "3.0.0")
print(dataset)

DatasetDict({
    train: Dataset({
        features: ['article', 'highlights', 'id'],
        num_rows: 287113
    })
    validation: Dataset({
        features: ['article', 'highlights', 'id'],
        num_rows: 13368
    })
    test: Dataset({
        features: ['article', 'highlights', 'id'],
        num_rows: 11490
    })
})


##### Selecting Random Articles and Loading the BART Model

The random indices generated above are used to select the corresponding articles from the training split of the dataset.


The BART (Bidirectional and Auto-Regressive Transformers) model is loaded using Hugging Face Transformers.
This model is used for text summarization.

##### Hugging Face Pretrained Models:

The "facebook/bart-large-cnn" model, a BART-large checkpoint fine-tuned on CNN/DailyMail, is used for summarizing news articles.

In [None]:
articles = dataset['train'][randindex]['article']
# Load the BART model and tokenizer
model_name = "facebook/bart-large-cnn"

tokenizer = BartTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name)

**Purpose:** This code defines a function summarize_text() that takes a text input and generates a summary using a pre-trained transformer model like BART.

**Inputs:** The function accepts a text string and processes it with the tokenizer to convert it into a format suitable for the model. The input text is truncated to a maximum length of 1024 tokens to fit within the model's constraints.

**Model Summarization:** It uses the model’s generate() method to create a summary of between 40 and 150 tokens (min_length, max_length), using beam search with num_beams=4 and length_penalty=2.0. With early_stopping=True, the search stops as soon as num_beams complete candidate summaries have been found.

**Output:** The function decodes the generated token IDs into a readable text summary and returns it, excluding any special tokens.
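
To give a feel for what `length_penalty=2.0` does, the toy calculation below ranks two made-up beam candidates the way Hugging Face's beam search is understood to score finished hypotheses: the sum of token log-probabilities divided by the length raised to `length_penalty`. The log-probability values are invented for illustration, and `beam_score` is a hypothetical helper, not a `transformers` function.

```python
def beam_score(sum_logprobs, length, length_penalty):
    """Length-normalized score of a finished beam hypothesis (higher is better)."""
    return sum_logprobs / (length ** length_penalty)

# Two invented candidates: a short one, and a longer one with a lower total log-probability.
short = {"sum_logprobs": -12.0, "length": 40}
long_ = {"sum_logprobs": -30.0, "length": 80}

for penalty in (0.0, 1.0, 2.0):
    s = beam_score(short["sum_logprobs"], short["length"], penalty)
    l = beam_score(long_["sum_logprobs"], long_["length"], penalty)
    winner = "short" if s > l else "long"
    print(f"length_penalty={penalty}: short={s:.5f}, long={l:.5f} -> {winner}")
```

Because log-probabilities are negative, dividing by a larger power of the length moves long candidates closer to zero, so with these numbers the longer candidate wins only at a penalty of 2.0.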

In [None]:
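# Function to summarize text
def summarize_text(text):
    # Tokenize, truncating to BART's 1024-token input limit
    inputs = tokenizer(text, max_length=1024, return_tensors="pt", truncation=True)
    # Beam search (4 beams); summary constrained to 40-150 tokens
    summary_ids = model.generate(
        inputs["input_ids"], attention_mask=inputs["attention_mask"],
        max_length=150, min_length=40, length_penalty=2.0,
        num_beams=4, early_stopping=True,
    )
    # Decode the best beam, dropping special tokens
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)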

**Purpose:** This loop processes a list of articles, applies the summarize_text() function to each article, and prints both the original article and its summary.

**Iteration over Articles:** The code iterates through the articles list using enumerate() to keep track of both the index (i) and the article content.

**Summarization Process:** For each article, the function summarize_text(article) is called, generating a summary based on the pre-trained model (such as BART).

**Output:** For each article, the code prints:
- The original article (Original Article {i+1}).
- The generated summary (Summary {i+1}).
- A separator line ("-" * 80) for readability between articles.
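
If the summaries are needed later (for example, for evaluation), the same loop can collect results instead of only printing them. The sketch below is one possible shape for that; `summarize_all` and the stand-in summarizer `first_sentence` are hypothetical helpers, used here so the example runs without downloading a model.

```python
def summarize_all(articles, summarize_fn):
    """Apply summarize_fn to each article and return a list of result records."""
    results = []
    for i, article in enumerate(articles, start=1):
        results.append({"index": i, "article": article, "summary": summarize_fn(article)})
    return results

# Stand-in summarizer: keeps only the first sentence of the text.
def first_sentence(text):
    return text.split(". ")[0].rstrip(".") + "."

demo = summarize_all(["First point. More detail.", "Only one sentence."], first_sentence)
for row in demo:
    print(row["index"], row["summary"])
```

In the notebook, passing `summarize_text` as `summarize_fn` would produce the same records with BART-generated summaries.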

In [None]:
for i, article in enumerate(articles):
    summary = summarize_text(article)
    print(f"Original Article {i+1}:\n{article}\n")
    print(f"Summary {i+1}:\n{summary}\n")
    print("-" * 80)


Original Article 1:

Summary 1:
NEW: Rep. Barney Frank, one of the first openly gay House members, is seen wiping his eyes. President Barack Obama will sign the repeal bill on Wednesday. The "don't ask, don't tell" policy banned openly gay and lesbian soldiers from military service. More than 14,000 military members have been discharged because of it.

--------------------------------------------------------------------------------
Original Article 2:
(CNN) -- Will Bunch's CNN.com tirade earlier this week against television host Glenn Beck and David Barton -- the founder and president of WallBuilders, a national pro-family organization that emphasizes history's "moral, religious and constitutional heritage" -- for allegedly creating "pseudo history" reveals more about Mr. Bunch than it does about what Mr. Beck and Mr. Barton are presenting. Mr. Bunch seems, above all, to be annoyed that many people are no longer staying on the liberal plantation of secularized American history. He offe