#  Welcome to Transformers

### The pipeline

In [1]:
import transformers

  from .autonotebook import tqdm as notebook_tqdm


In [3]:
from transformers import pipeline
clssifier = pipeline("sentiment-analysis")
res = clssifier("I want you to  summarize educatinooal content for me")

print(res)

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development


[{'label': 'POSITIVE', 'score': 0.9801173806190491}]


In [2]:
from transformers import pipeline

generator  = pipeline("text-generation", model="distilgpt2")

res= generator(
    "In a worled where education is inivitable as an enterpunoot you have to",
    max_length=100,
    num_return_sequences=2,
) 

print(res)

To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


[{'generated_text': 'In a worled where education is inivitable as an enterpunoot you have to keep in place the great knowledge, knowledge that is not the quality, the perfect education, education that is good. It is a spiritual or moral need for teachers and teachers and teachers, a spiritual or moral need for teachers and teachers of all classes, at all levels of learning and instruction for everyone: to be able to live, live and work in full power, without any means of oppression. Without such knowledge'}, {'generated_text': 'In a worled where education is inivitable as an enterpunoot you have to start using something to achieve goals, goals, and achievements you can achieve from the basics to the technical knowledge you can achieve through the process. You can work with those qualities to become your success. This can be difficult, if not impossible, for both young and adults. Your family often uses the concept of having a child who does not have one but a single father. This would 

In [3]:
from transformers import pipeline

classifier = pipeline("zero-shot-classification")

res = classifier(
    "I want to classify the following article into educational content where yu summarise books",
    ["education", "politics", "business"],
)
print(res)

No model was supplied, defaulted to facebook/bart-large-mnli and revision d7645e1 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.
To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development


{'sequence': 'I want to classify the following article into educational content where yu summarise books', 'labels': ['education', 'business', 'politics'], 'scores': [0.9950029850006104, 0.003552044974640012, 0.001444946276023984]}


### Transformers A generic approach with more degrees of freedom

In [2]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from transformers import pipeline
model_name =  "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

classifier = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)
res = classifier("I want you to  summarize educatinooal content for me")
print(res)


To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development


[{'label': 'POSITIVE', 'score': 0.9801173806190491}]


### A glimps of the Transformer tokenizer

In [3]:
sequence = "I want you to summarize this article"
res = tokenizer(sequence)
print(res)
tokens = tokenizer.tokenize(sequence)
print(tokens)
vocab = tokenizer.get_vocab()
print(len(vocab))
ids = tokenizer.convert_tokens_to_ids(tokens)
print(ids)
decoded_strings = tokenizer.decode(ids)
print(decoded_strings)

{'input_ids': [101, 1045, 2215, 2017, 2000, 7680, 7849, 4697, 2023, 3720, 102], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
['i', 'want', 'you', 'to', 'sum', '##mar', '##ize', 'this', 'article']
30522
[1045, 2215, 2017, 2000, 7680, 7849, 4697, 2023, 3720]
i want you to summarize this article


### A practical guidee for the complete proocess

In [18]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from transformers import pipeline
import torch
model_name =  "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

classifier = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)

X_train =  ["I have been doing this for about 3 yyears now and it is time to take a  free time to reflect upon what i have done on those uears s so far",
            "Now  in htis age i think the best next step  is to reflect to geada head  inthe near  future as  it  will work as inflection point  that  can give you all the momentumthat  you need"]

res = classifier(X_train)
print(res)


batch = tokenizer(X_train, padding=True,truncation=True,max_length=512, return_tensors="pt")
print(batch.keys())

with torch.no_grad():
    outputs = model(**batch)
    logits = outputs.logits
    predicted_labels = torch.argmax(logits, dim=1)
    print(predicted_labels)
    predicted_scores = torch.softmax(logits, dim=1)
    print(predicted_scores)
    predicted_scores = predicted_scores.tolist()
    print(predicted_scores)
    for i, (sentence, label, score) in enumerate(zip(X_train, predicted_labels, predicted_scores)):
        print(f"Sentence: {sentence} - Label: {label} - Score: {score}")
    

[{'label': 'NEGATIVE', 'score': 0.964401125907898}, {'label': 'POSITIVE', 'score': 0.9213657975196838}]
dict_keys(['input_ids', 'attention_mask'])
tensor([0, 1])
tensor([[0.9644, 0.0356],
        [0.0786, 0.9214]])
[[0.964401125907898, 0.03559890761971474], [0.07863417267799377, 0.9213657975196838]]
Sentence: I have been doing this for about 3 yyears now and it is time to take a  free time to reflect upon what i have done on those uears s so far - Label: 0 - Score: [0.964401125907898, 0.03559890761971474]
Sentence: Now  in htis age i think the best next step  is to reflect to geada head  inthe near  future as  it  will work as inflection point  that  can give you all the momentumthat  you need - Label: 1 - Score: [0.07863417267799377, 0.9213657975196838]


In [None]:
data_directory = "Saved"
tokenizer.save_pretrained(data_directory)
model.save_pretrained(data_directory)

tok = AutoTokenizer.from_pretrained(data_directory)
mod = AutoModelForSequenceClassification(data_directory)

### Data Loading and Prepreparation

In [4]:
import os
os.chdir("../")
%pwd

'd:\\AI\\NLP\\HandsOn\\Text Summarization'

In [4]:
from datasets import load_dataset
data_directory = "artifacts\data_standardization\standardized_data.csv"

dataset = load_dataset("csv", data_files=data_directory)

In [5]:
dataset

DatasetDict({
    train: Dataset({
        features: ['description', 'tags', 'title', 'ratings', 'transcript', 'description_standardized', 'title_standardized', 'transcript_standardized'],
        num_rows: 2467
    })
})

In [6]:
dataset["train"][0]

{'description': 'Sir Ken Robinson makes an entertaining and profoundly moving case for creating an education system that nurtures (rather than undermines) creativity.',
 'tags': "['children', 'creativity', 'culture', 'dance', 'education', 'parenting', 'teaching']",
 'title': 'Do schools kill creativity?',
 'ratings': "[{'id': 7, 'name': 'Funny', 'count': 19645}, {'id': 1, 'name': 'Beautiful', 'count': 4573}, {'id': 9, 'name': 'Ingenious', 'count': 6073}, {'id': 3, 'name': 'Courageous', 'count': 3253}, {'id': 11, 'name': 'Longwinded', 'count': 387}, {'id': 2, 'name': 'Confusing', 'count': 242}, {'id': 8, 'name': 'Informative', 'count': 7346}, {'id': 22, 'name': 'Fascinating', 'count': 10581}, {'id': 21, 'name': 'Unconvincing', 'count': 300}, {'id': 24, 'name': 'Persuasive', 'count': 10704}, {'id': 23, 'name': 'Jaw-dropping', 'count': 4439}, {'id': 25, 'name': 'OK', 'count': 1174}, {'id': 26, 'name': 'Obnoxious', 'count': 209}, {'id': 10, 'name': 'Inspiring', 'count': 24924}]",
 'transcr

In [7]:
# Split the dataset into a 80/10/10 train/validation/test split
train_test_split = dataset["train"].train_test_split(test_size=0.2)

# Extract the training and testing sets
train_val_dataset = train_test_split["train"]
test_dataset = train_test_split["test"]

# Further split the training and validation set into 90/10
train_val_split = train_val_dataset.train_test_split(test_size=0.1)

# Extract the final training, validation, and test sets
train_dataset = train_val_split["train"]
validation_dataset = train_val_split["test"]

print(f"Training set size: {len(train_dataset)}")
print(f"Validation set size: {len(validation_dataset)}")
print(f"Test set size: {len(test_dataset)}")

Training set size: 1775
Validation set size: 198
Test set size: 494


In [8]:
from datasets import DatasetDict

dataset = DatasetDict({
    "train": train_dataset,
    "validation": validation_dataset,
    "test": test_dataset
})

train_data = dataset["train"]
val_data = dataset["validation"]
test_data = dataset["test"]

In [9]:
dataset["validation"][0]

{'description': 'In a lively show, mathemagician Arthur Benjamin races a team of calculators to figure out 3-digit squares, solves another massive mental equation and guesses a few birthdays. How does he do it? He’ll tell you.',
 'tags': "['education', 'entertainment', 'magic', 'math', 'performance']",
 'title': 'A performance of "Mathemagic"',
 'ratings': "[{'id': 22, 'name': 'Fascinating', 'count': 3710}, {'id': 9, 'name': 'Ingenious', 'count': 1944}, {'id': 8, 'name': 'Informative', 'count': 340}, {'id': 10, 'name': 'Inspiring', 'count': 817}, {'id': 1, 'name': 'Beautiful', 'count': 429}, {'id': 7, 'name': 'Funny', 'count': 2152}, {'id': 2, 'name': 'Confusing', 'count': 168}, {'id': 21, 'name': 'Unconvincing', 'count': 84}, {'id': 11, 'name': 'Longwinded', 'count': 114}, {'id': 3, 'name': 'Courageous', 'count': 182}, {'id': 23, 'name': 'Jaw-dropping', 'count': 7196}, {'id': 25, 'name': 'OK', 'count': 314}, {'id': 24, 'name': 'Persuasive', 'count': 128}, {'id': 26, 'name': 'Obnoxious

In [12]:
# Access the feature names 
feature_names = list(dataset["train"].features.keys()) 
feature_names

['description',
 'tags',
 'title',
 'ratings',
 'transcript',
 'description_standardized',
 'title_standardized',
 'transcript_standardized']

##### Putting it all togother

In [7]:
from datasets import DatasetDict, load_dataset
from pathlib import Path
data_directory = "artifacts\data_standardization\standardized_data.csv"

def load_data_into_DatasetDict(data_directory: Path, dataset_type: str = "csv") -> DatasetDict:    
    dataset = load_dataset(dataset_type, data_files=data_directory)
    # Split the dataset into a 80/10/10 train/validation/test split
    train_test_split = dataset["train"].train_test_split(test_size=0.2)

    # Extract the training and testing sets
    train_val_dataset = train_test_split["train"]
    test_dataset = train_test_split["test"]

    # Further split the training and validation set into 90/10
    train_val_split = train_val_dataset.train_test_split(test_size=0.1)

    # Extract the final training, validation, and test sets
    train_dataset = train_val_split["train"]
    validation_dataset = train_val_split["test"]
    
    dataset = DatasetDict({
    "train": train_dataset,
    "validation": validation_dataset,
    "test": test_dataset
    })  
    
    return dataset

In [8]:
dataset = load_data_into_DatasetDict(data_directory=data_directory)
dataset

DatasetDict({
    train: Dataset({
        features: ['description', 'tags', 'title', 'ratings', 'transcript', 'description_standardized', 'title_standardized', 'transcript_standardized'],
        num_rows: 1775
    })
    validation: Dataset({
        features: ['description', 'tags', 'title', 'ratings', 'transcript', 'description_standardized', 'title_standardized', 'transcript_standardized'],
        num_rows: 198
    })
    test: Dataset({
        features: ['description', 'tags', 'title', 'ratings', 'transcript', 'description_standardized', 'title_standardized', 'transcript_standardized'],
        num_rows: 494
    })
})

In [None]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Single sentence
single_sentence = "This is a single sentence."
inputs = tokenizer(single_sentence, padding=True, truncation=True, return_tensors="pt")
print(inputs)

# Batch of sentences
batch_sentences = [
    "This is the first sentence.",
    "This is a slightly longer second sentence.",
    "Short sentence.",
]
batch_inputs = tokenizer(batch_sentences, padding=True, truncation=True, return_tensors="pt")
print(batch_inputs)

----------------------------------------------------------------

### Fine-Tuning a pretrained model

In [4]:
import pandas as pd
import numpy as np
from transformers import AutoTokenizer ,AutoModelForSeq2SeqLM, Seq2SeqTrainingArguments, Seq2SeqTrainer
from transformers import Seq2SeqTrainingArguments, Seq2SeqTrainer
from transformers import pipeline
from datasets import Dataset
from sklearn.model_selection import train_test_split
import torch
from tqdm import tqdm
import evaluate
import nltk
from datasets import DatasetDict, load_dataset
from pathlib import Path
from transformers import DataCollatorForSeq2Seq

  from .autonotebook import tqdm as notebook_tqdm


In [22]:
class Summarizer:
    def __init__(self, checkpoint="google-t5/t5-small", max_length=1024, min_length=40):
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.checkpoint = checkpoint
        self.max_length = max_length
        self.min_length = min_length
        self.prefix = "summarize: "
        self.tokenizer = AutoTokenizer.from_pretrained(checkpoint)
        self.model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint).to(self.device)
        self.data_collator = DataCollatorForSeq2Seq(tokenizer=self.tokenizer , model= self.model ,padding=True, max_length= self.max_length)
        self.training_args = Seq2SeqTrainingArguments(
            output_dir="artifacts\models\Summarizer_model_artifacts",
            eval_strategy="epoch",
            learning_rate=2e-5,
            per_device_train_batch_size=16,
            per_device_eval_batch_size=16,
            weight_decay=0.01,
            save_total_limit=3,
            num_train_epochs=4,
            predict_with_generate=True,
            fp16=True,
            push_to_hub=False,
        )

        
    def load_data_into_DatasetDict(self, data_directory: Path, dataset_type: str = "csv") -> DatasetDict:    
        dataset = load_dataset(dataset_type, data_files=data_directory)
        
        # Ensure required columns exist
        required_features = ['transcript_standardized', 'description_standardized', 'title_standardized']
        feature_names = list(dataset["train"].features.keys()) 
        assert all(col in feature_names for col in required_features), f"Missing required columns: {required_features}"     

        # Remove unnecessary colummn 
        dataset = dataset.remove_columns(['description', 'tags', 'title', 'ratings', 'transcript'])           
        
        # Standerizing colummn names
        dataset  = dataset.rename_column("transcript_standardized", "text")
        dataset  = dataset.rename_column("description_standardized", "summary")
        dataset  = dataset.rename_column("title_standardized", "title")

        
        
        # Split the dataset into a 80/10/10 train/validation/test split
        train_test_split = dataset["train"].train_test_split(test_size=0.2)

        # Extract the training and testing sets
        train_val_dataset = train_test_split["train"]
        test_dataset = train_test_split["test"]

        # Further split the training and validation set into 90/10
        train_val_split = train_val_dataset.train_test_split(test_size=0.1)

        # Extract the final training, validation, and test sets
        train_dataset = train_val_split["train"]
        validation_dataset = train_val_split["test"]
        
        dataset = DatasetDict({
        "train": train_dataset,
        "validation": validation_dataset,
        "test": test_dataset
        })  
        
        return dataset
    
    def preprocess_function(self, dataset):
        if isinstance(dataset, DatasetDict):
            for split in dataset:
                dataset[split] = dataset[split].map(self._preprocess_single_split)
            return dataset
        else:
            return self._preprocess_single_split(dataset)

    def _preprocess_single_split(self, batch):
        if "text" not in batch or "summary" not in batch:
            raise KeyError(f"Keys 'text' or 'summary' not found. Available keys: {list(batch.keys())}")
        
        inputs = [self.prefix + doc for doc in batch["text"]]
        model_inputs = self.tokenizer(inputs, padding=True, max_length=1024, truncation=True)
        
        labels = self.tokenizer(text_target=batch["summary"], padding=True, max_length=128, truncation=True)
        model_inputs["labels"] = labels["input_ids"]
        return model_inputs


    def tokenize_dataset(self,dataset):
        tokenized_dataset = dataset.map(self.preprocess_function, batched=True)
        return tokenized_dataset
    
    

    def compute_metrics(self, eval_pred):
        rouge = evaluate.load("rouge")
        predictions, labels = eval_pred
        decoded_preds = self.tokenizer.batch_decode(predictions, skip_special_tokens=True)
        labels = np.where(labels != -100, labels, self.tokenizer.pad_token_id)
        decoded_labels = self.tokenizer.batch_decode(labels, skip_special_tokens=True)
        result = rouge.compute(predictions=decoded_preds, references=decoded_labels, use_stemmer=True)
        prediction_lens = [np.count_nonzero(pred != self.tokenizer.pad_token_id) for pred in predictions]
        result["gen_len"] = np.mean(prediction_lens)
        return {k: round(v, 4) for k, v in result.items()}

    def model_trainer(self,dataset):
        trainer = Seq2SeqTrainer(
            model=self.model,
            args=self.training_args,
            train_dataset=dataset["train"],
            eval_dataset=dataset["test"],
            tokenizer=self.tokenizer,
            data_collator=self.data_collator,
            compute_metrics=self.compute_metrics,
        )
        trainer.train()
        return trainer

    def predict(self, text: str,) -> str:
        """Generate summary for a given text using the trained model"""
        inputs = self.tokenizer(
            self.prefix + text,
            max_length=self.max_length or self.max_length,
            truncation=True,
            return_tensors="pt"
        ).to(self.device)
        
        summary_ids = self.model.generate(
            inputs["input_ids"],
            max_length=self.max_length,
            min_length=self.min_length,
            num_beams=4,
            length_penalty=2.0,
            early_stopping=True
        )
        
        return self.tokenizer.decode(summary_ids[0], skip_special_tokens=True)

In [6]:
import os
os.chdir(r"D:\AI\NLP\HandsOn\Text Summarization")
%pwd

'D:\\AI\\NLP\\HandsOn\\Text Summarization'

In [23]:
summarizer = Summarizer()

In [8]:
dataset = summarizer.load_data_into_DatasetDict(data_directory = "artifacts\data_standardization\standardized_data.csv")
model_inputs = summarizer.preprocess_function(dataset)

Map: 100%|██████████| 1775/1775 [03:58<00:00,  7.43 examples/s]
Map: 100%|██████████| 198/198 [00:27<00:00,  7.27 examples/s]
Map: 100%|██████████| 494/494 [01:07<00:00,  7.27 examples/s]


In [15]:
tokenized_dataset = summarizer.tokenize_dataset(model_inputs)

Map: 100%|██████████| 20/20 [00:00<00:00, 282.99 examples/s]
Map: 100%|██████████| 20/20 [00:00<00:00, 338.78 examples/s]
Map: 100%|██████████| 20/20 [00:00<00:00, 300.08 examples/s]


In [11]:
def data_trainer_sampeler(dataset: Dataset, sample_size: int = 20):
    sampled_dataset_random = dataset
    for split in dataset:
        sampled_dataset_random[split] = dataset[split].shuffle(seed=42).select(range(sample_size)) 

    return sampled_dataset_random

In [16]:
sample = data_trainer_sampeler(tokenized_dataset)
sample

DatasetDict({
    train: Dataset({
        features: ['summary', 'title', 'text', 'input_ids', 'attention_mask', 'labels'],
        num_rows: 20
    })
    validation: Dataset({
        features: ['summary', 'title', 'text', 'input_ids', 'attention_mask', 'labels'],
        num_rows: 20
    })
    test: Dataset({
        features: ['summary', 'title', 'text', 'input_ids', 'attention_mask', 'labels'],
        num_rows: 20
    })
})

----------

In [20]:
trainer = summarizer.model_trainer(sample)

  trainer = Seq2SeqTrainer(
  0%|          | 0/8 [03:25<?, ?it/s]
 25%|██▌       | 2/8 [00:29<01:19, 13.18s/it]

[A[A

                                             

[A[A                                       
[A                                          


 25%|██▌       | 2/8 [01:30<01:19, 13.18s/it]
[A

[A[A

[A[A

{'eval_loss': 10.442031860351562, 'eval_rouge1': 0.0734, 'eval_rouge2': 0.0174, 'eval_rougeL': 0.0617, 'eval_rougeLsum': 0.0609, 'eval_gen_len': 20.0, 'eval_runtime': 61.2928, 'eval_samples_per_second': 0.326, 'eval_steps_per_second': 0.033, 'epoch': 1.0}


 50%|█████     | 4/8 [02:03<02:04, 31.03s/it]

[A[A

                                             

[A[A                                       
[A                                          


 50%|█████     | 4/8 [02:10<02:04, 31.03s/it]
[A

[A[A

[A[A

{'eval_loss': 10.170769691467285, 'eval_rouge1': 0.0689, 'eval_rouge2': 0.0174, 'eval_rougeL': 0.0604, 'eval_rougeLsum': 0.0596, 'eval_gen_len': 20.0, 'eval_runtime': 6.9208, 'eval_samples_per_second': 2.89, 'eval_steps_per_second': 0.289, 'epoch': 2.0}


 75%|███████▌  | 6/8 [02:31<00:40, 20.32s/it]

[A[A

                                             

[A[A                                       
[A                                          


 75%|███████▌  | 6/8 [03:08<00:40, 20.32s/it]
[A

[A[A

[A[A

{'eval_loss': 9.887848854064941, 'eval_rouge1': 0.0689, 'eval_rouge2': 0.0174, 'eval_rougeL': 0.0604, 'eval_rougeLsum': 0.0596, 'eval_gen_len': 20.0, 'eval_runtime': 36.5351, 'eval_samples_per_second': 0.547, 'eval_steps_per_second': 0.055, 'epoch': 3.0}




[A[A

                                             

[A[A                                       
[A                                          


100%|██████████| 8/8 [04:23<00:00, 23.37s/it]
[A

[A[A

                                             

[A[A                                       
100%|██████████| 8/8 [04:23<00:00, 23.37s/it]
100%|██████████| 8/8 [04:23<00:00, 32.94s/it]

{'eval_loss': 9.818323135375977, 'eval_rouge1': 0.0649, 'eval_rouge2': 0.0174, 'eval_rougeL': 0.0563, 'eval_rougeLsum': 0.0554, 'eval_gen_len': 20.0, 'eval_runtime': 37.2787, 'eval_samples_per_second': 0.536, 'eval_steps_per_second': 0.054, 'epoch': 4.0}
{'train_runtime': 263.5021, 'train_samples_per_second': 0.304, 'train_steps_per_second': 0.03, 'train_loss': 10.653341293334961, 'epoch': 4.0}





AttributeError: 'Summarizer' object has no attribute 'get_summary'

In [27]:
test = "Imagine a world overwhelmed by information, where sifting through endless articles, reports, and data consumes valuable time and energy. AI summarization offers a powerful solution, employing natural language processing to condense lengthy texts into concise summaries. These systems utilize two primary approaches: extractive summarization, selecting key sentences directly from the source, and abstractive summarization, generating new, more human-like summaries that capture the core meaning. This technology finds diverse applications, from streamlining news consumption and accelerating research analysis to optimizing business workflows by summarizing meetings and customer feedback. While challenges remain in handling complex language, nuanced meanings, and ensuring factual accuracy, ongoing research continually improves the fluency, coherence, and contextual understanding of AI-generated summaries. Ultimately, AI summarization promises to revolutionize how we process information, empowering us to access key insights quickly and efficiently, unlocking the potential of knowledge for a more informed future."
summary = summarizer.predict(test)

print(summary)

AI summarization uses natural language processing to condense lengthy texts into concise summaries. this technology finds diverse applications, from streamlining news consumption to optimizing business workflows.


: 