# **Hugging Face Summarization**

##### **Models**
###### **1. facebook/bart-large-cnn (Short Summary)**
###### **2. sshleifer/distilbart-cnn-12-6 (Long Summary)**
###### **3. philschmid/bart-large-cnn-samsum (Alt Long Summary)**

In [1]:
from transformers import pipeline, AutoTokenizer, AutoModelForSeq2SeqLM, BertTokenizer, BertModel
from langchain.text_splitter import RecursiveCharacterTextSplitter
import PyPDF2
import torch
import nltk
from nltk.tokenize import sent_tokenize
nltk.download('punkt')
import warnings 
warnings.filterwarnings("ignore", category=FutureWarning)

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\siddu\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


**Function To Return Content Chunks From Text And PDF File**

In [2]:
with open("Extract_Content.txt",'r') as fh:
    x = fh.read()
    words = x.split(" ")
    print(len(words))



600
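The count above comes from `x.split(" ")`, which splits on single spaces only: consecutive spaces produce empty strings, and newlines are not treated as separators. A small sketch, using a made-up sample string rather than the contents of `Extract_Content.txt`, shows how that can differ from `split()` with no argument:

```python
# Hypothetical sample text (not taken from Extract_Content.txt)
sample = "Sharks shed  teeth.\nFossils\nremain intact."

# Splitting on a single space keeps the empty string created by the double
# space and leaves newline-separated words glued together
by_space = sample.split(" ")

# Splitting on any run of whitespace yields one entry per word
by_whitespace = sample.split()

print(len(by_space), by_space)
print(len(by_whitespace), by_whitespace)
```

On this sample the two counts differ (5 versus 6), so a threshold based on word count may land differently depending on which form is used.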


In [4]:
#Main
def main_chunk():
    with open("Extract_Content.txt", "r", encoding="utf-8", errors="ignore") as fh:
        content = fh.read()

    # Pick the chunk size (in words) from the total word count.
    # split() with no argument splits on any whitespace, matching the
    # per-sentence count used in the loop below.
    wrds = len(content.split())
    if wrds <= 1000:
        max_chunk_length = 200
    elif wrds <= 4000:
        max_chunk_length = 500
    else:
        max_chunk_length = 1000

    sentences = sent_tokenize(content)
    chunks = []
    current_chunk = []
    current_length = 0

    # Greedily pack whole sentences until the next one would exceed the limit
    for sentence in sentences:
        sentence_length = len(sentence.split())
        if current_length + sentence_length <= max_chunk_length:
            current_chunk.append(sentence)
            current_length += sentence_length
        else:
            # Flush only when something is buffered; otherwise a first
            # sentence longer than the limit would emit an empty chunk
            if current_chunk:
                chunks.append(' '.join(current_chunk))
            current_chunk = [sentence]
            current_length = sentence_length

    if current_chunk:
        chunks.append(' '.join(current_chunk))

    return chunks

for i in main_chunk():
    print(f"{i}\n")
    print(len(i.split()))

In 1667, a Danish scientist finally concluded that certain mysterious stones prized for their supposed medicinal powers, hadnt fallen from the sky during lunar eclipses and werent serpent tongues. In fact, they were fossilized teeth many belonging to a prehistoric species that would come to be called megalodon, the biggest shark to ever live. So what was it like when megalodon ruled the seas? And what brought this formidable predator to extinction? Because their skeletons were cartilaginous, what remains of megalodons are mostly scattered clues, like some isolated vertebrae and lots of their enamel-protected teeth. Like many sharks, megalodons could shed and replace thousands of teeth over the course of their lives. Interestingly, some fossil sites harbor especially high numbers of small megalodon teeth. Experts believe these were nurseries that supported countless generations of budding megalodons. They grew up in sheltered  and food-packed shallow waters before becoming unrivaled adu
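The greedy packing in `main_chunk` can be tried without the input file or NLTK. The sketch below is an illustration rather than the notebook's code: `pack_sentences` is a hypothetical helper, and the regex splitter is a crude stand-in for `sent_tokenize`.

```python
import re


def pack_sentences(sentences, max_words):
    """Greedily group sentences into chunks of at most max_words words.

    A single sentence longer than max_words still becomes its own chunk.
    """
    chunks, current, current_len = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        # Flush the buffer when adding this sentence would exceed the limit
        if current and current_len + n > max_words:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks


# Stand-in for sent_tokenize: split after ., ! or ? followed by whitespace
text = "One two three. Four five. Six seven eight nine. Ten."
sentences = re.split(r"(?<=[.!?])\s+", text)

for chunk in pack_sentences(sentences, max_words=5):
    print(len(chunk.split()), chunk)
```

With a five-word limit the four sentences end up in two chunks of five words each, and no sentence is ever split in the middle.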

**Pipeline Function To Summarize Text**

In [5]:
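def max_min(chunk_length):
    # chunk_length is a word count, while the summarizer measures max_length
    # and min_length in tokens, so these limits are only a rough heuristic
    if chunk_length <= 100:
        max_length = max(50, int(chunk_length * 0.8))
    elif chunk_length <= 500:
        max_length = max(100, int(chunk_length * 0.6))
    else:
        max_length = max(200, int(chunk_length * 0.5))
    # The minimum is always half of the maximum
    min_length = int(max_length * 0.5)

    return max_length, min_length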

#Model : bart-large-cnn
#Short Summary Based On Given Text Chunks

def short_summary(chunks, batch_size=8):
    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
    batched_chunks = [chunks[i:i+batch_size] for i in range(0, len(chunks), batch_size)]
    summaries = []
    for batch in batched_chunks:
        # Each batch is merged into one input. BART accepts at most 1024 tokens,
        # so truncation=True cuts the excess instead of overflowing the model.
        text = "\n".join(batch)
        summary = summarizer(text, max_length=300, min_length=100, do_sample=True, top_k=50, top_p=0.95, truncation=True)
        summaries.extend([s['summary_text'] for s in summary])
    return " ".join(summaries)

#Model : distilbart-cnn-12-6
#Long Summary Based On Given Text Chunks
def long_summary(chunks):
    summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")
    summaries = []
    for chunk in chunks:
        chunk_length = len(chunk.split(" "))
        max_len, min_len = max_min(chunk_length=chunk_length)
        summary = summarizer(chunk, max_length=max_len, min_length=min_len, do_sample=False)
        summaries.append(summary[0]['summary_text'])
    
    return (" ".join(summaries))

#Model : bart-large-cnn-samsum
#Long Summary Alt 
def long_alt_summary(chunks):
    summarizer = pipeline("summarization", model="philschmid/bart-large-cnn-samsum")
    summaries = []
    for chunk in chunks:
        summary = summarizer(chunk, max_length=150, min_length=30, do_sample=False)
        summaries.append(summary[0]['summary_text'])
    return (" ".join(summaries))
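To see what `max_min` hands to the summarizers, the sketch below repeats the function and evaluates it for a few word counts chosen purely for illustration. For short chunks the floor of 50 can exceed the chunk's own word count. `capped_max_min` is a hypothetical variant, not part of the notebook, that keeps `max_length` from exceeding the input length in words.

```python
def max_min(chunk_length):
    # Same logic as the notebook cell above
    if chunk_length <= 100:
        max_length = max(50, int(chunk_length * 0.8))
    elif chunk_length <= 500:
        max_length = max(100, int(chunk_length * 0.6))
    else:
        max_length = max(200, int(chunk_length * 0.5))
    min_length = int(max_length * 0.5)
    return max_length, min_length


def capped_max_min(chunk_length):
    # Hypothetical variant: never request a summary longer than the input
    max_length, _ = max_min(chunk_length)
    max_length = max(1, min(max_length, chunk_length))
    min_length = max_length // 2
    return max_length, min_length


for words in (40, 150, 600):
    print(words, max_min(words), capped_max_min(words))
```

For a 40-word chunk the original heuristic asks for up to 50 tokens, while the capped variant asks for at most 40. The two agree for the larger inputs.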

In [18]:
def new_short_summary(chunks):
    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
    summaries = []
    for chunk in chunks:
        chunk_length = len(chunk.split(" "))
        max_len, min_len = max_min(chunk_length=chunk_length)
        summary = summarizer(chunk, max_length=max_len, min_length=min_len, do_sample=True, top_k=50, top_p=0.95)
        summaries.append(summary[0]['summary_text'])
    
    return ("\n".join(summaries))
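`new_short_summary` joins the per-chunk summaries with `"\n"`, so a caller has to split on newlines to get them back. A possible alternative, sketched here under assumptions, is to return a list. `summarize_chunks` and `first_sentence` are hypothetical names, and the stub replaces the real model so the example runs offline.

```python
def summarize_chunks(chunks, summarize):
    """Return one summary per chunk as a list.

    `summarize` is any callable that maps a text to its summary; with a
    Hugging Face pipeline it could wrap summarizer(chunk, ...)[0]['summary_text'].
    """
    return [summarize(chunk) for chunk in chunks]


# Stub standing in for a real model: keep only the first sentence of a chunk
def first_sentence(text):
    return text.split(". ")[0].rstrip(".") + "."


chunks = ["Alpha beta. Gamma delta.", "Line one.\nLine two. Line three."]
summaries = summarize_chunks(chunks, first_sentence)

# Joining and re-splitting on newlines miscounts if a summary contains one
rejoined = "\n".join(summaries).split("\n")
print(len(summaries), len(rejoined))
```

Here the list holds two summaries, but the join-then-split round trip yields three pieces because the second stub summary contains a newline. Keeping the list avoids that ambiguity should a model ever emit one.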

In [19]:
content = main_chunk()
summary = new_short_summary(content)

In [21]:
for i in summary.split("\n"):
    print(f"{i}\n")

Fossilized teeth belong to a prehistoric species that would come to be called megalodon, the biggest shark to ever live. They grew up in sheltered  and food-packed shallow waters before becoming unrivaled adult marine hunters. Like many sharks, megalodons could shed and replace thousands of teeth over the course of their lives.

Megalodons were apex predators that not only ate large prey species but also other predators. Researchers have access to one exceptionally well-preserved spinal column that comprises 141 vertebrae of a 46-year-old megalodon. 3D model of the megalodons body suggests that its stomach could reach volumes of almost 10,000 liters big enough to fit an entire orca.

By the time they disappeared around 3.5 million years ago, the global climate had cooled, causing more glaciers to form and the sea level to drop. This dried up many coastal habitats, meaning some of the worlds most resource-rich marine sites were lost. About a third of all marine megafauna went extinct, s