<a href="https://colab.research.google.com/github/Vi-vek9135/BERT/blob/main/bertsummarizer.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This cell installs the required packages (transformers, torch, tensorflow, and gtts) with the pip package manager.

In [1]:
# Install the required packages: Hugging Face transformers, PyTorch, TensorFlow, and gTTS
!pip install transformers
!pip install torch
!pip install tensorflow
!pip install gtts

Collecting gtts
  Downloading gTTS-2.5.1-py3-none-any.whl (29 kB)
Installing collected packages: gtts
Successfully installed gtts-2.5.1
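
After installing, it can help to confirm which versions ended up in the environment. The snippet below is a minimal sketch using only the standard library; the helper name installed_versions is hypothetical and not part of the original notebook.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in this environment
            found[name] = None
    return found


# Hypothetical helper, shown for the four packages installed above
print(installed_versions(["transformers", "torch", "tensorflow", "gtts"]))
```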


This step imports the required classes from the transformers library, then loads the pretrained BART model (Facebook's denoising sequence-to-sequence autoencoder for text) and its tokenizer from the facebook/bart-large-cnn checkpoint.

In [2]:
from transformers import BartForConditionalGeneration, BartTokenizer

In [3]:
# Load the model and tokenizer
model = BartForConditionalGeneration.from_pretrained('facebook/bart-large-cnn')
tokenizer = BartTokenizer.from_pretrained('facebook/bart-large-cnn')

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



The summarize function takes a text input and generates a summary using the BART model.

In [4]:
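def summarize(text, maxSummarylength=500):
    # Callers may pass a fractional length; generate() expects integers
    maxSummarylength = int(maxSummarylength)
    # Encode the text, truncating it to BART's 1024-token input limit.
    # No "summarize: " prefix is added: that is a T5 convention and bart-large-cnn does not need it.
    inputs = tokenizer.encode(text, return_tensors="pt", max_length=1024, truncation=True)
    # Generate the summary with beam search
    summary_ids = model.generate(inputs, max_length=maxSummarylength, min_length=int(maxSummarylength/5), length_penalty=10.0, num_beams=4, early_stopping=True)
    # Decode the generated token IDs back into text
    summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
    return summary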
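
The generation call bounds the summary between a minimum and a maximum number of tokens, with the minimum set to one fifth of the maximum. The sketch below isolates that arithmetic; the helper name generation_lengths is hypothetical, and the fractional input only illustrates what happens when a caller passes a non-integer length.

```python
def generation_lengths(max_summary_length):
    """Return the (max_length, min_length) pair implied by a requested summary length."""
    max_length = int(max_summary_length)      # upper bound, truncated to an integer
    min_length = int(max_summary_length / 5)  # lower bound: one fifth of the request
    return max_length, min_length


print(generation_lengths(500))     # (500, 100)
print(generation_lengths(134.67))  # (134, 26)
```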

The split_text_into_pieces function splits the input text into smaller, overlapping pieces so that each piece fits within the model's input limit.

In [5]:
def split_text_into_pieces(text, max_tokens=900, overlapPercent=10):
    # Tokenize the text
    tokens = tokenizer.tokenize(text)

    # Calculate the overlap in tokens
    overlap_tokens = int(max_tokens * overlapPercent / 100)

    # Split the tokens into chunks of size
    # max_tokens with overlap
    pieces = [tokens[i:i + max_tokens]
              for i in range(0, len(tokens),
                             max_tokens - overlap_tokens)]

    # Convert the token pieces back into text
    text_pieces = [tokenizer.decode(tokenizer.convert_tokens_to_ids(piece),skip_special_tokens=True) for piece in pieces]

    return text_pieces
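
The overlap logic is easier to see on plain integers than on real tokens. The sketch below applies the same sliding-window slicing to a list of numbers standing in for tokenizer output; the function name chunk_with_overlap and the guard against a non-positive stride are additions for illustration, not part of the original function.

```python
def chunk_with_overlap(tokens, max_tokens=900, overlap_percent=10):
    """Split a sequence into windows of max_tokens that share some tokens with their neighbour."""
    overlap = int(max_tokens * overlap_percent / 100)
    stride = max_tokens - overlap  # how far each window start advances
    if stride <= 0:
        raise ValueError("overlap must be smaller than the window size")
    return [tokens[i:i + max_tokens] for i in range(0, len(tokens), stride)]


tokens = list(range(2000))  # stand-in for 2000 tokens
chunks = chunk_with_overlap(tokens)
print([len(c) for c in chunks])          # [900, 900, 380]
print(chunks[0][-90:] == chunks[1][:90])  # True: 90 tokens are shared
```

With the default settings the window starts advance by 810 tokens, so consecutive pieces share 90 tokens and the last piece holds whatever remains.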


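The recursive_summarize function summarizes the input text piece by piece, then repeats the process on the combined summaries until the result is short enough. This works around the model's input length limit.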

In [6]:

def recursive_summarize(text, max_length=200, recursionLevel=0):
    recursionLevel = recursionLevel + 1
    print("######### Recursion level: ", recursionLevel, "\n\n######### ")
    tokens = tokenizer.tokenize(text)
    # Note: the chunk count is not rounded, so this works out to about max_length + 2
    expectedCountOfChunks = len(tokens) / max_length
    max_length = int(len(tokens) / expectedCountOfChunks) + 2

    # Break the text into pieces of max_length
    pieces = split_text_into_pieces(text, max_tokens=max_length)

    print("Number of pieces: ", len(pieces))
    # Summarize each piece
    summaries = []
    for k, piece in enumerate(pieces):
        print("****************************************************")
        print("Piece:", (k + 1), " out of ", len(pieces), "pieces")
        print(piece, "\n")
        # Target two thirds of the piece length, as an integer token count
        summary = summarize(piece, maxSummarylength=max_length * 2 // 3)
        print("SUMMARY: ", summary)
        summaries.append(summary)
        print("****************************************************")

    concatenated_summary = ' '.join(summaries)

    tokens = tokenizer.tokenize(concatenated_summary)

    if len(tokens) > max_length:
        # If the concatenated_summary is too long, repeat the process
        print("############# GOING RECURSIVE ##############")
        return recursive_summarize(concatenated_summary, max_length=max_length, recursionLevel=recursionLevel)
    else:
        # Short enough: summarize the combined text once more if there were several pieces
        final_summary = concatenated_summary
        if len(pieces) > 1:
            final_summary = summarize(concatenated_summary, maxSummarylength=max_length)
        return final_summary
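
The control flow above can be tried without loading a model. The following is a minimal sketch under stated assumptions: keep_every_other is a toy stand-in for the BART call, recursive_reduce and max_depth are hypothetical names, and the final extra summarization pass of the original is omitted for brevity.

```python
def recursive_reduce(words, summarize_chunk, max_len=8, depth=0, max_depth=10):
    """Summarize each chunk, join the results, and recurse while the result is too long."""
    if depth >= max_depth:
        return words  # hypothetical safety stop, not present in the original
    chunks = [words[i:i + max_len] for i in range(0, len(words), max_len)]
    joined = [w for chunk in chunks for w in summarize_chunk(chunk)]
    if len(joined) > max_len:
        # Still too long: run the same process on the combined summaries
        return recursive_reduce(joined, summarize_chunk, max_len, depth + 1, max_depth)
    return joined


def keep_every_other(chunk):
    # Toy stand-in for the model: keeps every second word of a chunk
    return chunk[::2]


words = [f"w{i}" for i in range(40)]
result = recursive_reduce(words, keep_every_other)
print(result)  # ['w0', 'w8', 'w16', 'w24', 'w32']
```

Here 40 words shrink to 20, then 10, then 5, at which point the result fits within max_len and the recursion stops. A real summarizer gives no such guarantee of shrinking, which is why the sketch adds a depth limit.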

An example text is provided, and the recursive_summarize function is called on this text to obtain the final summary.

In [7]:
# Example usage
text = '''Summary: Judgment on Complaint under Section 138 of the Negotiable Instruments Act

Introduction
This text discusses a judgment from the Supreme Court of India regarding a complaint filed under Section 138 of the Negotiable Instruments Act.
The case involves a dispute over a cheque issued by the respondent, which was returned due to insufficient funds.
The Trial Court initially dismissed the complaint, but the Supreme Court upheld it, finding that the cheque was indeed issued by the respondent.

Key Points
The complaint was dismissed initially due to contradictions in the evidence regarding the number of apple cartons and the amount owed.
The High Court established that a cheque carries a presumption of consideration unless proven otherwise.
The burden of proof is on the accused to rebut the presumption of consideration by providing evidence or circumstances to show that no debt existed.
The court discusses the presumption of debt or liability under Section 139 of the Act and states that it may fail if the accused raises a probable defense.
The court emphasizes that the presumption under Section 139 is a device to prevent undue delay in litigation and that dishonoring a check is
  largely a civil wrong.
The respondent in this case failed to provide any evidence to rebut the presumption of consideration in issuing the cheque.
The courts below were criticized for dismissing the complaint based on discrepancies in the determination of the amount due.
The respondent is held guilty of dishonoring the cheque and is ordered to pay a fine and costs.

Conclusion
In conclusion, the Supreme Court of India upheld a complaint filed under Section 138 of the Negotiable Instruments Act.
The court found that the cheque was issued by the respondent and criticized the lower courts for dismissing the complaint based on
discrepancies in the evidence. The court emphasized the presumption of consideration under Section 139 and held the respondent guilty
of dishonoring the cheque. The respondent was ordered to pay a fine and costs.
'''

final_summary = recursive_summarize(text)
print("\n%%%%%%%%%%%%%%%%%%%%%\n")
print("Final summary:", final_summary)

######### Recursion level:  1 

######### 
Number of pieces:  3
****************************************************
Piece: 1  out of  3 pieces
Summary: Judgment on Complaint under Section 138 of the Negotiable Instruments Act

Introduction
This text discusses a judgment from the Supreme Court of India regarding a complaint filed under Section 138 of the Negotiable Instruments Act.
The case involves a dispute over a cheque issued by the respondent, which was returned due to insufficient funds.
The Trial Court initially dismissed the complaint, but the Supreme Court upheld it, finding that the cheque was indeed issued by the respondent.

Key Points
The complaint was dismissed initially due to contradictions in the evidence regarding the number of apple cartons and the amount owed.
The High Court established that a cheque carries a presumption of consideration unless proven otherwise.
The burden of proof is on the accused to rebut the presumption of consideration by providing evidence or

This cell uses the gTTS library to convert the final summary into an MP3 audio file and saves it as "summary_audio.mp3".

In [8]:
from gtts import gTTS

# Language in which you want to convert
language = 'en'

# Passing the text and language to the engine
tts = gTTS(text=final_summary, lang=language, slow=False)

# Saving the converted audio in a file
tts.save("summary_audio.mp3")
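
Model output can contain stray whitespace or, in the worst case, be empty, so it may be worth cleaning the text before sending it to the speech service. The sketch below is one possible wrapper; prepare_text_for_tts and save_summary_audio are hypothetical helper names, and gTTS is imported inside the function so the cleaning step can be used on its own.

```python
def prepare_text_for_tts(text):
    """Collapse runs of whitespace and reject empty input."""
    cleaned = " ".join(text.split())
    if not cleaned:
        raise ValueError("nothing to speak: the summary is empty")
    return cleaned


def save_summary_audio(text, path="summary_audio.mp3", language="en"):
    """Convert the cleaned text to speech and save it as an MP3 file."""
    from gtts import gTTS  # imported lazily; requires the gtts package and network access

    gTTS(text=prepare_text_for_tts(text), lang=language, slow=False).save(path)
    return path


print(prepare_text_for_tts("  The court \n upheld   the complaint. "))
```

Calling save_summary_audio(final_summary) would then write the same file as the cell above.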
