<a href="https://colab.research.google.com/github/DivyanshiSingh12/ChatBot_Telegram/blob/master/Text_Summarization.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#Text Summarization
##Abstractive Summary using the BART Model


In [None]:
!pip install transformers

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.30.2-py3-none-any.whl (7.2 MB)
Collecting huggingface-hub<1.0,>=0.14.1 (from transformers)
  Downloading huggingface_hub-0.15.1-py3-none-any.whl (236 kB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1 (from transformers)
  Downloading tokenizers-0.13.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)
Collecting safetensors>=0.3.1 (from transformers)
  Downloading safetensors-0.3.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)

In [None]:
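from transformers import BartTokenizer, BartForConditionalGeneration

def generate_abstractive_summary(text):
    # BART checkpoint fine-tuned on CNN/DailyMail for summarization
    model_name = 'facebook/bart-large-cnn'
    tokenizer = BartTokenizer.from_pretrained(model_name)
    model = BartForConditionalGeneration.from_pretrained(model_name)

    # Tokenize; input beyond the model's maximum length is truncated
    inputs = tokenizer(text, truncation=True, padding='longest', return_tensors='pt')
    # Beam search with 4 beams, producing a summary of 30 to 100 tokens
    summary_ids = model.generate(
        inputs['input_ids'],
        attention_mask=inputs['attention_mask'],
        max_length=100, min_length=30, num_beams=4, early_stopping=True,
    )

    # Convert the generated token ids back to text
    summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
    return summary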



In [None]:
text = "Make sure you have the transformers library installed, preferably the latest version, using pip install transformers. Also, keep in mind that the quality of the abstractive summary may vary based on the pre-trained model and fine-tuning process used."
summary = generate_abstractive_summary(text)

print("Original Text:")
print(text)
print("\nBART Abstractive Summary:")
print(summary)


Downloading (…)olve/main/vocab.json: 0.00B [00:00, ?B/s]

Downloading (…)olve/main/merges.txt: 0.00B [00:00, ?B/s]

Downloading (…)lve/main/config.json: 0.00B [00:00, ?B/s]

Downloading pytorch_model.bin:   0%|          | 0.00/1.63G [00:00<?, ?B/s]

Downloading (…)neration_config.json:   0%|          | 0.00/363 [00:00<?, ?B/s]

Original Text:
Make sure you have the transformers library installed, preferably the latest version, using pip install transformers. Also, keep in mind that the quality of the abstractive summary may vary based on the pre-trained model and fine-tuning process used.

BART Abstractive Summary:
Make sure you have the transformers library installed, preferably the latest version. Keep in mind that the quality of the abstractive summary may vary based on the pre-trained model and fine-tuning process used.
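
Because the tokenizer is called with `truncation=True`, any input beyond the model's length limit is dropped before generation. A minimal sketch of one workaround, assuming a simple word budget is an acceptable stand-in for the real token limit: split the text into chunks, summarize each chunk, and join the partial summaries. The helpers `chunk_words` and `summarize_long_text` and the `max_words=400` default are hypothetical, not part of the notebook; `summarize_fn` is meant to be a function such as `generate_abstractive_summary`.

```python
def chunk_words(text, max_words=400):
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_long_text(text, summarize_fn, max_words=400):
    """Summarize each chunk with summarize_fn and join the partial summaries."""
    chunks = chunk_words(text, max_words)
    return " ".join(summarize_fn(chunk) for chunk in chunks)


# Hypothetical usage with the function defined in the cell above:
# long_summary = summarize_long_text(long_text, generate_abstractive_summary)
```

Since `generate_abstractive_summary` reloads the tokenizer and model on every call, loading them once outside the function would likely be worthwhile before summarizing many chunks.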


##Extractive Text Summarization using BERT's pretrained model


In [None]:
!pip install torch transformers

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [None]:
import torch
from transformers import BertTokenizer, BertModel
from sklearn.cluster import KMeans

def extractive_summarization(text, num_sentences):
    model_name = 'bert-base-uncased'
    tokenizer = BertTokenizer.from_pretrained(model_name)
    model = BertModel.from_pretrained(model_name)

    # Split the text into WordPiece tokens (note: tokens, not sentences)
    tokens = tokenizer.tokenize(text)
    input_ids = tokenizer.convert_tokens_to_ids(tokens)

    # Get one contextual BERT embedding per token
    inputs = torch.tensor([input_ids])
    outputs = model(inputs)[0]
    embeddings = outputs.detach().numpy()

    # Drop the batch dimension: (1, num_tokens, hidden) -> (num_tokens, hidden)
    embeddings = embeddings.squeeze(axis=0)

    # Cluster the token embeddings
    num_clusters = min(num_sentences, len(tokens))
    kmeans = KMeans(n_clusters=num_clusters, random_state=0)
    cluster_labels = kmeans.fit_predict(embeddings)

    # Pick the longest token from each cluster. Because the units are tokens,
    # the result is a few words rather than full sentences.
    summary = ""
    for cluster_id in range(num_clusters):
        cluster_tokens = [tokens[i] for i, label in enumerate(cluster_labels) if label == cluster_id]
        representative_token = max(cluster_tokens, key=len)
        summary += representative_token + " "

    return summary

text = "Make sure you have the transformers library installed, preferably the latest version, using pip install transformers. Also, keep in mind that the quality of the abstractive summary may vary based on the pre-trained model and fine-tuning process used."

summary = extractive_summarization(text, num_sentences=2)
print("Original Text:")
print(text)
print("\nExtractive Summary:")
print(summary)


Downloading (…)solve/main/vocab.txt: 0.00B [00:00, ?B/s]

Downloading (…)okenizer_config.json:   0%|          | 0.00/28.0 [00:00<?, ?B/s]

Downloading (…)lve/main/config.json:   0%|          | 0.00/570 [00:00<?, ?B/s]

Downloading model.safetensors:   0%|          | 0.00/440M [00:00<?, ?B/s]

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Original Text:
Make sure you have the transformers library installed, preferably the latest version, using pip install transformers. Also, keep in mind that the quality of the abstractive summary may vary based on the pre-trained model and fine-tuning process used.

Extractive Summary:
transformers also 
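
The summary above is two word pieces because the function clusters token embeddings rather than sentence embeddings. A minimal sketch of a sentence-level variant follows, under stated assumptions: sentences are split with a naive regex, each sentence is represented by the mean of BERT's last-layer token vectors, and a simple "closest to the document centroid" ranking is swapped in for KMeans. The helpers `split_sentences`, `cosine`, `select_sentences`, and `embed_with_bert` are hypothetical names, not part of the notebook.

```python
import math
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_sentences(sentences, vectors, num_sentences):
    """Keep the num_sentences sentences closest to the mean vector, in original order."""
    dim = len(vectors[0])
    centroid = [sum(vec[d] for vec in vectors) / len(vectors) for d in range(dim)]
    ranked = sorted(range(len(sentences)),
                    key=lambda i: cosine(vectors[i], centroid), reverse=True)
    return [sentences[i] for i in sorted(ranked[:num_sentences])]


def embed_with_bert(sentences, model_name="bert-base-uncased"):
    """Mean-pool BERT's last hidden layer into one vector per sentence."""
    import torch  # imported lazily so the helpers above work without it
    from transformers import BertModel, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained(model_name)
    model = BertModel.from_pretrained(model_name)
    model.eval()
    vectors = []
    with torch.no_grad():
        for sentence in sentences:
            inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
            hidden = model(**inputs).last_hidden_state[0]  # (tokens, hidden_size)
            vectors.append(hidden.mean(dim=0).tolist())
    return vectors


# Hypothetical usage:
# sentences = split_sentences(text)
# print(" ".join(select_sentences(sentences, embed_with_bert(sentences), 2)))
```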


##Adding Text to a Text File (Extractive Summary using TextRank algorithm).
###Final Code

In [None]:
import nltk
nltk.download('punkt')
nltk.download('stopwords')


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


True

In [None]:
import nltk
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import networkx as nx

def extractive_summarization(text, num_sentences):
    # Tokenize the text into sentences
    sentences = sent_tokenize(text)

    # Preprocess the sentences
    stop_words = set(stopwords.words('english'))
    preprocessed_sentences = [word_tokenize(sentence.lower()) for sentence in sentences]
    preprocessed_sentences = [[word for word in sentence if word.isalnum() and word not in stop_words] for sentence in preprocessed_sentences]

    # Calculate the sentence similarity matrix using TF-IDF vectors
    sentence_vectors = TfidfVectorizer().fit_transform([' '.join(sentence) for sentence in preprocessed_sentences])
    similarity_matrix = cosine_similarity(sentence_vectors)

    # Convert the similarity matrix into a graph
    graph = nx.from_numpy_array(similarity_matrix)

    # Apply the TextRank algorithm to rank the sentences
    scores = nx.pagerank(graph)

    # Sort the sentences based on their scores
    ranked_sentences = sorted(((scores[i], sentence) for i, sentence in enumerate(sentences)), reverse=True)

    # Select the top-ranked sentences as the summary
    summary_sentences = [sentence for _, sentence in ranked_sentences[:num_sentences]]
    summary = ' '.join(summary_sentences)

    return summary

text = "This is the input text that we want to summarize. It contains multiple sentences and discusses various aspects. We will use TextRank for extractive summarization."
summary = extractive_summarization(text, num_sentences=2)

# Save the extractive summary to a text file
with open('summary.txt', 'w') as file:
    file.write(summary)

print("Original Text:")
print(text)
print("\nExtractive Summary:")
print(summary)
print("Summary saved to 'summary.txt' file.")


Original Text:
This is the input text that we want to summarize. It contains multiple sentences and discusses various aspects. We will use TextRank for extractive summarization.

Extractive Summary:
We will use TextRank for extractive summarization. This is the input text that we want to summarize.
Summary saved to 'summary.txt' file.
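
The summary above lists sentences by score, so the third sentence appears before the first. A minimal sketch, assuming plain Python lists in place of NumPy and networkx: a power-iteration PageRank over a similarity matrix, followed by a selection step that returns the top sentences in document order. `pagerank_scores` and `top_in_document_order` are hypothetical helpers; the damping factor 0.85 mirrors the default of `nx.pagerank`.

```python
def pagerank_scores(similarity, damping=0.85, max_iter=100, tol=1e-9):
    """Power-iteration PageRank on a square matrix of non-negative edge weights."""
    n = len(similarity)
    row_sums = [sum(row) for row in similarity]
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        new_scores = []
        for j in range(n):
            incoming = 0.0
            for i in range(n):
                if row_sums[i] > 0:
                    # Node i passes its score along edges in proportion to their weight
                    incoming += scores[i] * similarity[i][j] / row_sums[i]
                else:
                    # A node with no outgoing weight spreads its score evenly
                    incoming += scores[i] / n
            new_scores.append((1 - damping) / n + damping * incoming)
        change = sum(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if change < tol:
            break
    return scores


def top_in_document_order(sentences, scores, num_sentences):
    """Pick the highest-scoring sentences, then restore their original order."""
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(best[:num_sentences])]
```

Passing the TF-IDF cosine similarity matrix (converted with `.tolist()`) to `pagerank_scores` and the result to `top_in_document_order` would give a summary that reads in the same order as the source text.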


Wrong Output

In [None]:
import nltk
from nltk.tokenize import sent_tokenize

def generate_summary(text):
    summaries = []
    sentences = sent_tokenize(text)

    # Split the text into sections for each person
    sections = text.split('Person')

    # Iterate through each section and generate a summary for each person
    for i, section in enumerate(sections[1:]):
        person_text = 'Person' + section
        sentences = sent_tokenize(person_text)
        summary = ' '.join(sentences[:2])  # Extract the first two sentences as the summary
        summaries.append(f"Person {i+1}: {summary}\n")

    return summaries

text = '''
"Person 1: Hi everyone! Thank you all for being here today. Let's discuss the needs and requirements for our new startup. I'll start by saying that we need a solid business plan. We should define our target market, competition, and unique value proposition. Any thoughts?
Person 2: Absolutely, a well-defined business plan is crucial. We also need to consider our financial requirements. How much funding will we need to get started and sustain the business until we become profitable?
Person 3: Agreed. We should conduct a thorough market analysis to determine the potential demand for our product or service. This will help us gauge the size of our target market and estimate the sales and revenue projections.
Person 4: Additionally, we need to identify the right technology infrastructure and tools to support our operations. We should discuss the hardware, software, and network requirements for our startup.
Person 5: I think branding and marketing will play a significant role in our success. We need to establish a strong brand identity and develop a comprehensive marketing strategy to reach our target audience effectively.
Person 1: Good point! We should also consider the talent and skills required to run the startup. Let's discuss the roles and responsibilities we need to fill and create a hiring plan.
Person 2: Speaking of talent, we should invest in ongoing training and development programs to ensure our team stays up-to-date with the latest industry trends and skills.
Person 3: Absolutely, and we shouldn't forget about legal and regulatory compliance. We need to understand the laws and regulations that apply to our business and ensure we have the necessary licenses and permits.
Person 4: Another important aspect is scalability. As we grow, our startup should be able to handle increased demand. We need to plan for scalability in terms of infrastructure, staffing, and processes.
Person 5: And let's not overlook the importance of customer support. We should prioritize building strong customer relationships and providing exceptional support to ensure customer satisfaction.
Person 1: Fantastic ideas, everyone! It seems like we have a lot to consider. Let's create an action plan to tackle each of these requirements and set realistic timelines to achieve them.
Person 2: Agreed. We should assign responsibilities to each team member to ensure accountability and progress. Regular check-ins and milestones will help us stay on track.
Person 3: And let's make sure to regularly evaluate our progress and adapt our plans as needed. Flexibility and agility will be key as we navigate the challenges and opportunities that come our way.
Person 4: Absolutely. By addressing these needs and requirements systematically, we'll be on our way to building a successful and sustainable startup.
Person 5: I'm excited about this journey with all of you. Let's bring our expertise together and make this startup a remarkable success!"
Person 1: Now that we have discussed the needs and requirements of our startup, we should also focus on creating a detailed financial plan. It will help us project our expenses, revenue streams, and potential profitability."
Person 2: I agree. We should consider various financial aspects such as fixed costs, variable costs, pricing strategies, and projected sales volumes. This will enable us to determine our break-even point and set realistic financial goals."
Person 3: In addition to the financial plan, we should develop a comprehensive marketing plan. This will involve identifying our target audience, conducting market research, and formulating effective marketing campaigns to create awareness and drive customer acquisition."
Person 4: Absolutely. We should leverage both online and offline marketing channels to reach our target audience. Social media advertising, content marketing, and attending industry events could be effective strategies to consider."
Person 5: Furthermore, we should establish key performance indicators (KPIs) to measure the success of our marketing efforts. This will help us track our progress, identify areas for improvement, and optimize our marketing strategies accordingly."
Person 1: That's a great point. We should also prioritize building strong relationships with potential investors and strategic partners. Networking events and pitching sessions can provide valuable opportunities to connect with industry experts and secure additional funding or collaborations."
Person 2: Additionally, we should allocate resources for research and development to continuously innovate and improve our products or services. Staying ahead of the competition and offering unique value propositions will be crucial for long-term success."
Person 3: As we move forward, we should keep a close eye on industry trends and emerging technologies. This will allow us to adapt quickly, seize new opportunities, and stay relevant in a rapidly evolving market."'''

summaries = generate_summary(text)

# Save the summaries to a text file
with open('person_summaries.txt', 'w') as file:
    file.writelines(summaries)

print("Summaries saved to 'person_summaries.txt' file.")


Summaries saved to 'person_summaries.txt' file.
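
One likely reason this cell is marked as wrong: it labels each section by its position after `text.split('Person')`, so a speaker's second turn gets a new number instead of the original one. A small sketch on a made-up three-turn string (the string and variable names are illustrative only) shows the difference between position-based labels and labels read from the text.

```python
import re

turns = "Person 1: Hi. Person 2: Hello. Person 1: Bye."

# Position-based labels, as in the cell above: the third turn becomes "Person 3"
sections = turns.split('Person')
position_labels = [f"Person {i + 1}" for i, _ in enumerate(sections[1:])]

# Labels read from the text itself keep the real speaker numbers
actual_labels = [f"Person {n}" for n in re.findall(r'Person (\d+):', turns)]

print(position_labels)  # ['Person 1', 'Person 2', 'Person 3']
print(actual_labels)    # ['Person 1', 'Person 2', 'Person 1']
```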


Working Code

In [None]:
import re
from nltk.tokenize import sent_tokenize

conversation = """
Person 1: "Hi everyone! Thank you all for being here today. Let's discuss the needs and requirements for our new startup. I'll start by saying that we need a solid business plan. We should define our target market, competition, and unique value proposition. Any thoughts?"
Person 2: "Absolutely, a well-defined business plan is crucial. We also need to consider our financial requirements. How much funding will we need to get started and sustain the business until we become profitable?"
Person 3: "Agreed. We should conduct a thorough market analysis to determine the potential demand for our product or service. This will help us gauge the size of our target market and estimate the sales and revenue projections."
Person 4: "Additionally, we need to identify the right technology infrastructure and tools to support our operations. We should discuss the hardware, software, and network requirements for our startup."
Person 5: "I think branding and marketing will play a significant role in our success. We need to establish a strong brand identity and develop a comprehensive marketing strategy to reach our target audience effectively."
Person 1: "Good point! We should also consider the talent and skills required to run the startup. Let's discuss the roles and responsibilities we need to fill and create a hiring plan."
Person 2: "Speaking of talent, we should invest in ongoing training and development programs to ensure our team stays up-to-date with the latest industry trends and skills."
Person 3: "Absolutely, and we shouldn't forget about legal and regulatory compliance. We need to understand the laws and regulations that apply to our business and ensure we have the necessary licenses and permits."
Person 4: "Another important aspect is scalability. As we grow, our startup should be able to handle increased demand. We need to plan for scalability in terms of infrastructure, staffing, and processes."
Person 5: "And let's not overlook the importance of customer support. We should prioritize building strong customer relationships and providing exceptional support to ensure customer satisfaction."
Person 1: "Fantastic ideas, everyone! It seems like we have a lot to consider. Let's create an action plan to tackle each of these requirements and set realistic timelines to achieve them."
Person 2: "Agreed. We should assign responsibilities to each team member to ensure accountability and progress. Regular check-ins and milestones will help us stay on track."
Person 3: "And let's make sure to regularly evaluate our progress and adapt our plans as needed. Flexibility and agility will be key as we navigate the challenges and opportunities that come our way."
Person 4: "Absolutely. By addressing these needs and requirements systematically, we'll be on our way to building a successful and sustainable startup."
Person 5: "I'm excited about this journey with all of you. Let's bring our expertise together and make this startup a remarkable success!"
Person 1: "Now that we have discussed the needs and requirements of our startup, we should also focus on creating a detailed financial plan. It will help us project our expenses, revenue streams, and potential profitability."
Person 2: "I agree. We should consider various financial aspects such as fixed costs, variable costs, pricing strategies, and projected sales volumes. This will enable us to determine our break-even point and set realistic financial goals."
Person 3: "In addition to the financial plan, we should develop a comprehensive marketing plan. This will involve identifying our target audience, conducting market research, and formulating effective marketing campaigns to create awareness and drive customer acquisition."
Person 4: "Absolutely. We should leverage both online and offline marketing channels to reach our target audience. Social media advertising, content marketing, and attending industry events could be effective strategies to consider."
Person 5: "Furthermore, we should establish key performance indicators (KPIs) to measure the success of our marketing efforts. This will help us track our progress, identify areas for improvement, and optimize our marketing strategies accordingly."
Person 1: "That's a great point. We should also prioritize building strong relationships with potential investors and strategic partners. Networking events and pitching sessions can provide valuable opportunities to connect with industry experts and secure additional funding or collaborations."
Person 2: "Additionally, we should allocate resources for research and development to continuously innovate and improve our products or services. Staying ahead of the competition and offering unique value propositions will be crucial for long-term success."
Person 3: "As we move forward, we should keep a close eye on industry trends and emerging technologies. This will allow us to adapt quickly, seize new opportunities, and stay relevant in a rapidly evolving market."
"""

# Extract participant names and their statements from the conversation
participants_statements = re.findall(r'(Person \d+): "(.*?)"', conversation)

# Generate summary of what each participant has said
summary = {}
for participant, statement in participants_statements:
    if participant not in summary:
        summary[participant] = []
    summary[participant].append(statement)

# Write the summary to a text file
with open("participant_summary.txt", "w") as file:
    for participant, statements in summary.items():
        file.write("{}:\n".format(participant))
        summary_sentences = []
        for statement in statements:
            sentences = sent_tokenize(statement)
            summary_sentences.extend(sentences)
        summary_text = " ".join(summary_sentences)
        file.write(summary_text)
        file.write("\n\n")


In [None]:
import re
from nltk.tokenize import sent_tokenize

conversation = """
Person A: "Hey everyone, I'm really excited to discuss our plans for building our new startup software. This is a great opportunity for us to create something innovative and impactful in the industry."
Person B: "Absolutely! I think the first step is to identify the specific problem we want to solve with our software. We should conduct market research to understand the needs and pain points of our target audience."
Person C: "I agree, market research is crucial. We need to gather data and insights to validate the demand for our software and identify any existing solutions in the market. This will help us refine our idea and differentiate ourselves from competitors."
Person A: "Once we have a clear understanding of the problem, we can start working on developing our software. We should consider building a minimum viable product (MVP) that showcases the core features and functionality to gather feedback from early users."
Person B: "Yes, the MVP will allow us to iterate and improve the software based on user feedback. Continuous refinement is essential to ensure that we are building a product that truly meets the needs of our target audience."
Person C: "While developing the software, we should also consider scalability and flexibility. It's important to build a strong foundation that can accommodate future growth and adapt to evolving market trends and technologies."
Person A: "Definitely. We should also pay attention to the user experience and design aspects of our software. A user-friendly interface and intuitive workflows can greatly enhance the adoption and satisfaction of our customers."
Person B: "In parallel, we need to start thinking about our go-to-market strategy. How are we going to position our software in the market? What will be our pricing model? These are important aspects to consider for a successful launch."
Person C: "Agreed. We should also think about our marketing and sales efforts. How will we create awareness and generate leads for our software? We may need to explore different channels such as online advertising, content marketing, or partnerships."
Person A: "Furthermore, we should establish a solid support and maintenance plan for our software. Providing excellent customer support and regular updates will be crucial to ensure customer satisfaction and retention."
Person B: "Absolutely, we should prioritize building a strong customer-centric culture within our startup. By listening to our customers and incorporating their feedback, we can continuously enhance our software and build strong relationships."
Person C: "Lastly, let's not forget about the importance of teamwork and collaboration. We need a diverse and talented team that can work together effectively, share ideas, and collectively contribute to the success of our startup."
Person A: "That's a great point. Building a strong team with complementary skills and a shared vision will be instrumental in overcoming challenges and driving our startup towards success."
Person B: "I'm really excited about this journey. Let's work together and build a remarkable software that makes a positive impact in the lives of our users."
Person C: "Absolutely! With the right mindset, dedication, and hard work, I believe we can create something truly exceptional. Let's make our startup software a game-changer in the industry!"
"""

# Extract participant names and their statements from the conversation
participants_statements = re.findall(r'(Person [A-Z]): "(.*?)"', conversation)

# Generate summary of what each participant has said
summary = {}
for participant, statement in participants_statements:
    if participant not in summary:
        summary[participant] = []
    summary[participant].append(statement)

# Write the summary to a text file
with open("participant_summary.txt", "w") as file:
    for participant, statements in summary.items():
        file.write("{}:\n".format(participant))
        summary_sentences = []
        for statement in statements:
            sentences = sent_tokenize(statement)
            summary_sentences.extend(sentences)
        summary_text = " ".join(summary_sentences)
        file.write(summary_text)
        file.write("\n\n")
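
The two cells above differ only in the speaker-label pattern, and both write every statement of each speaker to the file. A minimal sketch of a reusable version, assuming each turn sits on one line in the form `Person <label>: "<statement>"` and that a naive split on `.`, `!` and `?` is good enough in place of `sent_tokenize`: `group_by_speaker` and `first_sentences` are hypothetical helpers, and the `limit=2` default is an arbitrary choice.

```python
import re
from collections import defaultdict


def group_by_speaker(conversation):
    """Map each speaker label to the list of their quoted statements, in order."""
    grouped = defaultdict(list)
    # \w+ accepts both numeric labels (Person 12) and letter labels (Person A)
    for speaker, statement in re.findall(r'(Person \w+): "(.*?)"', conversation):
        grouped[speaker].append(statement)
    return dict(grouped)


def first_sentences(statements, limit=2):
    """Join the statements and keep only the first `limit` sentences."""
    sentences = re.split(r'(?<=[.!?])\s+', " ".join(statements).strip())
    return " ".join(sentences[:limit])


# Hypothetical usage with the `conversation` string defined above:
# for speaker, statements in group_by_speaker(conversation).items():
#     print(speaker, "->", first_sentences(statements))
```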


Pickle Module

In [None]:
import pickle

greet = ["Hello", "Greetings", "Hi!", "Good_Morning", "How_are_you_doing?"]
file = "greetme.pkl"

# Basic pickling: serialize the list to a binary file
with open(file, 'wb') as fileobj:
    pickle.dump(greet, fileobj)

# Unpickling: load the list back and confirm its type
with open(file, 'rb') as fileobj:
    greetme = pickle.load(fileobj)
print(greetme)
print(type(greetme))

['Hello', 'Greetings', 'Hi!', 'Good_Morning', 'How_are_you_doing?']
<class 'list'>
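
The same round trip also works in memory, which may be handy for checking that an object survives pickling before writing it to disk. A short sketch, assuming a made-up dictionary of per-speaker summaries (the `summaries` contents are illustrative, not taken from the cells above): `pickle.dumps` returns bytes and `pickle.loads` rebuilds an equal object.

```python
import pickle

# Hypothetical per-speaker summaries standing in for real results
summaries = {"Person A": ["We need an MVP."], "Person B": ["Start with market research."]}

# dumps/loads serialize to and from bytes in memory, with no file involved
payload = pickle.dumps(summaries)
restored = pickle.loads(payload)

print(type(payload))          # <class 'bytes'>
print(restored == summaries)  # True
```

Pickle data should only be loaded from trusted sources, since unpickling can execute arbitrary code.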
