<a href="https://colab.research.google.com/github/Saimadeveloper/Text-Summarization.py/blob/main/Text_Summarization.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
import spacy
from textblob import TextBlob
from collections import Counter
import re

# Load SpaCy language model
nlp = spacy.load("en_core_web_sm")


In [2]:
def preprocess_text(text):
    # Trim the ends and collapse every run of whitespace into a single space
    text = re.sub(r'\s+', ' ', text.strip())
    return text
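
A quick check of what this step does and does not change may help here. The sketch below repeats the same function so it runs on its own; the sample string is made up for illustration. Punctuation and special characters pass through untouched, only whitespace is normalized.

```python
import re

def preprocess_text(text):
    # Same logic as the cell above: trim, then collapse whitespace runs
    return re.sub(r'\s+', ' ', text.strip())

# Hypothetical sample input with newlines, tabs, and repeated spaces
sample = "\n  AI   is\ttransforming\nindustries (fast!).  \n"
cleaned = preprocess_text(sample)
print(repr(cleaned))
```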


In [3]:
import nltk
nltk.download('punkt')


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


True

In [10]:
!python -m textblob.download_corpora


[nltk_data] Downloading package brown to /root/nltk_data...
[nltk_data]   Unzipping corpora/brown.zip.
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data] Downloading package averaged_perceptron_tagger_eng to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger_eng.zip.
[nltk_data] Downloading package conll2000 to /root/nltk_data...
[nltk_data]   Unzipping corpora/conll2000.zip.
[nltk_data] Downloading package movie_reviews to /root/nltk_data...
[nltk_data]   Unzipping corpora/movie_reviews.zip.
Finished.


In [4]:
import nltk
print(nltk.data.find('tokenizers/punkt'))


/root/nltk_data/tokenizers/punkt


In [5]:
def extract_sentences(text):
    doc = nlp(text)
    return [sent.text.strip() for sent in doc.sents]


In [6]:
def get_keywords(text, top_n=10):
    """Return the top_n most frequent alphabetic, non-stop-word tokens."""
    blob = TextBlob(text)
    # Lowercase first so capitalized stop words are filtered out as well
    words = [word.lower() for word in blob.words if word.isalpha()]
    words = [word for word in words if word not in nlp.Defaults.stop_words]
    word_freq = Counter(words)
    return [word for word, _ in word_freq.most_common(top_n)]
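
The frequency-counting idea behind `get_keywords` can be tried without TextBlob or spaCy. The sketch below is a dependency-free approximation, not the notebook's function: `simple_keywords` and the tiny `STOP_WORDS` set are hypothetical names introduced here, and the regex tokenizer is an assumption that splits hyphenated words such as "AI-powered" into two tokens.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word set for illustration only
STOP_WORDS = {"the", "is", "and", "to", "of", "a", "in", "are", "for"}

def simple_keywords(text, top_n=10):
    """Return the top_n most frequent non-stop-word tokens, lowercased."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    # most_common breaks ties by first appearance in the text
    return [word for word, _ in counts.most_common(top_n)]

print(simple_keywords("The cat and the dog. The cat sleeps.", top_n=2))
```

Because ties are broken by first appearance, words near the start of a short text tend to fill the remaining keyword slots once the repeated words run out.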


In [7]:
def summarize_text(text, summary_ratio=0.4):
    text = preprocess_text(text)
    sentences = extract_sentences(text)
    keywords = get_keywords(text)

    sentence_scores = {}
    for sentence in sentences:
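        sentence_lower = sentence.lower()
        # Substring match: a keyword counts if it appears anywhere in the
        # sentence, so a short keyword such as "ai" also matches inside "remain"
        score = sum(1 for word in keywords if word in sentence_lower)

        # Add similarity scores only when the pipeline has static word vectors
        # (vectors_length is 0 for small models such as en_core_web_sm)
        if nlp.vocab.vectors_length > 0:
            doc_input = nlp(sentence_lower)
            score += sum(doc_input.similarity(nlp(word)) for word in keywords)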

        sentence_scores[sentence] = score

    # Select the top sentences
    sorted_sentences = sorted(sentence_scores, key=sentence_scores.get, reverse=True)
    num_sentences = max(1, int(len(sentences) * summary_ratio))
    summary = sorted_sentences[:num_sentences]

    return ' '.join(summary)
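
The keyword check in `summarize_text` uses substring containment, and the selected sentences come back in score order rather than reading order. One possible variation, sketched below with only the standard library, matches keywords as whole words and restores document order after ranking. `score_sentence` and `select_top_sentences` are hypothetical helpers, not part of the notebook, and the regex tokenizer is an assumption that treats "AI-powered" as the two tokens "ai" and "powered".

```python
import re

def score_sentence(sentence, keywords):
    """Count how many keywords occur in the sentence as whole words."""
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    return sum(1 for word in keywords if word in tokens)

def select_top_sentences(sentences, keywords, summary_ratio=0.4):
    """Pick the highest-scoring sentences and return them in document order."""
    num_sentences = max(1, int(len(sentences) * summary_ratio))
    # Rank indices by score; the stable sort keeps earlier sentences first on ties
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: score_sentence(sentences[i], keywords),
        reverse=True,
    )
    chosen = sorted(ranked[:num_sentences])
    return [sentences[i] for i in chosen]

# Made-up sentences: "remain" contains the letters "ai" but is not the word "ai"
demo = ["Cats remain popular.", "AI is everywhere.", "Dogs bark."]
print(select_top_sentences(demo, ["ai"]))
```

Ranking by index rather than by sentence text also avoids merging two identical sentences into a single dictionary key.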


In [13]:
text = """
Artificial Intelligence (AI) is rapidly transforming industries across the globe.
From healthcare and finance to autonomous vehicles and cybersecurity, AI-powered systems are solving complex problems.
Machine learning algorithms, particularly deep learning models, are making significant advancements.
However, ethical concerns, data privacy issues, and the potential for job displacement remain significant challenges.
Despite these challenges, AI continues to drive innovation and improve efficiencies across various sectors.
"""

print("Original Text:\n", text)
summary = summarize_text(text)
print("\nGenerated Summary:\n", summary)


Original Text:
 
Artificial Intelligence (AI) is rapidly transforming industries across the globe.
From healthcare and finance to autonomous vehicles and cybersecurity, AI-powered systems are solving complex problems.
Machine learning algorithms, particularly deep learning models, are making significant advancements.
However, ethical concerns, data privacy issues, and the potential for job displacement remain significant challenges.
Despite these challenges, AI continues to drive innovation and improve efficiencies across various sectors.




Generated Summary:
 Artificial Intelligence (AI) is rapidly transforming industries across the globe. However, ethical concerns, data privacy issues, and the potential for job displacement remain significant challenges.
