# TextRank
The **TextRank** algorithm has both strengths and weaknesses as an extractive summarization method. Here are some of the pros and cons:

### Pros:

* Automatic: Text summarization using TextRank is an automatic process that does not require human intervention. It can summarize large amounts of text in a very short period of time.

* Unbiased: The TextRank algorithm applies no editorial judgment of its own and does not weigh the author's opinion or perspective. It selects sentences based on word overlap between sentences and, in this implementation, on the frequency of the most important keywords.

* Saves time: Text summarization using TextRank saves time and effort. It can quickly provide a summary of the main points of a large text without having to read the entire document.

* Consistency: The TextRank algorithm is deterministic, so the same text and parameters always produce the same summary. It follows a fixed set of rules and is not influenced by external factors.

* Customizable: The TextRank algorithm can be customized to suit specific needs, for example by changing the similarity measure or the number of sentences returned, or by prioritizing certain keywords or phrases to produce a more targeted summary.
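To make the graph-based ranking behind these points concrete, here is a minimal sketch of sentence-level TextRank in plain Python. It is an illustration under stated assumptions, not the code used in this notebook: the names `tokenize`, `similarity`, and `textrank` are hypothetical, the similarity uses unique words with the log-length normalization described in the original TextRank paper, and `boost_terms` is an invented hook that shows one way keywords could be prioritized.

```python
import math
import re


def tokenize(sentence):
    """Lowercase a sentence and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def similarity(words_a, words_b):
    """Shared words divided by the sum of log sentence lengths."""
    if len(words_a) < 2 or len(words_b) < 2:
        return 0.0  # skip very short sentences, where the log normalizer nears zero
    overlap = len(words_a & words_b)
    return overlap / (math.log(len(words_a)) + math.log(len(words_b)))


def textrank(sentences, boost_terms=(), damping=0.85, iterations=50):
    """Return one score per sentence using weighted PageRank power iteration.

    boost_terms is a hypothetical customization hook: sentences containing
    any of these words get a larger share of the teleport probability.
    """
    n = len(sentences)
    if n == 0:
        return []
    words = [tokenize(s) for s in sentences]
    # Edge weights of the sentence graph (no self-loops).
    weights = [[similarity(words[i], words[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]

    # Teleport distribution: uniform unless boost terms are supplied.
    boost = {term.lower() for term in boost_terms}
    prior = [2.0 if words[i] & boost else 1.0 for i in range(n)]
    total = sum(prior)
    prior = [p / total for p in prior]

    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on its score in proportion to the edge weight.
            rank = sum(weights[j][i] / out_sum[j] * scores[j]
                       for j in range(n) if out_sum[j] > 0)
            new_scores.append((1 - damping) * prior[i] + damping * rank)
        scores = new_scores
    return scores
```

Because the procedure has no random component, repeated calls on the same input return the same scores, which is the consistency property listed above. A fixed iteration count is used for brevity; a convergence threshold would be the more usual stopping rule.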

### Cons:

* Limited context: The TextRank algorithm scores sentences by the words they share, so it may miss important context that is not captured by those keywords.

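* Limited accuracy: The TextRank algorithm may not provide accurate summaries if the text is poorly written or has grammatical errors. Misspellings reduce the word overlap it relies on, and punctuation errors such as a missing space after a period can cause two sentences to be treated as one.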

* Limited understanding: The TextRank algorithm lacks human-like understanding of the text. It does not interpret nuance, sarcasm, or irony, which can affect the accuracy of the summary.

* Limited coverage: The TextRank algorithm does not suit every type of text. It is more effective for summarizing factual texts such as news articles or scientific papers.

* Limited creativity: The TextRank algorithm is extractive. It can only select sentences that are already present in the text and cannot paraphrase them or generate new wording.

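These are the scores we achieved on the sample article used in the notebook below. The ROUGE values are ROUGE-1, and both metrics use the full article as the reference, so they measure how much of the source the summary reproduces rather than agreement with a human-written summary. A precision of 1.00 is expected for an extractive summary, since every word in it comes from the article: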

      ROUGE Score:
      Precision: 1.00
      Recall: 0.41
      F1-Score: 0.59

      BLEU Score: 0.69
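
As a quick consistency check on the ROUGE figures, F1 is the harmonic mean of precision and recall. The worked arithmetic below uses only the rounded values reported above; the exact recall is not shown in the source, so the value 0.414 in the note that follows is an illustrative assumption.

```latex
F_1 = \frac{2PR}{P + R}, \qquad
P = 1.00,\; R = 0.41 \;\Rightarrow\; F_1 = \frac{0.82}{1.41} \approx 0.58
```

The reported 0.59 is consistent with F1 being computed from an unrounded recall slightly above 0.41: for example, R = 0.414 gives F1 ≈ 0.586.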


In [26]:
import nltk
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.corpus import stopwords
from string import punctuation
from collections import defaultdict


In [27]:
import nltk
nltk.download('punkt')
nltk.download('stopwords')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [28]:
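def calculate_similarity(s1, s2):
    """
    Calculates the similarity between two sentences based on the overlap of their words.

    Each argument is a list of tokens. The score is the number of shared unique
    words divided by the combined number of unique words in both sentences.
    """
    s1 = set(s1)
    s2 = set(s2)
    # Guard against sentences left empty after stopword removal
    if not s1 and not s2:
        return 0.0
    overlap = len(s1.intersection(s2))
    return overlap / (len(s1) + len(s2))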

def summarize(text, num_sentences=3):
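    """
    Summarizes the given text with a TextRank-style extractive heuristic.

    Each sentence is scored by the relative frequency of its words plus its
    summed word overlap with every other sentence; unlike full TextRank, no
    iterative PageRank step is run. Returns the top `num_sentences` sentences.
    """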
    # Tokenize the text into sentences and words
    sentences = sent_tokenize(text)
    words = [word_tokenize(sentence.lower()) for sentence in sentences]

    # Remove stopwords and punctuation
    stop_words = set(stopwords.words('english') + list(punctuation))
    filtered_words = [[word for word in sentence if word not in stop_words] for sentence in words]

    # Create a dictionary to hold the word frequencies
    word_freq = defaultdict(int)
    for sentence in filtered_words:
        for word in sentence:
            word_freq[word] += 1

    # Calculate the sentence scores based on word frequencies and similarity
    total_words = sum(word_freq.values())  # computed once rather than per word
    sentence_scores = defaultdict(float)
    for i, sentence in enumerate(filtered_words):
        for word in sentence:
            sentence_scores[i] += word_freq[word] / total_words
    for i, sentence in enumerate(filtered_words):
        for j, other_sentence in enumerate(filtered_words):
            if i == j:
                continue
            similarity = calculate_similarity(sentence, other_sentence)
            sentence_scores[i] += similarity

    # Sort the sentences by score and select the top ones
    top_sentences = sorted(sentence_scores.items(), key=lambda x: x[1], reverse=True)[:num_sentences]
    top_sentences = [sentences[i] for i, score in top_sentences]

    # Combine the top sentences into a summary (in descending score order,
    # not in their original document order)
    summary = ' '.join(top_sentences)

    return summary
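
The function above joins the selected sentences in descending score order, which can place a later sentence before an earlier one. As a hedged alternative, the sketch below picks the same top-scoring sentences but emits them in their original order. The helper name `select_in_document_order` is hypothetical, and `scores` is assumed to be indexable by sentence position (a list, or a dict like `sentence_scores`).

```python
def select_in_document_order(sentences, scores, num_sentences=3):
    """Pick the highest-scoring sentences, then join them in source order."""
    # Rank sentence indices by score, best first.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    # Keep the top indices and restore their original order.
    chosen = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in chosen)
```

Whether score order or document order reads better depends on the text; document order tends to preserve the narrative flow of news articles.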

In [29]:
article = """
India's Health Ministry has announced that the country's COVID-19 vaccination drive will now be expanded to include people over the age of 60 and those over 45 with co-morbidities. The move is expected to cover an additional 270 million people, making it one of the largest vaccination drives in the world.The decision was taken after a meeting of the National Expert Group on Vaccine Administration for COVID-19 (NEGVAC), which recommended the expansion of the vaccination program. The NEGVAC also suggested that private hospitals may be allowed to administer the vaccine, although the details of this are yet to be finalized.India began its vaccination drive in mid-January, starting with healthcare and frontline workers. Since then, over 13 million doses have been administered across the country. However, the pace of the vaccination drive has been slower than expected, with concerns raised over vaccine hesitancy and logistical challenges.The expansion of the vaccination drive to include the elderly and those with co-morbidities is a major step towards achieving herd immunity and controlling the spread of the virus in India. The Health Ministry has also urged eligible individuals to come forward and get vaccinated at the earliest.India has reported over 11 million cases of COVID-19, making it the second-worst affected country in the world after the United States. The country's daily case count has been declining in recent weeks, but experts have warned that the pandemic is far from over and that precautions need to be maintained.
In summary, India's Health Ministry has announced that the country's COVID-19 vaccination drive will be expanded to include people over 60 and those over 45 with co-morbidities, covering an additional 270 million people. The decision was taken after a meeting of the National Expert Group on Vaccine Administration for COVID-19, and is a major step towards achieving herd immunity and controlling the spread of the virus in India.
"""

In [24]:
print ("The Actual length of the article is : ", len(article))

The Actual length of the article is :  1981


In [34]:
# Generating the summary
summary = summarize(article, num_sentences=3)

In [35]:
print ("The length of the summarized article is : ", len(summary))
summary

The length of the summarized article is :  736


"In summary, India's Health Ministry has announced that the country's COVID-19 vaccination drive will be expanded to include people over 60 and those over 45 with co-morbidities, covering an additional 270 million people. \nIndia's Health Ministry has announced that the country's COVID-19 vaccination drive will now be expanded to include people over the age of 60 and those over 45 with co-morbidities. However, the pace of the vaccination drive has been slower than expected, with concerns raised over vaccine hesitancy and logistical challenges.The expansion of the vaccination drive to include the elderly and those with co-morbidities is a major step towards achieving herd immunity and controlling the spread of the virus in India."

In [32]:
!pip install rouge

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting rouge
  Downloading rouge-1.0.1-py3-none-any.whl (13 kB)
Installing collected packages: rouge
Successfully installed rouge-1.0.1


In [36]:
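from rouge import Rouge

rouge = Rouge()
# get_scores(hypothesis, reference): the full article is used as the reference
# here, so for an extractive summary ROUGE-1 precision is expected to be 1.00.
scores = rouge.get_scores(summary, article)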
print("ROUGE Score:")
print("Precision: {:.2f}".format(scores[0]['rouge-1']['p']))
print("Recall: {:.2f}".format(scores[0]['rouge-1']['r']))
print("F1-Score: {:.2f}".format(scores[0]['rouge-1']['f']))

ROUGE Score:
Precision: 1.00
Recall: 0.41
F1-Score: 0.59
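
To see why precision comes out at 1.00 when the article itself is the reference, here is a simplified, set-based approximation of ROUGE-1 in plain Python. It is a sketch for intuition only: the function name `rouge1` is hypothetical, tokens are split on whitespace, and the `rouge` package's own tokenization and counting may differ in detail.

```python
def rouge1(hypothesis, reference):
    """Unigram-overlap precision, recall, and F1 on unique whitespace tokens."""
    hyp = set(hypothesis.lower().split())
    ref = set(reference.lower().split())
    overlap = len(hyp & ref)
    precision = overlap / len(hyp) if hyp else 0.0  # share of summary words found in reference
    recall = overlap / len(ref) if ref else 0.0     # share of reference words found in summary
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1
```

When every word of the hypothesis is copied from the reference, the overlap equals the hypothesis vocabulary and precision is 1.0, while recall reflects how much of the reference vocabulary the summary covers. Scoring against a separately written reference summary would give a more informative precision.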


In [39]:
from nltk.translate.bleu_score import sentence_bleu

def summary_to_sentences(summary):
    # Split the text into sentences using the '.' character as a separator.
    # This is a rough split: it also breaks on periods inside abbreviations or numbers.
    sentences = summary.split('.')

    # Convert each non-empty sentence into a list of words
    sentence_lists = [sentence.split() for sentence in sentences if sentence.strip()]

    return sentence_lists

def paragraph_to_wordlist(paragraph):
    # Split the paragraph into words using whitespace as a separator
    words = paragraph.split()
    return words

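# The article's sentences act as the references and the generated summary as the
# hypothesis. Because the summary is extracted from the article, a high score is
# expected; it measures overlap with the source, not agreement with a human summary.
reference_paragraph = article
reference_summary = summary_to_sentences(reference_paragraph)
predicted_paragraph = summary
predicted_summary = paragraph_to_wordlist(predicted_paragraph)

# sentence_bleu(references, hypothesis) takes a list of token lists and one token list
score = sentence_bleu(reference_summary, predicted_summary)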
print("BLEU Score: {:.2f}".format(score))

BLEU Score: 0.69
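
BLEU is built on clipped (modified) n-gram precision, where each hypothesis n-gram is credited at most as many times as it appears in any single reference. The sketch below illustrates that one component in plain Python; it is not NLTK's implementation, the function name `modified_precision` is hypothetical, and it omits the brevity penalty and the geometric mean over n-gram orders that full BLEU applies.

```python
from collections import Counter


def modified_precision(references, hypothesis, n=1):
    """Clipped n-gram precision of a token list against several reference token lists."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    hyp_counts = ngrams(hypothesis)
    if not hyp_counts:
        return 0.0
    # For each n-gram, the clip limit is its highest count in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in hyp_counts.items())
    return clipped / sum(hyp_counts.values())
```

In the cell above, every sentence of the article serves as a reference, so n-grams copied from the article are matched easily, which helps explain the high score for an extractive summary.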
