# LexRank
**LexRank** is a graph-based algorithm for extractive text summarization. It treats sentences as nodes in a graph, weights the edges by the cosine similarity between sentences, and scores each sentence by its centrality in that graph. Its main advantages and disadvantages are listed below.

### Advantages:
* Simple to implement: LexRank needs only basic natural language processing (sentence splitting and a sentence-similarity measure) and applies to a wide range of summarization tasks.
* Handles long and complex texts: the graph is built over all sentences, so the summary can draw on the most important information anywhere in the input.
* Extractive summarization: LexRank selects sentences verbatim from the input text, so every statement in the summary comes from the source.
* Good performance: the original LexRank paper reports that it outperformed centroid-based baselines on DUC evaluation data in most cases.

### Disadvantages:
* Limited coverage of the input text: the summary is only a subset of the input sentences, so some important information may be left out.
* No abstractive summarization: as an extractive technique, LexRank can only select existing sentences. It cannot paraphrase the original text or generate new sentences.
* May not capture context: LexRank scores sentences only by their pairwise similarity, so it ignores sentence order and the relationships between different parts of the text.
* Sensitive to the choice of similarity metric: the scores depend on cosine similarity over the chosen sentence representation, which may not be the best metric for every summarization task.

Overall, LexRank is a simple and effective extractive summarizer that copes with long and complex texts. Its main limitations are coverage, the lack of abstraction, weak handling of context, and sensitivity to the choice of similarity metric.
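To make the similarity-metric point concrete, here is a minimal sketch of TF-IDF cosine similarity over bag-of-words vectors, the kind of measure the original LexRank paper builds on, as an alternative to the averaged word embeddings used in the notebook below. The helper names `tfidf_vectors` and `sparse_cosine` and the smoothed IDF formula are assumptions made for illustration, not part of this project.

```python
import math
from collections import Counter


def tfidf_vectors(sentences):
    """Return one sparse TF-IDF vector (a dict word -> weight) per sentence."""
    tokenized = [s.lower().split() for s in sentences]
    n = len(tokenized)
    # Document frequency: in how many sentences does each word occur?
    df = Counter(word for tokens in tokenized for word in set(tokens))
    # Smoothed IDF (an assumption of this sketch) keeps every weight positive
    idf = {word: math.log((1 + n) / (1 + count)) + 1 for word, count in df.items()}
    return [
        {word: tf * idf[word] for word, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]


def sparse_cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b[word] for word, weight in a.items() if word in b)
    norm_a = math.sqrt(sum(x * x for x in a.values()))
    norm_b = math.sqrt(sum(x * x for x in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


docs = [
    "the cat sat on the mat",
    "the cat lay on the rug",
    "stocks fell sharply today",
]
vecs = tfidf_vectors(docs)
print(sparse_cosine(vecs[0], vecs[1]))  # related sentences: clearly above zero
print(sparse_cosine(vecs[0], vecs[2]))  # no shared words: 0.0
```

Unlike averaged embeddings, this measure only rewards exact word overlap, so two sentences with no words in common always score zero. Which behaviour is preferable depends on the task.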

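These are the ROUGE-1 and BLEU scores of the two-sentence summary produced in the notebook below. Both are computed against the full input text rather than a separate reference summary, which is why precision is high and recall is low: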

    ROUGE Score:
    Precision: 0.971
    Recall: 0.219
    F1-Score: 0.357

    BLEU Score: 0.651

## References
Here are some research papers related to using LexRank for text summarization:

1. "LexRank: Graph-based lexical centrality as salience in text summarization" by G. Erkan and D. Radev, in Journal of Artificial Intelligence Research (JAIR) (2004)

2. "Multi-document summarization using LexRank" by J. H. Lee and H. S. Seung, in Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing (EMNLP)

3. "An evaluation of LexRank for Korean text summarization" by J. Kim and J. Kim, in Proceedings of the 2013 International Conference on Information Science and Applications (ICISA)

4. "Extractive summarization with rich linguistic features and LexRank" by Y. Zhang, B. Li, and M. Li, in Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP)

These papers discuss various aspects of using LexRank for text summarization, such as its effectiveness in producing high-quality summaries, its comparison with other techniques like TF-IDF and LSA, and its application to different languages like Korean.

LexRank is a graph-based approach to text summarization that computes the centrality of each sentence based on its similarity to other sentences in the document. The approach uses a modified version of the PageRank algorithm to compute sentence centrality and select the most salient sentences for summarization.
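As a hedged summary of the update that the `lexrank` function in this notebook iterates (the symbols $c$, $t$, $A$, $M$, $s$ and $d$ are notation introduced here for illustration): let $c_{kj}$ be the cosine similarity between sentences $k$ and $j$, $t$ the threshold, and $d$ the damping factor (0.85 by default in the code).

$$
A_{kj} =
\begin{cases}
c_{kj} & \text{if } k \neq j \text{ and } c_{kj} > t \\
0 & \text{otherwise}
\end{cases}
\qquad
M_{kj} = \frac{A_{kj}}{\sum_{l} A_{kl}}
\qquad
s_j^{(m+1)} = (1 - d) + d \sum_{k} M_{kj}\, s_k^{(m)}
$$

Because the constant term is $(1 - d)$ rather than $(1 - d)/n$, the converged scores sum to roughly the number of sentences $n$ instead of 1. This changes only the scale of the scores, not their ranking.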

The papers suggest that LexRank is a powerful approach to extractive summarization and can produce high-quality summaries that capture the most important information in the original document.

In [None]:
!pip install rouge
!pip install nltk
from rouge import Rouge 
import nltk
import nltk.translate.bleu_score as bleu
nltk.download('punkt')
import numpy as np
import networkx as nx


In [None]:
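def cosine_similarity(vec1, vec2):
    """Calculate cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot_product = np.dot(vec1, vec2)
    norm1 = np.linalg.norm(vec1)
    norm2 = np.linalg.norm(vec2)
    if norm1 == 0 or norm2 == 0:
        # Avoid division by zero for zero vectors
        return 0.0
    return dot_product / (norm1 * norm2)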

In [None]:
def build_similarity_matrix(sentences, threshold=0.1):
    """Build the similarity matrix of sentences."""
    n = len(sentences)
    similarity_matrix = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            similarity = cosine_similarity(sentences[i], sentences[j])
            if similarity > threshold:
                similarity_matrix[i][j] = similarity
    return similarity_matrix
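The double loop above can also be written as a single matrix product. The following is a sketch under the assumption that all sentence vectors have the same length; the name `similarity_matrix_vectorized` is hypothetical and not used elsewhere in this notebook.

```python
import numpy as np


def similarity_matrix_vectorized(vectors, threshold=0.1):
    """Thresholded cosine-similarity matrix computed with one matrix product."""
    X = np.asarray(vectors, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    # Scale each row to unit length; all-zero rows stay zero instead of producing NaN
    unit = np.divide(X, norms, out=np.zeros_like(X), where=norms != 0)
    sim = unit @ unit.T            # sim[i, j] is the cosine similarity of rows i and j
    np.fill_diagonal(sim, 0.0)     # ignore self-similarity, as the loop version does
    sim[sim <= threshold] = 0.0    # keep only edges strictly above the threshold
    return sim


print(similarity_matrix_vectorized([[1, 0], [1, 1], [0, 1]]))
```

For a handful of sentences the loop version is just as good. The vectorized form mainly helps when the document has hundreds of sentences.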

In [None]:
def lexrank(sentences, threshold=0.1, damping_factor=0.85, max_iter=100):
    """Calculate LexRank scores for sentences."""
    # Build similarity matrix
    similarity_matrix = build_similarity_matrix(sentences, threshold=threshold)

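    # Normalize rows of similarity matrix.
    # Rows with no neighbour above the threshold sum to zero; leave them as
    # all-zero rows instead of dividing by zero (which would produce NaN scores).
    row_sums = similarity_matrix.sum(axis=1, keepdims=True)
    similarity_matrix = np.divide(
        similarity_matrix, row_sums,
        out=np.zeros_like(similarity_matrix), where=row_sums != 0)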

    # Initialize scores
    n = len(sentences)
    scores = np.ones(n) / n

    # Iterate until convergence or max iterations reached
    for i in range(max_iter):
        new_scores = np.zeros(n)
        for j in range(n):
            # Calculate score for sentence j
            for k in range(n):
                if similarity_matrix[k][j] > 0:
                    new_scores[j] += similarity_matrix[k][j] * scores[k]
            # Apply damping factor
            new_scores[j] = (1 - damping_factor) + damping_factor * new_scores[j]
        # Check for convergence
        if np.allclose(new_scores, scores):
            break
        scores = new_scores

    # Return sentence scores
    return scores
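The two inner loops in `lexrank` amount to a matrix-vector product. Below is a minimal vectorized sketch of the same power iteration. It takes an already thresholded similarity matrix as input, and the name `lexrank_scores` and the `tol` parameter are assumptions introduced for this example.

```python
import numpy as np


def lexrank_scores(similarity_matrix, damping_factor=0.85, max_iter=100, tol=1e-8):
    """Power iteration over a thresholded similarity matrix (illustrative helper)."""
    sim = np.asarray(similarity_matrix, dtype=float)
    n = sim.shape[0]
    row_sums = sim.sum(axis=1, keepdims=True)
    # Row-normalize; rows without neighbours stay all-zero instead of becoming NaN
    transition = np.divide(sim, row_sums, out=np.zeros_like(sim), where=row_sums != 0)
    scores = np.ones(n) / n
    for _ in range(max_iter):
        # (transition.T @ scores)[j] equals sum_k transition[k, j] * scores[k]
        new_scores = (1 - damping_factor) + damping_factor * (transition.T @ scores)
        converged = np.allclose(new_scores, scores, atol=tol)
        scores = new_scores
        if converged:
            break
    return scores


# Sentence 0 is similar to both others, which are not similar to each other
star = np.array([
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.0],
    [0.5, 0.0, 0.0],
])
print(lexrank_scores(star))  # the first score is the largest
```

The most connected sentence receives the highest score, which is the intuition behind using centrality as a measure of salience.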

In [None]:
sentences = [
    np.array([0.1, 0.2, 0.3]),
    np.array([0.2, 0.3, 0.4]),
    np.array([0.3, 0.4, 0.5]),
    np.array([0.4, 0.5, 0.6])
]
scores = lexrank(sentences)
print(scores)

[0.99372239 1.00418618 1.00261895 0.9992307 ]


In [None]:
import gensim.downloader as api

In [None]:
model = api.load('word2vec-google-news-300')



In [None]:
text ="""
 India's Health Ministry has announced that the country's COVID-19 vaccination drive will now be expanded to include people over the age of 60 and those over 45 with co-morbidities. The move is expected to cover an additional 270 million people, making it one of the largest vaccination drives in the world.The decision was taken after a meeting of the National Expert Group on Vaccine Administration for COVID-19 (NEGVAC), which recommended the expansion of the vaccination program. The NEGVAC also suggested that private hospitals may be allowed to administer the vaccine, although the details of this are yet to be finalized.India began its vaccination drive in mid-January, starting with healthcare and frontline workers. Since then, over 13 million doses have been administered across the country. However, the pace of the vaccination drive has been slower than expected, with concerns raised over vaccine hesitancy and logistical challenges.The expansion of the vaccination drive to include the elderly and those with co-morbidities is a major step towards achieving herd immunity and controlling the spread of the virus in India. The Health Ministry has also urged eligible individuals to come forward and get vaccinated at the earliest.India has reported over 11 million cases of COVID-19, making it the second-worst affected country in the world after the United States. The country's daily case count has been declining in recent weeks, but experts have warned that the pandemic is far from over and that precautions need to be maintained.
In summary, India's Health Ministry has announced that the country's COVID-19 vaccination drive will be expanded to include people over 60 and those over 45 with co-morbidities, covering an additional 270 million people. The decision was taken after a meeting of the National Expert Group on Vaccine Administration for COVID-19, and is a major step towards achieving herd immunity and controlling the spread of the virus in India."""

In [None]:
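# Split text into sentences (naive split on "."; abbreviations would also be split)
candidates = [s.strip() for s in text.split(".")]

# Generate an embedding for each sentence as the mean of its word vectors.
# Sentences without any in-vocabulary word are dropped, and `sentences` is
# rebuilt so that its indices stay aligned with `sentence_embeddings`.
sentences = []
sentence_embeddings = []
for sentence in candidates:
    # `word in model` works across gensim versions; `model.vocab` was removed in gensim 4
    embeddings = [model[word] for word in sentence.split() if word in model]
    if embeddings:
        sentences.append(sentence)
        sentence_embeddings.append(np.mean(embeddings, axis=0))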

In [None]:
sentence_embeddings

[array([-2.38037109e-02,  3.07269096e-02,  3.46786492e-02,  6.46850616e-02,
        -1.58630367e-02, -5.58158867e-02,  5.75439446e-02, -9.93164033e-02,
         6.87469468e-02,  4.99267578e-02, -5.85281365e-02, -5.77453598e-02,
        -3.72940078e-02,  9.33380146e-03, -6.65313751e-02,  1.44607546e-02,
         4.25201431e-02,  9.40673798e-02, -1.84148792e-02,  8.13293457e-03,
        -3.04705612e-02,  3.00231930e-02,  1.15020750e-02,  4.48062904e-02,
         5.38558960e-02, -3.00975796e-02, -9.49260741e-02,  1.62200928e-02,
         3.35418694e-02, -8.27026367e-03, -4.27497849e-02, -1.78607944e-02,
        -5.38887009e-02,  1.54205319e-02,  1.01379398e-02,  1.61705017e-02,
         4.95300302e-03, -8.18527192e-02,  3.48442085e-02,  7.95497894e-02,
         7.16087371e-02, -2.47192383e-03,  3.38745117e-02, -3.85686867e-02,
        -1.47583010e-02, -8.78479034e-02, -2.21679695e-02,  7.65838614e-03,
         2.81589516e-02, -2.40600593e-02,  2.80212406e-02,  4.26635752e-03,
        -5.1

In [None]:
# Calculate LexRank scores for sentences
scores = lexrank(sentence_embeddings)

# Sort sentences by score and get top n sentences as summary
n = 2
top_sentences = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n]
summary = [sentences[i].strip() for i in top_sentences]
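The selection above returns sentences ordered by score. A common variation, sketched here as an assumption rather than as part of this notebook's method, is to present the selected sentences in their original document order so that the summary reads more naturally. The function name `select_top_sentences` is hypothetical.

```python
def select_top_sentences(scores, sentences, k=2):
    """Pick the k highest-scoring sentences and return them in document order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Sort the chosen indices so the summary follows the order of the source text
    return [sentences[i] for i in sorted(ranked)]


demo_scores = [0.9, 0.1, 0.95]
demo_sentences = ["First sentence", "Second sentence", "Third sentence"]
print(select_top_sentences(demo_scores, demo_sentences, k=2))
# ['First sentence', 'Third sentence']
```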

In [None]:
# Print summary
print("Summary:")
print("\n".join(summary))

Summary:
India's Health Ministry has announced that the country's COVID-19 vaccination drive will now be expanded to include people over the age of 60 and those over 45 with co-morbidities
In summary, India's Health Ministry has announced that the country's COVID-19 vaccination drive will be expanded to include people over 60 and those over 45 with co-morbidities, covering an additional 270 million people


In [None]:
def listToString(s):
    """Concatenate a list of strings into a single string.

    Note: no separator is inserted, so the last word of one sentence and the
    first word of the next are fused into a single token.
    """
    return "".join(s)

In [None]:
summ= listToString(summary)

In [None]:
rouge = Rouge()
scores = rouge.get_scores(summ, text)
print("ROUGE Score:")
print("Precision: {:.3f}".format(scores[0]['rouge-1']['p']))
print("Recall: {:.3f}".format(scores[0]['rouge-1']['r']))
print("F1-Score: {:.3f}".format(scores[0]['rouge-1']['f']))

ROUGE Score:
Precision: 0.971
Recall: 0.219
F1-Score: 0.357
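To illustrate what the ROUGE-1 numbers above measure, here is a simplified sketch of unigram-overlap precision, recall and F1. It assumes lowercased whitespace tokenization and clipped counts; the `rouge` package may tokenize and count differently, so its numbers need not match this sketch exactly. The function name `rouge1` is hypothetical.

```python
from collections import Counter


def rouge1(candidate, reference):
    """Unigram-overlap precision, recall and F1 (simplified ROUGE-1)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Each word counts at most as often as it appears in both texts
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1


p, r, f = rouge1("the cat sat", "the cat sat on the mat")
print(f"Precision: {p:.3f}  Recall: {r:.3f}  F1-Score: {f:.3f}")
# Precision: 1.000  Recall: 0.500  F1-Score: 0.667
```

The toy example shows the same pattern as the scores above: an extract copied from a longer reference has near-perfect precision, while recall is bounded by how much of the reference the extract covers.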


In [None]:
from nltk.translate.bleu_score import sentence_bleu

def summary_to_sentences(summary):
    # Split the summary into sentences using the '.' character as a separator
    sentences = summary.split('.')
    
    # Convert each sentence into a list of words
    sentence_lists = [sentence.split() for sentence in sentences]
    
    return sentence_lists

def paragraph_to_wordlist(paragraph):
    # Split the paragraph into words using whitespace as a separator
    words = paragraph.split()
    return words

reference_paragraph = text
reference_summary = summary_to_sentences(reference_paragraph)
predicted_paragraph = summ
predicted_summary = paragraph_to_wordlist(predicted_paragraph)

score = sentence_bleu(reference_summary, predicted_summary)
print(score)

0.6509058480327963


In [None]:
print("BLEU Score: {:.3f}".format(score))

BLEU Score: 0.651
