# Luhn's Model

**The Luhn Model** is a statistical text summarization technique that selects the most relevant sentences based on the frequency of important words in the text. Its main advantages and disadvantages for text summarization are listed below.

### Pros:

* Easy to implement: The Luhn Model is a simple algorithm that is easy to implement and requires minimal computational resources.

* No training data needed: The Luhn Model does not require any training data, as it is based on a statistical analysis of the text.

* Good for extractive summarization: The Luhn Model is well-suited for extractive summarization, where the summary is generated by selecting the most relevant sentences from the original text.

* Largely language-independent: The Luhn Model needs only a tokenizer and a stop-word list for the target language, so it can be adapted to most languages without retraining.

### Cons:

* Limited to statistical analysis: The Luhn Model relies solely on a statistical analysis of the text and may not be able to capture the semantic meaning of the text.

* Limited context awareness: The Luhn Model does not consider the context in which the sentences are used, which can lead to the selection of irrelevant sentences.

* Over-reliance on word frequency: The Luhn Model relies heavily on word frequency, which may not always be an accurate indicator of the importance of a sentence.

* Limited to single document summarization: The Luhn Model is designed for single document summarization and may not work well for summarizing multiple documents or large sets of data.

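These are the scores we achieved on the sample article. Both ROUGE-1 and BLEU compare the generated summary against the full source text, not against a separately written reference summary: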

    ROUGE Score:
    Precision: 0.991
    Recall: 0.742
    F1-Score: 0.848

    BLEU Score: 0.700

## References

Here are some research papers related to Luhn's algorithm for text summarization:

1. "The automatic creation of literature abstracts" by H. P. Luhn, in IBM Journal of Research and Development (1958)

2. "Text summarization using Luhn's algorithm" by H. P. Luhn, in Information Retrieval Techniques for Speech Applications (1996)

3. "Experiments with Luhn's automatic summarizer" by T. F. Sumner, in Journal of the Association for Computing Machinery (1959)

4. "Combining Luhn's algorithm with latent semantic analysis for text summarization" by R. S. Kesavan and S. S. Iyengar, in Proceedings of the 2009 International Conference on Advances in Recent Technologies in Communication and Computing

These papers describe Luhn's original algorithm for text summarization, its limitations, and its extensions. The algorithm identifies significant words (frequent content words, excluding very common ones) and selects the sentences in which those words are most densely clustered. This approach is simple and can produce reasonable results, but it has limitations, most notably that it does not model the semantic relationships between words.
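
As a rough illustration of sentence scoring in the spirit of Luhn's 1958 paper, the sketch below computes a "significance factor" for each sentence: significant words separated by at most a few other words form a cluster, and a cluster scores (number of significant words)² divided by its length. The tiny stop-word list, the `min_count` threshold, the `max_gap` of 4, and all function names are assumptions made for this example. The notebook cells further down use a simpler keyword-matching approach.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; a real run would use a fuller one).
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "it", "that",
              "with", "for", "on", "has", "have", "be", "will", "was", "been"}

def _words(text):
    # Lowercase word tokens; keeps digits, apostrophes and hyphens (e.g. "covid-19").
    return re.findall(r"[a-z0-9'-]+", text.lower())

def significant_words(text, min_count=2):
    # Content words that occur at least min_count times in the document.
    counts = Counter(w for w in _words(text) if w not in STOP_WORDS)
    return {w for w, c in counts.items() if c >= min_count}

def luhn_score(sentence, significant, max_gap=4):
    # Best cluster score in the sentence: (significant words)^2 / cluster length.
    words = _words(sentence)
    positions = [i for i, w in enumerate(words) if w in significant]
    if not positions:
        return 0.0
    best = 0.0
    start = prev = positions[0]
    count = 1
    for pos in positions[1:]:
        if pos - prev - 1 <= max_gap:      # still inside the current cluster
            count += 1
        else:                              # close the cluster and start a new one
            best = max(best, count ** 2 / (prev - start + 1))
            start, count = pos, 1
        prev = pos
    return max(best, count ** 2 / (prev - start + 1))

def summarize(text, n_sentences=2):
    # Keep the n highest-scoring sentences, in their original order.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    significant = significant_words(text)
    ranked = sorted(range(len(sentences)),
                    key=lambda i: luhn_score(sentences[i], significant),
                    reverse=True)
    return " ".join(sentences[i] for i in sorted(ranked[:n_sentences]))
```

Squaring the count rewards sentences where several significant words appear close together, which a plain "contains a frequent word" test does not capture.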

The later papers explore extensions to Luhn's algorithm, such as combining it with latent semantic analysis, to address some of these limitations and produce higher-quality summaries.





In [9]:
# Install dependencies before importing them
!pip install scikit-learn
!pip install rouge
!pip install nltk

from collections import Counter

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.translate.bleu_score import sentence_bleu
from rouge import Rouge

# Download the NLTK data used below
nltk.download('stopwords')
nltk.download('punkt')


In [10]:
def extract_keywords(text, n_keywords=10):
    # Lowercase and split on whitespace (punctuation stays attached to tokens)
    tokens = text.lower().split()

    # Remove stop words
    stop_words = set(stopwords.words('english'))
    tokens = [token for token in tokens if token not in stop_words]

    # Calculate the frequency of each word
    freq = Counter(tokens)

    # Score each word as frequency * (1-based index of its LAST occurrence);
    # repeated words overwrite their earlier entries in the dict
    scores = {word: freq[word] * (i+1) for i, word in enumerate(tokens)}

    # Sort the words by score and select the top n_keywords
    keywords = sorted(scores.items(), key=lambda x: x[1], reverse=True)[:n_keywords]

    # Return the top keywords
    return [keyword[0] for keyword in keywords]
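
To make the scoring line in `extract_keywords` concrete, here is a small standalone trace on a made-up token list (the tokens are purely illustrative). Because a dict keeps one entry per key, each word ends up scored by its frequency times the position of its final occurrence, so words that recur late in the text are favored.

```python
from collections import Counter

# Hypothetical, already stop-word-filtered tokens
tokens = ['vaccine', 'drive', 'vaccine', 'dose']
freq = Counter(tokens)  # vaccine: 2, drive: 1, dose: 1

# Same expression as in extract_keywords
scores = {word: freq[word] * (i + 1) for i, word in enumerate(tokens)}

# 'vaccine' is first scored 2 * 1 = 2 at index 0, then overwritten
# with 2 * 3 = 6 at index 2
print(scores)  # {'vaccine': 6, 'drive': 2, 'dose': 4}
```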

In [12]:
text = """
 India's Health Ministry has announced that the country's COVID-19 vaccination drive will now be expanded to include people over the age of 60 and those over 45 with co-morbidities. The move is expected to cover an additional 270 million people, making it one of the largest vaccination drives in the world.The decision was taken after a meeting of the National Expert Group on Vaccine Administration for COVID-19 (NEGVAC), which recommended the expansion of the vaccination program. The NEGVAC also suggested that private hospitals may be allowed to administer the vaccine, although the details of this are yet to be finalized.India began its vaccination drive in mid-January, starting with healthcare and frontline workers. Since then, over 13 million doses have been administered across the country. However, the pace of the vaccination drive has been slower than expected, with concerns raised over vaccine hesitancy and logistical challenges.The expansion of the vaccination drive to include the elderly and those with co-morbidities is a major step towards achieving herd immunity and controlling the spread of the virus in India. The Health Ministry has also urged eligible individuals to come forward and get vaccinated at the earliest.India has reported over 11 million cases of COVID-19, making it the second-worst affected country in the world after the United States. The country's daily case count has been declining in recent weeks, but experts have warned that the pandemic is far from over and that precautions need to be maintained.
In summary, India's Health Ministry has announced that the country's COVID-19 vaccination drive will be expanded to include people over 60 and those over 45 with co-morbidities, covering an additional 270 million people. The decision was taken after a meeting of the National Expert Group on Vaccine Administration for COVID-19, and is a major step towards achieving herd immunity and controlling the spread of the virus in India."""

# Extract the top 3 keywords
keywords = extract_keywords(text, n_keywords=3)

# Print the keywords
print('Top keywords:', keywords)

Top keywords: ['vaccination', 'drive', 'million']


In [13]:
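# Build an extractive summary: keep every sentence that mentions a top keyword
sentences = text.split('.')  # naive split on periods
summary = ''
for sentence in sentences:
    # Substring match, so 'drive' also matches 'drives'
    if any(keyword in sentence.lower() for keyword in keywords):
        summary += sentence.strip() + '. '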

# Print the summary
print('Summary:', summary)

Summary: India's Health Ministry has announced that the country's COVID-19 vaccination drive will now be expanded to include people over the age of 60 and those over 45 with co-morbidities. The move is expected to cover an additional 270 million people, making it one of the largest vaccination drives in the world. The decision was taken after a meeting of the National Expert Group on Vaccine Administration for COVID-19 (NEGVAC), which recommended the expansion of the vaccination program. India began its vaccination drive in mid-January, starting with healthcare and frontline workers. Since then, over 13 million doses have been administered across the country. However, the pace of the vaccination drive has been slower than expected, with concerns raised over vaccine hesitancy and logistical challenges. The expansion of the vaccination drive to include the elderly and those with co-morbidities is a major step towards achieving herd immunity and controlling the spread of the virus in Indi
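
The selection loop above uses a substring test, so a keyword such as `drive` also matches `drives` or `driven`. If exact word matches are preferred, a word-boundary regular expression is one option. The helper below is a hypothetical sketch and is not part of the notebook; whether the stricter match gives a better summary depends on the text.

```python
import re

def contains_keyword(sentence, keyword):
    # True only if keyword appears as a whole word (case-insensitive).
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return re.search(pattern, sentence, flags=re.IGNORECASE) is not None

sentence = "one of the largest vaccination drives in the world"
print('drive' in sentence.lower())           # substring test used above
print(contains_keyword(sentence, 'drive'))   # whole-word test
```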

In [14]:
rouge = Rouge()
# get_scores(hypothesis, reference): the reference here is the full source text
scores = rouge.get_scores(summary, text)
print("ROUGE Score:")
print("Precision: {:.3f}".format(scores[0]['rouge-1']['p']))
print("Recall: {:.3f}".format(scores[0]['rouge-1']['r']))
print("F1-Score: {:.3f}".format(scores[0]['rouge-1']['f']))

ROUGE Score:
Precision: 0.991
Recall: 0.742
F1-Score: 0.848
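
For intuition about these numbers, the sketch below computes ROUGE-1 by hand using the common definition based on clipped unigram overlap. The `rouge` package may tokenize and count differently, so this is an approximation of the idea and not a reimplementation of the library. It shows why precision is close to 1 here: an extractive summary reuses the source's own words, and the reference is that same source.

```python
from collections import Counter

def rouge_1(hypothesis, reference):
    # Unigram counts after lowercasing and whitespace tokenization (an assumption).
    hyp = Counter(hypothesis.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each word counts at most as often as it appears in both.
    overlap = sum((hyp & ref).values())
    precision = overlap / max(sum(hyp.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1

# Every hypothesis word occurs in the reference, so precision is 1.0;
# recall is 3 of 6 reference words.
p, r, f = rouge_1("the cat sat", "the cat sat on the mat")
```

Scoring against a held-out, human-written summary instead of the source text would give a more meaningful picture of summary quality.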


In [15]:
from nltk.translate.bleu_score import sentence_bleu

def paragraph_to_sentence_tokens(paragraph):
    # Split on '.' and tokenize each sentence on whitespace
    return [sentence.split() for sentence in paragraph.split('.')]

# Each sentence of the source text serves as one BLEU reference
references = paragraph_to_sentence_tokens(text)
# The hypothesis is the whole generated summary as a flat list of words
hypothesis = summary.split()

score = sentence_bleu(references, hypothesis)
print(score)

0.7003175301310649


In [16]:
print("BLEU Score: {:.3f}".format(score))

BLEU Score: 0.700
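
BLEU is built on modified n-gram precision: each hypothesis n-gram is credited at most as many times as it appears in any single reference. The sketch below shows that one component for multiple references. It leaves out the brevity penalty and the geometric mean over n-gram orders that `sentence_bleu` applies, and the function names are made up for this example.

```python
from collections import Counter

def ngrams(tokens, n):
    # All contiguous n-grams of a token list, as tuples.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(references, hypothesis, n=1):
    hyp_counts = Counter(ngrams(hypothesis, n))
    if not hyp_counts:
        return 0.0
    # For each n-gram, the maximum count seen in any one reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    # Clip each hypothesis count by that maximum.
    clipped = sum(min(count, max_ref[gram]) for gram, count in hyp_counts.items())
    return clipped / sum(hyp_counts.values())

refs = [
    "the cat is on the mat".split(),
    "there is a cat on the mat".split(),
]
hyp = ["the"] * 7
# "the" appears at most twice in a single reference, so only 2 of 7 are credited.
precision = modified_precision(refs, hyp, n=1)
```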
