# LSA
**Latent Semantic Analysis (LSA)** is an unsupervised technique, widely used for extractive text summarization, that applies singular value decomposition (SVD) to a term matrix to identify the underlying concepts in a text. Its main advantages and disadvantages for summarization are listed below:

### Pros:
*	Captures semantic relationships: LSA is effective at identifying the semantic relationships between words in a text, which can help to generate more accurate and relevant summaries.
*	Good for multi-document summarization: LSA is particularly well-suited for summarizing multiple documents at once, as it can identify the common themes and concepts across them.
*	Flexibility: LSA can be used with different types of text, such as news articles, scientific papers, and social media posts, among others.

### Cons:
*	Needs enough text: LSA is unsupervised and needs no labeled training data, but the SVD only recovers meaningful concepts when the term matrix is built from enough sentences or documents.
*	Difficulty in handling new words: LSA cannot represent words that were absent from the text it was fitted on, which can lead to errors in summarization.
*	Limited coverage: LSA tends to focus on the most prominent concepts and may miss important details that belong to minor topics in the text.
*	Lack of coherence: LSA may generate summaries that lack coherence, especially when summarizing longer texts.

Overall, LSA can generate accurate and relevant summaries, but its limitations need to be considered when using it. Suitable input data and careful parameter selection (for example, the number of SVD components and the number of sentences kept) can mitigate some of them.
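As a rough illustration of how SVD surfaces concepts, the sketch below builds a term-by-sentence count matrix for a made-up three-sentence corpus and ranks the sentences by their weight on the strongest concept. The corpus, the raw-count weighting, and the variable names are assumptions chosen for illustration; this is not the pipeline used later in this notebook.

```python
import numpy as np

# Toy corpus: three short sentences (made up for illustration).
sentences = [
    "vaccine drive expands to older adults",
    "vaccine drive covers more adults",
    "daily case count declines",
]

# Term-by-sentence count matrix A (rows: terms, columns: sentences).
vocab = sorted({word for s in sentences for word in s.split()})
A = np.array(
    [[s.split().count(term) for s in sentences] for term in vocab],
    dtype=float,
)

# SVD factors A into U (terms x concepts), the singular values,
# and Vt (concepts x sentences).
U, singular_values, Vt = np.linalg.svd(A, full_matrices=False)

# Score each sentence by the magnitude of its weight on the strongest concept.
sentence_scores = np.abs(Vt[0])
ranking = np.argsort(sentence_scores)[::-1]

for idx in ranking:
    print(round(float(sentence_scores[idx]), 3), sentences[idx])
```

The third sentence shares no words with the other two, so it carries essentially no weight on the leading concept and ranks last.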

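These are the ROUGE-1 and BLEU scores we achieved on the example below. Both are computed against the source text itself rather than a separate reference summary, so a precision of 1.000 is expected for an extractive summary: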

      ROUGE Score:
      Precision: 1.000
      Recall: 0.430
      F1-Score: 0.602

      BLEU Score: 0.869


In [None]:
!pip install scikit-learn
!pip install rouge
!pip install nltk
from rouge import Rouge 
import nltk
import nltk.translate.bleu_score as bleu
nltk.download('punkt')



[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


True

In [None]:
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

In [None]:
text ="""
 India's Health Ministry has announced that the country's COVID-19 vaccination drive will now be expanded to include people over the age of 60 and those over 45 with co-morbidities. The move is expected to cover an additional 270 million people, making it one of the largest vaccination drives in the world.The decision was taken after a meeting of the National Expert Group on Vaccine Administration for COVID-19 (NEGVAC), which recommended the expansion of the vaccination program. The NEGVAC also suggested that private hospitals may be allowed to administer the vaccine, although the details of this are yet to be finalized.India began its vaccination drive in mid-January, starting with healthcare and frontline workers. Since then, over 13 million doses have been administered across the country. However, the pace of the vaccination drive has been slower than expected, with concerns raised over vaccine hesitancy and logistical challenges.The expansion of the vaccination drive to include the elderly and those with co-morbidities is a major step towards achieving herd immunity and controlling the spread of the virus in India. The Health Ministry has also urged eligible individuals to come forward and get vaccinated at the earliest.India has reported over 11 million cases of COVID-19, making it the second-worst affected country in the world after the United States. The country's daily case count has been declining in recent weeks, but experts have warned that the pandemic is far from over and that precautions need to be maintained.
In summary, India's Health Ministry has announced that the country's COVID-19 vaccination drive will be expanded to include people over 60 and those over 45 with co-morbidities, covering an additional 270 million people. The decision was taken after a meeting of the National Expert Group on Vaccine Administration for COVID-19, and is a major step towards achieving herd immunity and controlling the spread of the virus in India."""

In [None]:
vectorizer = TfidfVectorizer(stop_words='english')
X = vectorizer.fit_transform([text])
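Because `fit_transform([text])` receives a single document, `X` has exactly one row. For sentence-level summarization, one possible alternative is to vectorize each sentence separately. The sketch below contrasts the two shapes on a hypothetical three-sentence document; `doc`, `doc_sentences`, `X_sentences`, and `X_document` are illustrative names that do not appear elsewhere in this notebook.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical three-sentence document, used only for illustration.
doc = "The drive will expand. Private hospitals may give the vaccine. Cases are declining."

# Naive split on '.', dropping empty fragments.
doc_sentences = [s.strip() for s in doc.split('.') if s.strip()]

# Fitting on the list of sentences yields one row per sentence.
X_sentences = TfidfVectorizer(stop_words='english').fit_transform(doc_sentences)

# Fitting on [doc] yields a single row for the whole document.
X_document = TfidfVectorizer(stop_words='english').fit_transform([doc])

print(X_sentences.shape)  # (number of sentences, vocabulary size)
print(X_document.shape)   # (1, vocabulary size)
```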

In [None]:
lsa = TruncatedSVD(n_components=1, algorithm='randomized', n_iter=100, random_state=42)
lsa.fit(X)

  self.explained_variance_ratio_ = exp_var / full_var


TruncatedSVD(n_components=1, n_iter=100, random_state=42)

In [None]:
sentences = text.split('.')
# Note: lsa.components_[0] holds one weight per vocabulary term (TF-IDF was fitted on
# the whole text as a single document), so this ranks term indices by weight.
important_sentences = np.argsort(np.abs(lsa.components_[0]))[::-1]

# Keep only the indices that are also valid positions in `sentences`
valid_indices = [i for i in important_sentences if i < len(sentences)]

# Extract the three highest-ranked sentences based on the valid indices
summary_sentences = [sentences[i].strip() for i in valid_indices[:3]]

# If fewer than two sentences were found, pad the summary with empty strings
while len(summary_sentences) < 2:
    summary_sentences.append('')

In [None]:
summary = '. '.join(summary_sentences) + '.'
print(summary)

The decision was taken after a meeting of the National Expert Group on Vaccine Administration for COVID-19 (NEGVAC), which recommended the expansion of the vaccination program. The country's daily case count has been declining in recent weeks, but experts have warned that the pandemic is far from over and that precautions need to be maintained. The expansion of the vaccination drive to include the elderly and those with co-morbidities is a major step towards achieving herd immunity and controlling the spread of the virus in India.
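In the selection cell above, `lsa.components_[0]` contains one weight per vocabulary term, so the ranked indices refer to terms rather than sentences. A sentence-level variant could instead score each sentence by its coordinate on the leading SVD component. The following is a minimal sketch under that assumption; `lsa_summarize` and `demo` are hypothetical names, and the scores reported in this notebook were not produced with it.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer


def lsa_summarize(document, num_sentences=2):
    """Return the top-scoring sentences in their original order (illustrative helper)."""
    sents = [s.strip() for s in document.split('.') if s.strip()]
    if len(sents) <= num_sentences:
        return sents
    # One TF-IDF row per sentence.
    tfidf = TfidfVectorizer(stop_words='english').fit_transform(sents)
    svd = TruncatedSVD(n_components=1, random_state=42)
    # fit_transform returns one row per sentence: its coordinate on the concept.
    sentence_scores = np.abs(svd.fit_transform(tfidf)[:, 0])
    top = np.argsort(sentence_scores)[::-1][:num_sentences]
    # Restore document order so the summary reads naturally.
    return [sents[i] for i in sorted(top)]


# Hypothetical input, used only for illustration.
demo = ("The vaccination drive will expand to older adults. "
        "The vaccination drive began with healthcare workers. "
        "Experts warn that precautions are still needed. "
        "Private hospitals may join the vaccination drive.")
summary_demo = lsa_summarize(demo, num_sentences=2)
print('. '.join(summary_demo) + '.')
```

Returning the selected sentences in document order is a design choice that tends to help readability; ranking order would be an equally valid alternative.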


In [None]:
rouge = Rouge()
scores = rouge.get_scores(summary, text)
print("ROUGE Score:")
print("Precision: {:.3f}".format(scores[0]['rouge-1']['p']))
print("Recall: {:.3f}".format(scores[0]['rouge-1']['r']))
print("F1-Score: {:.3f}".format(scores[0]['rouge-1']['f']))

ROUGE Score:
Precision: 1.000
Recall: 0.430
F1-Score: 0.602
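To see why precision comes out at 1.000 when the source text itself serves as the reference, the sketch below computes a simplified unigram-overlap score by hand. It is an approximation for illustration, not the `rouge` package's exact implementation; the `rouge1` helper and the sample strings are assumptions.

```python
from collections import Counter


def rouge1(candidate, reference):
    """Simplified ROUGE-1: clipped unigram overlap on whitespace tokens."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # per-token minimum of the two counts
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical source and an extract copied verbatim from it.
source = "the drive will expand . the drive began in january . cases are declining ."
extract = "the drive will expand ."

p, r, f = rouge1(extract, source)
print("Precision: {:.3f}".format(p))  # 1.000: every extract token occurs in the source
print("Recall: {:.3f}".format(r))     # 0.333: the extract covers 5 of 15 source tokens
print("F1-Score: {:.3f}".format(f))   # 0.500
```

Any summary assembled from verbatim source sentences behaves this way, so recall is the more informative number in this setup.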


In [None]:
from nltk.translate.bleu_score import sentence_bleu

def summary_to_sentences(summary):
    # Split the summary into sentences using the '.' character as a separator
    sentences = summary.split('.')
    
    # Convert each sentence into a list of words
    sentence_lists = [sentence.split() for sentence in sentences]
    
    return sentence_lists

def paragraph_to_wordlist(paragraph):
    # Split the paragraph into words using whitespace as a separator
    words = paragraph.split()
    return words

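# The source text, split into tokenized sentences, serves as the set of references
reference_paragraph = text
reference_summary = summary_to_sentences(reference_paragraph)
# The generated summary is the hypothesis, tokenized into a flat list of words
predicted_paragraph = summary
predicted_summary = paragraph_to_wordlist(predicted_paragraph)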

score = sentence_bleu(reference_summary, predicted_summary)
print(score)

0.8694712979282957


In [None]:
print("BLEU Score: {:.3f}".format(score))

BLEU Score: 0.869
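The core of BLEU is a clipped (modified) n-gram precision taken over all references. The sketch below computes only that component on hypothetical token lists, leaving out the brevity penalty and the geometric mean over n-gram orders that `sentence_bleu` also applies; `ngram_counts` and `modified_precision` are illustrative names.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(references, hypothesis, n):
    """Clipped n-gram precision of the hypothesis against several references."""
    hyp_counts = ngram_counts(hypothesis, n)
    if not hyp_counts:
        return 0.0
    # For each n-gram, the maximum count seen in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngram_counts(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in hyp_counts.items())
    return clipped / sum(hyp_counts.values())


# Hypothetical references (source sentences) and a hypothesis that joins them.
references = [["the", "drive", "will", "expand"], ["cases", "are", "declining"]]
hypothesis = ["the", "drive", "will", "expand", "cases", "are", "declining"]

p1 = modified_precision(references, hypothesis, 1)
p2 = modified_precision(references, hypothesis, 2)
print(p1)  # 1.0: every unigram appears in some reference
print(p2)  # 5 of 6 bigrams match; ("expand", "cases") spans two references
```

This shows why a summary built from source sentences scores highly when those same sentences are the references: only the n-grams that straddle sentence boundaries fail to match.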
