# Gensim

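**Gensim** is a popular Python library for topic modeling and other natural language processing tasks. Gensim versions before 4.0 also include an extractive text summarizer, `gensim.summarization.summarize`, which ranks sentences with a variant of the TextRank algorithm (the module was removed in Gensim 4.0). Here are some advantages and disadvantages of using Gensim for text summarization: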

### Pros:
*	Flexibility: Besides its TextRank-based summarizer, Gensim provides topic models such as LSA (`LsiModel`) and LDA (`LdaModel`), which can serve as building blocks for other summarization approaches. This makes it adaptable to different types of texts and summarization needs.
*	Faithful summaries: The summarizer is extractive, so it copies whole sentences from the source and never introduces wording or facts that are not in the original text.
*	Language independence: The graph-based sentence ranking does not depend on a particular language, but the default preprocessing (stopword removal and stemming) is English-oriented, so other languages may need custom preprocessing.

### Cons:
*	Complexity: Approaches built on Gensim's topic models, such as LDA, require tuning choices like the number of topics, and the results of any of its methods depend on preprocessing such as sentence splitting.
*	Resource-intensive: TextRank compares every pair of sentences, so its cost grows roughly quadratically with the number of sentences and can become slow on long documents or large datasets.
*	Lack of coherence: Like other extractive methods, Gensim selects sentences independently, so the summary can read as disjointed, for example when a selected sentence refers to one that was left out. This is more likely with longer texts.
*	Limited coverage: The summarizer favors the most central sentences and cannot paraphrase or merge information, so details that appear only in lower-ranked sentences, or that are implied rather than stated, may be missing from the summary.

Overall, Gensim is a powerful tool for text summarization, but its effectiveness depends heavily on the specific algorithm used and the quality of the input text. Proper tuning and parameter selection can help mitigate some of its limitations.
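
Gensim 3.x's `summarize` ranks sentences with a TextRank variant (its edge weights come from BM25 rather than plain word overlap). The sketch below illustrates only the core idea, under simplifying assumptions: a naive regex sentence splitter, no stemming or stopword removal, the word-overlap similarity from the original TextRank paper, and a fixed number of weighted-PageRank iterations. The helpers `split_sentences`, `tokenize`, `similarity`, `textrank` and `summarize_sketch` are hypothetical names, not part of Gensim's API.

```python
import math
import re


def split_sentences(text):
    # Naive splitter (assumption): break after ".", "!" or "?" followed by whitespace
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def tokenize(sentence):
    # Lowercased word tokens; no stemming or stopword removal in this sketch
    return re.findall(r"[a-z0-9']+", sentence.lower())


def similarity(a, b):
    # Word-overlap similarity from the original TextRank paper:
    # |shared words| / (log|a| + log|b|); very short sentences get no edges
    if len(a) < 2 or len(b) < 2:
        return 0.0
    shared = len(set(a) & set(b))
    return shared / (math.log(len(a)) + math.log(len(b)))


def textrank(sentences, damping=0.85, iterations=50):
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    # Dense similarity matrix: every pair of sentences is compared (O(n^2) work)
    weights = [[similarity(tokens[i], tokens[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_weight = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        # Weighted PageRank update: each sentence receives score from its
        # neighbours in proportion to the edge weight
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_weight[j] * scores[j]
                for j in range(n) if out_weight[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize_sketch(text, top_n=2):
    sentences = split_sentences(text)
    scores = textrank(sentences)
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:top_n]
    # Keep the selected sentences in their original document order
    return [sentences[i] for i in sorted(best)]


sample = ("The vaccine drive expands to older people. "
          "The vaccine drive will cover older people soon. "
          "Cricket scores were announced yesterday.")
print(summarize_sketch(sample))
# ['The vaccine drive expands to older people.', 'The vaccine drive will cover older people soon.']
```

The dense pairwise similarity matrix is what makes this kind of ranking grow roughly quadratically with the number of sentences, and the final sort by position is why extractive summaries keep the source order. The unrelated third sentence shares no words with the others, so it only receives the baseline score of `1 - damping`.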

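These are the scores we achieved for the 100-word summary, computed in the cells below (ROUGE-1 and sentence-level BLEU). Both metrics compare the summary with the original article rather than with a human-written reference summary, so they measure overlap with the source rather than summary quality, and a ROUGE-1 precision of 1.000 is expected for an extractive summary: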

      ROUGE Score:
      Precision: 1.000
      Recall: 0.490
      F1-Score: 0.658

      BLEU Score: 0.750
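
To see why precision comes out as exactly 1.000, here is a minimal sketch of ROUGE-1 as clipped unigram overlap, assuming simple whitespace tokenization (the `rouge` package applies its own tokenization, so its numbers can differ slightly). When the source itself is the reference, every word of an extractive summary is found in it, so precision is 1; recall then only measures how much of the source the summary covers. F1 is the harmonic mean: 2 × 1.000 × 0.490 / (1.000 + 0.490) ≈ 0.658. The helpers `f1_score` and `rouge1` are hypothetical names.

```python
from collections import Counter


def f1_score(precision, recall):
    # Harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def rouge1(summary_tokens, reference_tokens):
    # Clipped unigram overlap: a word counts at most as often as it occurs in the reference
    overlap = sum((Counter(summary_tokens) & Counter(reference_tokens)).values())
    precision = overlap / max(len(summary_tokens), 1)
    recall = overlap / max(len(reference_tokens), 1)
    return precision, recall, f1_score(precision, recall)


source = "the vaccine drive will expand the move covers millions".split()
extract = "the vaccine drive will expand".split()  # words copied from the source
print(rouge1(extract, source))           # precision is 1.0 because every word comes from the source
print(round(f1_score(1.000, 0.490), 3))  # 0.658, matching the F1 reported above
```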


In [None]:
```python
# gensim.summarization exists only in Gensim 3.x (it was removed in Gensim 4.0),
# so this cell needs an older Gensim, e.g. pip install "gensim<4.0"
!pip install rouge
!pip install nltk
import nltk
from rouge import Rouge
from gensim.summarization.summarizer import summarize

nltk.download('punkt')

text = """
 India's Health Ministry has announced that the country's COVID-19 vaccination drive will now be expanded to include people over the age of 60 and those over 45 with co-morbidities. The move is expected to cover an additional 270 million people, making it one of the largest vaccination drives in the world.The decision was taken after a meeting of the National Expert Group on Vaccine Administration for COVID-19 (NEGVAC), which recommended the expansion of the vaccination program. The NEGVAC also suggested that private hospitals may be allowed to administer the vaccine, although the details of this are yet to be finalized.India began its vaccination drive in mid-January, starting with healthcare and frontline workers. Since then, over 13 million doses have been administered across the country. However, the pace of the vaccination drive has been slower than expected, with concerns raised over vaccine hesitancy and logistical challenges.The expansion of the vaccination drive to include the elderly and those with co-morbidities is a major step towards achieving herd immunity and controlling the spread of the virus in India. The Health Ministry has also urged eligible individuals to come forward and get vaccinated at the earliest.India has reported over 11 million cases of COVID-19, making it the second-worst affected country in the world after the United States. The country's daily case count has been declining in recent weeks, but experts have warned that the pandemic is far from over and that precautions need to be maintained.
In summary, India's Health Ministry has announced that the country's COVID-19 vaccination drive will be expanded to include people over 60 and those over 45 with co-morbidities, covering an additional 270 million people. The decision was taken after a meeting of the National Expert Group on Vaccine Administration for COVID-19, and is a major step towards achieving herd immunity and controlling the spread of the virus in India."""

# Percent summary: ratio=0.7 keeps about 70% of the original sentences
summ_per = summarize(text, ratio=0.7)
print("Percent summary")
print(summ_per)

# Word-count summary: word_count=100 targets about 100 words
summ_words = summarize(text, word_count=100)
print("Word count summary")
print(summ_words)
```



```
Percent summary
India's Health Ministry has announced that the country's COVID-19 vaccination drive will now be expanded to include people over the age of 60 and those over 45 with co-morbidities.
The move is expected to cover an additional 270 million people, making it one of the largest vaccination drives in the world.The decision was taken after a meeting of the National Expert Group on Vaccine Administration for COVID-19 (NEGVAC), which recommended the expansion of the vaccination program.
However, the pace of the vaccination drive has been slower than expected, with concerns raised over vaccine hesitancy and logistical challenges.The expansion of the vaccination drive to include the elderly and those with co-morbidities is a major step towards achieving herd immunity and controlling the spread of the virus in India.
The Health Ministry has also urged eligible individuals to come forward and get vaccinated at the earliest.India has reported over 11 million cases of COVID-19, making
```
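
In the cell above, `ratio` and `word_count` only decide how many of the ranked sentences are kept (Gensim's documentation states that `ratio` is ignored when `word_count` is given). Below is a simplified sketch of that selection step, assuming sentence scores have already been computed, for instance by a TextRank-style ranking. The helpers `rank`, `select_by_ratio` and `select_by_word_count` are hypothetical, and the stopping rule for the word budget (stop once the next sentence would move the total further from the target) is an assumption, not a copy of Gensim's code. As in the printed output, the selected sentences are returned in their original order.

```python
def rank(scores):
    # Sentence indices from highest to lowest score
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def select_by_ratio(sentences, scores, ratio):
    # Keep the top int(len * ratio) sentences by score
    k = int(len(sentences) * ratio)
    best = rank(scores)[:k]
    return [sentences[i] for i in sorted(best)]  # restore document order


def select_by_word_count(sentences, scores, word_count):
    # Add sentences from highest to lowest score; stop when the next one would
    # move the total further from the word budget (assumed rule)
    chosen, length = [], 0
    for i in rank(scores):
        n_words = len(sentences[i].split())
        if abs(word_count - length - n_words) > abs(word_count - length):
            break
        chosen.append(i)
        length += n_words
    return [sentences[i] for i in sorted(chosen)]  # restore document order


sentences = ["a b c", "d e f g", "h i", "j k l m n"]
scores = [0.9, 0.5, 0.8, 0.1]
print(select_by_ratio(sentences, scores, 0.5))     # ['a b c', 'h i']
print(select_by_word_count(sentences, scores, 5))  # ['a b c', 'h i']
```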

In [None]:
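```python
# ROUGE-1 between the 100-word summary (hypothesis) and the original article (reference).
# The reference here is the source text itself, not a human-written summary.
rouge = Rouge()
scores = rouge.get_scores(summ_words, text)
print("ROUGE Score:")
print("Precision: {:.3f}".format(scores[0]['rouge-1']['p']))
print("Recall: {:.3f}".format(scores[0]['rouge-1']['r']))
print("F1-Score: {:.3f}".format(scores[0]['rouge-1']['f']))
```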

```
ROUGE Score:
Precision: 1.000
Recall: 0.490
F1-Score: 0.658
```


In [None]:
```python
from nltk.translate.bleu_score import sentence_bleu

def summary_to_sentences(summary):
    # Split text into sentences on '.', then turn each sentence into a list of words
    # (used below on the original article, so each source sentence becomes one reference)
    sentences = summary.split('.')
    sentence_lists = [sentence.split() for sentence in sentences]
    return sentence_lists

def paragraph_to_wordlist(paragraph):
    # Split the paragraph into words using whitespace as a separator
    words = paragraph.split()
    return words

# References: the sentences of the original article (not a human-written summary)
reference_paragraph = text
reference_summary = summary_to_sentences(reference_paragraph)

# Hypothesis: the 100-word Gensim summary as one list of tokens
predicted_paragraph = summ_words
predicted_summary = paragraph_to_wordlist(predicted_paragraph)

# sentence_bleu(references, hypothesis) uses uniform 1- to 4-gram weights by default
score = sentence_bleu(reference_summary, predicted_summary)
print(score)
```

```
0.7504088240190349
```


In [None]:
```python
print("BLEU Score: {:.3f}".format(score))
```

```
BLEU Score: 0.750
```
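
The BLEU value can be explored with a small pure-Python sketch of sentence-level BLEU: clipped (modified) n-gram precision against multiple references, a brevity penalty based on the closest reference length, and the geometric mean of the 1- to 4-gram precisions. It assumes whitespace-tokenized input and omits NLTK's smoothing options, so it is an illustration rather than a drop-in replacement for `sentence_bleu`; the helper names are hypothetical. In the notebook each source sentence is a separate reference, so n-grams that span two extracted sentences cannot match any single reference, and sentence-final tokens keep their period in the hypothesis but not in the references (which were split on '.'). These are likely reasons the score is 0.750 rather than 1.0.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    # Multiset of the n-grams in a token list
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(references, hypothesis, n):
    # Clip each hypothesis n-gram by the largest count it has in any single reference
    hyp_counts = ngram_counts(hypothesis, n)
    max_ref_counts = Counter()
    for ref in references:
        max_ref_counts |= ngram_counts(ref, n)
    clipped = sum(min(c, max_ref_counts[g]) for g, c in hyp_counts.items())
    return clipped / max(sum(hyp_counts.values()), 1)


def bleu(references, hypothesis, max_n=4):
    precisions = [modified_precision(references, hypothesis, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # no smoothing in this sketch
    hyp_len = len(hypothesis)
    # Brevity penalty uses the reference length closest to the hypothesis length
    ref_len = min((len(r) for r in references), key=lambda r: (abs(r - hyp_len), r))
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    # Geometric mean of the n-gram precisions with uniform weights
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)


refs = [["india", "expands", "its", "vaccination", "drive"],
        ["daily", "cases", "have", "been", "declining"]]
print(bleu(refs, refs[0]))  # 1.0: the hypothesis is copied from one reference

# Words from two different reference sentences: unigrams match, but 4-grams that
# span the sentence boundary cannot match any single reference
boundary = ["its", "vaccination", "drive", "daily", "cases"]
print(modified_precision(refs, boundary, 1), modified_precision(refs, boundary, 4))  # 1.0 0.0

# Classic clipping example: "the" occurs at most twice in the reference
print(modified_precision([["the", "cat", "is", "on", "the", "mat"]], ["the"] * 7, 1))  # 2/7
```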
