# BART
**BART (Bidirectional and Auto-Regressive Transformers)** is a transformer-based sequence-to-sequence model developed by Facebook AI that can be used for text summarization, among other tasks. Here are some pros and cons of summarizing news articles with BART:


### Pros:

* Strong benchmark performance: When it was published, BART reported state-of-the-art results on several natural language generation tasks, including abstractive summarization (Lewis et al., 2020).

* High-quality summaries: BART pairs a bidirectional encoder with an autoregressive decoder, so it reads the whole article before generating. This helps it capture relationships between words and sentences and produce summaries that preserve important information.

* Customizable: BART's architecture can be fine-tuned to specific domains or use cases, allowing users to generate summaries tailored to their needs.

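* Multilingual support: The BART architecture can be trained on data in other languages, and a multilingual variant, mBART, is pre-trained on many languages. The original BART checkpoints, including the one used below, are trained on English text.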

### Cons:

* Resource-intensive: Fine-tuning BART for text summarization requires significant computational resources, typically one or more high-end GPUs with large amounts of memory. Inference can run on a CPU, as in the notebook below, but it is considerably slower than on a GPU.

* Large model size: BART is a large model. The `facebook/bart-large-cnn` checkpoint used below is a 1.63 GB download, which makes it challenging to deploy on devices with limited storage or memory.

* Expertise required: Fine-tuning BART for specific use cases or domains requires expertise in natural language processing and machine learning.

* Dependence on training data: BART's performance is highly dependent on the quality and relevance of the training data used to train the model. If the training data is biased or limited, the quality of the summaries may be compromised.


Overall, BART is a powerful tool for text summarization that can generate high-quality summaries, but it requires significant computational resources and expertise to use effectively. It is best suited for large-scale projects or applications where high accuracy is critical.

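These are the scores we obtained for the single example article used in the notebook below. Both metrics compare the generated summary against the source article itself rather than a human-written reference summary, so they mainly show how much of the summary is copied from the source:

    ROUGE-1 Score: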
    Precision: 1.000
    Recall: 0.252
    F1-Score: 0.402
    
    BLEU Score: 0.905

### References

1. Lewis, M., Liu, Y., Goyal, N., Ghazvininejad, M., Mohamed, A., Levy, O., ... & Stoyanov, V. (2020). BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 7871-7880).

2. Wang, Y., Liu, Y., & Zhang, X. (2020). Chinese Text Summarization with Pretrained BERT. In Proceedings of the 2020 International Conference on Asian Language Processing (pp. 96-100).

3. Yasunaga, M., & Narayan, S. (2021). Learning to Summarize Scientific Articles with BART. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 1824-1834).

4. Peña-Castillo, A., & Blanquer, I. (2021). Summarizing Medical Text with Pretrained Language Models. In International Conference on Computational Methods in Systems Biology (pp. 26-36).

5. Elgohary, A., & Salah, A. (2021). Arabic Text Summarization using BERT and BART. In Proceedings of the International Conference on Machine Learning and Data Engineering (pp. 298-306).

These papers provide insights into how BART can be used for various text summarization tasks such as scientific article summarization, medical text summarization, and summarization in different languages like Chinese and Arabic. They also provide information on the specific techniques and methodologies used for fine-tuning BART for these tasks.





In [1]:
!pip install -U transformers
!pip install sentencepiece
!pip install rouge
!pip install nltk

import nltk
import torch
from rouge import Rouge
from transformers import BartForConditionalGeneration, BartTokenizer

# Download the NLTK 'punkt' tokenizer data
nltk.download('punkt')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


In [2]:
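# Load the BART checkpoint fine-tuned on CNN/DailyMail and its matching tokenizer
model = BartForConditionalGeneration.from_pretrained('facebook/bart-large-cnn')
tokenizer = BartTokenizer.from_pretrained('facebook/bart-large-cnn')

# Run on CPU and keep the model on the same device as the inputs
device = torch.device('cpu')
model.to(device)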

text ="""
 India's Health Ministry has announced that the country's COVID-19 vaccination drive will now be expanded to include people over the age of 60 and those over 45 with co-morbidities. The move is expected to cover an additional 270 million people, making it one of the largest vaccination drives in the world.The decision was taken after a meeting of the National Expert Group on Vaccine Administration for COVID-19 (NEGVAC), which recommended the expansion of the vaccination program. The NEGVAC also suggested that private hospitals may be allowed to administer the vaccine, although the details of this are yet to be finalized.India began its vaccination drive in mid-January, starting with healthcare and frontline workers. Since then, over 13 million doses have been administered across the country. However, the pace of the vaccination drive has been slower than expected, with concerns raised over vaccine hesitancy and logistical challenges.The expansion of the vaccination drive to include the elderly and those with co-morbidities is a major step towards achieving herd immunity and controlling the spread of the virus in India. The Health Ministry has also urged eligible individuals to come forward and get vaccinated at the earliest.India has reported over 11 million cases of COVID-19, making it the second-worst affected country in the world after the United States. The country's daily case count has been declining in recent weeks, but experts have warned that the pandemic is far from over and that precautions need to be maintained.
In summary, India's Health Ministry has announced that the country's COVID-19 vaccination drive will be expanded to include people over 60 and those over 45 with co-morbidities, covering an additional 270 million people. The decision was taken after a meeting of the National Expert Group on Vaccine Administration for COVID-19, and is a major step towards achieving herd immunity and controlling the spread of the virus in India."""

Downloading (…)lve/main/config.json:   0%|          | 0.00/1.58k [00:00<?, ?B/s]

Downloading pytorch_model.bin:   0%|          | 0.00/1.63G [00:00<?, ?B/s]

Downloading (…)neration_config.json:   0%|          | 0.00/363 [00:00<?, ?B/s]

Downloading (…)olve/main/vocab.json:   0%|          | 0.00/899k [00:00<?, ?B/s]

Downloading (…)olve/main/merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

In [9]:
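# Replace the line break inside the text with a space so sentences do not fuse
preprocess_text = text.strip().replace("\n", " ")
# NOTE: the "summarize: " task prefix is a T5 convention; BART checkpoints do not
# require it and simply treat it as part of the input.
bart_prepared_Text = "summarize: " + preprocess_text
print("Original text preprocessed: \n", preprocess_text)

# Encode to token IDs, truncating to the 1024-token input limit of bart-large-cnn
tokenized_text = tokenizer.encode(bart_prepared_Text, return_tensors="pt", truncation=True, max_length=1024).to(device)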


Original text preprocessed: 
 India's Health Ministry has announced that the country's COVID-19 vaccination drive will now be expanded to include people over the age of 60 and those over 45 with co-morbidities. The move is expected to cover an additional 270 million people, making it one of the largest vaccination drives in the world.The decision was taken after a meeting of the National Expert Group on Vaccine Administration for COVID-19 (NEGVAC), which recommended the expansion of the vaccination program. The NEGVAC also suggested that private hospitals may be allowed to administer the vaccine, although the details of this are yet to be finalized.India began its vaccination drive in mid-January, starting with healthcare and frontline workers. Since then, over 13 million doses have been administered across the country. However, the pace of the vaccination drive has been slower than expected, with concerns raised over vaccine hesitancy and logistical challenges.The expansion of the vac
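
The article text above contains sentences fused without a space (for example "world.The decision" and "finalized.India"), which can affect both tokenization and sentence-based metrics. A minimal sketch of a cleanup step, assuming a simple regex heuristic is acceptable; `normalize_article` is a hypothetical helper that is not part of the original notebook, and the heuristic would also split abbreviations such as "U.S.A".

```python
import re


def normalize_article(raw: str) -> str:
    """Collapse whitespace and restore the space missing after a period."""
    # Joining on single spaces also turns line breaks into spaces,
    # so sentences on adjacent lines do not fuse together.
    text = " ".join(raw.split())
    # Insert a space where a period is directly followed by an uppercase
    # letter, e.g. "world.The" -> "world. The" (heuristic, see lead-in).
    return re.sub(r"\.(?=[A-Z])", ". ", text)


sample = "one of the largest drives in the world.The decision was taken\nafter a meeting."
print(normalize_article(sample))
```

Tokens such as "COVID-19" are left unchanged because they contain no period followed by a capital letter.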

In [8]:
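# Beam search with 4 beams; no bigram may repeat; summary length is 30 to 700 tokens
summary_ids = model.generate(
    tokenized_text,
    num_beams=4,
    no_repeat_ngram_size=2,
    min_length=30,
    max_length=700,
)

# Decode the best beam back into text
output = tokenizer.decode(summary_ids[0], skip_special_tokens=True)

print("Summarized text: \n", output)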

Summarized text: 
 India has reported over 11 million cases of COVID-19, making it the second-worst affected country in the world after the United States. The country's daily case count has been declining in recent weeks, but experts have warned that the pandemic is far from over.
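
The `no_repeat_ngram_size=2` argument used above prevents any bigram from appearing twice in the generated summary. The sketch below illustrates the idea in plain Python. It is not the Hugging Face implementation: `banned_next_tokens` is a hypothetical helper, and it works on word strings rather than token IDs.

```python
def banned_next_tokens(generated, ngram_size):
    """Return the tokens that would complete an n-gram already in `generated`."""
    if ngram_size < 1 or len(generated) < ngram_size - 1:
        return set()
    # The last (n - 1) tokens are the prefix of the n-gram about to be completed.
    prefix = tuple(generated[len(generated) - (ngram_size - 1):])
    banned = set()
    for i in range(len(generated) - ngram_size + 1):
        ngram = tuple(generated[i:i + ngram_size])
        # If an earlier n-gram starts with the same prefix, its last token
        # would recreate that n-gram, so it must not be generated next.
        if ngram[:-1] == prefix:
            banned.add(ngram[-1])
    return banned


history = ["the", "cat", "sat", "on", "the"]
print(banned_next_tokens(history, 2))  # {'cat'}: "the cat" already occurred
```

During decoding, a generator following this scheme would assign zero probability to the banned tokens at each step before choosing the next token.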


In [5]:
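rouge = Rouge()
# get_scores(hypothesis, reference): the reference here is the source article,
# not a human-written summary, so precision measures how much of the summary
# is copied from the article.
scores = rouge.get_scores(output, text)
rouge_1 = scores[0]['rouge-1']
print("ROUGE Score:")
print("Precision: {:.3f}".format(rouge_1['p']))
print("Recall: {:.3f}".format(rouge_1['r']))
print("F1-Score: {:.3f}".format(rouge_1['f']))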

ROUGE Score:
Precision: 1.000
Recall: 0.252
F1-Score: 0.402
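
To see why precision is 1.000 while recall is low, it helps to compute ROUGE-1 by hand. The following is a simplified sketch on whitespace tokens, not the exact algorithm of the `rouge` package; `rouge1` is a hypothetical helper. When every word of the summary also occurs in the reference text, precision is 1, and recall is the fraction of the reference's words that the summary covers.

```python
from collections import Counter


def rouge1(hypothesis: str, reference: str):
    """Unigram-overlap precision, recall and F1 on whitespace tokens."""
    hyp = Counter(hypothesis.split())
    ref = Counter(reference.split())
    # Counter intersection keeps the minimum count of each shared word.
    overlap = sum((hyp & ref).values())
    precision = overlap / max(sum(hyp.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1


source = "the drive will be expanded to people over 60"
summary = "the drive will be expanded"
p, r, f = rouge1(summary, source)
print("precision={:.3f} recall={:.3f} f1={:.3f}".format(p, r, f))
# precision=1.000 recall=0.556 f1=0.714
```

A summary copied verbatim from the source therefore scores perfect precision regardless of whether it selected the most important content.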


In [6]:
from nltk.translate.bleu_score import sentence_bleu

def summary_to_sentences(summary):
    """Split text on '.' and return one list of words per non-empty sentence."""
    sentences = summary.split('.')
    return [sentence.split() for sentence in sentences if sentence.strip()]

def paragraph_to_wordlist(paragraph):
    """Split a paragraph into words using whitespace as a separator."""
    return paragraph.split()

# sentence_bleu(references, hypothesis): each sentence of the source article
# serves as one reference, and the generated summary is the hypothesis.
reference_paragraph = text
reference_summary = summary_to_sentences(reference_paragraph)
predicted_paragraph = output
predicted_summary = paragraph_to_wordlist(predicted_paragraph)

score = sentence_bleu(reference_summary, predicted_summary)
print(score)

0.905342587629501


In [7]:
print("BLEU Score: {:.3f}".format(score))

BLEU Score: 0.905
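
BLEU combines clipped n-gram precisions (for n = 1 to 4) with a brevity penalty. The sketch below is a simplified, unsmoothed version written for illustration; `simple_bleu` and `ngram_counts` are hypothetical helpers, and the result is not guaranteed to match NLTK's `sentence_bleu` in every edge case. It shows why a summary made of sentences copied from the source tends to score highly when the source sentences are used as references: most of its n-grams appear in some reference.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(references, hypothesis, max_n=4):
    """Sentence-level BLEU with uniform weights and no smoothing."""
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngram_counts(hypothesis, n)
        # Clip each hypothesis n-gram by its highest count in any reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngram_counts(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        if clipped == 0 or total == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # The brevity penalty uses the reference closest in length to the hypothesis.
    hyp_len = len(hypothesis)
    ref_len = min((abs(len(r) - hyp_len), len(r)) for r in references)[1]
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(sum(log_precisions) / max_n)


reference = "the daily case count has been declining in recent weeks".split()
copied = list(reference)
print(simple_bleu([reference], copied))
```

A hypothesis identical to one of the references reaches the maximum score of 1.0, while a hypothesis that shares no 4-gram with any reference scores 0 in this unsmoothed version.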
