<a href="https://colab.research.google.com/github/KevinLolochum/BERT-MODELS/blob/main/BERT_Text_Summarization.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**PyTorch Solution**

***1. Installing the text summarizer, neuralcoref and spaCy***


* spaCy models enable coreference resolution, explained [here](https://spacy.io/universe/project/neuralcoref).
* Coreference resolution is the ability of a model to predict, through ranking, the antecedent that a later mention in the sentence or paragraph refers to.
* I will not use coreference resolution for this model, but it is good to know and I might use it in future models.
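
To make the idea of ranking antecedents concrete, here is a toy sketch in plain Python. It is not how neuralcoref scores mentions (that library uses a neural network); the `Mention` class, the `score` weights, and the example sentence are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Mention:
    text: str
    position: int  # token index where the mention starts
    number: str    # "sing" or "plur"
    gender: str    # "fem", "masc" or "neut"


def score(pronoun: Mention, candidate: Mention) -> float:
    """Hypothetical hand-written score: agreement counts more than recency."""
    s = 0.0
    if candidate.number == pronoun.number:
        s += 2.0
    if candidate.gender == pronoun.gender:
        s += 2.0
    # Candidates closer to the pronoun get a small bonus.
    s += 1.0 / (pronoun.position - candidate.position)
    return s


def resolve(pronoun: Mention, candidates: list) -> Mention:
    # Only mentions that appear before the pronoun can be antecedents.
    earlier = [c for c in candidates if c.position < pronoun.position]
    return max(earlier, key=lambda c: score(pronoun, c))


# "Linda called the doctors because she felt ill."
candidates = [
    Mention("Linda", 0, "sing", "fem"),
    Mention("the doctors", 2, "plur", "neut"),
]
pronoun = Mention("she", 5, "sing", "fem")
antecedent = resolve(pronoun, candidates)
print(antecedent.text)
```

The highest-scoring earlier mention is taken as the antecedent, which is the ranking step the bullet above describes.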



In [None]:
! pip install bert-extractive-summarizer
! pip install neuralcoref
! pip install spacy==2.2.2

In [38]:
# Libraries

import spacy
import numpy as np
import pandas as pd
import torch
from transformers import AutoConfig, AutoModel, AutoTokenizer
from summarizer import Summarizer
from summarizer.coreference_handler import CoreferenceHandler

import en_core_web_sm


***2. Instantiating the Model***



In [None]:
# Instantiating the model and tokenizer

Config = AutoConfig.from_pretrained('bert-base-cased')
Config.output_hidden_states = True
MAIN_MODEL = AutoModel.from_pretrained('bert-base-cased', config=Config)
TOKENIZER = AutoTokenizer.from_pretrained('bert-base-cased')

# Summarizer built on the custom BERT model and tokenizer

model = Summarizer(custom_model=MAIN_MODEL, custom_tokenizer=TOKENIZER)

***3. Loading COVID19 News Data from the Web and Cleaning***

* Cleaning the data is simple because the model only requires a corpus and does the rest on its own.
* I will use BeautifulSoup and urllib.request to fetch the page and build a corpus to pass to the model.



In [47]:
from bs4 import BeautifulSoup
from bs4.element import Comment
from urllib.request import urlopen

# Data
URL = "https://www.cbc.ca/news/canada/toronto/demand-covid-19-treatment-ontario-1.5816276"


def visible_tags(element):
    '''Return True for visible page text: not inside a non-content tag and not an HTML comment.'''

    if element.parent.name in ['style', 'script', 'head', 'title', 'meta', '[document]']:
        return False
    if isinstance(element, Comment):
        return False

    return True


def html_to_text(raw):
    '''Parse raw HTML and join its visible text nodes into a single string.'''

    soup = BeautifulSoup(raw, 'html.parser')
    texts = soup.findAll(text=True)
    visible_texts = filter(visible_tags, texts)

    return u" ".join(t.strip() for t in visible_texts)

html = urlopen(URL).read()
body = html_to_text(html)
body

'         Skip to Main Content Menu Search Search Quick Links News Sports Radio Music Listen Live TV Watch COVID-19 Local updates Watch Live COVID-19 tracker Subscribe to newsletter Top Stories Local The National Opinion World Canada Politics Indigenous Business Health Entertainment Tech & Science CBC News Investigates Go Public Shows About CBC News Toronto Demand spikes for COVID-19 treatment that\'s saving lives but is in limited supply in Canada As coronavirus infections are surging to record-breaking levels in Ontario, there\'s concern that demand is spiking\xa0for ECMO, or extracorporeal membrane oxygenation, a last-resort treatment for some of the sickest COVID-19 patients that\'s in limited supply. Social Sharing About 40 Canadian hospitals, or 3% of all sites, have access to an ECMO machine Lauren Pelley · CBC News · Posted: Nov 26, 2020 4:00 AM ET | Last Updated: November 26 Tony and Linda Passarelli of Bolton, Ont., and their three children all wound up sick this past spring 
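
The filtering idea can be checked offline on a small inline page. The sketch below uses only the standard library's `html.parser` instead of BeautifulSoup, so it is a stand-in for the functions above and not the same code; the `VisibleTextParser` class, the `visible_text` helper, and the sample HTML are assumptions for illustration. Unlike the cell above, it also drops empty text nodes.

```python
from html.parser import HTMLParser

# Tags whose contents should not appear in the corpus.
SKIPPED_TAGS = {"style", "script", "head", "title"}


class VisibleTextParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a skipped tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # HTML comments never reach this method, so they are dropped.
        text = data.strip()
        if self.skip_depth == 0 and text:
            self.chunks.append(text)


def visible_text(raw_html: str) -> str:
    parser = VisibleTextParser()
    parser.feed(raw_html)
    parser.close()
    return " ".join(parser.chunks)


sample = """<html><head><title>Demo</title><style>p {color: red}</style></head>
<body><!-- hidden note --><h1>ECMO demand</h1><script>var x = 1;</script>
<p>About 40 hospitals have access.</p></body></html>"""

cleaned = visible_text(sample)
print(cleaned)
```

Only the heading and paragraph text survive. Navigation text such as menus is still visible text, which is why it also shows up in the scraped corpus above.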

***4. Creating the summary and evaluation***


* Now we can pass the corpus above through our model to create the summary and evaluate it.
* There are a number of parameters that you can specify, e.g. the size of the output summary relative to the input corpus (ratio) and the minimum length per sentence (min_length).
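
As a rough sketch of what these two parameters control, the plain-Python helpers below filter out short sentences and compute how many sentences a given ratio would keep. This assumes that min_length is compared against the character length of each sentence and that the summary size is roughly ratio times the number of remaining sentences; the helper names and the regex-based sentence split are hypothetical, and the library itself relies on spaCy for sentence splitting.

```python
import re


def candidate_sentences(text: str, min_length: int = 25) -> list:
    # Naive split after sentence-ending punctuation (assumption: good enough for a demo).
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # Keep only sentences longer than min_length characters.
    return [s for s in sentences if len(s) > min_length]


def target_count(num_sentences: int, ratio: float = 0.25) -> int:
    # Number of sentences to keep, never fewer than one.
    return max(1, int(num_sentences * ratio))


text = (
    "Then came a sliver of hope. "
    "More than half of the patients survived. "
    "It worked. "
    "Doctors thought there was nothing more they could do."
)

kept = candidate_sentences(text, min_length=25)
print(kept)
print(target_count(len(kept), ratio=0.25))
```

With these settings the short sentence "It worked." is discarded before any selection happens, and a ratio of 0.25 over three remaining sentences rounds down to a single-sentence summary.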





In [55]:
# Exploring the output
result = model(body, ratio=0.25, min_length=25)
summary = ''.join(result)
print(summary)

Social Sharing About 40 Canadian hospitals, or 3% of all sites, have access to an ECMO machine Lauren Pelley · CBC News · Posted: Nov 26, 2020 4:00 AM ET | Last Updated: November 26 Tony and Linda Passarelli of Bolton, Ont., While she isolated in a room at the couple's Bolton, Ont., 150 COVID-19 patients now in Ontario ICUs, hitting key threshold to impact other procedures  After passing out in the hospital, he wound up intubated in an intensive care unit, was transferred to Etobicoke General Hospital in Toronto, suffered round after round of fevers and infections, then became so ill that doctors thought there was nothing more they could do to keep him alive. Then came a sliver of hope. In the pandemic's first wave in Ontario, 34 COVID-19 patients were given this potentially life-saving treatment, and more than half survived. "It's a pretty high level," said Dr. Marcelo Cypel, surgical director for the University Health Network's extracorporeal life support program, which includes the 

Scoring the summary against the source text with BLEU, which measures n-gram overlap rather than accuracy.

In [None]:
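from nltk.translate.bleu_score import corpus_bleu

# corpus_bleu expects tokenized input: a list of reference lists and a list of hypotheses.
references = [[body.split()]]   # one document with one reference (the source text)
hypotheses = [summary.split()]  # one hypothesis (the generated summary)
score = corpus_bleu(references, hypotheses)
score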
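
To show what BLEU is built from, here is a minimal sketch of its core ingredient, the clipped (modified) n-gram precision, in plain Python. It is not NLTK's implementation: full BLEU also combines several n-gram orders with a geometric mean and applies a brevity penalty. The function names and the two example sentences are assumptions for illustration.

```python
from collections import Counter


def ngrams(tokens: list, n: int) -> list:
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(reference_tokens: list, candidate_tokens: list, n: int) -> float:
    cand = Counter(ngrams(candidate_tokens, n))
    ref = Counter(ngrams(reference_tokens, n))
    if not cand:
        return 0.0
    # Each candidate n-gram is credited at most as often as it occurs in the reference.
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / sum(cand.values())


reference = "the treatment is in limited supply in Canada".split()
candidate = "in in in limited supply".split()

p1 = modified_precision(reference, candidate, 1)
p2 = modified_precision(reference, candidate, 2)
print(p1, p2)
```

Because an extractive summary copies sentences from the source, its n-gram precision against that source will tend to be high, while the brevity penalty pulls the final score down for a summary much shorter than the reference.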