## HuggingFace Summarization Transformers

In [10]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [17]:
from brief_news.interface.main import get_articles
from brief_news.ml_logic.params import HUGGING_API_TOKEN

import tensorflow
import json
import requests
import pandas as pd
from itertools import chain
from transformers import pipeline

from dotenv import load_dotenv, find_dotenv
import os

## Loading data

Calling `get_articles` from the `brief_news` module:
  - fetches article links from NewsApi
  - scrapes the linked web pages

In [36]:
%%time
# fetch live articles for the 'sports' category via the brief_news module
articles_df = get_articles('sports')

CPU times: user 850 ms, sys: 269 ms, total: 1.12 s
Wall time: 3.16 s


In [37]:
articles_df

Unnamed: 0,title,article,id,orig_id
0,Cómo ver el partido Inglaterra - Francia por t...,Por CNN Español,0,0
1,Deion Sanders decided to stop coaching at HBCU...,College football fans and HBCU alumni are stil...,0,0
2,Brittney Griner release: After release from Ru...,After being imprisoned in Russia for nearly 10...,0,0
3,Kylian Mbappé is reaching speeds of 22 miles p...,Ahead of its World Cup quarterfinal against Fr...,0,0
4,World Cup quarterfinals: Morocco on the verge ...,Morocco already caused the upset of the round ...,0,0


In [38]:
articles_df.article[0]

'Por CNN Español'

In [39]:
articles_df.title[0]

'Cómo ver el partido Inglaterra - Francia por televisión e internet'
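The first row shows that scraping can return only a byline (`'Por CNN Español'`) instead of the article body. The summarization cells further down skip such rows by requiring more than 10 words. A minimal sketch of that check as a reusable function follows; the name `has_enough_words` and the `MIN_WORDS` constant are hypothetical and not part of `brief_news`.

```python
MIN_WORDS = 10  # assumed threshold, matching the "> 10 words" check used later in this notebook


def has_enough_words(text, min_words=MIN_WORDS):
    """Return True when `text` is a string with more than `min_words` whitespace-separated words."""
    if not isinstance(text, str):
        return False  # handles None / NaN coming from a failed scrape
    return len(text.split()) > min_words


# Illustrative inputs: a bare byline and a made-up longer text
byline = "Por CNN Español"
longer_text = "word " * 50

print(has_enough_words(byline))
print(has_enough_words(longer_text))
```

With a DataFrame, the same function could be applied as `articles_df[articles_df.article.apply(has_enough_words)]` to drop unusable rows before any API call is made.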

## Testing API bart-large-cnn

Calling the Hugging Face Inference API with the `facebook/bart-large-cnn` model.

In [40]:
hf_token = HUGGING_API_TOKEN

In [41]:
headers = {"Authorization": f"Bearer {hf_token}"}
API_URL = "https://api-inference.huggingface.co/models/facebook/bart-large-cnn"

def query(payload):
    """
    Send a POST request to the Hugging Face Inference API and return the
    parsed JSON response of the facebook/bart-large-cnn summarization model.
    """
    # requests serializes `payload` to JSON and decodes the JSON response body
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()


In [42]:
df_bart_large = articles_df.copy()
df_bart_large

Unnamed: 0,title,article,id,orig_id
0,Cómo ver el partido Inglaterra - Francia por t...,Por CNN Español,0,0
1,Deion Sanders decided to stop coaching at HBCU...,College football fans and HBCU alumni are stil...,0,0
2,Brittney Griner release: After release from Ru...,After being imprisoned in Russia for nearly 10...,0,0
3,Kylian Mbappé is reaching speeds of 22 miles p...,Ahead of its World Cup quarterfinal against Fr...,0,0
4,World Cup quarterfinals: Morocco on the verge ...,Morocco already caused the upset of the round ...,0,0


In [46]:
%%time
# summarizing articles with facebook/bart-large-cnn; max_length=150 caps each summary at 150 tokens (not words)
# articles of 10 words or fewer (e.g. a bare byline) are skipped and get None
df_bart_large['summary_text'] = df_bart_large['article'].apply(
    lambda article: query({'inputs': article, 'parameters': {'max_length': 150}})[0]['summary_text']
    if len(article.split()) > 10 else None
)
df_bart_large

CPU times: user 83.4 ms, sys: 9.39 ms, total: 92.8 ms
Wall time: 28.1 s


Unnamed: 0,title,article,id,orig_id,summary_text
0,Cómo ver el partido Inglaterra - Francia por t...,Por CNN Español,0,0,
1,Deion Sanders decided to stop coaching at HBCU...,College football fans and HBCU alumni are stil...,0,0,Deion Sanders announced his departure from Jac...
2,Brittney Griner release: After release from Ru...,After being imprisoned in Russia for nearly 10...,0,0,Brittney Griner arrives at Brooke Army Medical...
3,Kylian Mbappé is reaching speeds of 22 miles p...,Ahead of its World Cup quarterfinal against Fr...,0,0,Kylian Mbappé has scored five goals at the Wor...
4,World Cup quarterfinals: Morocco on the verge ...,Morocco already caused the upset of the round ...,0,0,Morocco takes on Portugal in the quarterfinals...


In [48]:
print('Article : ')
print(df_bart_large['article'][1])
print('\n')
print(' * ' * 20)
print('\n')
print('Summary : ')
print(df_bart_large['summary_text'][1])

Article : 
College football fans and HBCU alumni are still coming to terms with Deion Sanders announcing his departure from Jackson State University for his new head coaching gig at the University of Colorado. The move struck a chord, especially among alumni of the Mississippi college, with some calling Sanders a “sell out” for leaving the historically Black JSU for the predominantly white CU. Others are angry about him selling the dream of changing the athletic culture at historically Black colleges and universities, or HBCUs, across the US and leaving after only three years. While some were hopeful about everything Sanders said he could accomplish for JSU and other HBCUs, they “failed to realize this history of segregation, the history of integration and the history of the way TV contracts work really put these schools behind the 8-ball, so to speak,” said Louis Moore, a history professor at Grand Valley State University in Michigan. It’s complicated, but the anger, confusion and dis

### DistilBART

In [51]:
df_distillbart = articles_df.copy()
df_distillbart

Unnamed: 0,title,article,id,orig_id
0,Cómo ver el partido Inglaterra - Francia por t...,Por CNN Español,0,0
1,Deion Sanders decided to stop coaching at HBCU...,College football fans and HBCU alumni are stil...,0,0
2,Brittney Griner release: After release from Ru...,After being imprisoned in Russia for nearly 10...,0,0
3,Kylian Mbappé is reaching speeds of 22 miles p...,Ahead of its World Cup quarterfinal against Fr...,0,0
4,World Cup quarterfinals: Morocco on the verge ...,Morocco already caused the upset of the round ...,0,0


In [52]:
API_URL_distilbart = "https://api-inference.huggingface.co/models/philschmid/distilbart-cnn-12-6-samsum"

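def query(payload):
    # NOTE: redefines (shadows) the earlier bart-large-cnn `query`; this one targets distilbart-cnn-12-6-samsum
    response = requests.post(API_URL_distilbart, headers=headers, json=payload)
    return response.json()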


In [54]:
data = query({'inputs': df_distillbart['article'][4]})
data

[{'summary_text': 'France and England will play each other in the last eight of the World Cup quarterfinals. Portugal will play Morocco in the quarterfinals against the Atlas Lions. The last time Portugal and England met was in the group stage of the 2018 World Cup in a frenetic game which Ronaldo settled with the only goal.'}]
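Because this section defines a second function named `query`, re-running the earlier bart-large-cnn cell afterwards would silently send requests to the DistilBART endpoint. One possible way to avoid the name clash is a small factory that binds a model id to its own query function. The names `make_query` and `API_BASE` below are hypothetical, and `post` is assumed to behave like `requests.post`; it is passed in so the sketch runs offline.

```python
API_BASE = "https://api-inference.huggingface.co/models"  # same host as the URLs used in this notebook


def make_query(model_id, headers, post):
    """Return a function that sends payloads to the endpoint of `model_id`."""
    url = f"{API_BASE}/{model_id}"

    def _query(payload):
        return post(url, headers=headers, json=payload).json()

    return _query


# Offline stand-in for requests.post that records which URL was called
called_urls = []


class _FakeResponse:
    def json(self):
        return [{"summary_text": "stub summary"}]


def _fake_post(url, headers=None, json=None):
    called_urls.append(url)
    return _FakeResponse()


query_bart = make_query("facebook/bart-large-cnn", {}, _fake_post)
query_distilbart = make_query("philschmid/distilbart-cnn-12-6-samsum", {}, _fake_post)

query_bart({"inputs": "some text"})
query_distilbart({"inputs": "some text"})
print(called_urls)
```

In the notebook, `post` would be `requests.post` and `headers` the authorization header defined earlier.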

In [57]:
%%time
# summarizing articles with philschmid/distilbart-cnn-12-6-samsum
# no max_length is passed here, so the model's default length limit applies
df_distillbart['summary_text'] = df_distillbart.article.apply(
    lambda article: query({'inputs': article})[0]['summary_text']
    if len(article.split()) > 10 else None
)
df_distillbart

CPU times: user 77.3 ms, sys: 9.54 ms, total: 86.9 ms
Wall time: 1.97 s


Unnamed: 0,title,article,id,orig_id,summary_text
0,Cómo ver el partido Inglaterra - Francia por t...,Por CNN Español,0,0,
1,Deion Sanders decided to stop coaching at HBCU...,College football fans and HBCU alumni are stil...,0,0,Deion Sanders is leaving Jackson State Univers...
2,Brittney Griner release: After release from Ru...,After being imprisoned in Russia for nearly 10...,0,0,Brittney Griner arrived at Brooke Army Medica...
3,Kylian Mbappé is reaching speeds of 22 miles p...,Ahead of its World Cup quarterfinal against Fr...,0,0,Kylian Mbappé is the tournament's top scorer a...
4,World Cup quarterfinals: Morocco on the verge ...,Morocco already caused the upset of the round ...,0,0,France and England will play each other in the...


In [58]:
print('BART large CNN summary : ')
print(df_bart_large.summary_text[4])
print('\n')
print(' * ' * 20)
print('\n')
print('DistilBART summary : ')
print(df_distillbart.summary_text[4])

BART large CNN summary : 
Morocco takes on Portugal in the quarterfinals of the World Cup. The Atlas Lions are the first African side to qualify for the last eight in over a decade. Portugal's Cristiano Ronaldo was dropped by Fernando Santos. England takes on France in the final game of the day.


 *  *  *  *  *  *  *  *  *  *  *  *  *  *  *  *  *  *  *  * 


DistilBART summary : 
France and England will play each other in the last eight of the World Cup quarterfinals. Portugal will play Morocco in the quarterfinals against the Atlas Lions. The last time Portugal and England met was in the group stage of the 2018 World Cup in a frenetic game which Ronaldo settled with the only goal.
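Reading the two summaries side by side is subjective, so a simple length measure can serve as a first, rough point of comparison. The sketch below counts whitespace-separated words in the two summaries printed above. `compare_lengths` is a hypothetical helper, and word count says nothing about factual accuracy or readability.

```python
def word_count(text):
    """Number of whitespace-separated words in `text`."""
    return len(text.split())


def compare_lengths(summaries):
    """Map each model name to the word count of its summary."""
    return {name: word_count(text) for name, text in summaries.items()}


# The two summaries of article 4 as printed in this notebook
summaries = {
    "bart-large-cnn": (
        "Morocco takes on Portugal in the quarterfinals of the World Cup. "
        "The Atlas Lions are the first African side to qualify for the last eight in over a decade. "
        "Portugal's Cristiano Ronaldo was dropped by Fernando Santos. "
        "England takes on France in the final game of the day."
    ),
    "distilbart-cnn-12-6-samsum": (
        "France and England will play each other in the last eight of the World Cup quarterfinals. "
        "Portugal will play Morocco in the quarterfinals against the Atlas Lions. "
        "The last time Portugal and England met was in the group stage of the 2018 World Cup "
        "in a frenetic game which Ronaldo settled with the only goal."
    ),
}

lengths = compare_lengths(summaries)
for name, n_words in lengths.items():
    print(f"{name}: {n_words} words")
```

With the DataFrames from this notebook, the same idea could be expressed as `df_bart_large.summary_text.dropna().apply(word_count)` and compared against the DistilBART column.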
