In [138]:
!pip install transformers

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


##AI Blog Post Summarization with Hugging Face Transformers & Beautiful Soup Web Scraping

In [139]:
from transformers import pipeline
from bs4 import BeautifulSoup
import requests


##1. Load Summarization Pipeline

In [140]:
summarizer = pipeline("summarization")

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 (https://huggingface.co/sshleifer/distilbart-cnn-12-6)


#2. Get Blog Post From Medium

In [141]:
URL ="https://hackernoon.com/will-the-game-stop-with-gamestop-or-is-this-just-the-beginning-2j1x32aa"


In [142]:
r = requests.get(URL)

In [143]:
r.text



In [144]:
soup = BeautifulSoup(r.text,"html.parser")
results = soup.find_all(['h1','p'])

In [145]:
results

[<h1>Will The Game Stop with Gamestop Or Is This Just The Beginning?</h1>,
 <p>Crypto, Markets, Trading</p>,
 <p class="paragraph">The GameStop squeeze on short-sellers is an extraordinary event in markets, where at face value, retail traders and investors have worked together in an attempt to put some of the largest wall street institutions out of business.</p>,
 <p class="paragraph">The events can be interpreted with many viable lenses and there are ironies baked in that are pure serendipity. There has been a centrally controlled game in the global financial system in which insiders benefited while outsiders got hurt that comes to a head with a company called GameStop. The broking firm of most of the retail side of this warfare ‘RobinHood’ is literally stealing from its poor, retail investors to give to its rich, capital backers.</p>,
 <p class="paragraph">One of the historical realities of this game has been that macro-investing – the sages of not only portfolio management, but ofte

In [146]:
#soup

In [147]:
results[0]

<h1>Will The Game Stop with Gamestop Or Is This Just The Beginning?</h1>

In [148]:
results[1]

<p>Crypto, Markets, Trading</p>

In [149]:
text = [result.text for result in results]
ARTICLE = ' '.join(text)

In [150]:
ARTICLE

'Will The Game Stop with Gamestop Or Is This Just The Beginning? Crypto, Markets, Trading The GameStop squeeze on short-sellers is an extraordinary event in markets, where at face value, retail traders and investors have worked together in an attempt to put some of the largest wall street institutions out of business. The events can be interpreted with many viable lenses and there are ironies baked in that are pure serendipity. There has been a centrally controlled game in the global financial system in which insiders benefited while outsiders got hurt that comes to a head with a company called GameStop. The broking firm of most of the retail side of this warfare ‘RobinHood’ is literally stealing from its poor, retail investors to give to its rich, capital backers. One of the historical realities of this game has been that macro-investing – the sages of not only portfolio management, but often also sophisticated social and cultural figures – have had a hard time making money in markets

##3. Chunk Text:

In [151]:
ARTICLE = ARTICLE.replace('.','.<eos>')
ARTICLE = ARTICLE.replace('!','!<eos>')
ARTICLE = ARTICLE.replace('?','?<eos>')
sentences = ARTICLE.split('<eos>')

In [152]:
sentences

['Will The Game Stop with Gamestop Or Is This Just The Beginning?',
 ' Crypto, Markets, Trading The GameStop squeeze on short-sellers is an extraordinary event in markets, where at face value, retail traders and investors have worked together in an attempt to put some of the largest wall street institutions out of business.',
 ' The events can be interpreted with many viable lenses and there are ironies baked in that are pure serendipity.',
 ' There has been a centrally controlled game in the global financial system in which insiders benefited while outsiders got hurt that comes to a head with a company called GameStop.',
 ' The broking firm of most of the retail side of this warfare ‘RobinHood’ is literally stealing from its poor, retail investors to give to its rich, capital backers.',
 ' One of the historical realities of this game has been that macro-investing – the sages of not only portfolio management, but often also sophisticated social and cultural figures – have had a hard ti

In [153]:
sentences[0]

'Will The Game Stop with Gamestop Or Is This Just The Beginning?'

In [154]:
sentences[1]

' Crypto, Markets, Trading The GameStop squeeze on short-sellers is an extraordinary event in markets, where at face value, retail traders and investors have worked together in an attempt to put some of the largest wall street institutions out of business.'

In [155]:
a = sentences[0].split(' ')
a

['Will',
 'The',
 'Game',
 'Stop',
 'with',
 'Gamestop',
 'Or',
 'Is',
 'This',
 'Just',
 'The',
 'Beginning?']

In [156]:
max_chunk = 500
current_chunk = 0
chunks = []
for sentence in sentences:
  if len(chunks) ==current_chunk+1: #counter starting from 1
    if len(chunks[current_chunk])+ len(sentence.split(' '))<=max_chunk: #length of current sentence <= lengh of  the max_chunk (<=500)
      chunks[current_chunk].extend(sentence.split(' '))
      
    else: # 
      current_chunk=+1
      chunks.append(sentence.split(' '))
     
  else:
    #print(current_chunk)
    chunks.append(sentence.split(' '))



In [157]:
chunks

[['Will',
  'The',
  'Game',
  'Stop',
  'with',
  'Gamestop',
  'Or',
  'Is',
  'This',
  'Just',
  'The',
  'Beginning?',
  '',
  'Crypto,',
  'Markets,',
  'Trading',
  'The',
  'GameStop',
  'squeeze',
  'on',
  'short-sellers',
  'is',
  'an',
  'extraordinary',
  'event',
  'in',
  'markets,',
  'where',
  'at',
  'face',
  'value,',
  'retail',
  'traders',
  'and',
  'investors',
  'have',
  'worked',
  'together',
  'in',
  'an',
  'attempt',
  'to',
  'put',
  'some',
  'of',
  'the',
  'largest',
  'wall',
  'street',
  'institutions',
  'out',
  'of',
  'business.',
  '',
  'The',
  'events',
  'can',
  'be',
  'interpreted',
  'with',
  'many',
  'viable',
  'lenses',
  'and',
  'there',
  'are',
  'ironies',
  'baked',
  'in',
  'that',
  'are',
  'pure',
  'serendipity.',
  '',
  'There',
  'has',
  'been',
  'a',
  'centrally',
  'controlled',
  'game',
  'in',
  'the',
  'global',
  'financial',
  'system',
  'in',
  'which',
  'insiders',
  'benefited',
  'while',
  '

In [158]:
for chunk_id in range(len(chunks)):
  chunks[chunk_id] = ' '.join(chunks[chunk_id])

In [159]:
chunks

['Will The Game Stop with Gamestop Or Is This Just The Beginning?  Crypto, Markets, Trading The GameStop squeeze on short-sellers is an extraordinary event in markets, where at face value, retail traders and investors have worked together in an attempt to put some of the largest wall street institutions out of business.  The events can be interpreted with many viable lenses and there are ironies baked in that are pure serendipity.  There has been a centrally controlled game in the global financial system in which insiders benefited while outsiders got hurt that comes to a head with a company called GameStop.  The broking firm of most of the retail side of this warfare ‘RobinHood’ is literally stealing from its poor, retail investors to give to its rich, capital backers.  One of the historical realities of this game has been that macro-investing – the sages of not only portfolio management, but often also sophisticated social and cultural figures – have had a hard time making money in m

In [160]:
chunks[0]

'Will The Game Stop with Gamestop Or Is This Just The Beginning?  Crypto, Markets, Trading The GameStop squeeze on short-sellers is an extraordinary event in markets, where at face value, retail traders and investors have worked together in an attempt to put some of the largest wall street institutions out of business.  The events can be interpreted with many viable lenses and there are ironies baked in that are pure serendipity.  There has been a centrally controlled game in the global financial system in which insiders benefited while outsiders got hurt that comes to a head with a company called GameStop.  The broking firm of most of the retail side of this warfare ‘RobinHood’ is literally stealing from its poor, retail investors to give to its rich, capital backers.  One of the historical realities of this game has been that macro-investing – the sages of not only portfolio management, but often also sophisticated social and cultural figures – have had a hard time making money in ma

In [161]:
chunks[1]

'  Niederhoffer studied statistics and economics at Harvard and the University of Chicago, was a finance professor at the University of California and while at college he co-founded an investment bank.  Having never picked up a racquet before Harvard, Niederhoffer won the squash national junior title a year later, graduated as the national intercollegiate champion, won the U. S.  nationals 5 times and defeated one of the greatest players in the history of the sport in the North American Open, becoming a member of the squash hall of fame.  His investment record is tremendous, with an average of a 35% return annualised and once working for fellow legend of funds management George Soros.  When it came time for his son to learn the family business, Soros sent his son to Niederhoffer.  Niederhoffers books ‘Practical Speculation’ and ‘The Education of a Speculator’ are essential reading for market speculation, a ‘Reminiscences of a Stock Operator’ for the thinking man.   And with changes in 

#4. Summarise text

In [168]:
res = summarizer(chunks,max_length=120,min_length=30,do_sample=False)

Your max_length is set to 120, but you input_length is only 36. You might consider decreasing max_length manually, e.g. summarizer('...', max_length=18)
Your max_length is set to 120, but you input_length is only 14. You might consider decreasing max_length manually, e.g. summarizer('...', max_length=7)
Your max_length is set to 120, but you input_length is only 68. You might consider decreasing max_length manually, e.g. summarizer('...', max_length=34)
Your max_length is set to 120, but you input_length is only 3. You might consider decreasing max_length manually, e.g. summarizer('...', max_length=1)
Your max_length is set to 120, but you input_length is only 22. You might consider decreasing max_length manually, e.g. summarizer('...', max_length=11)
Your max_length is set to 120, but you input_length is only 80. You might consider decreasing max_length manually, e.g. summarizer('...', max_length=40)
Your max_length is set to 120, but you input_length is only 53. You might consider de

In [169]:
res

[{'summary_text': ' The GameStop squeeze on short-sellers is an extraordinary event in markets, where at face value, retail traders and investors have worked together in an attempt to put some of the largest wall street institutions out of business . The broking firm of most of the retail side of this warfare ‘RobinHood’ is stealing from its poor, retail investors to give to its rich, capital backers .'},
 {'summary_text': ' Victor Niederhoffer blew his hedge fund up in 1997 in a highly statistically improbable event, in which he sold puts that were targeted by market mechanics, rather than ‘truth’. The market here is more leveraged, more volatile, more aggressive, better for types of trading and worse for investing. The fundamentals no longer matter; and this was demonstrated only 5 years later with what could be the largest bubble of viable assets in history .'},
 {'summary_text': ' \'It was a one in 2,000 shot that the market would decline like that” ‘ And it baffled many sophistica

In [170]:
len(res)

115