In [1]:
# Importing the transformers and web-scraping libraries
from transformers import pipeline
from bs4 import BeautifulSoup
import requests

In [2]:
# Getting the text from the URL
URL = 'https://futurism.com/amazon-kindle-lock-screens-ai-generated-books'
r = requests.get(URL, timeout=30)
r.raise_for_status()  # fail early on an HTTP error instead of parsing an error page
r.text



In [3]:
# Scraping the page to keep only the <h1> heading and <p> paragraph tags
soup = BeautifulSoup(r.text, 'html.parser')
results = soup.find_all(['h1', 'p'])

# Join the text of every matched tag into one string
sentences = ' '.join(result.text for result in results)
sentences

'Amazon Kindle Lock Screens Are Showing Ads for AI-Generated Books Amazon has been a huge staging ground for the proliferation of AI-generated spam. In fact, as we noticed earlier this year, its marketplace has already started to fill up with shoddy AI-generated listings — at the same time, of course, that Amazon itself is working on tech to generate more of the same. Now the consequences of this proliferation seem to be spilling over into the world of Amazon\'s millions of readers. Many of its Kindles, by far the most popular e-readers in the world, are displaying ads for blatantly AI-generated books. And they\'re showing up not as a little box but in one of the most conspicuous advertising spaces in the publishing industry: the Kindle\'s lock screen. If you were unaware that these reading devices could also be ad vehicles, here\'s a quick background. In the US, Amazon sells Kindles, including the popular Kindle Paperwhite, at a $20 discount off its retail price of $189.99 if you buy 
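
Collecting every `h1` and `p` tag on the page can also pick up paragraphs from navigation, footers, or related-article teasers. The sketch below is one possible refinement, not what this notebook does: it assumes the article body sits inside an `<article>` element (an assumption about the page layout) and falls back to the whole document otherwise. The name `extract_article_text` is hypothetical.

```python
from bs4 import BeautifulSoup

def extract_article_text(html):
    """Return the text of <h1> and <p> tags, preferring the <article> element."""
    soup = BeautifulSoup(html, 'html.parser')
    # Assumption: the main content lives in <article>; otherwise use the whole page.
    root = soup.find('article') or soup
    tags = root.find_all(['h1', 'p'])
    # get_text(' ', strip=True) trims surrounding whitespace in each tag.
    pieces = [tag.get_text(' ', strip=True) for tag in tags]
    return ' '.join(piece for piece in pieces if piece)
```

With the response from the cell above, this would be called as `extract_article_text(r.text)`.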

In [4]:
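# Pre-processing the text: mark each sentence end with <EOS>, then split on the marker.
# Note: this simple rule also splits on periods inside numbers (e.g. "$189.99").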
sentences = sentences.replace('.','.<EOS>')
sentences = sentences.replace('?','?<EOS>')
sentences = sentences.replace('!','!<EOS>')
sentences = sentences.split('<EOS>')
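
Because the cell above treats every period as a sentence end, a price such as "$189.99" is cut in two. A minimal alternative sketch, assuming it is acceptable to split only where the punctuation is followed by whitespace, is shown here. The helper name `split_sentences` is hypothetical, and unlike the cell above it returns sentences without leading spaces.

```python
import re

def split_sentences(text):
    """Split after '.', '?' or '!' only when whitespace follows the mark."""
    # The lookbehind keeps the punctuation attached to its sentence;
    # "189.99" is left intact because no whitespace follows the period.
    parts = re.split(r'(?<=[.?!])\s+', text.strip())
    return [part for part in parts if part]
```

This is still a heuristic: abbreviations such as "U.S. market" would be split, so a dedicated sentence tokenizer may be preferable for messier text.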

In [5]:
# Text corpus
sentences

['Amazon Kindle Lock Screens Are Showing Ads for AI-Generated Books Amazon has been a huge staging ground for the proliferation of AI-generated spam.',
 ' In fact, as we noticed earlier this year, its marketplace has already started to fill up with shoddy AI-generated listings — at the same time, of course, that Amazon itself is working on tech to generate more of the same.',
 " Now the consequences of this proliferation seem to be spilling over into the world of Amazon's millions of readers.",
 ' Many of its Kindles, by far the most popular e-readers in the world, are displaying ads for blatantly AI-generated books.',
 " And they're showing up not as a little box but in one of the most conspicuous advertising spaces in the publishing industry: the Kindle's lock screen.",
 " If you were unaware that these reading devices could also be ad vehicles, here's a quick background.",
 ' In the US, Amazon sells Kindles, including the popular Kindle Paperwhite, at a $20 discount off its retail p

In [6]:
# Splitting the text corpus into chunks of at most 150 words
max_words = 150
chunks = []
for sentence in sentences:
    words = sentence.split(' ')
    # Add the sentence to the current chunk if it still fits, otherwise start a new chunk
    if chunks and len(chunks[-1]) + len(words) <= max_words:
        chunks[-1].extend(words)
    else:
        chunks.append(words)

# Turn each list of words back into a single string
chunks = [' '.join(chunk) for chunk in chunks]
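
The chunking step can also be wrapped in a small reusable function. The sketch below is a variation rather than the notebook's exact code: it uses `split()` instead of `split(' ')`, so the empty strings produced by leading spaces are not counted as words and the doubled spaces between sentences disappear. The name `chunk_sentences` and its `max_words` parameter are hypothetical.

```python
def chunk_sentences(sentences, max_words=150):
    """Greedily group sentences into chunks of at most max_words words."""
    chunks = []
    for sentence in sentences:
        words = sentence.split()  # split() ignores leading and repeated spaces
        if not words:
            continue  # skip empty fragments left over from the sentence split
        if chunks and len(chunks[-1]) + len(words) <= max_words:
            chunks[-1].extend(words)
        else:
            # A sentence longer than max_words still gets a chunk of its own.
            chunks.append(words)
    return [' '.join(chunk) for chunk in chunks]
```

Note that the limit counts whitespace-separated words, not model tokens, so it only approximates how much text the summarization model receives.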


In [7]:
# 8 chunks of text
for i, chunk in enumerate(chunks):
    print(i, chunk)

0 Amazon Kindle Lock Screens Are Showing Ads for AI-Generated Books Amazon has been a huge staging ground for the proliferation of AI-generated spam.  In fact, as we noticed earlier this year, its marketplace has already started to fill up with shoddy AI-generated listings — at the same time, of course, that Amazon itself is working on tech to generate more of the same.  Now the consequences of this proliferation seem to be spilling over into the world of Amazon's millions of readers.  Many of its Kindles, by far the most popular e-readers in the world, are displaying ads for blatantly AI-generated books.  And they're showing up not as a little box but in one of the most conspicuous advertising spaces in the publishing industry: the Kindle's lock screen.  If you were unaware that these reading devices could also be ad vehicles, here's a quick background.
1  In the US, Amazon sells Kindles, including the popular Kindle Paperwhite, at a $20 discount off its retail price of $189. 99 if yo

In [8]:
# Using the transformers summarization pipeline to summarise each chunk of text
summarizer = pipeline("summarization")
result = summarizer(chunks, max_length=70, min_length=30, do_sample=False)

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.
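
The warning above recommends naming the model explicitly. A minimal sketch of doing so follows, using the model name and revision reported in the log. The helper `build_summarizer` is hypothetical, and the import is placed inside it so that defining the helper does not load the library or download the weights.

```python
MODEL_NAME = "sshleifer/distilbart-cnn-12-6"  # taken from the warning above
MODEL_REVISION = "a4f8f3e"                    # revision reported in the same warning

def build_summarizer():
    """Create the summarization pipeline with a pinned model and revision."""
    from transformers import pipeline  # imported lazily; downloads the model on first use
    return pipeline("summarization", model=MODEL_NAME, revision=MODEL_REVISION)
```

Calling `summarizer = build_summarizer()` would then take the place of `pipeline("summarization")`, and later runs keep using the same model even if the library's default changes.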

In [9]:
# Displaying the summarised text
result

[{'summary_text': ' Amazon Kindle Lock Screens are showing ads for AI-generated books on lock screens . Amazon has been a huge staging ground for the proliferation of spam . The company is working on tech to generate more of the same .'},
 {'summary_text': ' In the US, Amazon sells Kindles at a $20 discount off its retail price of $189.99 if you buy an "ad-supported" edition . Ad-supported Kindles will essentially show ads as a big screensaver when the device is locked . Users can pay to remove them at any point for a one-time'},
 {'summary_text': ' "I\'ve owned a Kindle for 10 years or so now," wrote one Reddit user in a post with over 700 upvotes . "I don\'t know why or how this is happening, but it\'s driving me insane," wrote another .'},
 {'summary_text': ' Some of the AI books advertised by the Kindles appear to be flagrant ripoffs of existing works . "Scary Stories to Tell in the Dark: The Haunted House" is an obvious imitation of the classic children\'s horror short story colle
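
The result is a list with one dictionary per chunk, each holding its text under the `summary_text` key. A short sketch of combining them into a single summary is given here. The helper name `join_summaries` is hypothetical, and the punctuation clean-up is an optional extra that removes the space the model leaves before full stops.

```python
import re

def join_summaries(result):
    """Concatenate the per-chunk summaries into one string."""
    pieces = [item['summary_text'].strip() for item in result]
    text = ' '.join(pieces)
    # Remove the whitespace before punctuation, e.g. "books ." becomes "books."
    return re.sub(r'\s+([.,!?])', r'\1', text)
```

With the output above, `print(join_summaries(result))` would show the whole article summary as one paragraph.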