The notebook below uses the nltk package (Natural Language Toolkit) to summarize online articles. The sample article is the Wikipedia article on reinforcement learning. Change the URL to summarize a different article. This approach to text summarization follows an article by Ekta Shah:
https://www.analyticsvidhya.com/blog/2020/12/tired-of-reading-long-articles-text-summarization-will-make-your-task-easier/

In [10]:
! pip install bs4
! pip install lxml
! pip install --user -U nltk

import bs4 as bs
import urllib.request
import re
import nltk
nltk.download('punkt')
nltk.download('stopwords')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

The code below obtains the article text through web scraping. It uses the BeautifulSoup and lxml libraries to parse the HTML and collects the text of every paragraph (p) element. Swap in another URL to summarize another article.

In [11]:
# Replace URL with another article
scraped_data = urllib.request.urlopen('https://en.wikipedia.org/wiki/Reinforcement_learning')
article = scraped_data.read()
parsed_article = bs.BeautifulSoup(article, 'lxml')
paragraphs = parsed_article.find_all('p')
article_text = ""
for p in paragraphs:
  article_text += p.text

In [12]:
# Record the word count, then remove bracketed reference numbers such as [12] and extra spaces
original_word_count = article_text.count(" ") + 1
article_text = re.sub(r"\[[0-9]*\]", "", article_text)
article_text = re.sub(r"\s+", " ", article_text)

# Replace everything except letters (digits, punctuation) with spaces, then collapse whitespace
formatted_text = re.sub("[^a-zA-Z]", " ", article_text)
formatted_text = re.sub(r"\s+", " ", formatted_text)
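As a rough illustration of the two cleaning passes, the sketch below applies them to a short made-up string instead of the scraped article. The sample text and variable names are assumptions for demonstration only. The citation pattern is written with escaped brackets, \[[0-9]*\], so that it matches only bracketed numbers.

```python
import re

# Hypothetical sample text standing in for the scraped article
sample = "Reinforcement learning (RL) is an area of machine learning.[1]  It differs\nfrom supervised learning.[23]"

# Pass 1: drop bracketed reference numbers such as [1], then collapse whitespace.
# This version keeps punctuation, so it can still be split into sentences.
clean_text = re.sub(r"\[[0-9]*\]", "", sample)
clean_text = re.sub(r"\s+", " ", clean_text)

# Pass 2: keep letters only. This version is used for counting words.
letters_only = re.sub(r"[^a-zA-Z]", " ", clean_text)
letters_only = re.sub(r"\s+", " ", letters_only)

print(clean_text)
print(letters_only)
```

Two versions of the text are kept because sentence splitting needs the punctuation, while word counting is simpler without it.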

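The code below counts how often each word occurs, skipping the stop words (common words such as "the" and "is") that the nltk package provides. Each count is then divided by the highest count, so the most frequent word gets a weight of 1.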

In [13]:
# split the article into sentences
sentence_list = nltk.sent_tokenize(article_text)
# obtain stop words from nltk library
stopwords = nltk.corpus.stopwords.words('english')
word_frequencies = {}

# Create a word count of all words that are not stopwords.
# Matching is case-sensitive: capitalized words are counted separately from their lowercase forms.
for word in nltk.word_tokenize(formatted_text):
  if word not in stopwords:
    word_frequencies[word] = word_frequencies.get(word, 0) + 1

max_frequency = max(word_frequencies.values())

# Calculate the weighted frequencies by dividing the frequency of each word by the max frequency
for word in word_frequencies.keys():
  word_frequencies[word] = (word_frequencies[word]/max_frequency)
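To make the weighting concrete, here is a minimal sketch on a toy sentence. It uses collections.Counter and a plain whitespace split instead of nltk, and the stop word set and text are made up for the example, so treat it as an illustration of the arithmetic and not as the notebook's exact pipeline.

```python
from collections import Counter

# Hypothetical stop words and text; the notebook uses nltk's English stop word list
stop_words = {"the", "is", "a", "of", "to"}
text = "the agent learns a policy the policy maps states to actions the agent acts"

# Count every word that is not a stop word
counts = Counter(word for word in text.split() if word not in stop_words)

# Divide each count by the highest count, so weights fall in the range (0, 1]
max_count = max(counts.values())
weights = {word: count / max_count for word, count in counts.items()}

print(weights)
```

Here "agent" and "policy" each occur twice and get a weight of 1.0, while words that occur once get 0.5.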


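The code below scores each sentence by summing the weighted frequencies of its words. Sentences of 30 or more words are skipped, which keeps very long sentences out of the summary.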

In [14]:
sentence_scores = {}
for sentence in sentence_list:
  # Skip long sentences (30 or more words)
  if len(sentence.split(' ')) >= 30:
    continue
  for word in nltk.word_tokenize(sentence.lower()):
    if word in word_frequencies:
      sentence_scores[sentence] = sentence_scores.get(sentence, 0) + word_frequencies[word]
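The scoring rule can be tried on a few hand-written sentences. The sketch below is a simplified stand-in: the weights, the sentences, and the helper score_sentence are hypothetical, and words are split on whitespace with trailing punctuation stripped instead of being tokenized by nltk.

```python
# Hypothetical word weights, in the shape produced by the frequency step
weights = {"agent": 1.0, "policy": 1.0, "learns": 0.5}

# Hypothetical sentences standing in for the tokenized article
sentences = [
    "The agent learns a policy.",
    "The weather is nice.",
    "A policy maps states to actions.",
]

def score_sentence(sentence, weights, max_words=30):
    """Sum the weights of the words in a sentence; return None if it is too long."""
    words = sentence.lower().split()
    if len(words) >= max_words:
        return None
    # Strip trailing punctuation so that "policy." matches "policy"
    return sum(weights.get(word.strip(".,;:!?"), 0.0) for word in words)

scores = {}
for sentence in sentences:
    score = score_sentence(sentence, weights)
    # Keep only sentences that contain at least one weighted word
    if score is not None and score > 0:
        scores[sentence] = score

print(scores)
```

The first sentence scores 2.5, the third scores 1.0, and the second is left out because none of its words carry a weight. Because scores are sums and not averages, longer sentences tend to score higher, which is one reason for the length cap.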

The code below creates a summary using the top n sentences in the sentence scores dictionary. 

In [15]:
import heapq
import textwrap

# Build the summary from the 7 highest-scoring sentences, ordered from highest to lowest score
summary_sentences = heapq.nlargest(7, sentence_scores, key=sentence_scores.get)
summary = " ".join(summary_sentences)

# Format paragraph output
summary = textwrap.dedent(summary).strip()
print(textwrap.fill(summary, width = 150))
print("")

# Print original word count and summary word count
word_count_summary = summary.count(" ") + 1
print(f"Summary Word Count: {word_count_summary}")
print(f"Original Word Count: {original_word_count}")

Reinforcement learning is one of three basic machine learning paradigms, alongside supervised learning and unsupervised learning. Research topics
include Associative reinforcement learning tasks combine facets of stochastic learning automata tasks and supervised learning pattern classification
tasks. In reinforcement learning methods, expectations are approximated by averaging over samples and using function approximation techniques to cope
with the need to represent value functions over large state-action spaces. The work on learning ATARI games by Google DeepMind increased attention to
deep reinforcement learning or end-to-end reinforcement learning. Policy iteration consists of two steps: policy evaluation and policy improvement.
Two elements make reinforcement learning powerful: the use of samples to optimize performance and the use of function approximation to deal with large
environments. Assuming full knowledge of the MDP, the two basic approaches to compute the optimal action-v