# Gensim Text Summarizer

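In this notebook we use gensim's extractive summarizer, `gensim.summarization.summarize`, to summarize the "Why Click?" page of the Click documentation. The `gensim.summarization` module was removed in gensim 4.0, so this notebook requires gensim 3.x.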

In [1]:
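import warnings

# Silence library warnings (e.g. deprecation notices) before the imports below
warnings.filterwarnings('ignore')

import nltk
import requests
from nltk.tokenize import sent_tokenize
from gensim.summarization import summarize  # removed in gensim 4.0; needs gensim 3.x

# sent_tokenize relies on NLTK's Punkt sentence tokenizer models
nltk.download('punkt', quiet=True)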
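Because `gensim.summarization` only exists in gensim 3.x and earlier, it can help to check the installed version before running the rest of the notebook. The helper `has_summarization_module` below is hypothetical (it is not part of gensim); it is a minimal sketch that only parses a version string such as `gensim.__version__`.

```python
def has_summarization_module(version: str) -> bool:
    """Return True if a gensim version string predates 4.0.

    Hypothetical helper: gensim.summarization was removed in gensim 4.0,
    so only 3.x and earlier releases provide summarize().
    """
    major = int(version.split(".")[0])
    return major < 4


# Usage with a real installation (assumes gensim is importable):
#   import gensim
#   has_summarization_module(gensim.__version__)
print(has_summarization_module("3.8.3"))
print(has_summarization_module("4.3.2"))
```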


# Load the "Why Click?" Page

In [3]:
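from bs4 import BeautifulSoup

# Fetch the "Why Click?" page from the Click 7.x documentation
r = requests.get('https://click.palletsprojects.com/en/7.x/why/')
r.raise_for_status()
r.encoding = 'utf-8'
# Name the parser explicitly; omitting it makes BeautifulSoup warn and guess one
soup = BeautifulSoup(r.text, 'html.parser')
# Extract the section's text and flatten newlines into spaces
why_click = soup.find(id="why-click").text.replace('\n', ' ')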
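The extracted text still contains the Sphinx permalink symbol `¶` and runs of double spaces left behind by the newline replacement (both are visible in the output below). A minimal sketch of an optional cleanup step follows; `clean_text` is a hypothetical helper, not part of the original notebook. Applying it (`why_click = clean_text(why_click)`) changes the summarizer's input, so the sentence splits and summaries could differ from the outputs shown here.

```python
import re


def clean_text(text: str) -> str:
    """Drop Sphinx '¶' permalink markers and collapse runs of whitespace."""
    text = text.replace("\u00b6", " ")  # the '¶' headerlink symbol
    return re.sub(r"\s+", " ", text).strip()


sample = " Why Click?¶ There are so many libraries out there;  why does Click exist? "
print(clean_text(sample))
```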

In [4]:
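import pandas as pd

# Show long sentences in full (up to 500 characters per cell)
pd.set_option('display.max_colwidth', 500)

def show_sentences(text):
    """Display text as a DataFrame with one sentence per row.

    A string is split with NLTK's sent_tokenize; a list is used as-is.
    """
    sentences = sent_tokenize(text) if isinstance(text, str) else text
    return pd.DataFrame({'Sentence': sentences})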

In [5]:
why_click

' Why Click?¶ There are so many libraries out there for writing command line utilities; why does Click exist? This question is easy to answer: because there is not a single command line utility for Python out there which ticks the following boxes:  is lazily composable without restrictions supports implementation of Unix/POSIX command line conventions supports loading values from environment variables out of the box supports for prompting of custom values is fully nestable and composable works the same in Python 2 and 3 supports file handling out of the box comes with useful common helpers (getting terminal dimensions, ANSI colors, fetching direct keyboard input, screen clearing, finding config paths, launching apps and editors, etc.)  There are many alternatives to Click and you can have a look at them if you enjoy them better.  The obvious ones are optparse and argparse from the standard library. Click actually implements its own parsing of arguments and does not use optparse or argp

In [13]:
show_sentences(why_click)

| | Sentence |
|---|---|
| 0 | Why Click?¶ There are so many libraries out there for writing command line utilities; why does Click exist? |
| 1 | This question is easy to answer: because there is not a single command line utility for Python out there which ticks the following boxes: is lazily composable without restrictions supports implementation of Unix/POSIX command line conventions supports loading values from environment variables out of the box supports for prompting of custom values is fully nestable and composable works the same in Python 2 and 3 supports file handling out of the box comes with useful common helpers (getting ... |
| 2 | There are many alternatives to Click and you can have a look at them if you enjoy them better. |
| 3 | The obvious ones are optparse and argparse from the standard library. |
| 4 | Click actually implements its own parsing of arguments and does not use optparse or argparse following the optparse parsing behavior. |
| 5 | The reason it’s not based on argparse is that argparse does not allow proper nesting of commands by design and has some deficiencies when it comes to POSIX compliant argument handling. |
| 6 | Click is designed to be fun to work with and at the same time not stand in your way. |
| 7 | It’s not overly flexible either. |
| 8 | Currently, for instance, it does not allow you to customize the help pages too much. |
| 9 | This is intentional because Click is designed to allow you to nest command line utilities. |


## Summarize

In [9]:
summary = summarize(why_click)
summary

'This question is easy to answer: because there is not a single command line utility for Python out there which ticks the following boxes:  is lazily composable without restrictions supports implementation of Unix/POSIX command line conventions supports loading values from environment variables out of the box supports for prompting of custom values is fully nestable and composable works the same in Python 2 and 3 supports file handling out of the box comes with useful common helpers (getting terminal dimensions, ANSI colors, fetching direct keyboard input, screen clearing, finding config paths, launching apps and editors, etc.)  There are many alternatives to Click and you can have a look at them if you enjoy them better.\nThe reason it’s not based on argparse is that argparse does not allow proper nesting of commands by design and has some deficiencies when it comes to POSIX compliant argument handling.\nThe reason however Click is not using argparse is that it has some problematic be
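gensim documents `summarize` as extractive: it ranks sentences with a variation of the TextRank algorithm (Barrios et al., who replace TextRank's similarity with a BM25-based one) and returns the top-ranked sentences in their original order, which matches the output above. The sketch below is not gensim's implementation; it is a minimal TextRank under simplifying assumptions: the original TextRank word-overlap similarity instead of BM25, a naive regex tokenizer with no stemming or stop-word removal, and a fixed number of power-iteration steps. The names `tokenize`, `similarity` and `textrank` are hypothetical.

```python
from __future__ import annotations

import math
import re


def tokenize(sentence: str) -> set[str]:
    """Lower-cased word set for one sentence (no stemming or stop words)."""
    return set(re.findall(r"[a-z0-9']+", sentence.lower()))


def similarity(a: set[str], b: set[str]) -> float:
    """Word-overlap similarity from the original TextRank paper."""
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    return len(a & b) / denom if denom > 0 else 0.0


def textrank(sentences: list[str], top_n: int,
             damping: float = 0.85, iterations: int = 50) -> list[str]:
    """Return the top_n highest-ranked sentences in their original order."""
    words = [tokenize(s) for s in sentences]
    n = len(sentences)
    # Symmetric weighted adjacency matrix of pairwise sentence similarities
    weights = [[similarity(words[i], words[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_weight = [sum(row) for row in weights]
    scores = [1.0] * n
    # Power iteration of weighted PageRank over the sentence graph
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_weight[j] * scores[j]
                for j in range(n) if out_weight[j] > 0
            )
            for i in range(n)
        ]
    best = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_n]
    return [sentences[i] for i in sorted(best)]


docs = [
    "Click is a Python package for command line interfaces.",
    "Click supports nesting of command line commands.",
    "The weather is nice today.",
    "Command line interfaces in Python can be built with Click.",
]
print(textrank(docs, top_n=2))
```

The off-topic sentence shares almost no words with the others, so it receives little score from its neighbours and is the first to be dropped.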

In [10]:
summary = summarize(why_click, split=True)
summary

['This question is easy to answer: because there is not a single command line utility for Python out there which ticks the following boxes:  is lazily composable without restrictions supports implementation of Unix/POSIX command line conventions supports loading values from environment variables out of the box supports for prompting of custom values is fully nestable and composable works the same in Python 2 and 3 supports file handling out of the box comes with useful common helpers (getting terminal dimensions, ANSI colors, fetching direct keyboard input, screen clearing, finding config paths, launching apps and editors, etc.)  There are many alternatives to Click and you can have a look at them if you enjoy them better.',
 'The reason it’s not based on argparse is that argparse does not allow proper nesting of commands by design and has some deficiencies when it comes to POSIX compliant argument handling.',
 'The reason however Click is not using argparse is that it has some problem
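With `split=True`, `summarize` returns a list of sentences instead of one string; in the string output above, the same sentences are joined with `\n`. A list can be passed straight to `show_sentences`, which then skips `sent_tokenize`, whereas a string gets re-split by NLTK and may be divided differently from gensim's own sentence boundaries. The hypothetical helper below is a sketch that normalizes either shape to a list, assuming (as holds here, since newlines were replaced with spaces when loading the page) that no sentence contains a newline itself.

```python
def as_sentence_list(summary) -> list:
    """Normalize either return shape of summarize() to a list of sentences.

    Assumes the string form joins sentences with '\n' and that no sentence
    contains a newline of its own.
    """
    if isinstance(summary, str):
        return [s for s in summary.split("\n") if s]
    return list(summary)


joined = "First sentence.\nSecond sentence."
print(as_sentence_list(joined))
print(as_sentence_list(["First sentence.", "Second sentence."]))
```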

In [96]:
summary = summarize(why_click, ratio=0.1)
show_sentences(summary)

| | Sentence |
|---|---|
| 0 | This question is easy to answer: because there is not a single command line utility for Python out there which ticks the following boxes: is lazily composable without restrictions supports implementation of Unix/POSIX command line conventions supports loading values from environment variables out of the box supports for prompting of custom values is fully nestable and composable works the same in Python 2 and 3 supports file handling out of the box comes with useful common helpers (getting ... |
| 1 | There are many alternatives to Click and you can have a look at them if you enjoy them better. |
| 2 | The reason it’s not based on argparse is that argparse does not allow proper nesting of commands by design and has some deficiencies when it comes to POSIX compliant argument handling. |
| 3 | The reason however Click is not using argparse is that it has some problematic behaviors that make handling arbitrary command line interfaces hard: argparse has built-in magic behavior to guess if something is an argument or an option. |
| 4 | Click aims to support fully composable command line user interfaces by doing the following: Click does not just parse, it also dispatches to the appropriate code. |
| 5 | Click has strong information available for all parameters and commands so that it can generate unified help pages for the full CLI and to assist the user in converting the input data as necessary. |
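The `ratio` parameter (default 0.2) sets the proportion of the original text's sentences to keep, so `ratio=0.1` asks for roughly a tenth of them. The sketch below shows that selection step under stated assumptions: it takes per-sentence scores from some ranking step as given, and it truncates the count with `int()`, which is an assumption about rounding rather than confirmed gensim behavior. `select_by_ratio` is a hypothetical name.

```python
def select_by_ratio(sentences, scores, ratio=0.2):
    """Keep the top `ratio` fraction of sentences by score, in original order.

    Assumption: the sentence count is truncated with int(), so a very small
    ratio can select nothing.
    """
    if not 0 <= ratio <= 1:
        raise ValueError("ratio must be between 0 and 1")
    k = int(len(sentences) * ratio)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


sentences = list("abcdefghij")
scores = [0.1, 0.9, 0.3, 0.8, 0.2, 0.5, 0.4, 0.7, 0.6, 0.05]
print(select_by_ratio(sentences, scores, ratio=0.5))
```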


In [12]:
summary = summarize(why_click, word_count=200)
summary

'This question is easy to answer: because there is not a single command line utility for Python out there which ticks the following boxes:  is lazily composable without restrictions supports implementation of Unix/POSIX command line conventions supports loading values from environment variables out of the box supports for prompting of custom values is fully nestable and composable works the same in Python 2 and 3 supports file handling out of the box comes with useful common helpers (getting terminal dimensions, ANSI colors, fetching direct keyboard input, screen clearing, finding config paths, launching apps and editors, etc.)  There are many alternatives to Click and you can have a look at them if you enjoy them better.\nThe reason it’s not based on argparse is that argparse does not allow proper nesting of commands by design and has some deficiencies when it comes to POSIX compliant argument handling.\nThe reason however Click is not using argparse is that it has some problematic be
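`word_count` caps the summary by length instead of by sentence count; according to gensim's documentation, when both `word_count` and `ratio` are given, `ratio` is ignored. The sketch below illustrates one plausible greedy selection rule: walk the sentences from highest to lowest score and add each one only while it brings the running word total closer to the target. The stopping rule, the whitespace word count, and the name `select_by_word_count` are assumptions for illustration, not gensim's confirmed implementation.

```python
def select_by_word_count(sentences, scores, word_count):
    """Greedily take top-scoring sentences until about `word_count` words.

    Assumed rule: add the next-ranked sentence only if it does not move the
    running total further from the target; stop at the first one that would.
    """
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen, total = [], 0
    for i in ranked:
        n_words = len(sentences[i].split())
        if abs(word_count - total - n_words) > abs(word_count - total):
            break
        chosen.append(i)
        total += n_words
    return [sentences[i] for i in sorted(chosen)]


sentences = ["one two three", "four five", "six seven eight nine", "ten"]
scores = [0.9, 0.8, 0.7, 0.1]
print(select_by_word_count(sentences, scores, word_count=5))
```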