# Text summarization of article/document using different algorithms in Python.


Text summarization is the process of condensing a large document into a shorter one without losing its key context, which saves readers time. It can be done with several techniques, including the following:
    • TextRank: A graph-based ranking algorithm
    • Feature-based text summarization
    • LexRank: TF-IDF with a graph-based algorithm
    • Topic based
    • Using sentence embeddings
    • Encoder-Decoder Model: Deep learning techniques

# TextRank


"""
TextRank is a graph-based ranking algorithm for NLP. It is inspired by PageRank, the algorithm used in the Google search engine, but is designed specifically for text. It treats units of text (sentences for summarization, words for keyword extraction) as nodes, captures the relations between those nodes as edges, and ranks the nodes to decide what goes into the summary.

"""
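
The idea above can be sketched in a few lines of plain Python. This is a minimal illustration, not gensim's implementation: the function names (`split_sentences`, `similarity`, `textrank_summary`), the naive regex sentence splitter, and the default `damping` and `iterations` values are all assumptions made for this example. The similarity measure is word overlap normalised by log sentence length, computed here over unique words as a simplification.

```python
import math
import re


def split_sentences(text):
    """Naive splitter (an assumption): break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(a, b):
    """Shared-word count normalised by the log of each sentence's unique-word count."""
    words_a = set(re.findall(r"\w+", a.lower()))
    words_b = set(re.findall(r"\w+", b.lower()))
    if len(words_a) < 2 or len(words_b) < 2:
        return 0.0  # avoids a zero denominator for one-word sentences
    return len(words_a & words_b) / (math.log(len(words_a)) + math.log(len(words_b)))


def textrank_summary(text, top_n=2, damping=0.85, iterations=50):
    """Return the top_n highest-ranked sentences, in their original order."""
    sentences = split_sentences(text)
    n = len(sentences)
    if n <= top_n:
        return sentences

    # Nodes are sentences; edge weights are pairwise similarities.
    weights = [
        [similarity(sentences[i], sentences[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]

    # PageRank-style power iteration over the weighted graph.
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping)
            + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]

    best = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_n]
    return [sentences[i] for i in sorted(best)]
```

Calling `textrank_summary(some_text, top_n=3)` returns a list of three sentences. A sentence that shares no words with the rest of the text receives only the baseline score `1 - damping`, so it is ranked last.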

In [1]:
# Import BeautifulSoup and urllib libraries to fetch data from Wikipedia.
from bs4 import BeautifulSoup
from urllib.request import urlopen

In [4]:
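# Function to fetch a Wikipedia page and return its title and paragraph text
def get_only_text(url):
    page = urlopen(url)
    # Name the parser explicitly to avoid BeautifulSoup's "no parser specified" warning
    soup = BeautifulSoup(page, "html.parser")
    # Join the text of every <p> element into one string
    text = ' '.join(p.text for p in soup.find_all('p'))
    return soup.title.text, text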


In [5]:
# Specify the Wikipedia URL
url = "https://en.wikipedia.org/wiki/Natural_language_processing"
# Call the function created above; it returns a (title, text) tuple
text = get_only_text(url)

In [8]:
# Count the number of characters (text is a (title, text) tuple, so join both parts first)
len(''.join(text))

11102

In [2]:
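# Let's see the first 1000 characters of the article body (text[1]; text[0] is the title).
# This cell must run after `text` is defined; the NameError below was recorded
# when it was run before the cells above (note the In [2] execution count).
text[1][:1000]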

NameError: name 'text' is not defined

In [15]:
# Replace the Wikipedia text with text pasted in by the user (here, a job description)
text = input(" enter your resume : ")

 enter your resume : Triplebyte screens and evaluates thousands of engineers per month to find the best candidates for our partner companies. Human decision making doesn't work at our scale; our marketplace is powered by automated assessment and decision making. Triplebyte has three cornerstone ML products: our quiz, our interview, and our matchmaking. As a machine learning engineer, you'll be responsible for the end-to-end process of designing and running experiments to serving production models at scale. Some of our pipelines use off the shelf components, but we're also implementing custom models and techniques from the latest research papers. We're also building forecasting tools for internal teams to measure and predict outcomes. This is an ideal role for an engineer or data scientist who wants the scope and responsibility to own features/products from the inception and research phase through to measuring real-world results. Fields your work will touch on  Psychometrics Recommender

In [16]:
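# Import summarize and keywords from gensim.
# Note: the gensim.summarization module was removed in gensim 4.0, so this requires gensim 3.x.
from gensim.summarization.summarizer import summarize
from gensim.summarization import keywords
# Make sure the text is a string
text = str(text)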

In [17]:
# Summarize the text with ratio 0.2 (keep roughly 20% of the sentences)
summarize(text, ratio=0.2)

"Triplebyte screens and evaluates thousands of engineers per month to find the best candidates for our partner companies.\nAs a machine learning engineer, you'll be responsible for the end-to-end process of designing and running experiments to serving production models at scale.\nSome of our pipelines use off the shelf components, but we're also implementing custom models and techniques from the latest research papers.\nOur ultimate goal is to collect the largest dataset and use this to build the world's best technical hiring process."

In [18]:
# Extract keywords from the text (ratio=0.1)
print(keywords(text, ratio=0.1))

companies
company
research
engineers
engineer
engineering
hiring
hire
outcomes
evaluates
evaluating
evaluation
production models
products
analysis
novel
forecasting
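
Keyword extraction can follow the same graph idea with words as nodes. The sketch below is a rough illustration under stated assumptions, not what gensim's `keywords` does internally: the tiny `STOPWORDS` set, the co-occurrence `window` of 2, and the names `keyword_scores` and `top_keywords` are all made up for this example, and unlike gensim it does no lemmatization or part-of-speech filtering.

```python
import re
from collections import defaultdict

# A deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {
    "the", "a", "an", "of", "and", "to", "in", "is", "for", "our", "we",
    "by", "are", "on", "with", "that", "this", "it", "as", "at", "be",
}


def keyword_scores(text, window=2, damping=0.85, iterations=50):
    """Score words with PageRank over an undirected co-occurrence graph."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

    # Link each word to the words that follow it within the window.
    neighbours = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1 : i + 1 + window]:
            if other != word:
                neighbours[word].add(other)
                neighbours[other].add(word)

    # Every word in the graph has at least one neighbour, so the division is safe.
    scores = {w: 1.0 for w in neighbours}
    for _ in range(iterations):
        scores = {
            w: (1 - damping)
            + damping * sum(scores[v] / len(neighbours[v]) for v in neighbours[w])
            for w in neighbours
        }
    return scores


def top_keywords(text, top_n=5):
    """Return the top_n highest-scoring words."""
    scores = keyword_scores(text)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

Words that co-occur with many different neighbours score highest. This sketch also does not merge adjacent keywords into multi-word phrases such as "production models".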
