An automated Text Summarization agent to get relevant extractive summary from a corpus against a user query.

colinsongf/query-text-summarization

 
 


Automated Text Summarization

This automated research assistant takes keywords from the user and outputs only the relevant summary for each sub-topic as bullet points.

Automated text summarization tasks can be classified as extractive or abstractive, and each can be further divided into single-document and multi-document summarization. The goal of this Automated Research Assistant (ARA) is to generate multi-document abstractive summaries of sub-topics, that is, to “understand” the original text and “retell” it in fewer words. However, due to a limited timeline, the current implementation sticks to extractive summaries of sub-topics.

Design Overview

[Image: design overview diagram]

Goals

The design of the ARA has the following goals:

 Reading and preprocessing documents from plain text files, which includes tokenization, stop-word removal, case conversion and stemming.

 Document Clustering of input documents to group similar documents into clusters.

 Topic Modelling: because no label or keyword information is available, an unsupervised technique is used for topic modelling.

 Topic Input from the user for topics and sub-topics.

 Relevant Documents retrieval against the input topics and sub-topics. Similarity is measured between the input topic and the topic modelling output to identify the most relevant cluster.

 Summarization using the ‘TextRank’ approach, which models text as a graph network and retrieves high-importance sentences as the summary.
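The cluster-selection step can be illustrated with a minimal sketch. It assumes that topic modelling has already produced a list of terms per cluster and that the user's topic has been split into terms. The Jaccard measure and the names `jaccard`, `most_relevant_cluster` and `cluster_topics` are illustrative assumptions, not taken from the repository, which does not state which similarity measure it uses.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of terms."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def most_relevant_cluster(query_terms, cluster_topics):
    """Return (cluster_id, score) for the cluster whose topic terms best match the query.

    cluster_topics maps a cluster id to the terms produced by topic modelling.
    """
    scores = {cid: jaccard(query_terms, terms) for cid, terms in cluster_topics.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical topic-model output for two clusters (made-up data).
cluster_topics = {
    0: ["bank", "loan", "interest", "credit"],
    1: ["vaccine", "trial", "dose", "immune"],
}
best_id, best_score = most_relevant_cluster(["vaccine", "dose"], cluster_topics)
```

Documents in the selected cluster would then be passed on to the summarization step.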

File Description

The submission contains four files; details and instructions follow:

summarizer.py is the main file, which can be run from the command prompt as

python summarizer.py

without arguments. On execution it first asks for the path where the articles are located; for example, to use the provided articles, enter ‘./articles/’. Next it asks for the topic and sub-topics.
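As a rough sketch of what handling these inputs might look like, the helpers below read plain text files from a folder and split a sub-topic string into entries. The function names, the `.txt` extension filter and the comma-separated sub-topic format are assumptions made for illustration; the actual prompts and parsing in summarizer.py may differ.

```python
from pathlib import Path


def load_articles(folder):
    """Read every .txt file in `folder` into a {filename: text} dict."""
    folder = Path(folder)
    if not folder.is_dir():
        raise NotADirectoryError(f"{folder} is not a directory")
    return {
        path.name: path.read_text(encoding="utf-8", errors="ignore")
        for path in sorted(folder.glob("*.txt"))
    }


def parse_subtopics(raw):
    """Split a comma-separated sub-topic string into clean, non-empty entries."""
    return [part.strip() for part in raw.split(",") if part.strip()]
```

With these helpers, `load_articles("./articles/")` would return the article texts keyed by file name, and `parse_subtopics("dosage, side effects")` would return the two sub-topics as a list.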

preprocess.py contains the pre-processing function.
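The pre-processing steps listed under Goals (tokenization, stop-word removal, case conversion and stemming) can be sketched in plain Python as follows. This is not the code in preprocess.py: the tiny stop-word list and the suffix-stripping `naive_stem` are simplified stand-ins, and a real run would more likely use a full stop-word list and a proper stemmer such as Porter's.

```python
import re

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "on", "the", "to"}


def naive_stem(word):
    """Strip a few common English suffixes (a crude stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase the text, tokenize it, drop stop words, and stem what remains."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [naive_stem(token) for token in tokens if token not in STOP_WORDS]


tokens = preprocess("The cats are running in the garden")
# tokens == ["cat", "runn", "garden"]
```

The odd-looking stem "runn" shows why a crude suffix stripper is only a placeholder: it is consistent, which is enough for matching, but not linguistically accurate.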

extractive.py applies the ‘TextRank’ algorithm and contains the ‘summarize’ function for extractive summarization.
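For readers unfamiliar with TextRank, here is a minimal, dependency-free sketch of the idea: sentences become graph nodes, edges are weighted by word overlap, and a PageRank-style iteration scores each sentence. The signature of `summarize` below, the helper names, the damping factor and the iteration count are assumptions for illustration; the repository's own function may be implemented quite differently.

```python
import math
import re


def split_sentences(text):
    """Split text into sentences at ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(s1, s2):
    """Shared-word count, normalised by the log of each sentence's vocabulary size."""
    w1 = set(re.findall(r"\w+", s1.lower()))
    w2 = set(re.findall(r"\w+", s2.lower()))
    if len(w1) < 2 or len(w2) < 2:
        return 0.0  # avoids log(1) = 0 in the denominator
    return len(w1 & w2) / (math.log(len(w1)) + math.log(len(w2)))


def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences with a weighted PageRank over the similarity graph."""
    n = len(sentences)
    weights = [
        [similarity(a, b) if i != j else 0.0 for j, b in enumerate(sentences)]
        for i, a in enumerate(sentences)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(text, top_n=2):
    """Return the top_n highest-scoring sentences in their original order."""
    sentences = split_sentences(text)
    scores = textrank(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:top_n]
    return [sentences[i] for i in sorted(top)]
```

Sentences that share vocabulary with many others accumulate score, while an isolated sentence stays at the baseline value of `1 - damping`, so it is the first to be dropped from the summary.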

summary.py contains the function for finding sentences relevant to the sub-topics.
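One plausible way to find sentences relevant to a sub-topic is to compare bag-of-words vectors with cosine similarity and keep the sentences above a threshold. The sketch below assumes that approach; the function names, the `threshold` parameter and its default value are hypothetical, and summary.py may use a different measure.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count lowercase alphanumeric tokens in the text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count mappings."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def relevant_sentences(sentences, subtopic, threshold=0.1):
    """Return sentences scoring at least `threshold` against the sub-topic, best first."""
    query = bag_of_words(subtopic)
    scored = [(cosine(bag_of_words(s), query), s) for s in sentences]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [sentence for score, sentence in ranked if score >= threshold]


# Made-up example sentences.
sentences = [
    "Vaccine trials showed strong immune response.",
    "The bank raised interest rates.",
]
matches = relevant_sentences(sentences, "vaccine trials")
```

Here only the first sentence shares terms with the sub-topic, so it is the only one returned; the selected sentences could then be emitted as the bullet points for that sub-topic.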
