Abstractor is a domain independent automatic text summarization web-app.

arjundatt/Abstractor

Abstractor

Abstractor is a domain-independent automatic text summarization web app. It generates a summary of arbitrary length by extracting semantically important information from the document: it analyzes term frequencies, tags certain parts of speech (such as proper nouns) and signal words, and retrieves the font semantics of the text.
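To make the term-frequency and signal-word ideas above concrete, here is a minimal sketch of an extractive scorer. It is not Abstractor's actual implementation: the function names, the `STOP_WORDS` and `SIGNAL_WORDS` lists, and the `signal_bonus` weight are all assumptions chosen for illustration, and part-of-speech tagging and font semantics are left out entirely.

```python
import re
from collections import Counter

# Hypothetical word lists, for illustration only.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and",
              "it", "on", "for", "this", "that"}
SIGNAL_WORDS = {"important", "significant", "conclusion", "therefore", "summary"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic tokens only."""
    return re.findall(r"[a-z']+", sentence.lower())


def summarize(text, num_sentences=2, signal_bonus=1.0):
    """Return the top-scoring sentences, kept in document order."""
    sentences = split_sentences(text)
    # Document-wide frequency of every non-stop word.
    freq = Counter(w for s in sentences for w in tokenize(s)
                   if w not in STOP_WORDS)

    def score(sentence):
        words = [w for w in tokenize(sentence) if w not in STOP_WORDS]
        if not words:
            return 0.0
        # Average term frequency, so long sentences are not favored.
        base = sum(freq[w] for w in words) / len(words)
        # Assumed flat bonus for each signal word in the sentence.
        bonus = signal_bonus * sum(1 for w in words if w in SIGNAL_WORDS)
        return base + bonus

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])  # restore original order
    return [sentences[i] for i in chosen]
```

Calling `summarize(document_text, num_sentences=3)` returns up to three sentences. The `num_sentences` parameter stands in for the "arbitrary length" of the summary described above.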

Although automatic document summarization has been researched extensively for a long time, generating a 100% accurate summary or abstract of an arbitrary piece of text remains a great challenge. Some of the reasons for this are:

  • Identifying and segmenting a document into sub-parts (for example, extracting every sentence from a document distinctly) is a highly ambiguous task.
  • Extraction of structural and style information (like headings, ends of paragraphs, bullets, etc.) from a document can't be done with 100% accuracy.
  • Removal of external noise (like advertisements) from documents (such as text copied from web pages) is a daunting task.
The methodology behind this tool is an attempt to counter all of the aforementioned factors (and many more) as effectively as it can.
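The sentence-segmentation ambiguity mentioned in the list above can be seen in a small sketch. This is an illustrative comparison, not the approach Abstractor takes: `naive_split`, `abbreviation_aware_split`, and the `ABBREVIATIONS` list are hypothetical names introduced here.

```python
import re

# Hypothetical abbreviation list; a realistic one would be far longer.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "ms.", "prof.", "e.g.", "i.e.", "vs."}


def naive_split(text):
    """Split after every '.', '!' or '?' that is followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def abbreviation_aware_split(text):
    """Split on sentence-ending punctuation unless the token is a known abbreviation."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        ends_sentence = token[-1] in ".!?"
        if ends_sentence and token.lower() not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing text without closing punctuation
        sentences.append(" ".join(current))
    return sentences


text = "Dr. Smith met Prof. Lee. They talked."
print(naive_split(text))               # four fragments
print(abbreviation_aware_split(text))  # two sentences
```

Even the second version still fails when an abbreviation actually ends a sentence, which is one reason the task cannot be solved with 100% accuracy by simple rules.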



Tip for using the tool: when copying the content to be summarized into the document-input window, try to remove irrelevant or unwanted content (like advertisements) from the input. Also, take care to preserve the structure and style of the original document. Such adjustments can improve the results drastically.
