Python implementation of the TextRank algorithm (https://web.eecs.umich.edu/~mihalcea/papers/mihalcea.emnlp04.pdf) for automatic keyword extraction and summarization, using Levenshtein distance as the relation between text units.
Latest commit 020c242 (Nov 13, 2016) by @davidadamojr: Merge pull request #2 from suminb/develop (Make it Python 3 compatible)
articles           first commit             Dec 4, 2013
keywords           first commit             Dec 4, 2013
summaries          first commit             Dec 4, 2013
README.md          updated paper link       Nov 3, 2015
requirements.txt   Command line interface   Nov 13, 2016
setup.py           Simple packaging         Nov 13, 2016
textrank.py        Command line interface   Nov 13, 2016

README.md

This is a Python implementation of TextRank for automatic keyword and sentence extraction (summarization), as described in https://web.eecs.umich.edu/~mihalcea/papers/mihalcea.emnlp04.pdf. Unlike the paper, this implementation uses Levenshtein distance as the relation between text units.
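
To make the idea concrete, here is a minimal sketch of how Levenshtein distance could serve as the edge weight in a TextRank-style graph. It is not taken from `textrank.py`: the names `levenshtein`, `build_edges`, and `rank` are hypothetical, the ranking step is a plain weighted PageRank iteration written in pure Python rather than with NetworkX, and using the raw distance as the weight is an assumption, since this README does not say whether the distance is transformed.

```python
from itertools import combinations


def levenshtein(a, b):
    """Return the minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def build_edges(units):
    """Hypothetical helper: connect every pair of distinct text units.

    Assumption: the edge weight is the raw edit distance between the units.
    """
    return {(u, v): levenshtein(u, v) for u, v in combinations(units, 2)}


def rank(units, edges, damping=0.85, iterations=50):
    """Hypothetical helper: weighted PageRank on an undirected graph."""
    weight = {u: {} for u in units}
    for (u, v), w in edges.items():
        weight[u][v] = w
        weight[v][u] = w
    total = {u: sum(weight[u].values()) for u in units}
    scores = {u: 1.0 / len(units) for u in units}
    for _ in range(iterations):
        updated = {}
        for v in units:
            # Each neighbour passes on a share of its score proportional
            # to the weight of the edge it shares with v.
            incoming = sum(scores[u] * w / total[u]
                           for u, w in weight[v].items() if total[u] > 0)
            updated[v] = (1 - damping) / len(units) + damping * incoming
        scores = updated
    return scores


units = ["cat", "cart", "dog"]  # words or sentences, assumed to be distinct
edges = build_edges(units)
scores = rank(units, edges)
```

The same functions work whether the text units are words (for keyword extraction) or whole sentences (for summarization), because the distance is defined on arbitrary strings.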

This implementation carries out automatic keyword and sentence extraction on 10 articles taken from http://theonion.com, with the following settings:

  • 100-word summary
  • The number of keywords extracted scales with the size of the text (a third of the number of nodes in the graph)
  • Adjacent keywords in the text are concatenated into keyphrases
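
The three settings above can be sketched as post-processing steps on a dictionary of node scores. This is an illustration under stated assumptions, not the code in `textrank.py`: the function names are hypothetical, "a third" is taken to mean integer division, and the summary is assumed to be cut at exactly 100 words.

```python
def select_keywords(scores):
    """Keep the top third of the graph's nodes, ranked by score.

    Assumption: "a third" is rounded down with integer division.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:len(ranked) // 3])


def merge_keyphrases(tokens, keywords):
    """Join runs of adjacent keywords in the token sequence into keyphrases."""
    phrases, run = [], []
    for token in tokens:
        if token in keywords:
            run.append(token)
        elif run:
            phrases.append(" ".join(run))
            run = []
    if run:
        phrases.append(" ".join(run))
    # Drop repeated phrases while keeping first-seen order.
    return list(dict.fromkeys(phrases))


def summarize(ranked_sentences, max_words=100):
    """Concatenate sentences (best first) and truncate to max_words words."""
    words = " ".join(ranked_sentences).split()
    return " ".join(words[:max_words])


# Made-up scores for illustration only.
scores = {"neural": 0.30, "network": 0.25, "training": 0.20,
          "data": 0.10, "the": 0.05, "of": 0.05}
tokens = ["the", "neural", "network", "of", "data", "network"]

keywords = select_keywords(scores)
keyphrases = merge_keyphrases(tokens, keywords)
```

With six nodes in the graph, two keywords are kept; because "neural" and "network" appear next to each other in the token sequence, they are merged into a single keyphrase.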

Dependencies

  • NetworkX - http://networkx.github.io/download.html
  • NLTK 3.0 - http://nltk.org/install.html
  • NumPy - http://sourceforge.net/projects/numpy/files/