Python implementation of the Rapid Automatic Keyword Extraction algorithm using NLTK.

rake-nltk

RAKE, short for Rapid Automatic Keyword Extraction, is a domain-independent keyword extraction algorithm that tries to determine key phrases in a body of text by analyzing how frequently each word appears and how often it co-occurs with other words in the text.
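
The idea can be sketched in a few lines of plain Python. This is a minimal illustration, not the rake-nltk implementation: the tiny stopword list and the function names `candidate_phrases` and `score_phrases` are hypothetical, and the word score used here is the degree-to-frequency ratio described in the RAKE paper.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption; rake-nltk uses NLTK's list).
STOPWORDS = {"a", "an", "and", "the", "of", "is", "in", "for", "to", "by", "with"}


def candidate_phrases(text, stopwords=STOPWORDS):
    """Split text into runs of consecutive non-stopword words."""
    phrases = []
    # Punctuation always ends a phrase, so split on it first.
    for fragment in re.split(r"[^\w\s]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in stopwords:
                # A stopword closes the phrase collected so far.
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def score_phrases(phrases):
    """Rank phrases by the sum of their words' degree/frequency scores."""
    frequency = defaultdict(int)  # how often each word appears
    degree = defaultdict(int)     # co-occurrences, counting the word itself
    for phrase in phrases:
        for word in phrase:
            frequency[word] += 1
            degree[word] += len(phrase)
    word_score = {w: degree[w] / frequency[w] for w in frequency}
    scored = {p: sum(word_score[w] for w in p) for p in set(phrases)}
    # Highest-scoring phrases first.
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)
```

Words that tend to appear inside longer candidate phrases get a higher degree relative to their frequency, so multi-word phrases built from such words rise to the top of the ranking.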

Setup

Using pip

pip install rake-nltk

Directly from the repository

git clone https://github.com/csurfer/rake-nltk.git
python rake-nltk/setup.py install

Quick start

from rake_nltk import Rake

# Uses stopwords for English from NLTK, and all punctuation characters by
# default
r = Rake()

# Extraction given the text.
text = "<text to process>"
r.extract_keywords_from_text(text)

# Extraction given the list of strings where each string is a sentence.
sentences = ["<first sentence>", "<second sentence>"]
r.extract_keywords_from_sentences(sentences)

# To get keyword phrases ranked highest to lowest.
r.get_ranked_phrases()

# To get keyword phrases ranked highest to lowest with scores.
r.get_ranked_phrases_with_scores()

Debugging Setup

If you see a stopwords error, it means that you have not downloaded the stopwords corpus from NLTK. You can download it using the command below.

python -c "import nltk; nltk.download('stopwords')"

References

This is a Python implementation of the algorithm described in the paper Automatic keyword extraction from individual documents by Stuart Rose, Dave Engel, Nick Cramer and Wendy Cowley.

Why did I choose to implement it myself?

  • It is extremely fun to implement algorithms by reading papers. It is the digital equivalent of DIY kits.
  • There are some rather popular implementations out there, in Python (aneesha/RAKE) and Node (waseem18/node-rake), but neither seemed to use the power of NLTK. By making NLTK an integral part of the implementation, I get the flexibility and power to extend it in other creative ways later, if I see fit, without having to implement everything myself.
  • I plan to use it in my other pet projects and wanted it to be modular and tunable; this way I have complete control.

Contributing

Bug Reports and Feature Requests

Please use the issue tracker for reporting bugs or feature requests.

Development

Pull requests are most welcome.

Buy the developer a cup of coffee!

If you found the utility helpful, you can buy me a cup of coffee.
