NLTK: Natural Language Toolkit

gioiastevens edited this page Apr 29, 2014 · 4 revisions

How to find NLTK in DH Box

You can access the bash shell at Once there, enter your DH Box username and password. NLTK is a Python library, so you can use it from IPython or from a Python script.
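As a quick check from IPython or a script, the sketch below tests whether NLTK can be imported in the current Python environment. The helper name `nltk_is_available` is hypothetical and only used for illustration; the check itself relies on the standard library.

```python
import importlib.util


def nltk_is_available():
    """Return True if the nltk package can be found by this Python interpreter."""
    return importlib.util.find_spec("nltk") is not None


if nltk_is_available():
    import nltk

    # Report which NLTK release this environment provides.
    print("NLTK version:", nltk.__version__)
else:
    print("NLTK was not found in this Python environment.")
```

The same lines can be pasted into an IPython session or saved as a `.py` file and run from the bash shell.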


NLTK is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning.
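To give a sense of the text processing libraries mentioned above, here is a minimal sketch of tokenization followed by stemming. It assumes NLTK is installed and uses `wordpunct_tokenize` and `PorterStemmer`, which need no extra data downloads; the function name `tokenize_and_stem` is a hypothetical helper, not part of NLTK.

```python
def tokenize_and_stem(text):
    """Split text into tokens, then reduce each token to its Porter stem."""
    # Imported inside the function so the module loads even where NLTK is absent.
    from nltk.stem import PorterStemmer
    from nltk.tokenize import wordpunct_tokenize

    stemmer = PorterStemmer()
    # wordpunct_tokenize is regex-based and separates words from punctuation.
    tokens = wordpunct_tokenize(text)
    return [stemmer.stem(token) for token in tokens]
```

Calling `tokenize_and_stem("cats running")` returns one stem per token. Other tokenizers and stemmers in NLTK follow a similar pattern, though some require downloading additional data first.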

List of sample texts and corpora to download

NLTK site

NLTK: Natural Language Toolkit

NLTK Documentation

Installing NLTK data
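Many corpora and models ship separately from the library and must be downloaded before use. The sketch below shows one common pattern: look for a resource locally and download it only if it is missing. The helper name `ensure_nltk_data` and the `wordnet` example are illustrative assumptions; `nltk.data.find` and `nltk.download` are the NLTK calls doing the work, and the download step needs network access.

```python
def ensure_nltk_data(resource_path, package):
    """Download an NLTK data package only if it is not already installed.

    resource_path: location inside the NLTK data directory, e.g. "corpora/wordnet"
    package: identifier passed to the downloader, e.g. "wordnet"
    """
    import nltk

    try:
        # Raises LookupError when the resource is not in any NLTK data directory.
        nltk.data.find(resource_path)
    except LookupError:
        nltk.download(package)
```

For example, `ensure_nltk_data("corpora/wordnet", "wordnet")` would fetch WordNet on first use and do nothing on later runs.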


Natural Language Processing with Python: This online book provides a practical introduction to programming for language processing. Written by the creators of NLTK, it guides the reader through the fundamentals of writing Python programs, working with corpora, categorizing text, analyzing linguistic structure, and more. A new version with updates for Python 3 and NLTK 3 is in preparation.

NLTK-Trainer: A set of Python command-line scripts for natural language processing. With these scripts, you can do the following without writing a single line of code: train NLTK-based models, evaluate pickled models against a corpus, and analyze a corpus.


Python NLTK Demos for Natural Language Text Processing

NLTK Project Ideas

Centre for Language Technology, Gothenburg, Suggested NLTK Projects