GitHub - KT12/tag-lemmatize: nltk utility which more accurately lemmatizes text using pre-trained part-of-speech tagger.

Summary

tag-lemmatize is a small bolt-on utility function to be used in concert with the nltk package. The function accepts un-tokenized strings. The original intent was to write a small function which would ease the use of the VADER sentiment analysis tool.

The function uses nltk.tokenize.word_tokenize to tokenize the string. It then tags each token's part of speech (POS) in context using nltk.pos_tag, which assigns a Penn Treebank POS tag. The function then converts the Penn Treebank tag into the corresponding WordNet POS tag. Finally, it lemmatizes each word using nltk.stem.WordNetLemmatizer.
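The four steps above can be sketched roughly as follows. This is an illustrative sketch, not the repository's actual code: the names `penn_to_wordnet` and `tag_and_lem_sketch` are hypothetical, the fallback to noun for unmapped tags is an assumption (it mirrors the lemmatizer's default), and the NLTK tokenizer, tagger, and WordNet data are assumed to be downloaded already.

```python
# WordNet POS codes as plain strings; these match the values of
# nltk.corpus.wordnet.ADJ, VERB, NOUN, and ADV.
WORDNET_ADJ, WORDNET_VERB, WORDNET_NOUN, WORDNET_ADV = "a", "v", "n", "r"


def penn_to_wordnet(penn_tag):
    """Map a Penn Treebank tag to a WordNet POS code (hypothetical helper)."""
    if penn_tag.startswith("JJ"):   # JJ, JJR, JJS: adjectives
        return WORDNET_ADJ
    if penn_tag.startswith("VB"):   # VB, VBD, VBG, VBN, VBP, VBZ: verbs
        return WORDNET_VERB
    if penn_tag.startswith("RB"):   # RB, RBR, RBS: adverbs
        return WORDNET_ADV
    # Nouns and every other tag fall back to noun (an assumption of this
    # sketch; noun is also WordNetLemmatizer's default POS).
    return WORDNET_NOUN


def tag_and_lem_sketch(text):
    """Tokenize, POS-tag, and lemmatize an un-tokenized string."""
    # Imported here so the tag mapping above works without nltk installed.
    import nltk
    from nltk.stem import WordNetLemmatizer

    lemmatizer = WordNetLemmatizer()
    tokens = nltk.word_tokenize(text)   # step 1: tokenize
    tagged = nltk.pos_tag(tokens)       # step 2: Penn Treebank tags
    # Steps 3 and 4: convert each tag, then lemmatize with that POS.
    return [
        lemmatizer.lemmatize(word, pos=penn_to_wordnet(tag))
        for word, tag in tagged
    ]
```

Passing the POS matters because WordNetLemmatizer treats every word as a noun unless told otherwise, so verb and adjective forms are often returned unchanged without it.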

Installation

Clone and add to path.

Import into the Python interpreter.

tag_and_lem is the primary function.

Motivation

The nltk pre-trained part-of-speech tagger uses Penn Treebank tags, which must be converted to WordNet tags in order to use nltk's lemmatizer. This small utility should make it easier to test Natural Language Processing techniques without training a tagger that uses WordNet tags.

Requirements

Python 2.6+

nltk

Contributors

@KT12

If this small function was useful, please star/follow me!
