[nlptools] Python NLP Tools

A straightforward Natural Language Processing Toolbox

NLP Tools is a set of tools written in Python that covers the most common NLP tasks, with code that is easy to read and understand.

It is being developed alongside a series of articles about NLP that the main author publishes on Medium. You can find the articles at tfduque.medium.com

Installation

Installing with pip

pip install nlpytools

Usage example

Tokenization

Using the tokenizer:

from nlptools.core.structures import tokenize

tokenize("This is a sentence")

[<SOS>, this, is, a, sentence, <EOS>]
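To make the shape of that output concrete, here is a minimal standalone sketch of a tokenizer that wraps its tokens in sentence boundary markers. It does not use nlptools and is not the library's implementation: the function name `simple_tokenize` and the regular expression are assumptions made for illustration, and only the `<SOS>`/`<EOS>` marker names come from the output above.

```python
import re

# Marker names taken from the example output; everything else is hypothetical.
SOS, EOS = "<SOS>", "<EOS>"


def simple_tokenize(text):
    """Split text into word and punctuation tokens, wrapped in boundary markers."""
    # \w+ matches a run of word characters; [^\w\s] matches a single
    # punctuation mark, so "sentence." becomes "sentence" and ".".
    words = re.findall(r"\w+|[^\w\s]", text)
    return [SOS] + words + [EOS]


print(simple_tokenize("This is a sentence."))
# ['<SOS>', 'This', 'is', 'a', 'sentence', '.', '<EOS>']
```

Unlike the library's tokens, which are objects that later carry attributes such as PoS, this sketch returns plain strings.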

Using sentence/document format:

from nlptools.core.structures import Document
doc = Document("This is a sentence. This is another sentence.")

for sentence in doc:
    print(sentence, sentence.tokens)

This is a sentence. [<SOS>, This, is, a, sentence, ., <EOS>]
This is another sentence. [<SOS>, This, is, another, sentence, ., <EOS>]

Normalization

These are the currently available normalization steps:

pre_tokenization_functions = {'simplify_punctuation': simplify_punctuation,
                              'normalize_whitespace': normalize_whitespace}
post_tokenization_functions = {'normalize_contractions': normalize_contractions,
                               'spell_correction': spell_correction,
                               'remove_stopwords': remove_stopwords}

Usage:

from nlptools.preprocessing.normalization import Normalizer
normalizer = Normalizer(pre_tokenization_steps=['simplify_punctuation', 'normalize_whitespace'],
                        post_tokenization_steps=['normalize_contractions', 'spell_correction'])
normalizer.normalize_string("This is a nnormalized sentence!!!!         Yeah,,!!")  # one can also use normalize_document

'This is a normalized sentence! Yeah,!'
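The name-to-function dictionaries above suggest that each step is looked up by name and applied in order. The following standalone sketch illustrates that idea for the two pre-tokenization steps only. The function bodies, the regular expressions, and the `run_steps` helper are assumptions for illustration and may differ from what nlptools actually does.

```python
import re


def simplify_punctuation(text):
    """Collapse a run of the same punctuation mark into a single mark."""
    # ([!?.,;:]) captures one mark; \1+ matches its immediate repeats.
    return re.sub(r"([!?.,;:])\1+", r"\1", text)


def normalize_whitespace(text):
    """Collapse runs of whitespace into one space and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()


# Hypothetical registry mirroring the name -> function mapping shown above.
PRE_TOKENIZATION_FUNCTIONS = {
    "simplify_punctuation": simplify_punctuation,
    "normalize_whitespace": normalize_whitespace,
}


def run_steps(text, steps):
    """Apply the named steps to text, in the order given."""
    for name in steps:
        text = PRE_TOKENIZATION_FUNCTIONS[name](text)
    return text


print(run_steps("This is a sentence!!!!         Yeah,,!!",
                ["simplify_punctuation", "normalize_whitespace"]))
# This is a sentence! Yeah,!
```

Keeping the steps in a dictionary lets a caller choose and order them by name, which is how the `Normalizer` arguments above are written.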

Stemming

from nlptools.preprocessing.stemming import PorterStemmer
from nlptools.core.structures import tokenize
stemmer = PorterStemmer()
tokens = tokenize("The words in this sentence will be stemmed.")
stemmed_tokens = [stemmer.stem(token) for token in tokens]

['<sos>', 'the', 'word', 'in', 'thi', 'sent', 'will', 'be', 'stem', '.', '<eos>']

Lemmatizing and Tagging

First: tagging

from nlptools.preprocessing.tagging import MLTagger
tagger = MLTagger()
tag_pairs = tagger.tag("Tag this sentence")
for tag in tag_pairs:
    print(tag, tag.PoS)

<SOS> None
Tag NNP
this DT
sentence NN
<EOS> None

After tagging, every token carries its part of speech in its PoS attribute.

Once the tokens are tagged, we can lemmatize them:

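from nlptools.preprocessing.tagging import MLTagger
# NOTE: `lemmatizer` is not defined in this snippet. Create an instance of the
# package's lemmatizer before running it; its import is not shown here.
tagger = MLTagger(force_ud=True)  # Force UD format to use compatible tags
tag_pairs = tagger.tag("The cars are running")
lemmatized_words = [lemmatizer.lemmatize(word, word.PoS) for word in tag_pairs.tokens]
# Drop the <SOS> and <EOS> markers before joining
print(" ".join(lemmatized_words[1:-1]))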

the car are run
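To show why the lemmatizer takes the part of speech as a second argument, here is a toy dictionary-based sketch. It is standalone and is not how nlptools lemmatizes: the lexicon, the `toy_lemmatize` function, and the hand-written UD tags are all hypothetical.

```python
# Hypothetical toy lexicon: (lowercased word, UD PoS tag) -> lemma.
TOY_LEXICON = {
    ("cars", "NOUN"): "car",
    ("running", "VERB"): "run",
}


def toy_lemmatize(word, pos):
    """Look up the lemma for (word, pos); fall back to the lowercased word."""
    key = (word.lower(), pos)
    return TOY_LEXICON.get(key, word.lower())


# Tags are written by hand here; in the README they come from the tagger.
tagged = [("The", "DET"), ("cars", "NOUN"), ("are", "AUX"), ("running", "VERB")]
print(" ".join(toy_lemmatize(word, pos) for word, pos in tagged))
# the car are run
```

The same surface form can map to different lemmas depending on its tag. In this sketch, `toy_lemmatize("running", "NOUN")` returns `"running"` because only the verb reading is in the lexicon.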

Featurization

from nlptools.preprocessing.featurization import Tfidf
tfidf = Tfidf()
tfidf.fit(["The first sentence", "The second sentence", "The third sentence", "First, second, third."])
tfidf.transform(["The first sentence", "The second sentence", "The third sentence", "First, second, third."])  # or call fit_transform to do both steps at once

matrix([[0.30543024, 0.        , 0.        , 0.        , 0.        ,
         0.07438118, 0.        , 0.07438118],
        [0.        , 0.30543024, 0.        , 0.        , 0.        ,

For more examples and usage, please refer to the Medium series.

Release History

0.1.0
- PyPI release

Contributing

1. Fork it (https://github.com/Sirsirious/NLPTools/fork)
2. Create your feature branch (git checkout -b feature/fooBar)
3. Write understandable code!
4. Commit your changes (git commit -am 'Add some fooBar')
5. Push to the branch (git push origin feature/fooBar)
6. Create a new Pull Request
