Happier Fun Tokenizer

This code implements a basic, Twitter-aware tokenizer. It was originally developed by Christopher Potts (as Happy Fun Tokenizer) and updated by H. Andrew Schwartz, and it is shared with Christopher's permission.

Usage

from happierfuntokenizing.happierfuntokenizing import Tokenizer

tokenizer = Tokenizer()

message = """OMG!!!! :) I looooooove this tokenizer lololol"""
tokens = tokenizer.tokenize(message)
print(tokens)
['omg', '!', '!', '!', '!', ':)', 'i', 'looooooove', 'this', 'tokenizer', 'lololol']

message = """OMG!!!! :) I looooooove this tokenizer LoLoLoLoLooOOOOL"""
tokenizer = Tokenizer(preserve_case=True)
tokens = tokenizer.tokenize(message)
print(tokens)
['OMG', '!', '!', '!', '!', ':)', 'I', 'looooooove', 'this', 'tokenizer', 'LoLoLoLoLooOOOOL']
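
To give a rough idea of what "Twitter-aware" means in the examples above, here is a minimal, standalone sketch of a regex-based tokenizer that keeps emoticons, @mentions, and hashtags together as single tokens. It is not the package's implementation: the pattern set and the `sketch_tokenize` name are hypothetical and far simpler than the real tokenizer, and unlike a fuller tokenizer it lowercases every token when `preserve_case` is off.

```python
import re

# Hypothetical, heavily simplified patterns for illustration only;
# the package's own patterns are not reproduced here.
TOKEN_RE = re.compile(
    r"""
    [:;=][\-o']?[()\[\]dDpP/\\|]   # simple emoticons such as :) ;-) :D
    | @\w+                         # @mentions
    | \#\w+                        # hashtags
    | \w+(?:'\w+)?                 # words, optionally with an inner apostrophe
    | \S                           # any other single non-space character
    """,
    re.VERBOSE | re.UNICODE,
)


def sketch_tokenize(text, preserve_case=False):
    """Split text into tokens; lowercase them unless preserve_case is set."""
    tokens = TOKEN_RE.findall(text)
    if not preserve_case:
        tokens = [t.lower() for t in tokens]
    return tokens


print(sketch_tokenize("OMG!!!! :) I looooooove this tokenizer lololol"))
print(sketch_tokenize("@user #nlp rocks :D", preserve_case=True))
```

The alternatives are tried in order, so the emoticon pattern gets a chance to match before the catch-all `\S` splits punctuation into single characters.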

Installation

The package is available through pip:

pip install happierfuntokenizing

If you do not have sudo privileges, use the --user flag to install into your user directory:

pip install --user happierfuntokenizing

Requirements

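This package targets Python 2.7. Its only dependencies are re and htmlentitydefs, both of which are part of the Python 2 standard library, so no additional packages need to be installed.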

License

Licensed under the GNU General Public License v3 (GPLv3).

Background

Adapted by the World Well-Being Project based out of the University of Pennsylvania and Stony Brook University. Originally developed by Christopher Potts.
