Skip to content

Al-Hasib/eng_text_cleaner

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Text Cleaning of English Language Python Package

Text Cleaning is a common preprocessing technique for almost all NLP task. Mainly I have designed the package for Text Classification Task. Also You can use it for other NLP task also. You are welcome to contribute the package.

Install the package

pip install eng-text-cleaner

There has number of methods to clean the text such as removing emoji, punctuation, html_tags, urls, characters not words or digits or underscore, digits, stopwords, spell correction, lemmatize the words. One Method named clean text will apply all the methods to clean the text at a glance. Let's explore the simple package.

from eng_text_cleaner import preprocessing 

Start by removing punctuation

text = "Neither too small nor too large, and nice resolution at a good price."
# create textcleaner instance
textcleaner = preprocessing.TextCleaner()
# remove punctuation
textcleaner.remove_punctuation(text)

Output:

Neither too small nor too large and nice resolution at a good price

For Clean the text totally

# fully clean the text
textcleaner.clean_text(text)

Output:

neither small large nice resolution good price

Author:

  • Md Abdullah Al Hasib