NLP Preprocessing Package

Text mining relies heavily on pre-processing. This library is an assortment of common text-processing techniques, divided into the following modules.

Load Data

This module contains all the functions for loading the data and outputting results.

Text Processing

This module contains the functions for tokenizing, normalising, removing special characters, etc.
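As an illustration of these steps, here is a minimal sketch using only the Python standard library. The function names `normalise` and `tokenize` are hypothetical and are not taken from this package; the module's own functions may be named and behave differently.

```python
import re
import unicodedata
from typing import List


def normalise(text: str) -> str:
    """Lowercase, strip accents, and replace special characters with spaces.

    Hypothetical helper for illustration, not this package's API.
    """
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Keep ASCII letters, digits, and whitespace; everything else becomes a space.
    cleaned = re.sub(r"[^a-z0-9\s]", " ", no_accents.lower())
    # Collapse runs of whitespace left behind by the substitution.
    return re.sub(r"\s+", " ", cleaned).strip()


def tokenize(text: str) -> List[str]:
    """Split normalised text on whitespace."""
    return normalise(text).split()


print(tokenize("Héllo, World! It's 2024."))
# ['hello', 'world', 'it', 's', '2024']
```

Replacing special characters with spaces (rather than deleting them) keeps words separated when punctuation is the only delimiter, at the cost of splitting contractions such as "It's".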

Feature Selection

This module contains the functions for selecting text features such as word frequency, n-grams, TTR (type-token ratio), etc.
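The three features named above can be sketched in a few lines of standard-library Python. The function names below are assumptions made for illustration and are not this package's API.

```python
from collections import Counter


def word_frequencies(tokens):
    """Count how often each token occurs."""
    return Counter(tokens)


def ngrams(tokens, n):
    """Return all contiguous n-token sequences as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def type_token_ratio(tokens):
    """Number of distinct tokens divided by the total number of tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


tokens = "the cat sat on the mat".split()
print(word_frequencies(tokens).most_common(1))  # [('the', 2)]
print(ngrams(tokens, 2)[:2])                    # [('the', 'cat'), ('cat', 'sat')]
print(type_token_ratio(tokens))                 # 5 distinct tokens out of 6
```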

Distance Measures

This module contains the functions for calculating the distance and similarity between two vectors.
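The README does not say which measures the module implements. As an assumed example, the sketch below shows two common choices, Euclidean distance and cosine similarity, written with the standard library only; the function names are hypothetical.

```python
import math


def _check_lengths(u, v):
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")


def euclidean_distance(u, v):
    """Straight-line distance between two equal-length vectors."""
    _check_lengths(u, v)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    _check_lengths(u, v)
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


print(euclidean_distance([0, 0], [3, 4]))  # 5.0
print(cosine_similarity([1, 0], [0, 1]))   # 0.0
```

Cosine similarity ignores vector length, which is often useful when comparing word-frequency vectors from texts of different sizes.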

Corpus Processor

This module helps convert the corpus into dictionaries (key: author, values: books by that author).
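The resulting structure can be pictured with a short sketch. It assumes the corpus is already available as (author, title) pairs; how this package actually reads the corpus is not described here, and `group_books_by_author` is a hypothetical name.

```python
from collections import defaultdict


def group_books_by_author(records):
    """Build {author: [titles]} from an iterable of (author, title) pairs."""
    corpus = defaultdict(list)
    for author, title in records:
        corpus[author].append(title)
    # Return a plain dict so missing authors raise KeyError as usual.
    return dict(corpus)


records = [
    ("Jane Austen", "Emma"),
    ("Charles Dickens", "Bleak House"),
    ("Jane Austen", "Persuasion"),
]
print(group_books_by_author(records))
# {'Jane Austen': ['Emma', 'Persuasion'], 'Charles Dickens': ['Bleak House']}
```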

Building and Installation

Build package

Building requires wheel. If it is not installed, install it using the following command.

python3 -m pip install --user --upgrade setuptools wheel

Install requirements

pip install -r requirements.txt

Then enter the package directory and build the package using the following command.

python3 setup.py sdist bdist_wheel

This creates the dist folder containing the source archive (.tar.gz) and the wheel (.whl).

Install package

pip install ./dist/preprocess_NLP_pkg-0.0.1.tar.gz
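To confirm that the installation succeeded, one option is to query the installed distribution's version from Python (3.8 or later). This is a sketch: the distribution name `preprocess_NLP_pkg` is inferred from the archive filename above, and `installed_version` is a hypothetical helper.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Distribution name assumed from the archive filename.
print(installed_version("preprocess_NLP_pkg"))
```

The call prints a version string when the distribution is installed in the current environment and `None` otherwise.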

To Uninstall package

pip uninstall preprocess_NLP_pkg

Resources

Lists of the most frequent words in different languages, from the Computation Linguistics Group, University of Neuchatel, can be found here


8sukanya8/preprocess_NLP_pkg
