8sukanya8/preprocess_NLP_pkg


NLP Preprocessing Package

Text mining relies heavily on pre-processing. This library is an assortment of common text-processing techniques, divided into the following modules.

Load Data

This module contains all the functions for loading data and writing out results.

Text Processing

This module contains the functions for tokenizing, normalising, removing special characters, and similar text clean-up steps.
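As an illustration of the steps named above, here is a minimal sketch using only the Python standard library. The function names (`normalise`, `remove_special_characters`, `tokenize`, `preprocess`) are hypothetical and are not taken from this package; check the module source for the actual API.

```python
import re
import unicodedata


def normalise(text):
    """Lowercase the text and strip accents (assumed normalisation rules)."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    # Drop the combining marks left behind by the decomposition.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def remove_special_characters(text):
    """Replace every character that is not a letter, digit, or whitespace with a space."""
    return re.sub(r"[^\w\s]|_", " ", text)


def tokenize(text):
    """Split on runs of whitespace."""
    return text.split()


def preprocess(text):
    """Chain the three steps: normalise, clean, tokenize."""
    return tokenize(remove_special_characters(normalise(text)))


print(preprocess("Héllo, World! It's 2024."))
# ['hello', 'world', 'it', 's', '2024']
```

Splitting on whitespace is the simplest possible tokenizer; it breaks contractions such as "It's" into two tokens, which may or may not match what the library does.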

Feature Selection

This module contains the functions for selecting text features such as word frequency, n-grams, and type-token ratio (TTR).
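The three features mentioned above can be sketched as follows. This is an illustration under assumed definitions, not the package's implementation; the function names are hypothetical, and TTR is taken here to be the number of distinct tokens divided by the total number of tokens.

```python
from collections import Counter


def word_frequency(tokens):
    """Count how often each token occurs."""
    return Counter(tokens)


def ngrams(tokens, n):
    """Return all contiguous n-token windows as tuples."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def type_token_ratio(tokens):
    """Distinct tokens divided by total tokens; 0.0 for an empty input."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


tokens = "the cat sat on the mat".split()
print(word_frequency(tokens).most_common(1))  # [('the', 2)]
print(ngrams(tokens, 2)[:2])                  # [('the', 'cat'), ('cat', 'sat')]
print(type_token_ratio(tokens))               # 5 distinct / 6 total
```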

Distance Measures

This module contains the functions for calculating the distance and similarity between two vectors.
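The README does not list which measures are implemented. As a sketch of what a distance and a similarity between two vectors look like, the example below uses Euclidean distance and cosine similarity; both the choice of measures and the function names are assumptions.

```python
import math


def _check_same_length(a, b):
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")


def euclidean_distance(a, b):
    """Square root of the summed squared differences."""
    _check_same_length(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Dot product divided by the product of the vector norms."""
    _check_same_length(a, b)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Cosine is undefined for a zero vector; returning 0.0 is a convention chosen here.
        return 0.0
    return dot / (norm_a * norm_b)


print(euclidean_distance([0, 0], [3, 4]))  # 5.0
print(cosine_similarity([1, 0], [0, 1]))   # 0.0
```

For word-frequency vectors, cosine similarity ignores document length, whereas Euclidean distance grows with it, which is usually why both kinds of measure are offered.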

Corpus Processor

This module converts a corpus into dictionaries (key: author; value: books by that author).
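A minimal sketch of the author-to-books mapping described above. It assumes, purely for illustration, that each file is named `Author_Title.txt`; the README does not say how the package identifies authors, and `build_corpus_dict` is a hypothetical name.

```python
from collections import defaultdict
from pathlib import PurePath


def build_corpus_dict(filenames):
    """Group book titles by author, given names of the assumed form 'Author_Title.txt'."""
    corpus = defaultdict(list)
    for name in filenames:
        stem = PurePath(name).stem                # drop directory and extension
        author, _, title = stem.partition("_")   # split at the first underscore
        corpus[author].append(title)
    return dict(corpus)


files = ["Austen_Emma.txt", "Austen_Persuasion.txt", "Dickens_Bleak House.txt"]
print(build_corpus_dict(files))
# {'Austen': ['Emma', 'Persuasion'], 'Dickens': ['Bleak House']}
```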

Building and Installation

Build package

Building requires wheel. If it is not installed, install it using the following command.

python3 -m pip install --user --upgrade setuptools wheel

Install requirements

pip install -r requirements.txt

Then enter the package directory and build the package using the following command.

python3 setup.py sdist bdist_wheel

This creates the dist folder containing the source archive (.tar.gz) and the wheel (.whl).

Install package

pip install ./dist/preprocess_NLP_pkg-0.0.1.tar.gz

Uninstall package

pip uninstall preprocess_NLP_pkg

Resources

  1. Lists of the most frequent words in different languages, from the Computational Linguistics Group, University of Neuchâtel, can be found here
