ShayanBanerjee/processdat


Data Preprocessing Module

This Python module preprocesses a CSV dataset whose last column holds categorical data. It uses scikit-learn, pandas, and NumPy.

The following preprocessing steps are performed in order:

  1. Importing Dataset
  2. Missing Value Treatment
  3. Encoding Categorical Data
  4. Splitting Dataset into training and testing set
  5. Feature Scaling using Standard Scaler
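
The five steps above can be sketched with pandas and scikit-learn as follows. This is only an illustration under stated assumptions, not the module's actual implementation: the function name `preprocess_sketch`, the mean-imputation strategy, the use of `LabelEncoder`, the `test_size` and `random_state` values, and the sample data are all hypothetical choices made for the example.

```python
import io

import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler


def preprocess_sketch(csv_source, test_size=0.25, random_state=0):
    """Hypothetical stand-in for the five steps; not processdat's own code."""
    # 1. Import the dataset; the last column is treated as the categorical target.
    data = pd.read_csv(csv_source)
    X = data.iloc[:, :-1].to_numpy(dtype=float)  # assumes numeric feature columns
    y = data.iloc[:, -1].to_numpy()

    # 2. Treat missing values (assumption: replace NaN with the column mean).
    imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
    X = imputer.fit_transform(X)

    # 3. Encode the categorical last column as integer labels.
    y = LabelEncoder().fit_transform(y)

    # 4. Split into training and testing sets.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )

    # 5. Scale features; the scaler is fitted on the training set only.
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)
    return X_train, X_test, y_train, y_test


# Tiny in-memory CSV with made-up values and two missing cells.
csv_text = """age,salary,purchased
44,72000,No
27,48000,Yes
30,,No
38,61000,No
40,58000,Yes
35,58000,Yes
,52000,No
48,79000,Yes
"""
X_train, X_test, y_train, y_test = preprocess_sketch(io.StringIO(csv_text))
print(X_train.shape, X_test.shape)
```

Fitting the scaler on the training split alone and reusing it on the test split keeps test-set statistics out of the training data; whether processdat does the same is not stated here.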

Input and Output

  • The input to the module's sole function, preprocess, is a CSV dataset

  • The return value is a tuple of 4 values in the fashion of scikit-learn's train_test_split, i.e. X_train, X_test, y_train, y_test


Module Installation from PyPI

$ pip install processdat

Usage

import processdat as pro

...
X_train, X_test, y_train, y_test = pro.preprocess('Data.csv')
...

Developing processdat

To install processdat, along with the tools you need to develop and run tests, run the following in the terminal/environment:

$ pip install -e .[dev]

Created by: Shayan Banerjee (shayanbanerhee96@gmail.com)
