This Python module preprocesses a CSV dataset whose last column holds categorical data. It uses scikit-learn, pandas, and NumPy.
The following preprocessing steps are performed in order:
- Importing Dataset
- Missing Value Treatment
- Encoding Categorical Data
- Splitting Dataset into training and testing set
- Feature Scaling using StandardScaler

The sole function, preprocess, takes a CSV dataset as input (for example, the path 'Data.csv').

It returns a tuple of four values in the same order as scikit-learn's train_test_split, i.e. X_train, X_test, y_train, y_test.
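
The preprocessing steps listed above can be sketched roughly as follows. This is only an illustration of the described pipeline, not the package's actual source. The function name preprocess_sketch, the test_size and random_state parameters, the mean imputation strategy, and the use of LabelEncoder for the last column are all assumptions made for the example. It also assumes every feature column is numeric.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler


def preprocess_sketch(csv_path, test_size=0.2, random_state=0):
    """Hypothetical stand-in for preprocess; the real function may differ."""
    # 1. Import the dataset: all columns but the last are features,
    #    the last column is the categorical target.
    dataset = pd.read_csv(csv_path)
    X = dataset.iloc[:, :-1].to_numpy(dtype=float)
    y = dataset.iloc[:, -1].to_numpy()

    # 2. Missing value treatment: fill NaNs with the column mean (assumed strategy).
    X = SimpleImputer(missing_values=np.nan, strategy="mean").fit_transform(X)

    # 3. Encode the categorical last column as integer labels (assumed encoder).
    y = LabelEncoder().fit_transform(y)

    # 4. Split into training and testing sets.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )

    # 5. Feature scaling: fit the scaler on the training set only,
    #    then apply the same transformation to the test set.
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)

    return X_train, X_test, y_train, y_test
```

The return order matches train_test_split, so the result can be unpacked in the same way as the call to preprocess.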
$ pip install processdat
> import processdat as pro
...
X_train, X_test, y_train, y_test = pro.preprocess('Data.csv')
...
To install processdat in editable mode, along with the tools needed to develop and run tests, run the following in your terminal:
$ pip install -e .[dev]