# AutoFeat

AutoFeat is a Python library that provides automated feature engineering and feature selection, along with models such as AutoFeatRegressor and AutoFeatClassifier. These models perform many numerical computations and can therefore be computationally demanding.

In this article, I’ll discuss how AutoFeat is used, the steps involved, and its implementation on a real-world dataset.

To read more about it, please refer to [this](https://analyticsindiamag.com/guide-to-automatic-feature-engineering-using-autofeat/) article.

# Implementation

## Installation 

In [None]:
!python -m pip install pip --upgrade --user -q
!python -m pip install numpy pandas seaborn matplotlib scipy scikit-learn statsmodels tensorflow keras --user -q

In [None]:
!python -m pip install autofeat --user -q

In [None]:
import IPython
# Restart the kernel so that the newly installed packages are picked up
IPython.Application.instance().kernel.do_shutdown(True)

## AutoFeat in Regression

For regression, the AutoFeatRegressor class is used.

Parameters:

> * categorical_cols=None, list of categorical columns that will be one-hot encoded.
> * feateng_cols=None, list of columns that will be used for feature engineering.
> * units=None, dictionary mapping column names to units, given as strings that can be converted into pint units; columns without a unit are treated as dimensionless.
> * feateng_steps=2, number of feature engineering steps to perform.
> * featsel_runs=5, number of times to perform feature selection with a fraction of the data.
> * max_gb=None, maximum number of gigabytes to be used in feature engineering.
> * transformations=("1/", "exp", "log", "abs", "sqrt", "^2", "^3"), list of transformations to be applied.
> * apply_pi_theorem=True, whether or not to apply the pi theorem.
> * always_return_numpy=False, whether to return a numpy.ndarray instead of a pandas.DataFrame.
> * n_jobs=1, number of parallel jobs to be run.
> * verbose=0, verbosity level.
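To make the transformations parameter more concrete, here is a minimal sketch of what applying those seven transformations to a single column could look like. It is not AutoFeat's implementation: the helper expand_column, the TRANSFORMS table, the generated feature names, and the toy column are all hypothetical and only illustrate the idea of generating candidate features.

```python
import math

# Illustrative stand-ins for the transformation names listed above.
# Each function returns None when the input is outside its domain.
TRANSFORMS = {
    "1/": lambda v: 1.0 / v if v != 0 else None,
    "exp": lambda v: math.exp(v),
    "log": lambda v: math.log(v) if v > 0 else None,
    "abs": lambda v: abs(v),
    "sqrt": lambda v: math.sqrt(v) if v >= 0 else None,
    "^2": lambda v: v ** 2,
    "^3": lambda v: v ** 3,
}


def expand_column(name, values, transforms=TRANSFORMS):
    """Return {feature_name: values} for every transform defined on all values."""
    features = {}
    for label, func in transforms.items():
        transformed = [func(v) for v in values]
        if any(t is None for t in transformed):
            continue  # skip transforms that are undefined for this column
        if label in ("^2", "^3"):
            new_name = f"{name}{label}"
        elif label == "1/":
            new_name = f"1/{name}"
        else:
            new_name = f"{label}({name})"
        features[new_name] = transformed
    return features


rooms = [4.0, 6.0, 8.0]  # hypothetical toy column
candidates = expand_column("rooms", rooms)
print(sorted(candidates))
```

A column containing zero or negative values would simply not receive the 1/, log, or sqrt candidates in this sketch.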

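The dataset used for demonstration is the Boston housing price dataset from the scikit-learn library. Note that load_boston was removed in scikit-learn 1.2, so the code below needs an older version.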

The performance metric used for regression is the R-squared score (coefficient of determination), returned by model.score().
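For reference, the R-squared score can be computed by hand as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses plain Python and made-up numbers; the function name r2_score and the toy values are illustrative and not taken from the dataset.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


targets = [3.0, 5.0, 7.0, 9.0]      # made-up true values
predictions = [2.5, 5.0, 7.5, 9.0]  # made-up model outputs
print("R^2: %.4f" % r2_score(targets, predictions))  # prints "R^2: 0.9750"
```

A perfect prediction gives 1.0, while always predicting the mean of the targets gives 0.0.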

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
from autofeat import AutoFeatRegressor
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split

# Load the data and split off 30% of it as a test set
X, y = load_boston(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Fit on the full dataset; df holds the original and the engineered features
model = AutoFeatRegressor()
df = model.fit_transform(X, y)
pred = model.predict(X_test)

# R^2 on the data the model was fitted on, not a held-out score
print("Final R^2: %.4f" % model.score(df, y))

Originally, X.shape = (506, 13); after the transformation, df.shape = (506, 32).

The column names show the newly formed features.

In [None]:
plt.figure()
# Predicted values (x-axis) against actual target values (y-axis)
plt.scatter(model.predict(df), y, s=2)
plt.show()

## AutoFeat in Classification

For classification, the AutoFeatClassifier class is used. Its parameters are the same as those of the regressor. The dataset used for demonstration is the wine classification dataset from the scikit-learn library. The performance metric used for classification is accuracy, returned by model.score().
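Accuracy is simply the fraction of samples whose predicted class matches the true class. A minimal hand-rolled version, with a hypothetical function name and made-up labels, looks like this:

```python
def accuracy(labels_true, labels_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(labels_true, labels_pred) if t == p)
    return correct / len(labels_true)


labels_true = [0, 1, 2, 1, 0]  # made-up class labels
labels_pred = [0, 1, 1, 1, 0]  # made-up predictions, one of them wrong
print("Accuracy: %.4f" % accuracy(labels_true, labels_pred))  # prints "Accuracy: 0.8000"
```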

In [None]:
from autofeat import AutoFeatClassifier
from sklearn.datasets import load_wine

# Load the data and split off 30% of it as a test set
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Fit on the full dataset, then predict for the X_test rows
model = AutoFeatClassifier()
df = model.fit_transform(X, y)
y_pred = model.predict(X_test)

# Accuracy on the data the model was fitted on
print("Final Accuracy: %.4f" % model.score(df, y))

## AutoFeatModel 

Instead of using the regressor or classifier classes separately, the problem type can be passed as an argument to the AutoFeatModel class. By default, it’s set to regression.

In [None]:
from autofeat import AutoFeatModel

# X, y, and X_test are reused from the previous cell
model = AutoFeatModel(problem_type='regression')
df = model.fit_transform(X, y)
y_pred = model.predict(X_test)
print("Final R^2: %.4f" % model.score(df, y))

## Feature Selector 

The FeatureSelector class provides automatic feature selection. The selected features are returned as a DataFrame.

Parameters:

> * problem_type="regression", either "regression" (the default) or "classification".
> * featsel_runs=5, number of iterations to be performed for feature selection.
> * keep=None, a list of features that are always kept.
> * n_jobs=1, number of parallel jobs to be run.
> * verbose=0, verbosity level.
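The idea of running selection several times on a fraction of the data and always retaining the keep columns can be sketched with a simple voting scheme. The example below is a stand-in, not FeatureSelector's algorithm: it scores each feature by its absolute Pearson correlation with the target, and the function names, threshold, fraction, and toy data are all assumptions made for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def select_features(columns, target, runs=5, fraction=0.8,
                    threshold=0.7, keep=(), seed=0):
    """Keep features that pass the correlation threshold in most subsample runs."""
    rng = random.Random(seed)
    n = len(target)
    votes = {name: 0 for name in columns}
    for _ in range(runs):
        # Each run only sees a random fraction of the rows
        idx = rng.sample(range(n), max(2, int(fraction * n)))
        y_sub = [target[i] for i in idx]
        for name, values in columns.items():
            x_sub = [values[i] for i in idx]
            if abs(pearson(x_sub, y_sub)) >= threshold:
                votes[name] += 1
    # Majority vote, plus any columns the caller asked to keep
    return [name for name in columns if votes[name] > runs / 2 or name in keep]


# Toy data: one informative column and one uninformative alternating column
signal = [float(i) for i in range(20)]
noise = [1.0 if i % 2 == 0 else -1.0 for i in range(20)]
target = [2.0 * s + 1.0 for s in signal]
columns = {"signal": signal, "noise": noise}

selected = select_features(columns, target)
print(selected)
```

Repeating the selection on different subsamples makes the result less sensitive to any single unlucky split, which is the motivation behind a parameter like featsel_runs.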

In [None]:
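from autofeat import FeatureSelector

# Reload the wine data and run feature selection on it
X, y = load_wine(return_X_y=True)
fsel = FeatureSelector(verbose=1)
# new_X is a DataFrame containing only the selected features
new_X = fsel.fit_transform(pd.DataFrame(X), pd.DataFrame(y))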