# SKLearn-pandas

SKLearn-pandas provides a bridge between scikit-learn's machine learning methods and pandas DataFrames.

In particular, it provides:

- A way to apply scikit-learn transformations to pandas columns while keeping track of each transformation object.
- A convenient way to turn pandas DataFrames into scikit-learn-compatible X and y NumPy arrays.
- A CategoricalImputer that replaces null-like values with the mode and works with string columns.


In [3]:
from sklearn_pandas import DataFrameMapper, cross_val_score

In [4]:
import pandas as pd
import numpy as np
import sklearn.preprocessing, sklearn.decomposition, sklearn.linear_model, sklearn.pipeline, sklearn.metrics
from sklearn.feature_extraction.text import CountVectorizer

In [5]:
data = pd.DataFrame({'pet':      ['cat', 'dog', 'dog', 'fish', 'cat', 'dog', 'cat', 'fish'],
                     'children': [4., 6, 3, 3, 2, 3, 5, 4],
                     'salary':   [90, 24, 44, 27, 32, 59, 36, 27]})

In [6]:
data.head()

   children   pet  salary
0       4.0   cat      90
1       6.0   dog      24
2       3.0   dog      44
3       3.0  fish      27
4       2.0   cat      32


In [7]:
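mapper = DataFrameMapper([
    # A string selector passes the column as a 1-D array, as LabelBinarizer expects.
    ('pet', sklearn.preprocessing.LabelBinarizer()),
    # A list selector passes a 2-D array, as StandardScaler expects.
    (['children'], sklearn.preprocessing.StandardScaler()),
])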


In [8]:
np.round(mapper.fit_transform(data.copy()), 2)

array([[ 1.  ,  0.  ,  0.  ,  0.21],
       [ 0.  ,  1.  ,  0.  ,  1.88],
       [ 0.  ,  1.  ,  0.  , -0.63],
       [ 0.  ,  0.  ,  1.  , -0.63],
       [ 1.  ,  0.  ,  0.  , -1.46],
       [ 0.  ,  1.  ,  0.  , -0.63],
       [ 1.  ,  0.  ,  0.  ,  1.04],
       [ 0.  ,  0.  ,  1.  ,  0.21]])
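
To see where these numbers come from, the array above can be reproduced by hand. The sketch below uses only the Python standard library and is not how sklearn-pandas works internally. It mimics the two transformers under the assumption that LabelBinarizer emits one indicator column per class in sorted order, and that StandardScaler divides by the population standard deviation. The names `classes`, `one_hot`, `scaled` and `rows` are illustrative.

```python
from statistics import fmean, pstdev

pets = ['cat', 'dog', 'dog', 'fish', 'cat', 'dog', 'cat', 'fish']
children = [4., 6, 3, 3, 2, 3, 5, 4]

# LabelBinarizer-style encoding: one indicator column per class, classes sorted.
classes = sorted(set(pets))
one_hot = [[1.0 if pet == cls else 0.0 for cls in classes] for pet in pets]

# StandardScaler-style scaling: subtract the mean, then divide by the
# population standard deviation (ddof=0).
mean = fmean(children)   # 3.75
std = pstdev(children)   # about 1.199
scaled = [round((c - mean) / std, 2) for c in children]

# Concatenate the indicator columns and the scaled column, row by row.
rows = [indicators + [z] for indicators, z in zip(one_hot, scaled)]
for row in rows:
    print(row)
```

The first three columns are the `pet` indicators for cat, dog and fish, and the last column is the standardized `children` value, matching the mapper output row for row.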

In [9]:
mapper.transformed_names_

['pet_cat', 'pet_dog', 'pet_fish', 'children']