### Try some of the following:
* Standardize numerical data (i.e. mean of 0 and standard deviation of 1) using the scale and center options (in scikit-learn, the `with_std` and `with_mean` parameters of `StandardScaler`).
* Normalize numerical data (e.g. to a range of 0-1) using the range option (in scikit-learn, `MinMaxScaler` and its `feature_range` parameter).
* Explore more advanced feature engineering, such as binarizing (`Binarizer`), possibly using this [SO answer](http://stackoverflow.com/a/8505658/893766) to append a dummy feature.
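A minimal sketch of how the three suggestions map onto scikit-learn transformers. It uses a small made-up array so it runs offline. `StandardScaler`'s `with_mean`/`with_std` parameters play the role of the center and scale options, `MinMaxScaler(feature_range=(0, 1))` corresponds to the range option, and `Binarizer(threshold=...)` handles binarizing. The toy data and the threshold value are assumptions for illustration. Appending the binarized column with `np.hstack` is one way to add a dummy feature and is not necessarily the approach in the linked answer.

```python
import numpy as np
from sklearn.preprocessing import Binarizer, MinMaxScaler, StandardScaler

# Hypothetical toy data: 4 samples, 2 numeric features
X_toy = np.array([[1.0, 10.0],
                  [2.0, 20.0],
                  [3.0, 30.0],
                  [4.0, 40.0]])

# Standardize: center (with_mean) and scale (with_std) each column
X_std = StandardScaler(with_mean=True, with_std=True).fit_transform(X_toy)

# Center only: subtract the column mean but keep the original spread
X_centered = StandardScaler(with_mean=True, with_std=False).fit_transform(X_toy)

# Normalize each column to the range [0, 1]
X_minmax = MinMaxScaler(feature_range=(0, 1)).fit_transform(X_toy)

# Binarize: values strictly greater than the threshold become 1, the rest 0
# (the threshold of 2.5 is an arbitrary choice for this example)
flags = Binarizer(threshold=2.5).fit_transform(X_toy[:, [0]])

# Append the binary indicator as an extra (dummy) feature column
X_aug = np.hstack([X_toy, flags])

print(X_std[:, 0])
print(X_centered[:, 0])
print(X_minmax[:, 0])
print(X_aug)
```

Turning off `with_std` only shifts the columns, so their units are preserved. This can be useful when the spread itself carries meaning.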

In [1]:
from sklearn import preprocessing
import pandas as pd
import numpy as np

# Pima Indians Diabetes dataset: 8 numeric features followed by a binary class label
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data'
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
df = pd.read_csv(url, names=names)

# Split into feature matrix X (first 8 columns) and target vector y (last column)
X = df.values[:, 0:8]
y = df.values[:, 8]

In [2]:
print(X.shape)
print(y.shape)

(768, 8)
(768,)


In [3]:
# learn the per-column mean and standard deviation of X
scalar = preprocessing.StandardScaler().fit(X)
print(scalar)

StandardScaler(copy=True, with_mean=True, with_std=True)
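After `fit`, the scaler holds the per-column statistics it will apply in its `mean_` and `scale_` attributes. The sketch below uses a small made-up array instead of the Pima data so it runs offline. It checks that `transform` amounts to `(X - mean_) / scale_`, where `scale_` is the population standard deviation (`ddof=0`).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the feature matrix X
X_demo = np.array([[1.0, 2.0],
                   [3.0, 4.0],
                   [5.0, 6.0],
                   [7.0, 8.0]])

scaler = StandardScaler().fit(X_demo)

# Per-column statistics learned during fit
print(scaler.mean_)
print(scaler.scale_)

# transform() subtracts the mean and divides by the population std (ddof=0)
manual = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)
print(np.allclose(scaler.transform(X_demo), manual))  # True
```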


In [4]:
help(preprocessing)

Help on package sklearn.preprocessing in sklearn:

NAME
    sklearn.preprocessing

DESCRIPTION
    The :mod:`sklearn.preprocessing` module includes scaling, centering,
    normalization, binarization and imputation methods.

PACKAGE CONTENTS
    _function_transformer
    _weights
    data
    imputation
    label

CLASSES
    sklearn.base.BaseEstimator(builtins.object)
        sklearn.preprocessing._function_transformer.FunctionTransformer(sklearn.base.BaseEstimator, sklearn.base.TransformerMixin)
        sklearn.preprocessing.data.Binarizer(sklearn.base.BaseEstimator, sklearn.base.TransformerMixin)
        sklearn.preprocessing.data.KernelCenterer(sklearn.base.BaseEstimator, sklearn.base.TransformerMixin)
        sklearn.preprocessing.data.MaxAbsScaler(sklearn.base.BaseEstimator, sklearn.base.TransformerMixin)
        sklearn.preprocessing.data.MinMaxScaler(sklearn.base.BaseEstimator, sklearn.base.TransformerMixin)
        sklearn.preprocessing.data.Normalizer(sklearn.base.BaseEstimat
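The listing includes both `MinMaxScaler` and `Normalizer`, which are easy to confuse. `MinMaxScaler` rescales each column (feature) into a range, which is what the range-option suggestion at the top asks for. `Normalizer` instead rescales each row (sample) to unit norm, using L2 by default. `MaxAbsScaler`, also listed, divides each column by its maximum absolute value. The following is a small sketch on a made-up array:

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler, MinMaxScaler, Normalizer

# Hypothetical data: 2 samples (rows), 2 features (columns)
X_demo = np.array([[3.0, 4.0],
                   [6.0, -8.0]])

# Column-wise: each feature mapped to [0, 1]
print(MinMaxScaler().fit_transform(X_demo))
# Column-wise: each feature divided by its max absolute value, giving [-1, 1]
print(MaxAbsScaler().fit_transform(X_demo))
# Row-wise: each sample divided by its L2 norm
print(Normalizer(norm="l2").fit_transform(X_demo))
```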

In [5]:
help(scalar)

Help on StandardScaler in module sklearn.preprocessing.data object:

class StandardScaler(sklearn.base.BaseEstimator, sklearn.base.TransformerMixin)
 |  Standardize features by removing the mean and scaling to unit variance
 |  
 |  Centering and scaling happen independently on each feature by computing
 |  the relevant statistics on the samples in the training set. Mean and
 |  standard deviation are then stored to be used on later data using the
 |  `transform` method.
 |  
 |  Standardization of a dataset is a common requirement for many
 |  machine learning estimators: they might behave badly if the
 |  individual feature do not more or less look like standard normally
 |  distributed data (e.g. Gaussian with 0 mean and unit variance).
 |  
 |  For instance many elements used in the objective function of
 |  a learning algorithm (such as the RBF kernel of Support Vector
 |  Machines or the L1 and L2 regularizers of linear models) assume that
 |  all features are centered around 0 a
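The docstring notes that the mean and standard deviation are stored at fit time and reused on later data. The sketch below shows the usual pattern: fit the scaler on the training rows only, then apply the same statistics to the test rows. It assumes a held-out split made with `train_test_split`, which is not part of the original notebook, and uses random data as a stand-in for `X`.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the feature matrix
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=50.0, scale=10.0, size=(100, 8))

X_train, X_test = train_test_split(X_demo, test_size=0.25, random_state=0)

# Statistics come from the training rows only
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)  # reuses the training mean_ and scale_

# Training columns have mean 0 exactly; test columns only approximately
print(X_train_s.mean(axis=0).round(3))
print(X_test_s.mean(axis=0).round(3))
```

Fitting on the training rows alone keeps information from the test rows out of the preprocessing step.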

In [6]:
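# show 3 decimal places when printing arrays
np.set_printoptions(precision=3)

# apply the mean and standard deviation learned by fit() to X
X_scaled = scalar.transform(X)

# summarize the transformed data: first 5 rows
print(X_scaled[0:5,:])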

[[ 0.64   0.848  0.15   0.907 -0.693  0.204  0.468  1.426]
 [-0.845 -1.123 -0.161  0.531 -0.693 -0.684 -0.365 -0.191]
 [ 1.234  1.944 -0.264 -1.288 -0.693 -1.103  0.604 -0.106]
 [-0.845 -0.998 -0.161  0.155  0.123 -0.494 -0.921 -1.042]
 [-1.142  0.504 -1.505  0.907  0.766  1.41   5.485 -0.02 ]]
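A quick sanity check on a standardized matrix, sketched on random stand-in data rather than the Pima features. Every column should have a mean close to 0 and a standard deviation close to 1, and `inverse_transform` should recover the original values.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for X: 768 samples, 8 features on different scales
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(768, 8)) * np.arange(1, 9) + np.arange(0, 80, 10)

scaler = StandardScaler().fit(X_demo)
X_scaled_demo = scaler.transform(X_demo)

# Column-wise summary of the standardized data
print(X_scaled_demo.mean(axis=0).round(3))
print(X_scaled_demo.std(axis=0).round(3))

# Undo the scaling to get back the original feature values
X_back = scaler.inverse_transform(X_scaled_demo)
print(np.allclose(X_back, X_demo))  # True
```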
