# Feature Eng Lab: Standard Scaling

In this lab, we'll redo our power plant regression with one change: we'll standardize our variables.

This means that we'll take our predictors and, for each one,
1. calculate its mean and standard deviation
2. subtract the mean
3. divide by the standard deviation

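The result is that, for each predictor,
* the mean will be zero,
* the standard deviation will be one, and
* the units will be comparable: every value is expressed in standard deviations from its mean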
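
As a rough illustration, the three steps above can be sketched by hand with pandas. The toy columns `temp_c` and `pressure_mbar` are made up for this example and are not the power plant data; the only assumption carried over from scikit-learn is that `StandardScaler` divides by the population standard deviation (`ddof=0`).

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy predictors on very different scales (not the power plant data)
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "temp_c": rng.normal(20.0, 7.0, size=500),
    "pressure_mbar": rng.normal(1013.0, 6.0, size=500),
})

# Steps 1-3: per-column mean and standard deviation, then center and rescale.
# ddof=0 (population standard deviation) matches what StandardScaler uses.
means = toy.mean()
stds = toy.std(ddof=0)
toy_standardized = (toy - means) / stds

print(toy_standardized.mean().round(6))       # approximately 0 for each column
print(toy_standardized.std(ddof=0).round(6))  # approximately 1 for each column

# The same numbers should come out of scikit-learn's StandardScaler
sk_standardized = StandardScaler().fit_transform(toy)
print(np.allclose(toy_standardized.to_numpy(), sk_standardized))
```

Doing it by hand once makes it easier to see what the scaler stores when it is fitted: one mean and one standard deviation per column.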

Aside from helping some algorithms perform better, standardization lets us compare the sizes of our coefficients after training, since each one now measures the effect of a one-standard-deviation change in its predictor. That can be really useful for understanding our data, or for simplifying a model that might start out with hundreds or thousands of parameters.

We'll start with our last working solution:

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn import linear_model
from sklearn.metrics import mean_squared_error
import numpy as np

df = pd.read_csv('data/powerplant.csv')
# The first four columns are the predictors; the fifth is the target
X = df.iloc[:, :4]
y = df.iloc[:, 4]
# Fixing random_state makes the split, and so the RMSE, reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
lr = linear_model.LinearRegression()
linear = lr.fit(X_train, y_train)
y_pred = linear.predict(X_test)
print("RMSE %f" % np.sqrt(mean_squared_error(y_test, y_pred)))
list(zip(X_train.columns, linear.coef_))

Now we want to use scikit-learn's built-in StandardScaler to do the work.

We'll fit-and-transform the training features just like we did in the deskewing demo, then reuse the fitted scaler to transform the test features.

http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html

In [None]:
from sklearn.preprocessing import StandardScaler

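# Fixing random_state makes the split, and so the RMSE, reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

ss = StandardScaler()

# Learn each column's mean and standard deviation from the training set only
X_train_ss = ss.fit_transform(X_train)

lr = linear_model.LinearRegression()
linear = lr.fit(X_train_ss, y_train)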

In [None]:
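# Scale the test set with the statistics learned from the training set
# (transform, not fit_transform), then predict
y_pred = linear.predict(ss.transform(X_test))
print("RMSE %f" % np.sqrt(mean_squared_error(y_test, y_pred)))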
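
The fit-on-train, transform-on-test pattern above can also be bundled into a single object. The sketch below uses scikit-learn's `make_pipeline` on synthetic data; the `X_demo` and `y_demo` arrays and the `Xd_`/`yd_` names are invented for illustration and stand in for the power plant features. It is one possible way to organize the steps, not something this lab requires.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic data standing in for the power plant features
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=[20.0, 1013.0], scale=[7.0, 6.0], size=(400, 2))
y_demo = 450.0 - 2.0 * X_demo[:, 0] + 0.1 * X_demo[:, 1] + rng.normal(0.0, 1.0, size=400)

Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0
)

# fit() learns the scaler's mean and standard deviation from the training rows only;
# predict() reuses those training statistics to transform the test rows.
pipe = make_pipeline(StandardScaler(), LinearRegression())
pipe.fit(Xd_train, yd_train)

yd_pred = pipe.predict(Xd_test)
print("RMSE %f" % np.sqrt(mean_squared_error(yd_test, yd_pred)))
```

With a pipeline there is no separate `ss.transform(X_test)` call to forget, which makes it harder to accidentally refit the scaler on the test set.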

Is the result substantially better, worse, or neither?

What about the coefficients?

In [None]:
list(zip(X_train.columns,linear.coef_))
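
One common, if rough, way to read standardized coefficients is to rank them by absolute value. The sketch below does that on synthetic data (the columns `a` and `b` and their true coefficients are made up for this example), and also shows that dividing by the scaler's `scale_` attribute recovers coefficients in the original units. Treat the ranking as a first look only: when predictors are strongly correlated, individual coefficients are harder to interpret.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic data: "a" has a small raw coefficient but a large spread
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "a": rng.normal(0.0, 100.0, size=1000),
    "b": rng.normal(0.0, 1.0, size=1000),
})
target = 0.05 * demo["a"] + 2.0 * demo["b"] + rng.normal(0.0, 0.1, size=1000)

scaler = StandardScaler()
model = LinearRegression().fit(scaler.fit_transform(demo), target)

# Rank predictors by the absolute size of their standardized coefficients
std_coefs = pd.Series(model.coef_, index=demo.columns)
print(std_coefs.abs().sort_values(ascending=False))

# Dividing by the scaler's per-column standard deviation recovers
# coefficients in the original units
raw_coefs = std_coefs / scaler.scale_
print(raw_coefs)
```

In this toy setup the raw coefficient of `a` is much smaller than that of `b`, yet its standardized coefficient is larger, because a typical change in `a` moves the target more than a typical change in `b`.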