## Feature Scaling Quiz

#### Feature Scaling
What is feature scaling?
Feature scaling transforms your data into a common range of values. There are two common kinds of scaling: (1) standardizing and (2) normalizing.

#### Standardizing
Standardizing takes each value in a column, subtracts the mean of the column, and then divides by the standard deviation of the column. In Python, suppose you have a column in df called height. You could create a standardized height as:

    df["height_standard"] = (df["height"] - df["height"].mean()) / df["height"].std()

This creates a new standardized column in which each value is the number of standard deviations the original height lies from the mean of the column. This type of feature scaling is by far the most common of all techniques (for the reasons discussed here, but also likely because of precedent).
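As a quick check of the formula above, here is a minimal sketch on a made-up height column (the five values are hypothetical and only for illustration). It confirms that the standardized column ends up with mean 0 and standard deviation 1.

```python
import pandas as pd

# Hypothetical heights in centimeters, used only for illustration.
df = pd.DataFrame({"height": [150.0, 160.0, 170.0, 180.0, 190.0]})

# Standardize: subtract the column mean, then divide by the column standard deviation.
df["height_standard"] = (df["height"] - df["height"].mean()) / df["height"].std()

# After standardizing, the column should have mean 0 and standard deviation 1.
standard_mean = df["height_standard"].mean()
standard_std = df["height_standard"].std()
print(standard_mean, standard_std)
```

Note that pandas' .std() uses the sample standard deviation (ddof=1) by default, while sklearn's StandardScaler divides by the population standard deviation (ddof=0), so the two approaches can give slightly different values on small datasets.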

#### Normalizing
A second very popular type of feature scaling is known as normalizing. With normalizing, data are rescaled to lie between 0 and 1. Using the same example as above, we could perform normalizing in Python in the following way:

    df["height_normal"] = (df["height"] - df["height"].min()) / (df["height"].max() - df["height"].min())
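To make the normalizing formula concrete, here is a small sketch on the same kind of made-up height column (the values are hypothetical). The smallest height maps to 0, the largest to 1, and everything else falls in between.

```python
import pandas as pd

# Hypothetical heights in centimeters, used only for illustration.
df = pd.DataFrame({"height": [150.0, 160.0, 170.0, 180.0, 190.0]})

# Normalize: shift by the column minimum, then divide by the column range.
height_min = df["height"].min()
height_max = df["height"].max()
df["height_normal"] = (df["height"] - height_min) / (height_max - height_min)

print(df["height_normal"].tolist())  # [0.0, 0.25, 0.5, 0.75, 1.0]
```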

#### When Should I Use Feature Scaling?
In many machine learning algorithms, the result will change depending on the units of your data. This is especially true in two specific cases:
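* When your algorithm uses a distance-based metric to predict, since features measured in larger units dominate the distance.
* When you incorporate regularization, since the penalty on each coefficient depends on the scale of its feature.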
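The first case can be illustrated with a small sketch. The people, the (height, weight) values, and the nearest() helper below are all hypothetical: the point is only that the nearest neighbor under Euclidean distance can change when height is expressed in centimeters instead of meters, even though the underlying data are the same.

```python
import numpy as np

# Hypothetical people described by (height, weight in kg).
# Heights start out in meters.
query = np.array([1.80, 70.0])
candidates = np.array([
    [1.50, 71.0],  # candidate 0: very different height, similar weight
    [1.81, 74.0],  # candidate 1: similar height, somewhat different weight
])

def nearest(point, others):
    """Return the row index of the candidate closest to point (Euclidean distance)."""
    distances = np.linalg.norm(others - point, axis=1)
    return int(np.argmin(distances))

# With height in meters, weight differences dominate the distance.
nearest_in_meters = nearest(query, candidates)

# Same people, but height converted to centimeters: height now dominates.
unit_change = np.array([100.0, 1.0])
nearest_in_cm = nearest(query * unit_change, candidates * unit_change)

print(nearest_in_meters, nearest_in_cm)  # 0 1
```

Scaling both features to a common range before computing distances removes this dependence on the choice of units.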

In [1]:
1. Load in the data
The data is in the file called 'data.csv'. Note that there's no header row on this file.
Split the data so that the six predictor features (first six columns) are stored in X, and the outcome feature (last column) is 
stored in y.

2. (NEW) Perform feature scaling on data via standardization
Create an instance of sklearn's StandardScaler and assign it to the variable scaler.
Compute the scaling parameters and standardize the predictors in one step by calling the .fit_transform() method on the predictor 
feature array, which returns the predictor variables as standardized values. Store those standardized values in X_scaled.

3. Fit data using linear regression with Lasso regularization
Create an instance of sklearn's Lasso class and assign it to the variable lasso_reg. You don't need to set any parameter values:
use the default values for the quiz.
Use the Lasso object's .fit() method to fit the regression model onto the data. Make sure that you apply the fit to the 
standardized data from the previous step (X_scaled), not the original data.

4. Inspect the coefficients of the regression model
Obtain the coefficients of the fitted regression model using the .coef_ attribute of the Lasso object. Store them in the reg_coef 
variable: the coefficients will be printed out, and you will use your observations to answer the question at the bottom of the 
page.
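Before running the quiz code, it may help to see what step 2 does on a tiny example. The sketch below uses a made-up array (X_toy is a hypothetical name, not the quiz data) and checks that each column returned by .fit_transform() has mean 0 and standard deviation 1, with the learned parameters available on the scaler afterwards.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical predictor array with two features on very different scales.
X_toy = np.array([
    [1.0, 100.0],
    [2.0, 300.0],
    [3.0, 500.0],
])

scaler = StandardScaler()

# fit_transform learns each column's mean and standard deviation,
# then returns the standardized array.
X_toy_scaled = scaler.fit_transform(X_toy)

print(scaler.mean_)               # per-column means learned from X_toy
print(scaler.scale_)              # per-column standard deviations (ddof=0)
print(X_toy_scaled.mean(axis=0))  # approximately 0 for each column
print(X_toy_scaled.std(axis=0))   # approximately 1 for each column
```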

In [18]:
# TODO: Add import statements
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Assign the data to predictor and outcome variables
# TODO: Load the data
train_data = pd.read_csv('../Datasets/scaling_data.csv', header=None)
X = train_data.iloc[:, :-1]
y = train_data.iloc[:, -1]

# TODO: Create the standardization scaling object.
scaler = StandardScaler()

# TODO: Fit the standardization parameters and scale the data.
X_scaled = scaler.fit_transform(X)

# TODO: Create the linear regression model with lasso regularization.
lasso_reg = Lasso()

# TODO: Fit the model.
lasso_reg.fit(X_scaled, y)

# TODO: Retrieve and print out the coefficients from the regression model.
reg_coef = lasso_reg.coef_
print(reg_coef)

[  0.           3.90753617   9.02575748  -0.         -11.78303187
   0.45340137]
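One way to read the printed coefficients is to list which features the Lasso penalty shrank to exactly zero. The sketch below copies the values from the output above into an array (rather than refitting the model) and reports the zeroed and retained column indices, counting from 0.

```python
import numpy as np

# Coefficients copied from the printed output above.
reg_coef = np.array([0.0, 3.90753617, 9.02575748, -0.0, -11.78303187, 0.45340137])

# Features whose coefficient is exactly zero were dropped by the Lasso penalty.
zeroed = np.flatnonzero(reg_coef == 0)
kept = np.flatnonzero(reg_coef != 0)

print("zeroed:", zeroed.tolist())  # zeroed: [0, 3]
print("kept:", kept.tolist())      # kept: [1, 2, 4, 5]
```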
