# Feature Scaling Solution Notebook


Feature scaling is a preprocessing technique used in machine learning to standardize or normalize the range of independent variables or features of data. It is important because many machine learning algorithms are sensitive to the scale of the input features. Feature scaling helps in ensuring that all features have the same influence on the model and prevents features with larger scales from dominating those with smaller scales.

There are two common methods for feature scaling:

Normalization (Min-Max Scaling): In this method, the features are scaled to a specific range, typically between 0 and 1. The formula for normalization is:

$$X_{\text{normalized}} = \frac{X - X_{\min}}{X_{\max} - X_{\min}}$$

where

$X$ is the original feature value,

$X_{\min}$ is the minimum value of that feature, and

$X_{\max}$ is the maximum value of that feature.

Normalization is useful when you want to constrain your features within a specific range, especially when they have different minimum and maximum values.
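As a minimal sketch of the min-max formula, the snippet below applies it by hand with NumPy to the five Salary values visible at the top of the dataset preview. The variable names `salary` and `salary_minmax` are illustrative and do not appear elsewhere in the notebook.

```python
import numpy as np

# Five Salary values copied from the first rows of the dataset preview
salary = np.array([39343, 46205, 37731, 43525, 39891], dtype=float)

# Apply (X - X_min) / (X_max - X_min) element-wise
salary_minmax = (salary - salary.min()) / (salary.max() - salary.min())

print(salary_minmax.round(3))
print("min:", salary_minmax.min(), "max:", salary_minmax.max())
```

The smallest value maps to 0 and the largest to 1; everything else lands in between, in proportion to its position within the original range.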

Standardization (Z-score Scaling): In this method, the features are scaled to have a mean of 0 and a standard deviation of 1. The formula for standardization is:


$$X_{\text{standardized}} = \frac{X - \mu}{\sigma}$$


where

X is the original feature value,

μ is the mean of that feature, and

σ is the standard deviation of that feature.

Standardization is a good choice when the features have different means and standard deviations. It does not bound the features to a specific range but centers them around zero.
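The z-score formula can be checked by hand in the same way. The sketch below uses the five YearsExperience values from the top of the dataset preview and compares the manual result with `StandardScaler`. Note that `StandardScaler` uses the population standard deviation (`ddof=0`), which is also NumPy's default for `std()`. The names `years`, `years_std`, and `sk_result` are illustrative.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Five YearsExperience values copied from the first rows of the dataset preview
years = np.array([1.1, 1.3, 1.5, 2.0, 2.2])

# Manual z-score: subtract the mean, divide by the population standard deviation
mu = years.mean()
sigma = years.std()  # ddof=0 by default, matching StandardScaler
years_std = (years - mu) / sigma

# StandardScaler expects a 2D array of shape (n_samples, n_features)
sk_result = StandardScaler().fit_transform(years.reshape(-1, 1)).ravel()

print("manual and StandardScaler agree:", np.allclose(years_std, sk_result))
print("mean after scaling:", round(float(years_std.mean()), 10))
print("std after scaling: ", round(float(years_std.std()), 10))
```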

When to use which method depends on the nature of your data and the requirements of the machine learning algorithm you're using. Some algorithms, like support vector machines and k-nearest neighbors, can be sensitive to the scale of features, and standardization often works well with them. Other algorithms, like decision trees and random forests, are invariant to feature scaling.
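To see why distance-based algorithms are scale-sensitive, the following sketch measures how much of the squared Euclidean distance between two rows comes from Salary, before and after standardization. It uses the Salary and YearsExperience values from the first five preview rows; the helper `salary_share` is a hypothetical name introduced only for this illustration.

```python
import numpy as np

# Salary and YearsExperience for the first five rows of the dataset preview
X_small = np.array([
    [39343, 1.1],
    [46205, 1.3],
    [37731, 1.5],
    [43525, 2.0],
    [39891, 2.2],
])

def salary_share(a, b):
    """Fraction of the squared Euclidean distance contributed by column 0 (Salary)."""
    squared_diffs = (a - b) ** 2
    return squared_diffs[0] / squared_diffs.sum()

# Raw features: Salary differences are in the hundreds, experience differences near 1
raw_share = salary_share(X_small[0], X_small[4])

# Standardize each column, then repeat the comparison on the same two rows
X_small_std = (X_small - X_small.mean(axis=0)) / X_small.std(axis=0)
scaled_share = salary_share(X_small_std[0], X_small_std[4])

print(f"Salary share of squared distance, raw:    {raw_share:.6f}")
print(f"Salary share of squared distance, scaled: {scaled_share:.6f}")
```

On the raw values, Salary accounts for practically the entire distance even though these two rows differ by a full year of experience. After standardization, each feature contributes according to how far apart the rows are relative to that feature's spread.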

It's important to note that the choice of feature scaling method can impact the performance of your machine learning model, so it's often a good practice to experiment with both methods and see which one works better for your specific dataset and model.

### Importing Libraries

In [1]:
import pandas as pd
import numpy as np

### Importing Dataset

In [2]:
df = pd.read_csv("Dataset_03.csv")
df.head(15)

   Australia  Canada  Dubai  USA  Salary  YearsExperience  Purchased
0          0       0      1    0   39343              1.1          0
1          0       1      0    0   46205              1.3          1
2          0       1      0    0   37731              1.5          0
3          0       1      0    0   43525              2.0          0
4          0       0      0    1   39891              2.2          0
5          0       0      1    0   56642              2.9          0
6          0       1      0    0   60150              3.0          1
7          1       0      0    0   54445              3.2          0
8          0       0      1    0   64445              3.2          1
9          0       0      1    0   57189              3.7          0


In [3]:
# Features: every column except the last; target: the last column (Purchased)
X = df.iloc[:, :-1]
y = df.iloc[:, -1]

### Splitting Dataset

In [4]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

### Perform Feature Scaling

In [5]:
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Initialize the scaler (swap in MinMaxScaler() to normalize to [0, 1] instead)
scaler = StandardScaler()

# Fit the scaler to the training data and transform it
X_train_scaled = scaler.fit_transform(X_train)

# Transform the test data using the same scaler
X_test_scaled = scaler.transform(X_test)
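The reason for calling `fit_transform` on the training set but only `transform` on the test set is that the scaler stores the training statistics and reuses them. The sketch below illustrates this on toy arrays (an assumption for illustration, not the notebook's data); `demo_scaler`, `X_train_demo`, and `X_test_demo` are hypothetical names.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy single-feature data, used only for illustration
X_train_demo = np.array([[10.0], [20.0], [30.0]])
X_test_demo = np.array([[40.0]])

# fit() learns the mean and standard deviation from the training data only
demo_scaler = StandardScaler().fit(X_train_demo)
print("learned mean:", demo_scaler.mean_)
print("learned std: ", demo_scaler.scale_)

# transform() applies those stored training statistics to unseen data
test_scaled_demo = demo_scaler.transform(X_test_demo)
manual = (X_test_demo - X_train_demo.mean()) / X_train_demo.std()
print("matches manual computation:", np.allclose(test_scaled_demo, manual))
```

Fitting a second scaler on the test set would use different statistics and leak information about the test data into preprocessing, so the test set is only ever transformed.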


In [6]:
X_train

    Australia  Canada  Dubai  USA  Salary  YearsExperience
26          0       0      1    0  116969              9.5
3           0       1      0    0   43525              2.0
24          0       0      1    0  109431              8.7
22          1       0      0    0  101302              7.9
23          1       0      0    0  113812              8.2
4           0       0      0    1   39891              2.2
2           0       1      0    0   37731              1.5
25          1       0      0    0  105582              9.0
6           0       1      0    0   60150              3.0
18          1       0      0    0   81363              5.9


In [7]:
X_test

    Australia  Canada  Dubai  USA  Salary  YearsExperience
17          0       1      0    0   83088              5.3
21          1       0      0    0   98273              7.1
10          0       0      1    0   63218              3.9
19          0       1      0    0   93940              6.0
14          1       0      0    0   61111              4.5
20          0       1      0    0   91738              6.8


In [8]:
X_train_scaled

array([[-0.51298918, -0.57735027,  1.55838744, -0.57735027,  1.4613054 ,
         1.39108016],
       [-0.51298918,  1.73205081, -0.64168895, -0.57735027, -1.06760255,
        -1.05896317],
       [-0.51298918, -0.57735027,  1.55838744, -0.57735027,  1.20174835,
         1.1297422 ],
       [ 1.94935887, -0.57735027, -0.64168895, -0.57735027,  0.92184131,
         0.86840425],
       [ 1.94935887, -0.57735027, -0.64168895, -0.57735027,  1.35259996,
         0.96640598],
       [-0.51298918, -0.57735027, -0.64168895,  1.73205081, -1.1927326 ,
        -0.99362869],
       [-0.51298918,  1.73205081, -0.64168895, -0.57735027, -1.2671082 ,
        -1.2222994 ],
       [ 1.94935887, -0.57735027, -0.64168895, -0.57735027,  1.06921517,
         1.22774394],
       [-0.51298918,  1.73205081, -0.64168895, -0.57735027, -0.4951515 ,
        -0.73229073],
       [ 1.94935887, -0.57735027, -0.64168895, -0.57735027,  0.23527882,
         0.21505936],
       [-0.51298918,  1.73205081, -0.64168895, -0.

In [10]:
import pandas as pd

# Keep a copy of the unscaled training features for comparison
original_data = X_train.copy()

# Wrap the standardized array from In [5] in a DataFrame, reusing the X_train column names.
# The row index is reset to 0..n-1, so rows match original_data by position, not by label.
scaled_df = pd.DataFrame(X_train_scaled, columns=X_train.columns)

# Compare the first few rows of the original and scaled data
print("Original Data:")
print(original_data.head())

print("\nScaled Data:")
print(scaled_df.head())


Original Data:
    Australia  Canada  Dubai  USA  Salary  YearsExperience
26          0       0      1    0  116969              9.5
3           0       1      0    0   43525              2.0
24          0       0      1    0  109431              8.7
22          1       0      0    0  101302              7.9
23          1       0      0    0  113812              8.2

Scaled Data:
   Australia    Canada     Dubai      USA    Salary  YearsExperience
0  -0.512989 -0.577350  1.558387 -0.57735  1.461305         1.391080
1  -0.512989  1.732051 -0.641689 -0.57735 -1.067603        -1.058963
2  -0.512989 -0.577350  1.558387 -0.57735  1.201748         1.129742
3   1.949359 -0.577350 -0.641689 -0.57735  0.921841         0.868404
4   1.949359 -0.577350 -0.641689 -0.57735  1.352600         0.966406


Training a machine learning model on the original data or on the scaled data can lead to different results, depending on the algorithm. The implications for this dataset are as follows:

Training with Original Data:

When you train a model with the original data, it means that the model will work directly with the raw, unscaled feature values. Here are the implications:

Differences in Feature Scales: In the rows shown above, 'Salary' ranges from roughly 38,000 to 117,000, while 'YearsExperience' stays in the single digits. Algorithms based on distance metrics (e.g., k-nearest neighbors) or trained with gradient descent (e.g., linear regression fitted iteratively) are sensitive to such differences: the large-scale feature dominates distance computations and can slow convergence.

Impact of Categorical Features: The country columns ('Australia,' 'Canada,' 'Dubai,' and 'USA') are already one-hot encoded, so they only take the values 0 or 1. Next to an unscaled 'Salary' column, their contribution to a distance computation is negligible.

Interpretability: Models trained on the original data may have coefficients or feature importances that are challenging to interpret because they depend on the original feature scales.

Training with Scaled Data:

When you train a model with scaled data (e.g., after standardization), it means that all features have been transformed to have a mean of 0 and a standard deviation of 1. Here are the implications:

Consistent Feature Scales: All features have the same scale, making it easier for many machine learning algorithms to process the data. This can improve the model's performance, especially when using distance-based algorithms or optimization techniques.

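Handling Categorical Features: Because the scaler is applied to every column, the one-hot country columns are standardized too. They are no longer 0/1 indicators (in the scaled output above, 'Australia' becomes about -0.51 or 1.95). This puts them on a scale comparable to the numeric features, but makes them harder to read. A common alternative is to scale only 'Salary' and 'YearsExperience' and leave the indicator columns unchanged.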

Interpretability: Models trained on scaled data may have more interpretable coefficients or feature importances since the scaling process brings all features to a common scale.
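If the one-hot columns should stay as 0/1 indicators, one option is scikit-learn's `ColumnTransformer`, which applies a scaler to selected columns and passes the rest through. The sketch below is not part of the original notebook; it uses three rows copied from the dataset preview, and `demo`, `numeric_cols`, `dummy_cols`, and `scaled_demo` are illustrative names.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

# Three rows copied from the dataset preview (rows 0, 1 and 7)
demo = pd.DataFrame({
    "Australia": [0, 0, 1],
    "Canada": [0, 1, 0],
    "Dubai": [1, 0, 0],
    "USA": [0, 0, 0],
    "Salary": [39343, 46205, 54445],
    "YearsExperience": [1.1, 1.3, 3.2],
})

numeric_cols = ["Salary", "YearsExperience"]
dummy_cols = [c for c in demo.columns if c not in numeric_cols]

# Standardize only the numeric columns; pass the indicator columns through unchanged
transformer = ColumnTransformer(
    transformers=[("num", StandardScaler(), numeric_cols)],
    remainder="passthrough",
)
scaled_values = transformer.fit_transform(demo)

# Output order: transformed columns first, then the passthrough columns
scaled_demo = pd.DataFrame(
    scaled_values, columns=numeric_cols + dummy_cols, index=demo.index
)
print(scaled_demo)
```

Whether this helps depends on the model; for distance-based methods it changes the relative weight of the country indicators, so it is worth evaluating both variants.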

In most cases, it's recommended to train models with scaled data because it often leads to better model performance and interpretability. However, the choice of whether to use scaled or original data depends on the specific machine learning algorithm you're using and the nature of your data. Some algorithms, like decision trees or random forests, are less sensitive to feature scales and may perform reasonably well with the original data.

Ultimately, it's a good practice to try both approaches (with and without scaling) and evaluate the model's performance to determine which one works better for your particular dataset and problem. Scikit-Learn provides tools for easy integration of feature scaling into your machine learning pipelines.
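A comparison with and without scaling could be sketched as follows, using `make_pipeline` so that the scaler is refit on each cross-validation training fold. The data here is synthetic (an assumption for illustration, not the notebook's dataset): one informative small-scale feature and one uninformative large-scale feature. All variable names are hypothetical.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: the label depends only on the small-scale feature
rng = np.random.default_rng(0)
n_samples = 200
informative = rng.normal(size=n_samples)            # scale ~1, determines the label
noise = rng.normal(scale=10_000, size=n_samples)    # scale ~10,000, carries no signal
X_demo = np.column_stack([noise, informative])
y_demo = (informative > 0).astype(int)

# Same classifier, once on raw features and once behind a StandardScaler
raw_model = KNeighborsClassifier(n_neighbors=5)
scaled_model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

raw_acc = cross_val_score(raw_model, X_demo, y_demo, cv=5).mean()
scaled_acc = cross_val_score(scaled_model, X_demo, y_demo, cv=5).mean()

print(f"k-NN accuracy without scaling: {raw_acc:.2f}")
print(f"k-NN accuracy with scaling:    {scaled_acc:.2f}")
```

Without scaling, the nearest neighbors are chosen almost entirely by the noise feature, so the unscaled model has little to work with. Placing the scaler inside the pipeline also keeps the test folds out of the fitting step.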

### Normalization


In [11]:
from sklearn.preprocessing import MinMaxScaler

# Initialize the scaler
scaler = MinMaxScaler()

# Fit the scaler to the training data and transform it
X_train_scaled1 = scaler.fit_transform(X_train)

# Transform the test data using the same scaler
X_test_scaled1 = scaler.transform(X_test)


In [12]:
import pandas as pd

# Keep a copy of the unscaled training features for comparison
original_data = X_train.copy()

# Wrap the min-max scaled array from In [11] in a DataFrame, reusing the X_train column names.
# The row index is reset to 0..n-1, so rows match original_data by position, not by label.
scaled_df = pd.DataFrame(X_train_scaled1, columns=X_train.columns)

# Compare the first few rows of the original and scaled data
print("Original Data:")
print(original_data.head())

print("\nScaled Data:")
print(scaled_df.head())


Original Data:
    Australia  Canada  Dubai  USA  Salary  YearsExperience
26          0       0      1    0  116969              9.5
3           0       1      0    0   43525              2.0
24          0       0      1    0  109431              8.7
22          1       0      0    0  101302              7.9
23          1       0      0    0  113812              8.2

Scaled Data:
   Australia  Canada  Dubai  USA    Salary  YearsExperience
0        0.0     0.0    1.0  0.0  0.935956         0.893617
1        0.0     1.0    0.0  0.0  0.068438         0.095745
2        0.0     0.0    1.0  0.0  0.846917         0.808511
3        1.0     0.0    0.0  0.0  0.750898         0.723404
4        1.0     0.0    0.0  0.0  0.898665         0.755319
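One detail worth keeping in mind: the 0 to 1 range is only guaranteed for the data the scaler was fitted on. Test values outside the training minimum or maximum map outside that range. The sketch below shows this on toy arrays (an assumption for illustration, not the notebook's data) and also uses `inverse_transform` to recover the original values; the variable names are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy single-feature data: the test values fall outside the training range [2, 6]
train_demo = np.array([[2.0], [4.0], [6.0]])
test_demo = np.array([[1.0], [8.0]])

mm_scaler = MinMaxScaler().fit(train_demo)

train_mm = mm_scaler.transform(train_demo)  # stays within [0, 1]
test_mm = mm_scaler.transform(test_demo)    # (1 - 2) / 4 and (8 - 2) / 4

print("train scaled:", train_mm.ravel())
print("test scaled: ", test_mm.ravel())

# inverse_transform maps scaled values back to the original units
recovered = mm_scaler.inverse_transform(test_mm)
print("recovered test values:", recovered.ravel())
```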


### Check it

In [None]:
X_train

In [None]:
X_test