
# Feature Scaling Solution Notebook

### Importing Libraries

In [None]:
import pandas as pd
import numpy as np

### Importing Dataset

In [None]:
df = pd.read_csv("Dataset_03.csv")
df.head()

Unnamed: 0,Australia,Canada,Dubai,USA,Salary,YearsExperience,Purchased
0,0,0,1,0,39343,1.1,0
1,0,1,0,0,46205,1.3,1
2,0,1,0,0,37731,1.5,0
3,0,1,0,0,43525,2.0,0
4,0,0,0,1,39891,2.2,0


In [None]:
# X = df.drop("Purchased",axis = 1)
# y = df["Purchased"]

# y = df.iloc[::,-1]
# X = df.iloc[::,:-1]
# X.head(), y.head()

# Scaling or Normalization

* **Scaling**: Transforming your data so that it fits within a specific range, such as 0-100 or 0-1. Scale your data when you use methods based on how far apart data points are, such as support vector machines (SVM) or k-nearest neighbors (KNN).
* **Normalization**: The term is used loosely. In this notebook it covers two rescaling operations: standardization, which shifts a feature to mean 0 and standard deviation 1, and unit-norm scaling with `sklearn.preprocessing.normalize`, which divides a vector by its length.
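
As a minimal sketch of the two most common rescalings, the cell below applies min-max scaling and standardization by hand. The five values are the first five `Salary` entries shown above; the variable names `values`, `min_max`, and `z_score` are illustrative and are not used elsewhere in this notebook.

```python
import numpy as np

# First five Salary values from df.head(), used here only for illustration.
values = np.array([39343.0, 46205.0, 37731.0, 43525.0, 39891.0])

# Min-max scaling: the smallest value maps to 0 and the largest to 1.
min_max = (values - values.min()) / (values.max() - values.min())

# Standardization (z-score): shift to mean 0, rescale to standard deviation 1.
# NumPy's std() uses the population formula (ddof=0) by default.
z_score = (values - values.mean()) / values.std()

print(min_max)
print(z_score)
```

Min-max scaling keeps every value inside a fixed range, while standardization leaves the range open but centers the feature on zero.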

In [None]:
salary = df["Salary"]
salary[:5]

Unnamed: 0,Salary
0,39343
1,46205
2,37731
3,43525
4,39891


In [None]:
means = salary.mean()
std = salary.std()
# print(f"Mean Salary: {means:.2f}\nSalary Standard Deviation: {std:.2f}")
print(f"Mean Salary: {means}\nSalary Standard Deviation: {std}")

Mean Salary: 76003.0
Salary Standard Deviation: 27414.4297845823


In [None]:
standard_salary = (salary-means)/std
standard_salary.head()

Unnamed: 0,Salary
0,-1.337252
1,-1.086946
2,-1.396053
3,-1.184705
4,-1.317262


In [None]:
standard_salary.mean(), standard_salary.std()

(5.921189464667501e-17, 1.0)
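
One detail worth checking: pandas `Series.std()` defaults to the sample standard deviation (`ddof=1`), whereas `StandardScaler` divides by the population standard deviation (`ddof=0`). The sketch below, which assumes only the first five `Salary` values shown earlier, compares the two so that small differences between manual and scikit-learn results are not surprising.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# First five Salary values from the notebook, used as a small illustration.
salary = pd.Series([39343, 46205, 37731, 43525, 39891], name="Salary")

sample_std = salary.std()            # pandas default: ddof=1
population_std = salary.std(ddof=0)  # what StandardScaler uses internally

# Manual z-score with the population standard deviation.
manual = (salary - salary.mean()) / population_std

# The same column standardized by scikit-learn.
scaled = StandardScaler().fit_transform(salary.to_frame()).ravel()

print(sample_std, population_std)
print(np.allclose(manual, scaled))
```

With 30 rows the gap between the two definitions is small, but it is enough to make hand-computed values differ in the later decimal places.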

In [None]:
df.shape

(30, 7)

In [None]:
X = df.iloc[:,:-1]
y = df.iloc[:,-1]

### Splitting Dataset

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2,random_state = 42)

In [None]:
X_train.shape, X_train, X_test

((24, 6),
     Australia  Canada  Dubai  USA  Salary  YearsExperience
 28          0       0      0    1  122391             10.3
 24          0       0      1    0  109431              8.7
 12          0       0      1    0   56957              4.0
 0           0       0      1    0   39343              1.1
 4           0       0      0    1   39891              2.2
 16          0       0      0    1   66029              5.1
 5           0       0      1    0   56642              2.9
 13          0       1      0    0   57081              4.1
 11          0       0      0    1   55794              4.0
 22          1       0      0    0  101302              7.9
 1           0       1      0    0   46205              1.3
 2           0       1      0    0   37731              1.5
 25          1       0      0    0  105582              9.0
 3           0       1      0    0   43525              2.0
 21          1       0      0    0   98273              7.1
 26          0       0      1 

### Perform Feature Scaling

In [None]:
# Import StandardScaler
from sklearn.preprocessing import StandardScaler

# astype returns copies, and a float Salary column can hold the scaled values
X_train = X_train.astype({"Salary": "float64"})
X_test = X_test.astype({"Salary": "float64"})

# Create an instance of StandardScaler
sc = StandardScaler()

# Scale the numeric columns (Salary, YearsExperience) of X_train with fit_transform:
#   fit computes the mean and standard deviation of X_train,
#   transform applies the scaling with those values,
#   fit_transform does both in one call.
X_train.iloc[:, 4:] = sc.fit_transform(X_train.iloc[:, 4:])

# Use only transform on X_test: the mean and standard deviation are already fitted.
# Fitting on the test set would leak information about it into preprocessing.
X_test.iloc[:, 4:] = sc.transform(X_test.iloc[:, 4:])
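
To make the fit/transform distinction concrete, the sketch below fits a scaler on a small training frame and confirms that `transform` on the test frame reuses the training statistics stored in `mean_` and `scale_`. The miniature `train` and `test` frames are hypothetical stand-ins built from the first rows of the dataset, not the actual split used above.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical miniature split containing only the two numeric columns.
train = pd.DataFrame({"Salary": [39343.0, 46205.0, 37731.0, 43525.0],
                      "YearsExperience": [1.1, 1.3, 1.5, 2.0]})
test = pd.DataFrame({"Salary": [39891.0], "YearsExperience": [2.2]})

# fit() learns the per-column mean_ and scale_ from the training data only.
scaler = StandardScaler().fit(train)

# transform() applies those stored training statistics to the test data.
test_scaled = scaler.transform(test)
expected = (test.to_numpy() - scaler.mean_) / scaler.scale_

print(scaler.mean_, scaler.scale_)
print(np.allclose(test_scaled, expected))
```

Because the test rows are scaled with training statistics, their scaled mean is generally not zero, which is expected.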

### Check it

In [None]:
X_train

Unnamed: 0,Australia,Canada,Dubai,USA,Salary,YearsExperience
28,0,0,0,1,1.742862,1.774458
24,0,0,1,0,1.27408,1.219126
12,0,0,1,0,-0.62398,-0.41216
0,0,0,1,0,-1.261103,-1.418698
4,0,0,0,1,-1.241282,-1.036908
16,0,0,0,1,-0.295833,-0.03037
5,0,0,1,0,-0.635374,-0.793951
13,0,1,0,0,-0.619495,-0.377452
11,0,0,0,1,-0.666047,-0.41216
22,1,0,0,0,0.980042,0.94146


In [None]:
X_test

Unnamed: 0,Australia,Canada,Dubai,USA,Salary,YearsExperience
27,0,0,0,1,1.389973,1.5315
15,0,0,0,1,-0.226781,-0.099786
23,1,0,0,0,1.432547,1.045585
17,0,1,0,0,0.321216,0.039047
8,0,0,1,0,-0.353128,-0.689826
9,0,0,1,0,-0.615588,-0.516285


In [None]:
df["Salary"].head()

Unnamed: 0,Salary
0,39343
1,46205
2,37731
3,43525
4,39891


In [None]:
salary.head()


Unnamed: 0,Salary
0,39343
1,46205
2,37731
3,43525
4,39891


In [None]:
from sklearn.preprocessing import normalize

In [None]:
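# Note: normalize() rescales each row (axis=1 by default) to unit L2 norm.
# With one value per row, every row is divided by itself, so each result is 1.0.
salary_norm = normalize(np.array(salary).reshape(-1,1))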

In [None]:
salary_norm = pd.Series(salary_norm.ravel(), name = "Salary")

In [None]:
salary_norm

Unnamed: 0,Salary
0,1.0
1,1.0
2,1.0
3,1.0
4,1.0
5,1.0
6,1.0
7,1.0
8,1.0
9,1.0


Easier way to normalize: pass the whole column as a single row, so the entire vector is scaled to unit L2 norm.

In [None]:
Salary_norm = normalize([salary])
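
The two `normalize` calls above differ only in the shape of their input, which changes what counts as one sample. The sketch below contrasts them and also shows the `axis=0` argument as an alternative; it uses the first five `Salary` values, and the names `per_row`, `per_column`, and `single_row` are illustrative.

```python
import numpy as np
from sklearn.preprocessing import normalize

salary = np.array([39343.0, 46205.0, 37731.0, 43525.0, 39891.0])

# Column input: five rows of one value each. Each row is divided by its
# own norm, so every entry becomes 1.0.
per_row = normalize(salary.reshape(-1, 1))

# Same column input, but axis=0 treats the column as a single vector.
per_column = normalize(salary.reshape(-1, 1), axis=0)

# Single-row input: the whole vector is one sample and is scaled to unit length.
single_row = normalize([salary])

print(per_row.ravel())
print(np.linalg.norm(per_column), np.linalg.norm(single_row))
```

Both `per_column` and `single_row` hold the same values in different shapes, so either form works once the intended axis is clear.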

In [None]:
np.array([[1],[5]]).ravel()

array([1, 5])