# APPLYING FEATURE SCALING

- Feature scaling refers to techniques that standardize or normalize numerical features so that they are on a similar scale.
<br>

> Imagine you have a group of friends and you want to compare them in two ways: height in cm and weight in kg.<br>For example:<br>
> Height might be around 150–190 cm,<br>weight might be around 50–100 kg.

- If you compare them directly, the height numbers are larger than the weight numbers, so height dominates the comparison. This is a problem for many ML algorithms, especially those based on distances between data points.

- Feature scaling is like resizing all measurements so they are comparable.
> For example:<br>

> Scaling height to a 0–1 range.<br>Scaling weight to the same 0–1 range.
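
The height and weight idea above can be sketched by hand with the min-max formula `(x - min) / (max - min)`. This is only an illustration: the measurements and the helper name `min_max_scale` are made up for this example and are not part of the dataset used later.

```python
import numpy as np

# Hypothetical measurements for five friends (illustrative values only)
height_cm = np.array([150.0, 160.0, 170.0, 180.0, 190.0])
weight_kg = np.array([50.0, 60.0, 75.0, 90.0, 100.0])


def min_max_scale(values):
    """Rescale a 1-D array linearly so its minimum maps to 0 and its maximum to 1."""
    return (values - values.min()) / (values.max() - values.min())


height_scaled = min_max_scale(height_cm)
weight_scaled = min_max_scale(weight_kg)

# Both features now live on the same 0-1 range
print(height_scaled)
print(weight_scaled)
```

After scaling, a difference of 0.25 means "a quarter of the observed range" for either feature, so neither one dominates just because of its units.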


## Encoding is for categorical data
- Encoding transforms categorical data into numbers so that a machine learning model can work with it.
> For example:<br>Label Encoding:<br>
> Male = 1, Female = 0<br>One-Hot Encoding:<br>
> Male = 0,1  Female = 1,0
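
As a rough sketch of the two encodings above, the snippet below uses plain pandas on a tiny made-up column. The names `df_demo` and `label_map` are hypothetical and chosen only for this illustration.

```python
import pandas as pd

# Hypothetical toy column (illustrative values only)
df_demo = pd.DataFrame({'Gender': ['Male', 'Female', 'Female', 'Male']})

# Label encoding: map each category to a single integer
label_map = {'Female': 0, 'Male': 1}
df_demo['Gender_label'] = df_demo['Gender'].map(label_map)

# One-hot encoding: one 0/1 column per category (Gender_Female, Gender_Male)
one_hot = pd.get_dummies(df_demo['Gender'], prefix='Gender', dtype=int)

print(pd.concat([df_demo, one_hot], axis=1))
```

Label encoding gives one compact column but implies an order between categories (0 < 1); one-hot encoding avoids that at the cost of one extra column per category.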

## How to actually use feature scaling and encoding

In [88]:
import pandas as pd
from sklearn.preprocessing import StandardScaler, OneHotEncoder, LabelEncoder, MinMaxScaler, Normalizer


# Sample dataset
data = {
    'Age': [25, 30, 35, 40, 50,21,22,18,19,22],
    'Salary': [50000, 60000, 65000, 80000, 120000,120000,80000,100000,80000,130000],
    'Gender': ['Male', 'Female', 'Female', 'Male', 'Female','Male', 'Female', 'Female', 'Male', 'Female']
}

df = pd.DataFrame(data)

display(df)


# Min-Max scaling: rescale each column to the 0-1 range

min_max_scal = MinMaxScaler()
df_minMax_Scaled = min_max_scal.fit_transform(df[['Age','Salary']])
df_minMax_Scaled = pd.DataFrame(df_minMax_Scaled)

print("Min_max scaled DATA")
df[['Age','Salary']] = df_minMax_Scaled
display(df)




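# Normalizer rescales each ROW (sample) to unit L2 norm; it does not scale per column.
# Note: it is applied here to the columns that were already min-max scaled above.
normalizer = Normalizer()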
df_normalized = normalizer.fit_transform(df[['Age','Salary']])
df_normalized = pd.DataFrame(df_normalized)

print("NORMALIZED DATA")
df[['Age','Salary']] = df_normalized
display(df)




# Label-encode Gender (LabelEncoder sorts classes alphabetically: Female = 0, Male = 1)
label = LabelEncoder()

df['Gender'] = label.fit_transform(df['Gender'])
display(df)


Unnamed: 0,Age,Salary,Gender
0,25,50000,Male
1,30,60000,Female
2,35,65000,Female
3,40,80000,Male
4,50,120000,Female
5,21,120000,Male
6,22,80000,Female
7,18,100000,Female
8,19,80000,Male
9,22,130000,Female


Min_max scaled DATA


Unnamed: 0,Age,Salary,Gender
0,0.21875,0.0,Male
1,0.375,0.125,Female
2,0.53125,0.1875,Female
3,0.6875,0.375,Male
4,1.0,0.875,Female
5,0.09375,0.875,Male
6,0.125,0.375,Female
7,0.0,0.625,Female
8,0.03125,0.375,Male
9,0.125,1.0,Female


NORMALIZED DATA


Unnamed: 0,Age,Salary,Gender
0,1.0,0.0,Male
1,0.948683,0.316228,Female
2,0.94299,0.33282,Female
3,0.877896,0.478852,Male
4,0.752577,0.658505,Female
5,0.106533,0.994309,Male
6,0.316228,0.948683,Female
7,0.0,1.0,Female
8,0.083045,0.996546,Male
9,0.124035,0.992278,Female


Unnamed: 0,Age,Salary,Gender
0,1.0,0.0,1
1,0.948683,0.316228,0
2,0.94299,0.33282,0
3,0.877896,0.478852,1
4,0.752577,0.658505,0
5,0.106533,0.994309,1
6,0.316228,0.948683,0
7,0.0,1.0,0
8,0.083045,0.996546,1
9,0.124035,0.992278,0
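
The "NORMALIZED DATA" output above looks different from the min-max output because `Normalizer` works on rows, not columns. The sketch below, using the first three raw (Age, Salary) rows of the sample dataset, is one way to check this; the names `X_demo`, `col_scaled`, and `row_scaled` are made up for this example.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, Normalizer

# First three raw (Age, Salary) rows of the sample dataset
X_demo = np.array([[25, 50000], [30, 60000], [35, 65000]], dtype=float)

col_scaled = MinMaxScaler().fit_transform(X_demo)  # works column by column
row_scaled = Normalizer().fit_transform(X_demo)    # works row by row (L2 norm by default)

# Each column of the min-max result spans the 0-1 range
print(col_scaled.min(axis=0), col_scaled.max(axis=0))

# Each row of the Normalizer result has Euclidean length 1
print(np.linalg.norm(row_scaled, axis=1))
```

In other words, `MinMaxScaler` and `StandardScaler` put features on a comparable scale, while `Normalizer` changes each sample so that its vector has unit length.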


## Now this data is ready to train and test
- Below is an example.

In [89]:
from sklearn.model_selection import train_test_split

# Add a new column to the data: this is our target

df['Purchased'] = [0,1,0,1,0,1,1,0,0,1]
display(df)


# Separate features and target
X = df.drop('Purchased', axis=1)  # all columns except Purchased
y = df['Purchased']  # the target

# X variable
display(X)
# y variable
display(y)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=42
)



# Scale the features using StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit on the training set only
X_test_scaled = scaler.transform(X_test)  # reuse the training statistics; do not refit on the test set

# Train model
# multi_class is deprecated in recent scikit-learn versions; 'auto' was the default, so it is omitted
model = LogisticRegression(solver='lbfgs', max_iter=1000)
model.fit(X_train_scaled, y_train)

# Predict and evaluate
y_pred = model.predict(X_test_scaled)
print("Test Accuracy:", accuracy_score(y_test, y_pred)) 

Unnamed: 0,Age,Salary,Gender,Purchased
0,1.0,0.0,1,0
1,0.948683,0.316228,0,1
2,0.94299,0.33282,0,0
3,0.877896,0.478852,1,1
4,0.752577,0.658505,0,0
5,0.106533,0.994309,1,1
6,0.316228,0.948683,0,1
7,0.0,1.0,0,0
8,0.083045,0.996546,1,0
9,0.124035,0.992278,0,1


Unnamed: 0,Age,Salary,Gender
0,1.0,0.0,1
1,0.948683,0.316228,0
2,0.94299,0.33282,0
3,0.877896,0.478852,1
4,0.752577,0.658505,0
5,0.106533,0.994309,1
6,0.316228,0.948683,0
7,0.0,1.0,0
8,0.083045,0.996546,1
9,0.124035,0.992278,0


0    0
1    1
2    0
3    1
4    0
5    1
6    1
7    0
8    0
9    1
Name: Purchased, dtype: int64

Test Accuracy: 0.4
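
In the walkthrough above, `MinMaxScaler` and `Normalizer` were fitted on the whole dataset before the split, so the test rows influenced the scaling. One possible alternative, shown here only as a sketch and not as the method used above, is to split the raw data first and let a scikit-learn `Pipeline` fit the preprocessing on the training rows only. The names `raw`, `preprocess`, and `clf` are hypothetical; the data is the same sample dataset with the `Purchased` target.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Raw sample dataset from earlier, plus the Purchased target
raw = pd.DataFrame({
    'Age': [25, 30, 35, 40, 50, 21, 22, 18, 19, 22],
    'Salary': [50000, 60000, 65000, 80000, 120000,
               120000, 80000, 100000, 80000, 130000],
    'Gender': ['Male', 'Female', 'Female', 'Male', 'Female',
               'Male', 'Female', 'Female', 'Male', 'Female'],
    'Purchased': [0, 1, 0, 1, 0, 1, 1, 0, 0, 1],
})

X_raw = raw.drop('Purchased', axis=1)
y_raw = raw['Purchased']

# Split BEFORE any scaling or encoding
X_tr, X_te, y_tr, y_te = train_test_split(
    X_raw, y_raw, test_size=0.5, random_state=42
)

# Scale the numeric columns and one-hot encode the categorical column
preprocess = ColumnTransformer([
    ('num', StandardScaler(), ['Age', 'Salary']),
    ('cat', OneHotEncoder(handle_unknown='ignore'), ['Gender']),
])

clf = Pipeline([
    ('preprocess', preprocess),
    ('model', LogisticRegression(max_iter=1000)),
])

# Scaler and encoder statistics come from the training rows only
clf.fit(X_tr, y_tr)
print("Test accuracy:", clf.score(X_te, y_te))
```

With only ten rows, any accuracy number here is very noisy; the point of the sketch is the order of operations, not the score.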


