# **Feature Scaling** and **Encoding of Data** in Machine Learning

## **Encoding Data in Machine Learning**

- In machine learning, **encoding** means **converting categorical (non-numeric) data into numeric form** so that algorithms can process it.
- Most ML models (like Linear Regression, SVM, etc.) work only with **numerical data**, so encoding transforms **text or categorical labels** into numbers without losing important information.
- **Why Encoding is Needed**: Most algorithms perform arithmetic on their inputs and cannot compute with raw text labels, so categorical features must be encoded before they can be used. Numeric values are also faster to process than strings.

## **Types of Data Requiring Encoding**:

- **Nominal Data**: categories without any inherent order (e.g., colors or city names)
- **Ordinal Data**: categories with a meaningful order (e.g., Poor < Average < Good)

### A. **Label Encoding**: Assigns each category a unique integer label

In [1]:
from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()
data = ['Small', 'Medium', 'Large']
# Classes are sorted alphabetically before numbering: Large=0, Medium=1, Small=2
encoded = encoder.fit_transform(data)
print(encoded)

[2 1 0]
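
The mapping behind the output above can be reproduced by hand. The following is a minimal sketch in plain Python, not scikit-learn's implementation; it assumes, as `LabelEncoder` does, that the distinct classes are sorted before being numbered. The names `classes`, `to_code`, and `decoded` are illustrative only.

```python
data = ['Small', 'Medium', 'Large']

# Sort the distinct categories, then number them in that order
classes = sorted(set(data))  # ['Large', 'Medium', 'Small']
to_code = {label: code for code, label in enumerate(classes)}

# Replace every label with its integer code
encoded = [to_code[label] for label in data]
print(encoded)  # [2, 1, 0]

# The mapping is reversible, so no information is lost
decoded = [classes[code] for code in encoded]
print(decoded)  # ['Small', 'Medium', 'Large']
```

Because the integers follow alphabetical order rather than size, `Large` (0) ends up below `Small` (2). A model that treats the codes as magnitudes would read an order that is not really there, which is why plain label encoding is usually a poor fit for ordinal features.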


### B. **One-Hot Encoding**: Converts each category into its own binary (0/1) column

In [2]:
from sklearn.preprocessing import OneHotEncoder
import pandas as pd

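df = pd.DataFrame({'Color': ['Red', 'Blue', 'Green']})
# sparse_output=False returns a dense NumPy array instead of a sparse matrix
encoder = OneHotEncoder(sparse_output=False)
encoded = encoder.fit_transform(df[['Color']])
# Columns follow the sorted categories: Blue, Green, Red
print(encoded)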

[[0. 0. 1.]
 [1. 0. 0.]
 [0. 1. 0.]]
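
To see where each 1 in the matrix above comes from, the same result can be built by hand. This is a rough sketch in plain Python under the assumption that columns are ordered by sorted category name (Blue, Green, Red), which matches the output shown; `categories` and `one_hot` are illustrative names.

```python
colors = ['Red', 'Blue', 'Green']

# One column per distinct category, in sorted order
categories = sorted(set(colors))  # ['Blue', 'Green', 'Red']

# Each row gets a 1 in the column of its own category and 0 everywhere else
one_hot = [[1 if color == cat else 0 for cat in categories] for color in colors]

for color, row in zip(colors, one_hot):
    print(color, row)
# Red [0, 0, 1]
# Blue [1, 0, 0]
# Green [0, 1, 0]
```

Every row contains exactly one 1, so no category is treated as larger or smaller than another. The cost is one extra column per distinct category.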


### C. **Ordinal Encoding**: Converts categories into integers that preserve a specified order

In [3]:
from sklearn.preprocessing import OrdinalEncoder
import pandas as pd

df = pd.DataFrame({ 'Rating': ['Poor', 'Average', 'Good'] })
encoder = OrdinalEncoder(categories=[['Poor', 'Average', 'Good']])
encoded = encoder.fit_transform(df)
print(encoded)

[[0.]
 [1.]
 [2.]]
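
The effect of passing an explicit order can be illustrated without any library. The sketch below is plain Python, assuming the same order (Poor, Average, Good) used in the cell above, and contrasts it with alphabetical numbering; `order`, `rank`, and `alphabetical` are illustrative names.

```python
ratings = ['Poor', 'Average', 'Good']

# Explicit order from lowest to highest, as passed to `categories=` above
order = ['Poor', 'Average', 'Good']
rank = {label: i for i, label in enumerate(order)}
ordinal = [rank[r] for r in ratings]
print(ordinal)  # [0, 1, 2]

# For comparison: numbering the categories alphabetically loses the ranking
alphabetical = {label: i for i, label in enumerate(sorted(set(ratings)))}
alpha_codes = [alphabetical[r] for r in ratings]
print(alpha_codes)  # [2, 0, 1]
```

With the explicit order, a larger code always means a better rating. With alphabetical numbering, `Poor` would receive the highest code, which is the opposite of what the data means.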


### D. **Binary Encoding**: Converts each category's integer code into **binary digits** and splits the digits into separate columns

In [4]:
!pip install category_encoders






In [5]:
import category_encoders as ce
import pandas as pd

df = pd.DataFrame({ 'City': ['Delhi', 'Mumbai', 'Kolkata'] })
encoder = ce.BinaryEncoder(cols=['City'])
encoded = encoder.fit_transform(df)
print(encoded)

   City_0  City_1
0       0       1
1       1       0
2       1       1
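
The two columns above can be derived by hand. The following is a simplified sketch in plain Python, not the `category_encoders` implementation. It assumes that categories are numbered from 1 in order of first appearance, an assumption inferred from the output shown, and that each number is then written in binary with a fixed width. The names `codes`, `width`, and `rows` are illustrative.

```python
cities = ['Delhi', 'Mumbai', 'Kolkata']

# Assumption: integer codes start at 1 and follow order of first appearance
codes = {}
for city in cities:
    codes.setdefault(city, len(codes) + 1)
print(codes)  # {'Delhi': 1, 'Mumbai': 2, 'Kolkata': 3}

# Number of binary digits needed to represent the largest code
width = max(codes.values()).bit_length()

# Write each code in binary and split the digits into separate columns
rows = [[int(bit) for bit in format(codes[city], f'0{width}b')] for city in cities]
for city, row in zip(cities, rows):
    print(city, row)
# Delhi [0, 1]
# Mumbai [1, 0]
# Kolkata [1, 1]
```

Three categories fit into 2 columns here, compared with 3 for one-hot encoding. The saving grows with cardinality: under the same scheme, 100 categories would need only 7 columns.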


### E. **Target Encoding**: Replaces the **category** with the **mean of the target variable** for that category

In [9]:
import category_encoders as ce
import pandas as pd

df = pd.DataFrame({ 
    'City': ['Delhi', 'Mumbai', 'Delhi', 'Kolkata', 'Mumbai'],
    'House_Price': [10.5, 12.0, 11.0, 8.0, 13.0]
})

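# Apply the target encoder: fit on the feature column and the target column
encoder = ce.TargetEncoder(cols=['City'])
df['City_Encoded'] = encoder.fit_transform(df['City'], df['House_Price'])
# Print the DataFrame with the new column; `encoded` is a leftover variable
# from the Binary Encoding cell and would print that cell's result instead
print(df)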
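
The core idea, replacing each category with the mean target value of its group, can be worked through by hand on the same data. This is a minimal sketch in plain Python that computes raw per-category means. Library encoders such as `ce.TargetEncoder` generally blend these means with the overall mean (smoothing), so their output may differ from the values below. The names `grouped`, `category_mean`, and `encoded_city` are illustrative.

```python
from collections import defaultdict

cities = ['Delhi', 'Mumbai', 'Delhi', 'Kolkata', 'Mumbai']
prices = [10.5, 12.0, 11.0, 8.0, 13.0]

# Collect the target values observed for each category
grouped = defaultdict(list)
for city, price in zip(cities, prices):
    grouped[city].append(price)

# Raw (unsmoothed) mean of the target per category
category_mean = {city: sum(vals) / len(vals) for city, vals in grouped.items()}
print(category_mean)  # {'Delhi': 10.75, 'Mumbai': 12.5, 'Kolkata': 8.0}

# Replace each category with the mean of its group
encoded_city = [category_mean[city] for city in cities]
print(encoded_city)  # [10.75, 12.5, 10.75, 8.0, 12.5]
```

Note that `Kolkata` appears only once, so its encoded value is simply its own house price. Cases like this are the reason library implementations smooth toward the overall mean.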
