# Label Encoding



**Label Encoding:**

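- Converts each category into an integer label (0, 1, 2, ...).
- Preserves ordinal relationships only if the integer codes follow the categories' natural order.
- May introduce unintended ordinal relationships for nominal data, because a model can treat the codes as ordered quantities.
- Suitable for features with a clear order, such as low, medium, high.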
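A minimal sketch of the ordering caveat, using a hypothetical `sizes` series (not part of the loan dataset). scikit-learn's `LabelEncoder` assigns codes in sorted (alphabetical) order, so an explicit mapping is one way to make the codes follow the intended order low < medium < high:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical ordinal feature, for illustration only
sizes = pd.Series(['low', 'high', 'medium', 'low'])

# LabelEncoder sorts the classes alphabetically: high=0, low=1, medium=2
le = LabelEncoder()
alphabetical_codes = le.fit_transform(sizes)
print(le.classes_.tolist())        # ['high', 'low', 'medium']
print(alphabetical_codes.tolist()) # [1, 0, 2, 1]

# An explicit dictionary keeps the natural order: low=0, medium=1, high=2
order = {'low': 0, 'medium': 1, 'high': 2}
ordered_codes = sizes.map(order)
print(ordered_codes.tolist())      # [0, 2, 1, 0]
```

With the alphabetical codes, `high` ends up *below* `low`, which is exactly the kind of unintended ordering the bullets warn about.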



In [1]:
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings('ignore')

In [2]:
data = pd.read_csv('Loan Prediction Dataset.csv')
data.head()

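|   | Loan_ID | Gender | Married | Dependents | Education | Self_Employed | ApplicantIncome | CoapplicantIncome | LoanAmount | Loan_Amount_Term | Credit_History | Property_Area | Loan_Status |
|---|---------|--------|---------|------------|-----------|---------------|-----------------|-------------------|------------|------------------|----------------|---------------|-------------|
| 0 | LP001002 | Male | No | 0 | Graduate | No | 5849 | 0.0 | NaN | 360.0 | 1.0 | Urban | Y |
| 1 | LP001003 | Male | Yes | 1 | Graduate | No | 4583 | 1508.0 | 128.0 | 360.0 | 1.0 | Rural | N |
| 2 | LP001005 | Male | Yes | 0 | Graduate | Yes | 3000 | 0.0 | 66.0 | 360.0 | 1.0 | Urban | Y |
| 3 | LP001006 | Male | Yes | 0 | Not Graduate | No | 2583 | 2358.0 | 120.0 | 360.0 | 1.0 | Urban | Y |
| 4 | LP001008 | Male | No | 0 | Graduate | No | 6000 | 0.0 | 141.0 | 360.0 | 1.0 | Urban | Y |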


In [5]:
# .copy() makes df an independent frame, so assigning encoded columns to it is safe
df = data[['Gender','Married','Education','Self_Employed']].copy()

In [6]:
df

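|     | Gender | Married | Education | Self_Employed |
|-----|--------|---------|-----------|---------------|
| 0   | Male   | No      | Graduate  | No            |
| 1   | Male   | Yes     | Graduate  | No            |
| 2   | Male   | Yes     | Graduate  | Yes           |
| 3   | Male   | Yes     | Not Graduate | No         |
| 4   | Male   | No      | Graduate  | No            |
| ... | ...    | ...     | ...       | ...           |
| 609 | Female | No      | Graduate  | No            |
| 610 | Male   | Yes     | Graduate  | No            |
| 611 | Male   | Yes     | Graduate  | No            |
| 612 | Male   | Yes     | Graduate  | No            |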


In [7]:
# label encoding

from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
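A fitted `LabelEncoder` stores the categories it saw in `classes_` (sorted), and `inverse_transform` maps codes back to the original strings. A small sketch on made-up values mirroring the `Education` column:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy values mirroring the Education column
education = pd.Series(['Graduate', 'Not Graduate', 'Graduate'])

le = LabelEncoder()
codes = le.fit_transform(education)   # learn the classes, then encode
print(le.classes_.tolist())           # ['Graduate', 'Not Graduate']
print(codes.tolist())                 # [0, 1, 0]

# inverse_transform recovers the original strings from the codes
restored = le.inverse_transform(codes)
print(restored.tolist())              # ['Graduate', 'Not Graduate', 'Graduate']
```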

In [11]:
df['Gender'] = le.fit_transform(df['Gender'])

In [13]:
# encode each column in turn (le is refit on every iteration)
for col in df.columns:
    df[col] = le.fit_transform(df[col])

In [14]:
df

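|     | Gender | Married | Education | Self_Employed |
|-----|--------|---------|-----------|---------------|
| 0   | 1      | 0       | 0         | 0             |
| 1   | 1      | 1       | 0         | 0             |
| 2   | 1      | 1       | 0         | 1             |
| 3   | 1      | 1       | 1         | 0             |
| 4   | 1      | 0       | 0         | 0             |
| ... | ...    | ...     | ...       | ...           |
| 609 | 0      | 0       | 0         | 0             |
| 610 | 1      | 1       | 0         | 0             |
| 611 | 1      | 1       | 0         | 0             |
| 612 | 1      | 1       | 0         | 0             |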
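Because a single `le` is refit on every column, after the loop it only remembers the mapping of the last column it saw. A sketch that keeps one fitted encoder per column so each mapping can still be inspected or inverted; the `encoders` dictionary and the toy `frame` are illustrative choices, not part of the original notebook:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy frame with the same kind of Male/Female and Yes/No columns
frame = pd.DataFrame({
    'Gender': ['Male', 'Female', 'Male'],
    'Married': ['No', 'Yes', 'Yes'],
})

# One fitted encoder per column, keyed by column name
encoders = {}
encoded = frame.copy()
for col in frame.columns:
    encoders[col] = LabelEncoder()
    encoded[col] = encoders[col].fit_transform(frame[col])

print(encoded)
# Recover the original Gender labels from that column's own encoder
print(encoders['Gender'].inverse_transform(encoded['Gender']).tolist())  # ['Male', 'Female', 'Male']
```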


In [28]:
lab_df = data[['Gender','Married','Education','Self_Employed']].copy()

In [29]:
lab_df['Gender'].value_counts()

Male      489
Female    112
Name: Gender, dtype: int64

In [31]:
# manual label encoding with an explicit mapping
lab = {'Male': 1, 'Female': 0}

lab_df['Gender'] = lab_df['Gender'].map(lab)

In [32]:
lab_df

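|     | Gender | Married | Education | Self_Employed |
|-----|--------|---------|-----------|---------------|
| 0   | 1.0    | No      | Graduate  | No            |
| 1   | 1.0    | Yes     | Graduate  | No            |
| 2   | 1.0    | Yes     | Graduate  | Yes           |
| 3   | 1.0    | Yes     | Not Graduate | No         |
| 4   | 1.0    | No      | Graduate  | No            |
| ... | ...    | ...     | ...       | ...           |
| 609 | 0.0    | No      | Graduate  | No            |
| 610 | 1.0    | Yes     | Graduate  | No            |
| 611 | 1.0    | Yes     | Graduate  | No            |
| 612 | 1.0    | Yes     | Graduate  | No            |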
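The `1.0` values above show that the mapped column is float. `Series.map` returns `NaN` for any value not in the dictionary, and since `value_counts()` listed only `Male` and `Female`, the unmapped entries are missing values; a column holding `NaN` cannot keep an integer dtype. A sketch on toy data showing the effect, plus one option (pandas' nullable `Int64` dtype) for keeping integer codes:

```python
import numpy as np
import pandas as pd

# Toy Gender column with one missing entry
gender = pd.Series(['Male', 'Female', np.nan, 'Male'])
lab = {'Male': 1, 'Female': 0}

mapped = gender.map(lab)
print(mapped.dtype)         # float64, because the NaN stays unmapped
print(mapped.isna().sum())  # 1

# Nullable integer dtype keeps 1/0 and marks the gap as <NA>
as_int = mapped.astype('Int64')
print(as_int)
```

Whether to keep, impute, or drop the missing values is a separate preprocessing decision; the nullable dtype only avoids the silent conversion to float.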


# One-Hot Encoding

**One-Hot Encoding:**

- Creates a binary (0 or 1) column for each category.
- Represents nominal categories without introducing ordinal relationships.
- Increases the dimensionality of the dataset.
- Ideal for features with no inherent order, like colors or types.
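A minimal sketch of one-hot encoding with `pd.get_dummies` on a hypothetical color feature: each category becomes its own 0/1 column, so every row has exactly one `1` and no order is implied between colors:

```python
import pandas as pd

# Hypothetical nominal feature with no natural order
colors = pd.Series(['red', 'green', 'blue', 'green'])

# One binary column per category (columns come out sorted: blue, green, red);
# dtype=int gives 0/1 instead of True/False
one_hot = pd.get_dummies(colors, dtype=int)
print(one_hot)

# Each row is "hot" in exactly one column
print(one_hot.sum(axis=1).tolist())  # [1, 1, 1, 1]
```

Three categories produce three new columns, which is the dimensionality increase mentioned above.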

In [33]:
on_data = data[['Gender','Married','Education','Self_Employed']]

In [35]:
# dtype=int returns 0/1 columns (recent pandas versions default to True/False)
pd.get_dummies(on_data['Gender'], drop_first=True, dtype=int)

|     | Male |
|-----|------|
| 0   | 1    |
| 1   | 1    |
| 2   | 1    |
| 3   | 1    |
| 4   | 1    |
| ... | ...  |
| 609 | 0    |
| 610 | 1    |
| 611 | 1    |
| 612 | 1    |
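With `drop_first=True`, `get_dummies` drops the first category (in sorted order) of each feature, because that column can be inferred from the remaining ones; for `Gender` only `Male` is kept. A sketch applying the same call to several columns at once, on a toy frame that mirrors `on_data` (the values are made up):

```python
import pandas as pd

# Toy frame mirroring the selected loan columns
frame = pd.DataFrame({
    'Gender': ['Male', 'Female', 'Male'],
    'Married': ['No', 'Yes', 'Yes'],
    'Education': ['Graduate', 'Not Graduate', 'Graduate'],
})

# Encode every column at once; new columns are named <column>_<category>
encoded = pd.get_dummies(frame, drop_first=True, dtype=int)
print(encoded.columns.tolist())
# ['Gender_Male', 'Married_Yes', 'Education_Not Graduate']
print(encoded)
```

For binary features like these, the result is one 0/1 column per feature, the same information label encoding gave, but without relying on any ordering.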
