# Label Encoder

In [1]:
import pandas as pd
from sklearn.preprocessing import LabelEncoder

In [2]:
dataset = pd.read_csv('Data.csv')
dataset.head()

   Country   Age   Salary Purchased
0   France  44.0      NaN        No
1    Spain   NaN  48000.0       Yes
2  Germany  30.0  54000.0        No
3    Spain  38.0  61000.0        No
4  Germany  40.0      NaN       Yes


In [3]:
dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Country    10 non-null     object 
 1   Age        8 non-null      float64
 2   Salary     8 non-null      float64
 3   Purchased  10 non-null     object 
dtypes: float64(2), object(2)
memory usage: 452.0+ bytes


In [4]:
dataset.describe()

             Age        Salary
count   8.000000      8.000000
mean   40.250000  62750.000000
std     6.734771  12691.391908
min    30.000000  48000.000000
25%    36.500000  53500.000000
50%    39.000000  59500.000000
75%    45.000000  70000.000000
max    50.000000  83000.000000


In [5]:
dataset.isna().sum()

Country      0
Age          2
Salary       2
Purchased    0
dtype: int64

In [6]:
dataset.columns

Index(['Country', 'Age', 'Salary', 'Purchased'], dtype='object')

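`LabelEncoder` is a scikit-learn class that encodes categorical labels as integers. It sorts the unique labels in the input and assigns each one an integer from 0 to `n_classes - 1`. The scikit-learn documentation describes it as intended for target values (`y`) rather than input features.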

Here's an example of how to use `LabelEncoder` to encode categorical labels:

In [7]:
le_country = LabelEncoder()
le_purchased = LabelEncoder()
dataset['Country'] = le_country.fit_transform(dataset['Country'])
dataset['Purchased'] = le_purchased.fit_transform(dataset['Purchased'])

In this example, we create two `LabelEncoder` objects, `le_country` and `le_purchased`, one for each column. Calling `fit_transform` on each encoder learns the unique labels in its column and replaces them with their integer codes.
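`fit_transform` combines two steps: `fit` learns the labels and `transform` maps them to integers. The sketch below separates the two on a small made-up list (the `train_countries` values are an assumption for illustration, not rows from `Data.csv`) and shows that `transform` rejects a label it did not see during `fit`.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical labels for illustration; not taken from Data.csv
train_countries = ['France', 'Spain', 'Germany', 'Spain']

le = LabelEncoder()
le.fit(train_countries)                      # learn the sorted unique labels
train_codes = le.transform(train_countries)  # map each label to its integer code
print(train_codes.tolist())                  # [0, 2, 1, 2]

# transform() only knows the labels seen during fit()
try:
    le.transform(['Italy'])
    unseen_rejected = False
except ValueError:
    unseen_rejected = True
print(unseen_rejected)                       # True
```

Keeping one fitted encoder per column, as the cell above does, makes it possible to reuse the same mapping on new data later.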

In [8]:
print(le_country.classes_)
print(le_purchased.classes_)

['France' 'Germany' 'Spain']
['No' 'Yes']
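
The position of each label in `classes_` is the integer it is encoded as. A minimal sketch of building an explicit label-to-code lookup, using a made-up list (`countries` and `mapping` are names introduced here for illustration):

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical labels for illustration; not taken from Data.csv
countries = ['France', 'Spain', 'Germany', 'Spain']

le = LabelEncoder()
codes = le.fit_transform(countries)

# classes_ is sorted, so the index of each label is its code
mapping = {str(label): code for code, label in enumerate(le.classes_)}
print(mapping)         # {'France': 0, 'Germany': 1, 'Spain': 2}
print(codes.tolist())  # [0, 2, 1, 2]
```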


In [9]:
print(le_country.inverse_transform(dataset['Country']))
print(le_purchased.inverse_transform(dataset['Purchased']))

['France' 'Spain' 'Germany' 'Spain' 'Germany' 'France' 'Spain' 'France'
 'Germany' 'France']
['No' 'Yes' 'No' 'No' 'Yes' 'Yes' 'No' 'Yes' 'No' 'Yes']
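
Because `inverse_transform` reverses the mapping, encoding and then decoding should return the original labels. A small round-trip check on a made-up Series (the `purchased` values are an assumption for illustration):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical labels for illustration; not taken from Data.csv
purchased = pd.Series(['No', 'Yes', 'No', 'Yes'])

le = LabelEncoder()
encoded = le.fit_transform(purchased)    # integer codes
decoded = le.inverse_transform(encoded)  # back to the original labels

# The round trip should reproduce the original labels exactly
round_trip_ok = decoded.tolist() == purchased.tolist()
print(encoded.tolist())  # [0, 1, 0, 1]
print(round_trip_ok)     # True
```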


In [10]:
dataset

   Country   Age   Salary  Purchased
0        0  44.0      NaN          0
1        2   NaN  48000.0          1
2        1  30.0  54000.0          0
3        2  38.0  61000.0          0
4        1  40.0      NaN          1
5        0  35.0  58000.0          1
6        2   NaN  52000.0          0
7        0  48.0  79000.0          1
8        1  50.0  83000.0          0
9        0  37.0  67000.0          1
