- What is One-Hot Encoding? <br>
One-hot encoding converts categorical data into a binary matrix. Each category becomes a binary vector in which exactly one element is 1 (marking the presence of that category) and all the others are 0.
<br><br>
- Why Use One-Hot Encoding?<br>
Many machine learning algorithms expect numerical input, so categorical data must be converted to numbers first. One-hot encoding does this without imposing an implicit order on the categories, unlike mapping them to integers such as 0, 1, 2.
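
To make the contrast with integer codes concrete, here is a minimal sketch using plain NumPy. The `categories` and `labels` lists are made-up illustration values, and indexing into `np.eye` is just one of several ways to build the vectors.

```python
import numpy as np

# Hypothetical example values, chosen only for illustration
categories = ['Blue', 'Green', 'Red']
labels = ['Red', 'Blue', 'Green', 'Red']

# Integer codes are compact, but they suggest an order: Blue < Green < Red
codes = np.array([categories.index(label) for label in labels])

# One-hot: for each label, take the matching row of the identity matrix
one_hot = np.eye(len(categories), dtype=int)[codes]

print(codes)
print(one_hot)
```

Every row of `one_hot` contains a single 1, so no category is numerically "larger" than another.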

In [2]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import OneHotEncoder

In [3]:
data = {
    'Color':['Red','Blue','Green','Yellow'],
}
df = pd.DataFrame(data)

In [3]:
df

,Color
0,Red
1,Blue
2,Green
3,Yellow


In [4]:
# One-hot encoding with pandas get_dummies
df_encoded = pd.get_dummies(df, columns=['Color'])
df_encoded

,Color_Blue,Color_Green,Color_Red,Color_Yellow
0,False,False,True,False
1,True,False,False,False
2,False,True,False,False
3,False,False,False,True
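
The output above uses boolean columns. If 0/1 integers are preferred, `get_dummies` accepts a `dtype` argument. A minimal sketch, which rebuilds the same `df` so it runs on its own and uses `df_encoded_int` as an illustrative name:

```python
import pandas as pd

df = pd.DataFrame({'Color': ['Red', 'Blue', 'Green', 'Yellow']})

# dtype=int requests 0/1 integer columns instead of True/False
df_encoded_int = pd.get_dummies(df, columns=['Color'], dtype=int)
print(df_encoded_int)
```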


In [None]:
import numpy as np

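# Decode: recover the original labels from the one-hot columns
# np.argmax returns, for each row, the position of the column that holds True
encoded_data = df_encoded.values  # Get the values as a NumPy array
encoded_columns = df_encoded.columns[np.argmax(encoded_data, axis=1)]
# Strip the 'Color_' prefix added by get_dummies, e.g. 'Color_Red' -> 'Red'
original_labels = encoded_columns.str.replace('Color_', '', regex=False)
# Work on a copy so df_encoded keeps only its one-hot columns
df_decoded = df_encoded.copy()
df_decoded['Color'] = original_labels
print("\nDataFrame after Converting Back to Original Labels:")
print(df_decoded)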

In [22]:
df_encoded

,Color_Blue,Color_Green,Color_Red,Color_Yellow
0,False,False,True,False
1,True,False,False,False
2,False,True,False,False
3,False,False,False,True


In [18]:
df_encoded.values

array([[False, False,  True, False],
       [ True, False, False, False],
       [False,  True, False, False],
       [False, False, False,  True]])

In [17]:
df_encoded.columns

Index(['Color_Blue', 'Color_Green', 'Color_Red', 'Color_Yellow'], dtype='object')

In [21]:
np.argmax(df_encoded.values, axis=1)

array([2, 0, 1, 3], dtype=int64)

In [24]:
df_encoded.columns[0]

'Color_Blue'
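
The cells above decode by hand: take the values, find the position of the 1 in each row with `np.argmax`, and look up the column name. As an alternative sketch, assuming pandas 1.5 or later, `pd.from_dummies` reverses `get_dummies` directly. The name `decoded` is illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'Color': ['Red', 'Blue', 'Green', 'Yellow']})
df_encoded = pd.get_dummies(df, columns=['Color'])

# sep='_' tells pandas to split 'Color_Red' into column 'Color' and value 'Red'
decoded = pd.from_dummies(df_encoded, sep='_')
print(decoded)
```

This relies on the column names following the `prefix_value` pattern that `get_dummies` produces by default.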

In [5]:
# One-hot encoding with scikit-learn's OneHotEncoder
encoder = OneHotEncoder()
one_hot_encoded = encoder.fit_transform(df).toarray()
one_hot_encoded

array([[0., 0., 1., 0.],
       [1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 0., 1.]])

In [6]:
# Decoding: map the one-hot rows back to the original labels
encoder.inverse_transform(one_hot_encoded)

array([['Red'],
       ['Blue'],
       ['Green'],
       ['Yellow']], dtype=object)
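
One practical difference from `get_dummies` is that a fitted `OneHotEncoder` remembers its categories and can be reused on new data. The sketch below is an illustration, not part of the original notebook: it assumes `handle_unknown='ignore'` is the desired behaviour, and `'Purple'` is a made-up category that was absent during fitting.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({'Color': ['Red', 'Blue', 'Green', 'Yellow']})

# handle_unknown='ignore' encodes unseen categories as an all-zero row
encoder = OneHotEncoder(handle_unknown='ignore')
encoder.fit(df)

# Categories learned during fit, in sorted order
print(encoder.categories_)

# 'Purple' is a hypothetical category not seen during fit
new_data = pd.DataFrame({'Color': ['Green', 'Purple']})
encoded_new = encoder.transform(new_data).toarray()
print(encoded_new)
```

With the default `handle_unknown='error'`, the same `transform` call would raise an error instead.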