***Q:** For a one-hot encoded feature, what can you do if new data contains categories that weren't seen during training?*
> *Set `handle_unknown='ignore'` to encode new categories as all zeros.*

*Note:*
<br>*If you know all possible categories that might ever appear, you can instead specify them manually with the `categories` parameter. `handle_unknown='ignore'` is useful specifically when you don't know all possible categories.*

In [14]:
from sklearn.preprocessing import OneHotEncoder

ohe = OneHotEncoder(
    sparse_output=False,
    handle_unknown='ignore'  # unseen categories are encoded as all zeros instead of raising an error
    )

In [15]:
import numpy as np
import pandas as pd

X = pd.DataFrame({
    'col': ['A','B','C','B']
})

X

  col
0   A
1   B
2   C
3   B


In [16]:
ohe.fit_transform(X)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.],
       [0., 1., 0.]])

In [17]:
Xnew = pd.DataFrame({
    'col': ['A','C','D']
})

Xnew

  col
0   A
1   C
2   D


In [18]:
# 'D' was not seen during training.
# Because handle_unknown='ignore' is set, the encoder does not raise an error.
# Instead, it encodes the unseen category as all zeros.

ohe.transform(Xnew)

array([[1., 0., 0.],
       [0., 0., 1.],
       [0., 0., 0.]])


>*`handle_unknown='ignore'` allows transforming unseen categories by encoding them as all-zero vectors instead of raising an error.*

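> ***All-zero encoding is a reasonable default: it does not assign the unseen category to any known category, so it introduces no false signal, and it keeps the number of output columns the same as at training time. Because the encoder is fitted on the training data only, nothing from the new data leaks into the learned categories.***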