### [`sklearn.preprocessing`](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.preprocessing "sklearn.preprocessing").LabelEncoder
* _class_ sklearn.preprocessing.LabelEncoder[[source]](https://github.com/scikit-learn/scikit-learn/blob/ff1023fda/sklearn/preprocessing/_label.py#L36)

Encode target labels with value between 0 and n_classes-1.

This transformer should be used to encode target values,  _i.e._  `y`, and not the input  `X`.

Read more in the  [User Guide](https://scikit-learn.org/stable/modules/preprocessing_targets.html#preprocessing-targets).
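Conceptually, the encoding sorts the unique labels and replaces each label with its position in that sorted list. Decoding simply indexes back into the sorted list. The sketch below illustrates that idea in plain Python. It is not scikit-learn's implementation: the helpers `label_encode` and `label_decode` are hypothetical, and it assumes the labels can be compared with one another so they can be sorted.

```python
def label_encode(y):
    """Return (classes, codes): sorted unique labels and each label's index."""
    classes = sorted(set(y))                       # plays the role of LabelEncoder.classes_
    index = {label: i for i, label in enumerate(classes)}
    codes = [index[label] for label in y]          # plays the role of transform()
    return classes, codes


def label_decode(classes, codes):
    """Map integer codes back to labels (the role of inverse_transform())."""
    return [classes[c] for c in codes]


classes, codes = label_encode(["cat", "dog", "cat", "bird"])
print(classes)                       # ['bird', 'cat', 'dog']
print(codes)                         # [1, 2, 1, 0]
print(label_decode(classes, codes))  # ['cat', 'dog', 'cat', 'bird']
```

Since Python compares strings by code point, `label_encode` applied to the Korean item list used below yields the same codes as `LabelEncoder`.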

#### Methods

[`fit`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html#sklearn.preprocessing.LabelEncoder.fit "sklearn.preprocessing.LabelEncoder.fit")(y)
Fit label encoder.

[`fit_transform`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html#sklearn.preprocessing.LabelEncoder.fit_transform "sklearn.preprocessing.LabelEncoder.fit_transform")(y)
Fit label encoder and return encoded labels.

[`get_params`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html#sklearn.preprocessing.LabelEncoder.get_params "sklearn.preprocessing.LabelEncoder.get_params")([deep])
Get parameters for this estimator.

[`inverse_transform`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html#sklearn.preprocessing.LabelEncoder.inverse_transform "sklearn.preprocessing.LabelEncoder.inverse_transform")(y)
Transform labels back to original encoding.

[`set_output`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html#sklearn.preprocessing.LabelEncoder.set_output "sklearn.preprocessing.LabelEncoder.set_output")(*[, transform])
Set output container.

[`set_params`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html#sklearn.preprocessing.LabelEncoder.set_params "sklearn.preprocessing.LabelEncoder.set_params")(**params)
Set the parameters of this estimator.

[`transform`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html#sklearn.preprocessing.LabelEncoder.transform "sklearn.preprocessing.LabelEncoder.transform")(y)
Transform labels to normalized encoding.

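```python
from sklearn.preprocessing import LabelEncoder

# Items: TV, refrigerator, microwave, computer, fan, fan, blender, blender
items = ['TV', '냉장고', '전자렌지', '컴퓨터', '선풍기', '선풍기', '믹서', '믹서']

LE = LabelEncoder()
LE.fit(items)                   # learn the sorted set of unique labels
labels = LE.transform(items)    # replace each label with its index in that set
print('Encoded values:', labels)
```

```
Encoded values: [0 1 4 5 3 3 2 2]
```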


```python
# fit() and transform() in a single call
LE.fit_transform(items)
```

```
array([0, 1, 4, 5, 3, 3, 2, 2], dtype=int64)
```

```python
# classes_ holds the unique labels in sorted order; the position is the code
print('Classes:', LE.classes_)
```

```
Classes: ['TV' '냉장고' '믹서' '선풍기' '전자렌지' '컴퓨터']
```


```python
print('Decoded values:', LE.inverse_transform([0, 1, 2, 3, 4, 5]))
```

```
Decoded values: ['TV' '냉장고' '믹서' '선풍기' '전자렌지' '컴퓨터']
```


```python
LE.inverse_transform([0, 1, 2])
```

```
array(['TV', '냉장고', '믹서'], dtype='<U4')
```

### [`sklearn.preprocessing`](https://scikit-learn.org/1.1/modules/classes.html#module-sklearn.preprocessing "sklearn.preprocessing").OneHotEncoder
* _class_ `sklearn.preprocessing.OneHotEncoder(*, categories='auto', drop=None, sparse=True, dtype=<class 'numpy.float64'>, handle_unknown='error', min_frequency=None, max_categories=None)`[[source]](https://github.com/scikit-learn/scikit-learn/blob/f3f51f9b6/sklearn/preprocessing/_encoders.py#L201)

Encode categorical features as a one-hot numeric array.

The input to this transformer should be an array-like of integers or strings, denoting the values taken on by categorical (discrete) features. The features are encoded using a one-hot (aka ‘one-of-K’ or ‘dummy’) encoding scheme. This creates a binary column for each category and returns a sparse matrix or dense array, depending on the `sparse` parameter (renamed `sparse_output` in scikit-learn 1.2; the examples below use the new name).

By default, the encoder derives the categories from the unique values in each feature. Alternatively, you can specify the `categories` manually.

This encoding is needed for feeding categorical data to many scikit-learn estimators, notably linear models and SVMs with the standard kernels.

Note: a one-hot encoding of y labels should use a LabelBinarizer instead.

Read more in the  [User Guide](https://scikit-learn.org/1.1/modules/preprocessing.html#preprocessing-categorical-features).
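The core of one-hot encoding a single feature can be sketched in plain Python: each category gets one column, and each value becomes a row with a single 1 in its category's column. This is a minimal sketch, not scikit-learn's implementation, and the helper `one_hot` is hypothetical. As with `categories='auto'`, categories default to the sorted unique values unless a list is passed explicitly, and an unseen value raises an error, loosely mirroring `handle_unknown='error'`.

```python
def one_hot(values, categories=None):
    """One-hot encode a single categorical feature into rows of 0/1 (hypothetical helper)."""
    if categories is None:
        categories = sorted(set(values))        # like categories='auto'
    column = {cat: j for j, cat in enumerate(categories)}
    rows = []
    for v in values:
        if v not in column:
            # analogous to handle_unknown='error'
            raise ValueError(f"unknown category: {v!r}")
        row = [0] * len(categories)
        row[column[v]] = 1                      # exactly one hot column per row
        rows.append(row)
    return categories, rows


cats, rows = one_hot(["red", "green", "red"])
print(cats)  # ['green', 'red']
print(rows)  # [[0, 1], [1, 0], [0, 1]]

# Manually specified categories fix the column order and may include unseen ones
cats, rows = one_hot(["red"], categories=["red", "green", "blue"])
print(rows)  # [[1, 0, 0]]
```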

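```python
from sklearn.preprocessing import OneHotEncoder
import numpy as np

# One feature with 4 categories: "a" x5, "b" x20, "c" x10, "d" x3
X = np.array([["a"] * 5 + ["b"] * 20 + ["c"] * 10 + ["d"] * 3], dtype=object).T
# At most 3 output columns: the rarest categories are merged into one "infrequent" column
ohe = OneHotEncoder(max_categories=3, sparse_output=False).fit(X)
ohe.infrequent_categories_
```

```
[array(['a', 'd'], dtype=object)]
```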

```python
# "a" is infrequent -> last column; "b" is frequent -> its own column
ohe.transform([["a"], ["b"]])
```

```
array([[0., 0., 1.],
       [1., 0., 0.]])
```
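How `max_categories` picks the infrequent categories can be approximated as follows: if a feature has more than `max_categories` categories, keep the `max_categories - 1` most frequent ones and merge the rest into a single infrequent column placed last. The sketch below is a simplification under stated assumptions. It ignores `min_frequency` and does not reproduce scikit-learn's tie-breaking between equally frequent categories, and the helper `split_infrequent` is hypothetical.

```python
from collections import Counter


def split_infrequent(values, max_categories):
    """Return (frequent, infrequent) category lists, both sorted (hypothetical helper)."""
    counts = Counter(values)
    if len(counts) <= max_categories:
        return sorted(counts), []            # every category fits in its own column
    # One output column is reserved for the merged infrequent group
    ranked = [cat for cat, _ in counts.most_common()]
    frequent = sorted(ranked[: max_categories - 1])
    infrequent = sorted(ranked[max_categories - 1:])
    return frequent, infrequent


X = ["a"] * 5 + ["b"] * 20 + ["c"] * 10 + ["d"] * 3
print(split_infrequent(X, 3))  # (['b', 'c'], ['a', 'd'])
print(split_infrequent(X, 5))  # (['a', 'b', 'c', 'd'], [])
```

With the columns ordered as the frequent categories followed by the infrequent group, "b" lands in column 0 and "a" in the last column, consistent with the transform output above.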

```python
# sparse_output omitted: the default returns a SciPy sparse (CSR) matrix
ohe_1 = OneHotEncoder(max_categories=3).fit(X)
ohe_1.transform([["a"], ["b"]])
```

```
<2x3 sparse matrix of type '<class 'numpy.float64'>'
	with 2 stored elements in Compressed Sparse Row format>
```
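The "2 stored elements" are the two 1s: a sparse matrix keeps only the non-zero entries. The sketch below hand-builds the Compressed Sparse Row (CSR) arrays `data`, `indices` and `indptr` (the same three arrays SciPy's `csr_matrix` exposes) for those two rows, then expands them back to dense rows. The helpers `one_hot_csr` and `csr_to_dense` are hypothetical, and the column positions assume the 3-column `max_categories=3` layout, where "a" falls in the infrequent column 2 and "b" in column 0.

```python
def one_hot_csr(hot_columns):
    """Build CSR arrays for one-hot rows, given each row's hot column (hypothetical helper)."""
    data = [1.0] * len(hot_columns)               # one stored 1.0 per row
    indices = list(hot_columns)                   # column of each stored value
    indptr = list(range(len(hot_columns) + 1))    # row i owns data[indptr[i]:indptr[i+1]]
    return data, indices, indptr


def csr_to_dense(data, indices, indptr, n_cols):
    """Expand CSR arrays back into dense rows."""
    dense = []
    for i in range(len(indptr) - 1):
        row = [0.0] * n_cols
        for k in range(indptr[i], indptr[i + 1]):
            row[indices[k]] = data[k]
        dense.append(row)
    return dense


data, indices, indptr = one_hot_csr([2, 0])   # rows for "a" and "b"
print(data, indices, indptr)                  # [1.0, 1.0] [2, 0] [0, 1, 2]
print(csr_to_dense(data, indices, indptr, 3))
# [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
```

Each one-hot row stores one value instead of one per category, which is where sparse output saves memory when there are many categories.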

```python
# max_categories omitted: every category gets its own column
ohe_2 = OneHotEncoder(sparse_output=False).fit(X)
ohe_2.transform([["a"], ["b"]])
```

```
array([[1., 0., 0., 0.],
       [0., 1., 0., 0.]])
```

```python
# max_categories raised to 5: more than the 4 categories, so none is infrequent
ohe_3 = OneHotEncoder(max_categories=5, sparse_output=False).fit(X)
ohe_3.transform([["a"], ["b"], ["d"]])
```

```
array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 0., 1.]])
```

```python
# handle_unknown='ignore': a category not seen during fit ("f") is encoded
# as all zeros instead of raising an error
ohe_4 = OneHotEncoder(max_categories=5, handle_unknown='ignore', sparse_output=False).fit(X)
ohe_4.transform([["a"], ["b"], ["f"]])
```

```
array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 0., 0.]])
```
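The difference between `handle_unknown='error'` (the default) and `'ignore'` at transform time can be sketched as follows: with `'ignore'`, a value not seen during `fit` gets a row of all zeros. This is a plain-Python illustration, not scikit-learn's code. The helper `transform_one_hot` and its `handle_unknown` argument are hypothetical and only imitate the real parameter, and infrequent-category handling is left out.

```python
def transform_one_hot(values, categories, handle_unknown="error"):
    """One-hot encode values against categories learned earlier (hypothetical helper)."""
    column = {cat: j for j, cat in enumerate(categories)}
    rows = []
    for v in values:
        row = [0.0] * len(categories)
        if v in column:
            row[column[v]] = 1.0
        elif handle_unknown == "error":
            raise ValueError(f"unknown category: {v!r}")
        # handle_unknown == "ignore": the row stays all zeros
        rows.append(row)
    return rows


categories = ["a", "b", "c", "d"]   # as if learned during fit
print(transform_one_hot(["a", "b", "f"], categories, handle_unknown="ignore"))
# [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
```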

```python
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

items = ['TV', '냉장고', '전자렌지', '컴퓨터', '선풍기', '선풍기', '믹서', '믹서']

# Step 1: map the strings to integer labels
encoder = LabelEncoder()
encoder.fit(items)
labels = encoder.transform(items)
# OneHotEncoder expects 2D input of shape (n_samples, n_features)
labels = labels.reshape(-1, 1)

# Step 2: one-hot encode the integer labels (sparse output by default)
oh_encoder = OneHotEncoder()
oh_encoder.fit(labels)
oh_labels = oh_encoder.transform(labels)
print('One-hot encoded data')
print(oh_labels.toarray())
print('Shape of the one-hot encoded data')
print(oh_labels.shape)
```

```
One-hot encoded data
[[1. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 0. 1.]
 [0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 1. 0. 0.]
 [0. 0. 1. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0.]]
Shape of the one-hot encoded data
(8, 6)
```


```python
from sklearn.preprocessing import OneHotEncoder
import numpy as np

# OneHotEncoder accepts strings directly, so the LabelEncoder step is not needed.
# reshape(-1, 1) turns the 1D list into a single-feature column of shape (8, 1).
items = np.array(['TV', '냉장고', '전자렌지', '컴퓨터', '선풍기', '선풍기', '믹서', '믹서']).reshape(-1, 1)

ohe = OneHotEncoder(sparse_output=False)  # written `sparse=False` before scikit-learn 1.2
result = ohe.fit_transform(items)
```
```python
result
```

```
array([[1., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0., 1.],
       [0., 0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0., 0.],
       [0., 0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0.]])
```

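```python
import pandas as pd

# Column names must follow the encoder's (sorted) category order, i.e. ohe.categories_[0]
pd.DataFrame(result, columns=['TV', '냉장고', '믹서', '선풍기', '전자렌지', '컴퓨터'])
```

|   | TV | 냉장고 | 믹서 | 선풍기 | 전자렌지 | 컴퓨터 |
|---|---|---|---|---|---|---|
| 0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 4 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
| 5 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
| 6 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 7 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |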
