# 06_data_school_sklearn_categorical_OneHot_Ordinal_Label

1. OneHotEncoder: for `unordered (nominal) data`
2. OrdinalEncoder: for `ordered (ordinal) data`
3. LabelEncoder: similar to OrdinalEncoder, but `we CANNOT define an order`; the encoder assigns the codes itself by sorting the classes alphabetically. It is meant for target labels (`y`), so avoid it for features.

In [104]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder, LabelEncoder
from sklearn.compose import make_column_transformer, ColumnTransformer, make_column_selector
from sklearn.impute import SimpleImputer, KNNImputer

In [79]:
import warnings
warnings.filterwarnings("ignore")

In [80]:
train = pd.read_csv(r'./dataset/train.csv', sep=';')
X = train.drop(['PassengerId', 'Survived', 'Pclass', 'Name', 'Ticket', 'SibSp',
                'Parch', 'Cabin', 'Age', 'Fare', 'Embarked'], axis=1)
# keep four rows; .copy() so that columns can be added later without modifying a view of `train`
X = X.iloc[6:10].copy()
print(X.columns.tolist())

['Sex']


In [81]:
X['Size'] = ['Small','Medium','Small','Large']
X['Class'] = ['First','Second','Third','First']
X

|   | Sex | Size | Class |
|---|---|---|---|
| 6 | male | Small | First |
| 7 | male | Medium | Second |
| 8 | female | Small | Third |
| 9 | female | Large | First |
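If `./dataset/train.csv` is not at hand, the same four-row frame can be rebuilt by hand. This is only a sketch: the values are copied from the output above, and `X_demo` is a hypothetical name that the notebook itself does not use.

```python
import pandas as pd

# Hypothetical stand-in for X, copied from the displayed rows 6-9
X_demo = pd.DataFrame(
    {
        "Sex": ["male", "male", "female", "female"],
        "Size": ["Small", "Medium", "Small", "Large"],
        "Class": ["First", "Second", "Third", "First"],
    },
    index=[6, 7, 8, 9],  # same row labels as X.iloc[6:10]
)
print(X_demo)
```

Any cell below that reads from `X` should behave the same way with this frame, since only these three columns are used.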


## OneHotEncoder

In [87]:
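# `sparse` was renamed to `sparse_output` in scikit-learn 1.2 and removed in 1.4;
# on older releases use OneHotEncoder(sparse=False)
transformer = OneHotEncoder(sparse_output=False)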
transformed = transformer.fit_transform(X[['Sex']])

In [90]:
# from left to right, the columns are sorted alphabetically (column 1: Sex_female, column 2: Sex_male)
transformed

array([[0., 1.],
       [0., 1.],
       [1., 0.],
       [1., 0.]])

In [91]:
transformer.get_feature_names_out()  # ask the transformer for the names of the new columns

array(['Sex_female', 'Sex_male'], dtype=object)

In [92]:
SexDf = pd.DataFrame(transformed, columns=transformer.get_feature_names_out().tolist())
SexDf

|   | Sex_female | Sex_male |
|---|---|---|
| 0 | 0.0 | 1.0 |
| 1 | 0.0 | 1.0 |
| 2 | 1.0 | 0.0 |
| 3 | 1.0 | 0.0 |


## OrdinalEncoder

In [100]:
# The categories within each feature are ordered according to our own definition
ordTransformer = OrdinalEncoder(categories=[['Small','Medium','Large'],['First','Second','Third']])
# the category lists must follow the same order as the columns passed to fit_transform
ordTransformed = ordTransformer.fit_transform(X[['Size','Class']])

In [101]:
ordTransformer.categories_

[array(['Small', 'Medium', 'Large'], dtype=object),
 array(['First', 'Second', 'Third'], dtype=object)]

In [102]:
ordTransformed

array([[0., 0.],
       [1., 1.],
       [0., 2.],
       [2., 0.]])

## LabelEncoder

In [114]:
X

|   | Sex | Size | Class |
|---|---|---|---|
| 6 | male | Small | First |
| 7 | male | Medium | Second |
| 8 | female | Small | Third |
| 9 | female | Large | First |


In [113]:
# LabelEncoder takes no `categories` argument, so we cannot define the order ourselves
labTransformer = LabelEncoder()
# the encoder sets the order alphabetically, e.g. for Size: 0:Large, 1:Medium, 2:Small
# (apply refits the encoder on each column, so it only remembers the classes of the last one)
labTransformed = X[['Size','Class']].apply(labTransformer.fit_transform)
labTransformed

|   | Size | Class |
|---|---|---|
| 6 | 2 | 0 |
| 7 | 1 | 1 |
| 8 | 2 | 2 |
| 9 | 0 | 0 |
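To make the reason for avoiding LabelEncoder on features concrete, the sketch below encodes the same `Size` values with both encoders. The `sizes` series is a hypothetical stand-in for `X['Size']`; everything else uses the encoders already imported above.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Hypothetical stand-in for X['Size']
sizes = pd.Series(["Small", "Medium", "Small", "Large"], name="Size")

# LabelEncoder sorts the classes, so Large=0, Medium=1, Small=2
label_encoder = LabelEncoder()
label_codes = label_encoder.fit_transform(sizes)
print(label_encoder.classes_)
print(label_codes)

# OrdinalEncoder with explicit categories keeps Small < Medium < Large
ordinal_encoder = OrdinalEncoder(categories=[["Small", "Medium", "Large"]])
# OrdinalEncoder expects 2D input, hence to_frame(); ravel() flattens the result
ordinal_codes = ordinal_encoder.fit_transform(sizes.to_frame()).ravel()
print(ordinal_codes)
```

With LabelEncoder, `Large` receives the smallest code and `Small` the largest, which inverts the natural order; a model that treats the codes as magnitudes would read the feature backwards.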
