## One-hot Vector

One-hot vectors are frequently used in machine learning to encode categorical data, which can be either ordinal or nominal. For example, suppose we wanted to encode the names of fruits. One way to do this would be to map each name to a unique ID, as follows:

In [1]:
#Imports
import pandas as pd

In [2]:
categorical_df = pd.DataFrame(
 {"Name": ["Melon", "Watermelon", "Orange", "Melon"], "Label ID": [0,1,2,0]})
categorical_df

| | Name | Label ID |
|---|---|---|
| 0 | Melon | 0 |
| 1 | Watermelon | 1 |
| 2 | Orange | 2 |
| 3 | Melon | 0 |


The problem with this approach is that it imposes a fictitious ordering on the names (Melon < Watermelon < Orange), and a neural network can easily pick up on such spurious numeric relationships. Instead, we can create a new column for each category and assign a 1 where the row belongs to that category and a 0 otherwise. In **Pandas**, this can be implemented with the *get_dummies()* function as follows:

In [3]:
# dtype=int gives 0/1 columns; pandas 2.0+ returns booleans by default
pd.get_dummies(categorical_df["Name"], dtype=int)

| | Melon | Orange | Watermelon |
|---|---|---|---|
| 0 | 1 | 0 | 0 |
| 1 | 0 | 0 | 1 |
| 2 | 0 | 1 | 0 |
| 3 | 1 | 0 | 0 |


The rows of this DataFrame are the one-hot vectors, which have a single “hot” entry with a 1 and 0s everywhere else.
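
To make the mechanics concrete, here is a minimal pure-Python sketch of the same encoding. It is only an illustration, not how *get_dummies()* is implemented; the helper `one_hot` and the `column_of` lookup are hypothetical names introduced for this example. It also checks the property that motivates one-hot vectors: every pair of distinct categories is the same distance apart, so no ordering is implied.

```python
import math

names = ["Melon", "Watermelon", "Orange", "Melon"]

# Sorted unique categories become the columns (same order as the table above).
categories = sorted(set(names))
column_of = {name: i for i, name in enumerate(categories)}

def one_hot(name):
    """Return a list with a 1 in the category's column and 0s elsewhere."""
    vector = [0] * len(categories)
    vector[column_of[name]] = 1
    return vector

encoded = [one_hot(name) for name in names]
for name, vector in zip(names, encoded):
    print(name, vector)

# Distinct one-hot vectors are all equally far apart, unlike label IDs 0, 1, 2,
# where Melon (0) would look closer to Watermelon (1) than to Orange (2).
melon, watermelon, orange = one_hot("Melon"), one_hot("Watermelon"), one_hot("Orange")
print(math.dist(melon, watermelon) == math.dist(melon, orange))
```

The trade-off is width: each vector has one entry per category, so the encoding grows with the number of distinct categories.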

We can also generate one-hot vectors directly from integer label IDs with the **PyTorch** and **TensorFlow** libraries, using the *F.one_hot()* and *tf.one_hot()* functions respectively.


In [4]:
fruits_label_ids = [0, 1, 2, 0, 1, 2, 2, 2, 1, 2, 3]
# Counting unique IDs works here because the IDs are contiguous and start at 0
num_classes = len(set(fruits_label_ids))
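
Counting the unique IDs, as in the cell above, only gives a usable number of classes when the IDs are contiguous and start at 0. If the labels had gaps, the count would be smaller than the largest ID requires. One possible workaround, sketched below under that assumption, is to remap the labels to `0..K-1` first; `remap_to_contiguous` and `sparse_ids` are hypothetical names used only for illustration.

```python
def remap_to_contiguous(label_ids):
    """Map arbitrary integer labels to 0..K-1, preserving their sorted order."""
    unique_ids = sorted(set(label_ids))
    new_id_of = {old: new for new, old in enumerate(unique_ids)}
    return [new_id_of[i] for i in label_ids], len(unique_ids)

# Hypothetical labels with gaps: len(set(...)) is 3, but the largest ID is 5.
sparse_ids = [0, 2, 5, 2]
contiguous_ids, num_classes_fixed = remap_to_contiguous(sparse_ids)
print(contiguous_ids, num_classes_fixed)
```

An alternative that keeps the original IDs is to use `max(label_ids) + 1` as the number of classes, at the cost of columns that are never hot.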


In [5]:
#PyTorch
import torch
import torch.nn.functional as F

input_ids = torch.tensor(fruits_label_ids)
one_hot_encodings = F.one_hot(input_ids, num_classes=num_classes)

print(f"Tensor index: {fruits_label_ids[0]}")
print(f"One-hot: {one_hot_encodings[0]}")

Tensor index: 0
One-hot: tensor([1, 0, 0, 0])


In [6]:
#TensorFlow
import tensorflow as tf

# Store the result; tf.one_hot returns float32 values by default
one_hot_encodings_tf = tf.one_hot(fruits_label_ids, num_classes)

print(f"Tensor index: {fruits_label_ids[0]}")
print(f"One-hot: {one_hot_encodings_tf[0].numpy()}")

Tensor index: 0
One-hot: [1. 0. 0. 0.]
