### One Hot Encoding

Today, we look at the concept of <b>one hot encoding</b> in TensorFlow. The term '<b>one hot</b>' comes from digital electronics, where it means that among a group of bits exactly one bit is <i>high (1)</i> and all the other bits are <i>low (0)</i>. Its opposite, '<b>one cold</b>', means that exactly one bit is <i>low (0)</i> and all the other bits are <i>high (1)</i>.

But today we will focus on <b>tf.one_hot</b>.


<b>What is one hot encoding?</b> <br>
Let's take the popular Iris dataset as an example. It has three different labels: <b>setosa, versicolor, and virginica</b>. Most machine learning models work on numerical data and cannot perform mathematical computation directly on strings, so we need to map our labels to numerical values. This mapping process is known as <b>encoding</b>.<br>
The easiest way of encoding is:
* setosa --> 0
* versicolor --> 1
* virginica --> 2 <br>

We replace each label with its mapped value. <br>
But there is a problem with this method. The integers carry an order and a magnitude that the labels themselves do not have, so an ML model may treat these categorical codes as ordinary integers and perform numerical operations on them. For example, averaging setosa (0) and virginica (2) gives 1, the code for versicolor, which means nothing for flower species. The issue only gets harder to spot as the number of labels grows, say to 30 unique labels. Therefore, integer mapping is not the best approach here. <br>
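The integer mapping and its pitfall can be sketched in plain Python. The names `label_to_int` and `encode_labels` are made up for this illustration and are not part of TensorFlow.

```python
# Hypothetical helper names; a plain-Python sketch of integer (label) encoding.
label_to_int = {"setosa": 0, "versicolor": 1, "virginica": 2}

def encode_labels(labels):
    """Replace each string label with its mapped integer."""
    return [label_to_int[label] for label in labels]

codes = encode_labels(["setosa", "virginica", "versicolor"])
print(codes)  # [0, 2, 1]

# The pitfall: arithmetic on the codes is allowed but meaningless.
# The mean of setosa (0) and virginica (2) equals the code for versicolor (1).
mean_code = (label_to_int["setosa"] + label_to_int["virginica"]) / 2
print(mean_code == label_to_int["versicolor"])  # True
```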
Well, the solution to this problem is <b>one-hot encoding</b>, where we create <i>N</i> new features, one for each of the <i>N</i> unique labels in the dataset. In our <i>Iris dataset</i> example we have 3 unique labels (setosa, versicolor, and virginica), hence <b>N = 3</b>.

The <b>one-hot encoding</b> of our labels is:
* setosa : [1, 0, 0]
* versicolor : [0, 1, 0]
* virginica : [0, 0, 1] <br>

Now, instead of a single value of 0, 1, or 2, each label is an array of 3 values (that is, <i>N</i> values). The array holds 1 at the index of the corresponding flower and 0 everywhere else. For example, setosa has 1 only at the first position and virginica has 1 only at the third position. This can be shown in a simple table:

|Labels|setosa|versicolor|virginica|
|---|---|---|---|
|setosa| 1 | 0 | 0 |
|versicolor| 0 | 1 | 0 |
|virginica| 0 | 0 | 1 |
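The table above can be reproduced with a few lines of plain Python. This is only a sketch of the idea, and `one_hot_vector` is a hypothetical helper name rather than a library function.

```python
# A plain-Python sketch of one-hot encoding; `one_hot_vector` is a hypothetical helper.
classes = ["setosa", "versicolor", "virginica"]  # N = 3 unique labels

def one_hot_vector(label, classes):
    """Return a list with 1 at the position of `label` and 0 elsewhere."""
    return [1 if cls == label else 0 for cls in classes]

table = {label: one_hot_vector(label, classes) for label in classes}
for label, vector in table.items():
    print(label, vector)
# setosa [1, 0, 0]
# versicolor [0, 1, 0]
# virginica [0, 0, 1]
```

Each vector has length <i>N</i> and sums to 1, which is what makes it "one hot".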

Now that you know about one-hot encoding, let's see some examples. <br>
In TensorFlow, we use <b>tf.one_hot</b> for one-hot encoding. Its signature is <br>
<b>tf.one_hot(indices, depth, on_value, off_value, axis, dtype, name)</b><br>
where,<br>
* <b>indices</b> : A tensor of indices.
* <b>depth</b> : A scalar defining the depth of the one-hot dimension (the number of classes, <i>N</i>).
* <b>on_value</b> : A scalar defining the value to fill in the output when indices[j] = i. (default: 1)
* <b>off_value</b> : A scalar defining the value to fill in the output when indices[j] != i. (default: 0)
* <b>axis</b> : The axis to fill. (default: -1, a new innermost axis)
* <b>dtype</b> : The data type of the output tensor. (default: tf.float32 when neither on_value nor off_value is given)
* <b>name</b> : A name for the operation. (optional)
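To make the roles of <b>indices</b>, <b>depth</b>, <b>on_value</b>, and <b>off_value</b> concrete, here is a rough plain-Python imitation for a 1-D list of indices. The function `one_hot_reference` is a hypothetical name, not a TensorFlow API, and it ignores the axis, dtype, and name arguments as well as multi-dimensional input. The TensorFlow documentation shows an index of -1 producing a row of only off_value; the sketch assumes the same for any index outside the range 0 to depth - 1.

```python
def one_hot_reference(indices, depth, on_value=1.0, off_value=0.0):
    """Imitate tf.one_hot for a 1-D list of integer indices (axis=-1 layout).

    Row j holds on_value at column indices[j] and off_value elsewhere.
    Assumption: an index outside [0, depth) yields a row of only off_value.
    """
    return [
        [on_value if i == index else off_value for i in range(depth)]
        for index in indices
    ]

print(one_hot_reference([0, 1, 2], depth=3))
# [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(one_hot_reference([2, -1], depth=3, on_value=5.0))
# [[0.0, 0.0, 5.0], [0.0, 0.0, 0.0]]
```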

#### 1. Simple example

In [2]:
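# Import required libraries (this notebook uses the TensorFlow 1.x session API)
import tensorflow as tf
import numpy as np
import pandas as pd  # used later to build the DataFrames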

In [4]:
labels = [0, 1, 2]
result = tf.one_hot(indices= labels, depth= 3) # depth = N
print(result)

Tensor("one_hot:0", shape=(3, 3), dtype=float32)


In [5]:
with tf.Session():
    print(result.eval())

[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]


Now you can see that we obtained the same result as in the table above.<br>
We can also change the other parameters of <b>tf.one_hot</b>.

In [6]:
new_result = tf.one_hot(indices= labels, depth= 3, on_value= 5.0, off_value= 0.0, axis= -1)
with tf.Session():
    print(new_result.eval())

[[5. 0. 0.]
 [0. 5. 0.]
 [0. 0. 5.]]


<b>on_value</b> replaces 1.0 with 5.0, while <b>off_value</b> remains 0.0. <br>
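The <b>axis</b> parameter was left at -1 above. For a 1-D list of indices, TensorFlow documents the output shape as features x depth for axis = -1 and depth x features for axis = 0, so the second layout is the transpose of the first. The sketch below imitates this in plain Python; `one_hot_rows` and `transpose` are hypothetical helpers, not TensorFlow functions.

```python
# Plain-Python imitation; `one_hot_rows` and `transpose` are hypothetical helpers.
def one_hot_rows(indices, depth, on_value=1.0, off_value=0.0):
    """One row per index (the axis=-1 layout): shape (len(indices), depth)."""
    return [[on_value if i == idx else off_value for i in range(depth)]
            for idx in indices]

def transpose(matrix):
    """Swap rows and columns of a rectangular list of lists."""
    return [list(column) for column in zip(*matrix)]

example_labels = [0, 0, 2, 1]
last_axis = one_hot_rows(example_labels, depth=3, on_value=5.0)  # shape (4, 3)
first_axis = transpose(last_axis)                                # shape (3, 4), like axis=0

print(last_axis)
# [[5.0, 0.0, 0.0], [5.0, 0.0, 0.0], [0.0, 0.0, 5.0], [0.0, 5.0, 0.0]]
print(first_axis)
# [[5.0, 5.0, 0.0, 0.0], [0.0, 0.0, 0.0, 5.0], [0.0, 0.0, 5.0, 0.0]]
```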

#### 2. Iris Dataset Example:

In [37]:
# load the dataset
from sklearn.datasets import load_iris
data = load_iris()
iris_labels = data.target
print(iris_labels)

[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2]


Here, 0 represents setosa, 1 represents versicolor, and 2 represents virginica.

In [38]:
# convert array to tensor
tensor = tf.convert_to_tensor(iris_labels, dtype=tf.int32)
print(tensor)

Tensor("Const_3:0", shape=(150,), dtype=int32)


Now that the tensor is created, we repeat the first example.

In [39]:
final_result = tf.one_hot(indices= tensor, depth = 3)
with tf.Session():
    encoded = final_result.eval()  # evaluate the tensor once and reuse the array
    df = pd.DataFrame(data = encoded, columns = ["setosa", "versicolor", "virginica"])
    print(encoded)

[[1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 ...

Now we get the same one-hot pattern as in the first example, one row per flower.

In [40]:
df.head()

| |setosa|versicolor|virginica|
|---|---|---|---|
|0| 1.0 | 0.0 | 0.0 |
|1| 1.0 | 0.0 | 0.0 |
|2| 1.0 | 0.0 | 0.0 |
|3| 1.0 | 0.0 | 0.0 |
|4| 1.0 | 0.0 | 0.0 |


In [41]:
# load iris data into the dataframe
iris_df = pd.DataFrame(data = data.data, columns= data.feature_names)
iris_df.head()

| |sepal length (cm)|sepal width (cm)|petal length (cm)|petal width (cm)|
|---|---|---|---|---|
|0| 5.1 | 3.5 | 1.4 | 0.2 |
|1| 4.9 | 3.0 | 1.4 | 0.2 |
|2| 4.7 | 3.2 | 1.3 | 0.2 |
|3| 4.6 | 3.1 | 1.5 | 0.2 |
|4| 5.0 | 3.6 | 1.4 | 0.2 |


In [47]:
# combine both dataframes column-wise; they share the same default index,
# so the join_axes argument (removed in pandas 1.0) is not needed
final_df = pd.concat([iris_df, df], axis= 1)
final_df.head(10)

| |sepal length (cm)|sepal width (cm)|petal length (cm)|petal width (cm)|setosa|versicolor|virginica|
|---|---|---|---|---|---|---|---|
|0| 5.1 | 3.5 | 1.4 | 0.2 | 1.0 | 0.0 | 0.0 |
|1| 4.9 | 3.0 | 1.4 | 0.2 | 1.0 | 0.0 | 0.0 |
|2| 4.7 | 3.2 | 1.3 | 0.2 | 1.0 | 0.0 | 0.0 |
|3| 4.6 | 3.1 | 1.5 | 0.2 | 1.0 | 0.0 | 0.0 |
|4| 5.0 | 3.6 | 1.4 | 0.2 | 1.0 | 0.0 | 0.0 |
|5| 5.4 | 3.9 | 1.7 | 0.4 | 1.0 | 0.0 | 0.0 |
|6| 4.6 | 3.4 | 1.4 | 0.3 | 1.0 | 0.0 | 0.0 |
|7| 5.0 | 3.4 | 1.5 | 0.2 | 1.0 | 0.0 | 0.0 |
|8| 4.4 | 2.9 | 1.4 | 0.2 | 1.0 | 0.0 | 0.0 |
|9| 4.9 | 3.1 | 1.5 | 0.1 | 1.0 | 0.0 | 0.0 |


### Now, I am sure that you can do one-hot encoding by yourself.