---
layout: post
title:  "Trivial One Hot Encoding in Python"
desc: "The most efficient code snippet to one-hot encode columns"
date: 2020-06-17
categories: [tutorial]
tags: [snippet]
loc: 'tutorials/one_hot_encoding/'
permalink: /tutorials/one_hot_encoding
math: true
---


One-hot encoding is something we do very commonly in machine learning: we turn a categorical feature into a vector of ones and zeros, with a single one marking the category, which algorithms can make sense of far more easily.
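
Before bringing pandas in, here's a minimal pure-Python sketch of the idea. It assumes we fix a sorted list of categories up front, and `one_hot_list` is a hypothetical helper name used only for illustration. Each label becomes a vector as long as the category list, with a single 1 in that label's slot.

```python
def one_hot_list(labels):
    """Hypothetical helper: map each label to a 0/1 vector with exactly one 1."""
    categories = sorted(set(labels))  # assumed: a fixed, sorted vocabulary
    position = {cat: i for i, cat in enumerate(categories)}
    vectors = []
    for label in labels:
        vec = [0] * len(categories)
        vec[position[label]] = 1  # switch on only this label's slot
        vectors.append(vec)
    return categories, vectors


categories, vectors = one_hot_list(["Pizza", "Vegetables", "Cake", "Pizza"])
print(categories)  # ['Cake', 'Pizza', 'Vegetables']
print(vectors)     # [[0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]]
```

Repeated labels ("Pizza" twice here) map to identical vectors, which is exactly what lets a model treat them as the same category.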

For example, take this toy dataframe of people and their favourite foods. As it stands, the text labels in `FavFood` are useless to most algorithms.

```python
import pandas as pd

df = pd.DataFrame({
    "Person": ["Sam", "Ali", "Jane", "John"],
    "FavFood": ["Pizza", "Vegetables", "Cake", "Hapiness"]
}).set_index("Person")

# display() is built into Jupyter/IPython; use print(df) in a plain script
display(df)
```

| Person | FavFood |
|---|---|
| Sam | Pizza |
| Ali | Vegetables |
| Jane | Cake |
| John | Hapiness |


I've seen enough different from-scratch implementations of one-hot encoding in machine learning that I thought I'd throw my own version into the ring. If you want a "big boy" solution, you can always reach for [scikit-learn's OneHotEncoder](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html), but the method just below is even simpler in my mind.

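```python
def one_hot(df):
    # melt() drops the index, so move the index level(s) into ordinary columns first.
    # Assumes every index level has a name (e.g. set via set_index).
    names = list(df.index.names)
    melted = df.reset_index().melt(id_vars=names)
    # Count rows per (index, variable, value); combinations that never occur become 0
    return melted.pivot_table(index=names,
                              columns=["variable", "value"],
                              aggfunc=len,
                              fill_value=0)
```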


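If we call the function on our dataframe from before, it first moves the index into an ordinary column (since `melt` doesn't preserve the index), melts the remaining columns into long `variable`/`value` pairs, and then pivots back out: `aggfunc=len` counts how many rows fall into each (person, variable, value) cell, so every combination that occurs gets a one and `fill_value=0` puts a zero everywhere else.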
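
To make those steps concrete, here is a sketch that runs them by hand on the same data. The `groupby(...).size()` followed by `unstack(..., fill_value=0)` is my approximation of what `pivot_table(..., aggfunc=len, fill_value=0)` does in this case, not a copy of its internals, and the names `melted`, `counts` and `grid` are just illustrative.

```python
import pandas as pd

# Same toy data as above
df = pd.DataFrame({
    "Person": ["Sam", "Ali", "Jane", "John"],
    "FavFood": ["Pizza", "Vegetables", "Cake", "Hapiness"],
}).set_index("Person")

# Step 1: melt() drops the index, so turn it back into an ordinary column first
names = list(df.index.names)  # ['Person']
melted = df.reset_index().melt(id_vars=names)
print(melted)  # long format with columns Person, variable, value

# Step 2: count the rows in each (Person, variable, value) group;
# every combination that actually occurs gets a count of 1 here
counts = melted.groupby(names + ["variable", "value"]).size()

# Step 3: spread (variable, value) out into columns; combinations that never
# occur have nothing to count, so fill_value=0 supplies the zeros
grid = counts.unstack(["variable", "value"], fill_value=0)
print(grid)
```

The resulting `grid` has the same shape and 0/1 layout as the `one_hot` output, which is a handy way to convince yourself the pivot isn't doing anything magical.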

```python
display(one_hot(df))
```

| Person | (FavFood, Cake) | (FavFood, Hapiness) | (FavFood, Pizza) | (FavFood, Vegetables) |
|---|---|---|---|---|
| Ali | 0 | 0 | 0 | 1 |
| Jane | 1 | 0 | 0 | 0 |
| John | 0 | 1 | 0 | 0 |
| Sam | 0 | 0 | 1 | 0 |
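
Because `melt` stacks every non-index column into `variable`/`value` pairs, the same function should handle several categorical columns at once. The sketch below assumes a made-up `FavDrink` column (hypothetical data, not part of the example above), flattens the two-level column headers into names like `FavFood_Pizza`, and cross-checks the result against pandas' built-in `pd.get_dummies`, which I'm using here purely as a sanity check.

```python
import pandas as pd


def one_hot(df):
    # Same melt-then-pivot function as above, repeated so this block runs alone
    names = list(df.index.names)
    melted = df.reset_index().melt(id_vars=names)
    return melted.pivot_table(index=names,
                              columns=["variable", "value"],
                              aggfunc=len,
                              fill_value=0)


# Hypothetical data: a second categorical column, FavDrink, made up for illustration
df2 = pd.DataFrame({
    "Person": ["Sam", "Ali", "Jane", "John"],
    "FavFood": ["Pizza", "Vegetables", "Cake", "Pizza"],
    "FavDrink": ["Tea", "Coffee", "Tea", "Water"],
}).set_index("Person")

encoded = one_hot(df2)

# Flatten the (variable, value) column MultiIndex into names like "FavFood_Pizza"
encoded.columns = [f"{var}_{val}" for var, val in encoded.columns]
print(encoded)

# Sanity check against get_dummies (dtype=int gives 0/1; recent pandas defaults to booleans).
# Sort rows and columns on both sides so the orderings line up before comparing.
dummies = pd.get_dummies(df2, dtype=int).sort_index().sort_index(axis=1)
aligned = encoded.sort_index().sort_index(axis=1)
same = (list(aligned.index) == list(dummies.index)
        and list(aligned.columns) == list(dummies.columns)
        and bool((aligned.to_numpy() == dummies.to_numpy()).all()))
print(same)  # True
```

Sorting before comparing matters because `pivot_table` orders the columns alphabetically by (variable, value), whereas `get_dummies` keeps the original column order (all the `FavFood_*` columns before the `FavDrink_*` ones).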


Amazing and super simple!