# Convert Classification Data to One-Hot Encoding
The incoming classification data stores many different classes in a single column, which is problematic for our classifier. So I'm converting my CSV to have one column per class. If a row belongs to class X, it gets a 1 in column X and a 0 in every other column.

In [1]:
import pandas as pd

In [2]:
old_class_df = pd.read_csv("./saved_data/post_comment_class.csv")
old_class_df.head()

Unnamed: 0,prompt,class
0,title: (UPDATE) AITA for telling my step-daugh...,nta
1,title: AITA (22M) for telling my mom (46F) to ...,nta
2,title: AITA (31F) for not wanting to go out dr...,nta
3,title: AITA (39F) for NOT wanting to move to L...,nta
4,title: AITA (F25) for telling my bridesmaid (F...,nta


In [3]:
hot_encoded_df = pd.DataFrame(index=old_class_df.index)

# Iterate through the unique values in the 'class' column
for c in old_class_df['class'].unique():
    # Create a new column in the new DataFrame for the current class
    new_col = (old_class_df['class'] == c).astype(int)
    hot_encoded_df[c] = new_col

In [4]:
hot_encoded_df.head()

Unnamed: 0,nta,yta,info,nah,esh,ywbta
0,1,0,0,0,0,0
1,1,0,0,0,0,0
2,1,0,0,0,0,0
3,1,0,0,0,0,0
4,1,0,0,0,0,0
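
For comparison, pandas ships `pd.get_dummies`, which should produce the same indicator columns as the loop above in one call. The sketch below is only an illustration: `toy_df` is a hypothetical stand-in for `old_class_df`, and the reordering step assumes you want the columns in order of first appearance (as `unique()` gives), since `get_dummies` sorts them alphabetically.

```python
import pandas as pd

# Hypothetical stand-in for old_class_df (not the real data)
toy_df = pd.DataFrame({
    "prompt": ["post a", "post b", "post c", "post d"],
    "class": ["nta", "yta", "nta", "esh"],
})

# dtype=int yields 0/1 integers rather than booleans
dummies = pd.get_dummies(toy_df["class"], dtype=int)

# get_dummies sorts columns alphabetically; reorder to first-appearance
# order so the result matches what the explicit loop would build
dummies = dummies[toy_df["class"].unique()]

print(dummies)
```

The explicit loop is easier to step through, while `get_dummies` is shorter. Either should work for a single label column like this one.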


In [5]:
# Concatenate the 'prompt' column of the original DataFrame with the one-hot columns
new_class_df = pd.concat([old_class_df['prompt'], hot_encoded_df], axis=1)
new_class_df.head()

Unnamed: 0,prompt,nta,yta,info,nah,esh,ywbta
0,title: (UPDATE) AITA for telling my step-daugh...,1,0,0,0,0,0
1,title: AITA (22M) for telling my mom (46F) to ...,1,0,0,0,0,0
2,title: AITA (31F) for not wanting to go out dr...,1,0,0,0,0,0
3,title: AITA (39F) for NOT wanting to move to L...,1,0,0,0,0,0
4,title: AITA (F25) for telling my bridesmaid (F...,1,0,0,0,0,0


In [7]:
new_class_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 920 entries, 0 to 919
Data columns (total 7 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   prompt  920 non-null    object
 1   nta     920 non-null    int64 
 2   yta     920 non-null    int64 
 3   info    920 non-null    int64 
 4   nah     920 non-null    int64 
 5   esh     920 non-null    int64 
 6   ywbta   920 non-null    int64 
dtypes: int64(6), object(1)
memory usage: 50.4+ KB
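
Since `head()` only shows the first five rows (all `nta` here), a quick sanity check can confirm that every row has exactly one 1 and show how many rows fall into each class. This is a minimal sketch on a hypothetical `toy_encoded` frame shaped like `new_class_df`. It assumes every column other than `prompt` is a label column.

```python
import pandas as pd

# Hypothetical frame with the same layout as new_class_df (not the real data)
toy_encoded = pd.DataFrame({
    "prompt": ["post a", "post b", "post c", "post d"],
    "nta": [1, 0, 1, 0],
    "yta": [0, 1, 0, 0],
    "esh": [0, 0, 0, 1],
})

# Assumption: everything except 'prompt' is a one-hot label column
label_cols = [c for c in toy_encoded.columns if c != "prompt"]

# A valid one-hot row sums to exactly 1 across the label columns
row_sums = toy_encoded[label_cols].sum(axis=1)
all_one_hot = bool((row_sums == 1).all())

# Column sums give the number of rows per class
class_counts = toy_encoded[label_cols].sum()

print(all_one_hot)
print(class_counts)
```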


In [6]:
new_class_df.to_csv("./saved_data/hot_encoded_class.csv", index=False)
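
If the single label column is ever needed again, it can be recovered from the saved file with `idxmax`. The sketch below round-trips a hypothetical `toy_encoded` frame through an in-memory buffer instead of the real CSV path, so the names and data are illustrative only. Writing with `index=False`, as above, keeps the reloaded file from gaining an extra `Unnamed: 0` column.

```python
import io

import pandas as pd

# Hypothetical frame with the same layout as new_class_df (not the real data)
toy_encoded = pd.DataFrame({
    "prompt": ["post a", "post b", "post c", "post d"],
    "nta": [1, 0, 1, 0],
    "yta": [0, 1, 0, 0],
    "esh": [0, 0, 0, 1],
})

# Write to an in-memory buffer in place of a file on disk
buffer = io.StringIO()
toy_encoded.to_csv(buffer, index=False)
buffer.seek(0)

reloaded = pd.read_csv(buffer)

# For each row, idxmax returns the name of the column holding the 1
label_cols = [c for c in reloaded.columns if c != "prompt"]
recovered = reloaded[label_cols].idxmax(axis=1)

print(recovered.tolist())
```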