# One-Hot Encoding: Concept, Application, and Implementation

## 1. Applied Field and Purpose

**Applied Field:**

* Machine Learning
* Natural Language Processing (NLP)
* Data Preprocessing

**Purpose:**
One-Hot Encoding converts a categorical variable into a numerical format suitable for machine learning models. Each category is replaced by a binary vector whose length equals the number of categories, with a 1 in the position of the observed category and 0 everywhere else.


## 2. Mathematical Formula

Given a categorical variable $C$ with $k$ unique categories $\{c_1, c_2, \dots, c_k\}$, One-Hot Encoding maps $C$ into a vector $\mathbf{v} \in \{0,1\}^k$ such that:

$$
\mathbf{v}_i =
\begin{cases}
1, & \text{if the category is } c_i \\
0, & \text{otherwise}
\end{cases}
$$

where $i = 1, 2, \dots, k$. Exactly one component of $\mathbf{v}$ equals 1, so $\sum_{i=1}^{k} \mathbf{v}_i = 1$.
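As a quick check of the definition, the sketch below builds the vector for each of $k = 3$ categories by taking rows of the identity matrix. The category names and the `vectors` dictionary are illustrative assumptions, not part of the formula itself.

```python
import numpy as np

# Hypothetical categories c_1, c_2, c_3, used only for illustration.
categories = ["red", "green", "blue"]
k = len(categories)

# Row i of the k x k identity matrix is the one-hot vector for category c_i:
# it has a 1 in position i and 0 in every other position.
identity = np.eye(k, dtype=int)
vectors = {cat: identity[i] for i, cat in enumerate(categories)}

for cat, v in vectors.items():
    print(cat, v)
```

Each printed vector has length $k$ and sums to 1, matching the piecewise definition.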

## 3. Python Implementation Example

```python
import numpy as np

def one_hot_encode(categories, value):
    # Map each category to its position in the output vector.
    category_to_index = {cat: idx for idx, cat in enumerate(categories)}
    vector = np.zeros(len(categories), dtype=int)
    # Raises KeyError if value is not one of the known categories.
    vector[category_to_index[value]] = 1
    return vector

# Example usage
categories = ['red', 'green', 'blue']
encoded = one_hot_encode(categories, 'green')
print(encoded)  # Output: [0 1 0]
```
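In practice a whole column of values is usually encoded at once. The following is a minimal sketch of that case, not a definitive implementation: the function name `one_hot_encode_many` is hypothetical, and mapping an unknown value to an all-zero row is an assumed policy chosen to mirror the C++ example in Section 4.

```python
import numpy as np

def one_hot_encode_many(categories, values):
    """Encode a sequence of values as the rows of a 2-D array of 0s and 1s."""
    category_to_index = {cat: idx for idx, cat in enumerate(categories)}
    matrix = np.zeros((len(values), len(categories)), dtype=int)
    for row, value in enumerate(values):
        idx = category_to_index.get(value)
        # Assumption: a value outside `categories` is left as an all-zero row
        # instead of raising an error.
        if idx is not None:
            matrix[row, idx] = 1
    return matrix

categories = ["red", "green", "blue"]
encoded_rows = one_hot_encode_many(categories, ["green", "blue", "purple"])
print(encoded_rows)
```

Here `"purple"` is not a known category, so its row stays all zeros. Raising an error instead would be an equally reasonable choice when unknown values indicate a data problem.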


## 4. C++ Implementation Example

```cpp
#include <iostream>
#include <vector>
#include <string>
#include <unordered_map>

using namespace std;

vector<int> one_hot_encode(const vector<string>& categories, const string& value)
{
    // Map each category to its position in the output vector.
    unordered_map<string, size_t> category_to_index;
    for (size_t i = 0; i < categories.size(); ++i)
        category_to_index[categories[i]] = i;

    vector<int> result(categories.size(), 0);
    // An unknown value leaves the vector all zeros.
    auto it = category_to_index.find(value);
    if (it != category_to_index.end())
        result[it->second] = 1;

    return result;
}

int main() {
    vector<string> categories = {"red", "green", "blue"};
    vector<int> encoded = one_hot_encode(categories, "green");

    for (int val : encoded) 
    {
        cout << val << " ";
    }
    cout << endl;  // Output: 0 1 0
    return 0;
}
```

## 5. Summary

* **One-Hot Encoding** is simple yet effective for converting categorical features.
* It is widely used in ML pipelines where models cannot handle string labels directly.
* The Python and C++ implementations both map each category to an index and set that position to 1 in an otherwise all-zero vector.
