## Overview of Feature Encoding

In machine learning, many algorithms require numerical input data. **Feature encoding** transforms categorical data into a numerical format so that models can process it. The most common encoding techniques include:

- **Ordinal Encoding:** For categorical features that have a natural order.
- **Label Encoding:** For mapping categories to integer values (often used for target variables).
- **One-Hot Encoding:** For representing each category as a binary vector, especially when no ordinal relationship exists.

---

## 1. Ordinal Encoding  | [Link](https://github.com/AdilShamim8/50-Days-of-Machine-Learning/tree/main/Day%2010%20Ordinal%20Encoding)

**Concept:**  
Ordinal Encoding is used when your categorical feature has an intrinsic order (e.g., *Low*, *Medium*, *High*). The technique assigns an integer value to each category while preserving the order.

**Mathematical Representation:**  

<p>  
    If you have an ordered set of categories <code>{c<sub>1</sub>, c<sub>2</sub>, &hellip;, c<sub>k</sub>}</code> (e.g., <code>Low &lt; Medium &lt; High</code>), then the encoding function can be written as:  
</p>  

$$
f(c_i) = i \quad \text{for } i = 1, 2, \dots, k.
$$

For example,  
- Low → 1  
- Medium → 2  
- High → 3  
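The mapping above can be sketched in plain Python with a dictionary. This is a minimal sketch, assuming the order `Low < Medium < High` is supplied by hand, because nothing in the raw strings tells the code which category comes first. The numbering is 1-based to match the formula. Note that scikit-learn's `OrdinalEncoder` in the next example counts from 0 instead.

```python
# A minimal sketch of f(c_i) = i, assuming the order Low < Medium < High is known in advance
order = ["Low", "Medium", "High"]

# Build the mapping category -> position (1-based, matching the formula)
mapping = {category: i for i, category in enumerate(order, start=1)}

# Encode a column of values by looking each one up in the mapping
values = ["Medium", "Low", "High", "Medium"]
encoded = [mapping[v] for v in values]

print(mapping)   # {'Low': 1, 'Medium': 2, 'High': 3}
print(encoded)   # [2, 1, 3, 2]
```

Writing the order out explicitly is the whole point: if the categories were sorted alphabetically instead, `High` would come first and the encoding would no longer reflect the real ordering.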

**Python Code Example:**

```python
from sklearn.preprocessing import OrdinalEncoder
import numpy as np

# Example data with ordered categories
data = np.array([["Low"], ["Medium"], ["High"]])

# Specify the order of categories explicitly
encoder = OrdinalEncoder(categories=[["Low", "Medium", "High"]])
encoded = encoder.fit_transform(data)

# scikit-learn numbers categories from 0, so Low -> 0, Medium -> 1, High -> 2
print("Ordinal Encoding:\n", encoded)
```

---

## 2. Label Encoding

**Concept:**  
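Label Encoding assigns a unique integer to each category. The integers are not meant to express an order, although a model that treats them as ordinary numbers may still read one into them. The method is mainly used for the target variable in classification tasks: scikit-learn's `LabelEncoder` is designed for target labels (`y`), and for input features the library provides `OrdinalEncoder` instead.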

**Mathematical Representation:**  

<p>  
    For a set of categories <code>{c<sub>1</sub>, c<sub>2</sub>, &hellip;, c<sub>k</sub>}</code>, label encoding maps each category to an integer value:  
</p>  

$$
\text{Label}(c_i) = i \quad \text{for each } c_i.
$$

**Python Code Example:**

```python
from sklearn.preprocessing import LabelEncoder

# Example categorical data
data = ["red", "green", "blue", "green", "red"]

encoder = LabelEncoder()
encoded = encoder.fit_transform(data)

print("Label Encoding:\n", encoded)
```
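To see which integer each category received, inspect the fitted encoder. This short sketch reuses the data above and relies on two standard `LabelEncoder` members, `classes_` and `inverse_transform`. The unique labels are sorted before numbering, so the integers follow alphabetical order rather than order of appearance.

```python
from sklearn.preprocessing import LabelEncoder

data = ["red", "green", "blue", "green", "red"]

encoder = LabelEncoder()
encoded = encoder.fit_transform(data)

# Labels are sorted before numbering: blue -> 0, green -> 1, red -> 2
print(encoder.classes_)   # ['blue' 'green' 'red']
print(encoded)            # [2 1 0 1 2]

# inverse_transform maps integers back to the original labels
print(encoder.inverse_transform([0, 2]))   # ['blue' 'red']
```

Keeping the fitted encoder around is what makes `inverse_transform` possible, for example to turn a classifier's integer predictions back into readable class names.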

---

## 3. One-Hot Encoding  | [Link](https://github.com/AdilShamim8/50-Days-of-Machine-Learning/tree/main/Day%2011%20One-Hot%20Encoding)

**Concept:**  
One-Hot Encoding converts each categorical value into a binary vector of length equal to the number of unique categories. Only one element in the vector is 1 (indicating the presence of the category) and all others are 0. This method is ideal when there is no ordinal relationship between categories.

**Mathematical Representation:**  

<p>  
    For a category <code>c<sub>i</sub></code> from a set of <code>k</code> categories, the one-hot encoded vector <strong>y</strong> = (y<sub>1</sub>, &hellip;, y<sub>k</sub>) is defined as:  
</p>  

$$
y_j =
\begin{cases}
1 & \text{if } j = i, \\
0 & \text{otherwise,}
\end{cases}
\quad \text{for } j = 1, 2, \dots, k.
$$
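The definition can be checked directly with NumPy. This minimal sketch assumes the categories are indexed in the order listed in `categories` (alphabetical here, which matches what scikit-learn chooses). Python indexes from 0, so column `j` in the code corresponds to position `j + 1` in the formula. Each category's vector is simply a row of the `k × k` identity matrix `np.eye(k)`, which guarantees exactly one 1 per row.

```python
import numpy as np

# Fixed category order; the position of each category defines its index j
categories = ["blue", "green", "red"]
index = {c: i for i, c in enumerate(categories)}   # 0-based positions

values = ["red", "green", "blue", "green", "red"]
positions = [index[v] for v in values]

# Row i of the identity matrix has a 1 in column i and 0 everywhere else
one_hot = np.eye(len(categories), dtype=int)[positions]
print(one_hot)
# [[0 0 1]
#  [0 1 0]
#  [1 0 0]
#  [0 1 0]
#  [0 0 1]]
```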

**Python Code Example (Using Pandas):**

```python
import pandas as pd

# Example DataFrame with categorical data
data = pd.DataFrame({'color': ['red', 'green', 'blue', 'green', 'red']})

# Generate one-hot encoding; dtype=int gives 0/1 columns (pandas 2.0+ defaults to True/False)
one_hot = pd.get_dummies(data['color'], dtype=int)
print("One-Hot Encoding using pandas:\n", one_hot)
```

**Python Code Example (Using Scikit-Learn):**

```python
from sklearn.preprocessing import OneHotEncoder
import numpy as np

# Example data
data = np.array([["red"], ["green"], ["blue"], ["green"], ["red"]])

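# sparse_output=False returns a dense NumPy array (this parameter replaced `sparse` in scikit-learn 1.2)
encoder = OneHotEncoder(sparse_output=False)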
encoded = encoder.fit_transform(data)

print("One-Hot Encoding using scikit-learn:\n", encoded)
```

---

## Summary

- **Ordinal Encoding** is ideal for ordered categories. It maps each category to an integer in a way that preserves the order.  
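- **Label Encoding** simply assigns a unique integer to each category and is mainly used for target variables. For input features, Ordinal Encoding (ordered categories) or One-Hot Encoding (unordered categories) is usually the better fit.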
- **One-Hot Encoding** creates binary vectors for each category, ensuring no ordinal relationship is imposed on the data.

