# 🌈 1. What is Ordinal Encoding?
##### 🧠 Concept:

##### **Ordinal Encoding is a technique to convert categorical (textual) features into numeric values, while preserving the order (ranking)** among the categories.

##### Example:

| Size   | Meaning |
| ------ | ------- |
| Small  | Lowest  |
| Medium | Middle  |
| Large  | Highest |

##### If you encode:

```mathematica
    Small → 0  
    Medium → 1  
    Large → 2

```
##### Here, the **numerical values reflect an order or ranking between categories; this is why it’s called Ordinal Encoding.**
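
The mapping above can be sketched in plain Python before any library is involved. This is only an illustration: the names `size_order` and `sizes` are made up for the example.

```python
# Explicit mapping from category to rank (lowest to highest)
size_order = {"Small": 0, "Medium": 1, "Large": 2}

sizes = ["Medium", "Small", "Large", "Small"]
encoded = [size_order[s] for s in sizes]
print(encoded)  # [1, 0, 2, 0]

# Because the integers follow the ranking, comparisons stay meaningful
print(size_order["Small"] < size_order["Large"])  # True
```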

<HR>


## 🚦 2. When to Use Ordinal Encoding

##### Use it **when the categories have a natural order**.

##### ✅ Examples of **Ordered Categories**

| Feature      | Categories                         | Ordinal Encoding Example |
| ------------ | ---------------------------------- | ------------------------ |
| Size         | Small, Medium, Large               | 0, 1, 2                  |
| Education    | High School, Bachelor, Master, PhD | 0, 1, 2, 3               |
| Satisfaction | Poor, Average, Good, Excellent     | 0, 1, 2, 3               |


#### ❌ Do NOT use it when:
- There’s **no order** (like “Red”, “Green”, “Blue” or “Male”, “Female”).
- In those cases, use **One-Hot Encoding**, or **Label Encoding** when the column is the target and the integers carry no ordinal meaning.
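
For an unordered feature, a one-hot layout avoids implying any ranking. The sketch below uses `pd.get_dummies` on a made-up `colors` Series; it is one possible way to do this, not the only one.

```python
import pandas as pd

# Hypothetical unordered feature: no colour is "greater" than another
colors = pd.Series(["Red", "Green", "Blue", "Green"], name="Color")

# One column per category; each row has a single 1
one_hot = pd.get_dummies(colors, dtype=int)
print(one_hot)
```

Each row contains exactly one `1`, so no category ends up numerically larger than another.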
  

<HR>


## ⚙️ 3. How Ordinal Encoding Works (Behind the Scenes)

##### It maps each category to an integer.
##### The mapping can be:

- **Automatic** (default alphabetical order)

- **Custom-defined** (you define the order manually)

For example:

#### **Automatic mode:** (Default: alphabetical order)

```python

    ['Large', 'Medium', 'Small'] →  [0, 1, 2]

```
#### **Custom order (preferred for ordinal features):** (This preserves the true order)

```python
    ['Small', 'Medium', 'Large'] →  [0, 1, 2]
```
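
The difference between the two modes can be imitated with plain dictionaries. This is a rough sketch of the idea, assuming the automatic mode simply sorts the unique values; the variable names are illustrative.

```python
sizes = ["Small", "Large", "Medium", "Small"]

# Automatic: unique categories are sorted, so codes follow alphabetical order
auto_categories = sorted(set(sizes))  # ['Large', 'Medium', 'Small']
auto_map = {cat: code for code, cat in enumerate(auto_categories)}

# Custom: the list position defines the rank
custom_categories = ["Small", "Medium", "Large"]
custom_map = {cat: code for code, cat in enumerate(custom_categories)}

print([auto_map[s] for s in sizes])    # [2, 0, 1, 2]
print([custom_map[s] for s in sizes])  # [0, 2, 1, 0]
```

With the automatic mapping, `Small` receives the largest code, which is the opposite of its real rank.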
<HR>

## 🧩 4. Ordinal Encoding in Python

In [49]:
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

#### ➡️ Creating the dataset

In [50]:
df = pd.DataFrame({"Size": ["s", "m", "l", "xl", "s", "m", "l", "s", "s", "l", "xl", "m"]})
df.head()

Unnamed: 0,Size
0,s
1,m
2,l
3,xl
4,s


#### ➡️ Applying Ordinal Encoding (Automatic)

In [51]:
# Create an OrdinalEncoder object (no categories given, so the order is inferred)
encoder = OrdinalEncoder()

# Fit and transform the data
encoded_data = encoder.fit_transform(df)

encoded_df = pd.DataFrame(encoded_data, columns=encoder.get_feature_names_out())

# The codes follow alphabetical order (l, m, s, xl), which does not match the real size order
encoded_df.head()

Unnamed: 0,Size
0,2.0
1,1.0
2,0.0
3,3.0
4,2.0


#### 🐴 **Hold your horses** : **Applying Custom Order**

##### ➡️ Defining Category order

In [52]:
# A nested list, ordered_data = [["s", "m", "l", "xl"]], also works if it is passed directly as categories=ordered_data
ordered_data = ["s", "m", "l", "xl"]

##### ➡️ Initiating Ordinal Encoder with Categories

In [53]:
oe = OrdinalEncoder(categories=[ordered_data])
# The categories parameter takes a list containing one ordered category list per feature

In [54]:
oe.fit(df[["Size"]])

In [55]:
df["Size_en"] = oe.transform(df[["Size"]])

In [56]:
df

Unnamed: 0,Size,Size_en
0,s,0.0
1,m,1.0
2,l,2.0
3,xl,3.0
4,s,0.0
5,m,1.0
6,l,2.0
7,s,0.0
8,s,0.0
9,l,2.0


#### 💡 Understanding What `.fit_transform()` Does Here

Let’s break it down:

1. `.fit()` → Learns which categories exist in each feature and stores the mapping.

2. `.transform()` → Converts the original categories into their corresponding numbers based on that mapping.

3. `.fit_transform()` = Does both in one step.
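
The three steps can be checked side by side. The following is a small sketch on a made-up `sizes` frame, showing that fitting and transforming separately gives the same result as `.fit_transform()`.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

sizes = pd.DataFrame({"Size": ["s", "m", "l", "xl", "s"]})
order = ["s", "m", "l", "xl"]

# Two steps: learn the mapping, then apply it
two_step = OrdinalEncoder(categories=[order])
two_step.fit(sizes)
a = two_step.transform(sizes)

# One step: learn and apply together
b = OrdinalEncoder(categories=[order]).fit_transform(sizes)

print(np.array_equal(a, b))  # True
print(a.ravel().tolist())    # [0.0, 1.0, 2.0, 3.0, 0.0]
```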

In [57]:
# Inspect Encoded Mappings
print(oe.categories_)

[array(['s', 'm', 'l', 'xl'], dtype=object)]


#### 🧾 Decode Back (Inverse Transform)

In [58]:
df["Decoded_data"] = oe.inverse_transform(df[['Size_en']]).flatten()
df

Unnamed: 0,Size,Size_en,Decoded_data
0,s,0.0,s
1,m,1.0,m
2,l,2.0,l
3,xl,3.0,xl
4,s,0.0,s
5,m,1.0,m
6,l,2.0,l
7,s,0.0,s
8,s,0.0,s
9,l,2.0,l


### 🌟 Important NOTE:

- #### When you assign a value to a new column in a pandas DataFrame `(df["Decoded_data"] = value)`, the safe form for `value` is a 1-dimensional array (a pandas Series or a NumPy array with shape `(n_samples,)`).
- #### Both `.transform()` and `.inverse_transform()` return 2D arrays of shape `(n_samples, n_features)`. In the pandas version used here, the single-column numeric array from `.transform()` was accepted directly, but assigning the 2D object array from `.inverse_transform()` fails with `ValueError: 2`, which indicates that the object being assigned has `ndim=2`.
  
### ➡️ 🌟 In short: pandas can absorb a single-column 2D **numeric** array, but not a 2D array of **categorical (object)** data. Before adding the output of **`.inverse_transform()`** as a new column, flatten it with `.flatten()` or `.ravel()`, or wrap it in a `pd.Series` or `pd.DataFrame`.
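
To see the shapes involved, here is a minimal sketch on a throwaway `demo` frame (the frame and column names are invented for the example). Flattening in both directions sidesteps the question of what pandas will or will not accept.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

demo = pd.DataFrame({"Size": ["s", "m", "l", "xl"]})
enc = OrdinalEncoder(categories=[["s", "m", "l", "xl"]])

# transform output is 2D (n_samples, 1); ravel() makes it 1D explicitly
demo["Size_en"] = enc.fit_transform(demo[["Size"]]).ravel()

# inverse_transform also returns a 2D array
decoded = enc.inverse_transform(demo[["Size_en"]])
print(decoded.shape)  # (4, 1)

# Flatten before assigning to a single column
demo["Decoded"] = decoded.ravel()
print(demo)
```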

### ⚠️ Common Mistakes

| Mistake                                        | Why it’s a Problem                                                       |
| ---------------------------------------------- | ------------------------------------------------------------------------ |
| Using Ordinal Encoding on unordered categories | Creates fake relationships (e.g. Male=0, Female=1 implies Female > Male) |
| Forgetting to define custom order              | Defaults to alphabetical order (which may be meaningless)                |
| Encoding training and test data separately     | Category mapping may mismatch between them                               |


**👉 Always use the same encoder instance (fit on train, transform on test).**
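
A sketch of that rule, using made-up `train` and `test` frames. The `handle_unknown="use_encoded_value"` and `unknown_value=-1` options are an addition not used elsewhere in this notebook; they assume scikit-learn 0.24 or newer and decide what happens when the test set contains a category the encoder never saw.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

train = pd.DataFrame({"Size": ["s", "m", "l", "xl", "m"]})
test = pd.DataFrame({"Size": ["l", "s", "xxl"]})  # "xxl" never appears in train

enc = OrdinalEncoder(
    categories=[["s", "m", "l", "xl"]],
    handle_unknown="use_encoded_value",  # do not raise on unseen categories
    unknown_value=-1,                    # code assigned to unseen categories
)

enc.fit(train)                       # learn the mapping on the training data only
test_encoded = enc.transform(test)   # reuse the same mapping on the test data
print(test_encoded.ravel().tolist())  # [2.0, 0.0, -1.0]
```

Without these options the default `handle_unknown="error"` raises on `"xxl"`, which is often the safer behaviour while exploring data.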



### 📊 Comparison with Other Encoders

| Encoder Type    | Preserves Order? | Suitable for          | Output Example                  |
| --------------- | ---------------- | --------------------- | ------------------------------- |
| Label Encoder   | ❌ No             | Target/Single Feature | `Blue → 0, Red → 1`             |
| Ordinal Encoder | ✅ Yes            | Ordered Features      | `Low → 0, Medium → 1, High → 2` |
| One-Hot Encoder | ❌ No             | Unordered Features    | `[1,0,0], [0,1,0], [0,0,1]`     |
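
The table can be reproduced on a tiny example. This is an illustrative sketch with an invented `levels` list; note that `LabelEncoder` takes 1D input while the other two expect a 2D column.

```python
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, OrdinalEncoder

levels = ["Low", "High", "Medium", "Low"]
column = [[v] for v in levels]  # 2D shape (n_samples, 1) for the feature encoders

# LabelEncoder: alphabetical codes (High=0, Low=1, Medium=2), meant for targets
label_codes = LabelEncoder().fit_transform(levels)

# OrdinalEncoder: codes follow the order given in categories
ordinal_codes = OrdinalEncoder(
    categories=[["Low", "Medium", "High"]]
).fit_transform(column).ravel()

# OneHotEncoder: one column per category (High, Low, Medium), sparse by default
one_hot = OneHotEncoder().fit_transform(column).toarray()

print(label_codes.tolist())    # [1, 0, 2, 1]
print(ordinal_codes.tolist())  # [0.0, 2.0, 1.0, 0.0]
print(one_hot.tolist())
```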


In [59]:
# One more example, using loan_data_set.csv
dataset = pd.read_csv("loan_data_set.csv")

dataset.isnull().sum()

Loan_ID               0
Gender               13
Married               3
Dependents           15
Education             0
Self_Employed        32
ApplicantIncome       0
CoapplicantIncome     0
LoanAmount           22
Loan_Amount_Term     14
Credit_History       50
Property_Area         0
Loan_Status           0
dtype: int64

In [60]:
dataset["Property_Area"].unique()

array(['Urban', 'Rural', 'Semiurban'], dtype=object)

In [61]:
# Property_Area has no missing values here, so this is only a safeguard.
# Assign the result back instead of calling fillna(..., inplace=True) on a column selection:
# that form raises a FutureWarning and will stop working in pandas 3.0.
dataset["Property_Area"] = dataset["Property_Area"].fillna(dataset["Property_Area"].mode()[0])


In [62]:
ord_data = ["Rural","Semiurban","Urban"]

In [63]:
oe1 = OrdinalEncoder(categories=[ord_data])

In [64]:
dataset["Property_Area"] = oe1.fit_transform(dataset[["Property_Area"]])

In [65]:
dataset["Property_Area"].unique()

array([2., 0., 1.])
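
Since `loan_data_set.csv` is not bundled with this text, the same steps can be tried on a small stand-in. The `loans` frame below is hypothetical, with one missing value added on purpose so the fill step has something to do.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical stand-in for the loan data, with one missing value
loans = pd.DataFrame(
    {"Property_Area": ["Urban", "Rural", np.nan, "Semiurban", "Urban"]}
)

# Fill missing values with the most frequent category, assigning the result back
most_common = loans["Property_Area"].mode()[0]
loans["Property_Area"] = loans["Property_Area"].fillna(most_common)

# Encode with an explicit order: Rural < Semiurban < Urban
area_encoder = OrdinalEncoder(categories=[["Rural", "Semiurban", "Urban"]])
loans["Property_Area_en"] = area_encoder.fit_transform(loans[["Property_Area"]]).ravel()

print(loans["Property_Area_en"].tolist())  # [2.0, 0.0, 2.0, 1.0, 2.0]
```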