In [35]:
import numpy as np
import pandas as pd

In [36]:
df = pd.read_csv('customer.csv')

In [37]:
df.sample(5)

Unnamed: 0,age,gender,review,education,purchased
47,38,Female,Good,PG,Yes
5,31,Female,Average,School,Yes
42,30,Female,Good,PG,Yes
30,73,Male,Average,UG,No
18,19,Male,Good,School,No


In [38]:
df = df.iloc[:,2:]

In [39]:
df

Unnamed: 0,review,education,purchased
0,Average,School,No
1,Poor,UG,No
2,Good,PG,No
3,Good,PG,No
4,Average,UG,No
5,Average,School,Yes
6,Good,School,No
7,Poor,School,Yes
8,Average,UG,No
9,Good,UG,Yes


In [40]:
df.head()

Unnamed: 0,review,education,purchased
0,Average,School,No
1,Poor,UG,No
2,Good,PG,No
3,Good,PG,No
4,Average,UG,No


In [41]:
from sklearn.model_selection import train_test_split


In [42]:
X_train, X_test, y_train, y_test = train_test_split(df.iloc[:, 0:2], df.iloc[:,-1],test_size=0.2)

In [43]:
from sklearn.preprocessing import OrdinalEncoder

In [44]:
df['review'].value_counts()

review
Poor       18
Good       18
Average    14
Name: count, dtype: int64

In [45]:
df['education'].value_counts()

education
PG        18
School    16
UG        16
Name: count, dtype: int64

In [46]:
oe = OrdinalEncoder(categories=[['Good', 'Average', 'Poor'], ['PG', 'UG', 'School']])

### OrdinalEncoder Explained

- **Purpose**: Converts ordinal categorical variables into numerical format.
- **Categories**: 
  - First feature: `['Good', 'Average', 'Poor']` (Good = 0, Average = 1, Poor = 2)
  - Second feature: `['PG', 'UG', 'School']` (PG = 0, UG = 1, School = 2)
- **When to Use**: For categories with a meaningful order.
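
The mapping described above can be checked on a few rows. This is a minimal sketch on a hypothetical toy frame (`toy`, `toy_oe`, and `toy_codes` are illustrative names, not part of the notebook), assuming the same two columns and the same category lists:

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy rows with the same two columns as the notebook's features.
toy = pd.DataFrame({
    "review": ["Good", "Poor", "Average"],
    "education": ["School", "PG", "UG"],
})

# One category list per column, in column order; codes follow list position.
toy_oe = OrdinalEncoder(categories=[["Good", "Average", "Poor"],
                                    ["PG", "UG", "School"]])
toy_codes = toy_oe.fit_transform(toy)

# Each row becomes [review_code, education_code], stored as floats.
print(toy_codes)
```

The first row, ("Good", "School"), comes out as `[0., 2.]` because "Good" is first in its list and "School" is last in its list.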

In [47]:
oe

In [48]:
oe.fit(X_train)

In [49]:
X_train_trf = oe.transform(X_train)

In [50]:
X_train_trf

array([[1., 1.],
       [2., 2.],
       [2., 0.],
       [1., 0.],
       [2., 0.],
       [1., 0.],
       [0., 2.],
       [1., 0.],
       [0., 2.],
       [2., 1.],
       [1., 2.],
       [2., 2.],
       [0., 2.],
       [1., 2.],
       [2., 2.],
       [2., 0.],
       [0., 0.],
       [1., 1.],
       [2., 2.],
       [0., 0.],
       [0., 0.],
       [1., 2.],
       [0., 2.],
       [2., 1.],
       [0., 2.],
       [0., 1.],
       [1., 1.],
       [0., 1.],
       [1., 1.],
       [2., 0.],
       [2., 0.],
       [1., 2.],
       [1., 1.],
       [0., 1.],
       [2., 2.],
       [0., 1.],
       [2., 0.],
       [2., 0.],
       [0., 0.],
       [2., 0.]])
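
The cells above encode only the training split. The test split would normally go through the same fitted encoder so that both splits share one mapping. A minimal sketch of that pattern, using a hypothetical stand-in frame (`features`) rather than `customer.csv`, and a fixed `random_state` chosen only to make the sketch repeatable:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical stand-in for the review/education feature columns.
features = pd.DataFrame({
    "review": ["Good", "Poor", "Average", "Good", "Poor",
               "Average", "Good", "Poor", "Average", "Good"],
    "education": ["PG", "UG", "School", "UG", "School",
                  "PG", "School", "PG", "UG", "PG"],
})

demo_train, demo_test = train_test_split(features, test_size=0.2, random_state=0)

demo_oe = OrdinalEncoder(categories=[["Good", "Average", "Poor"],
                                     ["PG", "UG", "School"]])
demo_oe.fit(demo_train)                       # fit on the training split only
demo_train_trf = demo_oe.transform(demo_train)
demo_test_trf = demo_oe.transform(demo_test)  # reuse the fitted encoder

print(demo_test_trf)
```

Because the category lists are passed explicitly, the codes do not depend on which values happen to land in the training split.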

In [51]:
oe.categories_

[array(['Good', 'Average', 'Poor'], dtype=object),
 array(['PG', 'UG', 'School'], dtype=object)]

In [52]:
from sklearn.preprocessing import LabelEncoder

In [53]:
le = LabelEncoder()

In [54]:
le.fit(y_train)

In [55]:
le.classes_

array(['No', 'Yes'], dtype=object)

In [56]:
y_train

29    Yes
35    Yes
46     No
21     No
27     No
37    Yes
6      No
24    Yes
40     No
15     No
0      No
7     Yes
38     No
5     Yes
31    Yes
22    Yes
2      No
32    Yes
12     No
42    Yes
41    Yes
13     No
25     No
17    Yes
23     No
11    Yes
44     No
48    Yes
8      No
43     No
14    Yes
34     No
4      No
36    Yes
28     No
9     Yes
39     No
45    Yes
3      No
26     No
Name: purchased, dtype: object

In [59]:
y_train_trf = le.transform(y_train)
y_test_trf = le.transform(y_test)

In [60]:
y_train_trf

array([1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0,
       0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0])

In [61]:
y_test_trf

array([1, 1, 1, 0, 0, 0, 0, 1, 1, 1])
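
`LabelEncoder` takes no category list: it sorts the labels it sees during `fit`, which is why `classes_` above reads `['No', 'Yes']` and "No" becomes 0. A minimal sketch of the round trip on hypothetical labels (`demo_le`, `codes`, and `labels` are illustrative names):

```python
from sklearn.preprocessing import LabelEncoder

demo_le = LabelEncoder()
demo_le.fit(["Yes", "No", "No", "Yes"])

# classes_ is sorted, so "No" maps to 0 and "Yes" maps to 1.
print(demo_le.classes_)

codes = demo_le.transform(["No", "Yes", "Yes"])
labels = demo_le.inverse_transform(codes)  # map the integers back to strings

print(codes)
print(labels)
```

`inverse_transform` is useful for turning a model's integer predictions back into the original "Yes"/"No" labels.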

The `OrdinalEncoder` is a tool from the `sklearn.preprocessing` module in scikit-learn, used for converting categorical features into a format that can be used by machine learning algorithms. 

Here is a breakdown of the encoder defined earlier in the notebook:

```python
oe = OrdinalEncoder(categories=[['Good', 'Average', 'Poor'], ['PG', 'UG', 'School']])
```

### Key Components:

1. **OrdinalEncoder**: This is the class used for encoding ordinal categorical variables, meaning that the categories have a meaningful order.

2. **categories**: This parameter specifies the order of the categories for each feature, with one list per column. Integer codes follow list position, so the first entry in each list gets 0. In this notebook:
   - The first feature has categories `['Good', 'Average', 'Poor']`, so "Good" receives the smallest code (0) and "Poor" the largest (2).
   - The second feature has categories `['PG', 'UG', 'School']`, so "PG" (Postgraduate) receives 0, "UG" (Undergraduate) receives 1, and "School" receives 2.

   With these lists the best rank gets the lowest code. List the categories in the opposite order if larger numbers should mean higher rank.

### Usage:

Once the encoder has been fitted, `transform` maps the categorical values to integer codes (returned as floats) based on the specified order. For example:
- "Good" → 0
- "Average" → 1
- "Poor" → 2
- "PG" → 0
- "UG" → 1
- "School" → 2

This numerical representation preserves the category order, so models that compare feature values can make use of the ordinal relationship in the data.
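
Since codes follow list position, listing each feature from lowest to highest rank makes larger codes mean better ranks. A minimal sketch of that alternative ordering, which is an assumption for illustration and not what the notebook uses (`ranked_oe`, `rows`, `ranked_codes`, and `restored` are hypothetical names):

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Alternative ordering: worst category first, best category last.
ranked_oe = OrdinalEncoder(categories=[["Poor", "Average", "Good"],
                                       ["School", "UG", "PG"]])

rows = np.array([["Good", "PG"],
                 ["Poor", "School"],
                 ["Average", "UG"]], dtype=object)

ranked_codes = ranked_oe.fit_transform(rows)
restored = ranked_oe.inverse_transform(ranked_codes)  # codes back to labels

print(ranked_codes)
print(restored)
```

Here ("Good", "PG") encodes to `[2., 2.]` and ("Poor", "School") to `[0., 0.]`, and `inverse_transform` recovers the original labels.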

### When to Use:

Use `OrdinalEncoder` when you have categorical data where the categories have a clear ranking or order, as opposed to nominal data where there is no inherent order (like colors or types of fruit).
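
For contrast, nominal data such as colors is usually one-hot encoded so that no order is implied. A minimal sketch using scikit-learn's `OneHotEncoder` on hypothetical color values (the data and the names `colors`, `ohe`, and `onehot` are illustrative only):

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical nominal feature: colors have no inherent ranking.
colors = np.array([["red"], ["green"], ["blue"], ["green"]], dtype=object)

ohe = OneHotEncoder()
# The default output is a sparse matrix; convert it to a dense array to inspect it.
onehot = ohe.fit_transform(colors).toarray()

print(ohe.categories_)  # categories are sorted: blue, green, red
print(onehot)
```

Each color gets its own 0/1 column, so a model cannot read "red" as greater than "blue" the way it could with integer codes.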
