# **Ordinal Encoding** on the **Zomato** dataset

Documentation: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OrdinalEncoder.html

- The input to this transformer should be **an array-like of integers or strings,** denoting the values **taken on by categorical (discrete) features.**

- The **features** are converted to **ordinal integers.**

- ***This results in a single column of integers (0 to n_categories - 1) per feature.***
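A minimal sketch of the behaviour listed above, on a made-up array (the `toy` data and the variable names are illustrative and not part of the Zomato notebook). When no `categories` argument is given, the encoder infers the categories from the data and sorts them, so the integer codes follow alphabetical order rather than any domain meaning.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy data: two categorical features, three rows.
toy = np.array(
    [
        ["small", "red"],
        ["large", "blue"],
        ["medium", "red"],
    ],
    dtype=object,
)

toy_encoder = OrdinalEncoder()  # categories='auto': inferred from the data and sorted
toy_codes = toy_encoder.fit_transform(toy)

# One output column per input feature, with codes in 0..n_categories-1.
print(toy_codes.shape)
print(toy_encoder.categories_)
print(toy_codes)
```

Here `"large"` gets code 0 and `"small"` gets code 2 purely because of sorting, which is why an explicit `categories` list is usually passed when the order carries meaning.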

In [665]:
import pandas as pd

In [666]:
df = pd.read_csv("C:\\Users\\pcoda\\OneDrive\\Dataset{Manually}\\zomato.csv", encoding='latin1')
df

Unnamed: 0,Restaurant ID,Restaurant Name,Country Code,City,Address,Locality,Locality Verbose,Longitude,Latitude,Cuisines,...,Currency,Has Table booking,Has Online delivery,Is delivering now,Switch to order menu,Price range,Aggregate rating,Rating color,Rating text,Votes
0,6317637,Le Petit Souffle,162,Makati City,"Third Floor, Century City Mall, Kalayaan Avenu...","Century City Mall, Poblacion, Makati City","Century City Mall, Poblacion, Makati City, Mak...",121.027535,14.565443,"French, Japanese, Desserts",...,Botswana Pula(P),Yes,No,No,No,3,4.8,Dark Green,Excellent,314
1,6304287,Izakaya Kikufuji,162,Makati City,"Little Tokyo, 2277 Chino Roces Avenue, Legaspi...","Little Tokyo, Legaspi Village, Makati City","Little Tokyo, Legaspi Village, Makati City, Ma...",121.014101,14.553708,Japanese,...,Botswana Pula(P),Yes,No,No,No,3,4.5,Dark Green,Excellent,591
2,6300002,Heat - Edsa Shangri-La,162,Mandaluyong City,"Edsa Shangri-La, 1 Garden Way, Ortigas, Mandal...","Edsa Shangri-La, Ortigas, Mandaluyong City","Edsa Shangri-La, Ortigas, Mandaluyong City, Ma...",121.056831,14.581404,"Seafood, Asian, Filipino, Indian",...,Botswana Pula(P),Yes,No,No,No,4,4.4,Green,Very Good,270
3,6318506,Ooma,162,Mandaluyong City,"Third Floor, Mega Fashion Hall, SM Megamall, O...","SM Megamall, Ortigas, Mandaluyong City","SM Megamall, Ortigas, Mandaluyong City, Mandal...",121.056475,14.585318,"Japanese, Sushi",...,Botswana Pula(P),No,No,No,No,4,4.9,Dark Green,Excellent,365
4,6314302,Sambo Kojin,162,Mandaluyong City,"Third Floor, Mega Atrium, SM Megamall, Ortigas...","SM Megamall, Ortigas, Mandaluyong City","SM Megamall, Ortigas, Mandaluyong City, Mandal...",121.057508,14.584450,"Japanese, Korean",...,Botswana Pula(P),Yes,No,No,No,4,4.8,Dark Green,Excellent,229
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9546,5915730,NamlÛ± Gurme,208,ÛÁstanbul,"Kemankeô Karamustafa Paôa Mahallesi, RÛ±htÛ±...",Karakí_y,"Karakí_y, ÛÁstanbul",28.977392,41.022793,Turkish,...,Turkish Lira(TL),No,No,No,No,3,4.1,Green,Very Good,788
9547,5908749,Ceviz AÛôacÛ±,208,ÛÁstanbul,"Koôuyolu Mahallesi, Muhittin íìstí_ndaÛô Cadd...",Koôuyolu,"Koôuyolu, ÛÁstanbul",29.041297,41.009847,"World Cuisine, Patisserie, Cafe",...,Turkish Lira(TL),No,No,No,No,3,4.2,Green,Very Good,1034
9548,5915807,Huqqa,208,ÛÁstanbul,"Kuruí_eôme Mahallesi, Muallim Naci Caddesi, N...",Kuruí_eôme,"Kuruí_eôme, ÛÁstanbul",29.034640,41.055817,"Italian, World Cuisine",...,Turkish Lira(TL),No,No,No,No,4,3.7,Yellow,Good,661
9549,5916112,Aôôk Kahve,208,ÛÁstanbul,"Kuruí_eôme Mahallesi, Muallim Naci Caddesi, N...",Kuruí_eôme,"Kuruí_eôme, ÛÁstanbul",29.036019,41.057979,Restaurant Cafe,...,Turkish Lira(TL),No,No,No,No,4,4.0,Green,Very Good,901


In [667]:
df.shape

(9551, 21)

In [668]:
df.dtypes

Restaurant ID             int64
Restaurant Name          object
Country Code              int64
City                     object
Address                  object
Locality                 object
Locality Verbose         object
Longitude               float64
Latitude                float64
Cuisines                 object
Average Cost for two      int64
Currency                 object
Has Table booking        object
Has Online delivery      object
Is delivering now        object
Switch to order menu     object
Price range               int64
Aggregate rating        float64
Rating color             object
Rating text              object
Votes                     int64
dtype: object

In [669]:
df['Rating color'].unique()

array(['Dark Green', 'Green', 'Yellow', 'Orange', 'White', 'Red'],
      dtype=object)

In [670]:
df['Rating text'].unique()

array(['Excellent', 'Very Good', 'Good', 'Average', 'Not rated', 'Poor'],
      dtype=object)

In [671]:
from sklearn.preprocessing import OrdinalEncoder
from sklearn.model_selection import train_test_split

In [672]:
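x_train, x_test, y_train, y_test = train_test_split(
    df[['Rating color', 'Rating text']],                # features (columns 18 and 19)
    df[['Has Table booking', 'Has Online delivery']],   # targets (columns 12 and 13)
    test_size=0.2,
)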
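The split above passes no `random_state`, so the sampled rows change on every run. The sketch below shows one way to make a split repeatable. It uses a small made-up frame (`toy_df`) that only borrows the four column names, and the seed value 42 is an arbitrary choice.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the Zomato frame, reusing the same column names.
toy_df = pd.DataFrame({
    "Has Table booking": ["Yes", "No", "No", "Yes", "No"] * 2,
    "Has Online delivery": ["No", "No", "Yes", "No", "Yes"] * 2,
    "Rating color": ["Green", "Red", "Yellow", "Orange", "White"] * 2,
    "Rating text": ["Very Good", "Poor", "Good", "Average", "Not rated"] * 2,
})

feature_cols = ["Rating color", "Rating text"]
target_cols = ["Has Table booking", "Has Online delivery"]


def split(seed):
    """Split features and targets together with a fixed seed."""
    return train_test_split(
        toy_df[feature_cols], toy_df[target_cols],
        test_size=0.2, random_state=seed,
    )


xa_train, xa_test, ya_train, ya_test = split(42)
xb_train, xb_test, yb_train, yb_test = split(42)

# Same seed gives the same rows; features and targets stay aligned by index.
print(xa_train.index.equals(xb_train.index))
print(xa_train.index.equals(ya_train.index))
print(len(xa_train), len(xa_test))
```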

In [673]:
x_train

Unnamed: 0,Rating color,Rating text
8937,Orange,Average
9392,Dark Green,Excellent
351,Dark Green,Excellent
8393,White,Not rated
911,Orange,Average
...,...,...
4214,Yellow,Good
251,Dark Green,Excellent
6017,Orange,Average
4377,White,Not rated


In [674]:
y_train

Unnamed: 0,Has Table booking,Has Online delivery
8937,No,Yes
9392,Yes,No
351,No,No
8393,No,No
911,No,No
...,...,...
4214,No,No
251,No,No
6017,Yes,No
4377,No,No


In [675]:
oe = OrdinalEncoder(categories=[['Dark Green', 'Green', 'Yellow', 'Orange', 'White', 'Red'],['Excellent', 'Very Good', 'Good', 'Average', 'Not rated', 'Poor']])
oe
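The lists passed to `categories` above follow the order returned by `unique()`, which places 'Not rated' (White) between 'Average' and 'Poor'. If the codes are meant to reflect a ranking, the lists can be written in rank order instead. The sketch below assumes a worst-to-best ranking and assumes that Red pairs with 'Poor'. Where 'Not rated' belongs is a judgment call, and the names `text_rank`, `color_rank` and `ranked_encoder` are illustrative.

```python
from sklearn.preprocessing import OrdinalEncoder

# Assumed ranking, worst to best; the position of "Not rated" is a judgment call.
color_rank = ["White", "Red", "Orange", "Yellow", "Green", "Dark Green"]
text_rank = ["Not rated", "Poor", "Average", "Good", "Very Good", "Excellent"]

ranked_encoder = OrdinalEncoder(categories=[color_rank, text_rank])

# A few hand-written rows in the same (Rating color, Rating text) layout.
sample = [
    ["Red", "Poor"],
    ["Dark Green", "Excellent"],
    ["Orange", "Average"],
]
ranked_codes = ranked_encoder.fit_transform(sample)
print(ranked_codes)
```

With this ordering a larger code always means a better rating, which matters mainly for models that treat the encoded column as a numeric quantity.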

In [676]:
x_train_encoded = oe.fit_transform(x_train)
x_train_encoded

array([[3., 3.],
       [0., 0.],
       [0., 0.],
       ...,
       [3., 3.],
       [4., 4.],
       [3., 3.]])
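The encoder is fitted on the training rows only. The held-out rows would then be passed through `transform`, not `fit_transform`, so both splits share one mapping. The sketch below illustrates this on made-up frames and also shows one option for values never seen during fitting: `handle_unknown="use_encoded_value"` with an `unknown_value` of -1. The 'Purple' value and the -1 sentinel are assumptions chosen for illustration.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

color_order = ["Dark Green", "Green", "Yellow", "Orange", "White", "Red"]

# Hypothetical train and test frames with a single feature.
train = pd.DataFrame({"Rating color": ["Green", "Red", "Dark Green", "Orange"]})
test = pd.DataFrame({"Rating color": ["Yellow", "Purple"]})  # "Purple" is made up

enc = OrdinalEncoder(
    categories=[color_order],
    handle_unknown="use_encoded_value",  # do not raise on unseen categories
    unknown_value=-1,                    # code assigned to anything unseen
)
enc.fit(train)                    # learn the mapping from training data only
test_codes = enc.transform(test)  # reuse that mapping on the held-out data
print(test_codes)
```

With the default `handle_unknown="error"`, the same `transform` call would raise a `ValueError` on 'Purple'.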

In [677]:
oe.categories  # the category lists passed to the constructor; the fitted arrays are in oe.categories_

[['Dark Green', 'Green', 'Yellow', 'Orange', 'White', 'Red'],
 ['Excellent', 'Very Good', 'Good', 'Average', 'Not rated', 'Poor']]
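A short sketch of the difference between the `categories` parameter and the fitted `categories_` attribute, together with `inverse_transform`, which maps codes back to labels. The `ratings` array is made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

text_order = ["Excellent", "Very Good", "Good", "Average", "Not rated", "Poor"]

# Hypothetical single-feature input.
ratings = np.array([["Good"], ["Poor"], ["Excellent"]], dtype=object)

rating_encoder = OrdinalEncoder(categories=[text_order])
codes = rating_encoder.fit_transform(ratings)

print(rating_encoder.categories)   # the plain list given to the constructor
print(rating_encoder.categories_)  # list of NumPy arrays, available after fit

# Codes can be mapped back to the original labels.
decoded = rating_encoder.inverse_transform(codes)
print(decoded)
```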

In [678]:
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()

In [679]:
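y_train = y_train.copy()  # explicit copy, so the assignment below cannot trigger SettingWithCopyWarning
for col in y_train.columns:
    # le is refit on every column, so afterwards it only remembers the last column's classes
    y_train[col] = le.fit_transform(y_train[col])

y_train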

Unnamed: 0,Has Table booking,Has Online delivery
8937,0,1
9392,1,0
351,0,0
8393,0,0
911,0,0
...,...,...
4214,0,0
251,0,0
6017,1,0
4377,0,0
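Because a single `LabelEncoder` is refit on each column, only the mapping of the last column survives the loop. One possible alternative, sketched below on a made-up frame, keeps one encoder per column in a dictionary so each mapping can be inspected or inverted later. The names `toy_targets` and `label_encoders` are illustrative.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical target frame with the same two column names.
toy_targets = pd.DataFrame({
    "Has Table booking": ["No", "Yes", "No"],
    "Has Online delivery": ["Yes", "No", "No"],
})

encoded_targets = toy_targets.copy()
label_encoders = {}
for col in toy_targets.columns:
    encoder = LabelEncoder()  # fresh encoder for each column
    encoded_targets[col] = encoder.fit_transform(toy_targets[col])
    label_encoders[col] = encoder

print(encoded_targets)

# classes_ is sorted, so index 0 is "No" and index 1 is "Yes".
print(list(label_encoders["Has Table booking"].classes_))

# Recover the original labels for one column.
restored = label_encoders["Has Online delivery"].inverse_transform(
    encoded_targets["Has Online delivery"]
)
print(list(restored))
```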


# we do ordinal encoding 