Task-1:Predict Restaurant Ratings

Step-1:Importing all required libraries

In [55]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error,r2_score

Step-2: Loading the Dataset (keeping original restaurant name for final output )
  1. info()
  2. head()
  Details of dataset

In [56]:
df=pd.read_csv("C:/Users/KAVYA/Documents/Dataset .csv")
restaurant_names=df["Restaurant Name"]
df.info()
df.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9551 entries, 0 to 9550
Data columns (total 21 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   Restaurant ID         9551 non-null   int64  
 1   Restaurant Name       9551 non-null   object 
 2   Country Code          9551 non-null   int64  
 3   City                  9551 non-null   object 
 4   Address               9551 non-null   object 
 5   Locality              9551 non-null   object 
 6   Locality Verbose      9551 non-null   object 
 7   Longitude             9551 non-null   float64
 8   Latitude              9551 non-null   float64
 9   Cuisines              9542 non-null   object 
 10  Average Cost for two  9551 non-null   int64  
 11  Currency              9551 non-null   object 
 12  Has Table booking     9551 non-null   object 
 13  Has Online delivery   9551 non-null   object 
 14  Is delivering now     9551 non-null   object 
 15  Switch to order menu 

Unnamed: 0,Restaurant ID,Restaurant Name,Country Code,City,Address,Locality,Locality Verbose,Longitude,Latitude,Cuisines,...,Currency,Has Table booking,Has Online delivery,Is delivering now,Switch to order menu,Price range,Aggregate rating,Rating color,Rating text,Votes
0,6317637,Le Petit Souffle,162,Makati City,"Third Floor, Century City Mall, Kalayaan Avenu...","Century City Mall, Poblacion, Makati City","Century City Mall, Poblacion, Makati City, Mak...",121.027535,14.565443,"French, Japanese, Desserts",...,Botswana Pula(P),Yes,No,No,No,3,4.8,Dark Green,Excellent,314
1,6304287,Izakaya Kikufuji,162,Makati City,"Little Tokyo, 2277 Chino Roces Avenue, Legaspi...","Little Tokyo, Legaspi Village, Makati City","Little Tokyo, Legaspi Village, Makati City, Ma...",121.014101,14.553708,Japanese,...,Botswana Pula(P),Yes,No,No,No,3,4.5,Dark Green,Excellent,591
2,6300002,Heat - Edsa Shangri-La,162,Mandaluyong City,"Edsa Shangri-La, 1 Garden Way, Ortigas, Mandal...","Edsa Shangri-La, Ortigas, Mandaluyong City","Edsa Shangri-La, Ortigas, Mandaluyong City, Ma...",121.056831,14.581404,"Seafood, Asian, Filipino, Indian",...,Botswana Pula(P),Yes,No,No,No,4,4.4,Green,Very Good,270
3,6318506,Ooma,162,Mandaluyong City,"Third Floor, Mega Fashion Hall, SM Megamall, O...","SM Megamall, Ortigas, Mandaluyong City","SM Megamall, Ortigas, Mandaluyong City, Mandal...",121.056475,14.585318,"Japanese, Sushi",...,Botswana Pula(P),No,No,No,No,4,4.9,Dark Green,Excellent,365
4,6314302,Sambo Kojin,162,Mandaluyong City,"Third Floor, Mega Atrium, SM Megamall, Ortigas...","SM Megamall, Ortigas, Mandaluyong City","SM Megamall, Ortigas, Mandaluyong City, Mandal...",121.057508,14.58445,"Japanese, Korean",...,Botswana Pula(P),Yes,No,No,No,4,4.8,Dark Green,Excellent,229


Step-3: Preprocessing
1. Drop ID and non-predictive columns
2. Drop rows with missing target
3. Fill missing values with mode or median
4. Encode Categorical Features
5. Define input(X) and output(y)

In [57]:
df.drop(columns=["Restaurant ID","Restaurant Name","Address","Locality","Locality Verbose"],inplace=True)

In [59]:
df["Cuisines"]=df["Cuisines"].fillna(df["Cuisines"].mode()[0],inplace=True)

In [60]:
df=pd.get_dummies(df,columns=["City","Cuisines","Currency","Rating text",
                              "Has Table booking","Has Online delivery","Is delivering now","Switch to order menu","Rating color","Rating text"],drop_first=True)

In [61]:
X=df.drop("Aggregate rating",axis=1)
y=df["Aggregate rating"]

Step-4: splitting and training model
1. train/test split
2. Train a model (Decision Tree Regrssor)
3. Prediction
4. Evaluate
5. Analyzing most influential

In [62]:
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.2,random_state=42)

In [63]:
model=DecisionTreeRegressor()
model.fit(X_train,y_train)

In [None]:
y_pred=model.predict(X_test)
mse=mean_squared_error(y_test,y_pred)
r2= r2_score(y_test,y_pred)
print("Mean Square Error",mse)
print("R^2 Score:",r2)

Mean Square Error 0.029144747776033462
R^2 Score: 0.9871953637105617


In [67]:
importances=pd.Series(model.feature_importances_,index=X.columns)
important_features=importances.sort_values(ascending=False)
print(important_features.head(10))

Rating text_Not rated      0.896661
Rating color_Orange        0.051527
Rating text_Poor           0.022198
Rating color_Yellow        0.013084
Votes                      0.003979
Longitude                  0.003575
Latitude                   0.003449
Rating text_Very Good      0.002580
Average Cost for two       0.001769
Has Online delivery_Yes    0.000278
dtype: float64


Step-5: Predicting with Names for Display
   1. Predict for full dataset(or just test set)
   2. Add back restaurant names(for display)
   3. View Predictions

In [68]:
df_predictions=X_test.copy()
df_predictions["Predicted Rating"]=model.predict(X_test)

In [69]:
df_predictions["Restaurant Name"]=restaurant_names.iloc[X_test.index].values

In [70]:
print(df_predictions[["Restaurant Name","Predicted Rating"]].head(20))

                Restaurant Name  Predicted Rating
4731                 Wah Ji Wah               2.4
1468        19 Flavours Biryani               4.1
9037         Andaaz E Paranthas               2.6
7866                     Tony's               4.3
5570                 Yummy Adda               3.5
5613                  Tea Point               0.0
7751                   Rainbows               3.2
1662           Bansiwala Sweets               0.0
8592           Green Restaurant               3.7
2164               Me Kong Bowl               4.3
1426              Drifters Cafe               3.7
7403        Jiya Amritsari Naan               0.0
2115         Tandoori KnockOuts               4.1
9100         Chocolate Fountain               0.0
3872              Kay's Chik-In               3.0
1732            Chinar Junction               3.0
2498             Tea Villa Cafe               4.1
5338             Veggie-Licious               3.7
908          The Grillz & Gravy               0.0
