**Restaurant Rating Prediction - Brief Report**

*Author: Lokchandar*

**Introduction:**
The "Restaurant Rating Prediction Web Application" is a Python script that preprocesses restaurant data, trains a Random Forest Regressor to predict ratings from selected features, and demonstrates how to use the trained model for prediction. This report gives an overview of the application's purpose, features, and implementation. The model is trained on the Zomato restaurant dataset for Bangalore.

**Purpose:**
The primary goal of this application is to predict restaurant ratings based on selected features. The application showcases the implementation of a machine learning model to facilitate rating predictions for restaurants.

**Features:**
1. Data Loading and Preprocessing: The script loads restaurant data from a CSV file, performs data cleaning, and preprocesses the data for training.

2. Model Training: A Random Forest Regressor model is trained using preprocessed data. The model learns patterns from historical restaurant data to predict ratings based on input features.

3. Evaluation: The model's performance is evaluated using Mean Squared Error (MSE), a common metric for regression tasks. The lower the MSE, the better the model's prediction accuracy.

4. Prediction: The script includes an example input with the same features used for training: online ordering, table booking, location, restaurant category (rest_type), cuisines, approximate cost for two people, and listing type (Type). The trained model predicts the rating for this input.
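
To make the evaluation metric in item 3 concrete, the following minimal sketch computes MSE by hand on made-up ratings. The numbers are illustrative and not taken from the Zomato data, and the helper name `mean_squared_error_manual` is hypothetical; the script itself calls scikit-learn's `mean_squared_error`.

```python
import math


def mean_squared_error_manual(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(squared_errors) / len(squared_errors)


# Illustrative ratings only (not from the dataset)
actual = [4.0, 3.5, 3.0, 4.5]
predicted = [3.5, 3.5, 3.5, 4.0]

mse = mean_squared_error_manual(actual, predicted)
rmse = math.sqrt(mse)  # square root puts the error back in rating points
print(f"MSE: {mse:.4f}, RMSE: {rmse:.4f}")
```

Because MSE is in squared rating units, its square root (RMSE) is often easier to read: it expresses the typical error in rating points.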

**Implementation Overview:**
- The script uses libraries such as pandas, scikit-learn, and numpy.
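- It loads restaurant data from a CSV file and cleans it: unused columns and duplicate rows are dropped, ratings marked 'NEW' or '-' are replaced with the mean rating, and any remaining rows with missing values are removed.
- Rare values of rest_type, location, and cuisines are grouped under 'others', and the categorical features are then one-hot encoded into numerical format.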
- The data is split into training and testing sets to train and evaluate the model.
- A Random Forest Regressor model with 100 trees is trained on the preprocessed data.
- The model's performance is evaluated using Mean Squared Error on the test set.
- An example input is provided to demonstrate how to preprocess input data and predict ratings using the trained model.
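
The steps above can also be sketched end to end on a tiny made-up table. This is a variation, not the report's exact code: it wraps the encoder and the model in a scikit-learn `Pipeline` (the script applies them as separate steps), splits the data before fitting the encoder, and sets `handle_unknown="ignore"` so that an unseen category does not raise an error. The toy values and the column subset are assumptions made for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Tiny made-up frame with the same kinds of columns as the report (not real data)
toy = pd.DataFrame({
    "online_order": ["Yes", "No", "Yes", "No", "Yes", "No", "Yes", "No"],
    "location": ["A", "A", "B", "B", "A", "B", "A", "B"],
    "Cost2plates": [800.0, 300.0, 600.0, 400.0, 700.0, 350.0, 900.0, 250.0],
    "rate": [4.1, 3.7, 3.9, 3.6, 4.0, 3.5, 4.2, 3.4],
})

X = toy[["online_order", "location", "Cost2plates"]]
y = toy["rate"]

# Split first so the encoder is fitted on training rows only
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

pipeline = Pipeline(steps=[
    ("encode", ColumnTransformer(
        transformers=[
            ("cat", OneHotEncoder(handle_unknown="ignore"), ["online_order", "location"]),
        ],
        remainder="passthrough",  # the numeric cost column passes through unchanged
    )),
    ("model", RandomForestRegressor(n_estimators=100, random_state=42)),
])
pipeline.fit(X_train, y_train)

mse = mean_squared_error(y_test, pipeline.predict(X_test))
print(f"Toy MSE: {mse:.3f}")

# A location never seen in training ("C") is encoded as all zeros instead of raising
new_row = pd.DataFrame([{"online_order": "Yes", "location": "C", "Cost2plates": 500.0}])
prediction = pipeline.predict(new_row)[0]
print(f"Toy prediction: {prediction:.2f}")
```

With a pipeline, the same `predict` call handles both encoding and prediction, so raw rows can be passed in directly.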

**Conclusion:**
The "Restaurant Rating Prediction Web Application" script showcases the complete process of data preprocessing, model training, evaluation, and prediction. It serves as a useful example of how to build a machine learning model for rating prediction based on various restaurant features. By following this example, developers can learn how to implement similar applications to predict outcomes based on historical data.



In [3]:
import pandas as pd
import numpy as np
df = pd.read_csv(r"C:\Users\lokch\OneDrive\Desktop\postGre\never open\zomato.csv")
df = df.drop(['url', 'address', 'phone', 'menu_item', 'dish_liked', 'reviews_list'], axis = 1)
df.drop_duplicates(inplace = True)
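def handlerate(value):
    # 'NEW' and '-' mark restaurants without a rating yet; treat them as missing
    if value == 'NEW' or value == '-':
        return np.nan
    # Ratings look like '4.1/5'; keep the part before the slash
    return float(str(value).split('/')[0])

df['rate'] = df['rate'].apply(handlerate)
# Assign the result back; calling fillna(inplace=True) on a selected column
# triggers a chained-assignment warning in recent pandas versions
df['rate'] = df['rate'].fillna(df['rate'].mean())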
df.dropna(inplace = True)
df.rename(columns = {'approx_cost(for two people)':'Cost2plates', 'listed_in(type)':'Type'}, inplace = True)
df = df.drop(['listed_in(city)'], axis = 1)
df['Cost2plates'].unique()
def handlecomma(value):
    # Costs such as '1,200' use a thousands separator; strip it before converting
    return float(str(value).replace(',', ''))

df['Cost2plates'] = df['Cost2plates'].apply(handlecomma)

# Group restaurant types with fewer than 1000 rows under 'others'
rest_types = df['rest_type'].value_counts(ascending=False)
rest_types_lessthan1000 = rest_types[rest_types < 1000]

def handle_rest_type(value):
    # `in` on a Series tests its index, which here holds the category labels
    if value in rest_types_lessthan1000:
        return 'others'
    return value

df['rest_type'] = df['rest_type'].apply(handle_rest_type)

# Group locations with fewer than 300 rows under 'others'
location = df['location'].value_counts(ascending=False)
location_lessthan300 = location[location < 300]

def handle_location(value):
    if value in location_lessthan300:
        return 'others'
    return value

df['location'] = df['location'].apply(handle_location)

# Group cuisine combinations with fewer than 100 rows under 'others'
cuisines = df['cuisines'].value_counts(ascending=False)
cuisines_lessthan100 = cuisines[cuisines < 100]

def handle_cuisines(value):
    if value in cuisines_lessthan100:
        return 'others'
    return value

df['cuisines'] = df['cuisines'].apply(handle_cuisines)

df.head()



Unnamed: 0,name,online_order,book_table,rate,votes,location,rest_type,cuisines,Cost2plates,Type
0,Jalsa,Yes,Yes,4.1,775,Banashankari,Casual Dining,"North Indian, Mughlai, Chinese",800.0,Buffet
1,Spice Elephant,Yes,No,4.1,787,Banashankari,Casual Dining,others,800.0,Buffet
2,San Churro Cafe,Yes,No,3.8,918,Banashankari,others,others,800.0,Buffet
3,Addhuri Udupi Bhojana,No,No,3.7,88,Banashankari,Quick Bites,"South Indian, North Indian",300.0,Buffet
4,Grand Village,No,No,3.8,166,Basavanagudi,Casual Dining,others,600.0,Buffet
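
The cell above repeats the same "group rare values under 'others'" logic for three columns. As a possible simplification, the sketch below folds it into one helper. The name `group_rare_categories` and its parameters are hypothetical and do not appear in the original script; the sample data is made up.

```python
import pandas as pd


def group_rare_categories(series, min_count, other_label="others"):
    """Replace categories that occur fewer than `min_count` times with `other_label`."""
    counts = series.value_counts()
    # Map each row to the frequency of its own category, then keep only frequent ones
    frequent = series.map(counts) >= min_count
    return series.where(frequent, other_label)


# Illustrative data only
toy = pd.Series(["Cafe", "Cafe", "Cafe", "Pub", "Bakery", "Pub"])
grouped = group_rare_categories(toy, min_count=2)
print(grouped.tolist())
```

Under these assumptions, the three blocks in the cell could become calls such as `df['rest_type'] = group_rare_categories(df['rest_type'], 1000)`. Note that this helper would also relabel missing values as 'others', so it should run after rows with missing values have been dropped, as the script does.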


In [4]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error  # needed for the evaluation step below

# Split data into features (X) and target (y)
X = df[['online_order', 'book_table', 'location', 'rest_type', 'cuisines', 'Cost2plates', 'Type']]
y = df['rate']

# Convert categorical columns to numerical using one-hot encoding
categorical_columns = ['online_order', 'book_table', 'location', 'rest_type', 'cuisines', 'Type']
preprocessor = ColumnTransformer(transformers=[('cat', OneHotEncoder(), categorical_columns)],
                                 remainder='passthrough')
X_preprocessed = preprocessor.fit_transform(X)

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_preprocessed, y, test_size=0.2, random_state=42)

# Create and train a Random Forest Regressor model
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate the model using mean squared error
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse:.2f}")

# Example input for prediction
example_input = {
    'online_order': 'Yes',
    'book_table': 'Yes',
    'location': 'Banashankari',
    'rest_type': 'Casual Dining',
    'cuisines': 'North Indian, Mughlai, Chinese',
    'Cost2plates': 800.0,
    'Type': 'Buffet'
}

# Preprocess the example input using the same preprocessor
example_input_preprocessed = preprocessor.transform(pd.DataFrame([example_input]))
# Predict the rate for the example input
predicted_rate = model.predict(example_input_preprocessed)
print(f"Predicted Rate: {predicted_rate[0]:.2f}")


Mean Squared Error: 0.05
Predicted Rate: 4.10
