**Task 1: Predictive Modeling**

**Task list:**

1. Build a regression model to predict the aggregate rating of a restaurant based on available features.
   - Split the dataset into training and testing sets and evaluate the model's performance using appropriate metrics.

2. Experiment with different algorithms (e.g., linear regression, decision trees, random forest) and compare their performance.


**Requirements:**

The following are required to build the predictive models:

a.   **Data preparation for the features fed to the models**: train_test_split from sklearn.model_selection; StandardScaler and LabelEncoder from sklearn.preprocessing

b.  **Linear Regression**: LinearRegression from sklearn.linear_model; mean_squared_error and r2_score from sklearn.metrics

c. **Decision Tree Regression**: DecisionTreeRegressor from sklearn.tree

d.  **Random Forest Regression**: RandomForestRegressor from sklearn.ensemble

e. **Comparing Performance**: the MSE and R^2 values computed for each model

**1.  Building a regression model to predict the aggregate rating of a restaurant based on Price range, Has Online delivery, and Has Table booking**

a. Data Preparation


In [2]:
# Importing libraries associated with data preparation for model building
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.model_selection import train_test_split

# Importing the dataset
new_data = pd.read_csv("new_data.csv")

In [1]:
# Notes on creating the features and target
# - A model learns to predict a target variable from one or more feature variables.
# - The target variable is the variable to be predicted.
# - The feature variables are the variables used to make that prediction.
# - Aggregate rating is the target variable.
# - Price range, Has Online delivery, and Has Table booking are the feature variables.
# - All variables must be numeric. Any that are not must be encoded first.

In [6]:
# Exploring the dataset with only the features and target variables in view
feat_and_target = new_data[['Aggregate rating','Price range', 'Has Online delivery', 'Has Table booking']]
feat_and_target

Unnamed: 0,Aggregate rating,Price range,Has Online delivery,Has Table booking
0,4.8,3,No,Yes
1,4.5,3,No,Yes
2,4.4,4,No,Yes
3,4.9,4,No,No
4,4.8,4,No,Yes
...,...,...,...,...
9546,4.1,3,No,No
9547,4.2,3,No,No
9548,3.7,4,No,No
9549,4.0,4,No,No


In [8]:
# Data encoding

# The feature variables (Has Online delivery and Has Table booking) are categorical
# and need to be encoded as numbers before modeling.

# Work on an explicit copy so that the new columns are not assigned to a slice of new_data
feat_and_target = feat_and_target.copy()

feat_and_target['Has Online Delivery Encoded'] = LabelEncoder().fit_transform(feat_and_target['Has Online delivery'])

feat_and_target['Has Table Booking Encoded'] = LabelEncoder().fit_transform(feat_and_target['Has Table booking'])

feat_and_target


Unnamed: 0,Aggregate rating,Price range,Has Online delivery,Has Table booking,Has Online Delivery Encoded,Has Table Booking Encoded
0,4.8,3,No,Yes,0,1
1,4.5,3,No,Yes,0,1
2,4.4,4,No,Yes,0,1
3,4.9,4,No,No,0,0
4,4.8,4,No,Yes,0,1
...,...,...,...,...,...,...
9546,4.1,3,No,No,0,0
9547,4.2,3,No,No,0,0
9548,3.7,4,No,No,0,0
9549,4.0,4,No,No,0,0
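
The encoded columns above come out as No = 0 and Yes = 1 because LabelEncoder assigns integers to the class labels in sorted order. The sketch below illustrates this on a small made-up column (the `toy` frame and its values are hypothetical, not taken from new_data.csv) and compares it with an explicit mapping, which is one possible alternative when the intended coding should be visible in the code.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy column standing in for 'Has Table booking'
toy = pd.DataFrame({"Has Table booking": ["Yes", "No", "No", "Yes"]})

# LabelEncoder sorts the distinct labels and numbers them from 0
encoder = LabelEncoder()
toy["encoded_le"] = encoder.fit_transform(toy["Has Table booking"])

# An explicit mapping states the coding directly
toy["encoded_map"] = toy["Has Table booking"].map({"No": 0, "Yes": 1})

print(list(encoder.classes_))  # labels in the order they were numbered
print(toy)
```

For a two-level Yes/No column both approaches give the same integers; the explicit mapping simply does not depend on the alphabetical order of the labels.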


In [10]:
# Selecting only the features and target variables
X = feat_and_target[['Price range', 'Has Online Delivery Encoded', 'Has Table Booking Encoded']] # Feature variables
y = feat_and_target['Aggregate rating'] # Target variable

# Splitting the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Feature scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
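
The scaler above is fitted on the training set only and then reused on the test set, so no information from the test data leaks into the preprocessing. A minimal sketch of what that means, using synthetic data (the `X_demo` and `y_demo` arrays are assumptions made up for the illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 100 rows, 3 integer-valued features
rng = np.random.default_rng(0)
X_demo = rng.integers(1, 5, size=(100, 3)).astype(float)
y_demo = rng.normal(size=100)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)

# Learn mean and standard deviation from the training rows only
demo_scaler = StandardScaler()
X_tr_scaled = demo_scaler.fit_transform(X_tr)

# Apply the same training statistics to the test rows
X_te_scaled = demo_scaler.transform(X_te)

print("Train means:", X_tr_scaled.mean(axis=0).round(6))
print("Train stds: ", X_tr_scaled.std(axis=0).round(6))
print("Test means: ", X_te_scaled.mean(axis=0).round(3))
```

The training columns end up with mean 0 and standard deviation 1 by construction, while the test columns are only approximately centered because they were scaled with the training statistics.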

Data preparation is done. The next step is to build the models: linear regression, decision tree, and random forest.

b. Building a Linear Regression Model

In [11]:
# Importing libraries associated with building linear regression models
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Initialize and train model
lr = LinearRegression() # Model initialization
lr.fit(X_train_scaled, y_train) # Training the model

# Making the Predictions
y_pred_lr = lr.predict(X_test_scaled)

# Performance metrics
mse_lr = mean_squared_error(y_test, y_pred_lr)
r2_lr = r2_score(y_test, y_pred_lr)

print(f'Linear Regression MSE: {mse_lr:.2f}')
print(f'Linear Regression R^2: {r2_lr:.2f}')


Linear Regression MSE: 1.78
Linear Regression R^2: 0.25


c. Building a Decision Tree Regression Model

In [12]:
# Importing libraries necessary for building a decision tree regression model
from sklearn.tree import DecisionTreeRegressor

# Initialize and train model
# Tree-based models are insensitive to feature scaling, so the unscaled features are used
dt = DecisionTreeRegressor(random_state=0)
dt.fit(X_train, y_train)

# Predictions
y_pred_dt = dt.predict(X_test)

# Performance metrics
mse_dt = mean_squared_error(y_test, y_pred_dt)
r2_dt = r2_score(y_test, y_pred_dt)

print(f'Decision Tree MSE: {mse_dt:.2f}')
print(f'Decision Tree R^2: {r2_dt:.2f}')

Decision Tree MSE: 1.73
Decision Tree R^2: 0.27


d. Building a Random Forest Regression Model

In [13]:
# Importing libraries necessary for building a random forest regression model
from sklearn.ensemble import RandomForestRegressor

# Initialize and train model
rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)

# Prediction
y_pred_rf = rf.predict(X_test)

# Performance metrics
mse_rf = mean_squared_error(y_test, y_pred_rf)
r2_rf = r2_score(y_test, y_pred_rf)

print(f'Random Forest MSE: {mse_rf:.2f}')
print(f'Random Forest R^2: {r2_rf:.2f}')

Random Forest MSE: 1.73
Random Forest R^2: 0.27
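
The identical scores for the decision tree and the random forest may have a simple explanation: with only three discrete features there are few distinct feature combinations, and a fully grown regression tree ends up predicting the mean training rating of each combination. A forest of such trees averages bootstrap estimates of roughly the same means. The sketch below illustrates the first point on synthetic data (the column names and the rating formula are made up for the demonstration, not taken from new_data.csv).

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Synthetic data with the same shape of problem: three discrete features
rng = np.random.default_rng(0)
n = 500
toy = pd.DataFrame({
    "price": rng.integers(1, 5, size=n),    # values 1 to 4
    "online": rng.integers(0, 2, size=n),   # 0 or 1
    "booking": rng.integers(0, 2, size=n),  # 0 or 1
})
# Hypothetical rating: a linear signal plus noise
toy["rating"] = 2.0 + 0.4 * toy["price"] + 0.3 * toy["booking"] + rng.normal(0, 0.5, size=n)

features = ["price", "online", "booking"]
tree = DecisionTreeRegressor(random_state=0).fit(toy[features], toy["rating"])

# Mean rating of each distinct feature combination, aligned row by row
group_means = toy.groupby(features)["rating"].transform("mean")
tree_preds = tree.predict(toy[features])

print("Distinct feature combinations:", toy[features].drop_duplicates().shape[0])
print("Tree predictions equal group means:", np.allclose(tree_preds, group_means))
```

Under these assumptions the tree behaves like a lookup table of group averages, which limits how much any tree-based model can gain from these three features alone.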


e. Comparing Performance

The models are compared using their Mean Squared Error (MSE) and R-squared (R^2) values on the test set.

Lower MSE and higher R^2 indicate better model performance.

In [16]:
# Results of the models
print(f"Linear Regression MSE_lr: {mse_lr:.2f}")
print(f"Linear Regression R^2_lr: {r2_lr:.2f}")
print(f"Decision Tree Regression MSE_dt: {mse_dt:.2f}")
print(f"Decision Tree Regression R^2_dt: {r2_dt:.2f}")
print(f"Random Forest Regression MSE_rf: {mse_rf:.2f}")
print(f"Random Forest Regression R^2_rf: {r2_rf:.2f}")

Linear Regression MSE_lr: 1.78
Linear Regression R^2_lr: 0.25
Decision Tree Regression MSE_dt: 1.73
Decision Tree Regression R^2_dt: 0.27
Random Forest Regression MSE_rf: 1.73
Random Forest Regression R^2_rf: 0.27


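From the results above, the lowest MSE is 1.73 and the highest R^2 is 0.27, both achieved by the Decision Tree and Random Forest Regression models. This suggests that these two models predict slightly better than the Linear Regression model (MSE 1.78, R^2 0.25). The margin is small, however, and an R^2 of 0.27 means the three features explain only about 27% of the variance in Aggregate rating on the test set, so none of the models is a strong predictor.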
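
A single train/test split gives one estimate per model, so small differences between models can be due to the particular split. One way to check this is k-fold cross-validation with a mean-only baseline for reference. The sketch below shows the idea on synthetic data (`X_syn`, `y_syn`, and the rating formula are assumptions made up for the illustration; the numbers it prints say nothing about the restaurant dataset).

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the three features and the rating
rng = np.random.default_rng(0)
n = 400
X_syn = np.column_stack([
    rng.integers(1, 5, size=n),  # price range, 1 to 4
    rng.integers(0, 2, size=n),  # online delivery, 0 or 1
    rng.integers(0, 2, size=n),  # table booking, 0 or 1
])
y_syn = 2.0 + 0.4 * X_syn[:, 0] + 0.3 * X_syn[:, 2] + rng.normal(0, 0.5, size=n)

models = {
    "Mean baseline": DummyRegressor(strategy="mean"),
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

# 5-fold cross-validated R^2 for every model
cv_r2 = {}
for name, model in models.items():
    scores = cross_val_score(model, X_syn, y_syn, cv=5, scoring="r2")
    cv_r2[name] = scores.mean()
    print(f"{name}: mean R^2 = {scores.mean():.2f} (+/- {scores.std():.2f})")
```

Applied to the real `X` and `y`, the same loop would show whether the gap between the tree-based models and linear regression is larger than the fold-to-fold variation.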