Q1. In order to predict house price based on several characteristics, such as location, square footage,
number of bedrooms, etc., you are developing an SVM regression model. Which regression metric in this
situation would be the best to employ?

For an SVM regression model that predicts house prices from features such as location, square footage and number of bedrooms, the most suitable metrics are the mean squared error (MSE) and the root mean squared error (RMSE).

MSE calculates the average squared difference between the predicted and actual values. RMSE is the square root of MSE and is popular because it expresses the error in the same units as the target variable (here, the house price). Both metrics quantify how far the model's predictions are from the actual values, and a lower MSE or RMSE indicates a better regression model.
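A minimal sketch of both formulas, using made-up prices and predictions (not results from the Bengaluru data). It computes MSE directly from its definition, takes the square root to get RMSE, and checks the manual MSE against scikit-learn's `mean_squared_error`:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Illustrative prices and made-up model predictions (not real model output)
y_true = np.array([39.07, 120.0, 62.0, 95.0, 51.0])
y_pred = np.array([45.0, 110.0, 60.0, 90.0, 55.0])

# MSE: mean of the squared differences between predicted and actual values
mse_manual = np.mean((y_pred - y_true) ** 2)

# RMSE: square root of MSE, expressed in the same units as the price
rmse_manual = np.sqrt(mse_manual)

# scikit-learn's implementation should agree with the manual formula
mse_sklearn = mean_squared_error(y_true, y_pred)

print("MSE (manual): ", mse_manual)
print("MSE (sklearn):", mse_sklearn)
print("RMSE:", rmse_manual)
```

Because RMSE is in price units, it is the easier of the two to read as "typical prediction error".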

For the Bengaluru house dataset, MSE or RMSE can be used to evaluate how accurately the SVM regression model predicts house prices from the available features.

Q2. You have built an SVM regression model and are trying to decide between using MSE or R-squared as
your evaluation metric. Which metric would be more appropriate if your goal is to predict the actual price
of a house as accurately as possible?

If the goal is to predict the actual price of a house as accurately as possible, MSE (mean squared error) is the more appropriate metric. It measures the average squared difference between the predicted and actual prices, so it directly reflects the magnitude of the errors the model makes; lower values indicate better predictive performance.

R-squared, on the other hand, measures the proportion of the variance in the dependent variable (price) that is explained by the independent variables in the model. It is useful for judging overall goodness of fit, but it is a relative, unitless measure and does not tell you how far the predicted prices are from the actual ones. MSE is therefore more appropriate here, because it directly measures the error in the predicted house prices.
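A small sketch with made-up prices and predictions makes the difference concrete. It computes R-squared from its definition, 1 - SS_res / SS_tot, and then rescales all prices by a hypothetical factor of 100 (as if switching to a smaller currency unit). MSE grows by a factor of 100², while R-squared does not change, which is why R-squared alone cannot tell you how large the price errors are:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical prices and predictions, for illustration only
y_true = np.array([40.0, 55.0, 62.0, 95.0, 120.0])
y_pred = np.array([44.0, 50.0, 65.0, 90.0, 125.0])

# R-squared = 1 - SS_res / SS_tot, computed by hand
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

# Same prices expressed in a unit 100 times smaller (hypothetical rescaling)
scale = 100.0
mse_orig = mean_squared_error(y_true, y_pred)
mse_scaled = mean_squared_error(y_true * scale, y_pred * scale)
r2_orig = r2_score(y_true, y_pred)
r2_scaled = r2_score(y_true * scale, y_pred * scale)

print("R-squared (manual, sklearn):", r2_manual, r2_orig)
print("MSE original vs rescaled:", mse_orig, mse_scaled)
print("R-squared original vs rescaled:", r2_orig, r2_scaled)
```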

Q3. You have a dataset with a significant number of outliers and are trying to select an appropriate
regression metric to use with your SVM model. Which metric would be the most appropriate in this
scenario?
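When the data contains many outliers, mean absolute error (MAE) is usually the more appropriate metric: it averages absolute errors rather than squared ones, so a few very large misses do not dominate the score the way they do with MSE or RMSE. A minimal sketch with made-up numbers, where the last house is a hypothetical outlier that the model misses by 400:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical prices; the last house is an outlier that the model misses badly
y_true = np.array([40.0, 55.0, 62.0, 95.0, 500.0])
y_pred = np.array([42.0, 53.0, 60.0, 97.0, 100.0])

# Metrics on the four typical houses only (each absolute error is 2)
mae_typical = mean_absolute_error(y_true[:4], y_pred[:4])
mse_typical = mean_squared_error(y_true[:4], y_pred[:4])

# Metrics once the outlier is included
mae_all = mean_absolute_error(y_true, y_pred)
mse_all = mean_squared_error(y_true, y_pred)

# MAE grows linearly with the outlier's error, MSE grows with its square
print("MAE: %.1f -> %.1f" % (mae_typical, mae_all))
print("MSE: %.1f -> %.1f" % (mse_typical, mse_all))
```

Adding the one outlier multiplies MAE by about 40 but MSE by about 8000, so MSE mostly reports on that single house.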


In [1]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

In [2]:
data = pd.read_csv('/content/Bengaluru_House_Data.csv')

In [3]:
data.head()

             area_type   availability                  location       size  society  total_sqft  bath  balcony  price
0  Super built-up Area         19-Dec  Electronic City Phase II      2 BHK   Coomee        1056   2.0      1.0  39.07
1            Plot Area  Ready To Move          Chikka Tirupathi  4 Bedroom  Theanmp        2600   5.0      3.0  120.0
2        Built-up Area  Ready To Move               Uttarahalli      3 BHK      NaN        1440   2.0      3.0   62.0
3  Super built-up Area  Ready To Move        Lingadheeranahalli      3 BHK  Soiewre        1521   3.0      1.0   95.0
4  Super built-up Area  Ready To Move                  Kothanur      2 BHK      NaN        1200   2.0      1.0   51.0


In [9]:
data = pd.get_dummies(data, columns=[ 'availability','location', 'size', 'society'], drop_first=True)
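The `drop_first=True` argument to `pd.get_dummies` used above removes one indicator column per categorical feature, since that category is implied when all the remaining indicators are 0. A minimal sketch on a toy column whose values are written in the style of the `size` column (the rows are illustrative, not taken from the dataset):

```python
import pandas as pd

# Toy column with values in the style of the 'size' column (illustrative rows)
df = pd.DataFrame({'size': ['2 BHK', '3 BHK', '2 BHK', '4 Bedroom']})

# Categories are sorted ('2 BHK', '3 BHK', '4 Bedroom'); drop_first=True omits '2 BHK',
# so a '2 BHK' row is encoded as all zeros in the remaining indicator columns
encoded = pd.get_dummies(df, columns=['size'], drop_first=True)
print(encoded)
```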


In [13]:
# Drop the area_type column (not used as a feature)
data.drop('area_type', axis=1, inplace=True)

In [15]:
X = data.drop(['price'], axis=1)
y = data['price']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


In [16]:
svm_reg = SVR(kernel='linear')


In [None]:
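from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Reload the raw data so this cell does not depend on the earlier cells,
# and drop the area_type column (not used as a feature)
data = pd.read_csv('/content/Bengaluru_House_Data.csv').drop('area_type', axis=1)

# Convert total_sqft to square feet: ranges such as "2100 - 2850" become their midpoint,
# "Sq. Meter" and "Sq. Yards" values are converted, and any other unit becomes NaN
def sqft_to_num(x):
    x = str(x).strip()
    try:
        if ' - ' in x:
            low, high = x.split(' - ')
            return (float(low) + float(high)) / 2
        if x.endswith('Sq. Meter'):
            return float(x[:-len('Sq. Meter')]) * 10.7639
        if x.endswith('Sq. Yards'):
            return float(x[:-len('Sq. Yards')]) * 9.0
        return float(x)
    except ValueError:
        return np.nan

data['total_sqft'] = data['total_sqft'].apply(sqft_to_num)

# SVR cannot handle missing values, so drop rows with NaN in the numeric features
data = data.dropna(subset=['total_sqft', 'bath', 'balcony'])

# One-hot encode the categorical columns
data = pd.get_dummies(data, columns=['location', 'size', 'availability', 'society'], drop_first=True)

# Split the dataset into training and testing sets
X = data.drop(['price'], axis=1)
y = data['price']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the SVM regression model; SVMs are sensitive to feature scale,
# so the features are standardized before fitting
svm_reg = make_pipeline(StandardScaler(), SVR(kernel='linear'))

# Fit the model on the training data
svm_reg.fit(X_train, y_train)

# Predict the house prices on the test data
y_pred = svm_reg.predict(X_test)

# Calculate the regression metrics
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)

# Print the results
print("MAE:", mae)
print("MSE:", mse)
print("RMSE:", rmse)
print("R-Squared:", r2)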