**Q1. In order to predict house price based on several characteristics, such as location, square footage, number of bedrooms, etc., you are developing an SVM regression model. Which regression metric in this situation would be the best to employ?**

In the context of house price prediction, Mean Squared Error (MSE) is a commonly used regression metric. MSE measures the average squared difference between predicted and actual values, so it penalizes large pricing errors more heavily than small ones. Because MSE is expressed in squared price units, its square root (RMSE) is often reported alongside it when the error needs to be read in the same units as the price.
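As a small worked illustration of the MSE definition, the sketch below computes it by hand. The prices are made-up values chosen for easy arithmetic, not rows from the dataset, and `mean_squared_error_manual` is a hypothetical helper name.

```python
# Hypothetical actual and predicted prices (illustrative values only)
y_true = [50.0, 62.0, 95.0, 120.0]
y_pred = [48.0, 65.0, 90.0, 130.0]


def mean_squared_error_manual(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sum(squared_errors) / len(squared_errors)


mse = mean_squared_error_manual(y_true, y_pred)
print(mse)  # squared errors are 4, 9, 25, 100, so the mean is 34.5
```

The single error of 10 contributes 100 of the 138 total squared error, which shows how strongly MSE weights the largest misses.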

**Q2. You have built an SVM regression model and are trying to decide between using MSE or R-squared as your evaluation metric. Which metric would be more appropriate if your goal is to predict the actual price of a house as accurately as possible?**

If the goal is to predict the actual price of a house as accurately as possible, MSE is more appropriate. MSE directly measures the average squared gap between predicted and actual prices, so a lower value means the predictions sit closer to the true prices. R-squared is a relative measure: it reports the share of variance explained, not the size of the errors.
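The difference between the two metrics can be checked numerically. The sketch below uses made-up prices (not from the dataset) and assumes scikit-learn's `mean_squared_error` and `r2_score`. It shows that on a fixed test set R-squared equals 1 - MSE / Var(y), and that rescaling the prices changes MSE but leaves R-squared unchanged.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical actual and predicted prices (illustrative values only)
y_true = np.array([50.0, 62.0, 95.0, 120.0])
y_pred = np.array([48.0, 65.0, 90.0, 130.0])

mse = mean_squared_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)

# On the same test set: R^2 = 1 - MSE / Var(y), using the population variance
r2_from_mse = 1.0 - mse / np.var(y_true)

# Express the same prices in a unit 1000 times smaller
scale = 1000.0
mse_scaled = mean_squared_error(y_true * scale, y_pred * scale)
r2_scaled = r2_score(y_true * scale, y_pred * scale)

print(np.isclose(r2, r2_from_mse))           # the identity holds
print(np.isclose(mse_scaled, mse * scale**2)) # MSE scales with the unit squared
print(np.isclose(r2_scaled, r2))              # R^2 is unit-free
```

MSE therefore answers "how far off are the prices", while R-squared answers "how much better than predicting the mean".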

**Q3. You have a dataset with a significant number of outliers and are trying to select an appropriate regression metric to use with your SVM model. Which metric would be the most appropriate in this scenario?**

In the presence of outliers, Mean Absolute Error (MAE) is more robust than MSE. MAE measures the average absolute difference between predicted and actual values. Because the errors are not squared, a single extreme value affects MAE in proportion to its size rather than to its square. This makes MAE a suitable choice for datasets with a significant number of outliers.
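To make the sensitivity difference concrete, the following sketch compares both metrics with and without one extreme target. All values are hypothetical and chosen for illustration. It assumes scikit-learn's `mean_absolute_error` and `mean_squared_error`.

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical predictions and targets (illustrative values only)
y_pred = [48.0, 65.0, 90.0, 125.0, 53.0]
y_true_clean = [50.0, 62.0, 95.0, 120.0, 51.0]
# Same targets, except the last one is replaced by an extreme value
y_true_outlier = [50.0, 62.0, 95.0, 120.0, 551.0]

mae_clean = mean_absolute_error(y_true_clean, y_pred)
mse_clean = mean_squared_error(y_true_clean, y_pred)
mae_outlier = mean_absolute_error(y_true_outlier, y_pred)
mse_outlier = mean_squared_error(y_true_outlier, y_pred)

# How many times larger each metric becomes once the outlier is present
mae_growth = mae_outlier / mae_clean
mse_growth = mse_outlier / mse_clean

print(f"MAE: {mae_clean:.1f} -> {mae_outlier:.1f} (x{mae_growth:.0f})")
print(f"MSE: {mse_clean:.1f} -> {mse_outlier:.1f} (x{mse_growth:.0f})")
```

In this toy case the single outlier inflates MSE far more than MAE, which is the behavior described above.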

**Q4. You have built an SVM regression model using a polynomial kernel and are trying to select the best metric to evaluate its performance. You have calculated both MSE and RMSE and found that both values are very close. Which metric should you choose to use in this case?**

RMSE (Root Mean Squared Error) is simply the square root of MSE, so the two metrics always rank models in the same order. The two values being close only indicates that the MSE is near 1 (or near 0) in the units of the target; it says nothing about which metric is better. Either can be used, but RMSE is usually the easier one to interpret because it expresses the typical error magnitude in the same units as the target variable.
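The relationship between the two metrics can be sketched in a few lines. The MSE values and model names below are hypothetical, picked close to 1 so that MSE and RMSE come out close to each other.

```python
import math

# Hypothetical test-set MSE values for two models (illustrative numbers only)
mse_by_model = {"model_a": 0.81, "model_b": 1.21}

# RMSE is the square root of MSE, expressed in the units of the target
rmse_by_model = {name: math.sqrt(mse) for name, mse in mse_by_model.items()}

# The square root is monotonic, so the best model is the same under both metrics
best_by_mse = min(mse_by_model, key=mse_by_model.get)
best_by_rmse = min(rmse_by_model, key=rmse_by_model.get)

for name in mse_by_model:
    print(name, mse_by_model[name], round(rmse_by_model[name], 3))
print(best_by_mse == best_by_rmse)
```

With an MSE far from 1, for example 34.5, the RMSE would be about 5.87, so closeness of the two numbers is a property of the error scale, not of the model.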

**Q5. You are comparing the performance of different SVM regression models using different kernels (linear, polynomial, and RBF) and are trying to select the best evaluation metric. Which metric would be most appropriate if your goal is to measure how well the model explains the variance in the target variable?**

R-squared (coefficient of determination) is a suitable metric if your goal is to measure how well the model explains the variance in the target variable. R-squared is the proportion of the variance in the dependent variable that is predictable from the independent variables. A value closer to 1 means more of the variance is explained, and because the metric is unit-free it is often used to compare the goodness of fit of different regression models on the same data.
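A kernel comparison along these lines might look like the sketch below. It uses synthetic data (a noisy sine curve) rather than the housing dataset, default `SVR` hyperparameters, and an assumed 75/25 split, so the scores only illustrate the procedure.

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic, non-linear toy data (an assumption for illustration only)
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit one scaled SVR per kernel and score each on the same held-out set
r2_by_kernel = {}
for kernel in ("linear", "poly", "rbf"):
    model = make_pipeline(StandardScaler(), SVR(kernel=kernel))
    model.fit(X_train, y_train)
    r2_by_kernel[kernel] = r2_score(y_test, model.predict(X_test))

for kernel, r2 in r2_by_kernel.items():
    print(f"{kernel}: R^2 = {r2:.3f}")
```

Scoring every kernel on the same test split keeps the R-squared values comparable, since they all share the same baseline variance.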

In [2]:
# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler

# Load the dataset
file_path = "C:/Programming/coding/Pwskills/Excel files/Bengaluru_House_Data.csv"
data = pd.read_csv(file_path)


In [3]:
data.head()

|   | area_type | availability | location | size | society | total_sqft | bath | balcony | price |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Super built-up Area | 19-Dec | Electronic City Phase II | 2 BHK | Coomee | 1056 | 2.0 | 1.0 | 39.07 |
| 1 | Plot Area | Ready To Move | Chikka Tirupathi | 4 Bedroom | Theanmp | 2600 | 5.0 | 3.0 | 120.0 |
| 2 | Built-up Area | Ready To Move | Uttarahalli | 3 BHK | NaN | 1440 | 2.0 | 3.0 | 62.0 |
| 3 | Super built-up Area | Ready To Move | Lingadheeranahalli | 3 BHK | Soiewre | 1521 | 3.0 | 1.0 | 95.0 |
| 4 | Super built-up Area | Ready To Move | Kothanur | 2 BHK | NaN | 1200 | 2.0 | 1.0 | 51.0 |


In [4]:
data.dtypes

area_type        object
availability     object
location         object
size             object
society          object
total_sqft       object
bath            float64
balcony         float64
price           float64
dtype: object

In [6]:
data.isnull().sum()

area_type          0
availability       0
location           1
size              16
society         5502
total_sqft         0
bath              73
balcony          609
price              0
dtype: int64

In [7]:
data.shape

(13320, 9)

In [17]:
data.nunique()

area_type          4
availability      81
location        1305
size              31
society         2688
total_sqft      2117
bath              19
balcony            4
price           1994
dtype: int64

In [18]:
data['area_type'].unique()

array(['Super built-up  Area', 'Plot  Area', 'Built-up  Area',
       'Carpet  Area'], dtype=object)

In [13]:
data['size'].unique()

array(['2 BHK', '4 Bedroom', '3 BHK', '4 BHK', '6 Bedroom', '3 Bedroom',
       '1 BHK', '1 RK', '1 Bedroom', '8 Bedroom', '2 Bedroom',
       '7 Bedroom', '5 BHK', '7 BHK', '6 BHK', '5 Bedroom', '11 BHK',
       '9 BHK', nan, '9 Bedroom', '27 BHK', '10 Bedroom', '11 Bedroom',
       '10 BHK', '19 BHK', '16 BHK', '43 Bedroom', '14 BHK', '8 BHK',
       '12 Bedroom', '13 BHK', '18 Bedroom'], dtype=object)

In [16]:
data['balcony'].unique()

array([ 1.,  3., nan,  2.,  0.])