# SVM-3


Q1. To predict house prices based on several characteristics, such as location, square footage,
number of bedrooms, etc., you are developing an SVM regression model. Which regression metric would be
the best to employ in this situation?

Dataset link: https://drive.google.com/file/d/1Z9oLpmt6IDRNw7IeNcHYTGeJRYypRSC0/view?usp=share_link

In [1]:
```python
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import warnings
%matplotlib inline
warnings.filterwarnings('ignore')
```

In [2]:
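```python
# Convert the Google Drive "share" link into a direct-download URL
url = 'https://drive.google.com/file/d/1Z9oLpmt6IDRNw7IeNcHYTGeJRYypRSC0/view?usp=share_link'
file_id = url.split('/')[-2]  # the file ID is the path segment just before '/view'
dwn_url = 'https://drive.google.com/uc?id=' + file_id
df = pd.read_csv(dwn_url)
df.head()
```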

|   | area_type | availability | location | size | society | total_sqft | bath | balcony | price |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Super built-up Area | 19-Dec | Electronic City Phase II | 2 BHK | Coomee | 1056 | 2.0 | 1.0 | 39.07 |
| 1 | Plot Area | Ready To Move | Chikka Tirupathi | 4 Bedroom | Theanmp | 2600 | 5.0 | 3.0 | 120.0 |
| 2 | Built-up Area | Ready To Move | Uttarahalli | 3 BHK | NaN | 1440 | 2.0 | 3.0 | 62.0 |
| 3 | Super built-up Area | Ready To Move | Lingadheeranahalli | 3 BHK | Soiewre | 1521 | 3.0 | 1.0 | 95.0 |
| 4 | Super built-up Area | Ready To Move | Kothanur | 2 BHK | NaN | 1200 | 2.0 | 1.0 | 51.0 |


Ans. When developing an SVM (Support Vector Machine) regression model to predict house prices from characteristics such as location, square footage and number of bedrooms, the evaluation metric should reflect the specific goals and requirements of the project.

MAE (Mean Absolute Error) and RMSE (Root Mean Squared Error) are good choices here because both express the typical prediction error in the same units as the target variable (the `price` column), which makes them easy to interpret. RMSE penalizes large errors more heavily, so it is preferable when a few badly mispriced houses are especially costly; MAE weights every error in proportion to its size, so it is preferable when the data contains outliers.
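A minimal NumPy sketch of both metrics is shown below. The `y_true` and `y_pred` arrays are made-up illustrative values (not taken from the dataset) standing in for actual and predicted `price` values; `mae` and `rmse` are helper names introduced here.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean Absolute Error: average of |actual - predicted|."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of the average squared error."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# Hypothetical actual and predicted prices (same units as the `price` column)
y_true = [50.0, 60.0, 70.0, 80.0]
y_pred = [52.0, 58.0, 71.0, 79.0]

print(f"MAE  = {mae(y_true, y_pred):.4f}")   # MAE  = 1.5000
print(f"RMSE = {rmse(y_true, y_pred):.4f}")  # RMSE = 1.5811
```

Both numbers read directly as "price units off per house". With scikit-learn, the same values come from `sklearn.metrics.mean_absolute_error` and `np.sqrt(sklearn.metrics.mean_squared_error(...))`.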


Q2. You have built an SVM regression model and are trying to decide between using MSE or R-squared as
your evaluation metric. Which metric would be more appropriate if your goal is to predict the actual price
of a house as accurately as possible?

Ans. If our primary goal is to predict the actual price of a house as accurately as possible, then Mean Squared Error (MSE) would be the more appropriate evaluation metric to use.

MSE directly measures prediction accuracy: it is the average squared difference between the model's predicted prices and the actual prices. Because the differences are squared, larger errors are penalized heavily. Selecting the model (or hyperparameters) with the lowest MSE on held-out data therefore favors the SVM regression model whose predictions are closest to the actual prices.
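For reference, the definition can be written out with $n$ houses in the evaluation set, actual prices $y_i$ and predicted prices $\hat{y}_i$ (notation introduced here for illustration):

$$
\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2
$$

Because each residual is squared, a house mispriced by 10 units contributes 100 to the sum, while one mispriced by 2 units contributes only 4.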

On the other hand, R-squared (R²) measures the proportion of variance in the target variable that is explained by the model. While R-squared is valuable for judging goodness of fit and how much of the variation in the data the model explains, it does not directly convey how accurately individual house prices are predicted. Because R² is computed relative to the spread of the actual prices, a high R-squared value does not necessarily mean that the model is making precise price predictions.
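The sketch below illustrates that last point with two small hypothetical test sets (invented values, not from the dataset). In both sets every prediction is off by exactly 5 price units, so the MSE is identical, yet R² differs sharply because the second set has a much wider spread of actual prices. The `mse` and `r2` helpers are written here for illustration.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error."""
    return np.mean((y_true - y_pred) ** 2)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

# Hypothetical test sets: each prediction misses by exactly 5 units
narrow_true = np.array([10.0, 20.0, 30.0, 40.0])
narrow_pred = np.array([15.0, 15.0, 35.0, 35.0])
wide_true = np.array([10.0, 110.0, 210.0, 310.0])
wide_pred = np.array([15.0, 105.0, 215.0, 305.0])

for name, yt, yp in [("narrow", narrow_true, narrow_pred), ("wide", wide_true, wide_pred)]:
    print(f"{name:>6}: MSE = {mse(yt, yp):.1f}, R^2 = {r2(yt, yp):.3f}")
# narrow: MSE = 25.0, R^2 = 0.800
#   wide: MSE = 25.0, R^2 = 0.998
```

The typical price error is the same in both cases, which MSE reports faithfully; R² mainly reflects how spread out the actual prices are.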


Q3. You have a dataset with a significant number of outliers and are trying to select an appropriate
regression metric to use with your SVM model. Which metric would be the most appropriate in this
scenario?

Ans. When we have a dataset with a significant number of outliers, it's often more appropriate to use Mean Absolute Error (MAE) as our regression metric rather than Mean Squared Error (MSE) or Root Mean Squared Error (RMSE). Here's why MAE is a better choice in the presence of outliers:

1. **Robustness to Outliers**: MAE is less sensitive to outliers than MSE and RMSE. In MSE and RMSE, squaring the differences between predicted and actual values gives disproportionately high weight to outliers, inflating the error values. In contrast, MAE weights each error in proportion to its magnitude, so a few extreme errors do not dominate the metric, which makes it a more robust measure of prediction accuracy.

2. **Interpretability**: MAE is directly interpretable in the same units as the target variable (a property it shares with RMSE, but not with MSE). This makes it easier to understand the magnitude of errors in the context of our problem, which is useful when dealing with house prices or similar real-world applications.
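A small sketch makes the difference in sensitivity concrete. The residuals (actual minus predicted price) below are hypothetical; the second array is the first plus a single house mispriced by 100 units. The helper names are introduced here for illustration.

```python
import numpy as np

def mae_from_errors(errors):
    """Mean absolute error computed from residuals."""
    return np.mean(np.abs(errors))

def rmse_from_errors(errors):
    """Root mean squared error computed from residuals."""
    return np.sqrt(np.mean(np.square(errors)))

# Hypothetical residuals for a handful of houses
typical = np.array([2.0, -2.0, 1.0, -1.0])
# Same residuals plus one outlier house mispriced by 100 units
with_outlier = np.append(typical, 100.0)

for name, err in [("typical", typical), ("with outlier", with_outlier)]:
    print(f"{name:>12}: MAE = {mae_from_errors(err):.2f}, RMSE = {rmse_from_errors(err):.2f}")
#      typical: MAE = 1.50, RMSE = 1.58
# with outlier: MAE = 21.20, RMSE = 44.74
```

The single outlier raises MAE about 14-fold but RMSE about 28-fold, because its squared residual (10,000) swamps all the others.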


Q4. You have built an SVM regression model using a polynomial kernel and are trying to select the best
metric to evaluate its performance. You have calculated both MSE and RMSE and found that both values
are very close. Which metric should you choose to use in this case?

Ans. When we have calculated both Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) and found that both values are very close, the choice of metric is unlikely to significantly affect our evaluation. The two are closely related: RMSE is the square root of MSE, so they take similar values only when MSE is close to 1 (in squared target units), and because the square root is monotonic, both metrics always rank models in the same order. The main difference between them is that RMSE expresses the error in the same units as the target variable.

In such a situation where both MSE and RMSE are similar and neither one provides a clear advantage over the other, we can choose either metric based on our preference or convenience:

1. **MSE**: Choose MSE if we prefer a metric that quantifies the average squared error, even though it is not in the same units as the target variable. MSE is often used because it is smooth and differentiable, which simplifies mathematical derivations and optimization.

2. **RMSE**: Choose RMSE if we prefer to have an error metric that is interpretable in the same units as the target variable. RMSE is beneficial when we want to provide stakeholders with an intuitive understanding of prediction errors.
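The relationship between the two metrics can be checked numerically. The MSE values below are hypothetical scores for three candidate models, chosen near 1 so that MSE and RMSE look alike.

```python
import numpy as np

# Hypothetical test-set MSE values for three candidate models
mse_values = np.array([0.90, 1.00, 1.10])
rmse_values = np.sqrt(mse_values)

for m, r in zip(mse_values, rmse_values):
    print(f"MSE = {m:.2f}  RMSE = {r:.4f}")

# sqrt is strictly increasing, so both metrics rank the models identically
print(np.array_equal(np.argsort(mse_values), np.argsort(rmse_values)))  # True

# Away from MSE = 1 the two numbers diverge sharply
print(np.sqrt(400.0))  # 20.0
```

Since the ranking never changes, the decision between MSE and RMSE comes down to reporting: RMSE is the one stakeholders can read in price units.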


Q5. You are comparing the performance of different SVM regression models using different kernels (linear,
polynomial, and RBF) and are trying to select the best evaluation metric. Which metric would be most
appropriate if your goal is to measure how well the model explains the variance in the target variable?

Ans. If our goal is to measure how well the SVM regression models explain the variance in the target variable, the most appropriate evaluation metric to use is the coefficient of determination, often denoted as R-squared (R²). R-squared quantifies the proportion of variance in the target variable that is explained by the model. Here's why R-squared is suitable for this purpose:
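Written out with actual prices $y_i$, predictions $\hat{y}_i$ and mean actual price $\bar{y}$ over $n$ houses (notation introduced here for illustration), R-squared compares the model's squared errors with those of always predicting the mean:

$$
R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}
$$

If a model's squared errors exceed those of the mean baseline, the fraction is greater than 1 and $R^2$ becomes negative.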

1. **Measuring Explained Variance**: R-squared provides a clear indication of the amount of variance in the target variable that is captured by the model. A higher R-squared value indicates that a larger proportion of the variance is explained by the model, which suggests a better fit.

2. **Interpretability**: R-squared is easily interpretable. A value of 1 means the model perfectly explains all of the variance, and 0 means it explains no more than always predicting the mean price. R-squared is at most 1 but can be negative on held-out data when a model performs worse than that mean baseline. This makes it easy to understand and compare the explanatory power of different models.

3. **Comparing Models**: When we compare SVM regression models with different kernels (linear, polynomial, RBF), R-squared lets us directly assess how well each model captures the variance in the target variable. As long as every model is scored on the same held-out test set, it provides a consistent and intuitive way to rank the models by explanatory performance.
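A sketch of such a comparison with scikit-learn is given below. It assumes a synthetic dataset from `make_regression` as a stand-in for the cleaned house-price features (the raw `df` above would first need preprocessing); the feature scaling and `C=100.0` are illustrative choices, not tuned values. In scikit-learn the polynomial kernel is named `"poly"`.

```python
from sklearn.datasets import make_regression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data (assumption): 300 samples, 5 features
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

scores = {}
for kernel in ["linear", "poly", "rbf"]:
    # Scale features, then fit an SVR with the given kernel
    model = make_pipeline(StandardScaler(), SVR(kernel=kernel, C=100.0))
    model.fit(X_train, y_train)
    # Score every model on the same held-out test set
    scores[kernel] = r2_score(y_test, model.predict(X_test))

# Rank kernels from highest to lowest R^2
for kernel, r2 in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{kernel:>6}: R^2 = {r2:.3f}")
```

Because the synthetic target is a linear function of the features plus noise, the linear kernel would be expected to score well here; on real house-price data the ranking may differ.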

