In [2]:
# Q1. In order to predict house price based on several characteristics, such as location, square footage,
# number of bedrooms, etc., you are developing an SVM regression model. Which regression metric in this
# situation would be the best to employ?
# Dataset link:

from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

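# Placeholder data so the cell runs end to end (an assumption: the original
# dataset link is not given). Swap in the real, preprocessed housing data here;
# later cells reuse X_train, X_test, y_train, y_test.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)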

# Initialize SVR model
svr = SVR(kernel='rbf', C=1.0, gamma='scale')

# Train SVR model
svr.fit(X_train, y_train)

# Predict house prices
y_pred = svr.predict(X_test)

# Calculate MSE
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error (MSE): {mse:.2f}")


In [5]:
# Q2. You have built an SVM regression model and are trying to decide between using MSE or R-squared as
# your evaluation metric. Which metric would be more appropriate if your goal is to predict the actual price
# of a house as accurately as possible?

from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error, r2_score

# Example code (assuming data preprocessing and splitting)
# X_train, X_test, y_train, y_test = train_test_split(...)

# Initialize SVR model
svr = SVR(kernel='rbf', C=1.0, gamma='scale')

# Train SVR model
svr.fit(X_train, y_train)

# Predict house prices
y_pred = svr.predict(X_test)

# Calculate MSE
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error (MSE): {mse:.2f}")

# Calculate R-squared
r2 = r2_score(y_test, y_pred)
print(f"R-squared: {r2:.2f}")


In [None]:
# Q3. You have a dataset with a significant number of outliers and are trying to select an appropriate
# regression metric to use with your SVM model. Which metric would be the most appropriate in this
# scenario?

# When dealing with a dataset that contains a significant number of outliers, the most appropriate regression metric to use with your SVM model would be the **Mean Absolute Error (MAE)**.

# ### Reasoning:

# 1. **Mean Absolute Error (MAE)**:
#    - **Definition**: MAE measures the average absolute difference between predicted values and actual values.
#    - **Robustness to Outliers**: MAE is less sensitive to outliers compared to other metrics like Mean Squared Error (MSE) because it does not involve squaring the differences.
#    - **Interpretation**: MAE weights each error in proportion to its size, so a single large miss cannot dominate the average the way it does once errors are squared.
#    - **Suitability for Outliers**: Since outliers can significantly affect MSE (due to squaring), MAE is preferred when the dataset contains outliers as it provides a more reliable measure of prediction accuracy.

# 2. **Alternative Metrics**:
#    - **Mean Squared Error (MSE)**: MSE squares the differences between predicted and actual values, making it more sensitive to outliers.
#    - **R-squared (Coefficient of Determination)**: R-squared measures how much of the target's variance the model explains. Because it is computed from squared residuals, it is pulled down by outliers in much the same way as MSE.

# 3. **Application to SVM Regression**:
#    - SVR is trained with an epsilon-insensitive loss that grows linearly (not quadratically) with errors outside the epsilon tube. Evaluating with MAE is consistent with that loss and keeps a few predictions that miss because of outliers from dominating the score.
#    - MAE provides a clearer indication of how well the SVM model predicts the majority of data points, reflecting its robustness in the presence of outliers.

### Example Usage:

```python
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_error

# Example code (assuming data preprocessing and splitting)
# X_train, X_test, y_train, y_test = train_test_split(...)

# Initialize SVR model
svr = SVR(kernel='rbf', C=1.0, gamma='scale')

# Train SVR model
svr.fit(X_train, y_train)

# Predict house prices
y_pred = svr.predict(X_test)

# Calculate Mean Absolute Error (MAE)
mae = mean_absolute_error(y_test, y_pred)
print(f"Mean Absolute Error (MAE): {mae:.2f}")
```

# In summary, when working with a dataset that includes significant outliers, choosing MAE as the regression metric for evaluating your SVM model ensures a more robust assessment of prediction accuracy, mitigating the impact of outliers on the evaluation process.
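
# To make the outlier sensitivity concrete, here is a minimal sketch in plain Python. The prices and predictions are made-up illustrative numbers (in thousands), not taken from any dataset, and `mae`/`mse` are small helper functions written for this example rather than the scikit-learn implementations.

```python
def mae(y_true, y_pred):
    """Mean absolute error: average of |actual - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error: average of (actual - predicted)^2."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical prices; every prediction is off by exactly 10.
y_true = [200, 250, 300, 350, 400]
y_pred = [210, 240, 310, 340, 410]

# Same predictions, but the last actual price is now an outlier.
y_true_outlier = [200, 250, 300, 350, 1400]

mae_clean, mse_clean = mae(y_true, y_pred), mse(y_true, y_pred)
mae_out, mse_out = mae(y_true_outlier, y_pred), mse(y_true_outlier, y_pred)

print(f"clean:   MAE={mae_clean:.1f}  MSE={mse_clean:.1f}")
print(f"outlier: MAE={mae_out:.1f}  MSE={mse_out:.1f}")
print(f"MAE grew {mae_out / mae_clean:.1f}x, MSE grew {mse_out / mse_clean:.1f}x")
```

# One outlier raises MAE from 10 to 206 (about 21x) but raises MSE from 100 to 196100 (about 1961x), which is why MSE-based scores can be dominated by a handful of extreme points.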

In [None]:
# Q4. You have built an SVM regression model using a polynomial kernel and are trying to select the best
# metric to evaluate its performance. You have calculated both MSE and RMSE and found that both values
# are very close. Which metric should you choose to use in this case?

# When both Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) values are very close after evaluating an SVM regression model with a polynomial kernel, **RMSE** would be the more appropriate metric to choose. Here's why:

# ### Reasoning:

# 1. **Root Mean Squared Error (RMSE)**:
#    - **Definition**: RMSE is the square root of the average of squared differences between predicted values and actual values.
#    - **Interpretation**: RMSE is directly interpretable in the same units as the target variable (house prices in this case), which makes it easier to understand in practical terms.
#    - **Preferability**: MSE and RMSE are numerically close only when the MSE is near 1, since RMSE is its square root. Whatever their values, RMSE reports the typical error size on the target's own scale.
#    - **Sensitivity to Large Errors**: RMSE is a monotonic transform of MSE, so it penalizes large errors just as heavily and ranks models in the same order. The choice between the two is about interpretability, not sensitivity.

# 2. **Mean Squared Error (MSE)**:
#    - **Definition**: MSE is the average of squared differences between predicted values and actual values.
#    - **Interpretation**: MSE quantifies the average squared deviation between predicted and actual values, but its squared nature makes it less directly interpretable in the context of prediction errors.

# 3. **Application to SVM Regression**:
#    - SVM regression models aim to minimize the error between predicted and actual values. A polynomial kernel lets the model capture non-linear relationships between features and target, but the kernel does not change how the metrics behave: RMSE still gives the more intuitive measure of how well the predictions align with the actual values.
#    - RMSE's ability to directly indicate the typical size of errors in prediction (in the same units as the target variable) makes it preferable for assessing the performance of SVM regression models.

### Example Usage:

```python
import numpy as np  # needed for np.sqrt below
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

# Example code (assuming data preprocessing and splitting)
# X_train, X_test, y_train, y_test = train_test_split(...)

# Initialize SVR model with polynomial kernel
svr_poly = SVR(kernel='poly', degree=3, C=1.0, epsilon=0.1)

# Train SVR model
svr_poly.fit(X_train, y_train)

# Predict house prices
y_pred = svr_poly.predict(X_test)

# Calculate Mean Squared Error (MSE)
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error (MSE): {mse:.2f}")

# Calculate Root Mean Squared Error (RMSE)
rmse = np.sqrt(mse)
print(f"Root Mean Squared Error (RMSE): {rmse:.2f}")
```

# In this scenario, MSE and RMSE carry the same information and will rank models identically. RMSE is the better one to report because it is expressed in the units of the target variable, so it should be chosen to evaluate the performance of your SVM regression model with a polynomial kernel.
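
# The relationship between the two metrics can be sketched in a few lines of plain Python. The targets, the two sets of predictions, and the `mse` helper are hypothetical, chosen so the arithmetic is easy to follow; the rescaling by 1000 stands in for moving from small units to real price units.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error: average of (actual - predicted)^2."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical targets and predictions from two models.
y_true = [3.0, 5.0, 7.0, 9.0]
pred_a = [4.0, 4.0, 8.0, 8.0]  # every error has magnitude 1
pred_b = [5.0, 3.0, 9.0, 7.0]  # every error has magnitude 2

mse_a, mse_b = mse(y_true, pred_a), mse(y_true, pred_b)
rmse_a, rmse_b = math.sqrt(mse_a), math.sqrt(mse_b)
print(f"model A: MSE={mse_a:.2f}  RMSE={rmse_a:.2f}")
print(f"model B: MSE={mse_b:.2f}  RMSE={rmse_b:.2f}")

# Rescale the target by 1000 (e.g. thousands -> units): the two metrics diverge.
scale = 1000.0
y_true_k = [v * scale for v in y_true]
pred_a_k = [v * scale for v in pred_a]
mse_a_k = mse(y_true_k, pred_a_k)
rmse_a_k = math.sqrt(mse_a_k)
print(f"model A rescaled: MSE={mse_a_k:.0f}  RMSE={rmse_a_k:.0f}")
```

# For model A the two values coincide only because its MSE is exactly 1. Both metrics agree that A beats B, and after rescaling the target the MSE grows by a factor of a million while the RMSE grows by a thousand, staying in the target's units.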

In [None]:
# Q5. You are comparing the performance of different SVM regression models using different kernels (linear,
# polynomial, and RBF) and are trying to select the best evaluation metric. Which metric would be most
# appropriate if your goal is to measure how well the model explains the variance in the target variable?

If your goal is to measure how well the SVM regression models explain the variance in the target variable across different kernels (linear, polynomial, and RBF), the most appropriate evaluation metric would be **R-squared (Coefficient of Determination)**.

### Reasoning:

1. **R-squared (Coefficient of Determination)**:
   - **Definition**: R-squared is a statistical measure that represents the proportion of the variance in the dependent variable (target variable) that is explained by the independent variables (features) in the model.
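   - **Interpretation**: R-squared is at most 1. On held-out data, as computed by `r2_score`, it has no lower bound:
     - 1 indicates that the model perfectly explains the variance in the target variable.
     - 0 indicates that the model does no better than always predicting the mean of the target.
     - A negative value indicates that the model does worse than always predicting the mean.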
   - **Suitability**: R-squared is particularly useful when comparing models with different kernels (linear, polynomial, RBF) as it provides a standardized measure of how well each model fits the data.
   - **Comparative Analysis**: Higher R-squared values indicate better model fit and better ability of the model to explain the variance in the target variable.

2. **Alternative Metrics**:
   - **Mean Squared Error (MSE)**: Measures the average squared difference between predicted values and actual values. It doesn't directly measure how well the model explains variance but rather the magnitude of prediction errors.
   - **Root Mean Squared Error (RMSE)**: Similar to MSE but interpretable in the same units as the target variable. Like MSE, it focuses on prediction error magnitude rather than variance explanation.
   - **Mean Absolute Error (MAE)**: Measures the average absolute difference between predicted values and actual values. Similar to MSE and RMSE, it focuses on error magnitude rather than variance explanation.

3. **Application to SVM Regression**:
   - SVM regression models aim to minimize prediction errors. When comparing different kernels, R-squared helps in understanding which kernel type (linear, polynomial, RBF) provides the best fit to the variance in the target variable.
   - R-squared is computed from the predictions alone, so it applies unchanged to non-linear kernels like polynomial and RBF and lets their fit be compared directly with that of the linear kernel.

### Example Usage:

```python
from sklearn.svm import SVR
from sklearn.metrics import r2_score

# Example code (assuming data preprocessing and splitting)
# X_train, X_test, y_train, y_test = train_test_split(...)

# Initialize SVM regression models with different kernels
svr_linear = SVR(kernel='linear', C=1.0)
svr_poly = SVR(kernel='poly', degree=3, C=1.0, epsilon=0.1)
svr_rbf = SVR(kernel='rbf', C=1.0, gamma='scale')

# Train SVM models
svr_linear.fit(X_train, y_train)
svr_poly.fit(X_train, y_train)
svr_rbf.fit(X_train, y_train)

# Predict house prices for each model
y_pred_linear = svr_linear.predict(X_test)
y_pred_poly = svr_poly.predict(X_test)
y_pred_rbf = svr_rbf.predict(X_test)

# Calculate R-squared for each model
r2_linear = r2_score(y_test, y_pred_linear)
r2_poly = r2_score(y_test, y_pred_poly)
r2_rbf = r2_score(y_test, y_pred_rbf)

print(f"R-squared (Linear Kernel): {r2_linear:.2f}")
print(f"R-squared (Polynomial Kernel): {r2_poly:.2f}")
print(f"R-squared (RBF Kernel): {r2_rbf:.2f}")
```

In conclusion, when evaluating how well SVM regression models explain the variance in the target variable across different kernels, R-squared provides a comprehensive and standardized metric to assess model performance. It helps in comparing and selecting the kernel type that best fits the data and explains the variance in the target variable effectively.
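
As a minimal sketch of what the score measures, R-squared can be computed by hand as one minus the ratio of the residual sum of squares to the total sum of squares. The `r_squared` helper and the three sets of predictions below are hypothetical, written for illustration; the formula is the standard definition of the coefficient of determination.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total variation
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0, 5.0]

perfect = list(y_true)                  # predicts every value exactly
mean_only = [3.0] * len(y_true)         # always predicts the mean of y_true
reversed_ = [5.0, 4.0, 3.0, 2.0, 1.0]   # systematically wrong

r2_perfect = r_squared(y_true, perfect)
r2_mean = r_squared(y_true, mean_only)
r2_reversed = r_squared(y_true, reversed_)

print(f"perfect predictions: {r2_perfect:.2f}")
print(f"mean-only baseline:  {r2_mean:.2f}")
print(f"reversed predictions: {r2_reversed:.2f}")
```

The three cases give 1, 0, and -3. A kernel whose test-set R-squared is near or below zero is therefore not explaining the target's variance at all, which is useful to know before comparing it against the other kernels.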