

# ### Q1. In order to predict house price based on several characteristics, such as location, square footage, number of bedrooms, etc., you are developing an SVM regression model. Which regression metric in this situation would be the best to employ?

# The most appropriate regression metric for predicting house prices accurately is the **Mean Squared Error (MSE)**. MSE averages the squared differences between predicted and actual prices. Squaring removes the sign of each error and penalizes large errors much more heavily than small ones: an error twice as large contributes four times as much. This suits price prediction, where a badly mispriced house is usually far more costly than several small misses.
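
# The short sketch below is illustrative only. It uses made-up prices (not the assignment dataset) and scikit-learn's `mean_squared_error` to show how MSE treats two sets of predictions with the same total absolute error. In one set the error is spread across all houses, and in the other it is concentrated in a single house.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical house prices (in dollars).
y_true = np.array([200_000, 250_000, 300_000, 350_000])

# Model A: four moderate errors of $10,000 each (total absolute error $40,000).
y_pred_a = y_true + np.array([10_000, -10_000, 10_000, -10_000])
# Model B: three perfect predictions and one $40,000 error (same total absolute error).
y_pred_b = y_true + np.array([0, 0, 0, 40_000])

mse_a = mean_squared_error(y_true, y_pred_a)
mse_b = mean_squared_error(y_true, y_pred_b)

# Manual check: MSE is the mean of the squared residuals.
assert np.isclose(mse_a, np.mean((y_true - y_pred_a) ** 2))

print(f"MSE A: {mse_a:,.0f}")  # prints MSE A: 100,000,000
print(f"MSE B: {mse_b:,.0f}")  # prints MSE B: 400,000,000
```

Concentrating the same total error in one house quadruples the MSE.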

# ### Q2. You have built an SVM regression model and are trying to decide between using MSE or R-squared as your evaluation metric. Which metric would be more appropriate if your goal is to predict the actual price of a house as accurately as possible?

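# If the goal is to predict the actual price of a house as accurately as possible, the **Mean Squared Error (MSE)** is the more appropriate metric. MSE directly measures the average squared difference between predicted and actual prices, so it reflects how far the predictions are from the true values. R², by contrast, is a relative, unitless measure of explained variance. When prices vary widely, a model can have a high R² while still making large errors in dollar terms, so R² alone does not tell you how close the predictions are.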
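
# To illustrate why R² says nothing about error size in dollars, this sketch uses made-up prices and scikit-learn's metrics. It converts the same predictions from dollars to thousands of dollars. R² is unchanged, while MSE drops by a factor of one million because it is expressed in squared target units.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical prices in dollars and one model's predictions.
y_true = np.array([200_000, 250_000, 300_000, 350_000, 400_000])
y_pred = np.array([210_000, 240_000, 310_000, 340_000, 395_000])

# The same data expressed in thousands of dollars.
y_true_k, y_pred_k = y_true / 1_000, y_pred / 1_000

# R² is unitless: rescaling the target leaves it unchanged.
r2_dollars = r2_score(y_true, y_pred)
r2_thousands = r2_score(y_true_k, y_pred_k)

# MSE is in squared target units, so it shrinks by 1,000**2 = 1e6.
mse_dollars = mean_squared_error(y_true, y_pred)
mse_thousands = mean_squared_error(y_true_k, y_pred_k)

print(f"R²:  {r2_dollars:.4f} vs {r2_thousands:.4f}")
print(f"MSE: {mse_dollars:,.0f} vs {mse_thousands:,.1f}")
```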

# ### Q3. You have a dataset with a significant number of outliers and are trying to select an appropriate regression metric to use with your SVM model. Which metric would be the most appropriate in this scenario?

# When a dataset contains significant outliers, the **Mean Absolute Error (MAE)** is often more appropriate. MAE averages the absolute errors, so each error contributes in proportion to its size rather than to its square. As a result, a single extreme value inflates MAE far less than it inflates MSE.
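
# A minimal sketch with hypothetical numbers shows the difference in sensitivity. Nine typical houses are each predicted $10,000 too high, and one expensive outlier is underpriced by $500,000. The sketch compares how much each metric grows once the outlier is included.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical data: nine typical houses and one outlier (an expensive property).
y_true = np.array([300_000.0] * 9 + [1_300_000.0])
y_pred = np.array([310_000.0] * 9 + [800_000.0])

# Metrics on the nine typical houses only (no outlier).
mae_clean = mean_absolute_error(y_true[:9], y_pred[:9])
mse_clean = mean_squared_error(y_true[:9], y_pred[:9])

# Metrics with the outlier included.
mae_all = mean_absolute_error(y_true, y_pred)
mse_all = mean_squared_error(y_true, y_pred)

# MAE grows linearly with the outlier's error; MSE grows with its square.
print(f"MAE grows {mae_all / mae_clean:.1f}x")  # prints MAE grows 5.9x
print(f"MSE grows {mse_all / mse_clean:.1f}x")  # prints MSE grows 250.9x
```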

# ### Q4. You have built an SVM regression model using a polynomial kernel and are trying to select the best metric to evaluate its performance. You have calculated both MSE and RMSE and found that both values are very close. Which metric should you choose to use in this case?

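# Because RMSE is simply the square root of MSE, the two metrics always rank models in the same order, so either one is valid for comparing models. Their values are numerically close only when MSE is near 0 or 1. **Root Mean Squared Error (RMSE)** is usually preferred for reporting because it is expressed in the same units as the target variable (e.g., dollars for house prices), which makes it easier to interpret.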
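
# A quick sketch with hypothetical prices shows that RMSE is just `np.sqrt` of MSE and carries the target's units. The loop at the end shows that the two values coincide only at 0 and 1 and drift apart elsewhere. (`np.sqrt` is used here rather than a dedicated RMSE function, to stay independent of the scikit-learn version.)

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical prices in dollars.
y_true = np.array([200_000.0, 250_000.0, 300_000.0])
y_pred = np.array([215_000.0, 240_000.0, 305_000.0])

mse = mean_squared_error(y_true, y_pred)  # squared dollars
rmse = np.sqrt(mse)                       # dollars, the same unit as the target

print(f"MSE:  {mse:,.0f} (squared dollars)")
print(f"RMSE: {rmse:,.0f} (dollars)")

# sqrt is monotone, so MSE and RMSE always rank models identically;
# the numbers themselves match only at 0 and 1.
for value in [0.01, 0.81, 1.0, 1.21, 100.0]:
    print(f"MSE={value:>6}: RMSE={np.sqrt(value):.2f}")
```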

# ### Q5. You are comparing the performance of different SVM regression models using different kernels (linear, polynomial, and RBF) and are trying to select the best evaluation metric. Which metric would be most appropriate if your goal is to measure how well the model explains the variance in the target variable?

# If the goal is to measure how well the model explains the variance in the target variable, **R-squared (R²)** is the most appropriate metric. R² indicates the proportion of the variance in the dependent variable that is predictable from the independent variables.
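
# As a sketch of the definition, R² = 1 - SS_res / SS_tot can be computed by hand and checked against scikit-learn's `r2_score`. The toy values are hypothetical. When comparing the linear, polynomial, and RBF kernels, the same score would be computed on one shared test split for each model.

```python
import numpy as np
from sklearn.metrics import r2_score

# Toy targets and predictions (hypothetical values, e.g. prices in $100k).
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 6.5, 9.5])

# R² = 1 - SS_res / SS_tot
ss_res = np.sum((y_true - y_pred) ** 2)         # variation the model leaves unexplained
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
r2_manual = 1 - ss_res / ss_tot

print(f"manual R²: {r2_manual:.3f}, sklearn R²: {r2_score(y_true, y_pred):.3f}")  # both 0.950

# Always predicting the mean scores 0; predictions worse than that score below 0.
baseline = np.full_like(y_true, y_true.mean())
print(r2_score(y_true, baseline))       # 0.0
print(r2_score(y_true, y_true[::-1]))   # -3.0
```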

# ### Implementation of SVM Regression Model

# Let's implement an SVM regression model on the house-price dataset loaded from the URL in the code, and evaluate it with the metrics discussed above.

# #### Step-by-Step Implementation

# 1. **Import the necessary libraries and load the dataset**.
# 2. **Split the dataset into training and testing sets**.
# 3. **Preprocess the data**.
# 4. **Create and train an SVM regression model**.
# 5. **Evaluate the model using MSE, R², and other appropriate metrics**.
# 6. **Tune the hyperparameters of the model**.
# 7. **Save the trained model**.

# Here is the Python code to accomplish these steps:

# ```python
# import numpy as np
# import pandas as pd
# from sklearn.model_selection import train_test_split, GridSearchCV
# from sklearn.preprocessing import StandardScaler
# from sklearn.svm import SVR
# from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
# import joblib

# # Load the dataset
# url = "https://drive.google.com/uc?id=1Z9oLpmt6IDRNw7IeNcHYTGeJRYypRSC0"
# data = pd.read_csv(url)

# # Display the first few rows of the dataset
# print(data.head())

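# # Split the dataset into features and target variable
# X = data.drop(columns=['Price'])
# y = data['Price']

# # One-hot encode any text columns (e.g. location) so StandardScaler and SVR receive only numbers
# X = pd.get_dummies(X, drop_first=True, dtype=float)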

# # Split the dataset into training and testing sets
# X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# # Standardize the features
# scaler = StandardScaler()
# X_train = scaler.fit_transform(X_train)
# X_test = scaler.transform(X_test)

# # Create and train an SVM regression model with a polynomial kernel
# svr_poly = SVR(kernel='poly', degree=3, C=1.0, epsilon=0.1)
# svr_poly.fit(X_train, y_train)

# # Predict the target variable for the testing set
# y_pred = svr_poly.predict(X_test)

# # Evaluate the model
# mse = mean_squared_error(y_test, y_pred)
# rmse = np.sqrt(mse)
# mae = mean_absolute_error(y_test, y_pred)
# r2 = r2_score(y_test, y_pred)

# print(f'MSE: {mse:.2f}')
# print(f'RMSE: {rmse:.2f}')
# print(f'MAE: {mae:.2f}')
# print(f'R-squared: {r2:.2f}')

# # Tune the hyperparameters of the model using GridSearchCV
# param_grid = {
#     'C': [0.1, 1, 10, 100],
#     'epsilon': [0.01, 0.1, 0.2, 0.5],
#     'degree': [2, 3, 4]
# }
# grid_search = GridSearchCV(SVR(kernel='poly'), param_grid, cv=5, scoring='neg_mean_squared_error')
# grid_search.fit(X_train, y_train)

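# # Print the best parameters and the cross-validated MSE
# # (best_score_ is the negated MSE because scoring='neg_mean_squared_error')
# print(f'Best parameters: {grid_search.best_params_}')
# print(f'Best CV MSE: {-grid_search.best_score_:.2f}')

# # GridSearchCV (refit=True by default) has already refit the best model on the full training set
# best_svr = grid_search.best_estimator_

# # Save the trained model and the fitted scaler; new inputs must be scaled the same way before predict()
# joblib.dump(best_svr, 'svr_poly_model.pkl')
# joblib.dump(scaler, 'scaler.pkl')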

# # Predict and evaluate the tuned model
# y_pred_tuned = best_svr.predict(X_test)
# mse_tuned = mean_squared_error(y_test, y_pred_tuned)
# rmse_tuned = np.sqrt(mse_tuned)
# mae_tuned = mean_absolute_error(y_test, y_pred_tuned)
# r2_tuned = r2_score(y_test, y_pred_tuned)

# print(f'Tuned MSE: {mse_tuned:.2f}')
# print(f'Tuned RMSE: {rmse_tuned:.2f}')
# print(f'Tuned MAE: {mae_tuned:.2f}')
# print(f'Tuned R-squared: {r2_tuned:.2f}')
# ```
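
# One possible refinement, sketched under stated assumptions. It runs on synthetic data from `make_regression` rather than the housing CSV, the price-like rescaling of the target is hypothetical, and the file name `svr_poly_pipeline.pkl` is arbitrary. Wrapping `StandardScaler` and `SVR` in a `Pipeline` means each cross-validation fold fits its own scaler, and the saved file contains the scaler together with the model. `TransformedTargetRegressor` also standardizes the target. This matters because SVR's `epsilon` is measured in target units, so a value of 0.1 is negligible when prices are in the hundreds of thousands. Tuning `coef0` is also an addition not present in the original grid. With the default `coef0=0`, the polynomial kernel has no lower-order terms.

```python
import joblib
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the housing data (hypothetical, not the real CSV).
X, y = make_regression(n_samples=300, n_features=4, noise=10.0, random_state=42)
y = y * 1_000 + 300_000  # rescale to price-like magnitudes

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Feature scaling lives inside the pipeline, so each CV fold fits its own scaler.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("svr", SVR(kernel="poly")),
])

# Standardize the target too, so C and epsilon act on a unit-variance scale.
model = TransformedTargetRegressor(regressor=pipe, transformer=StandardScaler())

# Parameter names follow <step>__<param>: regressor -> svr -> parameter.
param_grid = {
    "regressor__svr__C": [0.1, 1, 10],
    "regressor__svr__epsilon": [0.01, 0.1],
    "regressor__svr__degree": [2, 3],
    "regressor__svr__coef0": [0.0, 1.0],  # coef0 > 0 adds lower-order polynomial terms
}
grid = GridSearchCV(model, param_grid, cv=5, scoring="neg_mean_squared_error")
grid.fit(X_train, y_train)

print("Best parameters:", grid.best_params_)
print(f"Best CV MSE: {-grid.best_score_:,.0f}")

best = grid.best_estimator_  # already refit on all of X_train (refit=True)
y_pred = best.predict(X_test)
print(f"Test RMSE: {np.sqrt(mean_squared_error(y_test, y_pred)):,.0f}")
print(f"Test R²: {r2_score(y_test, y_pred):.3f}")

# A single file now holds the feature scaler, the SVR, and the target transform.
joblib.dump(best, "svr_poly_pipeline.pkl")
loaded = joblib.load("svr_poly_pipeline.pkl")
print("Reloaded model matches:", np.allclose(loaded.predict(X_test), y_pred))
```

Because the scaler is part of the saved object, a reloaded model can be called directly on raw feature rows, without a separately stored scaler.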

