
### Q1: Preprocess the Dataset

1. **Load the Data**

```python
import pandas as pd

# Load the dataset
url = 'https://drive.google.com/uc?id=1bGoIE4Z2kG5nyh-fGZAJ7LH0ki3UfmSJ'
data = pd.read_csv(url)
```

2. **Handle Missing Values**

```python
# Check for missing values
print(data.isnull().sum())

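# Forward-fill missing values (alternatively, drop the rows or use another imputation method)
# Note: fillna(method='ffill') is deprecated in recent pandas versions; ffill() is the replacement
data = data.ffill()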
```
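Forward-filling copies the previous row's value, which only makes sense when row order carries meaning. As one alternative, the sketch below fills a continuous column with its median and a categorical column with its most frequent value. The `toy` frame and its values are made up for illustration; only the column names are borrowed from the dataset.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real dataset (values are invented)
toy = pd.DataFrame({
    'chol': [233.0, np.nan, 250.0, 204.0],
    'thal': [2.0, 3.0, np.nan, 3.0],
})

# Median for a continuous column, most frequent value for a categorical one
toy['chol'] = toy['chol'].fillna(toy['chol'].median())
toy['thal'] = toy['thal'].fillna(toy['thal'].mode()[0])

print(toy)
```

Whether median or mode imputation suits a given column depends on how much data is missing and why, so treat this as a starting point rather than a recommendation.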

3. **Encode Categorical Variables**

```python
from sklearn.preprocessing import LabelEncoder

# Initialize label encoder
label_encoder = LabelEncoder()

# List of categorical columns
categorical_cols = ['sex', 'cp', 'restecg', 'slope', 'thal']

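# Apply encoding column by column
# (the encoder is re-fit each time, so label_encoder.classes_ ends up holding only the last column's mapping)
for col in categorical_cols:
    data[col] = label_encoder.fit_transform(data[col])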
```
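Label encoding assigns integers that imply an order, which may not be meaningful for nominal columns such as `cp` or `thal`. A possible alternative is one-hot encoding with `pd.get_dummies`; the sketch below uses a small made-up frame rather than the real data.

```python
import pandas as pd

# Toy frame with one nominal column and one numeric column (values are invented)
toy = pd.DataFrame({'cp': [0, 1, 2, 3, 1], 'age': [63, 37, 41, 56, 57]})

# Replace 'cp' with one indicator column per category
encoded = pd.get_dummies(toy, columns=['cp'], prefix='cp')

print(encoded.columns.tolist())
```

Tree-based models often cope well with integer codes, so this matters more if the same preprocessing is later reused for a linear or distance-based model.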

4. **Scale Numerical Features (if necessary)**

```python
from sklearn.preprocessing import StandardScaler

# Initialize the scaler
scaler = StandardScaler()

# List of numerical columns
numerical_cols = ['age', 'trestbps', 'chol', 'thalach', 'oldpeak']

# Apply scaling (optional here: Random Forest splits are unaffected by monotonic rescaling of a feature)
data[numerical_cols] = scaler.fit_transform(data[numerical_cols])
```
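Fitting the scaler on the whole dataset lets test-set statistics influence the training features. If scaling is needed for a scale-sensitive model, one common pattern is to split first and fit the scaler on the training portion only. The sketch below shows that pattern on synthetic data; `X_toy` and `y_toy` are placeholders, not the heart disease data.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (two numeric features, binary labels)
rng = np.random.default_rng(0)
X_toy = rng.normal(loc=50, scale=10, size=(100, 2))
y_toy = rng.integers(0, 2, size=100)

# Split before any statistics are computed
X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.3, random_state=42)

# Learn mean and standard deviation from the training rows only
scaler = StandardScaler()
X_tr_scaled = scaler.fit_transform(X_tr)

# Reuse the training statistics for the test rows
X_te_scaled = scaler.transform(X_te)

print(X_tr_scaled.mean(axis=0).round(6), X_te_scaled.shape)
```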

### Q2: Split the Dataset

```python
from sklearn.model_selection import train_test_split

# Separate features and target
X = data.drop('target', axis=1)  # 'target' is the column with labels
y = data['target']

# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
```
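If the classes are imbalanced, a plain random split can leave the train and test sets with different class proportions. Passing `stratify` to `train_test_split` keeps the proportions aligned. The sketch below demonstrates this on made-up labels with a 70/30 class ratio.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up labels: 70 negatives, 30 positives
y_toy = np.array([0] * 70 + [1] * 30)
X_toy = np.arange(100).reshape(-1, 1)

# stratify=y_toy preserves the class ratio in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=42, stratify=y_toy
)

print('Positive rate (train):', y_tr.mean())
print('Positive rate (test): ', y_te.mean())
```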

### Q3: Train a Random Forest Classifier

```python
from sklearn.ensemble import RandomForestClassifier

# Initialize the model
rf_clf = RandomForestClassifier(n_estimators=100, max_depth=10, random_state=42)

# Train the model
rf_clf.fit(X_train, y_train)
```

### Q4: Evaluate the Model

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix

# Make predictions
y_pred = rf_clf.predict(X_test)

# Evaluate performance
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

print(f'Accuracy: {accuracy:.2f}')
print(f'Precision: {precision:.2f}')
print(f'Recall: {recall:.2f}')
print(f'F1 Score: {f1:.2f}')

# Confusion Matrix
conf_matrix = confusion_matrix(y_test, y_pred)
print(f'Confusion Matrix:\n{conf_matrix}')
```
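To see how precision and recall relate to the confusion matrix, the short sketch below derives both by hand from the four cells and checks them against scikit-learn. The labels are a made-up example, not model output.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Made-up true labels and predictions
y_true = np.array([0, 0, 0, 1, 1, 1, 1, 0])
y_hat = np.array([0, 1, 0, 1, 1, 0, 1, 0])

# For binary labels, ravel() yields the cells in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

precision_manual = tp / (tp + fp)  # share of predicted positives that are correct
recall_manual = tp / (tp + fn)     # share of actual positives that were found

print(precision_manual, precision_score(y_true, y_hat))
print(recall_manual, recall_score(y_true, y_hat))
```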

### Q5: Feature Importance and Visualization

```python
import matplotlib.pyplot as plt

# Get feature importances
importances = rf_clf.feature_importances_
features = X.columns

# Create a DataFrame for visualization
feature_importances = pd.DataFrame({'Feature': features, 'Importance': importances})
feature_importances = feature_importances.sort_values(by='Importance', ascending=False)

# Plot the top 5 features
top_5_features = feature_importances.head(5)
plt.figure(figsize=(10, 6))
plt.barh(top_5_features['Feature'], top_5_features['Importance'])
plt.xlabel('Importance')
plt.title('Top 5 Most Important Features')
plt.gca().invert_yaxis()
plt.show()
```
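The `feature_importances_` attribute is impurity-based and computed on the training data, which can favor features with many distinct values. As a cross-check, permutation importance measures how much the test score drops when a feature's values are shuffled. The sketch below runs it on synthetic data from `make_classification`; the model settings are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with 6 features, 3 of them informative
X_toy, y_toy = make_classification(
    n_samples=300, n_features=6, n_informative=3, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.3, random_state=42)

model = RandomForestClassifier(n_estimators=50, random_state=42).fit(X_tr, y_tr)

# Shuffle each feature 5 times on the held-out data and record the score drop
result = permutation_importance(model, X_te, y_te, n_repeats=5, random_state=42)

# Print features from most to least important
for idx in result.importances_mean.argsort()[::-1]:
    print(f'feature {idx}: {result.importances_mean[idx]:.3f} +/- {result.importances_std[idx]:.3f}')
```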

### Q6: Hyperparameter Tuning

```python
from sklearn.model_selection import GridSearchCV

# Define the parameter grid
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 10, 20],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4]
}

# Initialize GridSearchCV
grid_search = GridSearchCV(estimator=RandomForestClassifier(random_state=42),
                           param_grid=param_grid,
                           cv=5,
                           scoring='accuracy',
                           n_jobs=-1,
                           verbose=2)

# Fit the model
grid_search.fit(X_train, y_train)

# Get the best parameters
best_params = grid_search.best_params_
print(f'Best Parameters: {best_params}')

# Evaluate the best model
best_rf_clf = grid_search.best_estimator_
y_pred_best = best_rf_clf.predict(X_test)

# Performance metrics
accuracy_best = accuracy_score(y_test, y_pred_best)
precision_best = precision_score(y_test, y_pred_best)
recall_best = recall_score(y_test, y_pred_best)
f1_best = f1_score(y_test, y_pred_best)

print(f'Accuracy (Tuned): {accuracy_best:.2f}')
print(f'Precision (Tuned): {precision_best:.2f}')
print(f'Recall (Tuned): {recall_best:.2f}')
print(f'F1 Score (Tuned): {f1_best:.2f}')
```

### Q7: Report the Best Set of Hyperparameters

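`grid_search.best_params_` holds the best hyperparameter combination found by the search, and `grid_search.best_score_` gives its mean cross-validated accuracy. To judge whether tuning helped, compare the tuned model's test-set metrics from Q6 against the metrics of the baseline model from Q4.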
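One way to make the comparison easier to read is to collect the metrics of both models in a single table. The sketch below does this on synthetic data with a deliberately small grid; the names `baseline`, `search`, and `comparison` are illustrative and not part of the code above.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data
X_toy, y_toy = make_classification(n_samples=200, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.3, random_state=42)

# Baseline model with fixed settings
baseline = RandomForestClassifier(n_estimators=50, max_depth=10, random_state=42)
baseline.fit(X_tr, y_tr)

# Small grid search (kept tiny so the sketch runs quickly)
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    {'n_estimators': [25, 50], 'max_depth': [None, 5]},
    cv=3,
)
search.fit(X_tr, y_tr)

# Evaluate both models on the same held-out data
rows = {}
for name, model in [('baseline', baseline), ('tuned', search.best_estimator_)]:
    pred = model.predict(X_te)
    rows[name] = {'accuracy': accuracy_score(y_te, pred), 'f1': f1_score(y_te, pred)}

comparison = pd.DataFrame(rows).T
print(comparison.round(2))
print('Best parameters:', search.best_params_)
```

On a small test set, differences of a point or two between the rows may be noise rather than a real improvement.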

### Q8: Interpret the Model

A decision boundary can only be drawn in two dimensions, so the model has to be restricted to two features. The example below trains a separate classifier on two features and plots the regions it assigns to each class:

```python
import numpy as np
from matplotlib.colors import ListedColormap

# Select two features to plot (ideally the top two from the Q5 importance ranking)
features_to_plot = ['age', 'chol']
X_plot = X[features_to_plot]
X_train_plot, X_test_plot, y_train_plot, y_test_plot = train_test_split(X_plot, y, test_size=0.3, random_state=42)

# Train a new model for visualization
rf_clf_plot = RandomForestClassifier(n_estimators=100, max_depth=10, random_state=42)
rf_clf_plot.fit(X_train_plot, y_train_plot)

# Define plot limits
x_min, x_max = X_plot[features_to_plot[0]].min() - 1, X_plot[features_to_plot[0]].max() + 1
y_min, y_max = X_plot[features_to_plot[1]].min() - 1, X_plot[features_to_plot[1]].max() + 1

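# Use a fixed-size grid so memory use stays bounded whatever the feature scale
# (a 0.01 step on unscaled 'age' and 'chol' would create hundreds of millions of points)
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 300), np.linspace(y_min, y_max, 300))

# Wrap the grid in a DataFrame so the column names match those seen during fit
grid = pd.DataFrame(np.c_[xx.ravel(), yy.ravel()], columns=features_to_plot)
Z = rf_clf_plot.predict(grid).reshape(xx.shape)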

# Plot decision boundaries
plt.contourf(xx, yy, Z, alpha=0.3, cmap=ListedColormap(['#FF0000', '#00FF00']))
plt.scatter(X_test_plot[features_to_plot[0]], X_test_plot[features_to_plot[1]], c=y_test_plot, edgecolor='k', cmap=ListedColormap(['#FF0000', '#00FF00']))
plt.xlabel(features_to_plot[0])
plt.ylabel(features_to_plot[1])
plt.title('Decision Boundary of Random Forest Classifier')
plt.show()
```

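**Insights and Limitations**:
- **Insights**: The plot shows which regions of the two-feature space the classifier assigns to each class, and the test points overlaid on it show where its predictions disagree with the true labels.
- **Limitations**: The plotted model is trained on only two features, so it illustrates a two-dimensional slice rather than the behavior of the full model from Q3. More generally, a Random Forest averages the votes of many trees and cannot be read as a single set of rules the way one decision tree or a linear model can.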

