<a href="https://colab.research.google.com/github/aditya301cs/100-days-of-machine-learning/blob/main/Random_Forest_Classification_With_Hyperparameter_Tuning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🌲 Random Forest Algorithm – Classification

## 📌 Overview
Random Forest is an **ensemble learning algorithm** that builds multiple decision trees
and combines their predictions to produce a more accurate and stable result.

It is based on the concept of **bagging (bootstrap aggregating)** and is widely used
for both classification and regression tasks due to its robustness and high performance.


## 📖 What is Random Forest?

Random Forest is an ensemble method that:
- Creates multiple decision trees using random subsets of data
- Uses random subsets of features at each split
- Aggregates predictions using majority voting (classification)
  or averaging (regression)

By averaging many decorrelated decision trees, each of which would overfit on its own,
Random Forest reduces variance and improves generalization.


## 🎯 Why Use Random Forest?

- Reduces overfitting compared to a single decision tree
- Handles high-dimensional data well
- Works with both numerical and categorical features (scikit-learn expects categorical features to be numerically encoded first)
- Resistant to noise and outliers
- Provides feature importance


## ⚙️ How Random Forest Works

1. Random samples are drawn from the dataset (with replacement)
2. A decision tree is trained on each sample
3. At every split, a random subset of features is selected
4. Final prediction is made by:
   - Majority voting (classification)
   - Averaging (regression)
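
The four steps above can be sketched by hand with plain decision trees. This is a minimal illustration, not scikit-learn's internal implementation: the synthetic data from `make_classification`, the tree count of 25, and all variable names are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption; this is not the notebook's heart.csv)
X_demo, y_demo = make_classification(n_samples=400, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.25, random_state=0)

rng = np.random.default_rng(0)
n_trees = 25  # odd, so a binary majority vote cannot tie
trees = []
for _ in range(n_trees):
    # Step 1: bootstrap sample (row indices drawn with replacement)
    idx = rng.integers(0, len(X_tr), size=len(X_tr))
    # Steps 2-3: max_features="sqrt" makes the tree consider a random
    # subset of features at every split
    tree = DecisionTreeClassifier(max_features="sqrt",
                                  random_state=int(rng.integers(1_000_000)))
    trees.append(tree.fit(X_tr[idx], y_tr[idx]))

# Step 4: majority vote across trees (labels are 0/1, so the mean vote works)
votes = np.stack([t.predict(X_te) for t in trees])  # shape (n_trees, n_test)
ensemble_pred = (votes.mean(axis=0) >= 0.5).astype(int)
ensemble_acc = (ensemble_pred == y_te).mean()
print("Bagged-tree accuracy:", ensemble_acc)
```

Note that `RandomForestClassifier` averages the per-tree class probabilities rather than counting hard votes, so its predictions can differ slightly from this hand-rolled version.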


## 📁 Dataset Description

This notebook uses the heart dataset loaded from `heart.csv`
to demonstrate the Random Forest algorithm.

### Dataset Characteristics:
- Structured tabular data: 303 rows, no missing values
- 13 numeric input features (`age`, `sex`, `cp`, ..., `thal`)
- Binary target variable `target` (165 samples of class 1, 138 of class 0)

The dataset is split into training and testing sets
to evaluate model performance on unseen data.


# Import Required Libraries


In [3]:
# Core libraries
import numpy as np
import pandas as pd
import warnings
warnings.filterwarnings("ignore")

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Model selection and evaluation
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.metrics import classification_report

from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RandomizedSearchCV

# Machine Learning model
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import GradientBoostingClassifier

from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

from sklearn.metrics import accuracy_score

In [5]:
# Load the dataset
df = pd.read_csv('heart.csv')

In [14]:
df.head()

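|   | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 63 | 1 | 3 | 145 | 233 | 1 | 0 | 150 | 0 | 2.3 | 0 | 0 | 1 | 1 |
| 1 | 37 | 1 | 2 | 130 | 250 | 0 | 1 | 187 | 0 | 3.5 | 0 | 0 | 2 | 1 |
| 2 | 41 | 0 | 1 | 130 | 204 | 0 | 0 | 172 | 0 | 1.4 | 2 | 0 | 2 | 1 |
| 3 | 56 | 1 | 1 | 120 | 236 | 0 | 1 | 178 | 0 | 0.8 | 2 | 0 | 2 | 1 |
| 4 | 57 | 0 | 0 | 120 | 354 | 0 | 1 | 163 | 1 | 0.6 | 2 | 0 | 2 | 1 |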


## 🔍 Exploratory Data Analysis (EDA)

Before training the model, we examine:
- Dataset shape
- Feature types
- Presence of missing values
- Target class distribution


In [6]:
print('Missing values per column:')
display(df.isnull().sum())

print('\nTarget variable distribution:')
display(df['target'].value_counts())

Missing values per column:


|   | 0 |
|---|---|
| age | 0 |
| sex | 0 |
| cp | 0 |
| trestbps | 0 |
| chol | 0 |
| fbs | 0 |
| restecg | 0 |
| thalach | 0 |
| exang | 0 |
| oldpeak | 0 |



Target variable distribution:


| target | count |
|---|---|
| 1 | 165 |
| 0 | 138 |


In [7]:
# Dataset information
df.info()

# Statistical summary
df.describe()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 303 entries, 0 to 302
Data columns (total 14 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   age       303 non-null    int64  
 1   sex       303 non-null    int64  
 2   cp        303 non-null    int64  
 3   trestbps  303 non-null    int64  
 4   chol      303 non-null    int64  
 5   fbs       303 non-null    int64  
 6   restecg   303 non-null    int64  
 7   thalach   303 non-null    int64  
 8   exang     303 non-null    int64  
 9   oldpeak   303 non-null    float64
 10  slope     303 non-null    int64  
 11  ca        303 non-null    int64  
 12  thal      303 non-null    int64  
 13  target    303 non-null    int64  
dtypes: float64(1), int64(13)
memory usage: 33.3 KB


|   | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 |
| mean | 54.366337 | 0.683168 | 0.966997 | 131.623762 | 246.264026 | 0.148515 | 0.528053 | 149.646865 | 0.326733 | 1.039604 | 1.39934 | 0.729373 | 2.313531 | 0.544554 |
| std | 9.082101 | 0.466011 | 1.032052 | 17.538143 | 51.830751 | 0.356198 | 0.52586 | 22.905161 | 0.469794 | 1.161075 | 0.616226 | 1.022606 | 0.612277 | 0.498835 |
| min | 29.0 | 0.0 | 0.0 | 94.0 | 126.0 | 0.0 | 0.0 | 71.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 47.5 | 0.0 | 0.0 | 120.0 | 211.0 | 0.0 | 0.0 | 133.5 | 0.0 | 0.0 | 1.0 | 0.0 | 2.0 | 0.0 |
| 50% | 55.0 | 1.0 | 1.0 | 130.0 | 240.0 | 0.0 | 1.0 | 153.0 | 0.0 | 0.8 | 1.0 | 0.0 | 2.0 | 1.0 |
| 75% | 61.0 | 1.0 | 2.0 | 140.0 | 274.5 | 0.0 | 1.0 | 166.0 | 1.0 | 1.6 | 2.0 | 1.0 | 3.0 | 1.0 |
| max | 77.0 | 1.0 | 3.0 | 200.0 | 564.0 | 1.0 | 2.0 | 202.0 | 1.0 | 6.2 | 2.0 | 4.0 | 3.0 | 1.0 |


## 🧮 Feature Selection & Target Variable


In [8]:
X = df.drop("target", axis=1)  # Features: every column except the class label
y = df["target"]               # Target: the class label


In [10]:
# X = df.iloc[:,0:-1]
# y = df.iloc[:,-1]

## 🔀 Train–Test Split

The dataset is split into training and testing sets
to evaluate model performance on unseen data.


In [11]:
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2,random_state=42)

In [12]:
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)

(242, 13)
(61, 13)
(242,)
(61,)


## 🌲 Random Forest Model Training


In [49]:
# No random_state is set, so scores vary slightly from run to run
rf_model = RandomForestClassifier()

rf_model.fit(X_train, y_train)


## 📊 Baseline Random Forest Model Evaluation


In [50]:
y_pred = rf_model.predict(X_test)

print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))


Accuracy: 0.8688524590163934
              precision    recall  f1-score   support

           0       0.89      0.83      0.86        29
           1       0.85      0.91      0.88        32

    accuracy                           0.87        61
   macro avg       0.87      0.87      0.87        61
weighted avg       0.87      0.87      0.87        61



## 📊 Classification Report – Baseline Random Forest Model (Before Hyperparameter Tuning)

The following classification report represents the performance of the **initial Random Forest model**
trained using **default hyperparameters**, evaluated on the **test dataset**.

### 🔢 Overall Performance
- **Accuracy:** ~86.9%
- The model correctly classifies approximately **87% of the test samples**
- This serves as a **baseline performance** before applying hyperparameter tuning

---

### 🧾 Class-wise Interpretation

#### Class 0
- **Support:** 29 samples
- **Precision:** 0.89  
  - When the model predicts class `0`, it is correct **89% of the time**
- **Recall:** 0.83  
  - The model correctly identifies **83% of actual class 0 instances**
- **F1-Score:** 0.86  
  - Indicates reasonably balanced performance, with some false negatives

---

#### Class 1
- **Support:** 32 samples
- **Precision:** 0.85  
  - When the model predicts class `1`, it is correct **85% of the time**
- **Recall:** 0.91  
  - The model correctly identifies **91% of actual class 1 instances**
- **F1-Score:** 0.88  
  - Shows strong recall but slightly lower precision

---

### ⚖️ Macro Average vs Weighted Average

- **Macro Average**
  - Treats both classes equally
  - Indicates balanced performance across classes

- **Weighted Average**
  - Accounts for class distribution
  - Very close to macro average, confirming **no strong class imbalance**
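
To make the per-class and averaged figures concrete, the sketch below recomputes them from a confusion matrix. The counts are an inference: they are the only integer counts consistent with the supports, precision, and recall in the report above, and they are not printed anywhere in the notebook. Variable names are illustrative.

```python
import numpy as np

# Rows = actual class, columns = predicted class.
# Counts reconstructed from the report above (an inference, not notebook output).
cm = np.array([[24, 5],
               [3, 29]])

support = cm.sum(axis=1)                    # samples per actual class: [29, 32]
precision = np.diag(cm) / cm.sum(axis=0)    # correct predictions / all predictions of that class
recall = np.diag(cm) / support              # correct predictions / all actual members of that class
f1 = 2 * precision * recall / (precision + recall)
accuracy = np.trace(cm) / cm.sum()

macro_f1 = f1.mean()                           # every class counts equally
weighted_f1 = np.average(f1, weights=support)  # classes weighted by support

print("precision:", precision.round(2))
print("recall:   ", recall.round(2))
print("f1:       ", f1.round(2))
print("accuracy: ", round(accuracy, 4))
print("macro F1: ", round(macro_f1, 2), " weighted F1:", round(weighted_f1, 2))
```

Because the supports (29 and 32) are nearly equal, the support-weighted mean barely moves away from the plain mean, which is why the two averages agree to two decimals.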

---

### 🧠 Key Observations (Baseline Model)

- The baseline model performs **reasonably well** without tuning
- Recall for class `1` is higher than class `0`, indicating fewer false negatives for class `1`
- There is room for improvement in:
  - Overall accuracy
  - Recall for class `0`
- This baseline evaluation provides a reference point
  for measuring improvements after **hyperparameter tuning**

## 🔁 Cross-Validation

Cross-validation provides a more reliable estimate
of model performance than a single train–test split,
because every sample is used for validation exactly once.


In [51]:
cv_scores = cross_val_score(rf_model, X, y, cv=5)

print("Cross-validation scores:", cv_scores)
print("Mean CV Accuracy:", cv_scores.mean())


Cross-validation scores: [0.85245902 0.86885246 0.80327869 0.78333333 0.78333333]
Mean CV Accuracy: 0.8182513661202184


## 🌟 Feature Importance

Random Forest provides feature importance scores,
which help understand which features influence predictions the most.


In [52]:
feature_importance = pd.Series(
    rf_model.feature_importances_, index=X.columns
).sort_values(ascending=False)

feature_importance


|   | 0 |
|---|---|
| ca | 0.142437 |
| oldpeak | 0.129855 |
| cp | 0.107354 |
| thalach | 0.099726 |
| thal | 0.090609 |
| age | 0.085808 |
| chol | 0.082082 |
| trestbps | 0.078756 |
| exang | 0.070227 |
| slope | 0.051119 |


## Prediction on New Data

In [53]:
# Step 1: New data - Ensure column names match X
new_data = {
    'age': 55,
    'sex': 1,
    'cp': 2,
    'trestbps': 130,
    'chol': 240,
    'fbs': 0,
    'restecg': 1,
    'thalach': 160,
    'exang': 0,
    'oldpeak': 1.0,
    'slope': 2,
    'ca': 0,
    'thal': 2
}

# Step 2: Convert to DataFrame
new_df = pd.DataFrame([new_data])

# Step 3: Align column order with the training features
# (a no-op here, but it guards against keys listed in a different order)
new_df = new_df[X.columns]

# Step 4: Predict
prediction = rf_model.predict(new_df)
probability = rf_model.predict_proba(new_df)

print("Prediction:", prediction)
print("Probability:", probability)

Prediction: [1]
Probability: [[0.06 0.94]]
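
The probability pair printed above is the average of the class probabilities predicted by the individual trees. The sketch below checks this on synthetic stand-in data; the data, the `forest` model, and the other names are assumptions for illustration and are not the notebook's `rf_model`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in with 13 features (an assumption; not heart.csv)
X_demo, y_demo = make_classification(n_samples=300, n_features=13, random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_demo, y_demo)

sample = X_demo[:1]  # a single row, kept 2-D

# Collect each tree's class probabilities: shape (n_trees, 1, n_classes)
per_tree = np.stack([tree.predict_proba(sample) for tree in forest.estimators_])

# Averaging over trees reproduces the forest's own predict_proba
manual_proba = per_tree.mean(axis=0)
forest_proba = forest.predict_proba(sample)

print("Manual average:", manual_proba)
print("Forest output: ", forest_proba)
```

With 100 fully grown trees, most leaves are pure, so a value such as 0.94 can be read roughly as "about 94 of the 100 trees predicted class 1".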


## 🧠 Key Takeaways

- Random Forest is a powerful ensemble method
- It reduces overfitting and variance
- Works well with complex datasets
- Provides feature importance for interpretability
- Widely used in real-world ML applications


## 🏁 Conclusion of Prediction on New Data

In this notebook, we:
- Understood the Random Forest algorithm
- Explored the dataset
- Built and evaluated a Random Forest classifier
- Analyzed feature importance

Random Forest serves as a strong baseline model
for many classification problems.


## 🧠 Model Comparison & Evaluation Strategy (Conceptual Understanding)

In this notebook, multiple machine learning models were trained and evaluated to
identify the most suitable model for the classification task.

The models used include:
- Logistic Regression
- Support Vector Classifier (SVC)
- Random Forest Classifier
- Gradient Boosting Classifier

Each model was evaluated using **two different strategies** to understand both
performance and generalization ability.

---

### 🔹 1. Train–Test Split Evaluation

In the train–test approach:
- The dataset is split once into training and testing sets
- The model is trained on the training set
- Accuracy is computed on the test set

This approach provides a **quick and simple performance estimate**, but it has limitations:
- Results depend on a single random split
- Performance may be optimistic or pessimistic
- Not reliable for final model selection

**Learning:**  
Train–test accuracy answers the question:  
> *“How well did the model perform on this specific data split?”*

---

### 🔹 2. Cross-Validation Evaluation

To obtain a more reliable performance estimate, **k-fold cross-validation** was used.

In k-fold cross-validation (k = 10):
- The dataset is divided into 10 folds of roughly equal size
- The model is trained and evaluated 10 times
- Each fold is used once as the validation set
- The final score is the average of all 10 evaluations

This approach:
- Reduces dependency on a single data split
- Measures model stability
- Provides a more realistic estimate of generalization performance

**Learning:**  
Cross-validation answers the question:  
> *“How consistently does the model perform across different data splits?”*
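
The k-fold procedure described in this section can be written out as an explicit loop. This is a sketch on synthetic stand-in data; the data and variable names are assumptions. For a classifier, passing an integer `cv` to `cross_val_score` uses stratified folds, so `StratifiedKFold` is used here to mirror it.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in data (an assumption; not heart.csv)
X_demo, y_demo = make_classification(n_samples=300, n_features=13, random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0)

skf = StratifiedKFold(n_splits=10)
fold_scores = []
for train_idx, val_idx in skf.split(X_demo, y_demo):
    # Train on 9 folds, validate on the remaining fold
    model.fit(X_demo[train_idx], y_demo[train_idx])
    fold_scores.append(model.score(X_demo[val_idx], y_demo[val_idx]))

manual_mean = np.mean(fold_scores)

# The library helper performs the same loop internally
library_mean = cross_val_score(model, X_demo, y_demo, cv=10).mean()

print("Per-fold accuracy:", np.round(fold_scores, 3))
print("Manual mean:", manual_mean, " cross_val_score mean:", library_mean)
```

Fixing `random_state` on the model is what makes the manual loop and the helper agree; without it, each fit grows a different forest.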

---

### 🔹 3. Why Compare Multiple Models?

Different algorithms learn different types of patterns:
- Logistic Regression learns linear relationships
- SVC focuses on maximizing class separation
- Random Forest captures non-linear patterns and interactions
- Gradient Boosting learns sequentially from previous model errors

Since no single algorithm is universally best, comparing multiple models helps
identify the most suitable one for the dataset.
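
One compact way to run such a comparison is to loop over a dictionary of models. The sketch below uses synthetic stand-in data, and the `StandardScaler` pipelines for Logistic Regression and SVC are an addition that the notebook itself does not use. Both models are sensitive to feature scale, which may partly explain SVC's lower scores on unscaled data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (an assumption; not heart.csv)
X_demo, y_demo = make_classification(n_samples=300, n_features=13, random_state=0)

models = {
    # Scaling inside a pipeline is an assumption added for this sketch
    "LogisticRegression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "SVC": make_pipeline(StandardScaler(), SVC()),
    "RandomForest": RandomForestClassifier(random_state=0),
    "GradientBoosting": GradientBoostingClassifier(random_state=0),
}

# Mean 10-fold accuracy for every model
cv_means = {
    name: cross_val_score(model, X_demo, y_demo, cv=10, scoring="accuracy").mean()
    for name, model in models.items()
}

# Print the models from best to worst
for name, score in sorted(cv_means.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:>18}: {score:.3f}")
```

Putting the scaler inside the pipeline means it is refit on each training fold, so no information from the validation fold leaks into preprocessing.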

---

### 🔹 4. Why Train–Test Accuracy and Cross-Validation Scores Differ

It is common to observe that:
- Train–test accuracy is slightly higher
- Cross-validation accuracy is lower but more stable

This happens because:
- Train–test evaluation uses a single split
- Cross-validation averages performance across multiple splits

**Key Insight:**  
Train–test accuracy may be optimistic, while cross-validation provides a
more honest and generalizable performance estimate.
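
The dependence on a single split is easy to see by repeating the split with different seeds. This sketch uses synthetic stand-in data sized like the notebook's dataset (303 rows, so a 20% test set has 61 samples); the data and names are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with the same shape as the notebook's features
X_demo, y_demo = make_classification(n_samples=303, n_features=13, random_state=0)

split_scores = []
for seed in range(10):
    # Only the split changes between iterations; the model settings stay fixed
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_demo, y_demo, test_size=0.2, random_state=seed
    )
    clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)
    split_scores.append(clf.score(X_te, y_te))

print("Test accuracy per split:", np.round(split_scores, 3))
print("min:", min(split_scores), "max:", max(split_scores),
      "std:", round(float(np.std(split_scores)), 3))
```

With 61 test samples, a single misclassified sample shifts accuracy by about 1.6 percentage points, so small gaps between models on one split should be read with caution.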

---

### 🔹 5. Model Selection Strategy

The final model is not selected based solely on train–test accuracy.
Instead, the following factors are considered:
- Cross-validation accuracy
- Performance stability
- Model complexity
- Risk of overfitting

Cross-validation scores are therefore prioritized when comparing models.

---

### 🔹 6. Key Takeaways

- Train–test split provides a quick performance check
- Cross-validation provides a reliable generalization estimate
- Comparing multiple models is essential
- Model selection should be based on cross-validation performance
- Hyperparameter tuning is applied only after identifying a strong baseline model

This systematic evaluation strategy ensures that the selected model
is robust, stable, and suitable for real-world deployment.


## Model Initialization


In [54]:
rf = RandomForestClassifier()
gb = GradientBoostingClassifier()
svc = SVC()
lr = LogisticRegression()

### 🔹 1. Train–Test Split Evaluation

In [55]:
rf.fit(X_train,y_train)
y_pred = rf.predict(X_test)
accuracy_score(y_test,y_pred)

0.8360655737704918

In [56]:
gb.fit(X_train,y_train)
y_pred = gb.predict(X_test)
accuracy_score(y_test,y_pred)

0.7704918032786885

In [57]:
svc.fit(X_train,y_train)
y_pred = svc.predict(X_test)
accuracy_score(y_test,y_pred)

0.7049180327868853

In [58]:
lr.fit(X_train,y_train)
y_pred = lr.predict(X_test)
accuracy_score(y_test,y_pred)

0.8852459016393442

### 🔹 2. Cross-Validation Evaluation

In [59]:
from sklearn.model_selection import cross_val_score

In [60]:
np.mean(cross_val_score(RandomForestClassifier(),X,y,cv=10,scoring='accuracy'))

np.float64(0.8279569892473118)

In [61]:
np.mean(cross_val_score(GradientBoostingClassifier(),X,y,cv=10,scoring='accuracy'))

np.float64(0.8013978494623656)

In [62]:
np.mean(cross_val_score(SVC(),X,y,cv=10,scoring='accuracy'))

np.float64(0.6604301075268817)

In [63]:
np.mean(cross_val_score(LogisticRegression(),X,y,cv=10,scoring='accuracy'))

np.float64(0.8316129032258065)

## Hyperparameter Tuning

Hyperparameters are settings that are not learned from the data; they control the learning process and the model structure. Hyperparameter tuning searches systematically over combinations of these settings, typically with GridSearchCV or RandomizedSearchCV, to maximize the model's performance on unseen data.

In [64]:
rf = RandomForestClassifier(max_samples=0.75,random_state=42)
rf.fit(X_train,y_train)
y_pred = rf.predict(X_test)
accuracy_score(y_test,y_pred)

0.9016393442622951

In [65]:


np.mean(cross_val_score(RandomForestClassifier(max_samples=0.75),X,y,cv=10,scoring='accuracy'))

np.float64(0.8181720430107526)

Ways of Hyperparameter Tuning
1. GridSearchCV
2. RandomizedSearchCV

# GridSearchCV

## 🔧 Hyperparameter Tuning using GridSearchCV

Hyperparameter tuning helps find the **optimal combination of parameters**
that improves model performance and generalization.

Instead of using default values, we systematically search
for the best parameters using **GridSearchCV**.

The following hyperparameters are tuned:
- `n_estimators` → Number of trees in the forest
- `max_features` → Fraction of features considered at each split
- `max_depth` → Maximum depth of each decision tree
- `max_samples` → Fraction of samples used to train each tree

In [66]:
# Number of trees in the random forest
n_estimators = [20,60,100,120]

# Fraction of features to consider at every split
max_features = [0.2,0.6,1.0]

# Maximum number of levels in each tree
max_depth = [2,8,None]

# Fraction of samples drawn to train each tree
max_samples = [0.5,0.75,1.0]

# Total number of combinations = 4*3*3*3 = 108
# i.e. 108 different Random Forest configurations

## 🧮 Total Number of Model Combinations

The number of distinct hyperparameter combinations in the grid is:

\[
4 \times 3 \times 3 \times 3 = 108
\]

This means **108 different Random Forest configurations** will be evaluated
during hyperparameter tuning. With 5-fold cross-validation, each configuration
is fitted 5 times, for 540 fits in total.
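
The count can be checked programmatically with scikit-learn's `ParameterGrid`, which enumerates every combination in a grid. The name `param_grid_demo` is illustrative; its values copy the lists defined in the cell above.

```python
from sklearn.model_selection import ParameterGrid

# Same value lists as the tuning grid in this notebook
param_grid_demo = {
    "n_estimators": [20, 60, 100, 120],
    "max_features": [0.2, 0.6, 1.0],
    "max_depth": [2, 8, None],
    "max_samples": [0.5, 0.75, 1.0],
}

n_candidates = len(ParameterGrid(param_grid_demo))  # 4 * 3 * 3 * 3
n_folds = 5
n_fits = n_candidates * n_folds  # one fit per candidate per fold

print("Candidates:", n_candidates)
print("Total fits with 5-fold CV:", n_fits)
```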


In [67]:
param_grid = {'n_estimators' : n_estimators,
              'max_features' : max_features,
              'max_depth' : max_depth,
              'max_samples' : max_samples
              }
print(param_grid)

{'n_estimators': [20, 60, 100, 120], 'max_features': [0.2, 0.6, 1.0], 'max_depth': [2, 8, None], 'max_samples': [0.5, 0.75, 1.0]}


In [68]:
rf = RandomForestClassifier()

In [69]:
grid_search = GridSearchCV(estimator=rf,
                       param_grid = param_grid,
                       cv=5,
                       verbose=2,
                       n_jobs = -1)

In [70]:
grid_search.fit(X_train,y_train)

Fitting 5 folds for each of 108 candidates, totalling 540 fits


In [71]:
grid_search.best_params_

{'max_depth': None,
 'max_features': 0.2,
 'max_samples': 0.5,
 'n_estimators': 120}

In [72]:
grid_search.best_score_

np.float64(0.8388605442176871)

## 🌲 Train Optimized Random Forest Model using GridSearchCV

After performing hyperparameter tuning with GridSearchCV,
we extract the best estimator. With the default `refit=True`, GridSearchCV has
already retrained it on the full training data, so the explicit `fit` call
in the next cell is optional.
The optimized model is then evaluated on the test dataset.
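
As a side note on `best_estimator_`: with `refit=True` (the default), `GridSearchCV` has already trained the winning configuration on all the data passed to `fit`. The sketch below shows this on synthetic stand-in data with a deliberately tiny grid; the data, the grid, and the names are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (an assumption; not heart.csv)
X_demo, y_demo = make_classification(n_samples=200, n_features=13, random_state=0)

# A deliberately tiny grid so the sketch runs quickly
small_grid = {"n_estimators": [10, 20], "max_depth": [2, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0), small_grid, cv=3)
search.fit(X_demo, y_demo)

best = search.best_estimator_

# The best estimator is already fitted: it holds its trained trees
n_trees_fitted = len(best.estimators_)

# Predicting through the search object delegates to the same fitted model
preds_via_search = search.predict(X_demo[:5])
preds_via_best = best.predict(X_demo[:5])

print("Best params:", search.best_params_)
print("Trees in fitted best estimator:", n_trees_fitted)
```

Calling `fit` on the extracted estimator again is harmless, but without a fixed `random_state` it grows a different forest from the one the search selected.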


In [73]:
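# Extract the best model from GridSearchCV
best_rf_model_grid = grid_search.best_estimator_

# Refit on the training data. This is redundant when refit=True (the default),
# and without a fixed random_state it produces a slightly different forest.
best_rf_model_grid.fit(X_train, y_train)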


## 📊 Evaluation of Tuned Random Forest Model (GridSearchCV)

The following results represent the performance of the
Random Forest model optimized using GridSearchCV
and evaluated on unseen test data.


In [74]:
from sklearn.metrics import accuracy_score, classification_report

# Predict on test data
y_pred = best_rf_model_grid.predict(X_test)

# Evaluate performance
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))


Accuracy: 0.8524590163934426
              precision    recall  f1-score   support

           0       0.86      0.83      0.84        29
           1       0.85      0.88      0.86        32

    accuracy                           0.85        61
   macro avg       0.85      0.85      0.85        61
weighted avg       0.85      0.85      0.85        61



## 📊 Classification Report – GridSearchCV Optimized Random Forest Model

The following classification report summarizes the performance of the
**Random Forest model optimized using GridSearchCV**, evaluated on the test dataset.

---

### 🔢 Overall Performance
- **Accuracy:** ~85.2%
- The model correctly predicts approximately **85 out of every 100 test samples**
- This performance reflects a **balanced and stable model** after exhaustive hyperparameter tuning

---

### 🧾 Class-wise Interpretation

#### Class 0
- **Support:** 29 samples
- **Precision:** 0.86  
  - When the model predicts class `0`, it is correct **86% of the time**
- **Recall:** 0.83  
  - The model correctly identifies **83% of actual class 0 instances**
- **F1-Score:** 0.84  
  - Indicates a good balance between precision and recall

---

#### Class 1
- **Support:** 32 samples
- **Precision:** 0.85  
  - When the model predicts class `1`, it is correct **85% of the time**
- **Recall:** 0.88  
  - The model correctly identifies **88% of actual class 1 instances**
- **F1-Score:** 0.86  
  - Shows slightly stronger performance compared to class 0

---

### ⚖️ Macro vs Weighted Average

- **Macro Average**
  - Treats both classes equally
  - Useful for checking fairness across classes

- **Weighted Average**
  - Accounts for the number of samples in each class
  - Reflects real-world class distribution

The close similarity between macro and weighted averages indicates
that the model does **not favor any specific class**.

---

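### 🧠 Final Observation

- The tuned model reaches ~85.2% test accuracy (52 of 61 correct), one sample fewer than the ~86.9% baseline run (53 of 61), so tuning did not measurably improve test performance here
- Performance is consistent across both classes
- Slightly higher recall for class `1` means fewer false negatives for that class
- With only 61 test samples and no fixed `random_state`, differences of this size are within run-to-run noise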


# RandomizedSearchCV

## 🔧 Hyperparameter Tuning using RandomizedSearchCV

RandomizedSearchCV randomly samples combinations of hyperparameters
instead of trying all possible combinations (as in GridSearchCV).

It is **computationally efficient** and works well when the search space is large.


## 🧮 Search Space

The search space used in this section contains 864 possible hyperparameter combinations.
Instead of evaluating all of them, RandomizedSearchCV
samples a fixed number of configurations randomly (`n_iter`, 10 by default).

This significantly reduces training time while often
finding near-optimal hyperparameters.
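
To illustrate the sampling, scikit-learn's `ParameterSampler` draws candidate configurations from a search space in the same way `RandomizedSearchCV` does. The sketch below assumes a seven-parameter space matching the one used in this section; `search_space` and the seed are illustrative choices.

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler

# Seven-parameter space (values assumed to match this section's search)
search_space = {
    "n_estimators": [20, 60, 100, 120],
    "max_features": [0.2, 0.6, 1.0],
    "max_depth": [2, 8, None],
    "max_samples": [0.5, 0.75, 1.0],
    "bootstrap": [True, False],
    "min_samples_split": [2, 5],
    "min_samples_leaf": [1, 2],
}

# Size of the full grid: 4 * 3 * 3 * 3 * 2 * 2 * 2
total_combinations = len(ParameterGrid(search_space))

# Draw 10 configurations, the default n_iter of RandomizedSearchCV
sampled = list(ParameterSampler(search_space, n_iter=10, random_state=0))

print("Full grid size:", total_combinations)
print("Sampled configurations:", len(sampled))
for params in sampled[:3]:
    print(params)
```

When every parameter is given as a list, the sampler draws without replacement, so the 10 configurations are distinct. Passing `random_state` to `RandomizedSearchCV` makes the sampled set reproducible in the same way.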


In [75]:
# Number of trees in the random forest
n_estimators = [20,60,100,120]

# Fraction of features to consider at every split
max_features = [0.2,0.6,1.0]

# Maximum number of levels in each tree
max_depth = [2,8,None]

# Fraction of samples drawn to train each tree
max_samples = [0.5,0.75,1.0]

# Whether to use bootstrap samples
# Note: recent scikit-learn versions reject max_samples together with bootstrap=False,
# so candidates with bootstrap=False are expected to fail and score NaN
bootstrap = [True,False]

# Minimum number of samples required to split a node
min_samples_split = [2,5]

# Minimum number of samples required at each leaf node
min_samples_leaf = [1,2]

# Total number of combinations = 4*3*3*3*2*2*2 = 864
# i.e. 864 different Random Forest configurations

In [76]:
param_grid = {'n_estimators' : n_estimators,
              'max_features' : max_features,
              'max_depth' : max_depth,
              'max_samples' : max_samples,
              'bootstrap' : bootstrap,
              'min_samples_split' : min_samples_split,
              'min_samples_leaf' : min_samples_leaf
              }
print(param_grid)

{'n_estimators': [20, 60, 100, 120], 'max_features': [0.2, 0.6, 1.0], 'max_depth': [2, 8, None], 'max_samples': [0.5, 0.75, 1.0], 'bootstrap': [True, False], 'min_samples_split': [2, 5], 'min_samples_leaf': [1, 2]}


In [77]:
random_search = RandomizedSearchCV(
                       estimator=rf,
                       param_distributions = param_grid,
                       cv=5,
                       verbose=2,
                       n_jobs = -1)

# Fit RandomizedSearchCV

In [78]:
random_search.fit(X_train,y_train)

Fitting 5 folds for each of 10 candidates, totalling 50 fits


## 🏆 Best Hyperparameters from RandomizedSearchCV


In [79]:
print("Best Parameters Found:")
print(random_search.best_params_)

print("\nBest Cross-Validation Accuracy:")
print(random_search.best_score_)


Best Parameters Found:
{'n_estimators': 100, 'min_samples_split': 2, 'min_samples_leaf': 2, 'max_samples': 0.75, 'max_features': 0.2, 'max_depth': None, 'bootstrap': True}

Best Cross-Validation Accuracy:
0.8057823129251702


# Train Optimized Random Forest Model

In [80]:
best_rf_model = random_search.best_estimator_

best_rf_model.fit(X_train, y_train)


## 📊 Evaluation of Tuned Random Forest Model


In [82]:
from sklearn.metrics import accuracy_score, classification_report

y_pred = best_rf_model.predict(X_test)

print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))


Accuracy: 0.8688524590163934
              precision    recall  f1-score   support

           0       0.89      0.83      0.86        29
           1       0.85      0.91      0.88        32

    accuracy                           0.87        61
   macro avg       0.87      0.87      0.87        61
weighted avg       0.87      0.87      0.87        61



## 📊 Classification Report – RandomizedSearchCV Optimized Random Forest Model

The following classification report represents the performance of the **optimized Random Forest model**
(after hyperparameter tuning) evaluated on the **test dataset**.


### 🔢 Overall Accuracy
- **Accuracy:** ~86.9%
- This means the model correctly predicts nearly **87 out of every 100 samples**
- Accuracy is a reliable metric here because the dataset is almost balanced

---

### 🧾 Class-wise Performance

#### Class 0
- **Support:** 29 samples
- **Precision:** 0.89  
  - When the model predicts class `0`, it is correct **89% of the time**
- **Recall:** 0.83  
  - The model correctly identifies **83% of actual class 0 cases**
- **F1-Score:** 0.86  
  - Indicates a good balance between precision and recall

---

#### Class 1
- **Support:** 32 samples
- **Precision:** 0.85  
  - When the model predicts class `1`, it is correct **85% of the time**
- **Recall:** 0.91  
  - The model correctly identifies **91% of actual class 1 cases**
- **F1-Score:** 0.88  
  - Shows strong and balanced classification performance

---

### ⚖️ Macro Average vs Weighted Average

- **Macro Average**
  - Treats all classes equally
  - Useful to check model fairness

- **Weighted Average**
  - Weighs metrics based on class frequency
  - Accounts for class distribution

Since both values are very close, the model shows **no significant class bias**.

---

### 🧠 Final Interpretation

- The model performs **consistently well across both classes**
- Slightly better recall for class `1` indicates fewer false negatives
- Balanced precision and recall suggest a **robust and reliable model**
- Overall performance is suitable for real-world classification tasks


## ⚖️ GridSearchCV vs RandomizedSearchCV

- **GridSearchCV** evaluates all possible combinations (slow for large grids)
- **RandomizedSearchCV** samples random combinations (faster and scalable)

RandomizedSearchCV is preferred when:
- The hyperparameter space is large
- Training time is limited
- Approximate optimal parameters are sufficient


## ⚖️ RandomizedSearchCV vs GridSearchCV – Tuned Model Comparison

- **GridSearchCV**
  - Exhaustively searches all parameter combinations
  - Computationally expensive
  - Guarantees the best combination within the grid

- **RandomizedSearchCV**
  - Samples random combinations
  - Faster and more scalable
  - Often finds near-optimal solutions

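Both approaches aim to improve model generalization
compared to the baseline Random Forest model. In this notebook, the tuned models
scored ~85.2% (GridSearchCV) and ~86.9% (RandomizedSearchCV) on the test set,
against ~86.9% for the baseline run, so neither search produced a clear gain here.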


## 🏁 Key Takeaways

- RandomizedSearchCV efficiently tunes hyperparameters
- Reduces computation time compared to GridSearchCV
- Finds robust and well-generalized Random Forest models
- Essential for large-scale machine learning problems
