Steps
Import libraries

1. Load / create dataset (X features, y labels)

2. Split the data or choose an evaluation plan
   Bootstrap → resample with replacement many times
   Stratified K-Fold → cross-validation with class proportions preserved in each fold
   Train/Test split → single hold-out evaluation

3. Create model → RandomForestClassifier(...)

4. Train → model.fit(X_train, y_train)

5. Predict → y_pred = model.predict(X_test)

6. Evaluate
   accuracy_score
   classification_report (precision, recall, f1)

7. Print results / average score

In [33]:
#import library
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn.ensemble import RandomForestClassifier
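from sklearn.metrics import accuracy_score, classification_report
from sklearn.metrics import precision_score, recall_score, f1_score  # used in the Q-31 metrics cell
from sklearn.datasets import make_classification, load_breast_cancer
from sklearn.model_selection import StratifiedKFold, train_test_split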

In [34]:
# Q-22 dataset
X, y = make_classification(
    n_samples=150,
    n_features=10,
    n_informative=5,
    n_classes=3,
    random_state=42
)

# Q-25 dataset (imbalanced)
X1, y1 = make_classification(
    n_samples=200,
    n_features=15,
    n_informative=5,
    n_classes=4,
    weights=[0.60, 0.20, 0.15, 0.05],
    random_state=42
)

# Q-31 dataset (Breast Cancer)
data = load_breast_cancer()
X2 = data.data
y2 = data.target
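
Before evaluating, it can help to confirm that the Q-25 dataset really is imbalanced. The cell below is a minimal sketch that regenerates the same dataset so it runs on its own. Note that make_classification adds a small amount of label noise by default (flip_y), so the realized proportions only approximate the requested weights.

```python
import numpy as np
from sklearn.datasets import make_classification

# Same parameters as the Q-25 dataset
X1, y1 = make_classification(
    n_samples=200, n_features=15, n_informative=5, n_classes=4,
    weights=[0.60, 0.20, 0.15, 0.05], random_state=42,
)

counts = np.bincount(y1, minlength=4)   # number of samples per class
proportions = counts / counts.sum()     # realized class proportions

for label, (n, p) in enumerate(zip(counts, proportions)):
    print(f"class {label}: {int(n):3d} samples ({p:.1%})")
```

The smallest class has only a handful of samples, which is why stratified splitting is the safer choice for this dataset.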

Q-22 Step-wise: Bootstrap Sampling + Random Forest
Step 3) Decide number of bootstrap samples

In [35]:
n_bootstrap = 10
rng = np.random.RandomState(42)
accuracies = []


In [36]:
from sklearn.utils import resample

# Bootstrap sampling: draw n_iterations resamples (with replacement) of the data
n_iterations = 10
stats = []
for i in range(n_iterations):
    # Create a bootstrap sample the same size as the original dataset
    X_sample, y_sample = resample(X, y, n_samples=len(X), random_state=i)

    # Train (no random_state is set on the forest, so scores vary between runs)
    model = RandomForestClassifier()
    model.fit(X_sample, y_sample)

    # Evaluate on the ORIGINAL full dataset for simplicity.
    # Note: this is not out-of-bag validation. Most rows of X also appear in
    # the bootstrap sample, so these scores are optimistic. A stricter estimate
    # scores only the rows that were left out of each sample.
    predictions = model.predict(X)
    score = accuracy_score(y, predictions)
    stats.append(score)
    print(f"Bootstrap {i+1} Accuracy: {score:.4f}")
print(f"Average Accuracy: {np.mean(stats):.4f}")

Bootstrap 1 Accuracy: 0.9133
Bootstrap 2 Accuracy: 0.9000
Bootstrap 3 Accuracy: 0.8733
Bootstrap 4 Accuracy: 0.8133
Bootstrap 5 Accuracy: 0.8800
Bootstrap 6 Accuracy: 0.8600
Bootstrap 7 Accuracy: 0.8800
Bootstrap 8 Accuracy: 0.8333
Bootstrap 9 Accuracy: 0.8600
Bootstrap 10 Accuracy: 0.8600
Average Accuracy: 0.8673
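
The scores above include rows the model was trained on. A stricter variant scores each model only on its out-of-bag rows, i.e. the rows that were not drawn into that bootstrap sample. The cell below is a sketch of that idea, not the method used in the run above: it draws indices with a NumPy RandomState instead of sklearn.utils.resample so that the left-out rows are easy to identify, and the n_estimators and random_state values are illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Same parameters as the Q-22 dataset
X, y = make_classification(
    n_samples=150, n_features=10, n_informative=5, n_classes=3, random_state=42
)

rng = np.random.RandomState(42)
n_bootstrap = 10
all_idx = np.arange(len(X))
oob_scores = []

for i in range(n_bootstrap):
    # Draw row indices with replacement (the in-bag set)
    boot_idx = rng.choice(all_idx, size=len(X), replace=True)
    # Rows never drawn form the out-of-bag set (about 37% of rows on average)
    oob_idx = np.setdiff1d(all_idx, boot_idx)

    model = RandomForestClassifier(n_estimators=100, random_state=i)
    model.fit(X[boot_idx], y[boot_idx])

    # Score only on rows the model has not seen during training
    score = accuracy_score(y[oob_idx], model.predict(X[oob_idx]))
    oob_scores.append(score)
    print(f"Bootstrap {i+1} OOB accuracy: {score:.4f}")

print(f"Mean OOB accuracy: {np.mean(oob_scores):.4f}")
```

Expect these numbers to be lower than the full-dataset scores, since no training rows are counted.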


Q-25 Step-wise: Stratified K-Fold (5 folds) + Random Forest + Report

In [46]:
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
fold_acc = []


In [47]:
fold_acc = []

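# Note: this run splits X, y (the Q-22 dataset), and the outputs below reflect that.
# To evaluate the imbalanced Q-25 dataset, use X1, y1 here and in the indexing below.
for fold, (train_index, test_index) in enumerate(skf.split(X, y), 1):

    # Split data
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]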

    # Train model
    model = RandomForestClassifier(n_estimators=300, random_state=42)
    model.fit(X_train, y_train)

    # Predict on validation/test fold
    y_pred = model.predict(X_test)

    # Accuracy
    acc = accuracy_score(y_test, y_pred)
    fold_acc.append(acc)

    print(f"Fold {fold} Accuracy: {acc:.4f}")

    # Save last fold model for report
    final_model = model
    final_X_test = X_test
    final_y_test = y_test


Fold 1 Accuracy: 0.8000
Fold 2 Accuracy: 0.6667
Fold 3 Accuracy: 0.7667
Fold 4 Accuracy: 0.5667
Fold 5 Accuracy: 0.6000


In [49]:
np.mean(fold_acc)

0.6799999999999999
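
The mean alone hides how much the folds disagree. A small sketch that reports the spread alongside the mean, using the fold accuracies printed above (copied in by hand so the cell runs on its own):

```python
import numpy as np

# Fold accuracies copied from the printed output above
fold_acc = [0.8000, 0.6667, 0.7667, 0.5667, 0.6000]

mean_acc = np.mean(fold_acc)
std_acc = np.std(fold_acc, ddof=1)   # sample standard deviation across folds

print(f"Accuracy: {mean_acc:.4f} ± {std_acc:.4f} over {len(fold_acc)} folds")
```

With only 30 test samples per fold, a standard deviation of roughly 0.10 is not surprising, so the mean should be read as a rough estimate.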

In [22]:
print("\nClassification Report (Final fold):")
print(classification_report(final_y_test, final_model.predict(final_X_test)))



Classification Report (Final fold):
              precision    recall  f1-score   support

           0       0.78      0.78      0.78         9
           1       0.92      0.79      0.85        14
           2       0.44      0.57      0.50         7

    accuracy                           0.73        30
   macro avg       0.71      0.71      0.71        30
weighted avg       0.76      0.73      0.74        30
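
A report on the last fold alone covers only one fifth of the data. One alternative is to pool the out-of-fold predictions from every fold and build a single report. The cell below sketches this with cross_val_predict on the imbalanced Q-25 dataset (X1, y1), which is an assumption about what the exercise intends. It also prints the class counts per test fold to show what stratification preserves.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Same parameters as the Q-25 dataset
X1, y1 = make_classification(
    n_samples=200, n_features=15, n_informative=5, n_classes=4,
    weights=[0.60, 0.20, 0.15, 0.05], random_state=42,
)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Class counts in each test fold: roughly the same mix in every fold
for fold, (_, test_index) in enumerate(skf.split(X1, y1), 1):
    print(f"Fold {fold} test class counts: {np.bincount(y1[test_index], minlength=4)}")

model = RandomForestClassifier(n_estimators=300, random_state=42)

# Each sample is predicted exactly once, by a model trained on the other folds
oof_pred = cross_val_predict(model, X1, y1, cv=skf)

print("\nClassification Report (all folds pooled):")
print(classification_report(y1, oof_pred, zero_division=0))
```

zero_division=0 only silences the warning when a rare class is never predicted; the affected precision is then reported as 0.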



Q-31 Step-wise: Breast Cancer Random Forest + Accuracy, Precision, Recall, F1

In [23]:
X_train, X_test, y_train, y_test = train_test_split(
    X2, y2,
    test_size=0.20,
    random_state=42,
    stratify=y2
)


In [24]:
rf = RandomForestClassifier(n_estimators=300, random_state=42, class_weight="balanced")
rf.fit(X_train, y_train)


In [25]:
y_pred = rf.predict(X_test)


In [26]:
acc  = accuracy_score(y_test, y_pred)
prec = precision_score(y_test, y_pred)   # pos_label defaults to 1, which is "benign" in this dataset
rec  = recall_score(y_test, y_pred)
f1   = f1_score(y_test, y_pred)

print("Accuracy :", round(acc, 4))
print("Precision:", round(prec, 4))
print("Recall   :", round(rec, 4))
print("F1-score :", round(f1, 4))


Accuracy : 0.9474
Precision: 0.9583
Recall   : 0.9583
F1-score : 0.9583


In [27]:
print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=data.target_names))



Classification Report:
              precision    recall  f1-score   support

   malignant       0.93      0.93      0.93        42
      benign       0.96      0.96      0.96        72

    accuracy                           0.95       114
   macro avg       0.94      0.94      0.94       114
weighted avg       0.95      0.95      0.95       114
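
Precision, recall and F1 above treat label 1 (benign) as the positive class. For a screening-style reading, the malignant class (label 0) is often the one of interest. The cell below is a sketch that rebuilds the same split and model, then shows the confusion matrix and the recall for malignant cases via pos_label=0.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, recall_score
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.20, random_state=42, stratify=data.target
)

rf = RandomForestClassifier(n_estimators=300, random_state=42, class_weight="balanced")
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)

# Rows are true classes, columns are predicted classes, in label order [0, 1]
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
print(data.target_names)   # ['malignant' 'benign']
print(cm)

# Recall for the malignant class (label 0) instead of the default pos_label=1
malignant_recall = recall_score(y_test, y_pred, pos_label=0)
print(f"Malignant recall: {malignant_recall:.4f}")
```

The top-right entry of the matrix counts malignant cases predicted as benign, which is usually the most costly error in this setting.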

