# Ans:-1

## What Is Ensemble Learning?
Ensemble learning is a machine learning technique that combines the predictions of multiple models, which usually gives better predictions than any one of those models alone.

 #### Key Idea: “Many weak models together can make a strong model.”

## Key Ideas of Ensemble Learning

   * Single models can make mistakes

   * Different models may perform well on different parts of the data

   * Combining them helps reduce errors, increase accuracy, and improve stability

## Ensemble Techniques
   * **Bagging (Bootstrap Aggregating):** Trains multiple models on different random subsets of the data and combines their predictions (e.g., Random Forest).
   * **Boosting:** Trains models sequentially, each one correcting the errors of the previous one (e.g., AdaBoost, XGBoost).
   * **Stacking:** Combines predictions from different models using a meta-model that learns how best to combine them.
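
The three techniques can be tried side by side with scikit-learn. The following is a minimal sketch, not a tuned comparison: the Iris dataset, the estimator counts, the base learners in the stacking model, and the variable names are all illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

ensembles = {
    # Bagging: independent trees, each fit on a bootstrap sample
    "bagging": BaggingClassifier(DecisionTreeClassifier(), n_estimators=25, random_state=0),
    # Boosting: weak learners fit one after another, reweighting past mistakes
    "boosting": AdaBoostClassifier(n_estimators=25, random_state=0),
    # Stacking: a logistic-regression meta-model combines the base models' predictions
    "stacking": StackingClassifier(
        estimators=[
            ("tree", DecisionTreeClassifier(random_state=0)),
            ("logreg", LogisticRegression(max_iter=1000)),
        ],
        final_estimator=LogisticRegression(max_iter=1000),
    ),
}

scores = {}
for name, clf in ensembles.items():
    clf.fit(X_train, y_train)
    scores[name] = clf.score(X_test, y_test)  # accuracy on the held-out rows
    print(f"{name}: test accuracy = {scores[name]:.3f}")
```

On a dataset as small and easy as Iris the three scores are likely to be close, so this mainly shows how each ensemble is built.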

# Ans:-2

Bagging and boosting both combine multiple models to improve overall accuracy, but they do it in very different ways.

| Feature          | **Bagging**                         | **Boosting**                                          |
| ---------------- | ----------------------------------- | ----------------------------------------------------- |
| Training style   | **Parallel** (independent models)   | **Sequential** (one model at a time)                  |
| Focus            | Reduce **variance**                 | Reduce **bias**                                       |
| Data sampling    | Random subsets **with replacement** | Full data, but focus shifts to errors                 |
| Model dependency | Models are **independent**          | Models are **dependent** (each corrects the previous) |
| Example          | Random Forest                       | AdaBoost, XGBoost                                     |
| Risk             | Less overfitting                    | Can overfit if not tuned well                         |


# Ans:-3

## Bootstrap Sampling
Bootstrap sampling is a technique where you randomly pick samples from the original dataset with replacement, which means the same data point can be picked more than once.

### Example:
Original data = [A, B, C, D]

Bootstrap sample (random, with replacement) might be:
[B, C, C, D] or [A, A, B, D]
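
The same draw can be reproduced with NumPy. This is a small sketch; the seed and variable names are arbitrary choices, and the exact sample depends on the seed.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seed chosen arbitrarily for repeatability
original = np.array(["A", "B", "C", "D"])

# Draw as many items as the original has, with replacement
bootstrap_sample = rng.choice(original, size=len(original), replace=True)

# Items that were never drawn in this sample
left_out = np.setdiff1d(original, bootstrap_sample)

print("bootstrap sample:", bootstrap_sample)
print("left out:", left_out)
```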


## Bagging (Bootstrap Aggregating) means:

  * Bootstrap the dataset (draw several bootstrap samples)

  * Train a separate model (e.g., a decision tree) on each sample

  * Combine their predictions (vote or average)
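
The three steps can be written out by hand. This is a sketch only, assuming the Iris data, 25 trees, and a plain majority vote; in practice `BaggingClassifier` does the same job.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

rng = np.random.default_rng(0)
n_models = 25  # illustrative choice
n_train = len(X_train)

# Bootstrap the training set and fit one tree per bootstrap sample
trees = []
for _ in range(n_models):
    idx = rng.integers(0, n_train, size=n_train)  # row indices drawn with replacement
    trees.append(DecisionTreeClassifier(random_state=0).fit(X_train[idx], y_train[idx]))

# Combine predictions: count the votes per class for every test row
all_preds = np.stack([tree.predict(X_test) for tree in trees])  # shape (n_models, n_test)
classes = np.unique(y_train)
votes = np.array([(all_preds == c).sum(axis=0) for c in classes])  # shape (n_classes, n_test)
bagged_pred = classes[votes.argmax(axis=0)]  # class with the most votes

manual_accuracy = (bagged_pred == y_test).mean()
print(f"manual bagging accuracy: {manual_accuracy:.3f}")
```

For regression, the vote would be replaced by the mean of the trees' predictions.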

## Role in Random Forest

  * Each tree is trained on a different bootstrap sample

  * This adds diversity among the trees

  * Trees make independent predictions

  * Final prediction = Majority vote (classification) or Average (regression)

## Why Bootstrap helps:
  * Reduces overfitting (by making trees slightly different)

  * Reduces variance

  * Improves model stability and accuracy.

# Ans:-4

## Out-of-Bag (OOB) Samples

With bootstrap sampling (as in Bagging or Random Forest), each tree is trained on a random sample drawn with replacement. Because of the replacement, some data points are left out of each tree's sample. These left-out data points are called Out-of-Bag (OOB) samples.

### Example:

Original dataset: 100 rows

Bootstrap sample for Tree 1: 100 draws with replacement, which typically cover only about 63 distinct rows (the other draws are repeats)

 The remaining rows (about 37) were not used to train Tree 1

 These are OOB samples for Tree 1

They act like a free test set! You can use OOB samples to evaluate how well each tree performs on data it has not seen.
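
As a rough check on how many rows end up out of bag, the sketch below assumes the common setup where the bootstrap sample has as many draws as the dataset has rows. The chance that one row is missed by all n draws is (1 - 1/n)^n, which approaches 1/e (about 0.37) as n grows. The seed and names are arbitrary.

```python
import numpy as np

n_rows = 100
rng = np.random.default_rng(0)

# One bootstrap sample: n_rows draws with replacement from row indices 0..n_rows-1
sample_idx = rng.integers(0, n_rows, size=n_rows)
oob_idx = np.setdiff1d(np.arange(n_rows), sample_idx)  # rows never drawn

# Probability that a given row is missed by all n_rows draws
expected_oob_fraction = (1 - 1 / n_rows) ** n_rows

print("distinct rows in sample:", len(np.unique(sample_idx)))
print("OOB rows:", len(oob_idx))
print(f"expected OOB fraction: {expected_oob_fraction:.3f}")
```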

## OOB Score?
The OOB score is an estimate of the model's accuracy using only the OOB samples.

 **How it works:**

  * Each tree makes predictions on its OOB samples

  * For each data point, combine predictions from all trees where it was OOB

  * Compare those predictions to the true labels

  * Calculate the accuracy; this is the OOB score.
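
scikit-learn can compute this estimate during fitting when `oob_score=True` is passed. A minimal sketch, assuming the Iris data and 200 trees (both illustrative choices):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# oob_score=True scores each row using only the trees that did not train on it
forest = RandomForestClassifier(n_estimators=200, oob_score=True, bootstrap=True, random_state=0)
forest.fit(X, y)

oob_accuracy = forest.oob_score_
print(f"OOB accuracy estimate: {oob_accuracy:.3f}")
```

Because every row is out of bag for some trees, no separate validation split is needed for this estimate.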


# Ans:-5

## 1. Feature Importance in a Single Decision Tree
 **How it works:**

Each feature's importance is based on how much it reduces impurity (like Gini or Entropy) at each split.

The more a feature is used near the root, and the more it improves splits, the higher its importance.
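
To make "reduces impurity" concrete, here is a hand-rolled sketch of the Gini impurity and of the decrease produced by one split. The helper names `gini` and `impurity_decrease` are hypothetical, and a real tree also weights each split by the share of samples that reach the node.

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def impurity_decrease(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted_children = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - weighted_children

# A perfectly separating split of a balanced two-class node
parent = np.array([0, 0, 0, 0, 1, 1, 1, 1])
left, right = parent[:4], parent[4:]

print(gini(parent))                            # 0.5
print(impurity_decrease(parent, left, right))  # 0.5
```

A feature's importance in the tree is the sum of such decreases over all splits that use it, normalized so the importances add up to 1.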

**Pros:**

  * Easy to compute and visualize

  * Good for quick interpretation

**Cons:**

  * Can be unstable (small changes in the data may change the tree)

  * May be biased toward features with many distinct values (especially high-cardinality categorical features)

## 2. Feature Importance in a Random Forest
**How it works:**

A Random Forest is an ensemble of many decision trees.

It calculates importance by averaging the feature importances from all the trees.

Each tree sees different data (because of bootstrap sampling), so the importances are more generalized.

**Pros:**

  * More reliable and stable than a single tree

  * Reduces overfitting

  * Works well on high-dimensional data

**Cons:**

  * Harder to interpret compared to a single tree

  * Slower to compute (since many trees are involved)


* Single Decision Tree: Tells you which features are important in one specific model

* Random Forest: Tells you which features are consistently important across many models, making it more trustworthy
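
The averaging can be checked directly. The sketch below assumes the breast cancer dataset and 100 trees. As far as scikit-learn's implementation goes, the forest's `feature_importances_` is the mean of the per-tree importances, and the per-tree spread gives a feel for how much a single tree's ranking can vary.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
forest = RandomForestClassifier(n_estimators=100, random_state=2).fit(X, y)

# One row of importances per tree: shape (n_trees, n_features)
per_tree = np.array([tree.feature_importances_ for tree in forest.estimators_])
mean_importance = per_tree.mean(axis=0)
spread = per_tree.std(axis=0)  # how much each feature's importance varies between trees

matches_forest = np.allclose(mean_importance, forest.feature_importances_)
print("mean of per-tree importances matches the forest:", matches_forest)

top = np.argsort(mean_importance)[::-1][:3]
for i in top:
    print(f"feature {i}: mean={mean_importance[i]:.3f}, std across trees={spread[i]:.3f}")
```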

# Ans:-6

In [60]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

import warnings
warnings.filterwarnings('ignore')

In [61]:
from sklearn.datasets import load_breast_cancer
data= load_breast_cancer()
print(data)


In [62]:
x= data.data
y= data.target

In [63]:
x

array([[1.799e+01, 1.038e+01, 1.228e+02, ..., 2.654e-01, 4.601e-01,
        1.189e-01],
       [2.057e+01, 1.777e+01, 1.329e+02, ..., 1.860e-01, 2.750e-01,
        8.902e-02],
       [1.969e+01, 2.125e+01, 1.300e+02, ..., 2.430e-01, 3.613e-01,
        8.758e-02],
       ...,
       [1.660e+01, 2.808e+01, 1.083e+02, ..., 1.418e-01, 2.218e-01,
        7.820e-02],
       [2.060e+01, 2.933e+01, 1.401e+02, ..., 2.650e-01, 4.087e-01,
        1.240e-01],
       [7.760e+00, 2.454e+01, 4.792e+01, ..., 0.000e+00, 2.871e-01,
        7.039e-02]])

In [64]:
y

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
       0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0,
       1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0,
       1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1,
       1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0,
       0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1,
       1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0,
       0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0,
       1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1,
       1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0,

In [65]:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test= train_test_split(x, y, test_size= 0.3, random_state= 1)

In [66]:
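from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier(n_estimators=100, random_state=2)
# Fitted on the full dataset, since only the feature importances are needed here;
# the train/test split from the previous cell is not used.
model.fit(x, y)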

In [67]:
importances = model.feature_importances_
feature_names = data.feature_names

In [68]:
feature_importance_table = pd.DataFrame({
    'Feature': feature_names,
    'Importance': importances
})

top5_features = feature_importance_table.sort_values(by='Importance', ascending=False).head(5)
print(top5_features)

                 Feature  Importance
20          worst radius    0.154410
23            worst area    0.128149
27  worst concave points    0.118846
7    mean concave points    0.097132
22       worst perimeter    0.078753


# Ans:-7

In [69]:
from sklearn.datasets import load_iris
data1= load_iris()
print(data1)


In [70]:
x1= data1.data
y1= data1.target

In [71]:
x1

array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       [4.6, 3.1, 1.5, 0.2],
       [5. , 3.6, 1.4, 0.2],
       [5.4, 3.9, 1.7, 0.4],
       [4.6, 3.4, 1.4, 0.3],
       [5. , 3.4, 1.5, 0.2],
       [4.4, 2.9, 1.4, 0.2],
       [4.9, 3.1, 1.5, 0.1],
       [5.4, 3.7, 1.5, 0.2],
       [4.8, 3.4, 1.6, 0.2],
       [4.8, 3. , 1.4, 0.1],
       [4.3, 3. , 1.1, 0.1],
       [5.8, 4. , 1.2, 0.2],
       [5.7, 4.4, 1.5, 0.4],
       [5.4, 3.9, 1.3, 0.4],
       [5.1, 3.5, 1.4, 0.3],
       [5.7, 3.8, 1.7, 0.3],
       [5.1, 3.8, 1.5, 0.3],
       [5.4, 3.4, 1.7, 0.2],
       [5.1, 3.7, 1.5, 0.4],
       [4.6, 3.6, 1. , 0.2],
       [5.1, 3.3, 1.7, 0.5],
       [4.8, 3.4, 1.9, 0.2],
       [5. , 3. , 1.6, 0.2],
       [5. , 3.4, 1.6, 0.4],
       [5.2, 3.5, 1.5, 0.2],
       [5.2, 3.4, 1.4, 0.2],
       [4.7, 3.2, 1.6, 0.2],
       [4.8, 3.1, 1.6, 0.2],
       [5.4, 3.4, 1.5, 0.4],
       [5.2, 4.1, 1.5, 0.1],
       [5.5, 4.2, 1.4, 0.2],
       [4.9, 3

In [72]:
y1

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

In [73]:
from sklearn.model_selection import train_test_split
x1_train, x1_test, y1_train, y1_test= train_test_split(x1, y1, test_size= 0.3, random_state= 1)
x1_train.shape, x1_test.shape, y1_train.shape, y1_test.shape

((105, 4), (45, 4), (105,), (45,))

In [74]:
from sklearn.tree import DecisionTreeClassifier
dt_model = DecisionTreeClassifier(random_state=2)
dt_model.fit(x1_train, y1_train)

In [75]:
y1_prd= dt_model.predict(x1_test)
y1_prd

array([0, 1, 1, 0, 2, 1, 2, 0, 0, 2, 1, 0, 2, 1, 1, 0, 1, 1, 0, 0, 1, 1,
       2, 0, 2, 1, 0, 0, 1, 2, 1, 2, 1, 2, 2, 0, 1, 0, 1, 2, 2, 0, 1, 2,
       1])

In [76]:
from sklearn.metrics import accuracy_score
accuracy_score(y1_test, y1_prd)

0.9555555555555556

In [77]:
from sklearn.ensemble import RandomForestClassifier
rf_model = RandomForestClassifier(n_estimators=100, random_state=2)
rf_model.fit(x1_train, y1_train)

In [78]:
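from sklearn.ensemble import BaggingClassifier
# 50 decision trees, each trained on its own bootstrap sample of the training set
# (the `estimator` keyword needs scikit-learn 1.2+; older versions call it `base_estimator`)
Model = BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=50, random_state=2)
Model.fit(x1_train, y1_train)
bagging_preds = Model.predict(x1_test)
bagging_accuracy = accuracy_score(y1_test, bagging_preds)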

In [79]:
print(bagging_accuracy)

0.9555555555555556
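
Both models score 0.9556 on this single 45-row test split, so the split alone cannot separate them. One option is 5-fold cross-validation, sketched below. The fold count and names are assumptions, and on Iris the two models may still come out close.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

candidates = {
    "single tree": DecisionTreeClassifier(random_state=2),
    "bagged trees": BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=2),
}

cv_means = {}
for name, clf in candidates.items():
    fold_scores = cross_val_score(clf, X, y, cv=5)  # accuracy on each of the 5 folds
    cv_means[name] = fold_scores.mean()
    print(f"{name}: mean={fold_scores.mean():.3f}, std={fold_scores.std():.3f}")
```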


# Ans:-8

In [80]:
from sklearn.datasets import load_iris
data2= load_iris()
print(data2)


In [81]:
x2= data2.data
y2= data2.target

In [82]:
from sklearn.model_selection import train_test_split
x2_train, x2_test, y2_train, y2_test= train_test_split(x2, y2, test_size= 0.3, random_state= 1)

In [83]:
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier


In [84]:
param_grid = {
    'n_estimators': [10, 50, 100],
    'max_depth': [None, 3, 5, 7]
}

In [None]:
rf = RandomForestClassifier(random_state=42)
grid_search = GridSearchCV(rf, param_grid, cv=5, verbose=0)
grid_search.fit(x2_train, y2_train)

In [None]:
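# GridSearchCV refits the best parameter combination on the whole training set,
# so predict() uses that refitted model. The result holds predictions, not a model.
y2_prd = grid_search.predict(x2_test)
y2_prd

In [None]:
from sklearn.metrics import accuracy_score
accuracy_score(y2_test, y2_prd)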

In [None]:
grid_search.best_index_

In [None]:
grid_search.best_params_

In [None]:
grid_search.best_score_

In [None]:
grid_search.best_estimator_
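
Beyond the single best result, `cv_results_` holds the score of every parameter combination. The sketch below rebuilds the same grid on Iris so it runs on its own; the names `search` and `results` are illustrative.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

param_grid = {"n_estimators": [10, 50, 100], "max_depth": [None, 3, 5, 7]}
search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=5)
search.fit(X_train, y_train)

# One row per parameter combination (3 x 4 = 12 rows)
results = pd.DataFrame(search.cv_results_)
columns = ["param_n_estimators", "param_max_depth", "mean_test_score", "std_test_score", "rank_test_score"]
print(results[columns].sort_values("rank_test_score").head())
```

Looking at `std_test_score` next to `mean_test_score` helps judge whether the top-ranked combination is clearly better or just tied with several others.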

# Ans:-9

In [None]:
from sklearn.datasets import fetch_california_housing
data3= fetch_california_housing()
print(data3)

In [None]:
x3= data3.data
y3= data3.target

In [None]:
from sklearn.model_selection import train_test_split
x3_train, x3_test, y3_train, y3_test= train_test_split(x3, y3, test_size= 0.3, random_state= 1)

In [None]:
from sklearn.ensemble import BaggingRegressor
# Defaults: a decision tree as the base estimator and n_estimators=10
bagging_model = BaggingRegressor(random_state=2)
bagging_model.fit(x3_train, y3_train)

In [None]:
y3_prd= bagging_model.predict(x3_test)
y3_prd

In [None]:
from sklearn.metrics import mean_squared_error
MSE= mean_squared_error(y3_test, y3_prd)
MSE

In [None]:
from sklearn.ensemble import RandomForestRegressor
rf_model = RandomForestRegressor(n_estimators=100, random_state=2)
rf_model.fit(x3_train, y3_train)

In [None]:
rf_prd= rf_model.predict(x3_test)
rf_prd

In [None]:
from sklearn.metrics import mean_squared_error
rf_MSE= mean_squared_error(y3_test, rf_prd)
rf_MSE
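
To put the two errors side by side, the sketch below runs the same comparison on the bundled diabetes dataset. That dataset is an assumption made to avoid a download; swap in `x3` and `y3` to reproduce the California housing setup. Both models get 100 trees here so the comparison is like for like.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import BaggingRegressor, RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

regressors = {
    "bagging": BaggingRegressor(n_estimators=100, random_state=2),
    "random forest": RandomForestRegressor(n_estimators=100, random_state=2),
}

mse_by_model = {}
for name, reg in regressors.items():
    reg.fit(X_train, y_train)
    mse_by_model[name] = mean_squared_error(y_test, reg.predict(X_test))
    print(f"{name}: test MSE = {mse_by_model[name]:.2f}")
```

In recent scikit-learn versions `RandomForestRegressor` considers all features at each split by default, so it behaves much like bagged trees and the two errors may land close together. Lowering `max_features` is what adds the extra per-split randomness.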