## Theoretical Questions

1. Can we use Bagging for regression problems?

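=> Yes. Bagging works for regression as well as classification. A Bagging Regressor (for example, scikit-learn's `BaggingRegressor`) trains each base regressor on a bootstrap sample and, instead of taking a majority vote as a Bagging Classifier does, averages the predictions of the individual base regressors.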

2. What is the difference between multiple model training and single model training?

=> Single model training fits one model to the data and uses its predictions directly. Multiple model training, often referred to as ensemble learning, involves training several individual models and combining their predictions to make a final prediction. This can help reduce variance and improve robustness compared to single model training, at the cost of more computation and a model that is harder to interpret.

3. Explain the concept of feature randomness in Random Forest.

=> In Random Forest, feature randomness (also called feature bagging or subspace sampling) means that at each split in a decision tree, only a random subset of the features is considered for finding the best split. This adds another layer of randomness on top of the sample bagging.
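The per-split feature sampling described above can be sketched in a few lines of plain Python. This is only an illustration, not how any particular library implements it: the helper name `candidate_features` and the choice of `sqrt(n_features)` candidates per split are assumptions made for the example.

```python
import math
import random

def candidate_features(n_features, rng):
    """Pick the random subset of feature indices considered at one split."""
    # Assumption: use sqrt(n_features) candidates, a common choice for classification
    k = max(1, int(math.sqrt(n_features)))
    # Sample without replacement so no feature is listed twice
    return sorted(rng.sample(range(n_features), k))

rng = random.Random(42)
# Every split draws its own subset, so different splits see different features
split_a = candidate_features(20, rng)
split_b = candidate_features(20, rng)
print("Features considered at split A:", split_a)
print("Features considered at split B:", split_b)
```

Only the features in the drawn subset compete to be the split variable, which is what keeps a few dominant features from shaping every tree in the same way.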

4. What is OOB (Out-of-Bag) Score?

=> The Out-of-Bag (OOB) score is a way to estimate the performance of a Bagging model (like Random Forest) without using a separate validation set. Since each base estimator in Bagging is trained on a bootstrap sample of the data, some data points (about 37% on average) are not included in that sample. These "out-of-bag" samples act as unseen data for that estimator: each training point is predicted using only the estimators that did not train on it, and the OOB score is the accuracy (or R² for regression) of those predictions.
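As a rough sketch of the bookkeeping behind an OOB estimate, the snippet below draws bootstrap samples for a small hypothetical ensemble and records, for every training point, which estimators never saw it. The sizes (`n_samples = 200`, `n_estimators = 25`) are arbitrary values chosen for illustration, and no actual models are trained.

```python
import random

rng = random.Random(0)
n_samples, n_estimators = 200, 25

# oob_voters[i] lists the estimators whose bootstrap sample missed point i
oob_voters = [[] for _ in range(n_samples)]
for est in range(n_estimators):
    # One bootstrap sample: n_samples draws with replacement
    in_bag = {rng.randrange(n_samples) for _ in range(n_samples)}
    for i in range(n_samples):
        if i not in in_bag:
            oob_voters[i].append(est)

avg_voters = sum(len(voters) for voters in oob_voters) / n_samples
print(f"Average OOB estimators per point: {avg_voters:.1f} of {n_estimators}")
```

In a real OOB score, point `i` would be predicted by aggregating only the estimators in `oob_voters[i]`, and those predictions would then be compared with the true labels.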

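5. How can you measure the importance of features in a Random Forest model?

=> Two methods are commonly used:

* Mean Decrease in Impurity (MDI): This is the default method in scikit-learn (the `feature_importances_` attribute). It measures how much each feature decreases the weighted impurity (like Gini impurity for classification or MSE for regression) across all trees in the forest. Features that contribute to larger decreases in impurity are considered more important.
* Permutation importance: This method is generally considered more reliable, because impurity-based scores are computed on the training data and tend to favor features with many distinct values. It involves shuffling the values of a feature for out-of-bag or held-out samples and measuring the decrease in the model's accuracy. A large decrease in accuracy indicates that the feature is important for the model's predictions.

6. Explain the working principle of a Bagging Classifier

=> A Bagging Classifier trains multiple models of the same type on different bootstrap samples (randomly sampled subsets with replacement) of the training data. Each model is trained independently, and the final prediction for a new sample is the class that receives the majority vote of the individual models. Because every model sees a slightly different dataset, their individual errors tend to cancel out in the vote.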
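As a rough illustration of how a Bagging Classifier works, the sketch below bags a toy base classifier on hypothetical one-dimensional data. Everything in it is an assumption made for the example: the data, the `fit_stump` base learner (a threshold halfway between the two class means), and the ensemble size of 11. It is not scikit-learn's implementation.

```python
import random
from collections import Counter

def fit_stump(sample):
    """Toy base classifier: threshold halfway between the two class means."""
    xs0 = [x for x, label in sample if label == 0]
    xs1 = [x for x, label in sample if label == 1]
    if not xs0 or not xs1:
        # Degenerate bootstrap sample: always predict the only class seen
        only = 0 if xs0 else 1
        return lambda x: only
    threshold = (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2
    # Assumes class 1 lies at larger x values than class 0
    return lambda x: int(x > threshold)

rng = random.Random(0)
# Hypothetical data: class 0 centred at 0, class 1 centred at 4
data = [(rng.gauss(0, 1), 0) for _ in range(30)] + [(rng.gauss(4, 1), 1) for _ in range(30)]

# Train each base model on its own bootstrap sample (drawn with replacement)
models = [fit_stump(rng.choices(data, k=len(data))) for _ in range(11)]

def bagging_predict(models, x):
    """Majority vote over the base models' predictions."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

print(bagging_predict(models, -1.0), bagging_predict(models, 5.0))
```

An odd number of base models is used so that a two-class vote cannot end in a tie.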

7. How do you evaluate a Bagging Classifier’s performance?

=> A Bagging Classifier is evaluated with the usual classification metrics:

* Accuracy: The proportion of correctly classified instances.
* Precision: The ability of the classifier not to label as positive a sample that is negative.
* Recall (Sensitivity): The ability of the classifier to find all the positive samples.
* F1-Score: The harmonic mean of Precision and Recall, providing a balanced measure.
* ROC-AUC Score: The Area Under the Receiver Operating Characteristic Curve, which measures the classifier's ability to distinguish between classes.
* Confusion Matrix: A table that summarizes the number of true positives, true negatives, false positives, and false negatives.
* Out-of-Bag (OOB) Score: As mentioned earlier, this provides an internal estimate of the model's performance without needing a separate validation set.
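To make the relationship between these metrics concrete, here is a small worked example that derives accuracy, precision, recall and F1 from the four confusion-matrix counts. The counts are made up for illustration, and the helper `classification_metrics` is a hypothetical name; it assumes none of the denominators is zero.

```python
def classification_metrics(tp, fp, fn, tn):
    """Derive common metrics from confusion-matrix counts (binary case)."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # of the predicted positives, how many were right
    recall = tp / (tp + fn)      # of the actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts: 8 true positives, 2 false positives, 4 false negatives, 6 true negatives
metrics = classification_metrics(tp=8, fp=2, fn=4, tn=6)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In practice the same numbers come from `sklearn.metrics` functions applied to the true and predicted labels.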

8. How does a Bagging Regressor work

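=> A Bagging Regressor draws several bootstrap samples from the training data and fits one base regressor on each. It works similarly to a Bagging Classifier, but instead of taking a majority vote, it averages the predictions of the individual base regressors to produce the final prediction.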
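The averaging step can be sketched from scratch in plain Python. The example below is a minimal illustration under stated assumptions: the data is synthetic noise around `y = 2x + 1`, the base regressor `fit_line` is a simple least-squares line, and the ensemble has 10 members. None of these choices come from a specific library.

```python
import random
from statistics import mean

def fit_line(sample):
    """Least-squares line through (x, y) pairs; returns a predict function."""
    xs = [x for x, _ in sample]
    ys = [y for _, y in sample]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    # Fall back to a flat line if every sampled x happens to be identical
    slope = sum((x - x_bar) * (y - y_bar) for x, y in sample) / sxx if sxx else 0.0
    intercept = y_bar - slope * x_bar
    return lambda x: slope * x + intercept

rng = random.Random(0)
# Hypothetical noisy data around y = 2x + 1
data = [(x, 2 * x + 1 + rng.gauss(0, 0.5)) for x in range(20)]

# Fit one base regressor per bootstrap sample (drawn with replacement)
models = [fit_line(rng.choices(data, k=len(data))) for _ in range(10)]

def bagging_predict(x):
    """Average the base regressors' predictions."""
    return mean([model(x) for model in models])

prediction = bagging_predict(10)
print(f"Bagged prediction at x=10: {prediction:.2f}")
```

The noise-free value at `x = 10` is 21, so the bagged prediction should land close to that.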

9. What is the main advantage of ensemble techniques

=> The main advantage of ensemble techniques is that they can significantly improve the overall performance and robustness of a model compared to using a single model. By combining the predictions of multiple models, ensembles can reduce variance, bias, or both, leading to better generalization to unseen data and increased accuracy. They are particularly effective at reducing overfitting and are less sensitive to noisy data.

10. What is the main challenge of ensemble methods

=> The main challenge of ensemble methods is often their interpretability and computational cost.

Interpretability: Understanding why an ensemble model makes a particular prediction can be difficult because it's a combination of multiple individual models. This is especially true for complex ensembles like Random Forests or Gradient Boosting, where the decision-making process is not as transparent as a single decision tree.

Computational Cost: Training and storing multiple models, as well as making predictions with them, can be computationally more expensive and require more memory compared to training and using a single model. This can be a limitation for very large datasets or real-time applications where prediction speed is critical.

11. Explain the key idea behind ensemble techniques

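=> The key idea behind ensemble techniques is to combine the predictions of multiple individual models to produce a more accurate and robust prediction than any single model could achieve on its own. This works when the models are reasonably accurate and make different mistakes, so that their individual errors partly cancel out when the predictions are combined.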
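A short calculation shows why combining models can help. Under the idealized assumption that the models err independently and each is correct with the same probability `p`, the chance that a majority vote is correct follows from the binomial distribution. The function name `majority_vote_accuracy` and the value `p = 0.7` are illustrative; real models are never fully independent, so actual gains are smaller.

```python
from math import comb

def majority_vote_accuracy(n_models, p):
    """Probability that more than half of n independent models are correct."""
    majority = n_models // 2 + 1
    return sum(
        comb(n_models, k) * p**k * (1 - p) ** (n_models - k)
        for k in range(majority, n_models + 1)
    )

# With 3 models at p = 0.7: 0.7^3 + 3 * 0.7^2 * 0.3 = 0.343 + 0.441 = 0.784
for n in (1, 3, 11):
    print(n, round(majority_vote_accuracy(n, 0.7), 3))
```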

12. What is a Random Forest Classifier?

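=> A Random Forest Classifier is an ensemble of decision trees trained with bagging: each tree is fit on a bootstrap sample of the training data, and the forest predicts the class favored by the trees as a whole (a majority vote; scikit-learn's implementation averages the trees' predicted class probabilities). On top of the sample bagging it uses feature randomness (also called feature bagging or subspace sampling), which means that at each split in a decision tree, only a random subset of the features is considered for finding the best split. This further decorrelates the trees, making the ensemble more robust to noisy features and less prone to overfitting.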

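13. What are the main types of ensemble techniques?

=> The three main types are Bagging, Boosting, and Stacking:

* Bagging: This method involves training multiple models of the same type on different bootstrap samples (randomly sampled subsets with replacement) of the training data. The final prediction is typically the average of the individual model predictions (for regression) or the majority vote (for classification). Random Forest is a popular example of a bagging technique that uses decision trees as base estimators and also incorporates feature randomness.
* Boosting: This method trains models sequentially, where each subsequent model is trained to correct the errors made by the previous models. It focuses on the data points that were misclassified or poorly predicted by the earlier models. Examples include AdaBoost, Gradient Boosting (like Gradient Boosting Machines - GBM), and XGBoost.
* Stacking: This method involves training a meta-model (or blender) that learns to combine the predictions of several base models. The base models are trained on the original training data, and their predictions are then used as input features for the meta-model. The meta-model is trained to make the final prediction based on the predictions of the base models.

14. What is ensemble learning in machine learning?

=> Ensemble learning is an approach in which several individual models are trained and their predictions are combined into a single final prediction. The combined model is usually more accurate and more robust than any one of its members.

15. When should we avoid using ensemble methods?

=> Ensemble methods are a poor fit when their costs outweigh the gain in accuracy: when the model must be easy to interpret, when training time, memory, or prediction latency is tightly constrained (for example in real-time applications), or when a single simple model already performs well enough on the problem.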

16. How does Bagging help in reducing overfitting?

=> Bagging reduces overfitting by reducing variance. Each model is trained on a different bootstrap sample (randomly sampled with replacement) of the training data, so each one overfits to somewhat different noise. Averaging the individual predictions (for regression) or taking a majority vote (for classification) cancels out much of that model-specific noise while keeping the signal the models agree on. Random Forest is a popular example: it uses decision trees as base estimators and adds feature randomness to decorrelate them further.

17. Why is Random Forest better than a single Decision Tree

=> Random Forest is generally better than a single Decision Tree for several key reasons, primarily related to reducing overfitting and improving robustness:

1. Reduced Variance: Single decision trees can be prone to overfitting, especially when they are deep. They can learn the training data too well, including the noise, and perform poorly on unseen data. Random Forest mitigates this by training multiple trees on different bootstrap samples of the data and averaging or taking a majority vote of their predictions. This averaging process reduces the variance of the model.
2. Reduced Overfitting: The combination of bootstrap sampling (sampling data with replacement) and feature randomness (considering only a random subset of features at each split) in Random Forest helps to decorrelate the individual trees. This prevents the trees from all making the same errors and reduces their tendency to overfit the training data.
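The benefit of decorrelating the trees can be made concrete with a standard result: if each tree's prediction has variance `sigma2` and any two trees have correlation `rho`, the variance of their average is `rho * sigma2 + (1 - rho) * sigma2 / n_trees`. The sketch below simply evaluates that formula; the function name and the example values are illustrative, and the equal-variance, equal-correlation setup is an idealization.

```python
def ensemble_variance(sigma2, rho, n_trees):
    """Variance of the average of n_trees predictions, each with variance
    sigma2 and pairwise correlation rho."""
    # The first term does not shrink with more trees; only the second one does
    return rho * sigma2 + (1 - rho) * sigma2 / n_trees

for rho in (0.0, 0.3, 0.9):
    variance = ensemble_variance(1.0, rho, 100)
    print(f"rho={rho}: variance of a 100-tree average = {variance:.3f}")
```

Adding trees shrinks only the second term, so lowering the correlation between trees (which is what feature randomness aims for) is what lets the forest keep improving over plain averaging of near-identical trees.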

18. What is the role of bootstrap sampling in Bagging

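=> Bagging is short for Bootstrap Aggregating, and bootstrap sampling is what makes its models differ from one another. Each model is trained on a bootstrap sample: a set of the same size as the training data, drawn from it at random with replacement, so some points appear several times and others (about 37% on average) not at all. Because every model sees a slightly different dataset, their errors are only partly correlated, and aggregating their predictions gives a more accurate and stable output. The left-out points also serve as the out-of-bag samples used for the OOB score. This method is particularly effective for high-variance models, such as decision trees, which are prone to overfitting.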
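As a small worked check of how much of the training data a bootstrap sample typically contains: a given point is missed by all `n` draws with probability `(1 - 1/n)^n`, which approaches `1/e` (about 0.368) as `n` grows. The helper name below is hypothetical.

```python
def expected_unique_fraction(n_samples):
    """Expected share of distinct training points in one bootstrap sample."""
    # A given point is missed by all n draws with probability (1 - 1/n)^n
    return 1 - (1 - 1 / n_samples) ** n_samples

for n in (10, 100, 1000):
    print(n, round(expected_unique_fraction(n), 3))
```

So each base model sees roughly 63% of the distinct training points, and the remaining points are out-of-bag for that model.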

19. What are some real-world applications of ensemble techniques

=> Ensemble techniques are widely used in various real-world applications due to their ability to improve model performance and robustness. Here are some examples:

* Healthcare: Ensemble methods are used for disease diagnosis, medical image analysis, and predicting patient outcomes. For instance, they can combine predictions from different models to improve the accuracy of identifying cancerous tumors or predicting the risk of heart disease.
* Finance: Ensemble techniques are employed in credit scoring, fraud detection, stock market prediction, and algorithmic trading. By combining multiple models, financial institutions can make more accurate predictions and better manage risk.

20. What is the difference between Bagging and Boosting?

=> Bagging (Bootstrap Aggregating):

* Parallel Training: Models are trained independently of each other on different bootstrap samples of the training data.
* Reduces Variance: Bagging primarily aims to reduce the variance of the model by averaging or voting the predictions of multiple models.
* Diverse Models: Since each model is trained on a different subset of the data, the individual models tend to be diverse.
* Simple Aggregation: Predictions are combined through simple averaging (regression) or majority voting (classification).

Boosting:

* Sequential Training: Models are trained sequentially, with each new model focusing on correcting the errors made by the previous models.
* Reduces Bias: Boosting primarily aims to reduce the bias of the model by giving more weight to data points that were misclassified or poorly predicted by earlier models.
* Dependent Models: Each subsequent model is dependent on the previous ones, as it tries to improve upon their performance.
* Weighted Aggregation: Predictions are combined with weights, where models that perform better are given more weight.
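The reweighting idea behind Boosting can be illustrated with one round of the discrete AdaBoost update. This is a sketch under assumptions: the sample weights sum to 1, the weighted error is strictly between 0 and 1, and the five-point example with one mistake is made up. Other boosting methods, such as Gradient Boosting, correct errors differently (by fitting residuals or gradients).

```python
import math

def adaboost_reweight(weights, is_correct):
    """One round of the discrete AdaBoost sample-weight update."""
    # Weighted error of the current weak model
    error = sum(w for w, ok in zip(weights, is_correct) if not ok)
    # The model's say in the final weighted vote; larger when the error is smaller
    alpha = 0.5 * math.log((1 - error) / error)
    # Shrink weights of correctly classified points, grow the misclassified ones
    updated = [w * math.exp(-alpha if ok else alpha) for w, ok in zip(weights, is_correct)]
    total = sum(updated)
    return [w / total for w in updated], alpha

weights = [0.2] * 5
is_correct = [True, True, True, True, False]
new_weights, alpha = adaboost_reweight(weights, is_correct)
print("New weights:", [round(w, 3) for w in new_weights])
print("Model weight (alpha):", round(alpha, 3))
```

After the update the single misclassified point carries half of the total weight, so the next model is pushed to get it right. Bagging has no such step: every bootstrap sample is drawn with equal weights.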

## Practical Questions

21. Train a Bagging Classifier using Decision Trees on a sample dataset and print model accuracy

In [None]:
#code for the above Ques.
!pip install -U scikit-learn==1.2.2
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

decision_tree = DecisionTreeClassifier(random_state=42)

# `estimator` replaces `base_estimator`, which was deprecated in scikit-learn 1.2 and later removed
bagging_classifier = BaggingClassifier(estimator=decision_tree, n_estimators=10, random_state=42)

bagging_classifier.fit(X_train, y_train)

y_pred = bagging_classifier.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy:.2f}")

22. Train a Bagging Regressor using Decision Trees and evaluate using Mean Squared Error (MSE)

In [None]:
#code for the above Ques.
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=1000, n_features=20, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

decision_tree = DecisionTreeRegressor(random_state=42)

# `estimator` replaces `base_estimator`, which was deprecated in scikit-learn 1.2 and later removed
bagging_regressor = BaggingRegressor(estimator=decision_tree, n_estimators=10, random_state=42)

bagging_regressor.fit(X_train, y_train)

y_pred = bagging_regressor.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse:.2f}")

23. Train a Random Forest Classifier on the Breast Cancer dataset and print feature importance scores

In [None]:
#code for the above Ques.
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

data = load_breast_cancer()

X = data.data
y = data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

random_forest_classifier = RandomForestClassifier(n_estimators=100, random_state=42)

random_forest_classifier.fit(X_train, y_train)

feature_importances = random_forest_classifier.feature_importances_

for feature_name, importance in zip(data.feature_names, feature_importances):
    print(f"{feature_name}: {importance:.4f}")


24. Train a Random Forest Regressor and compare its performance with a single Decision Tree

In [None]:
#code for the above Ques.
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=1000, n_features=20, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

random_forest_regressor = RandomForestRegressor(n_estimators=100, random_state=42)

random_forest_regressor.fit(X_train, y_train)

y_pred_rf = random_forest_regressor.predict(X_test)

mse_rf = mean_squared_error(y_test, y_pred_rf)

decision_tree_regressor = DecisionTreeRegressor(random_state=42)

decision_tree_regressor.fit(X_train, y_train)

y_pred_dt = decision_tree_regressor.predict(X_test)

mse_dt = mean_squared_error(y_test, y_pred_dt)

print(f"Random Forest MSE: {mse_rf:.2f}")
print(f"Decision Tree MSE: {mse_dt:.2f}")


25. Compute the Out-of-Bag (OOB) Score for a Random Forest Classifier

In [None]:
#code for the above Ques.
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# oob_score=True makes the forest score each training point with the trees that did not see it
random_forest_classifier = RandomForestClassifier(n_estimators=100, random_state=42, oob_score=True)

random_forest_classifier.fit(X_train, y_train)

oob_score = random_forest_classifier.oob_score_
print(f"Out-of-Bag (OOB) Score: {oob_score:.4f}")

# Test-set accuracy for comparison with the OOB estimate
y_pred = random_forest_classifier.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy:.4f}")


26. Train a Bagging Classifier using SVM as a base estimator and print accuracy


In [None]:
#code for the above Ques.
from sklearn.ensemble import BaggingClassifier
from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Instantiate the SVM; passing the class itself (SVC without parentheses) would fail
svm_classifier = SVC()

bagging_classifier = BaggingClassifier(estimator=svm_classifier, n_estimators=10, random_state=42)

bagging_classifier.fit(X_train, y_train)

y_pred = bagging_classifier.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy:.2f}")

27. Train a Random Forest Classifier with different numbers of trees and compare accuracy

In [None]:
#code for the above Ques.
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

num_trees = [10, 50, 100, 200]

for n_trees in num_trees:
    random_forest_classifier = RandomForestClassifier(n_estimators=n_trees, random_state=42)
    random_forest_classifier.fit(X_train, y_train)
    y_pred = random_forest_classifier.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    print(f"Number of Trees: {n_trees}, Accuracy: {accuracy:.2f}")

28. Train a Bagging Classifier using Logistic Regression as a base estimator and print AUC score

In [None]:
#code for the above Ques.
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Logistic Regression as the base estimator (max_iter raised to help convergence)
logistic_regression = LogisticRegression(max_iter=1000, random_state=42)

bagging_classifier = BaggingClassifier(estimator=logistic_regression, n_estimators=10, random_state=42)

bagging_classifier.fit(X_train, y_train)

# AUC is computed from positive-class probabilities, not hard labels
y_prob = bagging_classifier.predict_proba(X_test)[:, 1]

auc = roc_auc_score(y_test, y_prob)
print(f"AUC Score: {auc:.2f}")

29. Train a Random Forest Regressor and analyze feature importance scores

In [None]:
#code for the above Ques.
from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=20, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

random_forest_regressor = RandomForestRegressor(n_estimators=100, random_state=42)

random_forest_regressor.fit(X_train, y_train)

# Impurity-based importances; they sum to 1 across all features
feature_importances = random_forest_regressor.feature_importances_

# Rank features from most to least important
ranked = sorted(enumerate(feature_importances), key=lambda pair: pair[1], reverse=True)

for index, importance in ranked:
    print(f"Feature {index}: {importance:.4f}")

30. Train an ensemble model using both Bagging and Random Forest and compare accuracy.

In [None]:
#code for the above Ques.
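from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Same number of trees in both ensembles so the comparison is fair
bagging_classifier = BaggingClassifier(estimator=DecisionTreeClassifier(random_state=42), n_estimators=100, random_state=42)

random_forest_classifier = RandomForestClassifier(n_estimators=100, random_state=42)

bagging_classifier.fit(X_train, y_train)

random_forest_classifier.fit(X_train, y_train)

accuracy_bagging = accuracy_score(y_test, bagging_classifier.predict(X_test))
accuracy_rf = accuracy_score(y_test, random_forest_classifier.predict(X_test))

print(f"Bagging Classifier Accuracy: {accuracy_bagging:.2f}")
print(f"Random Forest Accuracy: {accuracy_rf:.2f}")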

31. Train a Random Forest Classifier and tune hyperparameters using GridSearchCV

In [None]:
#code for the above Ques.
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# A small illustrative grid; widen it for a real search
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 5, 10],
    "min_samples_split": [2, 5],
}

grid_search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=3, scoring="accuracy")

grid_search.fit(X_train, y_train)

print(f"Best Parameters: {grid_search.best_params_}")
print(f"Best Cross-Validation Accuracy: {grid_search.best_score_:.2f}")

# The best model is refit on the full training set and evaluated on the test set
y_pred = grid_search.best_estimator_.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Test Accuracy: {accuracy:.2f}")

32. Train a Bagging Regressor with different numbers of base estimators and compare performance

In [None]:
#code for the above Ques.
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=1000, n_features=20, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

num_estimators = [10, 50, 100, 200]

for n_estimators in num_estimators:
    bagging_regressor = BaggingRegressor(estimator=DecisionTreeRegressor(random_state=42), n_estimators=n_estimators, random_state=42)
    bagging_regressor.fit(X_train, y_train)
    y_pred = bagging_regressor.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    print(f"Number of Base Estimators: {n_estimators}, MSE: {mse:.2f}")

33. Train a Random Forest Classifier and analyze misclassified samples

In [None]:
#code for the above Ques.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

random_forest_classifier = RandomForestClassifier(n_estimators=100, random_state=42)

random_forest_classifier.fit(X_train, y_train)

y_pred = random_forest_classifier.predict(X_test)

# Positions in the test set where the prediction disagrees with the true label
misclassified_idx = np.where(y_pred != y_test)[0]

print(f"Misclassified samples: {len(misclassified_idx)} of {len(y_test)}")

# Show the first few misclassified samples
for idx in misclassified_idx[:5]:
    print(f"Test index {idx}: true label = {y_test[idx]}, predicted label = {y_pred[idx]}")

34. Train a Bagging Classifier and compare its performance with a single Decision Tree Classifier

In [None]:
#code for the above Ques.
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

decision_tree = DecisionTreeClassifier(random_state=42)

# `estimator` replaces `base_estimator`, which was deprecated in scikit-learn 1.2 and later removed
bagging_classifier = BaggingClassifier(estimator=decision_tree, n_estimators=10, random_state=42)

decision_tree.fit(X_train, y_train)

bagging_classifier.fit(X_train, y_train)

y_pred_tree = decision_tree.predict(X_test)
y_pred_bagging = bagging_classifier.predict(X_test)

accuracy_tree = accuracy_score(y_test, y_pred_tree)
accuracy_bagging = accuracy_score(y_test, y_pred_bagging)

print(f"Decision Tree Accuracy: {accuracy_tree:.2f}")
print(f"Bagging Classifier Accuracy: {accuracy_bagging:.2f}")


35. Train a Random Forest Classifier and visualize the confusion matrix

In [None]:
#code for the above Ques.
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
import matplotlib.pyplot as plt

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

random_forest_classifier = RandomForestClassifier(n_estimators=100, random_state=42)

random_forest_classifier.fit(X_train, y_train)

y_pred = random_forest_classifier.predict(X_test)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred)
print(cm)

ConfusionMatrixDisplay(confusion_matrix=cm).plot(cmap="Blues")
plt.title("Random Forest Confusion Matrix")
plt.show()

36. Train a Stacking Classifier using Decision Trees, SVM, and Logistic Regression, and compare accuracy

In [None]:
#code for the above Ques.
from sklearn.ensemble import StackingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

estimators = [
    ('dt', DecisionTreeClassifier(random_state=42)),
    ('svm', SVC(random_state=42)),
    ('lr', LogisticRegression(max_iter=1000, random_state=42))
]

# The meta-model learns how to combine the base models' predictions
stacking_classifier = StackingClassifier(estimators=estimators, final_estimator=LogisticRegression())

# Accuracy of each base model on its own, for comparison
for name, model in estimators:
    model.fit(X_train, y_train)
    accuracy_base = accuracy_score(y_test, model.predict(X_test))
    print(f"{name} Accuracy: {accuracy_base:.2f}")

stacking_classifier.fit(X_train, y_train)

accuracy_stacking = accuracy_score(y_test, stacking_classifier.predict(X_test))
print(f"Stacking Classifier Accuracy: {accuracy_stacking:.2f}")

37. Train a Random Forest Classifier and print the top 5 most important features

In [None]:
#code for the above Ques.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

random_forest_classifier = RandomForestClassifier(n_estimators=100, random_state=42)

random_forest_classifier.fit(X_train, y_train)

feature_importances = random_forest_classifier.feature_importances_

# Indices of the 5 largest importances, in descending order
top_indices = np.argsort(feature_importances)[::-1][:5]

print("Top 5 most important features:")
for rank, idx in enumerate(top_indices, start=1):
    print(f"{rank}. Feature {idx}: {feature_importances[idx]:.4f}")

38. Train a Bagging Classifier and evaluate performance using Precision, Recall, and F1-score

In [None]:
#code for the above Ques.
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score, f1_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

decision_tree = DecisionTreeClassifier(random_state=42)

bagging_classifier = BaggingClassifier(estimator=decision_tree, n_estimators=10, random_state=42)

bagging_classifier.fit(X_train, y_train)

y_pred = bagging_classifier.predict(X_test)

# Binary problem, so the default positive class (label 1) is used for each metric
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print(f"F1-Score: {f1:.2f}")

39. Train a Random Forest Classifier and analyze the effect of max_depth on accuracy

In [None]:
#code for the above Ques.
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# None lets each tree grow until its leaves are pure
max_depths = [2, 5, 10, None]

for max_depth in max_depths:
    random_forest_classifier = RandomForestClassifier(n_estimators=100, max_depth=max_depth, random_state=42)
    random_forest_classifier.fit(X_train, y_train)
    # Compare train and test accuracy to see where deeper trees start to overfit
    train_accuracy = accuracy_score(y_train, random_forest_classifier.predict(X_train))
    test_accuracy = accuracy_score(y_test, random_forest_classifier.predict(X_test))
    print(f"max_depth: {max_depth}, Train Accuracy: {train_accuracy:.2f}, Test Accuracy: {test_accuracy:.2f}")

40. Train a Bagging Regressor using different base estimators (DecisionTree and KNeighbors) and compare performance

In [None]:
#code for the above Ques.
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=1000, n_features=10, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

decision_tree_regressor = DecisionTreeRegressor(random_state=42)

knn_regressor = KNeighborsRegressor()

# `estimator` replaces `base_estimator`, which was deprecated in scikit-learn 1.2 and later removed
bagging_tree = BaggingRegressor(estimator=decision_tree_regressor, n_estimators=10, random_state=42)
bagging_knn = BaggingRegressor(estimator=knn_regressor, n_estimators=10, random_state=42)

bagging_tree.fit(X_train, y_train)

bagging_knn.fit(X_train, y_train)

y_pred_tree = bagging_tree.predict(X_test)
y_pred_knn = bagging_knn.predict(X_test)

mse_tree = mean_squared_error(y_test, y_pred_tree)
mse_knn = mean_squared_error(y_test, y_pred_knn)

print(f"Bagging Regressor with Decision Tree MSE: {mse_tree:.2f}")
print(f"Bagging Regressor with KNeighbors MSE: {mse_knn:.2f}")


41. Train a Random Forest Classifier and evaluate its performance using ROC-AUC Score

In [None]:
#code for the above Ques.
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
import numpy as np

X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

random_forest = RandomForestClassifier(n_estimators=100, random_state=42)

random_forest.fit(X_train, y_train)

y_prob = random_forest.predict_proba(X_test)[:, 1]

roc_auc = roc_auc_score(y_test, y_prob)

print(f"Random Forest Classifier ROC-AUC Score: {roc_auc:.2f}")

42. Train a Bagging Classifier and evaluate its performance using cross-validation

In [None]:
#code for the above Ques.
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

decision_tree = DecisionTreeClassifier(random_state=42)

bagging_classifier = BaggingClassifier(estimator=decision_tree, n_estimators=10, random_state=42)

# 5-fold cross-validation: the model is refit on each training fold and scored on the held-out fold
scores = cross_val_score(bagging_classifier, X, y, cv=5, scoring="accuracy")

print(f"Cross-Validation Accuracy Scores: {scores}")
print(f"Mean Accuracy: {scores.mean():.2f} (+/- {scores.std():.2f})")

43. Train a Random Forest Classifier and plot the Precision-Recall curve

In [None]:
#code for the above Ques.
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_recall_curve
import matplotlib.pyplot as plt

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

random_forest = RandomForestClassifier(n_estimators=100, random_state=42)

random_forest.fit(X_train, y_train)

y_prob = random_forest.predict_proba(X_test)[:, 1]

precision, recall, thresholds = precision_recall_curve(y_test, y_prob)

plt.plot(recall, precision, marker='.')
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Precision-Recall Curve')
plt.show()

44. Train a Stacking Classifier with Random Forest and Logistic Regression and compare accuracy

In [None]:
#code for the above Ques.
from sklearn.ensemble import StackingClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

estimators = [
    ('rf', RandomForestClassifier(n_estimators=100, random_state=42)),
    ('lr', LogisticRegression(random_state=42))
]

stacking_classifier = StackingClassifier(estimators=estimators, final_estimator=LogisticRegression())

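stacking_classifier.fit(X_train, y_train)

y_pred_stacking = stacking_classifier.predict(X_test)

accuracy_stacking = accuracy_score(y_test, y_pred_stacking)

# Fit each base model on its own so the stacked accuracy has something to be compared against
for name, model in estimators:
    model.fit(X_train, y_train)
    accuracy_base = accuracy_score(y_test, model.predict(X_test))
    print(f"{name} Accuracy: {accuracy_base:.2f}")

print(f"Stacking Classifier Accuracy: {accuracy_stacking:.2f}")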

45. Train a Bagging Regressor with different levels of bootstrap samples and compare performance.

In [None]:
#code for the above Ques.
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=1000, n_features=10, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# max_samples sets the size of each bootstrap sample as a fraction of the training set
bootstrap_fractions = [0.3, 0.5, 0.7, 1.0]

for fraction in bootstrap_fractions:
    bagging_regressor = BaggingRegressor(
        estimator=DecisionTreeRegressor(random_state=42),
        n_estimators=10,
        max_samples=fraction,
        random_state=42,
    )
    bagging_regressor.fit(X_train, y_train)
    y_pred = bagging_regressor.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    print(f"max_samples: {fraction}, MSE: {mse:.2f}")