In [None]:
#1)
Fundamental idea: combine multiple base learners into a single stronger model. Aggregating diverse hypotheses
reduces variance, bias, or both, and so improves predictions.
Bagging (Bootstrap Aggregating): trains base learners independently on bootstrap samples and aggregates them (e.g.,
voting/averaging). Objective: reduce variance and limit overfitting. Works well with high-variance models (e.g.,
decision trees).
Boosting: trains learners sequentially, where each new learner focuses on correcting the errors of the previous ones,
either by reweighting samples (AdaBoost) or by fitting the gradient of the loss (Gradient Boosting). Objective: reduce
bias and build a strong learner from weak learners.
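A minimal sketch of the contrast, assuming scikit-learn is available. The synthetic dataset, the split, and the
estimator settings are illustrative assumptions, not part of the original answer:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (assumption: any tabular classification set would do).
X, y = make_classification(n_samples=1000, n_features=20, n_informative=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)

# A single deep tree: the high-variance baseline.
single_tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

# Bagging: independent deep trees on bootstrap samples, combined by voting.
bagging = BaggingClassifier(
    DecisionTreeClassifier(random_state=0), n_estimators=100, random_state=0
).fit(X_tr, y_tr)

# Boosting: sequential decision stumps, each focusing on earlier mistakes.
boosting = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1), n_estimators=100, random_state=0
).fit(X_tr, y_tr)

scores = {
    "single tree": single_tree.score(X_te, y_te),
    "bagging": bagging.score(X_te, y_te),
    "boosting": boosting.score(X_te, y_te),
}
print(scores)
```

The base estimator is passed positionally because its keyword name differs between scikit-learn versions.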

#2)
Random Forest (RF) builds many decorrelated trees, each on a bootstrap sample and considering only a random subset
of features at each split, then aggregates their predictions (majority vote or average) to reduce variance and avoid
single-tree overfitting.
Key hyperparameters:
1. n_estimators (number of trees): more trees reduce variance, with diminishing returns, but increase training and
prediction cost.
2. max_features (features considered per split): smaller values increase decorrelation between trees, reducing
overfitting; a common choice is sqrt(n_features) for classification.
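To see how max_features behaves, here is a small sketch that trains the same forest with three settings. The
dataset and the candidate values are assumptions chosen for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data (assumption) with a few informative features among many.
X, y = make_classification(n_samples=600, n_features=20, n_informative=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)

results = {}
# "sqrt" -> sqrt(n_features) per split, 0.5 -> half the features, None -> all features.
for max_features in ("sqrt", 0.5, None):
    rf = RandomForestClassifier(n_estimators=200, max_features=max_features, random_state=0)
    rf.fit(X_tr, y_tr)
    results[max_features] = rf.score(X_te, y_te)

print(results)
```

With max_features=None every split sees all features, so the trees differ only through their bootstrap samples.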

#3)
Stacking (stacked generalization): trains several base-level models (level-0), then trains a meta-model (level-1) on
the base models' out-of-fold predictions to learn how to combine them.
Difference from bagging/boosting:
Bagging averages independent models; boosting sequentially corrects mistakes.
Stacking learns an optimal combination of heterogeneous models using a meta-learner.
Example use case: blend logistic regression, random forest, and XGBoost as level-0 models and train a simple
meta-model (e.g., logistic regression) to combine their outputs for better classification.
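The use case above could be sketched with scikit-learn's StackingClassifier. As an assumption, to avoid an extra
dependency, GradientBoostingClassifier stands in for XGBoost, and the data is synthetic:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=800, n_features=20, n_informative=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)

# Level-0 models: deliberately heterogeneous.
base_models = [
    ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("gb", GradientBoostingClassifier(n_estimators=50, random_state=0)),  # stand-in for XGBoost
]

# Level-1 meta-model, trained on out-of-fold predictions from 5-fold CV.
stack = StackingClassifier(estimators=base_models, final_estimator=LogisticRegression(), cv=5)
stack.fit(X_tr, y_tr)

stack_acc = stack.score(X_te, y_te)
print("stacked accuracy:", stack_acc)
print("meta-model coefficients:", stack.final_estimator_.coef_)
```

The meta-model coefficients give a rough view of how much weight each base model's prediction receives.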

#4)
OOB (Out-Of-Bag) score: for each tree, the samples not included in that tree's bootstrap sample (about 36.8% of the
data, roughly 1/e) serve as a validation set. Each sample is predicted only by the trees that did not see it, and
aggregating those predictions yields the OOB estimate.
Usefulness: provides an approximately unbiased estimate of generalization performance without needing a separate
validation set or cross-validation, which is convenient for quick model evaluation and hyperparameter tuning.
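A short sketch of both points: the out-of-bag fraction of one bootstrap draw, and the OOB score scikit-learn
computes when oob_score=True. The sample sizes and the synthetic dataset are assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# 1) Fraction of samples left out of one bootstrap sample; (1 - 1/n)^n tends to 1/e.
rng = np.random.default_rng(0)
n = 10_000
bootstrap_idx = rng.integers(0, n, size=n)  # draw n indices with replacement
oob_fraction = 1 - np.unique(bootstrap_idx).size / n
print("out-of-bag fraction:", oob_fraction, "vs 1/e =", np.exp(-1))

# 2) OOB accuracy of a random forest, with no separate validation set.
X, y = make_classification(n_samples=1000, n_features=20, n_informative=8, random_state=0)
rf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0).fit(X, y)
print("OOB accuracy:", rf.oob_score_)
```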

#5)
AdaBoost: reweights training samples so that misclassified samples receive higher weight and later weak learners
focus on hard examples.
Gradient Boosting: fits each new learner to the negative gradient of the loss function (the pseudo-residuals, which
equal the ordinary residuals for squared error); this is the functional gradient descent viewpoint.
Weight adjustment:
AdaBoost: explicit sample-weight updates based on previous learners' errors.
Gradient Boosting: no explicit sample weights; uses residuals/gradients to guide the next learner.
Typical use cases:
AdaBoost: simpler boosting for binary classification problems; works with decision stumps.
Gradient Boosting (e.g., XGBoost, LightGBM, CatBoost): powerful for tabular data, regression & classification;
often wins ML competitions.
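The two update rules can be sketched from scratch. This is a simplified illustration under stated assumptions
(one round of discrete AdaBoost with labels in {-1, +1}, squared-error gradient boosting, synthetic data), not the
implementation used by any particular library:

```python
import numpy as np
from sklearn.datasets import make_classification, make_regression
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# --- AdaBoost: one round of the explicit sample-weight update ---
Xc, yc = make_classification(n_samples=300, n_features=5, random_state=0)
yc = 2 * yc - 1                              # relabel {0, 1} as {-1, +1}
w = np.full(len(yc), 1 / len(yc))            # start from uniform weights
stump = DecisionTreeClassifier(max_depth=1, random_state=0).fit(Xc, yc, sample_weight=w)
pred = stump.predict(Xc)
miss = pred != yc
err = w[miss].sum()                          # weighted error of the stump
alpha = 0.5 * np.log((1 - err) / err)        # the stump's vote in the ensemble
w_new = w * np.exp(-alpha * yc * pred)       # raise weights of mistakes, lower the rest
w_new /= w_new.sum()
print("weight mass on misclassified samples:", err, "->", w_new[miss].sum())

# --- Gradient boosting: fit each tree to the residuals (squared error) ---
Xr, yr = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
learning_rate = 0.1
F = np.full(len(yr), yr.mean())              # initial constant prediction
mse_history = [np.mean((yr - F) ** 2)]
for _ in range(20):
    residuals = yr - F                       # negative gradient of 0.5 * (y - F)^2
    tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(Xr, residuals)
    F += learning_rate * tree.predict(Xr)    # small step along the fitted gradient
    mse_history.append(np.mean((yr - F) ** 2))
print("train MSE:", mse_history[0], "->", mse_history[-1])
```

After the AdaBoost update the misclassified samples carry half of the total weight, which is what forces the next
learner to attend to them.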

#6)
CatBoost encodes categorical features internally with ordered target statistics (target encoding computed over random
permutations, so each row's encoding uses only the targets of rows that precede it) and with combinations of
categorical features. This avoids target leakage and the overfitting it causes.
Categorical variables therefore need no extensive one-hot encoding or preprocessing, which saves memory, while
CatBoost's symmetric (oblivious) trees add regularization and fast prediction.
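The idea of ordered target statistics can be illustrated with a small pandas sketch. This is a simplified
illustration, not CatBoost's actual implementation; the function name, the prior value, and the smoothing weight
are hypothetical:

```python
import numpy as np
import pandas as pd

def ordered_target_stat(categories, targets, prior=0.5, prior_weight=1.0, seed=0):
    """Encode each row from the targets of EARLIER rows (in a random order) with the same category."""
    df = pd.DataFrame({"cat": categories, "y": targets})
    order = np.random.default_rng(seed).permutation(len(df))
    shuffled = df.iloc[order]                       # random "time" order of the rows
    grp = shuffled.groupby("cat")["y"]
    prev_sum = grp.cumsum() - shuffled["y"]         # sum of targets seen before this row
    prev_cnt = grp.cumcount()                       # number of same-category rows seen before
    enc = (prev_sum + prior_weight * prior) / (prev_cnt + prior_weight)
    return enc.reindex(df.index)                    # back to the original row order

cats = ["a", "b", "a", "a", "b", "c", "a"]
y = [1, 0, 1, 0, 1, 1, 0]
enc = ordered_target_stat(cats, y)
print(pd.DataFrame({"cat": cats, "y": y, "encoded": enc}))
```

Because a row's own target never enters its encoding, changing that target leaves the row's encoded value
unchanged, which is the leakage-avoidance property described above.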

#7)
1. Load dataset: sklearn.datasets.load_wine()
2. Split: train_test_split(test_size=0.30, random_state=42, stratify=y)
3. Train KNN (k=5) without scaling; evaluate accuracy + classification_report.
4. Apply StandardScaler on features; retrain KNN; compare metrics.
5. GridSearchCV over n_neighbors=1..20 and the distance metric: Euclidean (metric='minkowski', p=2) and Manhattan
(metric='manhattan', or equivalently metric='minkowski', p=1).
6. Train optimized KNN and compare.
Expected concise findings (typical outcomes):
Unscaled KNN often performs worse because KNN is distance-based and sensitive to feature scales.
After StandardScaler, accuracy and F1 usually improve.
Grid search typically picks a small-to-moderate k (e.g., 3–9); the preferred distance varies with the split (the run
below selects k=4 with p=1, i.e. Manhattan). The optimized model usually beats the unscaled baseline.
Concise example code snippet (summarized; see the next cell):


In [1]:
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, accuracy_score

# Load the data and make a stratified 70/30 split.
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)

# Baseline: KNN (k=5) on unscaled features.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
print(classification_report(y_test, knn.predict(X_test)))

# Standardize with statistics from the training split only, then retrain.
scaler = StandardScaler().fit(X_train)
Xtr_s = scaler.transform(X_train)
Xte_s = scaler.transform(X_test)
knn_s = KNeighborsClassifier(n_neighbors=5).fit(Xtr_s, y_train)
print(classification_report(y_test, knn_s.predict(Xte_s)))

# Grid search over k and distance. Note: 'p' has no effect when metric='manhattan', and
# metric='minkowski' with p=1 is the same Manhattan distance, so this grid contains duplicates.
params = {'n_neighbors': list(range(1, 21)), 'metric': ['minkowski', 'manhattan'], 'p': [2, 1]}
gs = GridSearchCV(KNeighborsClassifier(), params, cv=5).fit(Xtr_s, y_train)
best = gs.best_estimator_
print(gs.best_params_, accuracy_score(y_test, best.predict(Xte_s)))

              precision    recall  f1-score   support

           0       0.89      0.89      0.89        18
           1       0.78      0.67      0.72        21
           2       0.50      0.60      0.55        15

    accuracy                           0.72        54
   macro avg       0.72      0.72      0.72        54
weighted avg       0.74      0.72      0.73        54

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        18
           1       1.00      0.86      0.92        21
           2       0.83      1.00      0.91        15

    accuracy                           0.94        54
   macro avg       0.94      0.95      0.94        54
weighted avg       0.95      0.94      0.94        54

{'metric': 'minkowski', 'n_neighbors': 4, 'p': 1} 0.9814814814814815


In [None]:
#8)
1. Load breast cancer dataset.
2. Standardize features.
3. Apply PCA, compute explained_variance_ratio_ and plot scree plot.
4. Choose number of components to retain 95% variance (use cumulative sum).
5. Transform data to reduced dimensions.
6. Train KNN on original standardized data and on PCA-transformed data; compare accuracy.
7. Plot first two principal components scatter colored by class.
Expected concise findings:
Scree plot shows first few components explain most variance (often 2–5 components dominant).
Keeping 95% variance typically reduces dimensionality significantly.
KNN on PCA data usually has similar or slightly lower accuracy, but it runs faster and mitigates the curse of
dimensionality.
Scatter plot of PC1 vs PC2 often shows class separation for breast cancer dataset.
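Concise example code snippet (summarized; covers steps 1-5):
```
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
# Load, scale, and keep the components needed for 95% cumulative variance.
X, y = load_breast_cancer(return_X_y=True)
X_s = StandardScaler().fit_transform(X)
pca = PCA(n_components=0.95).fit(X_s)
print(X_s.shape[1], "->", pca.n_components_, "components;", pca.explained_variance_ratio_.cumsum()[-1])
```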
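The KNN comparison from step 6 could be sketched as below. The 70/30 stratified split and k=5 are assumptions
carried over from question 7, and plotting is left out:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)

# Pipelines fit the scaler (and PCA) on the training split only.
knn_full = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)).fit(X_tr, y_tr)
knn_pca = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95),  # a float keeps enough components for 95% of the variance
    KNeighborsClassifier(n_neighbors=5),
).fit(X_tr, y_tr)

n_kept = knn_pca.named_steps["pca"].n_components_
acc_full = knn_full.score(X_te, y_te)
acc_pca = knn_pca.score(X_te, y_te)
print(f"all {X.shape[1]} features: {acc_full:.3f} | {n_kept} principal components: {acc_pca:.3f}")
```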

#9)
1. make_regression(n_samples=500, n_features=10, noise=0.1, random_state=42)
2. Train KNeighborsRegressor with k=5 for Euclidean (p=2) and Manhattan (p=1); compare MSE.
3. Evaluate for k in [1,5,10,20,50] and plot k vs MSE to observe bias-variance tradeoff.
Expected concise findings:
Manhattan vs Euclidean: differences depend on data distribution; Euclidean often slightly better for Gaussian-like
features.
Small k (e.g., 1) yields low bias and high variance (zero train MSE at k=1, higher test MSE). Large k increases bias
and lowers variance.
The plot typically shows a U-shaped test MSE vs k, with the optimum at a moderate k.
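Concise example code snippet (plot omitted):
```
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error
X, y = make_regression(n_samples=500, n_features=10, noise=0.1, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)  # split ratio is an assumption
for k, p in [(k, p) for k in (1, 5, 10, 20, 50) for p in (2, 1)]:  # p=2 Euclidean, p=1 Manhattan
    print(k, p, mean_squared_error(y_te, KNeighborsRegressor(n_neighbors=k, p=p).fit(X_tr, y_tr).predict(X_te)))
```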

#10)
1. Load diabetes CSV (provided URL).
2. Replace zeros in certain columns with NaN if dataset uses 0 to denote missing (e.g., glucose, BMI), then use
KNNImputer to impute.
3. Compare KNeighborsClassifier with algorithm='brute', 'kd_tree', 'ball_tree' (note: kd_tree/ball_tree require
appropriate metric and numeric data).
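4. Measure fit and prediction time (e.g., with time.perf_counter(); for KNN, fitting only builds the index, so most
of the cost is at prediction) and accuracy on a test split.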
5. For decision boundary, select two most important features (e.g., via univariate feature importance or model
coefficients), train best method and plot 2D decision boundary.
Expected concise findings:
- KD-Tree and Ball Tree are faster than brute for larger datasets with low-to-moderate dimensions; brute can be
competitive for small datasets.
- Imputation with KNNImputer helps restore missing values and can improve downstream accuracy.
- Decision boundary plot (2 features) visualizes regions of predicted class.
Concise example KNNImputer snippet:
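```
from sklearn.impute import KNNImputer
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/MasteriNeuron/datasets/refs/heads/main/diabetes.csv')
# Next: replace placeholder zeros with NaN in the affected columns, then impute with KNNImputer(n_neighbors=5).
```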
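Steps 2 to 4 could be sketched offline as follows. As an assumption, a synthetic 8-feature dataset with randomly
hidden values stands in for the diabetes CSV, so the numbers it prints say nothing about the real data:

```python
import time
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import KNNImputer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the diabetes table (assumption: 8 numeric features).
X, y = make_classification(n_samples=768, n_features=8, n_informative=5, random_state=0)

# Hide about 10% of the entries as NaN to mimic missing measurements.
rng = np.random.default_rng(0)
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.10] = np.nan

X_tr, X_te, y_tr, y_te = train_test_split(X_missing, y, test_size=0.3, random_state=42, stratify=y)

# Fit the imputer and scaler on the training split only to avoid leakage.
imputer = KNNImputer(n_neighbors=5).fit(X_tr)
X_tr_i, X_te_i = imputer.transform(X_tr), imputer.transform(X_te)
scaler = StandardScaler().fit(X_tr_i)
X_tr_s, X_te_s = scaler.transform(X_tr_i), scaler.transform(X_te_i)

# Time fit + prediction for each neighbor-search algorithm.
timings = {}
for algorithm in ("brute", "kd_tree", "ball_tree"):
    start = time.perf_counter()
    knn = KNeighborsClassifier(n_neighbors=5, algorithm=algorithm).fit(X_tr_s, y_tr)
    acc = knn.score(X_te_s, y_te)
    timings[algorithm] = (time.perf_counter() - start, acc)
    print(f"{algorithm:>9}: {timings[algorithm][0]:.4f}s, accuracy {acc:.3f}")
```

On a dataset this small the timing differences are mostly noise; the three algorithms search for the same
neighbors, so their accuracies should be essentially identical.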