#### Introduction
Gradient boosting is an ensemble learning method that builds models sequentially: each new model, usually a shallow decision tree, is fit to correct the errors of the ensemble built so far.

#### How Gradient Boosting Works
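1. **Initialize Model**: Start with a constant prediction, typically the mean of the target variable for squared-error regression.
2. **Calculate Residuals**: Compute the errors (residuals) of the current model. In general these are the negative gradients of the loss with respect to the current predictions, often called pseudo-residuals.
3. **Train Weak Learner**: Fit a new weak learner, usually a shallow decision tree, to the residuals.
4. **Update Model**: Add the new learner's predictions, scaled by the learning rate, to the ensemble, then repeat steps 2 to 4 for a fixed number of iterations.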
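
The four steps above can be sketched from scratch for the simplest case, regression with squared-error loss, where the residuals are exactly the negative gradient. This is a minimal illustration rather than how scikit-learn implements it; the function names `fit_gradient_boosting` and `predict_gradient_boosting`, the synthetic sine data, and the parameter values are all assumptions made for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_gradient_boosting(X, y, n_estimators=50, learning_rate=0.1, max_depth=2):
    """Fit a squared-error gradient boosting regressor (illustrative sketch)."""
    # Step 1: the initial prediction is the mean of the target
    init = float(np.mean(y))
    pred = np.full(len(y), init)
    trees = []
    for _ in range(n_estimators):
        # Step 2: residuals of the current ensemble
        # (the negative gradient of the squared-error loss)
        residuals = y - pred
        # Step 3: fit a shallow tree (the weak learner) to the residuals
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residuals)
        # Step 4: add the tree's contribution, shrunk by the learning rate
        pred = pred + learning_rate * tree.predict(X)
        trees.append(tree)
    return init, trees


def predict_gradient_boosting(X, init, trees, learning_rate=0.1):
    """Sum the initial prediction and every tree's shrunken contribution."""
    pred = np.full(X.shape[0], init)
    for tree in trees:
        pred = pred + learning_rate * tree.predict(X)
    return pred


# Hypothetical toy data: a noisy sine curve
rng = np.random.default_rng(0)
X_toy = rng.uniform(-3, 3, size=(200, 1))
y_toy = np.sin(X_toy[:, 0]) + rng.normal(scale=0.1, size=200)

init, trees = fit_gradient_boosting(X_toy, y_toy)
mse_init = np.mean((y_toy - init) ** 2)
mse_final = np.mean((y_toy - predict_gradient_boosting(X_toy, init, trees)) ** 2)
print(f"Training MSE with the mean only: {mse_init:.4f}")
print(f"Training MSE after boosting:     {mse_final:.4f}")
```

The training error after boosting should be well below the error of the constant model, since every round fits what the previous rounds left unexplained. The same `learning_rate` must be used at prediction time as at fit time.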

#### Advantages
- High predictive accuracy.
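- Resists overfitting when regularized, for example with a small learning rate, shallow trees, subsampling, or early stopping.
- Supports both regression and classification through the choice of loss function.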

#### Disadvantages
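- Computationally intensive: trees are built one after another, so training cannot be parallelized across boosting iterations.
- Requires careful tuning of parameters such as learning rate, number of estimators, and tree depth.
- Sensitive to noisy data: later trees can end up fitting outliers and label noise.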
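
One common way to limit both the training cost and the tendency to fit noise is early stopping combined with row subsampling. The sketch below uses scikit-learn's `n_iter_no_change`, `validation_fraction`, and `subsample` parameters; the synthetic dataset (with 10% of labels flipped via `flip_y`) and all parameter values are assumptions chosen for illustration, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Hypothetical noisy dataset: flip_y randomly reassigns about 10% of the labels
X_noisy, y_noisy = make_classification(
    n_samples=500, n_features=20, n_informative=5, flip_y=0.1, random_state=0
)

gb_early = GradientBoostingClassifier(
    n_estimators=500,         # upper bound on the number of trees
    learning_rate=0.1,
    max_depth=3,
    subsample=0.8,            # each tree sees a random 80% of the training rows
    validation_fraction=0.2,  # held out internally to monitor the loss
    n_iter_no_change=10,      # stop if the validation loss stalls for 10 rounds
    random_state=0,
)
gb_early.fit(X_noisy, y_noisy)

print(f"Trees requested:      {gb_early.n_estimators}")
print(f"Trees actually built: {gb_early.n_estimators_}")
```

If the validation loss stops improving, `n_estimators_` will be smaller than the requested 500, which saves training time and avoids the later rounds that are most likely to chase noise.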

#### Steps to Build a Gradient Boosting Model
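1. **Data Preparation**: Clean the data, handle missing values, and encode categorical features.
2. **Train-Test Split**: Split the dataset into a training set and a held-out test set that is not used for training or tuning.
3. **Model Training**: Fit the gradient boosting model on the training data.
4. **Model Evaluation**: Assess performance on the test set using metrics suited to the task, such as accuracy, precision, and recall for classification.
5. **Hyperparameter Tuning**: Optimize parameters like learning rate, number of estimators, and max depth, using cross-validation on the training data so the test set remains an unbiased check.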
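
As a sketch of the tuning step, the three parameters named above can be searched with scikit-learn's `GridSearchCV`, which cross-validates every combination on the training data only. The grid values and the choice of 3 folds are assumptions kept small so the example runs quickly; a real search would usually cover a wider range.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Illustrative grid: 2 x 2 x 2 = 8 candidate settings
param_grid = {
    "learning_rate": [0.05, 0.1],
    "n_estimators": [50, 100],
    "max_depth": [2, 3],
}

search = GridSearchCV(
    GradientBoostingClassifier(random_state=42),
    param_grid,
    cv=3,                # 3-fold cross-validation on the training set
    scoring="accuracy",
)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print(f"Best cross-validated accuracy: {search.best_score_:.3f}")
# The test set is touched only once, after the search has finished
print(f"Test accuracy of the refit model: {search.score(X_test, y_test):.3f}")
```

By default `GridSearchCV` refits the best setting on the full training set, so `search` can be used directly for prediction afterwards.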


In [1]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Load dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize Gradient Boosting model
gb = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)

# Train the model
gb.fit(X_train, y_train)

# Make predictions
y_pred = gb.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy * 100:.2f}%')
print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=data.target_names))
print("\nConfusion Matrix:")
print(confusion_matrix(y_test, y_pred))



Accuracy: 95.91%

Classification Report:
              precision    recall  f1-score   support

   malignant       0.95      0.94      0.94        63
      benign       0.96      0.97      0.97       108

    accuracy                           0.96       171
   macro avg       0.96      0.95      0.96       171
weighted avg       0.96      0.96      0.96       171


Confusion Matrix:
[[ 59   4]
 [  3 105]]


