<a href="https://colab.research.google.com/github/ABBAS-37405/PYTHON-AND-DATA-SCIENCE/blob/main/XGBoost_ML_Models.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **XGBoost Classifier**

XGBoost (eXtreme Gradient Boosting) is a popular open-source library that implements gradient-boosted decision trees and is designed to be efficient, flexible, and portable. Its key aspects are:

- **Gradient Boosting:** Models are added one after another. Each new model is fit to the errors of the ensemble built so far (the residuals for squared-error loss, the gradient of the loss in general), and the outputs of all models are summed to give the final prediction.
- **Decision Trees:** The base learners are typically decision trees, simple tree-shaped models that can be used for both classification and regression.
- **Regularization:** The training objective includes L1 and L2 penalties (`reg_alpha` and `reg_lambda` in the scikit-learn API), which limit overfitting and help the model generalize.
- **Parallel Processing:** Tree construction is parallelized across CPU threads, which keeps training fast on large datasets.
- **Flexibility:** It supports classification, regression, and ranking tasks.

In essence, XGBoost combines many weak prediction models (decision trees) into a stronger, more accurate model, while adding techniques that keep training fast and stop the model from becoming too complex.
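
The residual-fitting idea described above can be illustrated with a minimal pure-Python sketch. It is not XGBoost's actual algorithm (which uses regularized, second-order tree construction); it only shows boosting with one-split "stumps" on made-up 1-D data. The function names (`fit_stump`, `fit_boosted`, and so on) and the data are assumptions chosen for illustration.

```python
# Toy gradient boosting for squared-error regression on 1-D inputs.
# Illustration only: this does not reproduce XGBoost's tree-building method.

def fit_stump(xs, residuals):
    """Find the single split that best fits the residuals (least squares)."""
    best = None
    # Candidate thresholds: every distinct x except the largest, so both
    # sides of the split are always non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_value, right_value)


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_boosted(xs, ys, n_estimators=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_estimators):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Shrink each stump's contribution by the learning rate.
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return base, learning_rate, stumps


def predict_boosted(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Made-up data with a step between x = 4 and x = 5.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]

model = fit_boosted(xs, ys)
baseline_mse = mse(ys, [sum(ys) / len(ys)] * len(ys))
boosted_mse = mse(ys, [predict_boosted(model, x) for x in xs])
print(f"baseline MSE: {baseline_mse:.4f}, boosted MSE: {boosted_mse:.4f}")
```

The `n_estimators` and `learning_rate` arguments play the same role here as the parameters of the same name passed to `XGBClassifier` and `XGBRegressor`: more rounds and a smaller step size trade training time for a smoother fit.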

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the dataset
df = pd.read_csv("healthcare_data_10000.csv")

# Encode target: Low = 0, High = 1
df['health_risk_category'] = df['health_risk_category'].map({'Low': 0, 'High': 1})

# Features and target
X = df.drop(columns=['health_risk_category'])
y = df['health_risk_category']

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Feature scaling
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
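
The cell above calls `fit_transform` on the training split and only `transform` on the test split. A minimal sketch of what that means, using the standard library and made-up values for a single feature (the variable names are assumptions for illustration): the mean and standard deviation come from the training data alone and are then reused for the test data, so no information from the test split leaks into preprocessing.

```python
from statistics import mean, pstdev

# Made-up values for one feature; in the notebook these would be columns of X.
train_values = [60.0, 70.0, 80.0, 90.0]
test_values = [75.0, 100.0]

# "fit": learn the statistics from the training split only.
mu = mean(train_values)
sigma = pstdev(train_values)  # population std (ddof=0), as StandardScaler uses

# "transform": apply the same training statistics to both splits.
train_scaled = [(v - mu) / sigma for v in train_values]
test_scaled = [(v - mu) / sigma for v in test_values]

print(train_scaled)  # mean 0 and unit variance by construction
print(test_scaled)   # generally NOT mean 0: scaled with training statistics
```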

In [None]:
%pip install xgboost



In [None]:
from xgboost import XGBClassifier
from sklearn.metrics import classification_report

model = XGBClassifier(n_estimators = 100, learning_rate = 0.1, eval_metric = 'logloss', random_state = 42)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

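# scikit-learn expects the true labels first, then the predictions.
# Store the result under a new name so the imported function is not overwritten.
report = classification_report(y_test, y_pred, target_names=['Low', 'High'])
print(report)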

              precision    recall  f1-score   support

         Low       1.00      1.00      1.00      1736
        High       1.00      0.99      1.00       264

    accuracy                           1.00      2000
   macro avg       1.00      1.00      1.00      2000
weighted avg       1.00      1.00      1.00      2000
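
The precision and recall columns in a report like the one above are derived from counts of true positives, false positives, and false negatives. The following sketch computes them by hand on made-up labels (the helper `precision_recall` is a hypothetical name, not part of scikit-learn) and shows why argument order matters: passing predictions where the true labels belong swaps the two metrics.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for one class, computed from raw counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up labels: 1 = High, 0 = Low.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

correct_order = precision_recall(y_true, y_pred)
swapped_order = precision_recall(y_pred, y_true)

print(correct_order)  # (0.6, 0.75)
print(swapped_order)  # (0.75, 0.6)
```

scikit-learn's `classification_report` takes the true labels as its first argument and the predictions as its second, so the same swap applies there; the `support` column also counts whichever array was passed first.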



# **XGBoost Regressor**

In [None]:
# Step 2: Select numeric columns
numeric_cols = [
    'age', 'bmi', 'systolic_bp', 'diastolic_bp',
    'cholesterol_level', 'glucose_level',
    'exercise_mins_per_week', 'alcohol_units_per_week', 'medications_count'
]
target = 'heart_rate'

# Step 3: Feature matrix (X) and target vector (y)
X = df[numeric_cols]
y = df[target]

# Step 4: Train-test split
X_train, X_test, y_train_reg, y_test_reg = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Step 5: Feature scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

In [None]:
from xgboost import XGBRegressor
from sklearn.metrics import mean_squared_error
import numpy as np

model = XGBRegressor(n_estimators = 100, learning_rate = 0.1, random_state = 42)
model.fit(X_train_scaled, y_train_reg)

y_pred_reg = model.predict(X_test_scaled)

# True values first, then predictions; use new names so the imported
# mean_squared_error function is not overwritten.
mse = mean_squared_error(y_test_reg, y_pred_reg)
rmse = np.sqrt(mse)
print(rmse)

10.361777831016063
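
An RMSE value is easier to interpret next to a naive baseline. The sketch below computes RMSE by hand on made-up heart-rate values and compares a model's predictions against always predicting a constant; all numbers, including `train_mean`, are assumptions for illustration and are not taken from the dataset.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: sqrt of the average squared difference."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up values, in the same units as the target (beats per minute).
y_true = [70.0, 80.0, 65.0, 90.0]
y_pred = [72.0, 77.0, 70.0, 85.0]

# Hypothetical baseline: always predict the mean of the training targets.
train_mean = 76.0
baseline_pred = [train_mean] * len(y_true)

model_rmse = rmse(y_true, y_pred)
baseline_rmse = rmse(y_true, baseline_pred)

print(f"model RMSE:    {model_rmse:.3f}")
print(f"baseline RMSE: {baseline_rmse:.3f}")
```

Applied to the notebook, the same comparison would use `y_train_reg.mean()` as the constant prediction; a model RMSE close to the baseline RMSE would suggest the features carry little information about the target.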
