
# **Traditional GBM Classifier**

A Gradient Boosting Machine (GBM) is a machine learning algorithm that builds a predictive model as an ensemble of weak learners, typically shallow decision trees. It adds trees one at a time, and each new tree is fit to the errors of the current ensemble: the residuals in regression or, more generally, the negative gradient of the loss function. Each tree's contribution is scaled by a learning rate, so the ensemble improves gradually rather than in one step. GBMs are widely used for both classification and regression, and they often perform well on tabular data.
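
The boosting loop described above can be sketched by hand for the squared-error case. This is a minimal illustration, not how scikit-learn implements `GradientBoostingRegressor` internally: the synthetic data and the names `X_demo`, `y_demo`, `n_rounds`, and `mse_history` are assumptions made up for this example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic 1-D regression data (hypothetical, for illustration only)
rng = np.random.default_rng(0)
X_demo = rng.uniform(-3, 3, size=(200, 1))
y_demo = np.sin(X_demo[:, 0]) + rng.normal(scale=0.1, size=200)

n_rounds = 50
learning_rate = 0.1

# Start from a constant prediction: the mean of the target
prediction = np.full_like(y_demo, y_demo.mean())
trees = []
mse_history = [np.mean((y_demo - prediction) ** 2)]

for _ in range(n_rounds):
    # For the loss 0.5 * (y - f)^2, the negative gradient is the residual
    residuals = y_demo - prediction

    # Fit a shallow tree (a weak learner) to the current residuals
    tree = DecisionTreeRegressor(max_depth=2, random_state=0)
    tree.fit(X_demo, residuals)

    # Take a small step in the direction the tree suggests
    prediction += learning_rate * tree.predict(X_demo)
    trees.append(tree)
    mse_history.append(np.mean((y_demo - prediction) ** 2))

print(f"training MSE before boosting: {mse_history[0]:.4f}")
print(f"training MSE after {n_rounds} rounds: {mse_history[-1]:.4f}")
```

The training error shrinks as trees are added because each one corrects part of what the ensemble still gets wrong. A smaller `learning_rate` slows this down and usually needs more rounds to reach the same fit.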

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the dataset
df = pd.read_csv("healthcare_data_10000.csv")

# Encode target: Low = 0, High = 1
df['health_risk_category'] = df['health_risk_category'].map({'Low': 0, 'High': 1})

# Features and target
X = df.drop(columns=['health_risk_category'])
y = df['health_risk_category']

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Feature scaling
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

In [2]:
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report

model = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, random_state=42)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

# classification_report expects (y_true, y_pred); store the result under a
# new name so the imported function is not overwritten
report = classification_report(y_test, y_pred, target_names=['Low', 'High'])
print(report)

              precision    recall  f1-score   support

         Low       1.00      1.00      1.00      1738
        High       1.00      1.00      1.00       262

    accuracy                           1.00      2000
   macro avg       1.00      1.00      1.00      2000
weighted avg       1.00      1.00      1.00      2000



# **Traditional GBM Regressor**

In [3]:
# Numeric feature columns and regression target
# (df is the DataFrame loaded in the first cell)
numeric_cols = [
    'age', 'bmi', 'systolic_bp', 'diastolic_bp',
    'cholesterol_level', 'glucose_level',
    'exercise_mins_per_week', 'alcohol_units_per_week', 'medications_count'
]
target = 'heart_rate'

# Features and target
X = df[numeric_cols]
y = df[target]

# Train-test split
X_train, X_test, y_train_reg, y_test_reg = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Feature scaling: fit on the training set only, then apply to both sets
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

In [4]:
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
import numpy as np

model = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, random_state=42)
model.fit(X_train_scaled, y_train_reg)

y_pred_reg = model.predict(X_test_scaled)

# Pass (y_true, y_pred) and keep the result under a new name so the
# imported function is not overwritten
mse = mean_squared_error(y_test_reg, y_pred_reg)
rmse = np.sqrt(mse)
print(rmse)

10.211193623306475
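
For reference, RMSE is the square root of the mean squared difference between true and predicted values, so it is expressed in the same units as the target. The sketch below works through that arithmetic on four made-up heart-rate values (hypothetical, not taken from the dataset) and compares the result with a baseline that always predicts the mean. The names `y_true_demo`, `y_pred_demo`, and `baseline_pred` are assumptions for this example.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical true and predicted heart rates (not from the dataset)
y_true_demo = np.array([60.0, 70.0, 80.0, 90.0])
y_pred_demo = np.array([62.0, 68.0, 83.0, 86.0])

# RMSE by hand: squared errors are 4, 4, 9, 16, so the mean is 8.25
rmse_manual = np.sqrt(np.mean((y_true_demo - y_pred_demo) ** 2))

# The same value via scikit-learn, as computed in the cell above
rmse_sklearn = np.sqrt(mean_squared_error(y_true_demo, y_pred_demo))

# Baseline: always predict the mean of the true values (75 here)
baseline_pred = np.full_like(y_true_demo, y_true_demo.mean())
rmse_baseline = np.sqrt(mean_squared_error(y_true_demo, baseline_pred))

print(f"manual RMSE:   {rmse_manual:.2f}")    # about 2.87
print(f"sklearn RMSE:  {rmse_sklearn:.2f}")   # about 2.87
print(f"baseline RMSE: {rmse_baseline:.2f}")  # about 11.18
```

Comparing a model's RMSE with such a constant baseline on the same test set is one simple way to judge whether a raw RMSE figure reflects useful predictive signal.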
