# **LightGBM Classifier**

LightGBM (Light Gradient Boosting Machine) is an open-source, distributed, high-performance gradient boosting framework developed by Microsoft. It is based on decision tree algorithms and is used for classification, regression, ranking, and other machine learning tasks. Its key features include:

- **High performance:** LightGBM is designed to train quickly, especially on large datasets. It finds splits on binned (histogram) feature values and can use Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB) to reduce the work done per iteration.
- **Low memory usage:** LightGBM often uses less memory than other gradient boosting frameworks such as XGBoost, although the gap depends on how each library is configured.
- **Accuracy:** Despite its speed, LightGBM is often competitive with the best-performing methods on tabular data.
- **Scalability:** It supports distributed training, which makes it suitable for very large datasets.
- **Decision-tree based:** It builds decision trees sequentially, with each new tree trying to correct the errors of the previous ones. Trees grow leaf-wise (best-first) rather than level-wise: at each step the leaf with the largest loss reduction is split. For the same number of leaves this usually reaches a lower loss, but it can produce deep, unbalanced trees that overfit small datasets.

In short, LightGBM is a fast, memory-efficient gradient boosting implementation that is widely used in machine learning competitions and real-world applications.
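
To make the GOSS idea mentioned above more concrete, here is a minimal pure-Python sketch. It is an illustration only, not LightGBM's implementation: the function name `goss_sample`, its parameter names, and the gradient values are all assumptions made for this example.

```python
import random

def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
    """Return (indices, weights) chosen by a GOSS-style sampling step.

    Illustrative sketch: keep the rows with the largest absolute gradient,
    randomly sample from the remaining rows, and up-weight the sampled rows
    so the contribution of the small-gradient rows is roughly preserved.
    """
    n = len(gradients)
    n_top = int(n * top_rate)
    n_other = int(n * other_rate)

    # Row indices ordered from largest to smallest absolute gradient.
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    top_idx = order[:n_top]
    rest_idx = order[n_top:]

    # Randomly keep a subset of the small-gradient rows.
    rng = random.Random(seed)
    sampled_idx = rng.sample(rest_idx, n_other)

    # Sampled small-gradient rows are amplified by (1 - top_rate) / other_rate.
    amplify = (1.0 - top_rate) / other_rate
    indices = top_idx + sampled_idx
    weights = [1.0] * n_top + [amplify] * n_other
    return indices, weights


# Made-up gradients for 10 rows.
grads = [0.9, -0.05, 0.02, -1.2, 0.1, 0.03, -0.01, 0.7, 0.04, -0.08]
idx, w = goss_sample(grads)
print(idx, w)
```

Only three of the ten rows are kept here: the two with the largest absolute gradient, plus one randomly chosen row that carries a larger weight.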

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the dataset
df = pd.read_csv("healthcare_data_10000.csv")

# Encode target: Low = 0, High = 1
df['health_risk_category'] = df['health_risk_category'].map({'Low': 0, 'High': 1})

# Features and target
X = df.drop(columns=['health_risk_category'])
y = df['health_risk_category']

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

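# Feature scaling: fit on the training split only, then reuse the same
# statistics for the test split (tree-based models generally do not need it)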
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
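
The scaling step above fits the scaler on the training split and only applies it to the test split. The sketch below reproduces that pattern for a single feature in plain Python. The values are made up, and the helper names `fit_standardizer` and `standardize` are hypothetical. Like `StandardScaler`, it uses the population standard deviation.

```python
import statistics

def fit_standardizer(values):
    """Learn the mean and population standard deviation from training values."""
    return statistics.fmean(values), statistics.pstdev(values)

def standardize(values, mean, std):
    """Apply z = (x - mean) / std using statistics learned elsewhere."""
    return [(v - mean) / std for v in values]

train_bmi = [22.0, 27.5, 31.0, 24.5, 30.0]   # made-up training values
test_bmi = [26.0, 35.0]                      # made-up test values

# The statistics come from the training split only.
mean, std = fit_standardizer(train_bmi)
train_scaled = standardize(train_bmi, mean, std)
# The test split reuses the training statistics, so no test information leaks in.
test_scaled = standardize(test_bmi, mean, std)

print(round(statistics.fmean(train_scaled), 6))  # close to 0 by construction
print(test_scaled)
```

Because tree splits depend on the ordering of feature values rather than their scale, this step is generally not expected to change a LightGBM model much.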

In [2]:
from lightgbm import LGBMClassifier
from sklearn.metrics import classification_report

model = LGBMClassifier(n_estimators=100, learning_rate=0.1, random_state=42)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

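# Pass the true labels first, then the predictions, and store the result
# under a new name so the imported function is not shadowed
report = classification_report(y_test, y_pred, target_names=['Low', 'High'])
print(report)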

[LightGBM] [Info] Number of positive: 1047, number of negative: 6953
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.002311 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1003
[LightGBM] [Info] Number of data points in the train set: 8000, number of used features: 20
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.130875 -> initscore=-1.893244
[LightGBM] [Info] Start training from score -1.893244
              precision    recall  f1-score   support

         Low       1.00      1.00      1.00      1738
        High       1.00      1.00      1.00       262

    accuracy                           1.00      2000
   macro avg       1.00      1.00      1.00      2000
weighted avg       1.00      1.00      1.00      2000
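
The `pavg` and `initscore` values in the training log above can be checked by hand. The sketch below assumes the initial score for binary classification is the log-odds of the positive-class fraction, which is consistent with the logged numbers.

```python
import math

n_pos, n_neg = 1047, 6953          # counts reported in the training log
p_avg = n_pos / (n_pos + n_neg)    # fraction of positive (High) rows
init_score = math.log(p_avg / (1.0 - p_avg))  # log-odds of the positive class

print(round(p_avg, 6))
print(round(init_score, 4))
```

The positive fraction of about 13% also shows that the classes are imbalanced, which is why the train-test split is stratified.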

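`classification_report` expects the true labels first and the predictions second. With perfect predictions the order makes no visible difference, but in general swapping the arguments swaps precision and recall. A small pure-Python sketch with made-up labels illustrates this. The helper `precision_recall` is hypothetical and not part of scikit-learn.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Compute precision and recall for one class from two label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]   # made-up ground truth
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]   # made-up predictions

correct_order = precision_recall(y_true, y_pred)
swapped_order = precision_recall(y_pred, y_true)
print(correct_order)
print(swapped_order)
```

The two printed tuples contain the same values in opposite positions, so a report built with the arguments reversed would mislabel precision as recall.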




# **LightGBM Regressor**

In [3]:
# Step 2: Select numeric columns
numeric_cols = [
    'age', 'bmi', 'systolic_bp', 'diastolic_bp',
    'cholesterol_level', 'glucose_level',
    'exercise_mins_per_week', 'alcohol_units_per_week', 'medications_count'
]
target = 'heart_rate'

# Step 3: Feature matrix (X) and target vector (y)
X = df[numeric_cols]
y = df[target]

# Step 4: Train-test split
X_train, X_test, y_train_reg, y_test_reg = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Step 5: Feature scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

In [5]:
from lightgbm import LGBMRegressor
from sklearn.metrics import mean_squared_error
import numpy as np

model = LGBMRegressor(n_estimators=100, learning_rate=0.1, random_state=42)
model.fit(X_train_scaled, y_train_reg)

y_pred_reg = model.predict(X_test_scaled)

# Pass the true values first, and store the result under a new name
# so the imported function is not shadowed
mse = mean_squared_error(y_test_reg, y_pred_reg)
rmse = np.sqrt(mse)
print(rmse)

[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000736 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 940
[LightGBM] [Info] Number of data points in the train set: 8000, number of used features: 9
[LightGBM] [Info] Start training from score 71.446375
10.432615721000513
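
An RMSE value is easier to interpret next to a simple baseline. The sketch below computes RMSE by hand and compares a model's predictions with a baseline that always predicts the training mean. All numbers are made up for illustration, and the function name `rmse` is chosen for this example. Nothing here reproduces the result above.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y_true - y_pred) ** 2))."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

y_train = [68.0, 75.0, 71.0, 80.0, 66.0]   # made-up training targets
y_test = [70.0, 78.0, 65.0, 74.0]          # made-up test targets
model_pred = [71.0, 75.0, 68.0, 73.0]      # made-up model predictions

# Baseline: predict the training mean for every test row.
train_mean = sum(y_train) / len(y_train)
baseline_pred = [train_mean] * len(y_test)

print(rmse(y_test, model_pred))
print(rmse(y_test, baseline_pred))
```

If a model's RMSE is close to the baseline's, the features carry little information about the target. Running the same comparison on `y_train_reg` and `y_test_reg` would put the value printed above into context.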


