<a href="https://colab.research.google.com/github/ABBAS-37405/PYTHON-AND-DATA-SCIENCE/blob/main/Adaboost_ML_Models.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **AdaBoost Classification Model**

AdaBoost (Adaptive Boosting) is an ensemble meta-algorithm that combines many weak learners (typically decision trees with a single split, called decision stumps) into a strong learner. It builds the model sequentially. Every weak learner is trained on the full training set, but with sample weights that are increased for the samples the previous learners misclassified and decreased for those they got right, so each new learner concentrates on the difficult cases. Each learner also receives a vote weight based on its weighted error, and the final prediction combines all learners according to those weights (a weighted vote for classification; scikit-learn's `AdaBoostRegressor` uses a weighted median). This iterative process primarily reduces bias (and can also reduce variance), and it often yields much higher accuracy than any individual weak learner.
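To make the reweighting concrete, here is a minimal from-scratch sketch of textbook discrete AdaBoost for binary labels in {-1, +1}, using scikit-learn depth-1 trees as stumps. The helper names `adaboost_fit` and `adaboost_predict` and the toy interval dataset are hypothetical, introduced only for illustration. This is not scikit-learn's internal implementation (its `AdaBoostClassifier` implements multi-class SAMME variants). The toy problem labels points inside an interval as positive, which no single threshold can separate, so the boosted stumps should clearly beat one stump on its own.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def adaboost_fit(X, y, n_rounds=50):
    """Discrete AdaBoost for labels y in {-1, +1} using decision stumps (hypothetical helper)."""
    n = len(y)
    w = np.full(n, 1.0 / n)  # start with uniform sample weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        # Every round trains on ALL samples; only their weights change.
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        # Weighted error of this stump, clipped to avoid log(0) and division by zero.
        err = np.clip(w[pred != y].sum() / w.sum(), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)  # vote weight: lower error -> bigger vote
        # Up-weight misclassified samples (y * pred = -1), down-weight correct ones.
        w = w * np.exp(-alpha * y * pred)
        w = w / w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas


def adaboost_predict(stumps, alphas, X):
    """Sign of the alpha-weighted vote of all stumps (hypothetical helper)."""
    score = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.where(score >= 0, 1, -1)


# Toy 1-D problem: +1 inside the interval (0.3, 0.7), -1 outside.
# No single threshold (stump) can separate this.
X = np.linspace(0, 1, 100).reshape(-1, 1)
y = np.where((X[:, 0] > 0.3) & (X[:, 0] < 0.7), 1, -1)

single = DecisionTreeClassifier(max_depth=1).fit(X, y)
stumps, alphas = adaboost_fit(X, y, n_rounds=3)
print("single stump accuracy:", np.mean(single.predict(X) == y))
print("AdaBoost accuracy:    ", np.mean(adaboost_predict(stumps, alphas, X) == y))
```

The clipping of `err` is a design choice for the sketch: a perfect stump would otherwise produce an infinite vote weight.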

A classification report summarizes the performance of a classification model. For each class it lists precision, recall, F1-score and support, followed by the overall accuracy and the macro and weighted averages of each metric.

Here is what each metric means:

- **Precision**: the ability of the classifier not to label a negative sample as positive. It is the ratio of correctly predicted positives (true positives, TP) to all predicted positives: TP / (TP + FP).
- **Recall (sensitivity)**: the ability of the classifier to find all positive samples. It is the ratio of correctly predicted positives to all samples that actually belong to the positive class: TP / (TP + FN).
- **F1-score**: the harmonic mean of precision and recall, 2 * precision * recall / (precision + recall). It balances the two and is a useful single measure when the class distribution is uneven.
- **Support**: the number of actual occurrences of the class in the evaluated dataset.

Together, these give a per-class breakdown of how well the model performs, which is crucial for understanding its strengths and weaknesses, especially on imbalanced datasets.
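A small worked example with made-up labels (hypothetical, for illustration only) computes these metrics from the true-positive, false-positive and false-negative counts and checks them against scikit-learn's `precision_score`, `recall_score` and `f1_score`. With precision 2/3 and recall 1/2, the F1-score is 4/7 (about 0.571), slightly below their arithmetic mean (about 0.583), because the harmonic mean is pulled toward the smaller value.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Hypothetical labels: 1 = positive class, 0 = negative class
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 0, 0, 0, 1, 0, 1]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # true positives
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # false positives
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # false negatives
support = sum(t == 1 for t in y_true)                         # actual positives

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

print(f"TP={tp} FP={fp} FN={fn} support={support}")
print(f"by hand: precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
print(f"sklearn: precision={precision_score(y_true, y_pred):.3f} "
      f"recall={recall_score(y_true, y_pred):.3f} f1={f1_score(y_true, y_pred):.3f}")
```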

In [4]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [6]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Step 1: Load the dataset
df = pd.read_csv("/content/drive/MyDrive/Datasets/healthcare_data_10000.csv")

df.head()

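| Unnamed: 0 | age | bmi | systolic_bp | diastolic_bp | cholesterol_level | glucose_level | heart_rate | exercise_mins_per_week | alcohol_units_per_week | medications_count | ... | has_diabetes | has_hypertension | has_heart_disease | has_kidney_disease | has_asthma | has_allergies | has_mental_health_issues | vaccinated | visited_doctor_last_year | health_risk_category |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 69 | 24.8 | 98 | 80 | 206 | 97 | 68 | 147 | 8 | 6 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | Low |
| 1 | 32 | 29.5 | 123 | 99 | 187 | 98 | 77 | 169 | 3 | 2 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | Low |
| 2 | 89 | 36.3 | 138 | 82 | 187 | 95 | 78 | 140 | 3 | 6 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | Low |
| 3 | 78 | 24.7 | 143 | 78 | 205 | 108 | 65 | 170 | 4 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | Low |
| 4 | 38 | 33.9 | 124 | 77 | 166 | 106 | 72 | 155 | 7 | 0 | ... | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | High |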
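The `Unnamed: 0` column in this output holds the values 0, 1, 2, and so on, which is what pandas produces when a DataFrame is saved with `to_csv()` (it writes the index by default) and read back with `read_csv()`. Assuming that is how this CSV was created, the column is only a row counter, not a health measurement. The sketch below reproduces the effect with a tiny hypothetical two-row frame and shows two ways to avoid it: passing `index_col=0` when reading, or `index=False` when writing.

```python
import io

import pandas as pd

# Hypothetical two-row frame standing in for the real dataset
original = pd.DataFrame({"age": [69, 32], "bmi": [24.8, 29.5]})

# to_csv() writes the index as an unnamed first column by default
buffer = io.StringIO()
original.to_csv(buffer)

buffer.seek(0)
with_unnamed = pd.read_csv(buffer)  # the saved index comes back as 'Unnamed: 0'

buffer.seek(0)
clean = pd.read_csv(buffer, index_col=0)  # treat the first column as the index instead

# Alternatively, do not write the index in the first place
clean_too = pd.read_csv(io.StringIO(original.to_csv(index=False)))

print(list(with_unnamed.columns))
print(list(clean.columns))
print(list(clean_too.columns))
```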


In [7]:
# Step 2: Encode target variable ('Low' = 0, 'High' = 1)
df['health_risk_category'] = df['health_risk_category'].map({'Low': 0, 'High': 1})

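# Step 3: Separate features and target
# 'Unnamed: 0' appears to be a row index saved with the CSV, so drop it too
X = df.drop(columns=['health_risk_category', 'Unnamed: 0'], errors='ignore')
y = df['health_risk_category']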

# Step 4: Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Step 5: Scale numeric features (optional here: tree-based weak learners such as
# decision stumps are insensitive to feature scaling, but it does no harm)
numeric_cols = [
    'age', 'bmi', 'systolic_bp', 'diastolic_bp',
    'cholesterol_level', 'glucose_level', 'heart_rate',
    'exercise_mins_per_week', 'alcohol_units_per_week', 'medications_count'
]

scaler = StandardScaler()
X_train[numeric_cols] = scaler.fit_transform(X_train[numeric_cols])
X_test[numeric_cols] = scaler.transform(X_test[numeric_cols])
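Because decision stumps only compare each feature against a threshold, a monotonic rescaling such as `StandardScaler` preserves the ordering of values and should not change which splits are chosen. The sketch below checks this on synthetic data from `make_classification` (a stand-in for the healthcare dataset, rounded to two decimals as an assumption so that no two distinct values are nearly tied): an `AdaBoostClassifier` trained on raw features and one trained on standardized features make the same predictions. Scaling does matter for distance- or gradient-based models, so the step is harmless for AdaBoost with trees but not required.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; rounding avoids near-ties between distinct values
X, y = make_classification(n_samples=500, n_features=6, random_state=0)
X = np.round(X, 2)
X_scaled = StandardScaler().fit_transform(X)

# Same hyperparameters and seed; only the feature scale differs
raw_model = AdaBoostClassifier(n_estimators=50, random_state=42).fit(X, y)
scaled_model = AdaBoostClassifier(n_estimators=50, random_state=42).fit(X_scaled, y)

same = np.array_equal(raw_model.predict(X), scaled_model.predict(X_scaled))
print("identical predictions:", same)
```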

In [8]:
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import classification_report


model = AdaBoostClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

# Store the text under a new name so the classification_report function is not overwritten
report = classification_report(y_test, y_pred, target_names=["Low", "High"])
print(report)

              precision    recall  f1-score   support

         Low       1.00      1.00      1.00      1738
        High       0.99      1.00      0.99       262

    accuracy                           1.00      2000
   macro avg       0.99      1.00      1.00      2000
weighted avg       1.00      1.00      1.00      2000
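The `macro avg` row is the unweighted mean of the per-class scores, while `weighted avg` weights each class by its support. With an imbalance like 1738 Low vs. 262 High, the two can differ noticeably whenever the minority class is predicted worse than the majority class. The sketch below computes both averages by hand for a small set of hypothetical labels and checks them against `classification_report(..., output_dict=True)`.

```python
import numpy as np
from sklearn.metrics import classification_report, f1_score

# Hypothetical imbalanced labels: six 'Low' (0) and two 'High' (1)
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1])
y_pred = np.array([0, 0, 0, 0, 0, 1, 1, 0])

per_class_f1 = f1_score(y_true, y_pred, average=None)  # one score per class
support = np.bincount(y_true)                          # actual count per class

macro_f1 = per_class_f1.mean()                                # every class counts equally
weighted_f1 = np.sum(per_class_f1 * support) / support.sum()  # classes weighted by support

report = classification_report(
    y_true, y_pred, target_names=["Low", "High"], output_dict=True
)
print(f"macro F1:    {macro_f1:.4f} (report: {report['macro avg']['f1-score']:.4f})")
print(f"weighted F1: {weighted_f1:.4f} (report: {report['weighted avg']['f1-score']:.4f})")
```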



# **AdaBoost Regression Model**

In [9]:
# Step 2: Select numeric columns
numeric_cols = [
    'age', 'bmi', 'systolic_bp', 'diastolic_bp',
    'cholesterol_level', 'glucose_level',
    'exercise_mins_per_week', 'alcohol_units_per_week', 'medications_count'
]
target = 'heart_rate'

# Step 3: Feature matrix (X) and target vector (y)
X = df[numeric_cols]
y = df[target]

# Step 4: Train-test split
X_train, X_test, y_train_reg, y_test_reg = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Step 5: Feature scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

In [10]:
from sklearn.ensemble import AdaBoostRegressor
from sklearn.metrics import mean_squared_error


model = AdaBoostRegressor(n_estimators=100, random_state=42)
model.fit(X_train_scaled, y_train_reg)

y_pred_reg = model.predict(X_test_scaled)

# Use a new name so the mean_squared_error function is not overwritten
mse = mean_squared_error(y_test_reg, y_pred_reg)
print(mse)

103.42942278985714
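Mean squared error is expressed in squared units of the target (bpm² for `heart_rate`), so the printed 103.43 corresponds to a root mean squared error of roughly 10.17 bpm. A useful reference point is the MSE of always predicting the mean of the evaluated targets, which equals their variance; the model only adds value if it beats that baseline. The sketch below illustrates both calculations on a few hypothetical heart-rate values that are not taken from the dataset.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical heart-rate targets and predictions (bpm), for illustration only
y_true = np.array([70.0, 80.0, 65.0, 90.0])
y_pred = np.array([72.0, 76.0, 70.0, 84.0])

mse = mean_squared_error(y_true, y_pred)      # mean of squared errors, in bpm^2
manual_mse = np.mean((y_true - y_pred) ** 2)  # the same quantity by hand
rmse = np.sqrt(mse)                           # back in bpm

# Baseline: always predict the mean of the targets
baseline_mse = mean_squared_error(y_true, np.full_like(y_true, y_true.mean()))

print(f"MSE={mse:.2f} RMSE={rmse:.2f} baseline MSE={baseline_mse:.4f}")
```

In the notebook itself, the training-set mean (for example via `sklearn.dummy.DummyRegressor(strategy="mean")`) is the fairer baseline, since the test-set mean would not be known in advance.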
