<a href="https://colab.research.google.com/github/dionysusshan/ml/blob/main/Stacking_test.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import StackingClassifier
from sklearn.metrics import accuracy_score, classification_report

# Step 1: Load Data
file_path = '/content/sample_data/imputed_datasetknn.csv'  # Replace with your actual file path
df = pd.read_csv(file_path)

# Display the first few rows of the dataset
print("Original Data:")
print(df.head())

# Step 2: Preprocess Data
# Assume the last column is the target variable
X = df.iloc[:, :-1].values
y = df.iloc[:, -1].values

# If the target is not already integer-coded, bin it into 3 equal-width intervals.
# labels=False makes pd.cut return the bin codes 0, 1, 2 instead of interval labels.
if not np.issubdtype(y.dtype, np.integer):
    y = pd.cut(y, bins=3, labels=False)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale the features: KNN and the RBF SVC are distance-based, and scaling helps
# LogisticRegression converge. The scaler is fit on the training split only.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Step 3: Define the Base Models
base_models = [
    ('lr', LogisticRegression(random_state=42)),
    ('knn', KNeighborsClassifier(n_neighbors=5)),
    ('svc', SVC(kernel='rbf', probability=True, random_state=42))
]

# Step 4: Define the Meta-Model
meta_model = LogisticRegression(random_state=42)

# Step 5: Create the Stacking Classifier
stacking_model = StackingClassifier(estimators=base_models, final_estimator=meta_model, cv=5)

# Step 6: Train the Stacking Classifier
stacking_model.fit(X_train, y_train)

# Step 7: Make Predictions
y_pred = stacking_model.predict(X_test)

# Step 8: Evaluate the Predictions
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)
print(f"Accuracy: {accuracy}")
print(f"Classification Report:\n{report}")

# Optional: save the fitted stacking model.
# Note: the fitted scaler is a separate object and is not part of this file.
import joblib
joblib.dump(stacking_model, 'stacking_model.pkl')
print("Stacking model saved.")


Original Data:
    AQI-IN    PM25    PM10     PM1  Temp(cel)     Hum   Noise  TVOC(ppm)  \
0  114.358  64.170  85.939  60.400     22.607  98.277  48.345      0.009   
1   95.474  57.260  75.377  53.974     22.373  95.771  48.370      0.010   
2   78.380  47.045  59.341  44.598     24.292  85.416  48.078      0.008   
3   65.078  39.044  47.500  37.322     28.122  71.982  50.844      0.008   
4   59.369  35.598  42.358  34.196     30.972  63.498  50.855      0.010   

   CO(ppm)  CO2(ppm)  SO2(ppm)  NO2(ppm)  O3(ppm)  AQI-IN(F)     CI    VI  \
0    0.392   482.552     0.002     0.008    0.021    114.358  9.873  10.0   
1    0.454   486.747     0.002     0.008    0.023     95.474  9.006  10.0   
2    0.667   482.067     0.003     0.009    0.025     82.128  9.000  10.0   
3    0.680   462.433     0.002     0.009    0.026     77.250  9.000  10.0   
4    0.697   455.927     0.002     0.009    0.024     76.173  9.000  10.0   

   particle count(0 3)  particle count(0 5)  particle count(1 0) 

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
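
The warning above is what scikit-learn emits when a `LogisticRegression` solver runs out of iterations (the default `max_iter` is 100). The sketch below reproduces it on synthetic stand-in data and shows that a larger iteration budget removes it. The data, the helper `fit_and_count_warnings`, and the chosen `max_iter` values are illustrative assumptions, not part of the notebook.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (hypothetical; not the air-quality CSV used above).
X_demo, y_demo = make_classification(
    n_samples=500, n_features=20, n_informative=10, n_classes=3, random_state=42
)


def fit_and_count_warnings(max_iter):
    """Fit LogisticRegression and count the ConvergenceWarnings it raises."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        LogisticRegression(max_iter=max_iter, random_state=42).fit(X_demo, y_demo)
    return sum(issubclass(w.category, ConvergenceWarning) for w in caught)


few_iter_warnings = fit_and_count_warnings(max_iter=2)      # deliberately too small
many_iter_warnings = fit_and_count_warnings(max_iter=5000)  # generous budget
print("Warnings with max_iter=2:", few_iter_warnings)
print("Warnings with max_iter=5000:", many_iter_warnings)
```

In the notebook's setup this would likely mean passing `max_iter` to both the `'lr'` base model and `meta_model`, since the truncated output does not show which of the two triggered the warning.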

Accuracy: 0.9948717948717949
Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00      1307
           1       0.98      0.88      0.92        49
           2       0.90      1.00      0.95         9

    accuracy                           0.99      1365
   macro avg       0.96      0.96      0.96      1365
weighted avg       0.99      0.99      0.99      1365
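
The support column in the report shows a strong class imbalance (1307 / 49 / 9), so overall accuracy is dominated by class 0. A minimal sketch, on hypothetical labels with a similar skew, of two things that may help here: passing `stratify` to `train_test_split` so both splits keep the class proportions, and reporting balanced accuracy and macro F1 alongside plain accuracy. The arrays `X_demo` and `y_demo` and the majority-class baseline are assumptions for illustration only.

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Hypothetical labels mimicking a heavy imbalance (not the real dataset).
rng = np.random.default_rng(42)
y_demo = np.array([0] * 950 + [1] * 40 + [2] * 10)
X_demo = rng.normal(size=(y_demo.size, 4))

# stratify=y_demo keeps the class proportions the same in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)
test_counts = np.bincount(y_te, minlength=3)
print("Test-set class counts:", test_counts)

# A trivial baseline that always predicts the majority class.
y_majority = np.zeros_like(y_te)
plain_accuracy = float((y_majority == y_te).mean())
balanced_accuracy = balanced_accuracy_score(y_te, y_majority)
macro_f1 = f1_score(y_te, y_majority, average="macro", zero_division=0)

print("Accuracy:", plain_accuracy)
print("Balanced accuracy:", balanced_accuracy)
print("Macro F1:", macro_f1)
```

The baseline never predicts a minority class, yet its plain accuracy stays high. Balanced accuracy and macro F1 weight every class equally, which makes that failure visible.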

Stacking model saved.
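
The saved file contains only the stacking classifier, so anyone loading it would also need the fitted `StandardScaler` to prepare new inputs. One possible approach, sketched here under assumptions, is to wrap the scaler and the stacking model in a single scikit-learn `Pipeline` and persist that instead. The synthetic data, the reduced set of base models, and the file name `stacking_pipeline.pkl` are hypothetical choices made to keep the example small.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (hypothetical; not the air-quality CSV).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=8, n_informative=5, n_classes=3, random_state=42
)

stack = StackingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000, random_state=42)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
    ],
    final_estimator=LogisticRegression(max_iter=1000, random_state=42),
    cv=5,
)

# The pipeline scales raw features and then feeds them to the stacking model.
pipeline = make_pipeline(StandardScaler(), stack)
pipeline.fit(X_demo, y_demo)

# Save and reload the whole pipeline; a temporary directory keeps the demo tidy.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "stacking_pipeline.pkl")
    joblib.dump(pipeline, model_path)
    restored = joblib.load(model_path)

# The restored pipeline accepts unscaled features directly.
predictions_match = np.array_equal(pipeline.predict(X_demo), restored.predict(X_demo))
print("Round-trip predictions match:", predictions_match)
```

With this layout, `restored.predict` can be called on raw feature rows, because the scaling step travels with the model.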
