Build the models

Overview with AI Studio Auto Model
- Use the cleaned data to generate an overview of candidate models

![AI Studio Auto Model overview](ImageResource/AI-overview.png)

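From the overview, select:
- Generalized Linear Model
- Decision Tree
- Gradient Boosted Trees

In the scikit-learn code below, logistic regression serves as the generalized linear model. Alongside the decision tree, the code trains a random forest rather than gradient boosted trees.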
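The sketch below shows one way to express the three selected model types in scikit-learn. It assumes a binary target and uses synthetic data from `make_classification` in place of `CleanedData.csv`. The name-to-estimator mapping is an assumption for illustration, not what Auto Model runs internally. `GradientBoostingClassifier` is scikit-learn's gradient boosted trees implementation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Hypothetical mapping from AI Studio model names to comparable scikit-learn estimators
candidates = {
    "Generalized Linear Model": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Gradient Boosted Trees": GradientBoostingClassifier(random_state=42),
}

# Synthetic binary dataset standing in for CleanedData.csv (assumption)
X, y = make_classification(n_samples=200, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=42, stratify=y
)

scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # accuracy on the held-out half

for name, acc in scores.items():
    print(f"{name}: {acc:.2f}")
```

Swapping the real encoded `X` and `Y` in for the synthetic arrays would compare the same three model families on the project data.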

Generalized Linear Model
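For the binary `class` target (classes 0 and 1 in the reports below), the generalized linear model is logistic regression. In the standard formulation, a linear predictor on the scaled features is linked to the class probability through the logit. `LogisticRegressionCV(cv=5)` also chooses the regularization strength `C` by 5-fold cross-validation.

```latex
\log\frac{P(y=1 \mid \mathbf{x})}{1 - P(y=1 \mid \mathbf{x})} = \beta_0 + \boldsymbol{\beta}^\top \mathbf{x}
\quad\Longleftrightarrow\quad
P(y=1 \mid \mathbf{x}) = \frac{1}{1 + e^{-(\beta_0 + \boldsymbol{\beta}^\top \mathbf{x})}}
```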

In [25]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegressionCV
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report

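# Load and encode data
data = pd.read_csv('CleanedData.csv')
data = data.apply(LabelEncoder().fit_transform)  # Encode every column as integer labels
data = data.sample(100)  # Random sample of 100 rows (no random_state, so the sample changes between runs)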

# Split features and target
X = data.drop('class', axis=1)
Y = data['class']

# Split dataset into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.5, random_state=42)

# Define pipelines
pipeline_RFC = Pipeline([
    ('scaler', StandardScaler()),  # Scale features (not needed for trees, but harmless)
    ('classifier', RandomForestClassifier(random_state=42))
])

pipeline_DT = Pipeline([
    ('scaler', StandardScaler()),
    ('classifier', DecisionTreeClassifier(random_state=42))
])

pipeline_LR = Pipeline([
    ('scaler', StandardScaler()),
    ('classifier', LogisticRegressionCV(cv=5, random_state=42, max_iter=1000))
])

# Train models
pipeline_RFC.fit(X_train, y_train)
pipeline_DT.fit(X_train, y_train)
pipeline_LR.fit(X_train, y_train)

# Make predictions
preds_RFC = pipeline_RFC.predict(X_test)
preds_DT = pipeline_DT.predict(X_test)
preds_LR = pipeline_LR.predict(X_test)

# Evaluate models using classification metrics
def evaluate_model(y_test, preds, model_name):
    print(f"--- {model_name} ---")
    print(f"Accuracy: {accuracy_score(y_test, preds)}")
    print(f"Precision: {precision_score(y_test, preds, average='weighted')}")
    print(f"Recall: {recall_score(y_test, preds, average='weighted')}")
    print(f"F1 Score: {f1_score(y_test, preds, average='weighted')}")
    print(classification_report(y_test, preds))

# Evaluate each model
evaluate_model(y_test, preds_DT, "Decision Tree")
evaluate_model(y_test, preds_RFC, "Random Forest")
evaluate_model(y_test, preds_LR, "Logistic Regression")


--- Decision Tree ---
Accuracy: 0.98
Precision: 0.9807692307692308
Recall: 0.98
F1 Score: 0.9799919967987194
              precision    recall  f1-score   support

           0       0.96      1.00      0.98        25
           1       1.00      0.96      0.98        25

    accuracy                           0.98        50
   macro avg       0.98      0.98      0.98        50
weighted avg       0.98      0.98      0.98        50

--- Random Forest ---
Accuracy: 0.94
Precision: 0.9407051282051282
Recall: 0.94
F1 Score: 0.9399759903961583
              precision    recall  f1-score   support

           0       0.96      0.92      0.94        25
           1       0.92      0.96      0.94        25

    accuracy                           0.94        50
   macro avg       0.94      0.94      0.94        50
weighted avg       0.94      0.94      0.94        50

--- Logistic Regression ---
Accuracy: 0.88
Precision: 0.8824476650563607
Recall: 0.88
F1 Score: 0.8798076923076923
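The Decision Tree's weighted scores can be reproduced by hand from its report. The report shows support 25 per class, class-0 recall 1.00, and class-1 recall 0.96. That means all 25 class-0 rows were predicted correctly and one of the 25 class-1 rows was predicted as 0. The labels below rebuild that confusion matrix; they are a reconstruction, not the original predictions. The sketch shows that `average='weighted'` is a support-weighted mean of the per-class scores.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Labels rebuilt from the Decision Tree report (reconstruction, not the original predictions):
# 25 class-0 rows, all predicted 0; 25 class-1 rows, one of them predicted as 0.
y_true = [0] * 25 + [1] * 25
y_pred = [0] * 25 + [0] + [1] * 24

# Per-class precision by hand: 25 of the 26 rows predicted as 0 are truly 0,
# and all 24 rows predicted as 1 are truly 1.
prec_0 = 25 / 26
prec_1 = 24 / 24

# 'weighted' averaging weights each class score by its support (25 rows each here)
manual_weighted_precision = (25 * prec_0 + 25 * prec_1) / 50

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred, average="weighted"), manual_weighted_precision)
print("recall   :", recall_score(y_true, y_pred, average="weighted"))
print("f1       :", f1_score(y_true, y_pred, average="weighted"))
```

Both classes have 25 rows, so the weighted averages equal the macro averages here. That is why the `macro avg` and `weighted avg` rows in each report are identical.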
             

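With more than 1,000 rows of data, RMSE and R² both come out close to 0.00. (RMSE and R² are regression metrics; for this classification task, the accuracy, precision, recall, and F1 scores above are the relevant measures.)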
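Scores from a single 50/50 split of 100 rows are fragile: each model is tested on only 50 rows, so one misclassification shifts accuracy by 0.02. Stratified k-fold cross-validation gives a steadier comparison as the sample size changes. The sketch below uses scikit-learn's `cross_val_score` with `StratifiedKFold`. It assumes a binary target and substitutes a synthetic 100-row dataset for the sampled `CleanedData.csv`.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic 100-row binary dataset standing in for the sampled CleanedData.csv (assumption)
X, y = make_classification(n_samples=100, n_features=8, random_state=42)

models = {
    "Decision Tree": Pipeline([
        ("scaler", StandardScaler()),
        ("classifier", DecisionTreeClassifier(random_state=42)),
    ]),
    "Logistic Regression": Pipeline([
        ("scaler", StandardScaler()),
        ("classifier", LogisticRegressionCV(cv=5, random_state=42, max_iter=1000)),
    ]),
}

# 5 stratified, shuffled folds: every row is used for testing exactly once
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    results[name] = (scores.mean(), scores.std())
    print(f"{name}: {scores.mean():.3f} ± {scores.std():.3f}")
```

Reporting the mean and the spread across folds makes it clearer whether a gap between models is larger than the split-to-split noise. The random forest pipeline can be added to the same dictionary.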

In [26]:
import pickle

# Save the fitted logistic regression (GLM) pipeline for later use
with open('model_LR.pkl', 'wb') as f:
    pickle.dump(pipeline_LR, f)
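To reuse the saved model, load it back with `pickle.load` and call `predict` on new rows. The sketch below is self-contained. It fits a small stand-in pipeline on synthetic data (assumption) and writes it to a temporary `model_LR.pkl` instead of reading the real file. Only unpickle files from a trusted source.

```python
import os
import pickle
import tempfile

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in for pipeline_LR, fitted on synthetic data (assumption)
X, y = make_classification(n_samples=100, n_features=8, random_state=42)
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("classifier", LogisticRegression(max_iter=1000)),
])
pipeline.fit(X, y)

path = os.path.join(tempfile.mkdtemp(), "model_LR.pkl")

# Save, mirroring the cell above
with open(path, "wb") as f:
    pickle.dump(pipeline, f)

# Load later (e.g. in another script) and predict on new rows
with open(path, "rb") as f:
    loaded = pickle.load(f)

preds = loaded.predict(X[:5])
print(preds)
```

New data must have the same columns and the same integer encoding as the training data. The training cell fits a fresh `LabelEncoder` per column and does not keep those encoders. One option is to save the fitted encoders alongside the model so that new rows can be encoded the same way.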