# MINOR PROJECT-
In the classes before the minor project, we learned many topics, such as logistic regression, linear regression, pipelines, decision trees, and random forests.

The main aim of this minor project is to integrate all of these tools into one project.

As a reference, we were taught **END TO END MACHINE LEARNING** in class for this minor project.





# EXPLANATION OF ML MODEL-
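We can understand the concept of an ML model through an example: identifying whether an image shows day or night.

Dataset- images of day and night

Preprocessing- noticing patterns such as bright, dark, sun, moon, etc.

Model- the function that distinguishes day from night

Training- the model learns to identify day and night from labelled images

Testing- given 100 new images, the model should predict whether each one shows day or night

Hyperparameters- settings we choose before training, such as the maximum depth of a tree (conditions like cloudy, foggy, or snowy are variations in the data that the model must cope with, not hyperparameters)

Parameters- what the model learns from the data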
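
The difference between hyperparameters and parameters can be seen directly in scikit-learn. The sketch below is only an illustration: it assumes the Wine dataset and a logistic regression pipeline like the one used later in this project. `max_iter` is a hyperparameter we set ourselves, while `coef_` holds the parameters learned during `fit()`.

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

# Hyperparameter: chosen by us before training (here, the iteration cap).
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=500)),
])
hyperparams = pipe.named_steps["clf"].get_params()

# Parameters: learned from the data when fit() is called.
pipe.fit(X, y)
learned_coef = pipe.named_steps["clf"].coef_

print("max_iter (hyperparameter):", hyperparams["max_iter"])
print("coef_ shape (learned parameters):", learned_coef.shape)
```

The coefficient matrix has one row per wine class and one column per feature, so its shape depends on the data, not on anything we typed in.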

# DATASETS IN MY MODEL-
Wine Dataset

Diabetes Dataset

California Housing Dataset


In [9]:


# STEP 1: IMPORT THE REQUIRED LIBRARIES


# Train-test split
from sklearn.model_selection import train_test_split

# Preprocessing
from sklearn.preprocessing import StandardScaler

# Pipeline
from sklearn.pipeline import Pipeline

# Classification & Regression Models
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Load built-in sklearn datasets
from sklearn.datasets import load_wine, load_diabetes, fetch_california_housing

# For handling data (if needed)
import numpy as np
import pandas as pd


In [10]:

# STEP 2: CREATE A LIST OF DATASETS

from sklearn.datasets import load_wine, load_diabetes, fetch_california_housing

datasets = {
    "Wine Dataset"          : load_wine(),                   # Classification (3 classes of wine)
    "Diabetes Dataset"      : load_diabetes(),               # Regression (disease progression)
    "California Housing"    : fetch_california_housing()     # Regression (house prices)
}

# The built-in sklearn loaders used here all return objects with the same structure:
#   dataset.data   -> features (X)
#   dataset.target -> labels / output (y)
#
# This allows us to loop through each dataset and apply
# our machine learning pipeline easily.
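
The shared `.data` / `.target` structure described above can be checked with a short loop. This is a minimal sketch, not part of the original notebook: it uses only the two datasets that ship with scikit-learn, and leaves out California Housing because `fetch_california_housing()` downloads its data on first use. The `loaders` and `shapes` names are made up for this illustration.

```python
from sklearn.datasets import load_wine, load_diabetes

# Hypothetical helper dict: loader functions for the bundled datasets only.
loaders = {"Wine Dataset": load_wine, "Diabetes Dataset": load_diabetes}

shapes = {}
for name, loader in loaders.items():
    ds = loader()
    X, y = ds.data, ds.target          # same attribute names for every loader
    shapes[name] = (X.shape, y.shape)
    print(f"{name}: X shape = {X.shape}, y shape = {y.shape}")
```

Every row of `X` is one sample and every column is one feature, and `y` has one entry per row of `X`.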



In [11]:

# STEP 3: DEFINE MODELS TO TEST


# We create another dictionary called "models".
# KEY   = model name (string)
# VALUE = the actual ML model wrapped inside a Pipeline.

models = {

    #  LOGISTIC REGRESSION PIPELINE  (For Classification)

    "Logistic Regression" : Pipeline([
        ('scaler', StandardScaler()),          # Step 1: Standardize features
        ('clf', LogisticRegression(max_iter=500))   # Step 2: Classifier (raised max_iter to help convergence)
    ]),



    #  DECISION TREE PIPELINE (Classifier + Regressor handled below)

    "Decision Tree (Classifier)" : Pipeline([
        ('scaler', StandardScaler()),          # Scaling included for consistency
        ('clf', DecisionTreeClassifier())      # Step 2: Tree classifier
    ]),

    "Decision Tree (Regressor)" : Pipeline([
        ('scaler', StandardScaler()),
        ('reg', DecisionTreeRegressor())       # Step 2: Tree regressor
    ]),



    # RANDOM FOREST PIPELINE (Classifier + Regressor)

    "Random Forest (Classifier)" : Pipeline([
        ('scaler', StandardScaler()),
        ('clf', RandomForestClassifier())
    ]),

    "Random Forest (Regressor)" : Pipeline([
        ('scaler', StandardScaler()),
        ('reg', RandomForestRegressor())
    ]),



    #  LINEAR REGRESSION PIPELINE (For Regression datasets)

    "Linear Regression" : Pipeline([
        ('scaler', StandardScaler()),
        ('reg', LinearRegression())
    ])
}
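
One reason to wrap each model in a Pipeline is that the scaler is fitted only on the rows passed to `fit()`, so test data never influences the scaling. The sketch below illustrates this under some assumptions: it uses the Diabetes dataset and a linear regression pipeline with the same step names (`'scaler'`, `'reg'`) as the dictionary above.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("reg", LinearRegression()),
])
pipe.fit(X_train, y_train)

# The scaler's stored mean should match the training rows only.
scaler_mean = pipe.named_steps["scaler"].mean_
uses_train_only = np.allclose(scaler_mean, X_train.mean(axis=0))
step_names = list(pipe.named_steps)

print("Scaler fitted on training rows only:", uses_train_only)
print("Step names:", step_names)
```

The step names matter later in this notebook, because the last step's name (`'clf'` or `'reg'`) is used to decide whether a pipeline is a classifier or a regressor.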




In [17]:

#  STEP 4: TEST ALL MODELS ON ALL DATASETS


for dataset_name, dataset in datasets.items():

    print("\n===================================================")
    print(" DATASET:", dataset_name)
    print("===================================================\n")


    # EXTRACT FEATURES (X) AND LABELS (y)

    X = dataset.data
    y = dataset.target

    # DETERMINE TASK TYPE (manually set)

    if dataset_name == "Wine Dataset":
        current_dataset_task_type = "classification"
    elif dataset_name in ["Diabetes Dataset", "California Housing"]:
        current_dataset_task_type = "regression"
    else:
        print("Unknown dataset type. Skipping...")
        continue


    # TRAIN-TEST SPLIT

    if current_dataset_task_type == "classification":
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.2, random_state=42, stratify=y
        )
    else:
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.2, random_state=42
        )


    # VARIABLES TO TRACK BEST MODEL

    best_model_name = None
    best_score = -np.inf

    # LOOP THROUGH EACH MODEL

    for model_name, model in models.items():

        # detect model type
        if 'clf' in model.named_steps:
            model_task_type = "classification"
        elif 'reg' in model.named_steps:
            model_task_type = "regression"
        else:
            print(f"Cannot determine model type for {model_name}. Skipping.")
            continue

        # skip mismatched models
        if current_dataset_task_type != model_task_type:
            print(f"   ⚠️ Skipping {model_name} for {dataset_name} "
                  f"(task mismatch: dataset expects {current_dataset_task_type}, model is {model_task_type})")
            continue

        # train model
        print(f"🔹 Training Model ({model_task_type}): {model_name}")
        model.fit(X_train, y_train)

        # score model (accuracy for classifiers, R^2 for regressors)
        score = model.score(X_test, y_test)
        print(f"   ➤ Score = {score:.3f}")

        # update best model
        if score > best_score:
            best_score = score
            best_model_name = model_name


    # PRINT BEST MODEL

    print("\n⭐⭐ BEST MODEL FOR", dataset_name, "⭐⭐")
    print(f"➡️ {best_model_name} with score {best_score:.3f}")
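
The loop above compares models by `score()`, and that number means different things for the two task types: accuracy for classifiers and R^2 for regressors. The sketch below checks this against `accuracy_score` and `r2_score` from `sklearn.metrics`. It is only an illustration: the models are used without a Pipeline and with a fixed `random_state`, which are assumptions made to keep the example short and repeatable.

```python
from sklearn.datasets import load_wine, load_diabetes
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score, r2_score
from sklearn.model_selection import train_test_split

# Classification: score() is the fraction of correct predictions (accuracy).
Xc, yc = load_wine(return_X_y=True)
Xc_tr, Xc_te, yc_tr, yc_te = train_test_split(
    Xc, yc, test_size=0.2, random_state=42, stratify=yc
)
clf = RandomForestClassifier(random_state=0).fit(Xc_tr, yc_tr)
clf_score = clf.score(Xc_te, yc_te)
clf_acc = accuracy_score(yc_te, clf.predict(Xc_te))

# Regression: score() is the coefficient of determination (R^2).
Xr, yr = load_diabetes(return_X_y=True)
Xr_tr, Xr_te, yr_tr, yr_te = train_test_split(
    Xr, yr, test_size=0.2, random_state=42
)
reg = LinearRegression().fit(Xr_tr, yr_tr)
reg_score = reg.score(Xr_te, yr_te)
reg_r2 = r2_score(yr_te, reg.predict(Xr_te))

print("classifier score equals accuracy:", abs(clf_score - clf_acc) < 1e-12)
print("regressor score equals R^2:", abs(reg_score - reg_r2) < 1e-12)
```

Because the two metrics are on different scales, a score on the Wine dataset should not be compared directly with a score on the regression datasets; the "best model" is only meaningful within one dataset.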



 DATASET: Wine Dataset

🔹 Training Model (classification): Logistic Regression
   ➤ Score = 0.972
🔹 Training Model (classification): Decision Tree (Classifier)
   ➤ Score = 0.944
   ⚠️ Skipping Decision Tree (Regressor) for Wine Dataset (task mismatch: dataset expects classification, model is regression)
🔹 Training Model (classification): Random Forest (Classifier)
   ➤ Score = 1.000
   ⚠️ Skipping Random Forest (Regressor) for Wine Dataset (task mismatch: dataset expects classification, model is regression)
   ⚠️ Skipping Linear Regression for Wine Dataset (task mismatch: dataset expects classification, model is regression)

⭐⭐ BEST MODEL FOR Wine Dataset ⭐⭐
➡️ Random Forest (Classifier) with score 1.000

 DATASET: Diabetes Dataset

   ⚠️ Skipping Logistic Regression for Diabetes Dataset (task mismatch: dataset expects regression, model is classification)
   ⚠️ Skipping Decision Tree (Classifier) for Diabetes Dataset (task mismatch: data