<a href="https://colab.research.google.com/github/SatishAwal/Machine-Learning-Foundations-Midterm-1/blob/main/FinalProjectML.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Online Food Order Prediction Model
## Problem Definition
I'll design a basic prediction model to determine whether a customer is likely to order food online based on their demographic characteristics and feedback. The target variable is "Output" (Yes/No), which makes this a binary classification problem.

## Dataset Selection
The dataset contains 388 records and 13 columns, including:

* Demographic: Age, Gender, Marital Status, Occupation, Monthly Income, Education, Family Size

* Geographic: Latitude, Longitude, Pin Code

* Behavioral: Feedback (Positive/Negative)

* Target: Output (Yes/No for online food ordering)


## Data Preprocessing

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Load data
df = pd.read_csv("/content/onlinefoods.csv")

print(f"Number of records and features are: {df.shape}")

# Data cleaning
df.dropna(inplace=True)  # Remove any missing values
df.drop_duplicates(inplace=True)  # Remove duplicates

# Strip whitespace from 'Feedback' column
df['Feedback'] = df['Feedback'].str.strip()
# Strip "Rs." from 'Monthly Income'
df['Monthly Income'] = df['Monthly Income'].str.replace('Rs.', '', regex=False).str.strip()

# Encode categorical variables
label_encoders = {}
categorical_cols = ['Gender', 'Marital Status', 'Occupation', 'Monthly Income',
                   'Educational Qualifications', 'Feedback']
for col in categorical_cols:
    le = LabelEncoder()
    # Fit on the entire column to include all possible labels
    le.fit(df[col])
    df[col] = le.transform(df[col])
    label_encoders[col] = le

# Encode target variable
df['Output'] = df['Output'].map({'Yes': 1, 'No': 0})

# Feature selection
X = df.drop(['Output', 'latitude', 'longitude', 'Pin code', 'Unnamed: 12'], axis=1)
y = df['Output']

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Scale numerical features
scaler = StandardScaler()
X_train[['Age', 'Family size']] = scaler.fit_transform(X_train[['Age', 'Family size']])
X_test[['Age', 'Family size']] = scaler.transform(X_test[['Age', 'Family size']])

Number of records and features are: (388, 13)
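
`LabelEncoder` assigns integer codes in sorted (lexicographic) order of the label strings, not in any meaningful order, which matters for an ordered column such as Monthly Income. The sketch below assumes the income brackets look like the list shown once the "Rs." prefix is stripped (check this with `df['Monthly Income'].unique()` on the real data); `brackets` and `income_order` are illustrative names, not part of the notebook.

```python
from sklearn.preprocessing import LabelEncoder

# ASSUMPTION: income brackets after stripping "Rs."; verify against the real column
brackets = ["No Income", "Below 10000", "10001 to 25000", "25001 to 50000", "More than 50000"]

# LabelEncoder sorts the labels as strings, so digits come before letters
le = LabelEncoder().fit(brackets)
label_codes = {label: int(code) for label, code in zip(le.classes_, le.transform(le.classes_))}
print(label_codes)

# An explicit mapping preserves the natural low-to-high income order instead
income_order = {label: rank for rank, label in enumerate(brackets)}
print(income_order)
```

With the real data, `df['Monthly Income'].map(income_order)` could replace the `LabelEncoder` for that column; the same mapping would then also have to be applied inside the prediction function.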


## Model Implementation

I implemented two models for comparison:
* Logistic Regression
* Decision Tree Classifier

In [None]:
# Model 1: Logistic Regression
log_reg = LogisticRegression(max_iter=1000)
log_reg.fit(X_train, y_train)
y_pred_log = log_reg.predict(X_test)

# Model 2: Decision Tree
dtree = DecisionTreeClassifier(max_depth=5, random_state=42)
dtree.fit(X_train, y_train)
y_pred_dt = dtree.predict(X_test)
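
Both models are trained and evaluated on a single 70/30 split that leaves 86 test records, so one prediction moves accuracy by about 1.2 percentage points (1/86). A steadier comparison, sketched here under assumptions, is stratified 5-fold cross-validation. `make_classification` generates a synthetic stand-in for the encoded features (roughly a quarter negatives); in the notebook you would pass `X` and `y` instead. For simplicity the pipeline scales every column, whereas the notebook scales only Age and Family size.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the 8 encoded features, about 25% class 0 ("No")
X_demo, y_demo = make_classification(
    n_samples=286, n_features=8, n_informative=4, weights=[0.25], random_state=42
)

models = {
    # Scaling inside the pipeline fits the scaler on each fold's training part only
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=42),
}

# Stratified folds keep the class ratio roughly the same in every fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

cv_results = {}
for name, model in models.items():
    scores = cross_val_score(model, X_demo, y_demo, cv=cv, scoring="accuracy")
    cv_results[name] = scores
    print(f"{name}: mean accuracy {scores.mean():.3f} (std {scores.std():.3f})")
```

Reporting the mean together with the standard deviation across folds makes it easier to judge whether a one-point accuracy gap between models is meaningful.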

## Evaluation

In [None]:
# Evaluation metrics
print("Logistic Regression Performance:")
print(f"Accuracy: {accuracy_score(y_test, y_pred_log):.2f}")
print(classification_report(y_test, y_pred_log))
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred_log))

print("\nDecision Tree Performance:")
print(f"Accuracy: {accuracy_score(y_test, y_pred_dt):.2f}")
print(classification_report(y_test, y_pred_dt))
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred_dt))

Logistic Regression Performance:
Accuracy: 0.80
              precision    recall  f1-score   support

           0       0.67      0.38      0.48        21
           1       0.82      0.94      0.88        65

    accuracy                           0.80        86
   macro avg       0.75      0.66      0.68        86
weighted avg       0.79      0.80      0.78        86

Confusion Matrix:
[[ 8 13]
 [ 4 61]]

Decision Tree Performance:
Accuracy: 0.79
              precision    recall  f1-score   support

           0       0.60      0.43      0.50        21
           1       0.83      0.91      0.87        65

    accuracy                           0.79        86
   macro avg       0.72      0.67      0.68        86
weighted avg       0.77      0.79      0.78        86

Confusion Matrix:
[[ 9 12]
 [ 6 59]]
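
Every number in the two classification reports can be recomputed from its confusion matrix. scikit-learn's `confusion_matrix` puts true labels in rows and predicted labels in columns, ordered 0 ("No") then 1 ("Yes"). The small helper below (`metrics_from_confusion` is a name introduced here) recomputes precision, recall, and accuracy from the matrices printed above; for example, Logistic Regression correctly identifies only 8 of the 21 "No" customers, giving a recall of 0.38.

```python
import numpy as np

def metrics_from_confusion(cm):
    """Per-class precision/recall and overall accuracy from a confusion matrix
    laid out like sklearn's: rows = true label, columns = predicted label."""
    cm = np.asarray(cm)
    correct = np.diag(cm)                  # correctly classified count per class
    precision = correct / cm.sum(axis=0)   # column totals: everything predicted as that class
    recall = correct / cm.sum(axis=1)      # row totals: everything truly in that class
    accuracy = correct.sum() / cm.sum()
    return precision, recall, accuracy

# Matrices copied from the outputs above (class 0 = "No", class 1 = "Yes")
log_reg_cm = [[8, 13], [4, 61]]
dtree_cm = [[9, 12], [6, 59]]

for name, cm in [("Logistic Regression", log_reg_cm), ("Decision Tree", dtree_cm)]:
    precision, recall, accuracy = metrics_from_confusion(cm)
    print(f"{name}: precision={np.round(precision, 2)}, recall={np.round(recall, 2)}, accuracy={accuracy:.2f}")
```

The rounded values match the reports, e.g. 0.67 precision and 0.38 recall for class 0 under Logistic Regression.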


## Comparison and Justification
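Performance: Logistic Regression slightly outperforms the Decision Tree on accuracy (80% vs. 79%), but the gap is a single test record (69 vs. 68 correct out of 86), and both models reach the same macro-averaged F1-score (0.68). Both are much better at identifying customers who order online than those who do not: recall for the "No" class (0) is only 0.38 for Logistic Regression and 0.43 for the Decision Tree.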
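
One common way to address weak recall on the minority "No" class, not used in this notebook, is `class_weight="balanced"`, which scikit-learn defines as weighting each class by `n_samples / (n_classes * count_of_class)`. The sketch first computes those weights for the 21/65 class split of the test set, purely to show the formula, then fits a plain and a balanced `LogisticRegression` on synthetic data from `make_classification` (a stand-in, not the food-order data). Recall is measured on the training data only to keep the example short; a real comparison would use the held-out test set.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.utils.class_weight import compute_class_weight

# Labels rebuilt from the test-set support column: 21 "No" (0) and 65 "Yes" (1)
y_counts = np.array([0] * 21 + [1] * 65)
weights = compute_class_weight(class_weight="balanced", classes=np.array([0, 1]), y=y_counts)
# "balanced" = n_samples / (n_classes * samples in that class): the rarer class gets the larger weight
print({label: round(float(w), 3) for label, w in zip([0, 1], weights)})

# Synthetic, imbalanced stand-in data (about 25% class 0); NOT the food-order dataset
X_demo, y_demo = make_classification(n_samples=286, n_features=8, weights=[0.25], random_state=0)

plain = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)
balanced = LogisticRegression(max_iter=1000, class_weight="balanced").fit(X_demo, y_demo)

# Recall for the minority class 0, measured on training data only for brevity
for name, model in [("plain", plain), ("balanced", balanced)]:
    minority_recall = recall_score(y_demo, model.predict(X_demo), pos_label=0)
    print(f"{name}: class-0 recall = {minority_recall:.2f}")
```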

Interpretability:

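*   Logistic Regression provides one coefficient per feature whose sign and size show how that feature shifts the log-odds of ordering online. Because most features here are label-encoded integers and only Age and Family size are standardized, coefficient magnitudes are only roughly comparable across features.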

*   Decision Tree produces explicit if/then splitting rules that can be printed or visualized
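
Both kinds of interpretation can be sketched in a few lines. The example below uses synthetic data labeled with the notebook's eight feature names, so the printed coefficients and rules say nothing about real food ordering; on the notebook's fitted models the equivalent calls would be `log_reg.coef_` and `export_text(dtree, feature_names=list(X_train.columns))`. The demo tree uses `max_depth=2` (the notebook uses 5) only to keep the printed rules short.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in data that reuses the notebook's feature names
feature_names = ["Age", "Gender", "Marital Status", "Occupation", "Monthly Income",
                 "Educational Qualifications", "Family size", "Feedback"]
X_arr, y_demo = make_classification(n_samples=286, n_features=8, random_state=42)
X_demo = pd.DataFrame(X_arr, columns=feature_names)

log_reg_demo = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)
# One coefficient per feature; a positive value raises the log-odds of class 1 ("Yes")
coefs = pd.Series(log_reg_demo.coef_[0], index=feature_names).sort_values()
print(coefs.round(3))

tree_demo = DecisionTreeClassifier(max_depth=2, random_state=42).fit(X_demo, y_demo)
# export_text renders the tree's split rules as indented text
rules = export_text(tree_demo, feature_names=feature_names)
print(rules)
```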

Feature Importance (from Decision Tree):

*   Monthly Income is the most important predictor
*   Occupation and Educational Qualifications also rank highly
*   Geographic features (latitude, longitude, Pin code) were dropped before training, so the tree does not assign them any importance
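
The notebook does not print the importances behind this ranking. With the fitted `dtree` and `X_train` from the cells above, `pd.Series(dtree.feature_importances_, index=X_train.columns)` would show them. The self-contained sketch below does the same on synthetic data, so its ranking says nothing about the real features; scikit-learn's impurity-based importances are non-negative and sum to 1 for a tree with at least one split.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data that reuses the notebook's feature names
feature_names = ["Age", "Gender", "Marital Status", "Occupation", "Monthly Income",
                 "Educational Qualifications", "Family size", "Feedback"]
X_arr, y_demo = make_classification(n_samples=286, n_features=8, n_informative=3, random_state=7)
X_demo = pd.DataFrame(X_arr, columns=feature_names)

# Same settings as the notebook's tree
tree_demo = DecisionTreeClassifier(max_depth=5, random_state=42).fit(X_demo, y_demo)

# Each value is the feature's share of the tree's total impurity reduction
importances = pd.Series(tree_demo.feature_importances_, index=feature_names).sort_values(ascending=False)
print(importances.round(3))
```

Impurity-based importances can favor features with many distinct values, so `sklearn.inspection.permutation_importance` computed on the test set is a common cross-check.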










## Sample Inputs to Check the Model

In [None]:
def predict_online_food_use(user_input, label_encoders, scaler, log_reg, dtree):
    # Convert the input dictionary to a one-row DataFrame (pd is imported in the first cell)
    sample = pd.DataFrame([user_input])

    # Encode categorical variables with the encoders fitted during preprocessing
    for col in ['Gender', 'Marital Status', 'Occupation', 'Monthly Income',
                'Educational Qualifications', 'Feedback']:
        sample[col] = label_encoders[col].transform(sample[col])

    # Scale numerical features with the scaler fitted on the training set
    sample[['Age', 'Family size']] = scaler.transform(sample[['Age', 'Family size']])

    # Put columns in the training order; scikit-learn checks feature names at predict time
    sample = sample[list(log_reg.feature_names_in_)]

    # Predictions (1 = Yes, 0 = No)
    log_pred = log_reg.predict(sample)[0]
    tree_pred = dtree.predict(sample)[0]

    print("Predictions for Sample User:")
    print(f"Logistic Regression: {'Yes' if log_pred == 1 else 'No'}")
    print(f"Decision Tree: {'Yes' if tree_pred == 1 else 'No'}")
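
`LabelEncoder.transform` raises a `ValueError` when it meets a label it did not see during fitting, so any typo in `user_input` stops the prediction. This matters here because the dataset spells the occupation "Self Employeed" (as used in `user_1`), so the correct spelling "Self Employed" would fail. A hypothetical helper, `find_unknown_labels`, can check inputs against each encoder's `classes_` first; the encoder below is fitted on a made-up list of three occupations rather than the real column.

```python
from sklearn.preprocessing import LabelEncoder

def find_unknown_labels(user_input, label_encoders):
    """Hypothetical helper: return {column: value} for every categorical value
    the fitted encoder has never seen (a missing key shows up as None)."""
    unknown = {}
    for col, encoder in label_encoders.items():
        value = user_input.get(col)
        if value not in set(encoder.classes_):
            unknown[col] = value
    return unknown

# Made-up encoder for illustration; the notebook fits its encoders on the dataset
encoders = {"Occupation": LabelEncoder().fit(["Employee", "Self Employeed", "Student"])}

print(find_unknown_labels({"Occupation": "Self Employed"}, encoders))
print(find_unknown_labels({"Occupation": "Self Employeed"}, encoders))
```

Calling the helper with the notebook's `label_encoders` before `predict_online_food_use` would turn a traceback into a readable list of the fields to correct.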


In [None]:
# Define two sample users
user_1 = {
    'Age': 55,
    'Gender': 'Male',
    'Marital Status': 'Married',
    'Occupation': 'Self Employeed',
    'Monthly Income': 'Below 10000',
    'Educational Qualifications': 'Graduate',
    'Family size': 6,
    'Feedback': 'Negative'
}

user_2 = {
    'Age': 25,
    'Gender': 'Male',
    'Marital Status': 'Single',
    'Occupation': 'Student',
    'Monthly Income': 'Below 10000',
    'Educational Qualifications': 'Graduate',
    'Family size': 3,
    'Feedback': 'Positive'
}

# Call the prediction function for each user
predict_online_food_use(user_1, label_encoders, scaler, log_reg, dtree)
print("\n")
predict_online_food_use(user_2, label_encoders, scaler, log_reg, dtree)

Predictions for Sample User:
Logistic Regression: No
Decision Tree: No


Predictions for Sample User:
Logistic Regression: Yes
Decision Tree: Yes
