# Waiter's Tip Prediction using Machine Learning
This notebook walks through an end‑to‑end workflow for predicting restaurant tips. It covers exploratory data analysis (EDA), preprocessing, and the training and evaluation of several regression models, compared by mean absolute error.

In [None]:
import numpy as np
import pandas as pd
import seaborn as sb
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor, AdaBoostRegressor
from xgboost import XGBRegressor
from sklearn.metrics import mean_absolute_error

import warnings
warnings.filterwarnings('ignore')

## Load Dataset

In [None]:
# If tips.csv is in the same folder, this will load it.
df = pd.read_csv('tips.csv')
df.head()

## Basic Data Overview

In [None]:
df.info()

In [None]:
df.describe().T

## Exploratory Data Analysis

In [None]:
plt.figure(figsize=(12,5))
for i, col in enumerate(['total_bill', 'tip']):
    plt.subplot(1,2,i+1)
    sb.histplot(df[col], kde=True)
    plt.title(f'Distribution of {col}')
plt.tight_layout()
plt.show()

## Outlier Removal

In [None]:
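# Drop rows with extreme bills or tips using fixed cut-offs.
# .copy() returns an independent frame, so later column assignments
# do not trigger SettingWithCopyWarning.
df = df[(df['total_bill'] < 45) & (df['tip'] < 7)].copy()
df.shape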
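
The cut-offs above (45 for `total_bill`, 7 for `tip`) are fixed values. As an alternative sketch, and not the method this notebook uses, the common 1.5 × IQR rule derives the bounds from the data itself. The helper `iqr_bounds` and the toy frame below are hypothetical and only stand in for `tips.csv`.

```python
import pandas as pd

def iqr_bounds(series: pd.Series, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) bounds using the k * IQR rule."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Toy data standing in for tips.csv (illustrative values, not real records)
toy = pd.DataFrame({
    'total_bill': [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 90.0],
    'tip':        [1.5, 2.0, 2.0, 2.5, 3.0, 3.0, 3.5, 15.0],
})

# Keep a row only if every checked column lies inside its IQR bounds
mask = pd.Series(True, index=toy.index)
for col in ['total_bill', 'tip']:
    low, high = iqr_bounds(toy[col])
    mask &= toy[col].between(low, high)

toy_clean = toy[mask].copy()
print(toy_clean.shape)  # (7, 2): the row with the 90.0 bill is dropped
```

Data-driven bounds adapt if the dataset changes, while fixed cut-offs are easier to read and explain; which one suits this dataset better is a judgment call.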

## Encoding Categorical Features

In [None]:
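# Integer-encode every non-numeric column. Selecting by "not numeric" also
# covers pandas versions that load text as a string dtype instead of object.
# A fresh encoder per column keeps each column's mapping separate.
for col in df.select_dtypes(exclude='number').columns:
    df[col] = LabelEncoder().fit_transform(df[col])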
df.head()
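
`LabelEncoder` assigns integer codes in sorted label order, so the codes imply an ordering that may not mean anything for the model. The sketch below shows the resulting mapping on a toy column and, as an alternative that this notebook does not use, one-hot encoding with `pd.get_dummies`. The `day` column and its values are assumptions made for illustration.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy column; the name and values are assumed for illustration
toy = pd.DataFrame({'day': ['Sun', 'Sat', 'Thur', 'Fri', 'Sun']})

le = LabelEncoder()
codes = le.fit_transform(toy['day'])

# classes_ is sorted, so the code is the label's position in that order
mapping = {str(label): code for code, label in enumerate(le.classes_)}
print(mapping)  # {'Fri': 0, 'Sat': 1, 'Sun': 2, 'Thur': 3}

# One-hot alternative: one indicator column per category, no implied order
one_hot = pd.get_dummies(toy, columns=['day'])
print(one_hot.columns.tolist())
```

Tree-based models usually tolerate integer codes, while linear models treat them as magnitudes, which is one reason to consider one-hot encoding for `LinearRegression`.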

## Correlation Heatmap

In [None]:
plt.figure(figsize=(6,6))
sb.heatmap(df.corr(), annot=True, cmap='coolwarm', fmt='.2f')
plt.show()

## Train‑Test Split & Feature Scaling

In [None]:
X = df.drop('tip', axis=1)
y = df['tip']
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=22
)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_val = scaler.transform(X_val)
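
The scaler is fitted on the training split only, and the validation split is transformed with the training statistics, which keeps validation information out of preprocessing. A minimal sketch on synthetic data (the arrays and the `_s` suffixed names are hypothetical) shows the effect: the scaled training features have mean 0 and unit variance, while the validation features are only approximately standardized.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic features standing in for the encoded tips data
rng = np.random.default_rng(22)
X = rng.normal(loc=20.0, scale=8.0, size=(200, 3))
y = rng.normal(size=200)

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=22
)

scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)  # learn mean/std from train only
X_val_s = scaler.transform(X_val)          # reuse the train mean/std

print(X_train_s.mean(axis=0).round(6))  # zeros, up to floating-point error
print(X_val_s.mean(axis=0).round(3))    # near zero, but not exactly zero
```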

## Model Training & Evaluation

In [None]:
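# Fix random_state on the randomized models so the MAE ranking is reproducible.
models = [
    LinearRegression(),
    XGBRegressor(eval_metric='mae', random_state=22),
    RandomForestRegressor(random_state=22),
    AdaBoostRegressor(random_state=22)
]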
results = {}
for model in models:
    model.fit(X_train, y_train)
    preds = model.predict(X_val)
    mae_val = mean_absolute_error(y_val, preds)
    results[type(model).__name__] = mae_val

# Display results sorted by MAE
results = dict(sorted(results.items(), key=lambda x: x[1]))
results
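
Mean absolute error is the average of the absolute differences between predicted and actual tips, so it is expressed in the same units as `tip`. The sketch below computes it by hand, checks the result against `mean_absolute_error`, and compares it with a mean-only baseline from `DummyRegressor`. The numbers are made up for illustration and are not results from this notebook.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_absolute_error

# Made-up actual and predicted tips
y_true = np.array([2.0, 3.0, 4.0, 5.0])
y_pred = np.array([2.5, 3.0, 3.0, 5.5])

# MAE by hand: mean of |actual - predicted|
mae_manual = np.mean(np.abs(y_true - y_pred))
mae_sklearn = mean_absolute_error(y_true, y_pred)
print(mae_manual, mae_sklearn)  # both equal 0.5

# Baseline that always predicts the mean of the targets it was fitted on
baseline = DummyRegressor(strategy='mean')
X_dummy = np.zeros((len(y_true), 1))  # features are ignored by the baseline
baseline.fit(X_dummy, y_true)
mae_baseline = mean_absolute_error(y_true, baseline.predict(X_dummy))
print(mae_baseline)  # 1.0
```

A model is only useful if its validation MAE is clearly below such a baseline, so adding one to `results` gives the other scores a reference point.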

## Conclusion
**RandomForestRegressor** typically yields the lowest Mean Absolute Error on this dataset, which suggests that models able to capture non‑linear relationships describe tipping behaviour better than a simple linear model. The ranking can shift between runs and splits, so compare the values in `results` before drawing conclusions. Further improvements could come from hyperparameter tuning, feature engineering (e.g., interaction terms), or other gradient boosting implementations beyond XGBoost.
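
As a starting point for the hyperparameter tuning mentioned above, a minimal sketch with `GridSearchCV` might look like the following. It runs on synthetic data, and the grid values are arbitrary assumptions rather than recommended settings for the tips dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic data: the target depends on the first feature plus noise
rng = np.random.default_rng(22)
X = rng.uniform(0, 50, size=(120, 2))
y = 0.15 * X[:, 0] + rng.normal(scale=0.5, size=120)

# Small, arbitrary grid; widen it for a real search
param_grid = {'n_estimators': [25, 50], 'max_depth': [3, None]}

search = GridSearchCV(
    RandomForestRegressor(random_state=22),
    param_grid,
    scoring='neg_mean_absolute_error',  # scikit-learn maximizes, hence the sign
    cv=3,
)
search.fit(X, y)

best_mae = -search.best_score_  # flip the sign back to a positive MAE
print(search.best_params_, round(best_mae, 3))
```

Because the score is averaged over cross-validation folds, it is less sensitive to one particular train/validation split than the single hold-out MAE used earlier.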