# 📈 Sales Prediction with Machine Learning

Welcome to this notebook on **Sales Prediction** using Python and machine learning. In this project, we will:

- Explore and understand the advertising dataset.
- Visualize the relationship between advertising spending and sales.
- Build a predictive model using linear regression.
- Evaluate and interpret the model's performance.

Let's get started! 🚀

In [1]:
# 📦 Import Required Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

sns.set(style="whitegrid")  # Use a white grid background for all plots

## 📑 Load the Dataset

We will use the `advertising.xls` file located in the same directory as this notebook. The dataset records advertising spend on TV, radio, and newspaper alongside the resulting sales. Reading `.xls` files with `pd.read_excel` requires the `xlrd` package to be installed.

In [2]:
# 🔍 Load the dataset
df = pd.read_excel("advertising.xls")

# Quick preview
df.head()

## 🧹 Data Cleaning and Exploration

Inspect the dataset for missing values, data types, and descriptive statistics.

In [3]:
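# Dataset information (column types and non-null counts)
df.info()

# Check for missing values; print explicitly, because a cell only displays its last expression
print(df.isnull().sum())

# Descriptive statistics (displayed as the last expression in the cell)
df.describe()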
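
The checks above can also be gathered into a single table. The following is a minimal sketch, not part of the original workflow: `summarize_quality` is a hypothetical helper, and `demo` is a small made-up frame standing in for `df` so that the example runs on its own.

```python
import numpy as np
import pandas as pd

# Made-up values for illustration only; this is NOT the advertising dataset.
demo = pd.DataFrame({
    "TV": [100.0, 50.0, np.nan, 200.0],
    "Radio": [10.0, 20.0, 30.0, 40.0],
    "Newspaper": [5.0, 15.0, 25.0, 35.0],
    "Sales": [12.0, 8.0, 9.0, 20.0],
})

def summarize_quality(frame):
    """Hypothetical helper: one row per column with dtype and missing-value counts."""
    return pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "missing": frame.isnull().sum(),
        "missing_pct": frame.isnull().mean() * 100,
    })

quality = summarize_quality(demo)
n_duplicates = int(demo.duplicated().sum())  # count of fully duplicated rows

print(quality)
print(f"Duplicate rows: {n_duplicates}")
```

Calling `summarize_quality(df)` on the real data would give the same kind of table, which makes it easier to decide whether any rows need to be dropped or imputed before modeling.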

## 📊 Data Visualization

Visualize the relationship between advertising spend on each channel (TV, Radio, Newspaper) and sales.

In [4]:
# Scatter plots of each advertising channel against sales
sns.pairplot(df, x_vars=['TV', 'Radio', 'Newspaper'], y_vars='Sales', height=5, aspect=0.8, kind='scatter')
plt.suptitle("Advertising Spend vs Sales", fontsize=16, y=1.02)  # y > 1 keeps the title above the plots
plt.show()

# Correlation heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(df.select_dtypes(include='number').corr(), annot=True, cmap='coolwarm', linewidths=0.5)  # numeric columns only
plt.title("Feature Correlations")
plt.show()

## 🛠️ Feature Selection

Use the three advertising channels as features and `Sales` as the target variable.

In [5]:
# Select features and target
X = df[['TV', 'Radio', 'Newspaper']]
y = df['Sales']

## 🔥 Model Building

We'll use a **Linear Regression** model to predict sales based on advertising spending.

In [6]:
# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the model
lr_model = LinearRegression()

# Train the model
lr_model.fit(X_train, y_train)
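
Once a linear regression is fitted, its coefficients show how much the prediction changes per unit of spend on each channel. The sketch below illustrates this on synthetic data, which is an assumption made so the example runs without `advertising.xls`. The names `X_demo`, `y_demo`, and `demo_model` are hypothetical, and the "true" coefficients are invented for the demonstration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic data with a known relationship (NOT the advertising dataset).
rng = np.random.default_rng(42)
n = 200
X_demo = pd.DataFrame({
    "TV": rng.uniform(0, 300, n),
    "Radio": rng.uniform(0, 50, n),
    "Newspaper": rng.uniform(0, 100, n),
})
# Invented ground truth: Newspaper has no effect on the target.
y_demo = 3.0 + 0.05 * X_demo["TV"] + 0.2 * X_demo["Radio"] + rng.normal(0, 0.5, n)

demo_model = LinearRegression().fit(X_demo, y_demo)

# Pair each fitted coefficient with its feature name for readability.
coef_table = pd.Series(demo_model.coef_, index=X_demo.columns, name="coefficient")
print(f"Intercept: {demo_model.intercept_:.2f}")
print(coef_table.round(3))
```

The same pattern applies to the model trained above: `pd.Series(lr_model.coef_, index=X.columns)` lists one coefficient per advertising channel.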

## 📈 Model Evaluation

Let's evaluate the model using key metrics like MAE, RMSE, and R².

In [7]:
# Predict on test set
y_pred = lr_model.predict(X_test)

# Evaluation metrics
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)

print(f"Mean Absolute Error (MAE): {mae:.2f}")
print(f"Root Mean Squared Error (RMSE): {rmse:.2f}")
print(f"R² Score: {r2:.2f}")
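
To make the three metrics concrete, here is a small worked example that computes them by hand with NumPy, following their standard definitions. The four values in `y_true` and `y_hat` are made up for illustration and are not model output.

```python
import numpy as np

# Made-up actual and predicted values, chosen so the arithmetic is easy to follow.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.0, 5.0, 8.0, 11.0])

residuals = y_true - y_hat                      # [1, 0, -1, -2]

mae_manual = np.mean(np.abs(residuals))         # (1 + 0 + 1 + 2) / 4
rmse_manual = np.sqrt(np.mean(residuals ** 2))  # sqrt((1 + 0 + 1 + 4) / 4)

ss_res = np.sum(residuals ** 2)                 # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares around the mean
r2_manual = 1 - ss_res / ss_tot

print(f"MAE:  {mae_manual:.2f}")   # 1.00
print(f"RMSE: {rmse_manual:.2f}")  # 1.22
print(f"R²:   {r2_manual:.2f}")    # 0.70
```

MAE and RMSE are in the same units as the target, and RMSE penalizes large errors more heavily than MAE. R² is the fraction of the target's variance that the predictions explain.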

## 📝 Conclusion

In this project, we built a **sales prediction model** using linear regression. We:

- Explored and visualized the dataset.
- Built and trained a linear regression model.
- Evaluated the model using key performance metrics.

This workflow can be extended to include:
- Polynomial regression or advanced models (e.g. Random Forest, XGBoost).
- Feature engineering (e.g. interaction terms).
- Deployment as a web application for real-time sales predictions.
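
As one possible starting point for the feature engineering idea above, the sketch below adds an interaction term with scikit-learn's `PolynomialFeatures` inside a pipeline. It uses synthetic data in which the target is constructed to depend on a TV × Radio product. That dependence is an assumption for the demonstration, not a finding about the advertising dataset, and all variable names here are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data with a built-in interaction effect (NOT the advertising dataset).
rng = np.random.default_rng(0)
n = 300
tv = rng.uniform(0, 300, n)
radio = rng.uniform(0, 50, n)
X_syn = np.column_stack([tv, radio])
y_syn = 5.0 + 0.05 * tv + 0.1 * radio + 0.001 * tv * radio + rng.normal(0, 0.5, n)

X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, test_size=0.2, random_state=42)

# Baseline: plain linear regression on the two raw features.
plain = LinearRegression().fit(X_tr, y_tr)

# Extended: add the pairwise product tv * radio, then fit the same model.
with_interaction = make_pipeline(
    PolynomialFeatures(degree=2, interaction_only=True, include_bias=False),
    LinearRegression(),
).fit(X_tr, y_tr)

r2_plain = r2_score(y_te, plain.predict(X_te))
r2_interaction = r2_score(y_te, with_interaction.predict(X_te))

print(f"R² without interaction: {r2_plain:.3f}")
print(f"R² with interaction:    {r2_interaction:.3f}")
```

On real data the interaction term may or may not help, so the two test-set scores should be compared before adopting the larger model.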

Happy Learning! 🚀