# 📊 Sales Prediction using Advertising Data

## 🧩 Problem Statement
The goal of this project is to predict **product sales** based on advertising budgets across different platforms.
We’ll fit a linear regression model to estimate sales from the amounts spent on **TV, Radio, and Newspaper ads**.
This type of model is useful for businesses to optimize advertising strategies and maximize ROI.
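
For reference, the `LinearRegression` estimator fitted in Step 5 corresponds to the model below. The symbols $\beta_0, \dots, \beta_3$ are notation introduced here, not names used in the notebook's code.

$$
\widehat{\text{Sales}} = \beta_0 + \beta_1\,\text{TV} + \beta_2\,\text{Radio} + \beta_3\,\text{Newspaper}
$$

Ordinary least squares chooses the coefficients that minimize the sum of squared differences between actual and predicted sales on the training rows.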

In [None]:
# 📥 Step 1: Import Libraries and Load Data
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset
df = pd.read_csv('Advertising.csv')
df.head()

In [None]:
# 🧹 Step 2: Explore and Clean Data
df.info()  # info() prints its report directly and returns None, so no print() is needed
print("\nMissing values:\n", df.isnull().sum())
print("\nSummary Statistics:\n", df.describe())

In [None]:
# 📊 Step 3: Visualize Relationships
sns.heatmap(df.corr(), annot=True, cmap='coolwarm')
plt.title("Correlation Matrix")
plt.show()

sns.pairplot(df, x_vars=['TV', 'Radio', 'Newspaper'], y_vars='Sales', height=4, aspect=1, kind='scatter')
plt.suptitle("Ad Spend vs Sales", y=1.02)
plt.show()

In [None]:
# ✂️ Step 4: Prepare Data for Machine Learning
from sklearn.model_selection import train_test_split

X = df[['TV', 'Radio', 'Newspaper']]
y = df['Sales']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

In [None]:
# 📈 Step 5: Train a Linear Regression Model
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

model = LinearRegression()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

In [None]:
# 📊 Step 6: Evaluate the Model
print("📉 Mean Squared Error:", mean_squared_error(y_test, y_pred))
# R² is the share of variance in sales explained by the model, not a classification accuracy
print("📈 R² Score:", r2_score(y_test, y_pred))

plt.scatter(y_test, y_pred, alpha=0.7)
plt.xlabel("Actual Sales")
plt.ylabel("Predicted Sales")
plt.title("Actual vs Predicted Sales")
plt.show()
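
MSE is expressed in squared sales units, which can be hard to interpret. As a minimal sketch, the helper below also reports RMSE and MAE, which are in the same units as `Sales`. The function name `regression_report` is hypothetical, and the three-value arrays are made-up numbers used only to show the call; in the notebook it would be called with `y_test` and `y_pred`.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_report(y_true, y_pred):
    """Collect common regression metrics in one dictionary."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "mse": float(mse),
        "rmse": float(np.sqrt(mse)),  # same units as the target
        "mae": float(mean_absolute_error(y_true, y_pred)),
        "r2": float(r2_score(y_true, y_pred)),
    }


# Made-up values for illustration only (not taken from Advertising.csv)
report = regression_report([10, 12, 14], [11, 12, 13])
for name, value in report.items():
    print(f"{name}: {value:.3f}")
```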

In [None]:
# 📦 Step 7: Predict on New Data
new_data = pd.DataFrame([[150, 20, 10]], columns=['TV', 'Radio', 'Newspaper'])
prediction = model.predict(new_data)
print("📈 Predicted Sales:", round(prediction[0], 2))

## ✅ Conclusion

- The linear regression model predicts sales from advertising spend; the MSE and R² from Step 6 show how well it generalizes to the held-out test set.
- TV and Radio advertising have stronger correlations with sales than Newspaper.
- The model can be used by businesses to forecast sales and optimize ad budgets.
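
One way to check these conclusions against the fitted model, rather than against correlations alone, is to look at its coefficients. The sketch below is an assumption-laden illustration: `coefficient_table` is a hypothetical helper, and the toy data is noise-free and invented so that the recovered values are easy to verify. It is not the Advertising data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def coefficient_table(fitted_model, feature_names):
    """Return the intercept and per-feature coefficients as a labeled Series."""
    values = np.concatenate([[fitted_model.intercept_], fitted_model.coef_])
    return pd.Series(values, index=["intercept", *feature_names])


# Invented, noise-free toy data: Sales = 2 + 0.05*TV + 0.2*Radio + 0*Newspaper
toy_X = pd.DataFrame({
    "TV": [100, 200, 50, 150, 250],
    "Radio": [10, 40, 30, 20, 5],
    "Newspaper": [20, 10, 60, 30, 45],
})
toy_y = 2 + 0.05 * toy_X["TV"] + 0.2 * toy_X["Radio"]

toy_model = LinearRegression().fit(toy_X, toy_y)
toy_coefs = coefficient_table(toy_model, toy_X.columns)
print(toy_coefs.round(3))
```

In the notebook itself, the equivalent call would be `coefficient_table(model, X.columns)`. Each coefficient is the change in predicted sales per one-unit increase in that channel's budget with the other channels held fixed, so magnitudes are only directly comparable if all three budgets are recorded in the same units.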

### 🚀 Next Steps
- Try polynomial or Ridge regression and compare the test-set metrics against the linear baseline.
- Perform feature engineering (e.g., interaction terms).
- Deploy the model in a Streamlit or Flask app.
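
The first two items can be sketched together: a pipeline that adds pairwise interaction terms and fits a Ridge model, compared against plain linear regression. Everything below is illustrative. The data is a synthetic stand-in generated with an arbitrary TV × Radio interaction built in, so the comparison favors the interaction model by construction and says nothing about the real `Advertising.csv`. The names `demo`, `ridge_model`, and `alpha=1.0` are assumptions, not tuned choices.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in for Advertising.csv (hypothetical values, not the real data)
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    "TV": rng.uniform(0, 300, n),
    "Radio": rng.uniform(0, 50, n),
    "Newspaper": rng.uniform(0, 100, n),
})
demo["Sales"] = (
    3
    + 0.045 * demo["TV"]
    + 0.1 * demo["Radio"]
    + 0.001 * demo["TV"] * demo["Radio"]  # built-in interaction effect
    + rng.normal(0, 1, n)
)

X_demo = demo[["TV", "Radio", "Newspaper"]]
y_demo = demo["Sales"]
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

# Baseline: the same plain linear model used in Step 5
baseline_r2 = LinearRegression().fit(X_tr, y_tr).score(X_te, y_te)

# Interaction terms -> scaling -> Ridge (L2-regularized linear regression)
ridge_model = make_pipeline(
    PolynomialFeatures(degree=2, interaction_only=True, include_bias=False),
    StandardScaler(),
    Ridge(alpha=1.0),
)
ridge_model.fit(X_tr, y_tr)
ridge_r2 = ridge_model.score(X_te, y_te)

print("Baseline R²:", round(baseline_r2, 3))
print("Ridge + interactions R²:", round(ridge_r2, 3))
```

Scaling is placed after the feature expansion so that the Ridge penalty treats the original and interaction columns on a comparable scale. On real data, `alpha` would normally be chosen by cross-validation, for example with `RidgeCV`.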