# Sales Prediction Based on Campaigning Expenses

This notebook loads `advertising.csv`, performs basic EDA, trains a Linear Regression (and a Random Forest for comparison), evaluates model performance, and saves the trained linear model coefficients to `campaign-sales-bot/public/model-coefs.json` so the frontend can load them directly.

Notes:
- The dataset columns are `TV`, `Radio`, `Newspaper`, and `Sales`. The ad spend columns use the same numeric units as the historical data (commonly thousands of dollars). The frontend accepts dollar amounts, so dollar inputs must be converted to dataset units before prediction (e.g. divide dollars by 1000). The project already performs this conversion automatically in `src/lib/model.ts`.
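
The unit conversion described in the note above can be sketched in Python as follows. This is only an illustration: it assumes the dataset units really are thousands of dollars, and the names `DOLLARS_PER_UNIT` and `dollars_to_dataset_units` are hypothetical, not taken from `src/lib/model.ts`.

```python
# Hypothetical helper for illustration; not part of the project code.
# Assumption: one dataset unit equals 1000 dollars.
DOLLARS_PER_UNIT = 1000.0


def dollars_to_dataset_units(dollars: float) -> float:
    """Convert a dollar amount to the dataset's units (assumed thousands)."""
    if dollars < 0:
        raise ValueError("ad spend cannot be negative")
    return dollars / DOLLARS_PER_UNIT


print(dollars_to_dataset_units(5000))  # 5.0
print(dollars_to_dataset_units(250))   # 0.25
```

Converting unconditionally, rather than only above some threshold, keeps small budgets such as $250 on the same scale as large ones.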

In [None]:
# Imports
import pandas as pd  # type: ignore
import numpy as np  # type: ignore
import matplotlib.pyplot as plt  # type: ignore
import seaborn as sns  # type: ignore
from sklearn.model_selection import train_test_split  # type: ignore
from sklearn.linear_model import LinearRegression  # type: ignore
from sklearn.ensemble import RandomForestRegressor  # type: ignore
from sklearn.metrics import r2_score, mean_squared_error  # type: ignore
import json
import os

sns.set(style="darkgrid")

In [None]:
# Load the dataset
df = pd.read_csv('advertising.csv')
df.head()

In [None]:
# Basic info and missing values check
print('Shape:', df.shape)
print('Info:')
df.info()  # info() prints its own report and returns None, so it is not wrapped in print()
print('Missing values:', df.isnull().sum())
print('Summary statistics:')
print(df.describe())

In [None]:
# Correlation heatmap
plt.figure(figsize=(8,6))
sns.heatmap(df.corr(), annot=True, cmap='viridis', fmt='.3f')
plt.title('Feature correlation')
plt.show()

In [None]:
# Prepare features and target
X = df[['TV','Radio','Newspaper']].astype(float)
y = df['Sales'].astype(float)

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(f'Train samples: {X_train.shape[0]}, Test samples: {X_test.shape[0]}')

In [None]:
# Train Linear Regression
lr = LinearRegression()
lr.fit(X_train, y_train)
print('Linear Regression trained')
print('Intercept:', lr.intercept_)
print('Coefficients:', lr.coef_)

In [None]:
# Train a Random Forest for comparison
rf = RandomForestRegressor(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
print('Random Forest trained')

In [None]:
# Evaluate both models on the test set
lr_preds = lr.predict(X_test)
rf_preds = rf.predict(X_test)

def metrics(y_true, y_pred):
    return {'r2': r2_score(y_true, y_pred), 'rmse': np.sqrt(mean_squared_error(y_true, y_pred))}

print('Linear Regression metrics:', metrics(y_test, lr_preds))
print('Random Forest metrics:', metrics(y_test, rf_preds))

In [None]:
# Save Linear Regression coefficients for the frontend (public/model-coefs.json)
intercept = float(lr.intercept_)
coefs = [float(c) for c in lr.coef_]  # TV, Radio, Newspaper
sumTV = X['TV'].sum()
sumRadio = X['Radio'].sum()
sumNews = X['Newspaper'].sum()
total = sumTV + sumRadio + sumNews
channel_shares = [float(sumTV/total), float(sumRadio/total), float(sumNews/total)]
out = {
  'intercept': intercept,
  'betas': coefs,
  'channelShares': channel_shares
}
out_path = os.path.join('campaign-sales-bot', 'public', 'model-coefs.json')
os.makedirs(os.path.dirname(out_path), exist_ok=True)
with open(out_path, 'w') as f:
    json.dump(out, f, indent=2)

print('Saved model coefficients to', out_path)
print(out)

In [None]:
# Quick example predictions
def predict_from_total(total_dollars, model_coefs=out, dollars=True):
    # convert dollars to dataset units (thousands) whenever the input is in dollars
    value = total_dollars / 1000.0 if dollars else total_dollars
    shares = model_coefs['channelShares']
    tv = value * shares[0]
    radio = value * shares[1]
    news = value * shares[2]
    pred = model_coefs['intercept'] + model_coefs['betas'][0]*tv + model_coefs['betas'][1]*radio + model_coefs['betas'][2]*news
    return pred

examples = [5000, 50000]
for e in examples:
    print('Total dollars:', e, '-> predicted sales:', round(predict_from_total(e),2))

## Notes & Next steps

- The Linear Regression model gives a simple, explainable mapping from ad spend (TV/Radio/Newspaper) to Sales. The Random Forest can capture nonlinear effects and interactions between channels and may perform better, but it is less interpretable.
- The frontend now loads `model-coefs.json` to make predictions client-side. Ensure the frontend converts dollar inputs to the dataset units (division by 1000); the repo's `src/lib/model.ts` already performs this conversion by default.
- You can retrain the model by re-running the training cell and re-saving the coefficients. If you want the frontend to use the Random Forest instead, serialize the RF model (joblib/pickle) and serve it from an API or use a lightweight JS model serializer (more work).
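
As a rough sketch of what a consumer of `model-coefs.json` does with the file, the snippet below writes a coefficient dictionary with the same keys as the export cell (`intercept`, `betas`, `channelShares`), reads it back, and applies the linear formula. The numbers are placeholders chosen for easy arithmetic, not the fitted coefficients, and `predict_sales` is a hypothetical helper; the actual frontend logic in `src/lib/model.ts` may differ.

```python
import json
import os
import tempfile


def predict_sales(coefs, tv, radio, newspaper):
    """Apply the exported linear model: intercept + sum(beta_i * spend_i).

    Spends are in dataset units (assumed thousands), not dollars.
    """
    b_tv, b_radio, b_news = coefs["betas"]  # order: TV, Radio, Newspaper
    return coefs["intercept"] + b_tv * tv + b_radio * radio + b_news * newspaper


# Placeholder values for illustration only, NOT the fitted coefficients.
example_coefs = {
    "intercept": 1.0,
    "betas": [0.5, 0.25, 0.0],
    "channelShares": [0.5, 0.25, 0.25],
}

# Round-trip through a temporary JSON file to mimic save-then-load.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model-coefs.json")
    with open(path, "w") as f:
        json.dump(example_coefs, f, indent=2)
    with open(path) as f:
        loaded = json.load(f)

prediction = predict_sales(loaded, tv=10.0, radio=4.0, newspaper=2.0)
print(prediction)  # 1.0 + 0.5*10 + 0.25*4 + 0.0*2 = 7.0
```

Running the same check against the real file after retraining, and comparing the result with `lr.predict` on the same inputs, is one way to confirm that the export and the client-side formula agree.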