## 📘 Feature Description – Monthly Sales Dataset

This dataset simulates the monthly sales performance of a **single product** across different time periods or markets. It includes five features that influence the number of units sold (`monthly_sales`), which is the target variable.

### 🔍 Features:

- **`ad_spend`**  
  Monthly advertising budget (in $1000s). Higher ad spend typically boosts sales.

- **`product_price`**  
  Selling price of one unit of the product (in $) in each of the 300 monthly records. The company keeps adjusting the price of the same product for business reasons such as promotions, seasonality, market testing, and discounts.

- **`market_trend_index`**  
  Index reflecting overall market demand (scaled from 40 to 100). A higher value means stronger customer demand.

- **`seasonality_index`**  
  Seasonal multiplier (range: 0.5 to 1.5) showing how seasonal effects boost or reduce sales: 1.0 is normal, above 1.0 is a seasonal boost, and below 1.0 is a seasonal dip.
  For example, ice cream sells more in summer and coats sell more in winter.

- **`social_media_mentions`**  
  Total number of times the product was mentioned on social platforms in that month. More mentions often indicate higher customer awareness or virality.

### 🎯 Target:

- **`monthly_sales`**  
  The number of units sold in a given month. This is the value you're predicting in a regression model.


In [6]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [5]:
sales_data = pd.read_csv('./monthly_sales_dataset.csv')

sales_data.head()

Unnamed: 0,ad_spend,product_price,market_trend_index,seasonality_index,social_media_mentions,monthly_sales
0,19.77,63.91,64.43,1.06,3358,1573.46
1,24.53,20.34,43.96,0.86,2917,1626.73
2,8.99,18.45,60.93,1.16,8736,1850.34
3,14.93,91.87,46.66,0.74,3741,1046.01
4,31.92,70.23,88.49,0.69,4558,1862.77


In [3]:
sales_data['market_trend_index'].min()

40.01

In [4]:
sales_data['market_trend_index'].max()

99.69

In [8]:
sales_data.isna().sum()

ad_spend                 0
product_price            0
market_trend_index       0
seasonality_index        0
social_media_mentions    0
monthly_sales            0
dtype: int64

In [None]:
X = sales_data.drop(columns=['monthly_sales'])
y = sales_data['monthly_sales']
y.shape

(300,)

In [15]:
# Standardization: (x - mean) / std
X_scaled = (X - X.mean()) / X.std()
X_scaled.describe()

Unnamed: 0,ad_spend,product_price,market_trend_index,seasonality_index,social_media_mentions
count,300.0,300.0,300.0,300.0,300.0
mean,3.611926e-16,1.776357e-16,3.671137e-16,3.13823e-16,-1.006602e-16
std,1.0,1.0,1.0,1.0,1.0
min,-1.696445,-1.746115,-1.651523,-1.754022,-1.651693
25%,-0.8649714,-0.7628258,-0.8204991,-0.8158564,-0.7473298
50%,0.0270494,-0.02852006,-0.04591619,0.05281524,-0.08050441
75%,0.9097407,0.7997917,0.9029022,0.8172463,0.8171046
max,1.682762,1.769456,1.700089,1.651171,1.82015
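
One detail of the standardization above is easy to miss: pandas `.std()` divides by `n - 1` (sample standard deviation), which is why the `std` row in the table is exactly 1.0. Tools that use the population standard deviation instead (NumPy's `np.std` default, and scikit-learn's `StandardScaler`) give slightly different scaled values. A minimal sketch on a toy column, where the numbers are made up for illustration and are not taken from the dataset:

```python
import pandas as pd

# Toy stand-in for a single feature column (illustrative values only)
toy = pd.DataFrame({"ad_spend": [10.0, 20.0, 30.0, 40.0]})

# pandas default: sample standard deviation (ddof=1), as used in the cell above
z_sample = (toy - toy.mean()) / toy.std()

# Population standard deviation (ddof=0), the convention np.std uses by default
z_population = (toy - toy.mean()) / toy.std(ddof=0)

# Each version has unit spread only under its own convention
print(z_sample["ad_spend"].std())            # measured with ddof=1
print(z_population["ad_spend"].std(ddof=0))  # measured with ddof=0
print((z_population / z_sample)["ad_spend"].iloc[0])  # constant ratio between the two
```

With 300 rows the two conventions differ by a factor of about `sqrt(300 / 299)`, so the choice barely changes the fit, but it matters when comparing learned weights against another library's output.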


In [16]:
# Min-Max Scaling: (x - min) / (max - min)
X_scaled_minmax = (X - X.min()) / (X.max() - X.min())
X_scaled_minmax.describe()

Unnamed: 0,ad_spend,product_price,market_trend_index,seasonality_index,social_media_mentions
count,300.0,300.0,300.0,300.0,300.0
mean,0.502025,0.49668,0.492755,0.515102,0.47574
std,0.295927,0.284449,0.298364,0.293669,0.288031
min,0.0,0.0,0.0,0.0,0.0
25%,0.246056,0.279696,0.247947,0.27551,0.260485
50%,0.510029,0.488568,0.479055,0.530612,0.452552
75%,0.771242,0.72418,0.762148,0.755102,0.711091
max,1.0,1.0,1.0,1.0,1.0
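
Min-max scaling only guarantees the 0 to 1 range on the data whose minimum and maximum were used. A small sketch, assuming hypothetical `train` and `new` frames with invented prices, shows what happens when the training statistics are reused on later rows:

```python
import pandas as pd

# Hypothetical frames with invented prices (not from the sales dataset)
train = pd.DataFrame({"product_price": [10.0, 20.0, 30.0]})
new = pd.DataFrame({"product_price": [5.0, 25.0, 40.0]})

# Statistics come from the training data only
col_min = train.min()
col_range = train.max() - train.min()

train_scaled = (train - col_min) / col_range
new_scaled = (new - col_min) / col_range  # reuse training min and range

print(train_scaled["product_price"].tolist())
print(new_scaled["product_price"].tolist())
```

Values below the training minimum come out negative and values above the training maximum exceed 1, which is expected behaviour rather than a bug.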


In [17]:
X.columns

Index(['ad_spend', 'product_price', 'market_trend_index', 'seasonality_index',
       'social_media_mentions'],
      dtype='object')

In [26]:
# Standardization: (x - mean) / std
X_scaled = (X - X.mean()) / X.std()


In [35]:
# Prediction equation:
# y = b0 + w1*x1 + w2*x2 + w3*x3 + w4*x4 + w5*x5

# initialize parameters (intercept and weights)
b0 = 0
w1 = 0
w2 = 0
w3 = 0
w4 = 0
w5 = 0
learning_rate = 0.00001
epochs = 1000


# optimize the parameters using gradient descent
for epoch in range(epochs):
    # calculate predictions
    y_pred = (
        b0
        + w1 * X_scaled['ad_spend']
        + w2 * X_scaled['product_price']
        + w3 * X_scaled['market_trend_index']
        + w4 * X_scaled['seasonality_index']
        + w5 * X_scaled['social_media_mentions']
    )

    errors = y - y_pred

    # gradients: derivatives of the sum of squared errors with respect to each parameter
    db0 = -2 * np.sum(errors)
    dw1 = -2 * np.sum(errors * X_scaled['ad_spend'])
    dw2 = -2 * np.sum(errors * X_scaled['product_price'])
    dw3 = -2 * np.sum(errors * X_scaled['market_trend_index'])
    dw4 = -2 * np.sum(errors * X_scaled['seasonality_index'])
    dw5 = -2 * np.sum(errors * X_scaled['social_media_mentions'])

    # step size = learning rate * gradient: how far to move down the slope for each parameter
    step_size_intercept = learning_rate * db0
    step_size_w1 = learning_rate * dw1
    step_size_w2 = learning_rate * dw2
    step_size_w3 = learning_rate * dw3
    step_size_w4 = learning_rate * dw4
    step_size_w5 = learning_rate * dw5

    # update the parameters
    b0 = b0 - step_size_intercept
    w1 = w1 - step_size_w1
    w2 = w2 - step_size_w2
    w3 = w3 - step_size_w3
    w4 = w4 - step_size_w4
    w5 = w5 - step_size_w5

    # log the parameters every 100 epochs
    if epoch % 100 == 0:
        rss = np.mean(errors ** 2)  # mean squared error (RSS / n), printed under the label "RSS"
        print(f'Epoch {epoch}: b0={b0:.3f}, w1={w1:.3f}, w2={w2:.3f}, w3={w3:.3f}, w4={w4:.3f}, w5={w5:.3f}, RSS={rss:.3f}')

print("\n🏁 Training completed!")
print(f"Final parameters:")
print(f"RSS= {rss:.4f}")
print(f"b0 (intercept)       = {b0:.4f}")
print(f"w1 (ad_spend)       = {w1:.4f}")
print(f"w2 (product_price)  = {w2:.4f}")
print(f"w3 (market_trend)   = {w3:.4f}")
print(f"w4 (seasonality)    = {w4:.4f}")
print(f"w5 (social_media)   = {w5:.4f}")

Epoch 0: b0=11.302, w1=2.259, w2=-0.837, w3=-0.053, w4=0.168, w5=1.537, RSS=3790223.030
Epoch 100: b0=857.972, w1=172.904, w2=-62.203, w3=1.632, w4=16.058, w5=119.380, RSS=1144184.190
Epoch 200: b0=1321.795, w1=268.509, w2=-94.095, w3=9.445, w4=28.796, w5=187.341, RSS=347080.079
Epoch 300: b0=1575.887, w1=322.449, w2=-110.491, w3=17.629, w4=38.087, w5=226.604, RSS=106595.171
Epoch 400: b0=1715.084, w1=353.092, w2=-118.807, w3=24.339, w4=44.513, w5=249.328, RSS=33917.345
Epoch 500: b0=1791.339, w1=370.613, w2=-122.949, w3=29.292, w4=48.811, w5=262.504, RSS=11910.083
Epoch 600: b0=1833.113, w1=380.692, w2=-124.964, w3=32.742, w4=51.620, w5=270.157, RSS=5231.071
Epoch 700: b0=1855.997, w1=386.525, w2=-125.912, w3=35.060, w4=53.428, w5=274.611, RSS=3198.722
Epoch 800: b0=1868.534, w1=389.917, w2=-126.335, w3=36.579, w4=54.576, w5=277.206, RSS=2578.411
Epoch 900: b0=1875.402, w1=391.900, w2=-126.508, w3=37.557, w4=55.300, w5=278.722, RSS=2388.409

🏁 Training completed!
Final parameters:
RSS
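
The loop above updates six scalars one by one. The same update can be written with a single matrix product, and the result can be checked against the closed-form least-squares solution. The sketch below runs on synthetic data: `X_demo`, `y_demo`, and `true_w` are invented stand-ins, not the sales dataset, and the epoch count is raised to 3000 as an assumption so the run has time to settle. The check is worth having because the training log shows `b0` still rising by several units between epochs 800 and 900, which suggests 1000 epochs at this learning rate had not fully converged.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the sales dataset): 300 rows, 5 standardized features
n_rows, n_features = 300, 5
X_demo = rng.normal(size=(n_rows, n_features))
X_demo = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0, ddof=1)
true_w = np.array([400.0, -120.0, 40.0, 55.0, 280.0])  # made-up weights
y_demo = 1900.0 + X_demo @ true_w + rng.normal(scale=50.0, size=n_rows)

# Prepend a column of ones so the intercept is handled like any other weight
A = np.column_stack([np.ones(n_rows), X_demo])
params = np.zeros(n_features + 1)  # [b0, w1, ..., w5]
learning_rate = 0.00001
epochs = 3000  # assumption: more epochs than the original run

for epoch in range(epochs):
    errors = y_demo - A @ params
    gradient = -2 * A.T @ errors  # gradient of the sum of squared errors
    params -= learning_rate * gradient

# Closed-form least-squares solution for comparison
params_exact, *_ = np.linalg.lstsq(A, y_demo, rcond=None)

print("gradient descent:", np.round(params, 3))
print("least squares:   ", np.round(params_exact, 3))
```

If the two printed vectors agree, the learning rate and epoch count are adequate; a visible gap is a sign that training stopped early.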

In [37]:
# predicting on unseen data points

import pandas as pd

'''
Held-out rows: the five features followed by the actual monthly_sales,
listed in the same order as X_test_raw below.
40.76,69.27,63.55,0.87,2835,1977.13
6.56,78.95,85.26,1.42,9967,1759.88
31.23,86.13,95.11,1.08,8590,2292.98
49.79,65.22,97.06,1.04,1056,2142.89
43.51,17.97,74.63,0.77,9406,3016.54
7.86,16.99,87.25,1.4,1574,1181.81
28.47,53.89,61.43,0.87,6723,2144.47
42.41,46.68,55.06,1.17,4560,2277.59
'''

X_test_raw = pd.DataFrame([
    [40.76,69.27,63.55,0.87,2835],
    [6.56,78.95,85.26,1.42,9967],
    [31.23,86.13,95.11,1.08,8590],
    [49.79,65.22,97.06,1.04,1056],
    [43.51,17.97,74.63,0.77,9406],
    [7.86,16.99,87.25,1.4,1574],
    [28.47,53.89,61.43,0.87,6723],
    [42.41,46.68,55.06,1.17,4560]
], columns=X.columns)  # use same feature order as training set

# Scale the test rows with the training set's mean and std
X_test_scaled = (X_test_raw - X.mean()) / X.std()

# Predict with the parameters learned by the gradient descent run above
y_test_pred = (
    1879.1366
    + 393.0553 * X_test_scaled['ad_spend']
    - 126.5663 * X_test_scaled['product_price']
    + 39.2070 * X_test_scaled['market_trend_index']
    + 55.7494 * X_test_scaled['seasonality_index']
    + 279.6020 * X_test_scaled['social_media_mentions']
)

print("🔮 Predicted Monthly Sales:")
print(y_test_pred)


🔮 Predicted Monthly Sales:
0    1960.876332
1    1743.079073
2    2266.680973
3    2183.477042
4    2947.445879
5    1266.606196
6    2046.059676
7    2331.024031
dtype: float64
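
Since the commented-out rows in the cell above include the actual `monthly_sales` for these eight months, the predictions can be scored directly. The sketch below copies the printed predictions and the actual values by hand. The actuals are matched to rows by their feature values, following the order of `X_test_raw` (the `ad_spend` 7.86 row is index 5 and the 28.47 row is index 6). The names `y_test_pred_values` and `y_test_actual` are introduced here for illustration.

```python
import numpy as np

# Predictions copied from the printed output above (indices 0-7)
y_test_pred_values = np.array([
    1960.876332, 1743.079073, 2266.680973, 2183.477042,
    2947.445879, 1266.606196, 2046.059676, 2331.024031,
])

# Actual monthly_sales from the commented rows, in X_test_raw order
y_test_actual = np.array([
    1977.13, 1759.88, 2292.98, 2142.89,
    3016.54, 1181.81, 2144.47, 2277.59,
])

residuals = y_test_actual - y_test_pred_values
mae = np.mean(np.abs(residuals))         # mean absolute error, in units sold
rmse = np.sqrt(np.mean(residuals ** 2))  # root mean squared error, in units sold

print(f"MAE  = {mae:.2f}")
print(f"RMSE = {rmse:.2f}")
```

Both metrics are in units sold, so they can be read against the scale of the actual values, which range from roughly 1,200 to 3,000 units in these eight rows.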
