Problem Statement

For the hsales dataset, forecast the monthly values for 1996 using the seasonal indexing method.

1. Time Series: A series of numbers recorded at regular intervals over time (monthly, yearly, etc.).
Example: Monthly home sales.

2. Seasonality: A repeating pattern that happens at the same time each year.
Example: Sales may always rise during summer.

3. Seasonal Index: A number that tells how a particular month compares to the overall average.
If index = 1.20 → Month is 20% above normal
If index = 0.85 → Month is 15% below normal

We calculate 12 indexes for monthly data.

4. Deseasonalization: Removing the seasonal effect from the data.
Deseasonalized Value = Actual Value / Seasonal Index

5. Trend: The overall long-term direction of the data (upward or downward).

6. Forecasting Using Seasonal Indexing
Steps:
Step 1: Compute the monthly averages.
Step 2: Compute the overall average.
Step 3: Compute the seasonal index for each month.
Step 4: Remove seasonality (deseasonalize).
Step 5: Fit a trend line to the deseasonalized data.
Step 6: Predict the 1996 trend values.
Step 7: Apply the seasonal index back.

Forecast = Trend Value × Seasonal Index
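
The seven steps can be sketched end to end as a single function. This is a minimal illustration on synthetic data, not the notebook's own code: the function name `seasonal_index_forecast`, the `pattern` values, and the flat level of 100 are all assumptions chosen so the result is easy to verify by eye.

```python
import numpy as np
import pandas as pd


def seasonal_index_forecast(y: pd.Series, horizon: int = 12) -> pd.Series:
    """Sketch of the seven steps for a monthly series indexed by month-start dates."""
    month = pd.Series(y.index.month, index=y.index)

    # Steps 1-3: monthly averages, overall average, seasonal index per month.
    index_by_month = y.groupby(month).mean() / y.mean()

    # Step 4: divide each observation by the index of its month.
    deseasonalized = y / month.map(index_by_month)

    # Step 5: fit a straight line to the deseasonalized values.
    t = np.arange(1, len(y) + 1)
    slope, intercept = np.polyfit(t, deseasonalized.to_numpy(), 1)

    # Step 6: extend the line over the next `horizon` months.
    future_dates = pd.date_range(y.index[-1], periods=horizon + 1, freq="MS")[1:]
    future_t = np.arange(len(y) + 1, len(y) + horizon + 1)
    trend = intercept + slope * future_t

    # Step 7: multiply each trend value by the index of its calendar month.
    season = np.array([index_by_month[m] for m in future_dates.month])
    return pd.Series(trend * season, index=future_dates)


# Hypothetical data: three years at a flat level of 100 with a fixed seasonal pattern.
pattern = np.array([0.8, 0.9, 1.1, 1.2, 1.2, 1.1, 1.0, 1.0, 0.9, 0.9, 1.0, 0.9])
dates = pd.date_range("2000-01-01", periods=36, freq="MS")
y = pd.Series(100 * np.tile(pattern, 3), index=dates)

forecast = seasonal_index_forecast(y)
print(forecast.round(2))
```

Because the synthetic series has no trend and an exactly repeating pattern, the forecast should reproduce `100 * pattern` for the following year. Real data with a trend will not be recovered this cleanly.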

Goal
Forecast the 12 monthly home sales values of 1996 for the hsales dataset using seasonal indexing.

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
import pandas as pd
import numpy as np

### Load data

In [3]:
df = pd.read_csv('/content/drive/MyDrive/Forecasting_Seasonal_Indexing/hsales.csv')
df

,date,hsales
0,1/1/1973,55
1,2/1/1973,60
2,3/1/1973,68
3,4/1/1973,63
4,5/1/1973,65
...,...,...
270,7/1/1995,64
271,8/1/1995,63
272,9/1/1995,55
273,10/1/1995,54


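### Parse the date column as datetime and use it as the index

In [4]:
# Dates look like 1/1/1973 (month/day/year), so state the format explicitly.
df['date'] = pd.to_datetime(df['date'], format='%m/%d/%Y')
df.set_index('date', inplace=True)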

### Extract year and month

In [5]:
df['Year'] = df.index.year
df['Month'] = df.index.month

### STEP 1: Calculate monthly average sales across all years

In [15]:
monthly_avg = df.groupby('Month')['hsales'].mean()
monthly_avg

Month,hsales
1,45.347826
2,51.130435
3,61.391304
4,60.26087
5,59.608696
6,57.0
7,53.782609
8,55.130435
9,50.826087
10,49.73913


### STEP 2: Calculate overall average

In [17]:
overall_avg = df['hsales'].mean()
overall_avg

np.float64(52.28727272727273)

### STEP 3: Calculate seasonal index

In [18]:
seasonal_index = monthly_avg / overall_avg
seasonal_index

Month,hsales
1,0.867282
2,0.977875
3,1.174116
4,1.152496
5,1.140023
6,1.090131
7,1.028598
8,1.054376
9,0.972055
10,0.951266
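
The indexes above come from dividing each monthly mean by the overall mean. When the months do not all have the same number of observations, indexes built this way need not average exactly 1. A common refinement, not applied in this notebook, is to rescale them so that they do. The sketch below uses hypothetical index values for a four-season example; `raw_index` and `normalized_index` are illustrative names.

```python
import pandas as pd

# Hypothetical raw indexes for four seasons; their mean is 1.02 rather than 1.
raw_index = pd.Series([0.90, 1.25, 1.08, 0.85], index=[1, 2, 3, 4])

# Rescale so that the indexes average exactly 1.
normalized_index = raw_index / raw_index.mean()

print(round(raw_index.mean(), 4), round(normalized_index.mean(), 4))
```

In the notebook the same rescaling would be `seasonal_index / seasonal_index.mean()`, applied before the deseasonalization step.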


### STEP 4: Deseasonalize data

In [9]:
df['Deseasonalized'] = df['hsales'] / df['Month'].map(seasonal_index)

In [19]:
df['Deseasonalized']

date,Deseasonalized
1973-01-01,63.416491
1973-02-01,61.357514
1973-03-01,57.915931
1973-04-01,54.663967
1973-05-01,57.016391
...,...
1995-07-01,62.220586
1995-08-01,59.750992
1995-09-01,56.581180
1995-10-01,56.766427
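
As a quick check of the formula Deseasonalized Value = Actual Value / Seasonal Index, the first three rows can be recomputed by hand. The sketch below copies the hsales values from the data preview and the rounded indexes printed in STEP 3, so results may differ from the notebook output in the last decimal places. The names `sample` and `printed_index` are illustrative.

```python
import pandas as pd

# First three rows of the data preview (January to March 1973).
sample = pd.DataFrame({"Month": [1, 2, 3], "hsales": [55, 60, 68]})

# Seasonal indexes for months 1-3 as printed in STEP 3 (rounded to 6 decimals).
printed_index = pd.Series({1: 0.867282, 2: 0.977875, 3: 1.174116})

# Same operation as the notebook cell: look up each row's index and divide.
sample["Deseasonalized"] = sample["hsales"] / sample["Month"].map(printed_index)
print(sample.round(3))
```

January is a below-average month (index under 1), so its deseasonalized value is higher than the actual value. March is above average, so its value is pulled down.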


### STEP 5: Fit Trend Line
### Create time index (1, 2, 3,...)

In [10]:
df['Time'] = np.arange(1, len(df) + 1)

### Fit linear trend model on deseasonalized data

In [20]:
coeff = np.polyfit(df['Time'], df['Deseasonalized'], 1)
trend_line = np.poly1d(coeff)
trend_line

poly1d([2.51895860e-03, 5.19396564e+01])
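
The two numbers in the printed `poly1d` are the slope and the intercept of the trend line, highest power first. A small sketch to read them, with the coefficients copied from the output above and `trend` as an illustrative name:

```python
import numpy as np

# Coefficients copied from the printed poly1d (slope first, then intercept).
trend = np.poly1d([2.51895860e-03, 5.19396564e+01])
slope, intercept = trend.coeffs

# The slope is the change in deseasonalized sales per time step (one month).
print(f"change per month: {slope:.5f}, per year: {12 * slope:.4f}")

# Evaluating the polynomial at a time index gives the trend value there.
print(f"trend at t=1: {trend(1):.4f}, at t=276: {trend(276):.4f}")
```

The slope is close to zero, so the fitted trend is nearly flat: it rises by roughly 0.03 units per year from a level of about 52.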

### STEP 6: Predict trend values for 1996

In [12]:
last_time = df['Time'].max()
future_time = np.arange(last_time + 1, last_time + 13)
trend_pred_1996 = trend_line(future_time)
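
The overall average printed in STEP 2 is an exact multiple of 1/275, which suggests that the series has 275 rows and ends in November 1995. If that is the case, `last_time + 1` corresponds to December 1995 rather than January 1996. One way to avoid depending on where the data ends is to derive the time positions from the calendar. The sketch below assumes the January 1973 to November 1995 range; `dates`, `target`, and `future_time_1996` are illustrative names.

```python
import numpy as np
import pandas as pd

# Assumed date range: monthly data from January 1973 through November 1995.
dates = pd.date_range("1973-01-01", "1995-11-01", freq="MS")
time = np.arange(1, len(dates) + 1)   # same 1, 2, 3, ... index as df['Time']

# The twelve months to forecast.
target = pd.date_range("1996-01-01", "1996-12-01", freq="MS")

# Position of each target month on the time axis, counted from the first date.
first = dates[0]
future_time_1996 = (
    (target.year - first.year) * 12 + (target.month - first.month) + 1
).to_numpy()

print(len(dates), time[-1], future_time_1996[0], future_time_1996[-1])
# prints: 275 275 277 288
```

Under this assumption January 1996 sits at time index 277, one step later than `last_time + 1`. With a slope of about 0.0025 per month the effect on each trend value is small, but the calendar-based index keeps every month paired with its own seasonal index.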

### STEP 7: Apply seasonal index to get final forecasts

In [21]:
forecast_1996 = trend_pred_1996 * seasonal_index.values
forecast_1996

array([45.64930754, 51.47282341, 61.80536116, 60.67020519, 60.01647291,
       57.39267742, 54.15571226, 55.51554448, 51.18357751, 50.09137197,
       43.43795282, 39.82931695])

### Create forecast table

In [14]:
forecast_df = pd.DataFrame({
    'Month': range(1, 13),
    'TrendValue': trend_pred_1996,
    'SeasonalIndex': seasonal_index.values,
    'ForecastedSales_1996': forecast_1996
})

forecast_df

,Month,TrendValue,SeasonalIndex,ForecastedSales_1996
0,1,52.634889,0.867282,45.649308
1,2,52.637408,0.977875,51.472823
2,3,52.639927,1.174116,61.805361
3,4,52.642446,1.152496,60.670205
4,5,52.644965,1.140023,60.016473
5,6,52.647484,1.090131,57.392677
6,7,52.650003,1.028598,54.155712
7,8,52.652522,1.054376,55.515544
8,9,52.655041,0.972055,51.183578
9,10,52.65756,0.951266,50.091372


Final output:
“We assume monthly home sales follow a multiplicative model: trend × seasonal × noise. For each calendar month we average sales across all years and divide by the overall average; this ratio is the seasonal index for that month. Next we remove seasonality (divide each observation by the index of its month) and fit a linear trend to the deseasonalized series. To forecast 1996 we extend the trend line 12 months ahead and then reapply the monthly seasonal indexes (multiply) to get the final monthly forecasts. This method is simple and interpretable, and it works well when seasonal fluctuations scale with the level of the series.”