## Demand Classification Based on Product Forecastability

Forecast accuracy depends strongly on how forecastable a product is. To assess this, we apply two coefficients:

* Average Demand Interval (ADI): measures the regularity of demand over time as the average interval between two consecutive demands.
* Squared Coefficient of Variation (CV²): measures the variation in demand quantities.
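As a minimal sketch of both coefficients, the toy example below computes ADI and CV² for a single hypothetical product. The `demand` series and its dates are made up for illustration, and ADI is taken here as the mean gap in days between consecutive non-zero demands, which is one common convention and an assumption of this sketch.

```python
import pandas as pd

# Hypothetical daily demand for one product (zeros are days without sales)
demand = pd.Series(
    [5, 0, 0, 4, 6, 0, 5],
    index=pd.date_range("2024-01-01", periods=7, freq="D"),
)

# Keep only the days on which demand occurred
nonzero = demand[demand != 0]

# ADI: mean gap, in days, between consecutive non-zero demands
gaps = nonzero.index.to_series().diff().dt.days.dropna()
adi = gaps.mean()

# CV²: squared ratio of the standard deviation to the mean of the non-zero quantities
cv_sqr = (nonzero.std() / nonzero.mean()) ** 2

print(f"ADI = {adi:.2f}, CV² = {cv_sqr:.3f}")
```

In this toy series the gaps are 3, 1 and 2 days, so the demand is irregular in time while the quantities stay close to their mean.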

Based on these two dimensions, we can classify demand profiles into four categories:

a) Smooth demand (ADI < 1.32 and CV² < 0.49): The demand is regular in time and in quantity. It is therefore easy to forecast, with a low forecast error.

b) Intermittent demand (ADI >= 1.32 and CV² < 0.49): The demand history shows very little variation in quantity but a high variation in the interval between two demands. Specific forecasting methods exist for intermittent demand, but the forecast error margin is still higher than for smooth demand.

c) Erratic demand (ADI < 1.32 and CV² >= 0.49): The demand occurs regularly in time but with high variation in quantity. Forecast accuracy is unreliable.

d) Lumpy demand (ADI >= 1.32 and CV² >= 0.49): The demand varies widely in both quantity and timing. Producing a reliable forecast is very difficult, whichever forecasting tool is used.
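The four rules above can be sketched as a vectorized classification. The `toy` DataFrame, its product names and its coefficient values are hypothetical; only the cut-offs 1.32 and 0.49 come from the definitions above.

```python
import numpy as np
import pandas as pd

ADI_CUTOFF = 1.32   # cut-off on the average demand interval
CV2_CUTOFF = 0.49   # cut-off on the squared coefficient of variation

# Hypothetical products with made-up coefficients, one per category
toy = pd.DataFrame({
    "product": ["A", "B", "C", "D"],
    "ADI": [1.0, 2.5, 1.1, 3.0],
    "cv_sqr": [0.20, 0.30, 0.80, 1.50],
})

low_adi = toy["ADI"] < ADI_CUTOFF
low_cv = toy["cv_sqr"] < CV2_CUTOFF

# The first matching condition wins; anything left has high ADI and high CV²
toy["category"] = np.select(
    [low_adi & low_cv, ~low_adi & low_cv, low_adi & ~low_cv],
    ["Smooth", "Intermittent", "Erratic"],
    default="Lumpy",
)
print(toy)
```

Because each boundary is written once as `<` and negated for the other side, every product falls into exactly one category, including values that sit exactly on a cut-off.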


In [None]:
import pandas as pd

In [None]:
df=pd.read_csv('../data_raw/train.csv')

In [None]:
df.info()

In [None]:
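# Keep only rows with actual demand; copy so that later column assignments do not raise SettingWithCopyWarning
df = df[df["sales"] != 0].copy()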

In [None]:
df['date']=pd.to_datetime(df['date'])

In [None]:
df['dayofweek']=df['date'].dt.dayofweek

In [None]:
df['dayofweek'].value_counts()

In [None]:
# Strip any time component so that each row carries a plain calendar date
df['date']=df['date'].dt.normalize()

## Squared Coefficient of Variation (CV²)

In [None]:
## Grouping by product family and date to get date-wise total sales

retail_grouped= df.groupby(['family','date']).agg(total_sale=('sales','sum')).reset_index()

In [None]:
# Calculating the average and standard deviation of daily sales per family

cv_data = retail_grouped.groupby('family').agg(average=('total_sale','mean'),
                                               sd=('total_sale','std')).reset_index()

In [None]:
## Calculating CV_squared

cv_data['cv_sqr'] = (cv_data['sd']/cv_data['average'])**2
cv_data

## Average Demand Interval (ADI) per Product

In [None]:
prod_by_date= df.groupby(['family','date']).agg(count=('family','count')).reset_index()

In [None]:
skus=prod_by_date.family.value_counts()

In [None]:
## Product sku's list

skus

In [None]:
new_df= pd.DataFrame()

In [None]:
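for i in range(len(skus.index)):
    # Copy the slice so that adding a column does not raise SettingWithCopyWarning
    a= prod_by_date[prod_by_date['family']==skus.index[i]].copy()
    # Date of the previous sale for the same family (groupby leaves rows sorted by date)
    a['previous_date']=a['date'].shift(1)
    new_df=pd.concat([new_df,a],axis=0)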

In [None]:
new_df.info()

In [None]:
new_df['duration']=new_df['date']- new_df['previous_date']

In [None]:
# Interval in whole days; the first sale of each family has no previous date and stays NaN
new_df['Duration']=new_df['duration'].dt.days

In [None]:
## Calculating ADI

ADI = new_df.groupby('family').agg(ADI = ('Duration','mean')).reset_index()

In [None]:
ADI

In [None]:
## Combining ADI and CV² per family

adi_cv=pd.merge(ADI,cv_data,on='family')

In [None]:
adi_cv

In [None]:
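## Defining a function for categorization

def category(row):
    # Cut-offs follow the definitions above: ADI = 1.32 and CV² = 0.49
    if row['ADI'] < 1.32 and row['cv_sqr'] < 0.49:
        return 'Smooth'
    if row['ADI'] >= 1.32 and row['cv_sqr'] < 0.49:
        return 'Intermittent'
    if row['ADI'] < 1.32 and row['cv_sqr'] >= 0.49:
        return 'Erratic'
    if row['ADI'] >= 1.32 and row['cv_sqr'] >= 0.49:
        return 'Lumpy'
    # Reached only when ADI or CV² is missing (NaN)
    return 'Unclassified'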

In [None]:
## Categorizing products based on their forecastability

adi_cv['category']=adi_cv.apply(category,axis=1)

## Conclusion: Final list of SKUs categorized by forecastability

In [None]:
## Categorized list

adi_cv.head()

In [None]:
import seaborn as sns

In [None]:
## Visualizing the categories

sns.scatterplot(x='cv_sqr',y='ADI',hue='category',data=adi_cv)

In [None]:
## Final category counts

adi_cv.category.value_counts()

# Sales Count

In [None]:
df = pd.read_csv('../data_raw/train.csv')

In [None]:
# Calculate total sales over a rolling window (e.g., 28 days) for each product family
window_size = 28  # Change this to your desired window (e.g., 7 for weekly)

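# Aggregate to one row per family and date so the window spans days rather than raw rows
df_sorted = (
    df.assign(date=pd.to_datetime(df['date']))
    .groupby(['family', 'date'], as_index=False)['sales']
    .sum()
    .sort_values(['family', 'date'])
)
df_sorted['year'] = df_sorted['date'].dt.year
df_sorted['month'] = df_sorted['date'].dt.month

# Rolling sum lagged by one window, computed within each family so that
# neither the window nor the shift leaks across families
df_sorted['rolling_sales'] = (
    df_sorted.groupby("family")['sales']
    .transform(lambda s: s.rolling(window_size).sum().shift(window_size))
)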

# Plot rolling sales for each product family
import matplotlib.pyplot as plt

families = df_sorted['family'].unique()
num_families = len(families)
ncols = 5
nrows = (num_families + ncols - 1) // ncols

fig, axes = plt.subplots(nrows, ncols, figsize=(20, 4 * nrows), sharex=True)
axes = axes.flatten()

for idx, fam in enumerate(families):
    fam_data = df_sorted[df_sorted['family'] == fam]
    axes[idx].plot(fam_data['date'], fam_data['rolling_sales'], label='Rolling Sales', color='tab:blue')
    axes[idx].set_title(f'Family: {fam}')
    axes[idx].set_ylabel('Rolling Sales')
    axes[idx].legend()
    axes[idx].tick_params(axis='x', rotation=45)

# Hide any unused subplots
for j in range(idx + 1, len(axes)):
    fig.delaxes(axes[j])

plt.tight_layout()
plt.suptitle("28-Day Rolling Sales by Product Family", fontsize=18, y=1.02)
plt.show()

In [None]:
import plotly.express as px

window_size = 28  # or 7 for weekly

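# Aggregate to one row per family and date so the window spans days rather than raw rows
df_sorted = (
    df.assign(date=pd.to_datetime(df['date']))
    .groupby(['family', 'date'], as_index=False)['sales']
    .sum()
    .sort_values(['family', 'date'])
)
df_sorted['year'] = df_sorted['date'].dt.year

# Rolling sum lagged by one window, computed within each family so that
# neither the window nor the shift leaks across families
df_sorted['rolling_sales'] = (
    df_sorted.groupby("family")['sales']
    .transform(lambda s: s.rolling(window_size).sum().shift(window_size))
)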

# Aggregate rolling sales by year for each family
yearly_rolling = (
    df_sorted.groupby(['family', 'year'])['rolling_sales']
    .sum()
    .reset_index()
)

# Plot
fig = px.line(
    yearly_rolling,
    x="year",
    y="rolling_sales",
    color="family",
    title="Yearly Rolling Sales by Family"
)
fig.show()

In [None]:
import plotly.express as px

window_size = 28  # or 7 for weekly

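# Aggregate to one row per family and date so the window spans days rather than raw rows
df_sorted = (
    df.assign(date=pd.to_datetime(df['date']))
    .groupby(['family', 'date'], as_index=False)['sales']
    .sum()
    .sort_values(['family', 'date'])
)
df_sorted['month'] = df_sorted['date'].dt.to_period('M')

# Rolling sum lagged by one window, computed within each family so that
# neither the window nor the shift leaks across families
df_sorted['rolling_sales'] = (
    df_sorted.groupby("family")['sales']
    .transform(lambda s: s.rolling(window_size).sum().shift(window_size))
)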

# Plot
fig = px.line(
    df_sorted,
    x="date",
    y="rolling_sales",
    color="family",
    title="28-Day Rolling Sales by Family"
)
fig.show()