![](https://cdn.cheapism.com/images/Going_Out_of_Business_Sales.31c7340b.fill-1440x754.png)

### **This notebook will analyze the data and run Prophet for Time Series Forecasting**

### **Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects.**
### **It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well.**
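
As a rough sketch of what "additive model" means here, the Prophet paper writes the forecast as a sum of components. The symbols below follow that formulation and are not otherwise used in this notebook:

$$
y(t) = g(t) + s(t) + h(t) + \varepsilon_t
$$

Here $g(t)$ is the (possibly non-linear) trend, $s(t)$ the periodic seasonality (yearly, weekly, daily), $h(t)$ the holiday effects, and $\varepsilon_t$ the remaining error term.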

### **Prophet is open source software released by Facebook's Core Data Science team.**

### **Source Link :** [Prophet](https://facebook.github.io/prophet/)

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
import statsmodels.api as sm
import gc

pio.renderers.default = "notebook"

import warnings
warnings.filterwarnings("ignore")
warnings.filterwarnings("ignore", category=DeprecationWarning)
warnings.filterwarnings("ignore", category=FutureWarning)
warnings.filterwarnings("ignore", category=UserWarning)

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.width', 500)
pd.set_option('display.float_format', lambda x: '%.3f' % x)

In [None]:
train = pd.read_csv("/kaggle/input/store-sales-time-series-forecasting/train.csv")
test = pd.read_csv("/kaggle/input/store-sales-time-series-forecasting/test.csv")
stores = pd.read_csv("/kaggle/input/store-sales-time-series-forecasting/stores.csv")
transactions = pd.read_csv("/kaggle/input/store-sales-time-series-forecasting/transactions.csv").sort_values(["store_nbr", "date"])
oil = pd.read_csv("/kaggle/input/store-sales-time-series-forecasting/oil.csv")
holidays = pd.read_csv("/kaggle/input/store-sales-time-series-forecasting/holidays_events.csv")

### **The notebook works with 6 different datasets.**

### **Train and test datasets are for modelling.**

### **The stores, transactions, oil, and holidays datasets are supplementary and will be used for analysis and modelling.**

# DATA VISUALIZATION AND EDA

In [None]:
train['date'] = pd.to_datetime(train['date'])
daily_sales = train.groupby('date')['sales'].sum().reset_index()
   
fig = px.line(daily_sales, x='date', y='sales', title='Daily Sales Over Time')
fig.show(renderer='iframe')

### **In the plot above, daily sales show a slightly increasing trend over time.**

## STORE ANALYSIS

In [None]:
store_type_sales = train.merge(stores, on='store_nbr')
store_type_performance = store_type_sales.groupby('type')['sales'].agg(['mean', 'std', 'count'])

In [None]:
store_type_analysis = store_type_sales.groupby('type').agg({
    'sales': ['mean', 'std', 'count', 'sum'],
    'store_nbr': 'nunique'}).round(2)

store_type_analysis.columns = ['avg_sales', 'std_sales', 'transaction_count', 'total_sales', 'store_count']
store_type_analysis = store_type_analysis.reset_index()

fig1 = go.Figure()

fig1.add_trace(go.Bar(name='Average Sales', x=store_type_analysis['type'],
    y=store_type_analysis['avg_sales'], marker_color='skyblue'))

fig1.add_trace(go.Bar(name='Sales Variability (Std)', x=store_type_analysis['type'],
    y=store_type_analysis['std_sales'], marker_color='lightcoral'))

fig1.update_layout(title='Store Type Performance Analysis', barmode='group',
    xaxis_title='Store Type', yaxis_title='Sales', showlegend=True)

fig1.show(renderer='iframe')

### **In the plot above, the standard deviation of sales appears to be greater than the average sales value.**

### **This points to three related characteristics of the data:**

### **High Variability: Sales fluctuate widely. Some sales values are very high while others are very low, so deviations from the mean are large.**

### **Low Predictability: Sales are harder to predict. Uncertainty about how much sales figures will vary can make planning difficult for the business.**

### **Data Scatter: The data has a wide distribution and many observations may lie far from the mean. This can pose a risk to the business because sales of one product may be very low while sales of another are very high.**
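
To make the "standard deviation larger than the mean" point concrete, here is a minimal sketch on a made-up sample. The values are hypothetical and only mimic the shape of item-level retail data (many zeros, a few large sales); they are not taken from the dataset.

```python
import numpy as np

# Hypothetical toy sample: mostly zero-sales rows plus a few large values
sales = np.array([0, 0, 0, 0, 0, 0, 5, 10, 20, 400], dtype=float)

mean = sales.mean()
std = sales.std(ddof=1)      # sample standard deviation (the pandas default)
cv = std / mean * 100        # coefficient of variation, in percent

print(f"mean={mean:.2f}, std={std:.2f}, CV={cv:.1f}%")
```

A coefficient of variation above 100% is typical of such zero-heavy, right-skewed data, so it says more about the shape of the distribution than about any single store type.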

In [None]:
store_type_analysis['sales_per_store'] = (store_type_analysis['total_sales'] / 
                                         store_type_analysis['store_count']).round(2)
store_type_analysis['coefficient_of_variation'] = (store_type_analysis['std_sales'] / 
                                                 store_type_analysis['avg_sales'] * 100).round(2)


monthly_type_sales = store_type_sales.groupby([pd.Grouper(key='date', freq='M'),'type'])['sales'].mean().reset_index()

fig2 = px.line(monthly_type_sales, x='date', 
              y='sales', color='type',
              title='Monthly Average Sales by Store Type')

fig2.show(renderer='iframe')

### **The monthly averages show that type A stores have clearly higher sales than the other store types.**
### **Type A stores are followed by types B, E, and D, while type C stores have the lowest sales.**

## OIL PRICE IMPACT

In [None]:
oil['date'] = pd.to_datetime(oil['date'])
oil = oil.rename(columns={'dcoilwtico': 'oil_price'})
oil = oil.sort_values('date')

oil['oil_price'] = oil['oil_price'].ffill()

sales_oil = train.merge(oil, on='date', how='left')

daily_sales_oil = train.groupby('date')['sales'].sum().reset_index()
daily_sales_oil = daily_sales_oil.merge(oil, on='date', how='left')

fig = px.scatter(daily_sales_oil, x='oil_price', y='sales',
                trendline="ols", title='Correlation between Oil Prices and Total Sales')

fig.show(renderer='iframe')

In [None]:
corr = daily_sales_oil['oil_price'].corr(daily_sales_oil['sales'])

print("Correlation:", corr)

### **The relationship between oil prices and total daily sales shows a strong negative correlation of about -0.7.**

### **One possible explanation is that higher oil prices raise transportation and freight costs and weigh on sales. A correlation alone does not establish causation, though, and both series also trend over time.**
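
A correlation between two trending series can be strong even when their day-to-day movements are unrelated. The sketch below illustrates this with synthetic data; the series, names, and parameters are all hypothetical and only serve to show why comparing changes rather than levels is a useful cross-check.

```python
import numpy as np
import pandas as pd

# Hypothetical toy series: one trends down, one trends up,
# and their daily noise is independent by construction.
rng = np.random.default_rng(0)
n = 500
oil_like = pd.Series(100 - 0.1 * np.arange(n) + rng.normal(0, 2, n))
sales_like = pd.Series(50 + 0.2 * np.arange(n) + rng.normal(0, 5, n))

# Correlation of the levels is dominated by the opposing trends
level_corr = oil_like.corr(sales_like)

# Correlation of the daily changes removes the trend and measures co-movement
diff_corr = oil_like.diff().corr(sales_like.diff())

print(f"levels: {level_corr:.2f}, daily changes: {diff_corr:.2f}")
```

If the correlation of daily changes is close to zero while the correlation of levels is strongly negative, the relationship is mostly a shared long-run trend rather than a short-run reaction.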

## 30 DAYS MOVING AVERAGES

In [None]:
median_oil_price = daily_sales_oil['oil_price'].median()
daily_sales_oil['oil_price'] = daily_sales_oil['oil_price'].fillna(median_oil_price)

window_size = 30
daily_sales_oil['sales_ma'] = daily_sales_oil['sales'].rolling(window=window_size).mean()
daily_sales_oil['oil_price_ma'] = daily_sales_oil['oil_price'].rolling(window=window_size).mean()

fig = go.Figure()
fig.add_trace(go.Scatter(x=daily_sales_oil['date'], y=daily_sales_oil['sales_ma'],
                        name='30-day MA Sales', line=dict(color='blue')))

fig.add_trace(go.Scatter(x=daily_sales_oil['date'], y=daily_sales_oil['oil_price_ma'],
                        name='30-day MA Oil Price', line=dict(color='red'), yaxis='y2'))

fig.update_layout(
    title='Sales and Oil Prices Over Time (30-day Moving Average)', 
    yaxis=dict(title='Sales', range=[daily_sales_oil['sales_ma'].min() - 10,
                                     daily_sales_oil['sales_ma'].max() + 10]), 
    yaxis2=dict(title='Oil Price', overlaying='y', side='right', range=[daily_sales_oil['oil_price_ma'].min() - 10,
                                                                        daily_sales_oil['oil_price_ma'].max() + 10]),
    showlegend=True)


fig.show(renderer='iframe')

### **From around December 2014 onward, oil prices appear to fall while sales rise.**

## LAG ANALYSIS FOR OIL PRICES

In [None]:
max_lags = 30
lag_correlations = []

for lag in range(max_lags):
    lagged_correlation = daily_sales_oil['sales'].corr(daily_sales_oil['oil_price'].shift(lag))
    lag_correlations.append(lagged_correlation)

fig = px.line(x=range(max_lags), y=lag_correlations,
              title='Sales-Oil Price Correlation by Lag Days',
              labels={'x': 'Lag (days)', 'y': 'Correlation Coefficient'})

fig.show(renderer='iframe')

In [None]:
daily_sales_oil['sales_pct_change'] = daily_sales_oil['sales'].pct_change()
daily_sales_oil['oil_price_pct_change'] = daily_sales_oil['oil_price'].pct_change()

fig1 = px.scatter(daily_sales_oil, x='oil_price_pct_change',
                y='sales_pct_change', trendline="ols",
                title='Daily Percentage Changes: Sales vs Oil Prices')

fig1.show(renderer='iframe')

### TRANSACTIONS ANALYSIS

In [None]:
transactions['date'] = pd.to_datetime(transactions['date'])

trans_store = transactions.merge(stores, on='store_nbr', how='left')

daily_trans = transactions.groupby('date')['transactions'].agg(['sum', 'mean', 'std']).reset_index()
daily_trans.columns = ['date', 'total_transactions', 'avg_transactions', 'std_transactions']

fig = go.Figure()

fig.add_trace(go.Scatter(x=daily_trans['date'], y=daily_trans['total_transactions'],
    name='Total Daily Transactions', line=dict(color='blue')))

fig.update_layout(title='Daily Transaction Volume Over Time', xaxis_title='Date',
    yaxis_title='Number of Transactions')

fig.show(renderer='iframe')

### **The number of transactions appears stable from year to year between 2013 and 2017.**

### WEEKLY TRANSACTIONS BY STORE TYPE

In [None]:
store_type_trans = trans_store.groupby('type').agg({'transactions': ['mean', 'std', 'sum'],
    'store_nbr': 'nunique'}).round(2)
store_type_trans.columns = ['avg_daily_trans', 'std_trans', 'total_trans', 'store_count']
store_type_trans['trans_per_store'] = (store_type_trans['total_trans'] / store_type_trans['store_count']).round(2)

trans_store['dayofweek'] = trans_store['date'].dt.dayofweek
weekly_pattern = trans_store.groupby(['type', 'dayofweek'])['transactions'].mean().reset_index()

fig = px.line(weekly_pattern, x='dayofweek', 
              y='transactions', color='type',
              title='Average Daily Transactions by Store Type',
              labels={'dayofweek': 'Day of Week', 'transactions': 'Average Transactions'})

fig.show(renderer='iframe')

### **As expected, store type A has the highest average daily transactions while store type C has the lowest.**

### MONTHLY TRANSACTIONS BY STORE TYPE

In [None]:
monthly_trans = trans_store.groupby([pd.Grouper(key='date', freq='M'),'type'])['transactions'].mean().reset_index()

fig = px.line(monthly_trans, x='date', y='transactions',
              color='type', title='Monthly Transaction Trends by Store Type')

fig.show(renderer='iframe')

### **Again, as expected, store type A has the highest average transactions while store type C has the lowest.**

### TRANSACTION DENSITY ANALYSIS

In [None]:
trans_density = trans_store.groupby('store_nbr').agg(mean_trans=('transactions', 'mean'),
    total_trans=('transactions', 'sum'), trans_days=('transactions', 'count')).reset_index()


fig = px.histogram(trans_density, x='mean_trans', title='Distribution of Average Daily Transactions per Store',
                  labels={'mean_trans': 'Average Daily Transactions'})

fig.show(renderer='iframe')

In [None]:
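# train has one row per (date, store, family), so aggregate it to store-day level
# before merging; otherwise each store's transaction count is repeated once per family
store_day_sales = train.groupby(['date', 'store_nbr'], as_index=False)['sales'].sum()
trans_sales = transactions.merge(store_day_sales, on=['date', 'store_nbr'], how='inner')
trans_sales['sales_per_transaction'] = trans_sales['sales'] / trans_sales['transactions']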

trans_sales = trans_sales.merge(stores, on='store_nbr', how='left')
spt_by_type = trans_sales.groupby('type')['sales_per_transaction'].agg([
    'mean', 'std', 'min', 'max'
]).round(2)

daily_metrics = trans_sales.groupby('date').agg({'sales': 'sum', 'transactions': 'sum'}).reset_index()

correlation = daily_metrics['sales'].corr(daily_metrics['transactions'])


fig = px.scatter(daily_metrics, x='transactions', y='sales',
                trendline="ols", title='Daily Sales vs Number of Transactions')

fig.show(renderer='iframe')

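### **The OLS trend line summarizes how total daily sales move with the total number of daily transactions; the `correlation` value computed above quantifies the strength of that relationship.**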

### VOLATILITY

In [None]:
daily_metrics['trans_pct_change'] = daily_metrics['transactions'].pct_change()
volatility = daily_metrics.groupby(pd.Grouper(key='date', freq='M'))['trans_pct_change'].std()

fig = px.line(volatility, title='Monthly Transaction Volatility',
              labels={'value': 'Transaction Volatility (Std Dev of % Change)'})

fig.show(renderer='iframe')

### HOLIDAY ANALYSIS

In [None]:
holidays['date'] = pd.to_datetime(holidays['date'])
holidays['year'] = holidays['date'].dt.year

sales_holidays = train.merge(holidays, on='date', how='left')

holiday_impact = sales_holidays.groupby('type')['sales'].agg(['mean', 'std', 'count']).round(2)

holiday_impact

In [None]:
sales_holidays['is_holiday'] = sales_holidays['type'].notna()

holiday_comparison = sales_holidays.groupby('is_holiday')['sales'].agg(['mean', 'std', 'count']).round(2)

holiday_comparison

In [None]:
holiday_transfer = sales_holidays[sales_holidays['type'].notna()].groupby(['type', 'transferred'])['sales'].agg(['mean', 'count']).round(2)

holiday_transfer

In [None]:
holiday_type_sales = sales_holidays[sales_holidays['type'].notna()].groupby('type')['sales'].mean().reset_index()

fig = px.bar(holiday_type_sales, x='type', y='sales', title='Average Sales by Holiday Type',
             labels={'sales': 'Average Sales', 'type': 'Holiday Type'})

fig.show(renderer='iframe')

In [None]:
def get_sales_window(date, window=3):
    start_date = date - pd.Timedelta(days=window)
    end_date = date + pd.Timedelta(days=window)
    return (start_date, end_date)

holiday_dates = holidays[holidays['type'] == 'Holiday']['date'].unique()
holiday_windows = []

for date in holiday_dates:
    start_date, end_date = get_sales_window(date)
    # Copy the slice so the new column is not assigned into a view of train
    window_sales = train[(train['date'] >= start_date) &
                        (train['date'] <= end_date)].copy()
    window_sales['days_from_holiday'] = (window_sales['date'] - date).dt.days
    holiday_windows.append(window_sales)

holiday_effect = pd.concat(holiday_windows)
daily_effect = holiday_effect.groupby('days_from_holiday')['sales'].mean().reset_index()

fig = px.line(daily_effect, x='days_from_holiday', y='sales',
              title='Average Sales Around Holidays',
              labels={'days_from_holiday': 'Days from Holiday', 'sales': 'Average Sales'})

fig.add_vline(x=0, line_dash="dash", line_color="red")

fig.show(renderer='iframe')

In [None]:
yearly_holiday_sales = sales_holidays[sales_holidays['type'].notna()].groupby(['year', 'type'])['sales'].mean().reset_index()

fig = px.line(yearly_holiday_sales, x='year', y='sales', 
              color='type', title='Holiday Sales Trends Over Years',
              labels={'sales': 'Average Sales', 'year': 'Year', 'type': 'Holiday Type'})

fig.show(renderer='iframe')

### **An interesting insight is that holidays of type `Additional` show the highest average sales.**

In [None]:
baseline_sales = sales_holidays[~sales_holidays['is_holiday']]['sales'].mean()

holiday_uplift = sales_holidays[sales_holidays['type'].notna()].groupby('type').agg({
    'sales': lambda x: (x.mean() - baseline_sales) / baseline_sales * 100}).round(2)

holiday_uplift

In [None]:
sales_holidays = sales_holidays.merge(stores[['store_nbr', 'type']], 
                                    on='store_nbr', how='left',
                                    suffixes=('_holiday', '_store'))

store_holiday_interaction = sales_holidays[sales_holidays['type_holiday'].notna()].groupby(['type_store', 'type_holiday'])['sales'].mean().reset_index()

fig = px.bar(store_holiday_interaction, x='type_store', y='sales', color='type_holiday', 
             title='Average Sales by Store Type and Holiday Type', barmode='group',
             labels={'type_store': 'Store Type', 'sales': 'Average Sales', 'type_holiday': 'Holiday Type'})

fig.show(renderer='iframe')

---

### ANALYSIS IN TRAIN SET

In [None]:
train.head()

### MONTHLY SALES TREND

In [None]:
monthly_sales = train.groupby(pd.Grouper(key='date', freq='M'))['sales'].sum().reset_index()

fig = px.line(monthly_sales, x='date', y='sales', title='Monthly Total Sales Over Time',
              labels={'sales': 'Total Sales', 'date': 'Date'})

fig.update_layout(showlegend=False)

fig.show(renderer='iframe')

### **There is a sharp drop at the end of the series in mid-2017. This is expected: the training data stops partway through the final month, so that month's total is incomplete.**
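
One way to keep an incomplete final month from distorting a monthly plot is to drop it before plotting. The following is a minimal sketch on a made-up daily series; the dates and the constant sales value are hypothetical, and `monthly_complete` is an illustrative name rather than a variable used elsewhere in this notebook.

```python
import pandas as pd

# Hypothetical daily series that stops mid-month (values are made up)
dates = pd.date_range("2017-06-01", "2017-08-15", freq="D")
toy = pd.DataFrame({"date": dates, "sales": 100.0})

# Monthly totals keyed by calendar month
monthly = toy.groupby(toy["date"].dt.to_period("M"))["sales"].sum()

# A month counts as complete only if its last calendar day is covered by the data
last_day = toy["date"].max()
month_ends = monthly.index.to_timestamp(how="end").normalize()
monthly_complete = monthly[month_ends <= last_day]

print(monthly)
print(monthly_complete)
```

The same check could be applied to `monthly_sales` before plotting, so that the last point reflects a full month.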

### SALES BY FAMILY CATEGORY

In [None]:
family_sales = train.groupby('family')['sales'].agg(['sum', 'mean']).reset_index()
family_sales = family_sales.sort_values('sum', ascending=True)

fig = go.Figure()

fig.add_trace(go.Bar(x=family_sales['sum'], y=family_sales['family'],
    orientation='h', name='Total Sales'))

fig.update_layout(title='Total Sales by Product Family',
    xaxis_title='Total Sales', yaxis_title='Product Family', height=800)

fig.show(renderer='iframe')

### DAILY SALES PATTERN

In [None]:
train['dayofweek'] = train['date'].dt.dayofweek
train['month'] = train['date'].dt.month

daily_pattern = train.groupby('dayofweek')['sales'].mean().reset_index()
daily_pattern['dayofweek'] = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

fig = px.bar(daily_pattern, x='dayofweek', y='sales',
             title='Average Daily Sales Pattern',
             labels={'sales': 'Average Sales', 'dayofweek': 'Day of Week'})

fig.show(renderer='iframe')

### MONTHLY SEASONALITY

In [None]:
monthly_pattern = train.groupby('month')['sales'].mean().reset_index()
monthly_pattern['month'] = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 
                          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

fig = px.line(monthly_pattern, x='month', y='sales', title='Monthly Sales Seasonality',
              labels={'sales': 'Average Sales', 'month': 'Month'})

fig.show(renderer='iframe')

### **Sales rise in November and December, likely driven by the year-end shopping season (Black Friday, Christmas and New Year).**

### YEAR-TO-YEAR GROWTH ANALYSIS

In [None]:
train['year'] = train['date'].dt.year
yearly_sales = train.groupby('year')['sales'].sum().reset_index()
yearly_growth = yearly_sales.copy()
yearly_growth['growth'] = yearly_growth['sales'].pct_change() * 100

fig = go.Figure()

fig.add_trace(go.Bar(x=yearly_growth['year'], y=yearly_growth['sales'], name='Total Sales'))

fig.add_trace(go.Scatter(x=yearly_growth['year'], y=yearly_growth['growth'], name='Growth Rate (%)', yaxis='y2'))

fig.update_layout(title='Yearly Sales and Growth Rate', yaxis=dict(title='Total Sales'),
    yaxis2=dict(title='Growth Rate (%)', overlaying='y', side='right'), showlegend=True)

fig.show(renderer='iframe')

### **The plot shows that the growth rate has been decreasing since 2014. Note that the final year in the dataset is incomplete, so its total and growth rate are understated.**

### KEY METRICS

In [None]:
total_sales = train['sales'].sum()
avg_daily_sales = train.groupby('date')['sales'].sum().mean()
sales_volatility = train.groupby('date')['sales'].sum().std() / avg_daily_sales * 100

print(f"\nKey Metrics:")
print('----------------------------------')
print(f"Total Sales: {total_sales:,.2f}")
print(f"Average Daily Sales: {avg_daily_sales:,.2f}")
print(f"Sales Volatility (CV): {sales_volatility:.2f}%")

---

### MOVING AVERAGES

In [None]:
daily_sales = train.groupby('date')['sales'].sum().reset_index()

windows = [7, 14, 30, 90]
for window in windows:
    daily_sales[f'MA_{window}'] = daily_sales['sales'].rolling(window=window).mean()

fig = go.Figure()

fig.add_trace(go.Scatter(x=daily_sales['date'], y=daily_sales['sales'],
    name='Raw Sales', line=dict(color='gray', width=1), opacity=0.5))

colors = ['blue', 'green', 'red', 'purple']

for window, color in zip(windows, colors):
    fig.add_trace(go.Scatter(x=daily_sales['date'], y=daily_sales[f'MA_{window}'],
        name=f'{window}-day MA', line=dict(color=color, width=2)))

fig.update_layout(title='Sales Trends with Multiple Moving Averages',
    xaxis_title='Date', yaxis_title='Sales', legend_title='Moving Averages',
    hovermode='x unified')

fig.show(renderer='iframe')

## VOLATILITY

In [None]:
for window in windows:
    daily_sales[f'Volatility_{window}'] = daily_sales['sales'].rolling(window=window).std()

fig = go.Figure()

for window, color in zip(windows, colors):
    fig.add_trace(go.Scatter(x=daily_sales['date'], y=daily_sales[f'Volatility_{window}'],
        name=f'{window}-day Volatility', line=dict(color=color, width=2)))

fig.update_layout(title='Sales Volatility Over Time', xaxis_title='Date',
    yaxis_title='Standard Deviation', legend_title='Rolling Windows', hovermode='x unified')

fig.show(renderer='iframe')

### MOVING AVERAGES BY STORE TYPE

In [None]:
store_sales = train.merge(stores, on='store_nbr')
daily_store_type = store_sales.groupby(['date', 'type'])['sales'].sum().reset_index()

store_type_ma = {}
for store_type in daily_store_type['type'].unique():
    type_data = daily_store_type[daily_store_type['type'] == store_type]
    store_type_ma[store_type] = pd.DataFrame({'date': type_data['date'], 'sales': type_data['sales']})
    
    for window in windows:
        store_type_ma[store_type][f'MA_{window}'] = type_data['sales'].rolling(window=window).mean()

fig = go.Figure()

for store_type, data in store_type_ma.items():
    
    fig.add_trace(go.Scatter(x=data['date'], y=data['MA_30'],
        name=f'{store_type}', line=dict(width=2)))

fig.update_layout(title='30-Day Moving Average Sales by Store Type',
    xaxis_title='Date', yaxis_title='Sales (30-day MA)',
    legend_title='Store Type', hovermode='x unified')

fig.show(renderer='iframe')

---

### EXPONENTIAL MOVING AVERAGES

In [None]:
daily_sales = train.groupby('date')['sales'].sum().reset_index()

spans = [7, 14, 30, 90]
for span in spans:
    daily_sales[f'EMA_{span}'] = daily_sales['sales'].ewm(span=span, adjust=False).mean()

fig = go.Figure()

fig.add_trace(go.Scatter(x=daily_sales['date'], y=daily_sales['sales'], name='Raw Sales',
    line=dict(color='lightgray', width=1), opacity=0.5))

colors = ['blue', 'green', 'red', 'purple']

for span, color in zip(spans, colors):
    fig.add_trace(go.Scatter(x=daily_sales['date'], y=daily_sales[f'EMA_{span}'],
        name=f'{span}-day EMA', line=dict(color=color, width=2)))

fig.update_layout(title='Sales Trends with Multiple Exponential Moving Averages',
    xaxis_title='Date', yaxis_title='Sales', legend_title='Exponential Moving Averages',
    hovermode='x unified')

fig.show(renderer='iframe')

---

### DECOMPOSITION ANALYSIS

In [None]:
from plotly.subplots import make_subplots
import plotly.graph_objects as go
from scipy import stats

daily_sales = train.groupby('date')['sales'].sum().reset_index()
daily_sales = daily_sales.set_index('date')

decomposition_mult = sm.tsa.seasonal_decompose(daily_sales['sales'], period=7, model='multiplicative')

fig = make_subplots(rows=4, cols=1, subplot_titles=('Original', 'Trend', 'Seasonal', 'Residual'),
                    vertical_spacing=0.1)


fig.add_trace( go.Scatter(x=daily_sales.index, y=daily_sales['sales'],
               name='Original', line=dict(color='blue')), row=1, col=1)


fig.add_trace(go.Scatter(x=daily_sales.index, y=decomposition_mult.trend,
               name='Trend', line=dict(color='red')), row=2, col=1)


fig.add_trace(go.Scatter(x=daily_sales.index, y=decomposition_mult.seasonal,
               name='Seasonal', line=dict(color='green')), row=3, col=1)


fig.add_trace(go.Scatter(x=daily_sales.index, y=decomposition_mult.resid,
               name='Residual', line=dict(color='purple')), row=4, col=1)

fig.update_layout(height=1200, title='Multiplicative Time Series Decomposition', showlegend=False)

fig.show(renderer='iframe')

### SEASONAL DECOMPOSITION WITH ADDITIVE MODEL

In [None]:
decomposition_add = sm.tsa.seasonal_decompose(daily_sales['sales'], period=7, model='additive')

fig = make_subplots(rows=4, cols=1, subplot_titles=('Original', 'Trend', 'Seasonal', 'Residual'), vertical_spacing=0.1)

fig.add_trace(go.Scatter(x=daily_sales.index, y=daily_sales['sales'],
               name='Original', line=dict(color='blue')), row=1, col=1)


fig.add_trace(go.Scatter(x=daily_sales.index, y=decomposition_add.trend,
               name='Trend', line=dict(color='red')), row=2, col=1)


fig.add_trace(go.Scatter(x=daily_sales.index, y=decomposition_add.seasonal,
               name='Seasonal', line=dict(color='green')), row=3, col=1)


fig.add_trace(go.Scatter(x=daily_sales.index, y=decomposition_add.resid,
               name='Residual', line=dict(color='purple')), row=4, col=1)

fig.update_layout(height=1200, title='Additive Time Series Decomposition', showlegend=False)

fig.show(renderer='iframe')

### SEASONAL PATTERNS

In [None]:
seasonal_patterns = pd.DataFrame({'Multiplicative': decomposition_mult.seasonal, 'Additive': decomposition_add.seasonal})

fig = go.Figure()

fig.add_trace(go.Scatter(x=seasonal_patterns.index[-7:], y=seasonal_patterns['Multiplicative'][-7:], name='Multiplicative', line=dict(color='blue')))

fig.add_trace(go.Scatter(x=seasonal_patterns.index[-7:], y=seasonal_patterns['Additive'][-7:], name='Additive', line=dict(color='red')))

fig.update_layout(title='Weekly Seasonal Patterns Comparison', xaxis_title='Date', yaxis_title='Seasonal Component', showlegend=True)

fig.show(renderer='iframe')

### STRENGTH OF TREND AND SEASONALITY

In [None]:
def strength_of_trend(decomposition):
    # Trend strength: 1 - Var(resid) / Var(trend + resid), floored at 0.
    # The formula is defined for additive components, so it is only
    # approximate when applied to a multiplicative decomposition.
    resid = pd.Series(decomposition.resid)
    deseasonalized = resid + pd.Series(decomposition.trend)
    return max(0, 1 - resid.var(ddof=0) / deseasonalized.var(ddof=0))

def strength_of_seasonality(decomposition):
    # Seasonal strength: 1 - Var(resid) / Var(seasonal + resid), floored at 0.
    resid = pd.Series(decomposition.resid)
    detrended = resid + pd.Series(decomposition.seasonal)
    return max(0, 1 - resid.var(ddof=0) / detrended.var(ddof=0))


mult_trend_strength = strength_of_trend(decomposition_mult)
mult_seasonal_strength = strength_of_seasonality(decomposition_mult)
add_trend_strength = strength_of_trend(decomposition_add)
add_seasonal_strength = strength_of_seasonality(decomposition_add)

print(f"Multiplicative Trend Strength: {mult_trend_strength:.2f}")
print(f"Multiplicative Seasonal Strength: {mult_seasonal_strength:.2f}")
print(f"Additive Trend Strength: {add_trend_strength:.2f}")
print(f"Additive Seasonal Strength: {add_seasonal_strength:.2f}")
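
For reference, a widely used definition of trend and seasonal strength for an additive decomposition (as given in Hyndman and Athanasopoulos, *Forecasting: Principles and Practice*) is shown below. The symbols $T_t$, $S_t$ and $R_t$ stand for the trend, seasonal and residual components and are introduced here only for illustration:

$$
F_T = \max\left(0,\; 1 - \frac{\operatorname{Var}(R_t)}{\operatorname{Var}(T_t + R_t)}\right),
\qquad
F_S = \max\left(0,\; 1 - \frac{\operatorname{Var}(R_t)}{\operatorname{Var}(S_t + R_t)}\right)
$$

Both values lie between 0 and 1, with values near 1 indicating a strong trend or strong seasonality. These formulas assume that the components add up to the series, so values computed on a multiplicative decomposition should be read with caution.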

## RESIDUAL ANALYSIS

In [None]:
fig = make_subplots(rows=2, cols=2,
                    subplot_titles=('Multiplicative Residuals Distribution', 
                                  'Additive Residuals Distribution',
                                  'Multiplicative Residuals QQ Plot', 
                                  'Additive Residuals QQ Plot'))

fig.add_trace(go.Histogram(x=decomposition_mult.resid.dropna(), 
                 name='Multiplicative', nbinsx=50), row=1, col=1)


fig.add_trace(go.Histogram(x=decomposition_add.resid.dropna(), 
                 name='Additive', nbinsx=50), row=1, col=2)

fig.show(renderer='iframe')

## RESIDUAL QQ PLOTS

In [None]:
mult_resid = decomposition_mult.resid.dropna()
add_resid = decomposition_add.resid.dropna()

# Use the same plotting positions for the sample and theoretical quantiles
probs = (np.arange(1, len(mult_resid) + 1) - 0.5) / len(mult_resid)
mult_quantiles = np.quantile(mult_resid, probs)
add_quantiles = np.quantile(add_resid, probs)
theoretical_quantiles = stats.norm.ppf(probs)

fig.add_trace(go.Scatter(x=theoretical_quantiles, y=mult_quantiles, mode='markers',
                         name='Multiplicative QQ'), row=2, col=1)

fig.add_trace(go.Scatter(x=theoretical_quantiles, y=add_quantiles,
               mode='markers', name='Additive QQ'), row=2, col=2)

fig.update_layout(height=800, title='Residual Analysis', showlegend=False)

fig.show(renderer='iframe')

---

## ANOMALY DETECTION

In [None]:
daily_sales = train.groupby('date')['sales'].sum().reset_index()

### ZSCORE

In [None]:
def detect_anomalies_zscore(data, threshold=3):
    mean = np.mean(data)
    std = np.std(data)
    z_scores = np.abs((data - mean) / std)
    return z_scores > threshold


daily_sales['zscore'] = np.abs((daily_sales['sales'] - daily_sales['sales'].mean()) / daily_sales['sales'].std())
daily_sales['is_anomaly_zscore'] = detect_anomalies_zscore(daily_sales['sales'])

### IQR

In [None]:
Q1 = daily_sales['sales'].quantile(0.25)
Q3 = daily_sales['sales'].quantile(0.75)
IQR = Q3 - Q1

daily_sales['is_anomaly_iqr'] = (daily_sales['sales'] < (Q1 - 1.5 * IQR)) | (daily_sales['sales'] > (Q3 + 1.5 * IQR))

### ROLLING STATISTICS METHOD

In [None]:
window = 30

daily_sales['rolling_mean'] = daily_sales['sales'].rolling(window=window).mean()
daily_sales['rolling_std'] = daily_sales['sales'].rolling(window=window).std()
daily_sales['is_anomaly_rolling'] = ((daily_sales['sales'] < (daily_sales['rolling_mean'] - 3 * daily_sales['rolling_std'])) |
    (daily_sales['sales'] > (daily_sales['rolling_mean'] + 3 * daily_sales['rolling_std'])))

### VISUALIZATION ANOMALIES WITH Z-SCORE

In [None]:
fig = go.Figure()

fig.add_trace(go.Scatter(x=daily_sales[~daily_sales['is_anomaly_zscore']]['date'],
    y=daily_sales[~daily_sales['is_anomaly_zscore']]['sales'],
    mode='markers',name='Normal',marker=dict(color='blue', size=4)))


fig.add_trace(go.Scatter(x=daily_sales[daily_sales['is_anomaly_zscore']]['date'],
    y=daily_sales[daily_sales['is_anomaly_zscore']]['sales'],
    mode='markers', name='Anomaly (Z-score)', marker=dict(color='red', size=8, symbol='x')))

fig.update_layout(title='Anomaly Detection using Z-Score Method',
    xaxis_title='Date', yaxis_title='Sales', showlegend=True)

fig.show(renderer='iframe')

### COMBINED METHOD FOR ANOMALY DETECTION

In [None]:
fig = go.Figure()

fig.add_trace(go.Scatter(x=daily_sales['date'], y=daily_sales['sales'],
    mode='lines', name='Sales', line=dict(color='gray')))


fig.add_trace(go.Scatter(x=daily_sales['date'], y=daily_sales['rolling_mean'],
    mode='lines', name='Rolling Mean', line=dict(color='gold', dash='dash')))


fig.add_trace(go.Scatter(x=daily_sales['date'], y=daily_sales['rolling_mean'] + 3 * daily_sales['rolling_std'],
    mode='lines', name='Upper Bound', line=dict(color='navy', dash='dot')))

fig.add_trace(go.Scatter(x=daily_sales['date'], y=daily_sales['rolling_mean'] - 3 * daily_sales['rolling_std'],
    mode='lines', name='Lower Bound', line=dict(color='purple', dash='dot')))


fig.add_trace(go.Scatter(x=daily_sales[daily_sales['is_anomaly_zscore']]['date'],
    y=daily_sales[daily_sales['is_anomaly_zscore']]['sales'], mode='markers',
    name='Z-score Anomaly', marker=dict(color='red', size=8, symbol='x')))

fig.add_trace(go.Scatter(x=daily_sales[daily_sales['is_anomaly_iqr']]['date'],
    y=daily_sales[daily_sales['is_anomaly_iqr']]['sales'],
    mode='markers', name='IQR Anomaly', marker=dict(color='magenta', size=8, symbol='circle')))

fig.add_trace(go.Scatter(x=daily_sales[daily_sales['is_anomaly_rolling']]['date'],
    y=daily_sales[daily_sales['is_anomaly_rolling']]['sales'], mode='markers',
    name='Rolling Anomaly', marker=dict(color='green', size=8, symbol='diamond')))

fig.update_layout(title='Anomaly Detection - Multiple Methods Comparison',
    xaxis_title='Date', yaxis_title='Sales', showlegend=True)

fig.show(renderer='iframe')

### ANOMALIES BY DAY OF WEEK

In [None]:
daily_sales['dayofweek'] = daily_sales['date'].dt.dayofweek
anomaly_dow = daily_sales.groupby('dayofweek')['is_anomaly_zscore'].mean() * 100

fig = px.bar(x=['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'],
    y=anomaly_dow.values, title='Percentage of Anomalies by Day of Week',
    labels={'x': 'Day of Week', 'y': 'Anomaly Percentage'})

fig.show(renderer='iframe')

### SEASONAL ANALYSIS IN ANOMALIES

In [None]:
daily_sales['month'] = daily_sales['date'].dt.month
anomaly_month = daily_sales.groupby('month')['is_anomaly_zscore'].mean() * 100

fig = px.line(x=range(1, 13), y=anomaly_month.values,
    title='Percentage of Anomalies by Month',
    labels={'x': 'Month', 'y': 'Anomaly Percentage'})

fig.show(renderer='iframe')

---

### STORE NETWORK ANALYSIS

In [None]:
type_colors = {'A': 'red', 'B': 'blue', 'C': 'green', 'D': 'purple', 'E': 'orange'}

fig = go.Figure()

for _, store in stores.iterrows():
    fig.add_trace(go.Scatter(x=[store['city']], y=[store['state']],
        mode='markers', name=f'Store {store["store_nbr"]}',
        marker=dict(size=10, color=type_colors[store['type']], symbol='circle'),
        text=f'Store {store["store_nbr"]}<br>Type: {store["type"]}<br>City: {store["city"]}',
        hoverinfo='text'))

fig.update_layout(title='Store Network Distribution', xaxis_title='City', yaxis_title='State', showlegend=False)

fig.show(renderer='iframe')

---

## FORECASTING WITH PROPHET

In [None]:
%pip install prophet

In [None]:
import prophet

In [None]:
prophet_data = daily_sales.rename(columns={'date': 'ds', 'sales': 'y'})

from prophet import Prophet

model = Prophet(yearly_seasonality=True, weekly_seasonality=True, daily_seasonality=False, changepoint_prior_scale=0.05,
    seasonality_prior_scale=10, holidays_prior_scale=10)

# The sales come from stores in Ecuador, so use Ecuador's public-holiday calendar
model.add_country_holidays(country_name='EC')

model.fit(prophet_data)

future_dates = model.make_future_dataframe(periods=30) # 30 DAYS
forecast = model.predict(future_dates)

In [None]:
## VISUALIZING ACTUAL VALUES

fig = go.Figure()

fig.add_trace(go.Scatter(x=prophet_data['ds'], y=prophet_data['y'],
    name='Actual', mode='markers', marker=dict(color='red', size=2),))

## VISUALIZING PREDICTED VALUES

fig.add_trace(go.Scatter(x=forecast['ds'], y=forecast['yhat'],
    name='Predicted', mode='lines', line=dict(color='gold'),))

## UNCERTAINTY INTERVALS

# Semi-transparent band so the actual and predicted traces stay visible underneath
fig.add_trace(go.Scatter(x=forecast['ds'].tolist() + forecast['ds'].tolist()[::-1],
    y=forecast['yhat_upper'].tolist() + forecast['yhat_lower'].tolist()[::-1],
    fill='toself', fillcolor='rgba(255,165,0,0.3)', line=dict(color='rgba(255,165,0,0)'),
    name='Prediction Interval'))

fig.update_layout(title='Sales Forecast with Prophet',
    xaxis_title='Date', yaxis_title='Sales', showlegend=True)

fig.show(renderer='iframe')

In [None]:
fig = model.plot_components(forecast)

plt.show()

### ADVANCED FORECASTING TECHNIQUES WITH PROPHET


In [None]:
daily_sales = train.groupby('date')['sales'].sum().reset_index()

In [None]:
oil['date'] = pd.to_datetime(oil['date'])
oil = oil.rename(columns={'dcoilwtico': 'oil_price'})
# Fill gaps with the previous quote, then back-fill any leading gap
# (fillna(method=...) is deprecated in recent pandas)
oil['oil_price'] = oil['oil_price'].ffill().bfill()

In [None]:
prophet_data = daily_sales.rename(columns={'date': 'ds', 'sales': 'y'})
prophet_data['day_of_week'] = prophet_data['ds'].dt.dayofweek
prophet_data['month'] = prophet_data['ds'].dt.month
prophet_data['quarter'] = prophet_data['ds'].dt.quarter

In [None]:
# Attach the oil price to each day, then drop the duplicate join key
prophet_data = prophet_data.merge(oil[['date', 'oil_price']], left_on='ds', right_on='date', how='left').drop(columns='date')

In [None]:
# Dates without an oil quote (e.g. weekends) take the most recent available price
prophet_data['oil_price'] = prophet_data['oil_price'].ffill().bfill()

In [None]:
## ADVANCED PROPHET MODEL FOR FORECASTING

model = Prophet(yearly_seasonality=20, weekly_seasonality=True,
    daily_seasonality=False, changepoint_prior_scale=0.05,
    seasonality_prior_scale=10, holidays_prior_scale=10,
    changepoint_range=0.95, interval_width=0.95)

# The sales come from stores in Ecuador, so use Ecuador's public-holiday calendar
model.add_country_holidays(country_name='EC')

model.add_seasonality(name='quarterly', period=91.25, fourier_order=5)

model.add_seasonality(name='monthly', period=30.5, fourier_order=5)

model.add_regressor('oil_price', mode='multiplicative')

model.fit(prophet_data)
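
The `period` and `fourier_order` arguments above control how flexible each seasonal curve is: Prophet estimates a seasonality as a partial Fourier sum, so an order of N contributes N sine terms and N cosine terms. The sketch below shows what such features look like for the monthly seasonality. `fourier_features` is a hypothetical helper written for illustration, not Prophet's internal code.

```python
import math


def fourier_features(t_days, period, fourier_order):
    """Build sine/cosine features for a seasonal cycle.

    For each time t (in days) the row holds sin(2*pi*k*t/period) and
    cos(2*pi*k*t/period) for k = 1 .. fourier_order.
    """
    rows = []
    for t in t_days:
        row = []
        for k in range(1, fourier_order + 1):
            angle = 2.0 * math.pi * k * t / period
            row.extend([math.sin(angle), math.cos(angle)])
        rows.append(row)
    return rows


# Monthly seasonality as configured above: period=30.5, fourier_order=5
features = fourier_features([0.0, 7.0, 30.5], period=30.5, fourier_order=5)
print(len(features), len(features[0]))
```

A higher order lets the seasonal curve change faster within one cycle, at the cost of a greater risk of overfitting. Each of the 10 columns here would get its own fitted coefficient.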

In [None]:
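future_dates = model.make_future_dataframe(periods=60) # 60 DAYS
# make_future_dataframe returns the history plus the horizon, so keep the observed
# oil prices for past dates and hold the last known price over the 60 future days
future_dates = future_dates.merge(prophet_data[['ds', 'oil_price']], on='ds', how='left')
future_dates['oil_price'] = future_dates['oil_price'].ffill()
forecast = model.predict(future_dates)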

### FORECAST VISUALIZATIONS

In [None]:
from plotly.subplots import make_subplots

fig = make_subplots(rows=3, cols=1, subplot_titles=('Forecast', 'Trend', 'Seasonalities'))

fig.add_trace(go.Scatter(x=prophet_data['ds'], y=prophet_data['y'],
               name='Actual', mode='markers', marker=dict(color='blue', size=2)), row=1, col=1)

fig.add_trace(go.Scatter(x=forecast['ds'], y=forecast['yhat'],
               name='Forecast', line=dict(color='red')), row=1, col=1)

fig.add_trace(go.Scatter(x=forecast['ds'].tolist() + forecast['ds'].tolist()[::-1],
               y=forecast['yhat_upper'].tolist() + forecast['yhat_lower'].tolist()[::-1],
               fill='toself', fillcolor='rgba(255,0,0,0.2)', line=dict(color='rgba(255,0,0,0)'),
               name='Prediction Interval'), row=1, col=1)

fig.add_trace(go.Scatter(x=forecast['ds'], y=forecast['trend'],
               name='Trend', line=dict(color='green')), row=2, col=1)

fig.add_trace(go.Scatter(x=forecast['ds'], y=forecast['yearly'],
               name='Yearly Seasonality', line=dict(color='purple')),row=3, col=1)

fig.add_trace(go.Scatter(x=forecast['ds'], y=forecast['weekly'],
               name='Weekly Seasonality', line=dict(color='orange')),row=3, col=1)

fig.update_layout(height=1200, title='Advanced Sales Forecast Analysis')

fig.show(renderer='iframe')

In [None]:
components = ['trend', 'yearly', 'weekly', 'quarterly', 'monthly', 'holidays', 'oil_price']
fig = make_subplots(rows=len(components), cols=1, subplot_titles=components, vertical_spacing=0.05)

for i, component in enumerate(components, 1):
    if component in forecast.columns:
        fig.add_trace(go.Scatter(x=forecast['ds'], y=forecast[component],name=component), row=i, col=1)

fig.update_layout(height=2000, title='Forecast Components Visualizations')

fig.show(renderer='iframe')

### MODEL PERFORMANCE ANALYSIS VISUALIZATIONS

In [None]:
metrics = pd.DataFrame({'date': prophet_data['ds'], 'actual': prophet_data['y'],
    'predicted': forecast['yhat'][:len(prophet_data)], 'error': prophet_data['y'] - forecast['yhat'][:len(prophet_data)]})


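# Rolling 30-day RMSE: square root of the rolling mean of squared errors
metrics['rolling_rmse'] = np.sqrt((metrics['error'] ** 2).rolling(window=30).mean())

# Rolling 30-day percentage error, computed as mean absolute error over mean actual sales
# (a ratio of means, which stays stable on days when sales are close to zero)
metrics['rolling_mape'] = metrics['error'].abs().rolling(window=30).mean() / metrics['actual'].rolling(window=30).mean() * 100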


fig = make_subplots(rows=2, cols=1, subplot_titles=('Rolling RMSE', 'Rolling MAPE'))

fig.add_trace(go.Scatter(x=metrics['date'], y=metrics['rolling_rmse'], name='Rolling RMSE'), row=1, col=1)

fig.add_trace(go.Scatter(x=metrics['date'], y=metrics['rolling_mape'], name='Rolling MAPE'), row=2, col=1)

fig.update_layout(height=800, title='Rolling Performance Metrics')

fig.show(renderer='iframe')
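
The `rolling_mape` column above divides the rolling mean absolute error by the rolling mean of actual sales. That is a ratio of means, which can differ noticeably from the textbook MAPE (a mean of per-day ratios). The pure-Python sketch below contrasts the two on made-up numbers; the helper names are hypothetical and only illustrate the arithmetic.

```python
import math


def rolling_rmse(errors, window):
    """RMSE over a trailing window; None until the window is full."""
    out = []
    for i in range(len(errors)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = errors[i + 1 - window : i + 1]
            out.append(math.sqrt(sum(e * e for e in chunk) / window))
    return out


def ratio_of_means_pct(actual, predicted):
    """Sum of absolute errors divided by sum of actuals, as a percentage."""
    abs_err = [abs(a - p) for a, p in zip(actual, predicted)]
    return 100.0 * sum(abs_err) / sum(actual)


def mean_of_ratios_pct(actual, predicted):
    """Textbook MAPE: average of per-observation absolute percentage errors."""
    ratios = [abs(a - p) / a for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Made-up values: one busy day and one very quiet day, both off by 10 units
actual = [100.0, 10.0]
predicted = [90.0, 20.0]

print(round(ratio_of_means_pct(actual, predicted), 1))
print(round(mean_of_ratios_pct(actual, predicted), 1))
print(rolling_rmse([3.0, 4.0], window=2))
```

On these numbers the ratio of means is about 18.2% while the mean of ratios is 55.0%, because the quiet day dominates the per-day average. The ratio-of-means form is less sensitive to low-sales days, which is a reasonable argument for using it on daily totals.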

In [None]:
fig = make_subplots(rows=2, cols=2, subplot_titles=('Residual Distribution', 'Residual QQ Plot',
                                  'Residuals vs Fitted', 'Residual Autocorrelation'))

fig.add_trace(go.Histogram(x=metrics['error'], nbinsx=50, name='Residuals'),row=1, col=1)

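from scipy import stats

# Compare normal quantiles with residual quantiles at the same probability levels
theoretical_quantiles = stats.norm.ppf(np.linspace(0.01, 0.99, len(metrics)))
sample_quantiles = np.quantile(metrics['error'].dropna(), np.linspace(0.01, 0.99, len(metrics)))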

fig.add_trace(go.Scatter(x=theoretical_quantiles, y=sample_quantiles, mode='markers', name='QQ Plot'),row=1, col=2)


fig.add_trace(go.Scatter(x=metrics['predicted'], y=metrics['error'],
                         mode='markers', marker=dict(size=2), name='Residuals vs Fitted'),
              row=2, col=1)


acf_values = sm.tsa.acf(metrics['error'].dropna(), nlags=40)

fig.add_trace(go.Scatter(x=list(range(len(acf_values))), y=acf_values, name='ACF'),row=2, col=2)

fig.update_layout(height=1000, title='Residual Analysis')

fig.show(renderer='iframe')

### CROSS-VALIDATION WITH PROPHET

In [None]:
from prophet.diagnostics import cross_validation, performance_metrics

df_cv = cross_validation(model, initial='365 days', period='30 days', horizon='60 days')
df_p = performance_metrics(df_cv)

print("\nCross-validation Metrics:")
print(df_p[['horizon', 'rmse', 'mape', 'coverage']].round(2))
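
`cross_validation` refits the model at a series of cutoff dates and forecasts `horizon` days past each one. The sketch below approximates how `initial`, `period`, and `horizon` translate into those cutoffs: start `horizon` before the last observation and step back by `period` while at least `initial` of training data remains. `rolling_origin_cutoffs` is a hypothetical helper and the dates are made up; Prophet's own implementation handles further edge cases.

```python
from datetime import date, timedelta


def rolling_origin_cutoffs(first_day, last_day, initial, period, horizon):
    """Return cutoff dates in chronological order.

    Each cutoff leaves at least `initial` of history before it and a full
    `horizon` of observed data after it; consecutive cutoffs are `period` apart.
    """
    cutoffs = []
    cutoff = last_day - horizon
    while cutoff >= first_day + initial:
        cutoffs.append(cutoff)
        cutoff -= period
    return cutoffs[::-1]


# Illustrative two-year series with the same settings as the cell above
cutoffs = rolling_origin_cutoffs(
    first_day=date(2020, 1, 1),
    last_day=date(2021, 12, 31),
    initial=timedelta(days=365),
    period=timedelta(days=30),
    horizon=timedelta(days=60),
)
print(len(cutoffs), cutoffs[0], cutoffs[-1])
```

Each cutoff means one full model fit, so a longer history or a shorter `period` increases the runtime roughly in proportion to the number of cutoffs.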

---

## THE END

---