# Seasonal and Trend Decomposition Analysis
**Dataset:** Shopee Philippines Uncommon Retail Products (March 2022 – November 2025)  
**Objective:** Decompose sales patterns into Trend, Seasonal, and Residual components for product and category-level insights.

---

## Methodology Overview
1. **Data Preparation:** Load CSV, parse dates (YYYY-MM-DD format), select metrics
2. **Product-Level Processing:** Truncate pre-listing NaNs, filter by duration (≥12 months)
3. **Category-Level Aggregation:** Sum sales by category for market analysis
4. **Decomposition:** Apply additive model (Y = T + S + R) with 12-month period to series with ≥24 months of data
5. **Metric Extraction:** Calculate seasonal amplitude, trend slope, residual anomalies
6. **Visualization:** Plot strategic samples and export results
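
To make the additive model in step 4 concrete, the sketch below hand-rolls a classical decomposition with NumPy and pandas on a synthetic series. It is an illustration under stated assumptions, not the notebook's method: the notebook calls `seasonal_decompose` from statsmodels, whereas the function name `classical_additive_decompose`, the synthetic data, and the choice to leave the trend edges as NaN are all hypothetical.

```python
import numpy as np
import pandas as pd


def classical_additive_decompose(y: pd.Series, period: int = 12) -> pd.DataFrame:
    """Illustrative additive decomposition Y = T + S + R for an even period."""
    values = y.to_numpy(dtype=float)
    n = len(values)
    half = period // 2

    # Trend: centered 2 x period moving average (weights 0.5, 1, ..., 1, 0.5).
    weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    trend = np.full(n, np.nan)
    trend[half:n - half] = np.convolve(values, weights, mode="valid")

    # Seasonal: mean detrended value per position in the cycle, centered on zero.
    detrended = values - trend
    position = np.arange(n) % period
    cycle_means = np.array([np.nanmean(detrended[position == p]) for p in range(period)])
    cycle_means -= cycle_means.mean()
    seasonal = cycle_means[position]

    # Residual: whatever trend and seasonality leave unexplained.
    resid = values - trend - seasonal
    return pd.DataFrame(
        {"observed": values, "trend": trend, "seasonal": seasonal, "resid": resid},
        index=y.index,
    )


# Synthetic monthly series (assumed data): linear trend plus a zero-mean 12-month cycle.
t = np.arange(45)
true_seasonal = 10 * np.sin(2 * np.pi * (t % 12) / 12)
y = pd.Series(
    100 + 2 * t + true_seasonal,
    index=pd.period_range("2022-03", periods=45, freq="M"),
)

parts = classical_additive_decompose(y)
interior = parts["trend"].notna().to_numpy()  # rows where the moving average is defined
print(parts.loc[interior, ["trend", "seasonal", "resid"]].head())
```

Because the synthetic series contains no noise, the residual is numerically zero wherever the trend is defined. On real sales data the residual carries the irregular movements that the anomaly metrics later rely on.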

## 1. Import Required Libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from statsmodels.tsa.seasonal import seasonal_decompose
from scipy import stats
import warnings
warnings.filterwarnings('ignore')

# Set visualization styles
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

print("Libraries imported successfully!")

Libraries imported successfully!


## 2. Define Cleanup Function
Clean up old export files before generating new results.

In [2]:
import os
import shutil

def cleanup_exports():
    """
    Remove old export files and directories before generating new results.
    This ensures we always have fresh exports without accumulating old files.
    """
    # Define export paths (relative to notebook location in EDA/)
    # The CSV path matches the location the metrics are exported to
    csv_file = '../CSV/seasonal_decomposition_results.csv'
    viz_base_dir = '../seasonal_trend/visualizations'
    
    # Remove CSV file if it exists
    if os.path.exists(csv_file):
        os.remove(csv_file)
        print(f"Removed old file: {csv_file}")
    
    # Remove visualizations directory if it exists
    if os.path.exists(viz_base_dir):
        shutil.rmtree(viz_base_dir)
        print(f"Removed old directory: {viz_base_dir}/")
    
    print("Cleanup completed. Ready for new exports.")

# Execute cleanup
cleanup_exports()

Cleanup completed. Ready for new exports.


## 3. Load and Prepare Dataset

In [3]:
# Load the CSV dataset
df = pd.read_csv('../CSV/consolidated_file_cleaned_v2.csv')

print(f"Dataset loaded successfully!")
print(f"Shape: {df.shape}")
print(f"\nColumn names:")
for i, col in enumerate(df.columns, 1):
    print(f"{i}. {col}")

print(f"\nFirst few rows:")
print(df.head())

print(f"\nData types:")
print(df.dtypes)

Dataset loaded successfully!
Shape: (7554662, 28)

Column names:
1. product
2. time
3. avg.sku_price(₱)
4. sold/day
5. revenue/day(₱)
6. sold/m
7. product_sales_rate(%)
8. price(₱)
9. sku
10. sold
11. sold/month(₱)
12. revenue/month
13. new_ratings
14. ratings
15. ratings_rate
16. likes
17. rating_star
18. new_likes
19. second-level_category
20. third-level_category
21. fourth-level_category
22. fifth-level_category
23. id
24. top-level_category
25. seller_from
26. listing_time
27. active_months
28. suitable_for_seasonal_analysis

First few rows:
                                             product        time  \
0     Cute Different Designs  button accessories ...  2022-03-01   
1     Cute Different Designs  button accessories ...  2022-04-01   
2     Cute Different Designs  button accessories ...  2022-05-01   
3     Cute Different Designs  button accessories ...  2022-06-01   
4     Cute Different Designs  button accessories ...  2022-07-01   

   avg.sku_price(₱)  sold/day  revenue

## 4. Understand Data Structure
Examine the `time` column and identify how the data is organized (already in YYYY-MM-DD format).

In [4]:
# Examine the time column and data structure
print("Sample 'time' values:")
print(df['time'].head(20))

print(f"\nUnique time values: {df['time'].nunique()}")
print(f"\nTime value examples:")
print(df['time'].unique()[:10])

# Check if time is already datetime or needs parsing
print(f"\nData type of 'time' column: {df['time'].dtype}")

# Check if suitable_for_seasonal_analysis exists
if 'suitable_for_seasonal_analysis' in df.columns:
    print(f"\nChecking 'suitable_for_seasonal_analysis' column:")
    print(df['suitable_for_seasonal_analysis'].value_counts())

# Check data structure - is this wide format or long format?
print(f"\nSample of key columns:")
print(df[['product', 'time', 'sold/m', 'revenue/month', 'top-level_category']].head(20))

Sample 'time' values:
0     2022-03-01
1     2022-04-01
2     2022-05-01
3     2022-06-01
4     2022-07-01
5     2022-08-01
6     2022-09-01
7     2022-10-01
8     2022-11-01
9     2022-12-01
10    2023-01-01
11    2023-02-01
12    2023-03-01
13    2023-04-01
14    2023-05-01
15    2023-06-01
16    2023-07-01
17    2023-08-01
18    2023-09-01
19    2023-10-01
Name: time, dtype: object

Unique time values: 45

Time value examples:

['2022-03-01' '2022-04-01' '2022-05-01' '2022-06-01' '2022-07-01'
 '2022-08-01' '2022-09-01' '2022-10-01' '2022-11-01' '2022-12-01']

Data type of 'time' column: object

Checking 'suitable_for_seasonal_analysis' column:
suitable_for_seasonal_analysis
True     7142715
False     411947
Name: count, dtype: int64

Sample of key columns:

## 5. Parse Time Column and Restructure Data
Convert time strings to datetime objects and derive a monthly period column. The data stays in long format here; the pivot to one time series per product happens in the next section.

In [5]:
# Parse time column - it's already in YYYY-MM-DD format
df['date'] = pd.to_datetime(df['time'], errors='coerce')

# Check for parsing errors
print(f"Successfully parsed dates: {df['date'].notna().sum()} / {len(df)}")
print(f"Date range: {df['date'].min()} to {df['date'].max()}")

# Derive a monthly period (e.g. 2022-03) for each row to use as the time axis
df['period'] = df['date'].dt.to_period('M')

print(f"\nUnique periods: {df['period'].nunique()}")
print(f"Sample periods: {sorted(df['period'].unique())[:12]}")

# Display data structure
print(f"\nData structure (long format - multiple rows per product):")
print(df[['product', 'date', 'period', 'sold/m', 'revenue/month', 'top-level_category']].head(15))

Successfully parsed dates: 7554662 / 7554662
Date range: 2022-03-01 00:00:00 to 2025-11-01 00:00:00

Unique periods: 45
Sample periods: [Period('2022-03', 'M'), Period('2022-04', 'M'), Period('2022-05', 'M'), Period('2022-06', 'M'), Period('2022-07', 'M'), Period('2022-08', 'M'), Period('2022-09', 'M'), Period('2022-10', 'M'), Period('2022-11', 'M'), Period('2022-12', 'M'), Period('2023-01', 'M'), Period('2023-02', 'M')]

Data structure (long format - multiple rows per product):

                                              product       date   period  \
0      Cute Different Designs  button accessories ... 2
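
The parsing step can be checked on a miniature example. The sketch below uses a hypothetical three-row frame (one value deliberately malformed) to show how `errors='coerce'` and `to_period('M')` behave. The explicit `format` argument is an addition for illustration; the notebook lets pandas infer the format.

```python
import pandas as pd

# Hypothetical miniature of the `time` column, including one malformed value.
sample = pd.DataFrame({"time": ["2022-03-01", "2022-04-01", "not-a-date"]})

# errors="coerce" turns unparseable strings into NaT instead of raising.
sample["date"] = pd.to_datetime(sample["time"], format="%Y-%m-%d", errors="coerce")

# A monthly period maps every day of a month onto a single label.
sample["period"] = sample["date"].dt.to_period("M")

n_failed = int(sample["date"].isna().sum())
print(sample)
print(f"Unparseable rows: {n_failed}")
```

Counting the NaT values after parsing is the same check the notebook performs with `df['date'].notna().sum()`, viewed from the failure side.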

## 6. Product-Level Processing: Truncation and Filtering
**Step A:** Filter products marked as suitable for seasonal analysis  
**Step B:** Create time series per product (pivot from long to wide format)  
**Step C:** Truncate pre-listing NaNs and filter by duration (≥12 months)

In [6]:
# Step A: Filter products suitable for seasonal analysis
if 'suitable_for_seasonal_analysis' in df.columns:
    df_filtered = df[df['suitable_for_seasonal_analysis'] == True].copy()
    print(f"Products marked suitable for seasonal analysis: {df_filtered['product'].nunique()}")
else:
    df_filtered = df.copy()
    print("No 'suitable_for_seasonal_analysis' column found, using all products")

# Step B: Pivot data to create time series per product
# Primary metric: sold/m (units sold per month)
pivot_sold = df_filtered.pivot_table(
    index='product',
    columns='period',
    values='sold/m',
    aggfunc='first'  # Use first value if duplicates
)

# Secondary metric: revenue/month (for validation)
pivot_revenue = df_filtered.pivot_table(
    index='product',
    columns='period',
    values='revenue/month',
    aggfunc='first'
)

# Get category mapping
product_category = df_filtered.groupby('product')['top-level_category'].first()

print(f"\nPivoted data shape: {pivot_sold.shape}")
print(f"Products: {len(pivot_sold)}, Time periods: {len(pivot_sold.columns)}")
print(f"\nTime periods: {pivot_sold.columns.tolist()[:12]}")

# Step C: Process each product - truncate and filter
processed_products = []

for product_id in pivot_sold.index:
    # Get time series data
    ts_data = pivot_sold.loc[product_id].values
    ts_dates = pivot_sold.columns
    
    # Find first valid (non-NaN) index (listing date)
    valid_indices = np.where(~pd.isna(ts_data))[0]
    
    if len(valid_indices) == 0:
        continue  # Skip products with no data
    
    first_valid_idx = valid_indices[0]
    
    # Truncate pre-listing NaNs
    truncated_data = ts_data[first_valid_idx:]
    truncated_dates = ts_dates[first_valid_idx:]
    
    # Filter by duration (≥12 months)
    if len(truncated_data) < 12:
        continue
    
    # Store processed product
    processed_products.append({
        'product_id': product_id,
        'category': product_category.get(product_id, 'Unknown'),
        'time_series': truncated_data,
        'dates': truncated_dates
    })

print(f"\nTotal products after pivot: {len(pivot_sold)}")
print(f"Products with ≥12 months data: {len(processed_products)}")
print(f"Filtered out: {len(pivot_sold) - len(processed_products)} products")

Products marked suitable for seasonal analysis: 158437

Pivoted data shape: (158437, 45)
Products: 158437, Time periods: 45

Time periods: [Period('2022-03', 'M'), Period('2022-04', 'M'), Period('2022-05', 'M'), Period('2022-06', 'M'), Period('2022-07', 'M'), Period('2022-08', 'M'), Period('2022-09', 'M'), Period('2022-10', 'M'), Period('2022-11', 'M'), Period('2022-12', 'M'), Period('2023-01', 'M'), Period('2023-02', 'M')]

Total products after pivot: 158437
Products with ≥12 months data: 158412
Filtered out: 25 products
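
The per-product loop above can also be expressed with array operations, which may be faster on a pivot of this size. The following is a minimal sketch on a hypothetical three-product pivot, not a drop-in replacement: the names `months_on_record` and `min_months` are assumptions, and the threshold is lowered from 12 to 4 to suit the toy data.

```python
import numpy as np
import pandas as pd

periods = pd.period_range("2022-03", periods=6, freq="M")
pivot = pd.DataFrame(
    [
        [np.nan, np.nan, 5, 7, np.nan, 9],  # listed in the third month: 4 months on record
        [1, 2, 3, 4, 5, 6],                 # listed from the start: 6 months on record
        [np.nan] * 6,                       # never observed
    ],
    index=["late_listing", "full_history", "no_data"],
    columns=periods,
)

observed = pivot.notna()
has_data = observed.any(axis=1)

# argmax on a boolean row returns the position of the first True (0 if there is none).
first_valid_pos = observed.to_numpy().argmax(axis=1)

# Months from first observation to the end of the window, as in the loop's truncation.
months_on_record = pd.Series(pivot.shape[1] - first_valid_pos, index=pivot.index)
months_on_record = months_on_record.where(has_data, 0)

min_months = 4  # the notebook uses 12; lowered here for the toy data
eligible = months_on_record[months_on_record >= min_months].index.tolist()
print(months_on_record)
print(eligible)
```

As in the loop, the duration counts every month after the first observation, including later gaps, so a product with missing months in the middle of its history still qualifies.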


## 7. Category-Level Aggregation (Market Analysis)
Sum sales by category to create category-level demand profiles for broad market trend analysis.

In [7]:
# Group products by category and aggregate sales
category_aggregates = {}

for product in processed_products:
    category = product['category']
    dates = product['dates']
    sales = product['time_series']
    
    if category not in category_aggregates:
        # Start an empty container for this category; its date range is
        # derived later from the union of its products' dates
        category_aggregates[category] = {
            'products': [],
            'dates_list': []
        }
    
    category_aggregates[category]['products'].append({
        'dates': dates,
        'sales': sales
    })

# Aggregate sales for each category
for category in category_aggregates:
    # Collect the union of dates across all products in the category
    all_dates = []
    for prod in category_aggregates[category]['products']:
        all_dates.extend(prod['dates'].tolist())
    
    unique_dates = sorted(set(all_dates))
    
    # Sum sales across products for each date
    aggregated_sales = np.zeros(len(unique_dates))
    
    for prod in category_aggregates[category]['products']:
        for i, date in enumerate(unique_dates):
            if date in prod['dates']:
                idx = prod['dates'].tolist().index(date)
                if not pd.isna(prod['sales'][idx]):
                    aggregated_sales[i] += prod['sales'][idx]
    
    category_aggregates[category]['dates'] = pd.PeriodIndex(unique_dates)
    category_aggregates[category]['sales'] = aggregated_sales

print(f"Categories found: {len(category_aggregates)}")
print(f"\nCategory summary:")
for cat, data in category_aggregates.items():
    valid_sales = data['sales'][~pd.isna(data['sales'])]
    print(f"  {cat}: {len(data['products'])} products, {len(data['dates'])} months, "
          f"Total sales: {np.nansum(valid_sales):.0f}")

Categories found: 30

Category summary:
  Fashion Accessories: 7457 products, 45 months, Total sales: 22010263377
  Beauty: 16501 products, 45 months, Total sales: 925950278
  Stationery: 10429 products, 45 months, Total sales: 347964820
  Computers & Accessories: 1827 products, 45 months, Total sales: 19871788
  Women Shoes: 3184 products, 45 months, Total sales: 52932499
  Home & Living: 36960 products, 45 months, Total sales: 1304590470
  Men Clothes: 4533 products, 45 months, Total sales: 123019610
  Motorcycles: 3518 products, 45 months, Total sales: 66752485
  Automobiles: 2982 products, 45 months, Total sales: 48809894
  Mobile & Gadgets: 9761 products, 45 months, Total sales: 190077070
  Audio: 2253 products, 45 months, Total sales: 41234174
  Mom & Baby: 6395 products, 45 months, Total sales: 228533838
  Baby & Kids Fashion: 5220 products, 45 months, Total sales: 84073527
  Sports & Outdoors: 5360 products, 45 months, Total sales: 194324480
  Women Clothes: 10469 products, 45 
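
The nested loops in this cell look up each date in each product's list, which scales poorly with 158,412 products. A grouped sum gives the same kind of category-by-month totals in one pass. The sketch below is an alternative under stated assumptions, not the notebook's code: it works on a hypothetical long-format frame rather than `processed_products`, so the ≥12-month filter would have to be applied to the frame beforehand.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format data using the notebook's column names.
long_df = pd.DataFrame({
    "product": ["a", "a", "b", "b", "c"],
    "top-level_category": ["Beauty", "Beauty", "Beauty", "Beauty", "Audio"],
    "period": pd.PeriodIndex(
        ["2022-03", "2022-04", "2022-03", "2022-04", "2022-04"], freq="M"
    ),
    "sold/m": [10.0, np.nan, 5.0, 7.0, 3.0],
})

# sum() skips NaN by default, matching the loop's "add only non-NaN values" rule.
category_sales = (
    long_df.groupby(["top-level_category", "period"])["sold/m"]
    .sum()
    .unstack("top-level_category", fill_value=0.0)
    .sort_index()
)
print(category_sales)
```

One difference to keep in mind: `fill_value=0.0` gives every category the full set of months, while the loop limits each category to the months in which at least one of its products has a date entry.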

## 8. Seasonal Decomposition - Product Level
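Apply additive decomposition (Y = T + S + R) to each product with ≥24 months of data. Two full 12-month cycles are the minimum that `seasonal_decompose` accepts for `period=12`, and shorter series give no basis for averaging a seasonal pattern across years.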

In [8]:
# Apply seasonal decomposition to products with ≥24 months
product_decompositions = []

for product in processed_products:
    # Require at least 24 months for reliable decomposition
    if len(product['time_series']) < 24:
        continue
    
    # Create time series with period index, convert to timestamp for decomposition
    dates_timestamp = product['dates'].to_timestamp()
    
    ts = pd.Series(
        product['time_series'],
        index=dates_timestamp
    )
    
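    # Fill NaN values by linear interpolation; limit_direction='both'
    # also fills gaps at the start and end of the series
    ts_filled = ts.interpolate(method='linear', limit_direction='both')
    
    # Safeguard: after interpolation, NaNs can remain only if the whole
    # series is NaN, so this check does not measure the original gap share
    if ts_filled.isna().sum() > len(ts_filled) * 0.3: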
        continue
    
    try:
        # Apply additive decomposition with 12-month period
        decomposition = seasonal_decompose(
            ts_filled,
            model='additive',
            period=12,
            extrapolate_trend='freq'
        )
        
        product_decompositions.append({
            'product_id': product['product_id'],
            'category': product['category'],
            'original': ts_filled,
            'trend': decomposition.trend,
            'seasonal': decomposition.seasonal,
            'residual': decomposition.resid,
            'dates': dates_timestamp
        })
    except Exception as e:
        print(f"Decomposition failed for product {product['product_id']}: {e}")
        continue

print(f"Successfully decomposed {len(product_decompositions)} products (≥24 months)")

Successfully decomposed 126556 products (≥24 months)
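
The interaction between interpolation and the missing-data check in this cell is easy to verify on a small series. The sketch below uses hypothetical values to show that `interpolate(method='linear', limit_direction='both')` leaves no NaN behind, so the share of missing months has to be measured before filling if it is meant to act as a quality filter. The variable names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2022-03-01", periods=8, freq="MS")
ts = pd.Series([4.0, np.nan, np.nan, 10.0, 12.0, np.nan, 8.0, np.nan], index=idx)

# Share of missing months in the raw series.
missing_share_before = ts.isna().mean()

# Interior gaps are filled on a straight line; the trailing gap repeats the last value.
ts_filled = ts.interpolate(method="linear", limit_direction="both")
missing_after = int(ts_filled.isna().sum())

print(f"Missing before: {missing_share_before:.0%}, missing after: {missing_after}")
print(ts_filled.tolist())
```

In this toy series half of the months are missing, yet the filled series would pass a 30% check applied after interpolation.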


## 9. Seasonal Decomposition - Category Level
Apply decomposition to aggregated category sales for market-level trend analysis.

In [9]:
# Apply decomposition to category aggregates with ≥24 months
category_decompositions = []

for category, cat_data in category_aggregates.items():
    # Require at least 24 months for reliable decomposition
    if len(cat_data['sales']) < 24:
        continue
    
    # Create time series with timestamp index
    dates_timestamp = cat_data['dates'].to_timestamp()
    
    ts = pd.Series(
        cat_data['sales'],
        index=dates_timestamp
    )
    
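    # Interpolate for consistency with the product-level step; the aggregated
    # sales are accumulated from zeros, so no NaNs are expected here
    ts_filled = ts.interpolate(method='linear', limit_direction='both')
    
    # Safeguard only: with no NaNs in the input this condition never triggers
    if ts_filled.isna().sum() > len(ts_filled) * 0.3: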
        continue
    
    try:
        # Apply additive decomposition with 12-month period
        decomposition = seasonal_decompose(
            ts_filled,
            model='additive',
            period=12,
            extrapolate_trend='freq'
        )
        
        category_decompositions.append({
            'category': category,
            'original': ts_filled,
            'trend': decomposition.trend,
            'seasonal': decomposition.seasonal,
            'residual': decomposition.resid,
            'dates': dates_timestamp
        })
    except Exception as e:
        print(f"Decomposition failed for category {category}: {e}")
        continue

print(f"Successfully decomposed {len(category_decompositions)} categories (≥24 months)")

Successfully decomposed 30 categories (≥24 months)


## 10. Extract Metrics for Product-Level Analysis
Calculate key metrics: mean monthly sales, seasonal amplitude, trend slope, and residual anomalies (z-scores).

In [10]:
# Extract metrics for each decomposed product
product_metrics = []

for decomp in product_decompositions:
    # Calculate mean monthly sales
    mean_monthly_sales = decomp['original'].mean()
    
    # Calculate seasonal amplitude: (Max(S) - Min(S)) / Mean(T)
    seasonal_max = decomp['seasonal'].max()
    seasonal_min = decomp['seasonal'].min()
    trend_mean = decomp['trend'].mean()
    seasonal_amplitude = (seasonal_max - seasonal_min) / trend_mean if trend_mean != 0 else 0
    
    # Calculate trend slope using linear regression
    x = np.arange(len(decomp['trend']))
    y = decomp['trend'].values
    valid_mask = ~np.isnan(y)
    if valid_mask.sum() > 1:
        trend_slope, _ = np.polyfit(x[valid_mask], y[valid_mask], 1)
    else:
        trend_slope = 0
    
    # Calculate max residual z-score
    residuals = decomp['residual'].dropna()
    if len(residuals) > 0 and residuals.std() != 0:
        residual_zscores = np.abs((residuals - residuals.mean()) / residuals.std())
        max_residual_zscore = residual_zscores.max()
    else:
        max_residual_zscore = 0
    
    product_metrics.append({
        'product': decomp['product_id'],
        'category': decomp['category'],
        'mean_monthly_sales': mean_monthly_sales,
        'seasonal_amplitude': seasonal_amplitude,
        'trend_slope': trend_slope,
        'max_residual_zscore': max_residual_zscore
    })

# Create DataFrame
metrics_df = pd.DataFrame(product_metrics)

print(f"Extracted metrics for {len(metrics_df)} products")
print(f"\nMetrics summary:")
print(metrics_df.describe())

Extracted metrics for 126556 products

Metrics summary:
       mean_monthly_sales  seasonal_amplitude    trend_slope  \
count       126556.000000       126556.000000  126556.000000   
mean           725.692247           18.563135      -8.280920   
std           5146.983781           36.406209     229.304348   
min              0.000000            0.000000  -27838.214143   
25%             59.739766            4.909610     -13.197569   
50%            145.642735            8.884173      -2.213184   
75%            431.522222           16.234967       4.228130   
max         727191.955556          300.000000   25181.185040   

       max_residual_zscore  
count        126556.000000  
mean              4.183368  
std               1.285669  
min               0.000000  
25%               3.254372  
50%               4.464622  
75%               5.318657  
max               5.740880  
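
The three derived metrics can be traced by hand on a short example. The sketch below applies the same formulas to hypothetical six-point components; the numbers are invented for illustration and do not come from the dataset.

```python
import numpy as np

# Hypothetical decomposition components for a six-month series.
trend = np.array([100.0, 102.0, 104.0, 106.0, 108.0, 110.0])
seasonal = np.array([-10.0, 0.0, 10.0, -10.0, 0.0, 10.0])
resid = np.array([1.0, -1.0, 0.5, -0.5, 6.0, -6.0])

# Seasonal amplitude: peak-to-trough seasonal swing relative to the mean trend level.
seasonal_amplitude = (seasonal.max() - seasonal.min()) / trend.mean()

# Trend slope: first coefficient of a degree-1 least-squares fit (units per month).
trend_slope, _intercept = np.polyfit(np.arange(len(trend)), trend, 1)

# Max residual z-score; ddof=1 mirrors the default of pandas' Series.std().
z_scores = np.abs((resid - resid.mean()) / resid.std(ddof=1))
max_residual_zscore = z_scores.max()

print(f"amplitude={seasonal_amplitude:.4f}, slope={trend_slope:.2f}, max_z={max_residual_zscore:.3f}")
```

Since the amplitude is a ratio to the mean trend, products with a trend level near zero can produce very large values, which is worth keeping in mind when reading the upper tail of `seasonal_amplitude`.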


## 11. Export Results to CSV
Export the decomposition metrics to CSV for use in subsequent analysis (Task 2.2.3).

In [12]:
# Export metrics to CSV
output_filename = '../CSV/seasonal_decomposition_results.csv'
metrics_df.to_csv(output_filename, index=False)

print(f"Metrics exported to '{output_filename}'")
print(f"Total rows: {len(metrics_df)}")
print(f"\nColumns: {list(metrics_df.columns)}")

Metrics exported to '../CSV/seasonal_decomposition_results.csv'
Total rows: 126556

Columns: ['product', 'category', 'mean_monthly_sales', 'seasonal_amplitude', 'trend_slope', 'max_residual_zscore']


## 12. Visualization - Strategic Sample Selection
Identify products for visualization based on highest seasonal amplitude and residual anomalies.

In [13]:
# Select strategic samples for visualization
# Top 10 by seasonal amplitude
top_seasonal = metrics_df.nlargest(10, 'seasonal_amplitude')

# Top 10 by max residual z-score
top_anomaly = metrics_df.nlargest(10, 'max_residual_zscore')

# One representative per category (highest mean sales)
category_reps = metrics_df.loc[metrics_df.groupby('category')['mean_monthly_sales'].idxmax()]

print("Strategic Samples Selected:")
print(f"\nTop 10 by Seasonal Amplitude:")
print(top_seasonal[['product', 'category', 'seasonal_amplitude']])
print(f"\nTop 10 by Residual Anomalies:")
print(top_anomaly[['product', 'category', 'max_residual_zscore']])
print(f"\nCategory Representatives ({len(category_reps)} categories):")
print(category_reps[['product', 'category', 'mean_monthly_sales']])

Strategic Samples Selected:

Top 10 by Seasonal Amplitude:
                                                 product           category  \
21073  Adjustable Car Battery Holder Stabilizer Fixed...        Automobiles   
49455  GDPLUS BAVIN Waterproof Phone Bag Touch-Screen...   Mobile & Gadgets   
71760  Mini LED Candle Light | Flameless Candle Lamp ...      Home & Living   
782    (35-125kg)Padded Strapless bra Plus size Bra p...      Women Clothes   
2584   1-10Pcs Rice Sub Packaged Boxes / 280ML Refrig...      Home & Living   
3617   100*10cmTransparent Sole Protector For Shoe Se...        Women Shoes   
9210   20 Color Eyeshadow Palette Sequins Glitter Pea...             Beauty   
9372   200*200 Metal Rack Steel Rack Cold-rolled Meta...      Home & Living   
10127  20PCS PE Foam Wall Stickers 35*35cm Wall Decal...      Home & Living   
11872  3 sa 1 Solar USB Rechargeable LED Camping Ligh...  Sports & Outdoors   

       seasonal_amplitude  
21073               300.0  
49455          
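
The two selection idioms used in this cell, `nlargest` and `groupby(...).idxmax()`, behave as follows on a hypothetical four-row metrics table. The product labels and values are invented for illustration.

```python
import pandas as pd

metrics = pd.DataFrame({
    "product": ["p1", "p2", "p3", "p4"],
    "category": ["Beauty", "Beauty", "Audio", "Audio"],
    "mean_monthly_sales": [120.0, 300.0, 80.0, 50.0],
    "seasonal_amplitude": [0.4, 0.1, 0.9, 0.2],
})

# Top-N rows by one metric, returned in descending order of that metric.
top2 = metrics.nlargest(2, "seasonal_amplitude")

# idxmax returns the row label of the maximum within each group;
# .loc then pulls the full rows, one per category.
reps = metrics.loc[metrics.groupby("category")["mean_monthly_sales"].idxmax()]

print(top2[["product", "seasonal_amplitude"]])
print(reps[["product", "category", "mean_monthly_sales"]])
```

When several rows share the same value, `nlargest` keeps the rows that appear first by default (`keep='first'`), so the selection among tied products depends on row order.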

## 13. Visualize Decomposition for Top Seasonal Products
Plot decomposition components (Trend, Seasonal, Residual) for products with highest seasonal amplitude.
Each product gets its own separate graph saved in a dedicated directory.

In [14]:
import os

# Create directory for top seasonal products
output_dir = '../seasonal_trend/visualizations/top_seasonal_products'
os.makedirs(output_dir, exist_ok=True)

# Visualize decomposition for top seasonal products (individual plots)
top_seasonal_ids = top_seasonal['product'].head(10).values

for product_id in top_seasonal_ids:
    # Find decomposition for this product
    decomp = next((d for d in product_decompositions if d['product_id'] == product_id), None)
    
    if decomp is None:
        continue
    
    # Create a 2x2 subplot for this product
    fig, axes = plt.subplots(2, 2, figsize=(15, 10))
    fig.suptitle(f"Seasonal Decomposition: {product_id}", fontsize=16, fontweight='bold')
    
    # Plot original series
    axes[0, 0].plot(decomp['dates'], decomp['original'], color='blue', linewidth=2)
    axes[0, 0].set_title('Original Series', fontsize=12)
    axes[0, 0].set_ylabel('Sales')
    axes[0, 0].grid(True, alpha=0.3)
    axes[0, 0].tick_params(axis='x', rotation=45)
    
    # Plot trend
    axes[0, 1].plot(decomp['dates'], decomp['trend'], color='green', linewidth=2)
    axes[0, 1].set_title('Trend Component', fontsize=12)
    axes[0, 1].set_ylabel('Trend')
    axes[0, 1].grid(True, alpha=0.3)
    axes[0, 1].tick_params(axis='x', rotation=45)
    
    # Plot seasonal
    axes[1, 0].plot(decomp['dates'], decomp['seasonal'], color='orange', linewidth=2)
    axes[1, 0].set_title('Seasonal Component', fontsize=12)
    axes[1, 0].set_ylabel('Seasonal')
    axes[1, 0].axhline(y=0, color='black', linestyle='--', alpha=0.3)
    axes[1, 0].grid(True, alpha=0.3)
    axes[1, 0].tick_params(axis='x', rotation=45)
    
    # Plot residual
    axes[1, 1].plot(decomp['dates'], decomp['residual'], color='red', linewidth=2)
    axes[1, 1].set_title('Residual Component', fontsize=12)
    axes[1, 1].set_ylabel('Residual')
    axes[1, 1].axhline(y=0, color='black', linestyle='--', alpha=0.3)
    axes[1, 1].grid(True, alpha=0.3)
    axes[1, 1].tick_params(axis='x', rotation=45)
    
    plt.tight_layout()
    
    # Save with sanitized filename
    safe_filename = "".join(c for c in product_id if c.isalnum() or c in (' ', '-', '_')).rstrip()
    safe_filename = safe_filename.replace(' ', '_')[:100]  # Limit length
    filepath = os.path.join(output_dir, f'{safe_filename}.png')
    plt.savefig(filepath, dpi=300, bbox_inches='tight')
    plt.close()

print(f"Saved {len(top_seasonal_ids)} seasonal decomposition plots to '{output_dir}/'")
print(f"Files saved successfully!")

Saved 10 seasonal decomposition plots to '../seasonal_trend/visualizations/top_seasonal_products/'
Files saved successfully!
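
The filename sanitization is repeated in every plotting cell, so it could be factored into a helper. The sketch below wraps the same logic in a function; the name `to_safe_filename` and the sample title are hypothetical.

```python
def to_safe_filename(name: str, max_len: int = 100) -> str:
    """Keep letters, digits, spaces, hyphens and underscores, then use underscores for spaces."""
    kept = "".join(c for c in name if c.isalnum() or c in (" ", "-", "_")).rstrip()
    return kept.replace(" ", "_")[:max_len]


# Hypothetical product title containing a character that is dropped.
example = to_safe_filename("Mini LED Candle Light | Flameless")
print(example)  # Mini_LED_Candle_Light__Flameless
```

Two titles that differ only in dropped characters, or only beyond the first 100 characters, map to the same filename, in which case the later plot overwrites the earlier one. Appending a short unique suffix such as the row index would avoid this.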


## 14. Visualize Decomposition for Top Anomaly Products
Plot decomposition with highlighted anomalies for products with highest residual z-scores.
Each product gets its own separate graph saved in a dedicated directory.

In [15]:
# Create directory for top anomaly products
output_dir = '../seasonal_trend/visualizations/top_anomaly_products'
os.makedirs(output_dir, exist_ok=True)

# Visualize decomposition for top anomaly products (individual plots)
top_anomaly_ids = top_anomaly['product'].head(10).values

for product_id in top_anomaly_ids:
    # Find decomposition for this product
    decomp = next((d for d in product_decompositions if d['product_id'] == product_id), None)
    
    if decomp is None:
        continue
    
    # Create a 2x2 subplot for this product
    fig, axes = plt.subplots(2, 2, figsize=(15, 10))
    fig.suptitle(f"Seasonal Decomposition with Anomalies: {product_id}", fontsize=16, fontweight='bold')
    
    # Plot original series
    axes[0, 0].plot(decomp['dates'], decomp['original'], color='blue', linewidth=2)
    axes[0, 0].set_title('Original Series', fontsize=12)
    axes[0, 0].set_ylabel('Sales')
    axes[0, 0].grid(True, alpha=0.3)
    axes[0, 0].tick_params(axis='x', rotation=45)
    
    # Plot trend
    axes[0, 1].plot(decomp['dates'], decomp['trend'], color='green', linewidth=2)
    axes[0, 1].set_title('Trend Component', fontsize=12)
    axes[0, 1].set_ylabel('Trend')
    axes[0, 1].grid(True, alpha=0.3)
    axes[0, 1].tick_params(axis='x', rotation=45)
    
    # Plot seasonal
    axes[1, 0].plot(decomp['dates'], decomp['seasonal'], color='orange', linewidth=2)
    axes[1, 0].set_title('Seasonal Component', fontsize=12)
    axes[1, 0].set_ylabel('Seasonal')
    axes[1, 0].axhline(y=0, color='black', linestyle='--', alpha=0.3)
    axes[1, 0].grid(True, alpha=0.3)
    axes[1, 0].tick_params(axis='x', rotation=45)
    
    # Plot residual with highlighted anomalies
    residuals = decomp['residual']
    axes[1, 1].plot(decomp['dates'], residuals, color='red', linewidth=2, alpha=0.6)
    
    # Highlight anomalies (|z-score| > 2)
    if residuals.std() != 0:
        z_scores = np.abs((residuals - residuals.mean()) / residuals.std())
        anomaly_mask = z_scores > 2
        axes[1, 1].scatter(decomp['dates'][anomaly_mask], residuals[anomaly_mask], 
                          color='darkred', s=100, zorder=5, label='Anomalies (|z| > 2)', marker='*')
        axes[1, 1].legend(loc='best')
    
    axes[1, 1].set_title('Residual Component with Anomalies', fontsize=12)
    axes[1, 1].set_ylabel('Residual')
    axes[1, 1].axhline(y=0, color='black', linestyle='--', alpha=0.3)
    axes[1, 1].grid(True, alpha=0.3)
    axes[1, 1].tick_params(axis='x', rotation=45)
    
    plt.tight_layout()
    
    # Save with sanitized filename
    safe_filename = "".join(c for c in product_id if c.isalnum() or c in (' ', '-', '_')).rstrip()
    safe_filename = safe_filename.replace(' ', '_')[:100]
    filepath = os.path.join(output_dir, f'{safe_filename}.png')
    plt.savefig(filepath, dpi=300, bbox_inches='tight')
    plt.close()

print(f"Saved {len(top_anomaly_ids)} anomaly decomposition plots to '{output_dir}/'")
print(f"Files saved successfully!")

Saved 10 anomaly decomposition plots to '../seasonal_trend/visualizations/top_anomaly_products/'
Files saved successfully!
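
The anomaly highlighting reduces to a z-score threshold on the residual series. The sketch below isolates that step on a hypothetical twelve-month residual with one injected spike; the values are invented for illustration.

```python
import pandas as pd

idx = pd.date_range("2023-01-01", periods=12, freq="MS")
resid = pd.Series(
    [1, -1, 2, -2, 1, -1, 0, 1, -1, 30, -2, 2], index=idx, dtype=float
)

# Standardize the residuals and flag months beyond two standard deviations.
z = (resid - resid.mean()) / resid.std()
anomalies = resid[z.abs() > 2]

print(anomalies)
```

A single large spike also inflates the standard deviation used in the denominator, so smaller irregularities in the same series become harder to flag.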


## 15. Visualize Category-Level Decomposition
Plot category-level decompositions for market trend analysis.
Each category gets its own separate graph saved in a dedicated directory.

In [16]:
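# Create directory for category-level decompositions
# Note: this path has no '../' prefix, so unlike the other plot directories it
# resolves relative to the current working directory and is not removed by cleanup_exports()
output_dir = 'seasonal_trend/visualizations/category_decompositions'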
os.makedirs(output_dir, exist_ok=True)

# Visualize category-level decompositions (individual plots)
if len(category_decompositions) > 0:
    for decomp in category_decompositions:
        # Create a 2x2 subplot for this category
        fig, axes = plt.subplots(2, 2, figsize=(15, 10))
        fig.suptitle(f"Category Decomposition: {decomp['category']}", fontsize=16, fontweight='bold')
        
        # Plot original series
        axes[0, 0].plot(decomp['dates'], decomp['original'], color='blue', linewidth=2)
        axes[0, 0].set_title('Original Series (Total Sales)', fontsize=12)
        axes[0, 0].set_ylabel('Total Sales')
        axes[0, 0].grid(True, alpha=0.3)
        axes[0, 0].tick_params(axis='x', rotation=45)
        
        # Plot trend
        axes[0, 1].plot(decomp['dates'], decomp['trend'], color='green', linewidth=2)
        axes[0, 1].set_title('Trend Component', fontsize=12)
        axes[0, 1].set_ylabel('Trend')
        axes[0, 1].grid(True, alpha=0.3)
        axes[0, 1].tick_params(axis='x', rotation=45)
        
        # Plot seasonal
        axes[1, 0].plot(decomp['dates'], decomp['seasonal'], color='orange', linewidth=2)
        axes[1, 0].set_title('Seasonal Component', fontsize=12)
        axes[1, 0].set_ylabel('Seasonal')
        axes[1, 0].axhline(y=0, color='black', linestyle='--', alpha=0.3)
        axes[1, 0].grid(True, alpha=0.3)
        axes[1, 0].tick_params(axis='x', rotation=45)
        
        # Plot residual
        axes[1, 1].plot(decomp['dates'], decomp['residual'], color='red', linewidth=2, alpha=0.7)
        axes[1, 1].set_title('Residual Component', fontsize=12)
        axes[1, 1].set_ylabel('Residual')
        axes[1, 1].axhline(y=0, color='black', linestyle='--', alpha=0.3)
        axes[1, 1].grid(True, alpha=0.3)
        axes[1, 1].tick_params(axis='x', rotation=45)
        
        plt.tight_layout()
        
        # Save with sanitized filename
        safe_filename = "".join(c for c in decomp['category'] if c.isalnum() or c in (' ', '-', '_')).rstrip()
        safe_filename = safe_filename.replace(' ', '_')[:100]
        filepath = os.path.join(output_dir, f'{safe_filename}.png')
        plt.savefig(filepath, dpi=300, bbox_inches='tight')
        plt.close()
    
    print(f"Saved {len(category_decompositions)} category decomposition plots to '{output_dir}/'")
    print(f"Files saved successfully!")
else:
    print("No category decompositions available for visualization")

Saved 30 category decomposition plots to 'seasonal_trend/visualizations/category_decompositions/'
Files saved successfully!


## 16. Seasonal Subseries Plot (Calendar View)
Visualize seasonal patterns across years for the top seasonal products, drawing one line per year against the calendar month.
Each product gets its own separate graph saved in a dedicated directory.

In [17]:
# Create directory for seasonal subseries plots
output_dir = '../seasonal_trend/visualizations/seasonal_subseries'
os.makedirs(output_dir, exist_ok=True)

# Create seasonal subseries plots for top seasonal products
def plot_seasonal_subseries(decomp, title, output_path):
    """Plot the original series by calendar month, with one line per year."""
    df_plot = pd.DataFrame({
        'date': decomp['dates'],
        'value': decomp['original'].values
    })
    df_plot['year'] = df_plot['date'].dt.year
    df_plot['month'] = df_plot['date'].dt.month
    
    fig, ax = plt.subplots(figsize=(14, 6))
    
    # Plot each year as a separate line
    for year in sorted(df_plot['year'].unique()):
        year_data = df_plot[df_plot['year'] == year]
        ax.plot(year_data['month'], year_data['value'], 
               marker='o', label=str(year), linewidth=2, markersize=6)
    
    ax.set_xlabel('Month', fontsize=12, fontweight='bold')
    ax.set_ylabel('Sales', fontsize=12, fontweight='bold')
    ax.set_title(title, fontsize=14, fontweight='bold', pad=20)
    ax.set_xticks(range(1, 13))
    ax.set_xticklabels(['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 
                        'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'])
    ax.legend(title='Year', bbox_to_anchor=(1.05, 1), loc='upper left', frameon=True)
    ax.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.savefig(output_path, dpi=300, bbox_inches='tight')
    plt.close()

# Plot for top seasonal products
top_for_subseries = top_seasonal['product'].head(10).values

saved_count = 0
for product_id in top_for_subseries:
    decomp = next((d for d in product_decompositions if d['product_id'] == product_id), None)
    if decomp is not None:
        # Sanitize filename
        safe_filename = "".join(c for c in product_id if c.isalnum() or c in (' ', '-', '_')).rstrip()
        safe_filename = safe_filename.replace(' ', '_')[:100]
        filepath = os.path.join(output_dir, f'{safe_filename}.png')
        
        plot_seasonal_subseries(decomp, f"Seasonal Subseries - {product_id}", filepath)
        saved_count += 1

# Report the number of plots actually written, not the number requested
print(f"Saved {saved_count} seasonal subseries plots to '{output_dir}/'")
print("Files saved successfully!")

Saved 10 seasonal subseries plots to '../seasonal_trend/visualizations/seasonal_subseries/'
Files saved successfully!
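
The same calendar view can be inspected as a table. The following is a minimal sketch on synthetic data (the series and the `subseries_table` name are illustrative assumptions, not values from the Shopee dataset): it pivots a monthly series into a month-by-year grid, which is the layout the line plot draws.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series (hypothetical data): 24 months starting March 2022
dates = pd.date_range("2022-03-01", periods=24, freq="MS")
values = 100 + 20 * np.sin(2 * np.pi * dates.month.to_numpy() / 12)

df_plot = pd.DataFrame({"date": dates, "value": values})
df_plot["year"] = df_plot["date"].dt.year
df_plot["month"] = df_plot["date"].dt.month

# Rows are calendar months, columns are years; months with no data stay NaN
subseries_table = df_plot.pivot_table(index="month", columns="year", values="value")
print(subseries_table.round(1))
```

Partial years (here 2022 and 2024) show up as NaN cells, which makes gaps at the start or end of a product's listing easy to spot.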


## 17. Summary and Interpretation
Review key findings from the seasonal decomposition analysis.
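
The category-level figures in this summary rest on two quantities: a least-squares slope of the trend component and the seasonal range divided by the mean trend. If the trend carries NaN padding at both ends, as a centered moving average produces, forward-filling alone leaves the leading NaNs in place. The sketch below uses synthetic data and hypothetical variable names to show one way of fitting only the valid points.

```python
import numpy as np
import pandas as pd

# Synthetic components (hypothetical): trend rises by 2 per month, 12-month seasonal cycle
t = np.arange(36)
trend = pd.Series(100 + 2.0 * t)
trend.iloc[:6] = np.nan    # imitate NaN padding at the start of the trend
trend.iloc[-6:] = np.nan   # imitate NaN padding at the end of the trend
seasonal = pd.Series(10 * np.sin(2 * np.pi * t / 12))

# Forward-fill cannot repair leading NaNs: nothing precedes them
leading_nans_left = trend.ffill().isna().sum()

# Fit the slope only where the trend is defined, keeping the original time positions
valid = trend.notna().to_numpy()
slope = np.polyfit(t[valid], trend.to_numpy()[valid], 1)[0]

# Seasonal amplitude relative to the mean trend level (mean() skips NaN by default)
seasonal_amp = (seasonal.max() - seasonal.min()) / trend.mean()

print(f"NaNs left after ffill: {leading_nans_left}, slope: {slope:.2f}")
```

Masking keeps the time positions intact, so the slope stays in units of sales per month.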

In [18]:
# Generate summary report
print("="*80)
print("SEASONAL DECOMPOSITION ANALYSIS SUMMARY")
print("="*80)

print(f"\n1. DATA PROCESSING:")
print(f"   - Total products in dataset: {len(df)}")
print(f"   - Products with ≥12 months: {len(processed_products)}")
print(f"   - Products decomposed (≥24 months): {len(product_decompositions)}")
print(f"   - Categories analyzed: {len(category_decompositions)}")

print(f"\n2. TOP PRODUCTS BY SEASONAL AMPLITUDE:")
print(f"   (Strong seasonal patterns - inventory optimization opportunities)")
for i, row in top_seasonal.head(5).iterrows():
    print(f"   {row['product'][:50]:50s} | Amplitude: {row['seasonal_amplitude']:.3f}")

print(f"\n3. TOP PRODUCTS BY TREND SLOPE:")
top_trending = metrics_df.nlargest(5, 'trend_slope')
print(f"   (Emerging interest - growing demand)")
for i, row in top_trending.iterrows():
    print(f"   {row['product'][:50]:50s} | Slope: {row['trend_slope']:.3f}")

print(f"\n4. TOP PRODUCTS BY RESIDUAL ANOMALIES:")
print(f"   (Viral events or short-term consumer surges)")
for i, row in top_anomaly.head(5).iterrows():
    print(f"   {row['product'][:50]:50s} | Max Z-Score: {row['max_residual_zscore']:.2f}")

print(f"\n5. CATEGORY-LEVEL INSIGHTS:")
for decomp in category_decompositions:
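    # Fit the slope on valid trend points only (the trend may have NaN padding at the edges)
    trend_valid = decomp['trend'].dropna()
    trend_slope_cat = np.polyfit(np.arange(len(trend_valid)), trend_valid.values, 1)[0]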
    seasonal_amp_cat = (decomp['seasonal'].max() - decomp['seasonal'].min()) / decomp['trend'].mean()
    print(f"   {decomp['category']:30s} | Trend Slope: {trend_slope_cat:8.2f} | "
          f"Seasonal Amp: {seasonal_amp_cat:.3f}")

print(f"\n6. OUTPUT FILES GENERATED:")
print(f"   - seasonal_trend/seasonal_decomposition_results.csv (metrics for {len(metrics_df)} products)")
print(f"   - seasonal_trend/visualizations/top_seasonal_products/ (individual product plots)")
print(f"   - seasonal_trend/visualizations/top_anomaly_products/ (individual anomaly plots)")
print(f"   - seasonal_trend/visualizations/category_decompositions/ (individual category plots)")
print(f"   - seasonal_trend/visualizations/seasonal_subseries/ (individual subseries plots)")

print("\n" + "="*80)

SEASONAL DECOMPOSITION ANALYSIS SUMMARY

1. DATA PROCESSING:
   - Total products in dataset: 7554662
   - Products with ≥12 months: 158412
   - Products decomposed (≥24 months): 126556
   - Categories analyzed: 30

2. TOP PRODUCTS BY SEASONAL AMPLITUDE:
   (Strong seasonal patterns - inventory optimization opportunities)
   Adjustable Car Battery Holder Stabilizer Fixed Bra | Amplitude: 300.000
   GDPLUS BAVIN Waterproof Phone Bag Touch-Screen Und | Amplitude: 300.000
   Mini LED Candle Light | Flameless Candle Lamp | We | Amplitude: 300.000
   (35-125kg)Padded Strapless bra Plus size Bra push  | Amplitude: 300.000
   1-10Pcs Rice Sub Packaged Boxes / 280ML Refrigerat | Amplitude: 300.000

3. TOP PRODUCTS BY TREND SLOPE:
   (Emerging interest - growing demand)
   3D vinyl Floor sticker ( 91.44* 15.24cm) self adhe | Slope: 25181.185
   3D Self-Adhesive Wallpaper Continuous Waterproof B | Slope: 10672.797
   Petsup Cat Wet Food Real Meat Delish 85g Real Chic | Slope: 10633.470
   ❀◇ Rece