# 🚀 Quick Start Guide - E-Grocery Forecaster

A quick guide to running the pipeline from start to finish.

## Goals
- Load data from the FreshRetail dataset
- Run the full pipeline (WS0 → WS6)
- Train and evaluate the model
- Generate predictions

## Dataset
- **Default**: FreshRetailNet-50K (hourly data)
- **Location**: `data/2_raw/`
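
Before running the notebook, it can help to confirm that the raw data is actually in place. The following is a minimal sketch, not part of the project code: `find_raw_files` is a hypothetical helper, and the `*.parquet` pattern is an assumption based on the parquet file name that appears in the load log.

```python
from pathlib import Path


def find_raw_files(raw_dir, pattern="*.parquet"):
    """Return matching files under raw_dir, sorted by name.

    An empty list means the directory is missing or holds no matching files.
    """
    raw_dir = Path(raw_dir)
    if not raw_dir.is_dir():
        return []
    return sorted(p for p in raw_dir.glob(pattern) if p.is_file())


if __name__ == "__main__":
    # "data/2_raw" is the location named above; adjust it if the notebook
    # runs from a different working directory.
    files = find_raw_files("data/2_raw")
    if not files:
        print("No parquet files found under data/2_raw")
    for path in files:
        print(path.name)
```

An empty result is a hint to check the working directory or download the dataset before calling the loader.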


In [1]:
# Setup
import sys
from pathlib import Path
import pandas as pd
import numpy as np

# Add project root to path
project_root = Path().resolve().parent
if str(project_root) not in sys.path:
    sys.path.insert(0, str(project_root))

print(f"Project root: {project_root}")
print(f"Python path: {sys.path[0]}")


Project root: D:\datastorm\E-Grocery_Forecaster
Python path: D:\datastorm\E-Grocery_Forecaster


## Step 1: Load Data


In [2]:
from src.config import setup_project_path, get_dataset_config, setup_logging
from src.pipelines._01_load_data import load_data

setup_project_path()
setup_logging()

# Get config
config = get_dataset_config()
print(f"Dataset: {config['name']}")
print(f"Temporal unit: {config['temporal_unit']}")
print(f"Time column: {config['time_column']}")

# Load data
dataframes, config = load_data()
print(f"\nLoaded {len(dataframes)} dataframes:")
for name, df in dataframes.items():
    print(f"  - {name}: {df.shape}")


Added D:\datastorm\E-Grocery_Forecaster\src to sys.path
Dataset: FreshRetailNet-50K
Temporal unit: hour
Time column: hour_timestamp
2025-11-13 01:25:32,323 - src.pipelines._01_load_data - INFO - [PIPELINE STEP 1: LOAD DATA]
2025-11-13 01:25:32,324 - src.pipelines._01_load_data - INFO - Active Dataset: FreshRetailNet-50K
2025-11-13 01:25:32,325 - src.pipelines._01_load_data - INFO - Data Directory: D:\datastorm\E-Grocery_Forecaster\data\2_raw
2025-11-13 01:25:32,326 - src.pipelines._01_load_data - INFO - Loading FreshRetailNet-50K dataset...
2025-11-13 01:25:32,327 - src.pipelines._01_load_data - INFO -   Loading freshretail_train.parquet...
2025-11-13 01:25:36,607 - src.pipelines._01_load_data - INFO - ✓ Converted 'dt' to 'hour_timestamp'
2025-11-13 01:25:36,625 - src.pipelines._01_load_data - INFO - ✓ Converted 'sale_amount' to 'sales_quantity'
2025-11-13 01:25:36,636 - src.pipelines._01_load_data - INFO - Cleaning raw data (handling errors and outliers)...
2025-11-13 01:25:37,803

AttributeError: 'NoneType' object has no attribute 'shape'
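
The `AttributeError` above suggests that at least one value in `dataframes` is `None`, so calling `.shape` on it fails inside the summary loop. One possible way to make the summary tolerant of missing tables is sketched below; `summarize_frames` is a hypothetical helper, and the assumption that `None` marks a table that was not loaded is inferred from the error, not confirmed by the project code.

```python
def summarize_frames(frames):
    """Build one summary line per entry in a name -> DataFrame mapping.

    Entries without a usable .shape (for example None) are reported as
    "not loaded" instead of raising AttributeError.
    """
    lines = []
    for name, df in frames.items():
        shape = getattr(df, "shape", None)
        if shape is None:
            lines.append(f"  - {name}: not loaded")
        else:
            lines.append(f"  - {name}: {shape}")
    return lines
```

In the cell above, the loop body could then become `print("\n".join(summarize_frames(dataframes)))`, which lists every key and makes it obvious which tables came back empty.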

## Step 2: Feature Engineering (WS0-WS6)


In [None]:
from src.pipelines._02_feature_enrichment import main as run_feature_enrichment

# Run feature enrichment pipeline
run_feature_enrichment()

# Load master feature table
from src.config import OUTPUT_FILES
master_df = pd.read_parquet(OUTPUT_FILES['master_feature_table'])
print(f"Master feature table shape: {master_df.shape}")
print(f"\nColumns ({len(master_df.columns)}):")
print(master_df.columns.tolist()[:10], "...")


2025-11-13 01:26:06,102 - src.pipelines._02_feature_enrichment - INFO - STARTING FEATURE ENRICHMENT PIPELINE
2025-11-13 01:26:06,104 - src.pipelines._01_load_data - INFO - [PIPELINE STEP 1: LOAD DATA]
2025-11-13 01:26:06,104 - src.pipelines._01_load_data - INFO - Active Dataset: FreshRetailNet-50K
2025-11-13 01:26:06,104 - src.pipelines._01_load_data - INFO - Data Directory: D:\datastorm\E-Grocery_Forecaster\data\2_raw
2025-11-13 01:26:06,105 - src.pipelines._01_load_data - INFO - Loading FreshRetailNet-50K dataset...
2025-11-13 01:26:06,105 - src.pipelines._01_load_data - INFO -   Loading freshretail_train.parquet...
2025-11-13 01:26:09,800 - src.pipelines._01_load_data - INFO - ✓ Converted 'dt' to 'hour_timestamp'
2025-11-13 01:26:09,808 - src.pipelines._01_load_data - INFO - ✓ Converted 'sale_amount' to 'sales_quantity'
2025-11-13 01:26:09,816 - src.pipelines._01_load_data - INFO - Cleaning raw data (handling errors and outliers)...
2025-11-13 01:26:11,002 - src.pipelines._01_lo

[Parallel(n_jobs=12)]: Using backend MultiprocessingBackend with 12 concurrent workers.
[Parallel(n_jobs=12)]: Done  26 tasks      | elapsed:    5.7s
[Parallel(n_jobs=12)]: Done 1048 tasks      | elapsed:    6.0s
[Parallel(n_jobs=12)]: Done 29976 tasks      | elapsed:   11.1s
[Parallel(n_jobs=12)]: Done 64920 tasks      | elapsed:   18.6s
[Parallel(n_jobs=12)]: Done 108120 tasks      | elapsed:   28.2s
[Parallel(n_jobs=12)]: Done 169944 tasks      | elapsed:   37.3s
[Parallel(n_jobs=12)]: Done 229656 tasks      | elapsed:   49.5s
[Parallel(n_jobs=12)]: Done 309784 tasks      | elapsed:  1.1min
[Parallel(n_jobs=12)]: Done 390600 tasks      | elapsed:  1.4min
[Parallel(n_jobs=12)]: Done 474992 tasks      | elapsed:  1.8min
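
The `[Parallel(n_jobs=12)]` lines are progress messages in joblib's verbose format, each reporting the number of finished tasks and the elapsed time. To get a rough throughput figure from them, a small parser along these lines may be enough; `parse_progress` is a hypothetical helper, and the regular expression only assumes the seconds and minutes suffixes seen in the output above.

```python
import re

# Matches e.g. "Done 1048 tasks      | elapsed:    6.0s" or "... elapsed:  1.1min"
_PROGRESS = re.compile(r"Done\s+(\d+)\s+tasks\s+\|\s+elapsed:\s+([\d.]+)\s*(s|min)")


def parse_progress(line):
    """Return (tasks_done, elapsed_seconds) for a progress line, else None."""
    match = _PROGRESS.search(line)
    if match is None:
        return None
    tasks = int(match.group(1))
    value = float(match.group(2))
    seconds = value * 60 if match.group(3) == "min" else value
    return tasks, seconds
```

Applied to the last line shown, this gives 474992 tasks in about 108 seconds, or roughly 4,400 tasks per second across the 12 workers.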


## Step 3: Model Training


In [None]:
import argparse
from src.pipelines._03_model_training import main as train_models

# Build the arguments object that train_models expects
args = argparse.Namespace(tune=False, full_data=False)

# Train models
train_models(args)

print("\n✅ Model training complete!")
print("Check models/ directory for trained models")
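
The training entry point receives an object with `tune` and `full_data` attributes. If the same options need to come from a command line, one way to build such an object is sketched below. The flag names `--tune` and `--full-data` and the helper `build_training_args` are assumptions for illustration; the actual training script may define its options differently.

```python
import argparse


def build_training_args(argv=()):
    """Parse illustrative training flags into a namespace.

    The result exposes `tune` and `full_data`, both False unless the
    matching flag is present in `argv`.
    """
    parser = argparse.ArgumentParser(description="Training options (illustrative)")
    parser.add_argument("--tune", action="store_true",
                        help="set args.tune to True")
    parser.add_argument("--full-data", dest="full_data", action="store_true",
                        help="set args.full_data to True")
    # An explicit list is parsed instead of sys.argv, so the function is
    # safe to call inside a notebook kernel.
    return parser.parse_args(list(argv))
```

Calling `build_training_args()` yields the same values used in the cell above, and `build_training_args(["--tune"])` switches `tune` on.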


## Step 4: Make Predictions


In [None]:
from src.pipelines._05_prediction import main as make_predictions

# Make predictions
make_predictions()

print("\n✅ Predictions complete!")
print("Check reports/ directory for prediction results")
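
To see what the prediction step actually wrote, the most recently modified files in the output directory can be listed. This is a minimal sketch: `newest_files` is a hypothetical helper, and it makes no assumption about the file names or formats that the pipeline produces.

```python
from pathlib import Path


def newest_files(directory, limit=5):
    """Return up to `limit` files in `directory`, most recently modified first."""
    directory = Path(directory)
    if not directory.is_dir():
        return []
    files = [p for p in directory.iterdir() if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:limit]
```

For example, `for path in newest_files("reports"): print(path.name)` prints the latest outputs, and the same call with `"models"` does so for the trained models.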
