# **1. Business Understanding**

## **Project Overview**
A Sales and Inventory Forecasting System was developed to help retail and product-driven businesses anticipate future demand, optimize stock levels, and support strategic decision-making. Accurate forecasting enables improved budgeting, efficient purchasing cycles, and prevention of operational risks such as stock-outs and overstocking. By analyzing historical sales and inventory movement over an 18-month period, the system identifies patterns, detects seasonality, and predicts future behavior across hundreds of products.

## **1.1 Business Problem**
Retail operations frequently encounter uncertainty in customer demand, fluctuating sales volumes, and inconsistent supplier lead times. Without data-driven forecasting, businesses may face:

- Stock-outs during high-demand periods  
- Overstocking that ties up capital and increases storage costs  
- Poor financial planning and inaccurate budgeting  
- Reduced ability to respond to market changes  

A forecasting system provides visibility into future demand and stock requirements, reducing operational risk.

## **1.2 Project Objectives**
The forecasting system is designed to:

- Predict monthly sales for each product  
- Forecast revenue trends across the business  
- Estimate future inventory requirements  
- Identify products at risk of depletion or overstocking  
- Support budgeting, procurement planning, and operational strategy  

These objectives ensure that sales performance, stock movement, and purchasing decisions are aligned with future expectations.

## **1.3 Key Business Questions**
The system aims to answer several critical operational and financial questions:

- What are the expected sales levels over the next 3, 6, and 12 months?  
- Which products are experiencing growth or decline?  
- When will current stock fall below reorder thresholds?  
- How much inventory should be replenished, and when?  
- Which suppliers cause delays through long lead times?  

Answering these questions strengthens both operational decision-making and financial forecasting.
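
The reorder-timing question can be illustrated with a simple projection. The sketch below assumes a constant forecast daily demand (`daily_demand`, a hypothetical input that would come from the sales forecast) and uses the `current_stock` and `reorder_level` fields of the Inventory table. The function name and sample values are illustrative only, not the system's actual implementation.

```python
import math
from datetime import date, timedelta


def days_until_reorder(current_stock, reorder_level, daily_demand):
    """Whole days until stock is projected to reach the reorder level.

    Returns 0 if stock is already at or below the level, and None if the
    forecast demand is not positive (stock is never projected to deplete).
    """
    if current_stock <= reorder_level:
        return 0
    if daily_demand <= 0:
        return None
    return math.ceil((current_stock - reorder_level) / daily_demand)


# Hypothetical product: 500 units on hand, reorder at 140, 12 units sold per day.
days_left = days_until_reorder(current_stock=500, reorder_level=140, daily_demand=12)
reorder_date = date(2024, 1, 1) + timedelta(days=days_left)
print(days_left, reorder_date)
```

Under these assumptions, an order would need to be placed at least `lead_time_days` before the projected date for stock to arrive in time.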

## **1.4 Success Criteria**
The project is considered successful if:

- Forecasting accuracy meets acceptable business thresholds  
  (for example, a mean absolute percentage error (MAPE) below 20%, with the threshold adjusted for product behavior)  
- Sales, revenue, and stock forecasts are clear and actionable  
- Operational users can identify inventory risks early  
- A dashboard enables leaders to visualize trends and predictions  
- The forecasting results support data-driven procurement and budgeting  
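
The accuracy criterion can be made concrete with a small calculation. The following is a minimal sketch of MAPE (mean absolute percentage error) checked against the example 20% threshold. The sample figures are hypothetical, and skipping periods with zero actual sales is an assumption of this sketch, since the percentage error is undefined there.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, expressed in percent.

    Periods with zero actual sales are skipped because the percentage
    error is undefined for them (an assumption of this sketch).
    """
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)


# Hypothetical monthly unit sales for one product.
actual = [100, 120, 80, 150]
forecast = [110, 115, 90, 135]

score = mape(actual, forecast)
meets_threshold = score < 20.0  # example threshold from the success criteria
print(round(score, 2), meets_threshold)
```

Slow-moving products with many zero-sales periods may need a different metric or a looser threshold, which is consistent with the note that the threshold depends on product behavior.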

## **Key Highlights**
- End-to-end forecasting pipeline covering sales, revenue, and inventory  
- 18 months of synthetic but realistic daily transactional data  
- Multi-table relational database reflecting real business operations  
- Forecasts support strategic planning, purchasing, and stock control  
- Designed for scalability to hundreds of products and multiple suppliers  


# **2. Data Understanding**

## **Overview of the Dataset**
The project uses a synthetic, multi-table relational dataset designed to resemble the records of a real retail or product-based business. The data spans an 18-month period and includes detailed information about products, daily sales transactions, supplier performance, and inventory conditions. Understanding the structure of these datasets is essential for detecting patterns, identifying trends, validating data quality, and selecting appropriate forecasting techniques.

## **2.1 Data Sources / Tables**
Four primary tables form the foundation of the forecasting system:

### **A. Products Table**
Contains core product attributes used for sales forecasting, pricing analysis, and inventory planning.

**Key Columns**
- `product_id`  
- `supplier_id` (foreign key to the Suppliers table)  
- `product_name`  
- `category`  
- `brand`  
- `sku_code`  
- `cost_price`  
- `selling_price`  
- `weight_kg`  
- `dimensions`  
- `launch_date`  
- `discontinued` (0/1)

These attributes enable advanced analytics such as product lifecycle evaluation, category-level performance, and profitability estimation.

### **B. Sales Table (Daily Transaction Data)**
Captures every sales transaction over the 18-month period, enriched with customer, pricing, and operational details. This table is the core source for identifying demand patterns, seasonality, and price-driven behavior.

**Key Columns**
- `sale_id`  
- `product_id`  
- `sale_date`  
- `quantity_sold`  
- `selling_price_at_time`  
- `revenue`  
- `profit`  
- `customer_type` (retail, wholesale, online, store)  
- `region` (Nairobi, Mombasa, Kisumu, etc.)  
- `payment_method` (cash, mpesa, card, transfer)  
- `discount_applied`  
- `order_channel` (website, app, store)  
- `promotion_flag` (0/1)

This table supports analysis of demand shifts, pricing effects, regional behavior, and channel performance.

### **C. Inventory Table**
Provides operational stock information and logistics-related attributes that help determine inventory stability and future replenishment requirements.

**Key Columns**
- `product_id`  
- `current_stock`  
- `reorder_level`  
- `safety_stock`  
- `lead_time_days`  
- `last_restock_date`  
- `warehouse_location`  
- `max_capacity`  
- `stock_value`

This table supports forecasting stock depletion, identifying risks, and optimizing purchases.
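
As an illustration of how these columns could flag inventory risk, the sketch below classifies one Inventory row. The status labels, the `overstock_ratio` parameter, and the rule that `safety_stock` sits below `reorder_level` are all assumptions made for this example, not rules defined by the project.

```python
def stock_status(row, overstock_ratio=0.9):
    """Classify one Inventory row; labels and thresholds are illustrative."""
    stock = row["current_stock"]
    if stock <= row["safety_stock"]:
        return "critical"          # at or below the safety buffer
    if stock <= row["reorder_level"]:
        return "reorder"           # replenishment should be triggered
    if stock >= overstock_ratio * row["max_capacity"]:
        return "overstock_risk"    # close to or above warehouse capacity
    return "ok"


# Hypothetical inventory rows using the column names from the table above.
inventory = [
    {"product_id": 1, "current_stock": 15, "safety_stock": 20, "reorder_level": 50, "max_capacity": 400},
    {"product_id": 2, "current_stock": 45, "safety_stock": 20, "reorder_level": 50, "max_capacity": 400},
    {"product_id": 3, "current_stock": 390, "safety_stock": 20, "reorder_level": 50, "max_capacity": 400},
    {"product_id": 4, "current_stock": 200, "safety_stock": 20, "reorder_level": 50, "max_capacity": 400},
]

statuses = {row["product_id"]: stock_status(row) for row in inventory}
print(statuses)
```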

### **D. Suppliers Table**
Contains supplier details used to model lead times, delivery reliability, and sourcing impact on forecasting.

**Key Columns**
- `supplier_id`  
- `supplier_name`  
- `contact_email`  
- `country`  
- `delivery_lead_time_days`  
- `reliability_score`

This enables simulation of supply chain delays and supplier performance trends.

## **2.2 Entity Relationship (ER) Structure**
The relational database follows a clean and scalable structure suitable for forecasting and operational analytics.

**Core Relationships**
- `Products.product_id` → `Sales.product_id`  
  (Each product has many sales records)

- `Products.product_id` → `Inventory.product_id`  
  (Each product has one inventory profile)

- `Suppliers.supplier_id` → `Products.supplier_id`  
  (Each product is linked to a supplier)

This design ensures data integrity and supports complex joins for forecasting, trend detection, and inventory risk analysis.
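
These relationships can be verified before any joins are relied on. The sketch below, which represents each table as a list of dictionaries purely for illustration, looks for child rows whose key has no matching parent and for products with more than one inventory row. The helper names and sample rows are hypothetical.

```python
from collections import Counter


def orphan_keys(child_rows, parent_rows, key):
    """Key values present in the child table but missing from the parent."""
    parent_keys = {row[key] for row in parent_rows}
    return sorted({row[key] for row in child_rows} - parent_keys)


def duplicated_keys(rows, key):
    """Key values that appear more than once (violating a one-to-one link)."""
    counts = Counter(row[key] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


# Tiny hypothetical extracts of the four tables.
suppliers = [{"supplier_id": 10}, {"supplier_id": 11}]
products = [
    {"product_id": 1, "supplier_id": 10},
    {"product_id": 2, "supplier_id": 12},  # supplier 12 does not exist
]
inventory = [{"product_id": 1}, {"product_id": 2}, {"product_id": 2}]
sales = [{"sale_id": 100, "product_id": 1}, {"sale_id": 101, "product_id": 3}]

sales_orphans = orphan_keys(sales, products, "product_id")
product_orphans = orphan_keys(products, suppliers, "supplier_id")
inventory_duplicates = duplicated_keys(inventory, "product_id")
print(sales_orphans, product_orphans, inventory_duplicates)
```

Empty lists from all three checks would indicate that the sample satisfies the relationships listed above.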

## **2.3 Initial Data Exploration Approach**
Before modeling, exploratory checks are performed to validate the generated dataset:

### **A. Date Range Checks**
- Identify earliest and latest `sale_date`  
- Confirm full 18-month coverage  
- Detect gaps or inconsistencies in the daily timeline  
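
The date checks above can be sketched as follows, assuming the `sale_date` values have already been parsed into `datetime.date` objects. The function names and sample dates are hypothetical.

```python
from datetime import date, timedelta


def date_coverage(sale_dates):
    """Return (first day, last day, missing days) for a collection of dates."""
    days = set(sale_dates)
    first, last = min(days), max(days)
    span = (last - first).days + 1
    missing = [
        first + timedelta(days=i)
        for i in range(span)
        if first + timedelta(days=i) not in days
    ]
    return first, last, missing


def months_spanned(first, last):
    """Number of calendar months touched between two dates, inclusive."""
    return (last.year - first.year) * 12 + (last.month - first.month) + 1


# Hypothetical, unsorted sale dates with one missing day (3 January).
sale_dates = [date(2024, 1, 4), date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 2)]
first, last, missing = date_coverage(sale_dates)
print(first, last, missing, months_spanned(first, last))
```

For the full dataset, `months_spanned` would be expected to return 18 and `missing` would ideally be empty.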

### **B. Row Count Checks**
Expected approximate sizes:
- Products: *200–300 items*  
- Inventory: *one row per product (same count as Products)*  
- Suppliers: *20–30 suppliers*  
- Sales: *3,000–5,000 daily transactions (randomized)*  

### **C. Sample Data Previews**
Preview the first rows from each table to confirm:
- Correct column names and formats  
- Realistic pricing and quantity distributions  
- Accurate foreign key relationships  
- Randomized, non-sequential ordering of sales  

## **2.4 Understanding Sales Patterns**
Several aspects of the daily sales data are examined prior to modeling:

- Frequency and randomness of transactions  
- Product-level demand variability  
- Seasonal patterns or spikes  
- Regional differences  
- Effects of promotions, discounts, or channels  

These patterns guide the selection of forecasting models and feature engineering strategies.
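
Two of these checks are simple to illustrate: rolling daily transactions up to monthly totals per product, and comparing average quantities with and without a promotion. The sketch below uses the Sales column names described earlier. The helper names and sample rows are hypothetical, and a real analysis would also need to control for product and season before attributing a difference to promotions.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def monthly_units(sales_rows):
    """Total quantity_sold per (product_id, 'YYYY-MM') pair."""
    totals = defaultdict(int)
    for row in sales_rows:
        month = row["sale_date"].strftime("%Y-%m")
        totals[(row["product_id"], month)] += row["quantity_sold"]
    return dict(totals)


def mean_quantity_by_promotion(sales_rows):
    """Average quantity_sold per transaction, grouped by promotion_flag."""
    groups = defaultdict(list)
    for row in sales_rows:
        groups[row["promotion_flag"]].append(row["quantity_sold"])
    return {flag: mean(quantities) for flag, quantities in groups.items()}


# Hypothetical, unsorted daily transactions.
sales = [
    {"product_id": "P1", "sale_date": date(2024, 1, 20), "quantity_sold": 5, "promotion_flag": 1},
    {"product_id": "P1", "sale_date": date(2024, 1, 5), "quantity_sold": 3, "promotion_flag": 0},
    {"product_id": "P2", "sale_date": date(2024, 1, 11), "quantity_sold": 7, "promotion_flag": 1},
    {"product_id": "P1", "sale_date": date(2024, 2, 2), "quantity_sold": 2, "promotion_flag": 0},
]

monthly = monthly_units(sales)
by_promotion = mean_quantity_by_promotion(sales)
print(monthly, by_promotion)
```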

## **Key Highlights**
- Multi-table relational structure representing real retail operations  
- Detailed daily sales spanning 18 months  
- Randomized and unsorted transactions for realistic behavior  
- Comprehensive product, supplier, and inventory attributes  
- Designed for advanced forecasting and inventory optimization  
