# Business Understanding
The purpose of this project is to build a system that can **accurately forecast future sales and inventory needs** to support **budgeting, purchasing, and strategic planning**.

## **1. Business Problem**
Retail and product-based businesses must **anticipate future sales** to:
- **Plan inventory levels**
- **Avoid stock-outs**
- **Prevent overstocking**
- **Improve budgeting and financial forecasting**

Without forecasting, the business risks:
- **Running out of stock** during demand peaks  
- **Holding too much inventory**, tying up cash  
- **Poor financial planning** and delayed decisions  

## **2. Objectives**
This project aims to:
- **Forecast monthly sales** for each product  
- **Forecast monthly revenue**  
- **Predict future inventory requirements**  
- **Identify products at risk of low stock / stock-out**  
- **Support budgeting and strategic planning** using predictive analytics  

## **3. Key Business Questions**
The forecasting system should answer:
- **What will sales look like in the next 3, 6, and 12 months?**  
- **Which products are growing or declining?**  
- **When will current stock fall below reorder level?**  
- **How much inventory will be needed in the coming months?**

## **4. Success Criteria**
The project is successful if:
- **Forecasting accuracy is acceptable**  
  (e.g., mean absolute percentage error, **MAPE < X%**, with the threshold X set by business tolerance)
- **Insights are clear and actionable** for both finance and operations  
- A **dashboard** is created to visualize sales trends, future predictions, and inventory risks  
- The system **supports budgeting, purchasing decisions, and financial planning**  
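As a rough illustration of the accuracy criterion above, MAPE could be computed as in the sketch below. The function name `mape` and the choice to skip periods with zero actual sales are assumptions of this example, not something the project specifies. The sample numbers are illustrative only.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Periods whose actual value is zero are skipped, because the
    percentage error is undefined there (an assumption of this sketch).
    """
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    errors = [abs(a - f) / abs(a) for a, f in zip(actual, forecast) if a != 0]
    if not errors:
        raise ValueError("no periods with non-zero actual sales")
    return 100.0 * sum(errors) / len(errors)


# Illustrative numbers only, not project results.
actual_units = [100, 120, 80, 0]
forecast_units = [110, 114, 88, 5]
print(round(mape(actual_units, forecast_units), 2))  # 8.33
```

Because zero-sales months are dropped here, a product with many such months may need a different metric; that choice is left open.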


# Data Understanding

This project uses a synthetic, multi-table dataset that simulates real-world retail operations.  
The goal is to build a forecasting system capable of predicting future sales, revenue, and inventory needs.  
To support this, each table carries detailed attributes for analytics, forecasting, and operational insights.

## **1. Identify Data Sources / Tables**

We work with four main tables: **Products, Sales, Inventory, and Suppliers**.  
Each contains detailed columns to mimic realistic business datasets.

### **Products Table (Expanded Attributes)**  
Contains core product information for forecasting, pricing, and inventory planning.

**Columns include:**

- product_id (PK)  
- product_name  
- category  
- brand  
- sku_code  
- cost_price  
- selling_price  
- weight_kg  
- dimensions  
- launch_date  
- discontinued (0/1)  
- supplier_id (FK → Suppliers)

These attributes allow advanced analysis such as product lifecycle, profitability, and category-level forecasting.

---

### **Sales Table (Highly Detailed Transaction Data)**  
Contains every sales transaction with additional operational, customer, and pricing dimensions.

**Columns include:**

- sale_id (PK)  
- product_id (FK → Products)  
- sale_date  
- quantity_sold  
- selling_price_at_time  
- revenue  
- profit  
- customer_type (retail, wholesale, online, store)  
- region (e.g., Nairobi, Mombasa, Kisumu...)  
- payment_method (cash, mpesa, card, transfer)  
- discount_applied  
- order_channel (website, store, app)  
- promotion_flag (0/1)

This table is critical for understanding demand, seasonality, price effects, customer mix, and regional patterns.
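Since the forecasts are monthly, transaction rows would first need to be rolled up per product and month. The sketch below shows one way to do that with the standard library. It assumes each row is a dict with `product_id`, `sale_date` (a `datetime.date`), and `quantity_sold`; the function name `monthly_units` and the sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import date


def monthly_units(sales_rows):
    """Sum quantity_sold per (product_id, "YYYY-MM") bucket."""
    totals = defaultdict(int)
    for row in sales_rows:
        month = row["sale_date"].strftime("%Y-%m")
        totals[(row["product_id"], month)] += row["quantity_sold"]
    return dict(totals)


# Made-up rows for illustration.
sample_sales = [
    {"product_id": 1, "sale_date": date(2023, 1, 5), "quantity_sold": 3},
    {"product_id": 1, "sale_date": date(2023, 1, 20), "quantity_sold": 2},
    {"product_id": 1, "sale_date": date(2023, 2, 1), "quantity_sold": 4},
    {"product_id": 2, "sale_date": date(2023, 1, 9), "quantity_sold": 7},
]
print(monthly_units(sample_sales))
```

Note that months with no sales do not appear in the result at all, so they would have to be filled with zeros before modelling.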

---

### **Inventory Table (Operational Inventory Data)**  
Contains detailed stock status and logistics-related attributes.

**Columns include:**

- product_id (FK → Products)  
- current_stock  
- reorder_level  
- safety_stock  
- lead_time_days  
- last_restock_date  
- warehouse_location  
- max_capacity  
- stock_value

This supports forecasting future inventory needs, flagging risks, and planning replenishment cycles.
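One common way to flag replenishment risk from these columns is the textbook reorder-point rule: expected demand over the lead time plus safety stock. The sketch below applies it. The `avg_daily_demand` input is an assumption (it would come from the sales history or a forecast), and the function name `reorder_status` is hypothetical. The table's own `reorder_level` column could be compared against the computed value.

```python
import math


def reorder_status(current_stock, avg_daily_demand, lead_time_days, safety_stock):
    """Apply the reorder-point rule: demand during lead time + safety stock."""
    reorder_point = avg_daily_demand * lead_time_days + safety_stock
    # Days the current stock would last at the assumed demand rate.
    if avg_daily_demand > 0:
        days_of_cover = current_stock / avg_daily_demand
    else:
        days_of_cover = math.inf
    return {
        "reorder_point": reorder_point,
        "days_of_cover": days_of_cover,
        "needs_reorder": current_stock <= reorder_point,
    }


# Illustrative values only.
print(reorder_status(current_stock=120, avg_daily_demand=10,
                     lead_time_days=7, safety_stock=30))
```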

---

### **Suppliers Table (Optional but Realistic)**  
Provides external sourcing information for products.

**Columns include:**

- supplier_id (PK)  
- supplier_name  
- contact_email  
- country  
- delivery_lead_time_days  
- reliability_score

This helps simulate lead-time impacts on inventory forecasting.

---

## **2. Entity Relationship (ER) Diagram**

The database structure forms a clean relational model:

### **Core Relationships**
- **Products.product_id → Sales.product_id**  
  (Each product has many sales)

- **Products.product_id → Inventory.product_id**  
  (Each product has one inventory record)

- **Suppliers.supplier_id → Products.supplier_id**  
  (Each product has one supplier)

This structure supports complex joins for forecasting, trend analysis, and inventory management.
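The relationships above could be expressed as a schema along the following lines. This is a minimal sketch using an in-memory SQLite database with only a few columns per table; the column types and the sample rows are assumptions, and `supplier_id` on the products table is taken from the relationship listed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE suppliers (
    supplier_id   INTEGER PRIMARY KEY,
    supplier_name TEXT NOT NULL
);
CREATE TABLE products (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL,
    supplier_id  INTEGER REFERENCES suppliers(supplier_id)
);
CREATE TABLE inventory (
    -- PRIMARY KEY on product_id enforces one inventory record per product
    product_id    INTEGER PRIMARY KEY REFERENCES products(product_id),
    current_stock INTEGER NOT NULL,
    reorder_level INTEGER NOT NULL
);
CREATE TABLE sales (
    sale_id       INTEGER PRIMARY KEY,
    product_id    INTEGER NOT NULL REFERENCES products(product_id),
    sale_date     TEXT NOT NULL,
    quantity_sold INTEGER NOT NULL
);
""")

# Made-up rows for illustration.
conn.execute("INSERT INTO suppliers VALUES (1, 'Supplier A')")
conn.execute("INSERT INTO products VALUES (10, 'Product X', 1)")
conn.execute("INSERT INTO inventory VALUES (10, 40, 50)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [(1, 10, "2023-01-05", 3), (2, 10, "2023-01-20", 2)],
)

# Join all four tables: total units sold next to stock and supplier.
rows = conn.execute("""
    SELECT p.product_name, s.supplier_name, i.current_stock,
           SUM(sa.quantity_sold) AS units_sold
    FROM products p
    JOIN suppliers s ON s.supplier_id = p.supplier_id
    JOIN inventory i ON i.product_id = p.product_id
    JOIN sales sa    ON sa.product_id = p.product_id
    GROUP BY p.product_id
""").fetchall()
print(rows)
```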

---

## **3. Initial Data Exploration**

Once the synthetic dataset is generated, we will perform:

### **A. Date Range Checks**
- Identify the earliest and latest sale dates  
- Confirm the dataset spans the full 3-year period  
- Check for gaps in date continuity
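A date range check of this kind might look like the sketch below. It assumes the sale dates are available as `datetime.date` values; the function name `date_coverage` and the sample dates are hypothetical.

```python
from datetime import date, timedelta


def date_coverage(sale_dates):
    """Return (first_day, last_day, missing_days) for a list of sale dates."""
    unique_days = set(sale_dates)
    first, last = min(unique_days), max(unique_days)
    span = (last - first).days + 1
    all_days = (first + timedelta(days=n) for n in range(span))
    missing = [d for d in all_days if d not in unique_days]
    return first, last, missing


# Made-up dates: 3 January has no sales.
dates = [date(2023, 1, 1), date(2023, 1, 2), date(2023, 1, 4), date(2023, 1, 4)]
first, last, missing = date_coverage(dates)
print(first, last, missing)
```

Whether a missing day is a data problem or simply a day without sales depends on how the dataset was generated.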

### **B. Row Counts**
- Products: ~10–20  
- Inventory: same number of products  
- Suppliers: 3–6  
- **Sales: 3,000+ entries (randomly generated and intentionally messy)**

### **C. Sample Previews**
Examine the first few rows from each table to verify:
- Column names and data types  
- Correct foreign key mappings  
- Realistic pricing, quantities, and profit values  
- Random distribution of customer types, regions, and payment methods
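Two of these checks lend themselves to a quick script: foreign key mappings and the spread of categorical values. The sketch below assumes rows are dicts keyed by column name; the helper names `orphan_product_ids` and `value_shares` and the sample rows are hypothetical.

```python
from collections import Counter


def orphan_product_ids(sales_rows, product_ids):
    """Product IDs that appear in sales but not in the products table."""
    return sorted({r["product_id"] for r in sales_rows} - set(product_ids))


def value_shares(rows, column):
    """Fraction of rows taking each value of a categorical column."""
    counts = Counter(r[column] for r in rows)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


# Made-up rows for illustration; product 99 has no product record.
sample_rows = [
    {"product_id": 1, "customer_type": "retail"},
    {"product_id": 2, "customer_type": "retail"},
    {"product_id": 2, "customer_type": "wholesale"},
    {"product_id": 99, "customer_type": "online"},
]
print(orphan_product_ids(sample_rows, product_ids=[1, 2]))
print(value_shares(sample_rows, "customer_type"))
```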

---

## **4. Understand Sales Frequency & Patterns**

Because the dataset is synthetic and randomly generated, we verify that:

- Sales occur **daily or at random intervals** across the timeline  
- Each product exhibits a **distinct demand pattern**  
- Seasonal boosts (e.g., holidays, mid-year sales) appear where they were applied  
- Regions, customer types, and discounts show variability

Understanding these patterns helps in choosing appropriate forecasting models.
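One simple way to check for seasonal boosts is a seasonal index: the average sales for each calendar month divided by the overall monthly average. Values well above 1 suggest a seasonal peak. The sketch below assumes monthly totals keyed by `"YYYY-MM"` strings; the function name `seasonal_index` and the figures are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def seasonal_index(monthly_totals):
    """Map calendar month (1-12) to mean units relative to the overall mean."""
    if not monthly_totals:
        raise ValueError("monthly_totals is empty")
    overall = mean(monthly_totals.values())
    if overall == 0:
        raise ValueError("no sales in any month")
    by_month = defaultdict(list)
    for period, units in monthly_totals.items():
        by_month[int(period[5:7])].append(units)  # "YYYY-MM" -> month number
    return {m: mean(v) / overall for m, v in sorted(by_month.items())}


# Made-up totals: December sells twice as much as January in both years.
totals = {"2022-01": 100, "2022-12": 200, "2023-01": 100, "2023-12": 200}
print(seasonal_index(totals))
```

A flat index across all months would point toward non-seasonal models, while clear peaks would favour models with a seasonal component.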

---

## **5. Potential Data Issues to Watch For**

Even synthetic datasets can highlight real business challenges:

- **Missing dates**  
- **Sudden spikes or dips** in sales  
- **Products with extremely low activity** (hard to forecast)  
- **Zero-sales months**  
- **Unusual discounts or promotions**  
- **Inventory levels far below demand**  
- **Lead times that are unrealistically long or short**  

Identifying these early helps improve forecasting accuracy and business interpretation.
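Some of these issues can be flagged with a simple screen over monthly totals. The sketch below marks zero-sales months and treats any month above or below a multiple of the median non-zero month as a spike or dip. The factor of 3 is an arbitrary assumption for illustration, and the function name `flag_months` and sample history are hypothetical.

```python
from statistics import median


def flag_months(monthly_units, spike_factor=3.0):
    """Return (zero_months, spike_months, dip_months) from {period: units}."""
    nonzero = [u for u in monthly_units.values() if u > 0]
    if not nonzero:
        return sorted(monthly_units), [], []
    typical = median(nonzero)  # median is less sensitive to the spikes themselves
    zero = [p for p, u in monthly_units.items() if u == 0]
    spikes = [p for p, u in monthly_units.items() if u > spike_factor * typical]
    dips = [p for p, u in monthly_units.items() if 0 < u < typical / spike_factor]
    return zero, spikes, dips


# Made-up history for one product.
history = {
    "2023-01": 50, "2023-02": 55, "2023-03": 0,
    "2023-04": 60, "2023-05": 200, "2023-06": 52,
}
zero, spikes, dips = flag_months(history)
print(zero, spikes, dips)
```

Flagged months are candidates for review, not automatic corrections: a spike may be a genuine promotion rather than an error.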

