Key Observations

import pandas as pd

# Path is relative to the current working directory
file_path = 'dataset/sales_data.csv'
df = pd.read_csv(file_path)

# 1. Dataset shape
print(f"Total Rows: {df.shape[0]}")
print(f"Total Columns: {df.shape[1]}\n")

# 2. Column names and data types
print("Column Names and Data Types:")
print(df.dtypes, "\n")

# 3. Summary of numerical columns
print("Numerical Features Summary:")
print(df.describe(include='number'), "\n")

# 4. Identify categorical columns
categorical_cols = df.select_dtypes(include=['object', 'category']).columns.tolist()
print(f"Categorical Columns ({len(categorical_cols)}): {categorical_cols}\n")

# 5. Unique values in each categorical column
print("Unique Values in Categorical Columns:")
for col in categorical_cols:
    print(f"{col}: {df[col].nunique()} unique values")

# 6. Sample rows
print("\nSample Data (First 5 rows):")
print(df.head())


Total Rows: 76000
Total Columns: 16

Column Names and Data Types:
Date                   object
Store ID               object
Product ID             object
Category               object
Region                 object
Inventory Level         int64
Units Sold              int64
Units Ordered           int64
Price                 float64
Discount                int64
Weather Condition      object
Promotion               int64
Competitor Pricing    float64
Seasonality            object
Epidemic                int64
Demand                  int64
dtype: object 

Numerical Features Summary:
       Inventory Level    Units Sold  Units Ordered         Price  \
count     76000.000000  76000.000000   76000.000000  76000.000000   
mean        301.062842     88.827316      89.090645     67.726028   
std         226.510161     43.994525     162.404627     39.377899   
min           0.000000      0.000000       0.000000      4.740000   
25%         136.000000     58.000000       0.000000     31.997500

# Dataset Summary

**Total Observations (Rows):** 76,000 records

Each row represents daily sales data for a specific product at a specific store.

**Total Features (Columns):**  
16 columns comprising a mix of categorical and numerical variables.
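
The claim that each row is one product at one store on one day can be checked directly. The sketch below is a minimal illustration on a small invented frame, not on `sales_data.csv`. `count_duplicate_keys` is a hypothetical helper, the ID formats are made up, and it assumes that `Date`, `Store ID`, and `Product ID` together identify a row.

```python
import pandas as pd

def count_duplicate_keys(df, keys):
    """Return the number of rows whose key combination already appeared earlier."""
    return int(df.duplicated(subset=keys).sum())

# Toy data for illustration only (values and ID formats are invented)
toy = pd.DataFrame({
    "Date": ["2022-01-01", "2022-01-01", "2022-01-02", "2022-01-02"],
    "Store ID": ["S001", "S001", "S001", "S001"],
    "Product ID": ["P0001", "P0002", "P0001", "P0001"],
    "Units Sold": [10, 5, 7, 7],
})

keys = ["Date", "Store ID", "Product ID"]
n_dupes = count_duplicate_keys(toy, keys)
print(f"Duplicate key rows: {n_dupes}")
```

On the real data, a result of 0 would be consistent with a complete daily panel, since 760 days × 5 stores × 20 products = 76,000 rows.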

---

## Feature Overview

| Feature Name         | Type               | Description                                                                 |
|----------------------|--------------------|-----------------------------------------------------------------------------|
| Date                 | Categorical        | Transaction date, stored as text (760 unique days, about 2 years of data)   |
| Store ID             | Categorical        | Unique identifier for each of the 5 stores                                  |
| Product ID           | Categorical        | Unique identifier for each of the 20 products                               |
| Category             | Categorical        | Product category (e.g., Electronics, Clothing, Groceries) - 5 unique values |
| Region               | Categorical        | Geographical region of the store (North, South, East, West - 4 regions)     |
| Inventory Level      | Numerical (int)    | Units of inventory available in store                                       |
| Units Sold           | Numerical (int)    | Number of units sold on that particular date                                |
| Units Ordered        | Numerical (int)    | Number of units ordered for replenishment                                   |
| Price                | Numerical (float)  | Selling price of the product                                                |
| Discount             | Numerical (int)    | Discount applied on the product                                             |
| Weather Condition    | Categorical        | Weather on the day (Snowy, Sunny, Cloudy, Rainy - 4 types)                  |
| Promotion            | Numerical (int)    | Promotion indicator (1 = active, 0 = no promotion)                          |
| Competitor Pricing   | Numerical (float)  | Competitor's price for a comparable product                                 |
| Seasonality          | Categorical        | Season of the year (Winter, Summer, Spring, Fall - 4 values)                |
| Epidemic             | Numerical (int)    | Binary flag indicating epidemic conditions (1 = epidemic, 0 = normal)       |
| Demand               | Numerical (int)    | Target variable - actual demand for the product on that day                 |

---
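
Because `Date` is read as `object` and several columns take only a handful of distinct values, converting them to more specific dtypes may make later analysis easier. This is a minimal sketch, assuming ISO-formatted date strings (the actual format in `sales_data.csv` is not shown above). `tidy_dtypes` is a hypothetical helper and the toy rows are invented.

```python
import pandas as pd

def tidy_dtypes(df, date_col="Date", category_cols=()):
    """Return a copy with the date column parsed and the listed columns cast to category."""
    out = df.copy()
    out[date_col] = pd.to_datetime(out[date_col])
    for col in category_cols:
        out[col] = out[col].astype("category")
    return out

# Toy data for illustration only (not taken from the real file)
toy = pd.DataFrame({
    "Date": ["2022-01-01", "2022-01-02", "2022-01-03"],
    "Region": ["North", "South", "North"],
    "Seasonality": ["Winter", "Winter", "Winter"],
    "Units Sold": [10, 5, 7],
})

tidy = tidy_dtypes(toy, category_cols=["Region", "Seasonality"])
print(tidy.dtypes)

# Compare the calendar span with the number of distinct dates to spot missing days
span_days = (tidy["Date"].max() - tidy["Date"].min()).days + 1
print(f"Calendar span: {span_days} days, unique dates: {tidy['Date'].nunique()}")
```

If the calendar span on the real data exceeds the 760 unique dates, some days are missing from the series.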