# Introduction: Interactive Business Dashboard in Streamlit

## Objective: Develop an interactive dashboard for analyzing sales, profit, and segment-wise performance

### Objective 1: Import Libraries

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [3]:
print("✅ Libraries imported successfully")

✅ Libraries imported successfully


### Objective 2: Load Dataset

In [4]:
# Machine-specific path: point this at wherever superstore.csv is stored locally
df = pd.read_csv("C:/Users/luqma/Downloads/archive/superstore.csv")

In [5]:
print("✅ Dataset loaded successfully")
print("Shape of dataset:", df.shape)
df.head()

✅ Dataset loaded successfully
Shape of dataset: (51290, 27)


|   | Category | City | Country | Customer.ID | Customer.Name | Discount | Market | 记录数 | Order.Date | Order.ID | ... | Sales | Segment | Ship.Date | Ship.Mode | Shipping.Cost | State | Sub.Category | Year | Market2 | weeknum |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Office Supplies | Los Angeles | United States | LS-172304 | Lycoris Saunders | 0.0 | US | 1 | 2011-01-07 00:00:00.000 | CA-2011-130813 | ... | 19 | Consumer | 2011-01-09 00:00:00.000 | Second Class | 4.37 | California | Paper | 2011 | North America | 2 |
| 1 | Office Supplies | Los Angeles | United States | MV-174854 | Mark Van Huff | 0.0 | US | 1 | 2011-01-21 00:00:00.000 | CA-2011-148614 | ... | 19 | Consumer | 2011-01-26 00:00:00.000 | Standard Class | 0.94 | California | Paper | 2011 | North America | 4 |
| 2 | Office Supplies | Los Angeles | United States | CS-121304 | Chad Sievert | 0.0 | US | 1 | 2011-08-05 00:00:00.000 | CA-2011-118962 | ... | 21 | Consumer | 2011-08-09 00:00:00.000 | Standard Class | 1.81 | California | Paper | 2011 | North America | 32 |
| 3 | Office Supplies | Los Angeles | United States | CS-121304 | Chad Sievert | 0.0 | US | 1 | 2011-08-05 00:00:00.000 | CA-2011-118962 | ... | 111 | Consumer | 2011-08-09 00:00:00.000 | Standard Class | 4.59 | California | Paper | 2011 | North America | 32 |
| 4 | Office Supplies | Los Angeles | United States | AP-109154 | Arthur Prichep | 0.0 | US | 1 | 2011-09-29 00:00:00.000 | CA-2011-146969 | ... | 6 | Consumer | 2011-10-03 00:00:00.000 | Standard Class | 1.32 | California | Paper | 2011 | North America | 40 |
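
Loading all 27 columns and trimming them later (Objectives 4 and 5) works, but `pd.read_csv` can also select columns and parse dates in a single pass through its `usecols` and `parse_dates` parameters. The sketch below is an alternative, not what this notebook does: it reads a tiny in-memory CSV built from the rows shown above instead of the real file, and `NEEDED_COLUMNS` is an illustrative name.

```python
import io

import pandas as pd

# Columns kept for the dashboard (same list as Objective 4)
NEEDED_COLUMNS = ["Order.Date", "Region", "Category", "Sub.Category",
                  "Customer.Name", "Sales", "Profit"]

# Tiny stand-in for superstore.csv; Segment and Year should be dropped on read
sample_csv = io.StringIO(
    "Category,Customer.Name,Order.Date,Profit,Region,Sales,Segment,Sub.Category,Year\n"
    "Office Supplies,Lycoris Saunders,2011-01-07 00:00:00.000,9.3312,West,19,Consumer,Paper,2011\n"
    "Office Supplies,Mark Van Huff,2011-01-21 00:00:00.000,9.2928,West,19,Consumer,Paper,2011\n"
)

# usecols skips unneeded columns at read time; parse_dates converts Order.Date
df = pd.read_csv(sample_csv, usecols=NEEDED_COLUMNS, parse_dates=["Order.Date"])

# usecols keeps the file's column order, so reorder to match Objective 4
df = df[NEEDED_COLUMNS]

print(df.dtypes)
```

On a file with tens of thousands of rows, skipping unused columns at read time also reduces memory use.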


### Objective 3: Check Missing Values

In [6]:
df.isnull().sum()

Category          0
City              0
Country           0
Customer.ID       0
Customer.Name     0
Discount          0
Market            0
记录数               0
Order.Date        0
Order.ID          0
Order.Priority    0
Product.ID        0
Product.Name      0
Profit            0
Quantity          0
Region            0
Row.ID            0
Sales             0
Segment           0
Ship.Date         0
Ship.Mode         0
Shipping.Cost     0
State             0
Sub.Category      0
Year              0
Market2           0
weeknum           0
dtype: int64

In [7]:
print("✅ Missing values checked")

✅ Missing values checked
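
`df.isnull().sum()` lists a count for every column, which is hard to scan across 27 columns. A small helper can report only the columns that actually have gaps, as both a count and a percentage of rows. `missing_report` is a hypothetical name and the toy frame below is invented for illustration; on this dataset the report would be empty, since every count above is 0.

```python
import numpy as np
import pandas as pd


def missing_report(frame: pd.DataFrame) -> pd.DataFrame:
    """Return count and percentage of missing values, only for affected columns."""
    counts = frame.isnull().sum()
    report = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(2),
    })
    # Keep only columns with at least one missing value, worst first
    return report[report["missing"] > 0].sort_values("missing", ascending=False)


# Toy data (not from the Superstore file) with deliberate gaps
toy = pd.DataFrame({
    "Sales": [19, np.nan, 21, 111],
    "Profit": [9.33, 9.29, np.nan, np.nan],
    "Region": ["West", "West", "West", "West"],
})

print(missing_report(toy))
```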


### Objective 4: Select Relevant Columns

In [8]:
df = df[['Order.Date', 'Region', 'Category', 'Sub.Category', 'Customer.Name', 'Sales', 'Profit']]

In [9]:
print("✅ Relevant columns selected")
df.head()

✅ Relevant columns selected


|   | Order.Date | Region | Category | Sub.Category | Customer.Name | Sales | Profit |
|---|---|---|---|---|---|---|---|
| 0 | 2011-01-07 00:00:00.000 | West | Office Supplies | Paper | Lycoris Saunders | 19 | 9.3312 |
| 1 | 2011-01-21 00:00:00.000 | West | Office Supplies | Paper | Mark Van Huff | 19 | 9.2928 |
| 2 | 2011-08-05 00:00:00.000 | West | Office Supplies | Paper | Chad Sievert | 21 | 9.8418 |
| 3 | 2011-08-05 00:00:00.000 | West | Office Supplies | Paper | Chad Sievert | 111 | 53.2608 |
| 4 | 2011-09-29 00:00:00.000 | West | Office Supplies | Paper | Arthur Prichep | 6 | 3.1104 |
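
Selecting with `df[[...]]` raises a `KeyError` if any name is misspelled. Checking the list up front makes the error name exactly which columns are missing. The sketch below uses a hypothetical helper, `select_columns`, and also calls `.copy()` so that later assignments, such as the date conversion in Objective 5, modify an independent frame rather than something pandas may treat as a slice of the original.

```python
import pandas as pd


def select_columns(frame: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Return a copy of `frame` restricted to `columns`, failing clearly on typos."""
    missing = [c for c in columns if c not in frame.columns]
    if missing:
        raise KeyError(f"Columns not found in dataset: {missing}")
    # .copy() returns an independent DataFrame, which avoids
    # SettingWithCopyWarning when its columns are reassigned later
    return frame[columns].copy()


raw = pd.DataFrame({
    "Order.Date": ["2011-01-07 00:00:00.000"],
    "Region": ["West"],
    "Segment": ["Consumer"],
    "Sales": [19],
    "Profit": [9.3312],
})

subset = select_columns(raw, ["Order.Date", "Region", "Sales", "Profit"])
print(subset.columns.tolist())

try:
    select_columns(raw, ["Order.Date", "Regoin"])  # misspelled "Region"
except KeyError as err:
    print(err)
```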


### Objective 5: Convert Date Column

In [10]:
df['Order.Date'] = pd.to_datetime(df['Order.Date'])

In [11]:
print("✅ Order Date column converted to datetime")

✅ Order Date column converted to datetime
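
The rows shown above all use one layout (`2011-01-07 00:00:00.000`), so the format can be passed to `pd.to_datetime` explicitly. That skips format inference, and with `errors="coerce"` a malformed value becomes `NaT` instead of stopping the notebook. The sketch also derives a calendar-month column, which a dashboard could use for monthly trends; the `Order.Month` name is an assumption and is not created by the original notebook.

```python
import pandas as pd

dates = pd.DataFrame({
    "Order.Date": [
        "2011-01-07 00:00:00.000",
        "2011-08-05 00:00:00.000",
        "not a date",  # deliberately malformed to show errors="coerce"
    ]
})

# Explicit format matching the raw strings; bad values become NaT instead of raising
dates["Order.Date"] = pd.to_datetime(
    dates["Order.Date"], format="%Y-%m-%d %H:%M:%S.%f", errors="coerce"
)

# Hypothetical extra column: the calendar month of each order
dates["Order.Month"] = dates["Order.Date"].dt.to_period("M")

print(dates)
print("Unparseable dates:", dates["Order.Date"].isna().sum())
```

Counting `NaT` values after conversion is a quick way to check whether any dates failed to parse.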


### Objective 6: Quick Descriptive Statistics

In [12]:
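# Only a cell's last expression is rendered in Jupyter, so this describe() result
# is computed but not shown; wrap it in display() or print() to see the table
df.describe()
print("✅ Descriptive statistics generated")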

✅ Descriptive statistics generated
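
The sketch below prints its summary explicitly so the statistics appear in the output, restricts it to `Sales` and `Profit`, and adds an overall profit margin (total profit divided by total sales). The margin is an extra metric that the original notebook does not compute, and the three sample rows are copied from the Objective 4 preview.

```python
import pandas as pd

# Three rows copied from the Objective 4 preview
sample = pd.DataFrame({
    "Region": ["West", "West", "West"],
    "Sales": [19, 19, 21],
    "Profit": [9.3312, 9.2928, 9.8418],
})

# Summary statistics for the two measures the dashboard focuses on
stats = sample[["Sales", "Profit"]].describe()
print(stats)

# Overall profit margin: total profit as a share of total sales
margin = sample["Profit"].sum() / sample["Sales"].sum()
print(f"Profit margin: {margin:.1%}")
```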


### Objective 7: Save Cleaned Dataset (for Streamlit)

In [13]:
df.to_csv("cleaned_superstore.csv", index=False)
print("✅ Cleaned dataset saved as cleaned_superstore.csv")


✅ Cleaned dataset saved as cleaned_superstore.csv
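
CSV does not store column types, so when the dashboard reads `cleaned_superstore.csv` back, `Order.Date` arrives as text unless it is parsed again. The sketch below writes a small frame to a temporary directory (standing in for the real output location) and reads it back with and without `parse_dates` to show the difference.

```python
import tempfile
from pathlib import Path

import pandas as pd

cleaned = pd.DataFrame({
    "Order.Date": pd.to_datetime(["2011-01-07", "2011-01-21"]),
    "Region": ["West", "West"],
    "Sales": [19, 19],
    "Profit": [9.3312, 9.2928],
})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "cleaned_superstore.csv"
    cleaned.to_csv(path, index=False)  # index=False, as in Objective 7

    plain = pd.read_csv(path)                               # dates come back as text
    parsed = pd.read_csv(path, parse_dates=["Order.Date"])  # dates restored

print(plain["Order.Date"].dtype)
print(parsed["Order.Date"].dtype)
```

In the dashboard, the same `parse_dates=["Order.Date"]` argument would go on its `pd.read_csv` call.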


# 📌 Project Workflow Note  
This project is divided into **two parts** for better clarity and organization:  

---

## 1️⃣ Data Cleaning (Jupyter Notebook)  
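- This Jupyter Notebook focuses only on **loading, cleaning, and preprocessing the dataset (CSV file)**: checking for missing values, keeping only the columns the dashboard needs, and converting `Order.Date` to a datetime type.  
- After cleaning, the **dataset is saved** as `cleaned_superstore.csv` so it can be used in the dashboard.  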
- Purpose: Ensure the data is **ready, accurate, and consistent** before visualization.  

---

## 2️⃣ Dashboard Development (Visual Studio Code - `app.py`)  
- The **Streamlit dashboard and UI code** are written in **Visual Studio Code** inside the file `app.py`.  
- The `app.py` file **uses the cleaned dataset** (`cleaned_superstore.csv`) created in this notebook.  
- Purpose: Build an **interactive dashboard** for visualization and data exploration.  
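
The notebook does not include `app.py` itself. As a rough idea of the logic such a dashboard might run on the cleaned file, the sketch below filters by region and category, computes headline KPIs, and builds a monthly sales series. It uses plain pandas so it can be tested outside Streamlit. In a Streamlit app the filter values would typically come from widgets such as `st.sidebar.multiselect`, and the results could be shown with `st.metric` and `st.line_chart`. The function names `filter_sales` and `kpis` are hypothetical, and the data is an invented toy sample. A segment-wise view, which the stated objective mentions, would also need the `Segment` column, and the column list in Objective 4 does not keep it.

```python
import pandas as pd


def filter_sales(frame: pd.DataFrame, regions=None, categories=None) -> pd.DataFrame:
    """Keep rows matching the chosen regions/categories (None or empty = no filter)."""
    mask = pd.Series(True, index=frame.index)
    if regions:
        mask &= frame["Region"].isin(regions)
    if categories:
        mask &= frame["Category"].isin(categories)
    return frame[mask]


def kpis(frame: pd.DataFrame) -> dict:
    """Headline numbers for the dashboard's metric cards."""
    total_sales = frame["Sales"].sum()
    total_profit = frame["Profit"].sum()
    return {
        "total_sales": total_sales,
        "total_profit": total_profit,
        # Guard against division by zero when the filter leaves no rows
        "margin": total_profit / total_sales if total_sales else 0.0,
    }


# Toy sample in the cleaned file's layout (values invented, not from the dataset)
data = pd.DataFrame({
    "Order.Date": pd.to_datetime(["2011-01-07", "2011-01-21", "2011-03-05", "2011-03-05"]),
    "Region": ["West", "West", "West", "East"],
    "Category": ["Office Supplies", "Office Supplies", "Technology", "Technology"],
    "Sales": [100, 50, 200, 300],
    "Profit": [20.0, 5.0, -10.0, 60.0],
})

west = filter_sales(data, regions=["West"])
print(kpis(west))

# Monthly sales series for a line chart ("MS" = month-start bins; empty months sum to 0)
monthly = west.set_index("Order.Date")["Sales"].resample("MS").sum()
print(monthly)
```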

---

✅ **Summary**:  
- **This Notebook = Data Cleaning & Saving Dataset**  
- **`app.py` (VS Code) = Streamlit Dashboard & UI** 