# 🧠 **DevelopersHub Corporation**
## *Data Science & Analytics Internship: Tasks*

Welcome to my **Data Science & Analytics Internship Portfolio** at **DevelopersHub Corporation**. This notebook compiles all assigned internship tasks, highlighting practical work across **data preprocessing, exploratory analysis, visualization, statistical modeling, and predictive machine learning**.

Each task is structured with clear explanations, well-written code, and meaningful insights to demonstrate both **technical competence** and **real-world application**.

📅 **Submission Deadline:** 30 December 2025  
👨‍💻 **Intern:** Muhammad Shayan Haider  
🆔 **Intern ID:** DHC-729  
🏢 **Organization:** DevelopersHub Corporation  
📊 **Domain:** Data Science & Analytics

# ***Advanced Task Set***

# 🧩 **Task 5: Interactive Business Dashboard in Streamlit**

## 🎯 **Objective:**
The goal of this task is to **develop an interactive business dashboard** to analyze sales, profit, and segment-wise performance.  
This involves **data cleaning, visualization, and interactive filtering** using Streamlit.

## 📂 **Dataset:**
We will use the **Global Superstore Dataset**, which contains:  
- Order ID  
- Customer Name  
- Segment  
- Country / Region  
- Category  
- Sub-Category  
- Sales  
- Profit  
- Quantity  

The dataset is commonly used for sales analytics and business intelligence reporting.
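
The dashboard cells in this notebook index columns by exact name (`Order Date`, `Customer Name`, `Region`, `Category`, `Sub-Category`, `Sales`, `Profit`), so a small check right after loading can fail early with a readable message if a particular CSV export uses different headers. This is a minimal sketch; the tuple `REQUIRED_COLUMNS` and the helper `missing_columns` are hypothetical names, not part of pandas.

```python
import pandas as pd

# Column names the dashboard cells rely on (assumed to match the CSV headers exactly)
REQUIRED_COLUMNS = (
    "Order Date", "Customer Name", "Region",
    "Category", "Sub-Category", "Sales", "Profit",
)

def missing_columns(df: pd.DataFrame, required=REQUIRED_COLUMNS) -> list:
    """Return the required column names absent from df (hypothetical helper)."""
    present = set(df.columns)
    return [col for col in required if col not in present]

# Usage with a tiny stand-in frame instead of the real CSV
sample = pd.DataFrame(columns=["Order Date", "Customer Name", "Sales", "Profit"])
print(missing_columns(sample))
```

An empty list means every column the dashboard needs is present; anything else names exactly what to rename before building the widgets.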

## 📝 **Task Instructions:**
1. **Load and explore the dataset** using Pandas:  
   - `dataset.shape` → view dataset dimensions  
   - `dataset.columns` → list feature names  
   - `dataset.head()` → preview first rows  

2. **Clean and preprocess the dataset**:  
   - Handle missing values and duplicates  
   - Convert data types if necessary  
   - Aggregate or transform data for analysis  

3. **Build a Streamlit dashboard** with:  
   - **Filters** for Region, Category, and Sub-Category  
   - **KPIs**: Total Sales, Total Profit  
   - **Top 5 Customers by Sales**  

4. **Visualize data** using charts:  
   - Bar charts, line charts, pie charts, or area charts for sales and profit trends  
   - Highlight segment-wise or category-wise performance  

5. **Make the dashboard interactive**:  
   - Update charts dynamically based on user-selected filters  
   - Display key insights for decision-making  
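
Steps 1 and 2 can be sketched as a single helper that reports the frame's shape, column names, missing-value counts and duplicate count, then drops exact duplicate rows and rows whose `Sales` or `Profit` cannot be parsed as numbers. This is a minimal sketch under stated assumptions: `explore_and_clean` is a hypothetical name, and dropping exact duplicates assumes such rows are export artifacts rather than genuine repeat orders.

```python
import pandas as pd

def explore_and_clean(df: pd.DataFrame) -> pd.DataFrame:
    """Print a quick profile of df, then return a cleaned copy (hypothetical helper)."""
    print("Shape:", df.shape)                  # (rows, columns)
    print("Columns:", list(df.columns))        # feature names
    print("Missing values per column:")
    print(df.isna().sum())
    print("Exact duplicate rows:", df.duplicated().sum())

    cleaned = df.drop_duplicates().copy()
    # Coerce the numeric measures; unparsable strings become NaN and are dropped below
    for col in ["Sales", "Profit"]:
        cleaned[col] = pd.to_numeric(cleaned[col], errors="coerce")
    return cleaned.dropna(subset=["Sales", "Profit"])

# Toy input: one duplicate row and one unparsable Sales value
raw = pd.DataFrame({
    "Customer Name": ["A", "A", "B", "C"],
    "Sales": ["100.5", "100.5", "n/a", "40"],
    "Profit": [10, 10, 5, -2],
})
clean = explore_and_clean(raw)
print(clean.shape)
```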

## 💡 **Learning Outcome:**
This task builds a practical understanding of **business intelligence dashboards**, **data storytelling**, **interactive visualization with Streamlit**, and **visual KPI analysis**.


In [1]:
# ---------------------------
# Imports
# ---------------------------
import pandas as pd
import matplotlib.pyplot as plt
import streamlit as st

# Streamlit widgets render only when this file is launched via `streamlit run <script>.py`
st.set_page_config(page_title="Global Superstore Dashboard", layout="wide")

print("Libraries imported successfully")


Libraries imported successfully


In [3]:
# ---------------------------
# Load Dataset Directly
# ---------------------------
df = pd.read_csv("Global_Superstore.csv", encoding="latin-1")

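# Data Cleaning
df.drop_duplicates(inplace=True)  # remove exact duplicate rows
# Raw dates appear day-first (e.g. Ship Date 31-07-2012), so parse with dayfirst=True
df['Order Date'] = pd.to_datetime(df['Order Date'], dayfirst=True)
df['Sales'] = pd.to_numeric(df['Sales'], errors='coerce')
df['Profit'] = pd.to_numeric(df['Profit'], errors='coerce')
df.dropna(subset=['Sales', 'Profit'], inplace=True)  # drop rows with unparsable measures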

# Show first 5 rows
df.head()



|  | Row ID | Order ID | Order Date | Ship Date | Ship Mode | Customer ID | Customer Name | Segment | City | State | ... | Product ID | Category | Sub-Category | Product Name | Sales | Quantity | Discount | Profit | Shipping Cost | Order Priority |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 32298 | CA-2012-124891 | 2012-07-31 | 31-07-2012 | Same Day | RH-19495 | Rick Hansen | Consumer | New York City | New York | ... | TEC-AC-10003033 | Technology | Accessories | Plantronics CS510 - Over-the-Head monaural Wir... | 2309.65 | 7.0 | 0.0 | 762.1845 | 933.57 | Critical |
| 1 | 26341 | IN-2013-77878 | 2013-02-05 | 07-02-2013 | Second Class | JR-16210 | Justin Ritter | Corporate | Wollongong | New South Wales | ... | FUR-CH-10003950 | Furniture | Chairs | Novimex Executive Leather Armchair, Black | 3709.395 | 9.0 | 0.1 | -288.765 | 923.63 | Critical |
| 2 | 25330 | IN-2013-71249 | 2013-10-17 | 18-10-2013 | First Class | CR-12730 | Craig Reiter | Consumer | Brisbane | Queensland | ... | TEC-PH-10004664 | Technology | Phones | Nokia Smart Phone, with Caller ID | 5175.171 | 9.0 | 0.1 | 919.971 | 915.49 | Medium |
| 3 | 13524 | ES-2013-1579342 | 2013-01-28 | 30-01-2013 | First Class | KM-16375 | Katherine Murray | Home Office | Berlin | Berlin | ... | TEC-PH-10004583 | Technology | Phones | Motorola Smart Phone, Cordless | 2892.51 | 5.0 | 0.1 | -96.54 | 910.16 | Medium |
| 4 | 47221 | SG-2013-4320 | 2013-11-05 | 06-11-2013 | Same Day | RH-9495 | Rick Hansen | Consumer | Dakar | Dakar | ... | TEC-SHA-10000501 | Technology | Copiers | Sharp Wireless Fax, High-Speed | 2832.96 | 8.0 | 0.0 | 311.52 | 903.04 | Critical |
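
The `Ship Date` column in the preview is still a raw string in day-first form (for example `31-07-2012`), which suggests `Order Date` was stored the same way before conversion. Passing an explicit `format`, or at least `dayfirst=True`, states that interpretation instead of leaving it to pandas' inference. A minimal sketch, assuming the raw strings follow `DD-MM-YYYY`:

```python
import pandas as pd

# Two raw date strings in the day-first layout seen in the Ship Date column
raw = pd.Series(["05-02-2013", "31-07-2012"])

# Explicit format: raises if any value does not match DD-MM-YYYY
explicit = pd.to_datetime(raw, format="%d-%m-%Y")

# dayfirst=True: more lenient, but still reads "05-02-2013" as 5 February
lenient = pd.to_datetime(raw, dayfirst=True)

print(explicit.dt.month.tolist())
print(lenient.dt.month.tolist())
```

The strict form is the safer choice when the whole column is known to share one layout; the same call would apply to `Ship Date` if shipping delays are analyzed later.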


In [None]:
# ---------------------------
# Sidebar Filters
# ---------------------------
st.sidebar.header("Filters")

region = st.sidebar.multiselect(
    "Select Region",
    options=df['Region'].unique(),
    default=df['Region'].unique()
)

category = st.sidebar.multiselect(
    "Select Category",
    options=df['Category'].unique(),
    default=df['Category'].unique()
)

sub_category = st.sidebar.multiselect(
    "Select Sub-Category",
    options=df['Sub-Category'].unique(),
    default=df['Sub-Category'].unique()
)
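
As written, the Sub-Category list always offers every sub-category, including those outside the selected categories. One common refinement, sketched here as an assumption rather than part of the original dashboard, is to derive the sub-category options from the categories already chosen. The helper is plain pandas, so its result could be passed as both `options` and `default` to `st.sidebar.multiselect`; `subcategory_options` is a hypothetical name.

```python
import pandas as pd

def subcategory_options(df: pd.DataFrame, categories) -> list:
    """Sorted sub-categories occurring within the selected categories (hypothetical helper)."""
    mask = df["Category"].isin(categories)
    return sorted(df.loc[mask, "Sub-Category"].unique())

sample = pd.DataFrame({
    "Category": ["Technology", "Technology", "Furniture", "Office Supplies"],
    "Sub-Category": ["Phones", "Accessories", "Chairs", "Binders"],
})
print(subcategory_options(sample, ["Technology", "Furniture"]))
```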

In [None]:
# Apply filters
filtered_df = df[
    (df['Region'].isin(region)) &
    (df['Category'].isin(category)) &
    (df['Sub-Category'].isin(sub_category))
]
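
Wrapping the filter in a function keeps the boolean mask in one place and makes it testable outside Streamlit. The sketch below also treats an empty selection as "no filter" for that field; this is an assumed design choice (the original cell returns an empty frame when a multiselect is cleared), and `apply_filters` is a hypothetical name.

```python
import pandas as pd

def apply_filters(df: pd.DataFrame, region, category, sub_category) -> pd.DataFrame:
    """Keep rows matching every non-empty selection (hypothetical helper)."""
    mask = pd.Series(True, index=df.index)
    for column, selected in [("Region", region), ("Category", category),
                             ("Sub-Category", sub_category)]:
        if selected:  # an empty list means "do not filter on this column"
            mask &= df[column].isin(selected)
    return df[mask]

sample = pd.DataFrame({
    "Region": ["Central", "East", "Central"],
    "Category": ["Technology", "Furniture", "Furniture"],
    "Sub-Category": ["Phones", "Chairs", "Tables"],
    "Sales": [100.0, 50.0, 25.0],
})
result = apply_filters(sample, ["Central"], [], [])
print(result["Sales"].sum())
```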

In [None]:
# ---------------------------
# KPI Metrics
# ---------------------------
total_sales = filtered_df['Sales'].sum()
total_profit = filtered_df['Profit'].sum()

col1, col2 = st.columns(2)
col1.metric("Total Sales", f"${total_sales:,.2f}")
col2.metric("Total Profit", f"${total_profit:,.2f}")

st.markdown("---")
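
Profit margin (total profit divided by total sales) is the figure the region and category insights lean on, so it can be computed alongside the two existing KPIs. A minimal sketch; `compute_kpis` is a hypothetical name, and the zero-sales guard covers the case where the filters leave no rows.

```python
import pandas as pd

def compute_kpis(df: pd.DataFrame) -> dict:
    """Total sales, total profit and profit margin for a frame (hypothetical helper)."""
    total_sales = float(df["Sales"].sum())
    total_profit = float(df["Profit"].sum())
    # Avoid division by zero when no rows survive the filters
    margin = total_profit / total_sales if total_sales else 0.0
    return {"sales": total_sales, "profit": total_profit, "margin": margin}

sample = pd.DataFrame({"Sales": [200.0, 300.0], "Profit": [50.0, -25.0]})
kpis = compute_kpis(sample)
print(f"{kpis['margin']:.1%}")
```

In the dashboard, a third column from `st.columns(3)` could show it with `col3.metric("Profit Margin", f"{kpis['margin']:.1%}")`.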

In [None]:
# ---------------------------
# Top 5 Customers by Sales
# ---------------------------
top_customers = (
    filtered_df.groupby('Customer Name')['Sales']
    .sum()
    .sort_values(ascending=False)
    .head(5)
)

st.subheader("Top 5 Customers by Sales")

fig, ax = plt.subplots()
top_customers.plot(kind='bar', ax=ax, color='skyblue')
ax.set_xlabel("Customer Name")
ax.set_ylabel("Total Sales")
plt.xticks(rotation=45, ha='right')
st.pyplot(fig)
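
Grouping by `Customer Name` alone can merge distinct customers who share a name. The data preview shows `Rick Hansen` under two different IDs (`RH-19495` and `RH-9495`), and the name by itself cannot tell whether those rows belong to one person or two. A sketch that groups by ID and keeps the name only as a label; `top_customers_by_id` is a hypothetical name and the sales figures below are toy values.

```python
import pandas as pd

def top_customers_by_id(df: pd.DataFrame, n: int = 5) -> pd.Series:
    """Top-n customers by total sales, keyed by (Customer ID, Customer Name) (hypothetical helper)."""
    return (
        df.groupby(["Customer ID", "Customer Name"])["Sales"]
        .sum()
        .sort_values(ascending=False)
        .head(n)
    )

# Toy data: same name, two IDs
sample = pd.DataFrame({
    "Customer ID": ["RH-19495", "RH-9495", "JR-16210", "RH-19495"],
    "Customer Name": ["Rick Hansen", "Rick Hansen", "Justin Ritter", "Rick Hansen"],
    "Sales": [100.0, 80.0, 150.0, 60.0],
})
top = top_customers_by_id(sample, n=2)
print(top)
```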

In [None]:
# ---------------------------
# Add More Charts
# ---------------------------
st.markdown("---")
st.subheader("Sales by Category")

category_sales = filtered_df.groupby('Category')['Sales'].sum()
fig2, ax2 = plt.subplots()
category_sales.plot(kind='bar', ax=ax2, color='orange')
ax2.set_ylabel("Total Sales")
st.pyplot(fig2)
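
Task item 4 also asks for trend charts such as line charts, which the cells above do not yet include. A minimal sketch of the aggregation behind a monthly sales line, assuming `Order Date` has already been converted to datetime; `monthly_sales` is a hypothetical name. The result is a Series indexed by month start, a shape that `st.line_chart` should accept directly.

```python
import pandas as pd

def monthly_sales(df: pd.DataFrame) -> pd.Series:
    """Total sales per calendar month, indexed by the month's first day (hypothetical helper)."""
    months = df["Order Date"].dt.to_period("M").dt.to_timestamp()
    return df.groupby(months)["Sales"].sum().sort_index()

sample = pd.DataFrame({
    "Order Date": pd.to_datetime(["2013-01-28", "2013-01-05", "2013-02-05"]),
    "Sales": [100.0, 50.0, 25.0],
})
trend = monthly_sales(sample)
print(trend)
```

Because the helper takes whatever frame it is given, calling it on `filtered_df` keeps the trend line in sync with the sidebar filters.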

## **Results & Insights:**

After analyzing the **Global Superstore Dataset** through the interactive dashboard, we identified **key patterns in sales, profit, and customer segments**:

- **Region Analysis:**  
  - **Central Region:** Highest total sales but moderate profit margin.  
    - **Insight:** Focus on cost optimization and promotional strategies to improve profitability.  
  - **East Region:** Moderate sales with high profit margin.  
    - **Insight:** Strengthen sales campaigns to leverage high-profit segments.

- **Category & Sub-Category Performance:**  
  - **Technology:** High sales and profit, driven by select products like phones and accessories.  
    - **Insight:** Continue stocking high-demand tech products and bundle offers.  
  - **Furniture:** Moderate sales, lower profit.  
    - **Insight:** Evaluate pricing or supplier costs to improve margins.  
  - **Office Supplies:** Steady sales and low profit.  
    - **Insight:** Introduce cost-effective bundles or targeted promotions.

- **Top 5 Customers by Sales:**  
  - The dashboard ranks the five customers with the highest total sales under the current filter selection.  
    - **Insight:** Consider VIP programs, personalized deals, and loyalty rewards to retain top customers.
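
The region and category observations above rest on profit margin rather than raw totals, so they can be checked against the data with a grouped ratio. A sketch, assuming the cleaned `df` from the load cell; `margin_by` is a hypothetical name, and the toy numbers only illustrate the output shape, not actual Superstore figures.

```python
import pandas as pd

def margin_by(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Sales, profit and profit margin per group, sorted by sales (hypothetical helper)."""
    summary = df.groupby(column)[["Sales", "Profit"]].sum()
    summary["Margin"] = summary["Profit"] / summary["Sales"]
    return summary.sort_values("Sales", ascending=False)

# Toy values only
sample = pd.DataFrame({
    "Category": ["Technology", "Furniture", "Technology", "Office Supplies"],
    "Sales": [400.0, 300.0, 100.0, 200.0],
    "Profit": [80.0, 6.0, 20.0, 10.0],
})
table = margin_by(sample, "Category")
print(table)
```

Running `margin_by(df, "Region")` on the real data is a direct way to confirm whether a region pairs the highest sales with only a moderate margin.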

💡 These insights support **data-driven decisions**, help optimize **sales and profit**, and help **target customer segments effectively** for better business outcomes.