# SuperStore Sales Exploratory Data Analysis with Python

## Scenario

Super Store is a family-owned retail business specializing in office supplies, furniture, and technology products. Operated by two owners, the business has experienced moderate growth but faces challenges in understanding its sales performance and customer behavior. The owners lack the technical expertise to interpret raw data or complex charts but are keen to explore how data analytics can optimize their operations.

The primary problem is the absence of an accessible, data-driven approach to decision-making. This gap leads to inefficiencies in inventory management, ineffective discounting strategies, and missed opportunities to cater to high-value customers. Without clear insights, the business struggles to maximize profit and customer loyalty in an increasingly competitive market.


## Purpose

The goal of this project is to provide actionable insights into the Super Store’s sales data through exploratory data analysis (EDA) and visualizations.

### Objectives:

The owners want to understand:

1. **Top-performing Product Categories:** They want to know which products drive the most revenue and profit to help them focus on items with the highest return.

2. **Sales and Profit by Region:** Understanding regional performance will help them decide where to focus marketing and sales efforts.

3. **Customer Segmentation Analysis:** They would like to know who their most valuable customers are and whether certain customer segments are more profitable than others.

4. **Seasonal Sales Trends:** Identifying peak times for certain products could help them manage inventory and anticipate demand.

5. **Discount Analysis:** The owners offer discounts regularly but are uncertain if discounts are helping or hurting profit. They want to know if there's an optimal discount rate that increases sales without eroding profit.


## Data Sources

The primary data source for this project is **superstore.xls**, an Excel workbook whose `Orders` sheet contains the following fields:

- Order ID, Order Date, Ship Date, Ship Mode
- Customer ID, Customer Name, Segment
- Country, City, State, Postal Code, Region
- Product ID, Category, Sub-Category, Product Name
- Sales
- Discount
- Quantity
- Profit
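
Once the workbook is loaded, a quick sanity check can confirm that the fields listed above are present. The following is a minimal sketch, not part of the original notebook: `EXPECTED_COLUMNS` and `find_missing_columns` are hypothetical names, the sales column is assumed to be named `Sales`, and the toy DataFrame is made-up illustrative data.

```python
import pandas as pd

# Field names taken from the list above (assumption: the sales column is "Sales").
EXPECTED_COLUMNS = [
    "Order ID", "Order Date", "Ship Date", "Ship Mode",
    "Customer ID", "Customer Name", "Segment",
    "Country", "City", "State", "Postal Code", "Region",
    "Product ID", "Category", "Sub-Category", "Product Name",
    "Sales", "Discount", "Quantity", "Profit",
]


def find_missing_columns(df, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from df."""
    return [col for col in expected if col not in df.columns]


# Toy frame with made-up values, used only to demonstrate the check.
toy = pd.DataFrame({"Order ID": ["A-1"], "Sales": [10.0], "Profit": [2.5]})
missing = find_missing_columns(toy)
print(missing)
```

An empty list means every expected field was found; anything else points to a renamed or missing column that should be resolved before the analysis starts.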


## Deliverables

1. **EDA Report:** A concise PDF report summarizing key insights from the analysis with accompanying visualizations.
2. **Interactive Dashboard:** Hosted on a free platform like Streamlit Sharing for online access.
3. **User Training Documentation:** A step-by-step guide for using the dashboard.


## Installing and Loading Packages

### Prerequisites

Install the following packages with conda before running the notebook:

- `conda install anaconda::tabulate`
- `conda install conda-forge::plotly`

In [1]:
# To update a package, run the following command in a terminal or command prompt:
# pip install -U package_name

# To install an exact version of a package, run:
# pip install package_name==desired_version

# After installing or updating a package, restart the Jupyter kernel.

# Install the `watermark` package, which records the versions of the
# packages used in this notebook.
!pip install -q -U watermark

In [2]:
# Suppress warning messages to keep the notebook output readable

import warnings
warnings.filterwarnings("ignore")

In [3]:
# Import the libraries used throughout the analysis.

import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

In [4]:
# Configure the pandas display options

pd.set_option('display.max_columns', 250)  # Show up to 250 columns
pd.set_option('display.max_rows', 300)  # Show up to 300 rows
pd.set_option('display.width', 1000)  # Allow output lines up to 1,000 characters wide
pd.set_option('display.float_format', '{:,.4f}'.format)  # Thousands separator, four decimal places

In [5]:
%reload_ext watermark
%watermark -a "AmeduStephen"

Author: AmeduStephen



## Data Acquisition

In [6]:
# Use the current working directory as the notebook's location. This assumes the notebook
# is run from a 'notebooks' folder that sits at the same level as the 'data' folder.
# Note: os.getcwd() returns the current working directory, which may not match the
# notebook's actual location.
script_dir = os.getcwd()

# Navigate to the parent folder
parent_dir = os.path.abspath(os.path.join(script_dir, ".."))

# Construct the path to the Excel file in the desired relative location
raw_data_path = os.path.join(parent_dir, "data", "raw", "superstore.xls")

# Read the Excel file into a DataFrame
df = pd.read_excel(raw_data_path, sheet_name="Orders", index_col="Row ID")
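
Because `os.getcwd()` depends on where the notebook is launched, it can help to verify the path before reading the file. The sketch below is one possible approach, not the notebook's own method: `resolve_raw_data_path` is a hypothetical helper, and it assumes the same `data/raw` layout used in the cell above.

```python
from pathlib import Path


def resolve_raw_data_path(start_dir, filename="superstore.xls"):
    """Build <parent of start_dir>/data/raw/<filename> and check that it exists.

    Assumes start_dir is the 'notebooks' folder and that 'data' sits beside it.
    """
    path = Path(start_dir).resolve().parent / "data" / "raw" / filename
    if not path.is_file():
        # Fail early with a readable message instead of a pandas read error.
        raise FileNotFoundError(f"Expected the raw data file at: {path}")
    return path


# Example usage from inside the notebook (left commented so the block has no side effects):
# raw_data_path = resolve_raw_data_path(Path.cwd())
# df = pd.read_excel(raw_data_path, sheet_name="Orders", index_col="Row ID")
```

Failing early with the full expected path makes it easier to spot a notebook that was started from the wrong folder.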

## Data Profiling

In this section, we review the data types in use, check the dataset for inconsistent data formats, and identify missing values and duplicate records.
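
The checks described above can be sketched as a small helper. This is a minimal illustration under stated assumptions rather than the notebook's actual profiling code: `profile_dataframe` is a hypothetical function name, and the toy DataFrame contains made-up values standing in for the real `df`.

```python
import pandas as pd


def profile_dataframe(df):
    """Summarise each column and count fully duplicated rows.

    Returns a per-column summary (dtype, missing count, missing percentage,
    number of distinct non-null values) and the number of duplicated rows.
    """
    summary = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "missing_pct": df.isna().mean() * 100,
        "n_unique": df.nunique(),
    })
    n_duplicates = int(df.duplicated().sum())
    return summary, n_duplicates


# Toy frame with made-up values: one repeated row and one missing value.
toy = pd.DataFrame({
    "Order ID": ["A-1", "A-2", "A-2", "A-3"],
    "Sales": [10.0, 25.5, 25.5, None],
    "Region": ["West", "East", "East", "West"],
})
summary, n_duplicates = profile_dataframe(toy)
print(summary)
print("Duplicated rows:", n_duplicates)
```

On the real dataset, the same call would be `profile_dataframe(df)`. Columns with an unexpected `dtype` (for example, dates stored as text) are the first candidates for an inconsistent-format check.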

# The End