<a href="https://colab.research.google.com/github/Rohit-JohnsoN/AI-Augmented-Business-Analytics-Dashboard/blob/main/Global_SuperStore_Sales_Analysis_with_Python_%E2%80%93_NewPrediction_com.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Step 1: Get Everything Set Up and Configured
Install and import the required libraries and set a default plot template. Every later section depends on these imports.

**Note:** Upgrading picks up bug fixes, but a major-version jump (here, plotly 5.24.1 to 6.3.0) can also change behavior, so keep a record of the versions you ran against.


In [1]:
# Install the latest version of the plotly library
!pip install plotly --upgrade

# Import necessary libraries
import pandas as pd             # Library for data manipulation and analysis
import plotly.express as px    # Plotly Express for simple syntax plotting
import plotly.io as pio        # To access Plotly's input-output module

# Set the default plotly template to 'seaborn' for consistent and visually appealing plots
px.defaults.template = "seaborn"

Collecting plotly
  Downloading plotly-6.3.0-py3-none-any.whl.metadata (8.5 kB)
Downloading plotly-6.3.0-py3-none-any.whl (9.8 MB)
   9.8/9.8 MB 41.9 MB/s eta 0:00:00
Installing collected packages: plotly
  Attempting uninstall: plotly
    Found existing installation: plotly 5.24.1
    Uninstalling plotly-5.24.1:
      Successfully uninstalled plotly-5.24.1
Successfully installed plotly-6.3.0
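
Because the upgrade above moved plotly from 5.24.1 to 6.3.0, it can help to record which versions a run actually used. A minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of the original notebook:

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Print the versions of the libraries this notebook relies on
for name in ("pandas", "plotly"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

Pasting the printed versions into the notebook, or pinning them in the install command, makes it easier to reproduce the charts later.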


## Step 2: Load the Data and Get a Preview
Load the Superstore Sales dataset and preview the first rows. Understanding the structure of your data is the first step in any analysis.

**Beginner Mistake to Avoid:** Not checking the first few rows of your dataset. Always inspect the initial rows to understand your data's structure.


In [2]:
# Load the Superstore Sales dataset

# URL pointing to the raw data file on GitHub
url = 'https://raw.githubusercontent.com/yannie28/Global-Superstore/master/Global_Superstore(CSV).csv'
# Read the CSV data from the URL into a pandas DataFrame
data = pd.read_csv(url)

# Display the first few rows of the data to understand its structure and get a quick overview
data.head()


Unnamed: 0,Row ID,Order ID,Order Date,Ship Date,Ship Mode,Customer ID,Customer Name,Segment,Postal Code,City,...,Product ID,Category,Sub-Category,Product Name,Sales,Quantity,Discount,Profit,Shipping Cost,Order Priority
0,40098,CA-2014-AB10015140-41954,11/11/2014,11/13/2014,First Class,AB-100151402,Aaron Bergman,Consumer,73120.0,Oklahoma City,...,TEC-PH-5816,Technology,Phones,Samsung Convoy 3,221.98,2,0.0,62.15,40.77,High
1,26341,IN-2014-JR162107-41675,2/5/2014,2/7/2014,Second Class,JR-162107,Justin Ritter,Corporate,,Wollongong,...,FUR-CH-5379,Furniture,Chairs,"Novimex Executive Leather Armchair, Black",3709.4,9,0.1,-288.77,923.63,Critical
2,25330,IN-2014-CR127307-41929,10/17/2014,10/18/2014,First Class,CR-127307,Craig Reiter,Consumer,,Brisbane,...,TEC-PH-5356,Technology,Phones,"Nokia Smart Phone, with Caller ID",5175.17,9,0.1,919.97,915.49,Medium
3,13524,ES-2014-KM1637548-41667,1/28/2014,1/30/2014,First Class,KM-1637548,Katherine Murray,Home Office,,Berlin,...,TEC-PH-5267,Technology,Phones,"Motorola Smart Phone, Cordless",2892.51,5,0.1,-96.54,910.16,Medium
4,47221,SG-2014-RH9495111-41948,11/5/2014,11/6/2014,Same Day,RH-9495111,Rick Hansen,Consumer,,Dakar,...,TEC-CO-6011,Technology,Copiers,"Sharp Wireless Fax, High-Speed",2832.96,8,0.0,311.52,903.04,Critical
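
Beyond `head()`, column types and missing values are worth a quick check. The sketch below rebuilds four columns from the five preview rows above as an inline frame (so it runs without the download) and counts missing values per column. On the real data, the same methods would be called on `data`.

```python
import pandas as pd

# Four columns copied from the five preview rows shown above
preview = pd.DataFrame({
    "Order Date": ["11/11/2014", "2/5/2014", "10/17/2014", "1/28/2014", "11/5/2014"],
    "Postal Code": [73120.0, None, None, None, None],
    "City": ["Oklahoma City", "Wollongong", "Brisbane", "Berlin", "Dakar"],
    "Sales": [221.98, 3709.40, 5175.17, 2892.51, 2832.96],
})

# Column data types: 'Order Date' is still text at this point, not a datetime
print(preview.dtypes)

# Number of missing values per column
missing = preview.isna().sum()
print(missing)
```

In the preview, only the Oklahoma City order has a postal code, which is why `Postal Code` shows up as a float column with missing values.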


## Step 3: Analyze the Sales Data
With the data loaded, the final step is to build the analyses themselves: five charts covering category, time, subcategory, and shipping cost.

### Analysis 1: Sales Analysis by Category
Visualizing total sales by category provides a high-level overview of where the majority of sales are coming from.

**Note:** Proper labeling in your plots is essential. It makes your charts easily understandable to anyone viewing them.


In [3]:
# Create a bar chart using Plotly Express
fig = px.bar(
    # Group data by 'Category' and sum only the 'Sales' column, keeping 'Category' as a regular column for plotting
    data.groupby('Category', as_index=False)['Sales'].sum(),
    # Set 'Category' as the x-axis variable
    x='Category',
    # Set 'Sales' as the y-axis variable
    y='Sales',
    # Provide a title for the chart
    title='Total Sales by Category',
    # Rename the 'Sales' label for clarity
    labels={'Sales': 'Total Sales ($)'},
    # Use the 'seaborn' theme for the chart
    template='seaborn'
)
# Update the trace to display y-values on the bars and format them as currency
fig.update_traces(texttemplate='%{y:$,.0f}', textposition='outside')
# Display the generated chart
fig.show()
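
The aggregation behind the chart can be checked on its own. A minimal sketch on the five preview rows from Step 2, selecting `Sales` before summing so that text and date columns are never aggregated. The last lines show a rough Python equivalent of the `$,.0f` label format used on the bars.

```python
import pandas as pd

# Category and Sales values copied from the five preview rows in Step 2
preview = pd.DataFrame({
    "Category": ["Technology", "Furniture", "Technology", "Technology", "Technology"],
    "Sales": [221.98, 3709.40, 5175.17, 2892.51, 2832.96],
})

# One row per category, with 'Category' kept as a regular column for plotting
sales_by_category = preview.groupby("Category", as_index=False)["Sales"].sum()
print(sales_by_category)

# Rough Python equivalent of the '$,.0f' label format used on the bars
labels = [f"${value:,.0f}" for value in sales_by_category["Sales"]]
print(labels)
```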


### Analysis 2: Monthly Sales Analysis
Breaking down sales on a monthly basis helps in understanding trends, seasonal variations, and anomalies.

**Beginner Mistake to Avoid:** Not converting date columns into a datetime datatype. Always ensure date-related operations are performed on columns of the correct datatype.


In [5]:
# Convert the 'Order Date' column from string format to datetime format
data['Order Date'] = pd.to_datetime(data['Order Date'])

# Extract the month and year from 'Order Date' and store it in a new column 'Order Month'
data['Order Month'] = data['Order Date'].dt.to_period('M').astype(str)

# Select only numeric columns before grouping and summing
monthly_sales = data.select_dtypes(include='number').groupby(data['Order Month']).sum().reset_index()

# Create a line chart to visualize monthly sales over time
fig = px.line(monthly_sales,
              x='Order Month',  # x-axis: Month and Year
              y='Sales',        # y-axis: Sales amount
              labels={'Sales': 'Monthly Sales ($)', 'Order Month': 'Month'},
              title='Monthly Sales Over Time')

# Update the y-axis to display values as currency (with $ prefix)
fig.update_layout(yaxis_tickprefix='$')

# Display the generated chart
fig.show()
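
Why the conversion matters: as text, dates sort alphabetically rather than chronologically, and there is no month to group on. A minimal sketch on the preview rows from Step 2. The explicit `format="%m/%d/%Y"` is an assumption based on the month/day/year pattern visible in the preview, not something the notebook specifies.

```python
import pandas as pd

# Order dates and sales copied from the five preview rows in Step 2
orders = pd.DataFrame({
    "Order Date": ["11/11/2014", "2/5/2014", "10/17/2014", "1/28/2014", "11/5/2014"],
    "Sales": [221.98, 3709.40, 5175.17, 2892.51, 2832.96],
})

# Sorting the raw strings is alphabetical: the February order ends up after November
text_order = sorted(orders["Order Date"])
print(text_order)

# Parse with an explicit format (assumed month/day/year), then derive a 'YYYY-MM' label
orders["Order Date"] = pd.to_datetime(orders["Order Date"], format="%m/%d/%Y")
orders["Order Month"] = orders["Order Date"].dt.to_period("M").astype(str)

# 'YYYY-MM' labels sort chronologically even though they are strings
monthly_sales = orders.groupby("Order Month", as_index=False)["Sales"].sum()
print(monthly_sales)
```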

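### Analysis 3: Monthly Sales Over Time by Category
Splitting the monthly trend by product category shows which categories drive the overall pattern and whether they follow the same seasonality.

**Note:** Passing `color='Category'` to `px.line` draws one line per category on shared axes, which makes the categories directly comparable.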

In [8]:
# Group the data by both 'Order Month' and 'Category' and sum 'Sales' for each pair, keeping the keys as regular columns for plotting
monthly_sales_by_category = data.groupby(['Order Month', 'Category'], as_index=False)['Sales'].sum()

# Create a line chart to visualize monthly sales over time, segmented by product category
fig = px.line(monthly_sales_by_category,
              x='Order Month',  # x-axis: Month and Year
              y='Sales',        # y-axis: Sales amount
              color='Category', # Differentiate lines by product category
              labels={'Sales': 'Monthly Sales ($)', 'Order Month': 'Month', 'Category': 'Product Category'},
              title='Monthly Sales Over Time by Category')

# Update the y-axis to display values as currency (with $ prefix)
fig.update_layout(yaxis_tickprefix='$')

# Display the generated chart
fig.show()
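
The grouped result is in long format: one row per month and category. For reading the numbers as a table, the same result can be pivoted to wide format. A minimal sketch; the sales figures are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical order lines (illustrative values only, not from the dataset)
orders = pd.DataFrame({
    "Order Month": ["2014-01", "2014-01", "2014-01", "2014-02", "2014-02"],
    "Category": ["Furniture", "Technology", "Technology", "Furniture", "Office Supplies"],
    "Sales": [100.0, 250.0, 50.0, 80.0, 40.0],
})

# Long format: one row per (month, category) pair, as used for the line chart
long_format = orders.groupby(["Order Month", "Category"], as_index=False)["Sales"].sum()

# Wide format: months as rows, categories as columns, 0 where a category had no sales
wide_format = long_format.pivot(
    index="Order Month", columns="Category", values="Sales"
).fillna(0)
print(wide_format)
```

A month with no sales in a category is simply absent from the long format, so the line chart connects the neighboring months rather than dropping to zero.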

### Analysis 4: Profit vs Sales Analysis by Subcategory
Visualizing the relationship between profit and sales for each subcategory can reveal which products are the most lucrative.

**Note:** A scatter plot suits this analysis because it visually separates high-profit, high-sales subcategories from the rest. Points below zero on the y-axis are subcategories that lose money overall.


In [11]:
# Group the data by 'Sub-Category', calculate the sum for each subcategory, then reset the index for plotting
subcategory_data = data[['Sub-Category'] + list(data.select_dtypes(include='number').columns)].groupby('Sub-Category').sum().reset_index()

# Create a scatter plot to visualize the relationship between profit and sales for each subcategory
fig = px.scatter(subcategory_data,
                 x='Sales',          # x-axis: Sales amount
                 y='Profit',         # y-axis: Profit amount
                 color='Sub-Category', # Differentiate points by subcategory using color
                 size='Sales',       # Vary point size based on sales amount
                 labels={'Sales': 'Total Sales ($)', 'Profit': 'Total Profit ($)'},
                 title='Profit vs Sales by Subcategory')

# Update the x and y axes to display values as currency (with $ prefix)
fig.update_layout(xaxis_tickprefix='$', yaxis_tickprefix='$')

# Display the generated scatter plot
fig.show()
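
Total profit alone favors large subcategories. One complementary measure is profit margin (profit divided by sales), which is not part of the original notebook. A minimal sketch on the five preview rows from Step 2:

```python
import pandas as pd

# Sub-Category, Sales, and Profit copied from the five preview rows in Step 2
preview = pd.DataFrame({
    "Sub-Category": ["Phones", "Chairs", "Phones", "Phones", "Copiers"],
    "Sales": [221.98, 3709.40, 5175.17, 2892.51, 2832.96],
    "Profit": [62.15, -288.77, 919.97, -96.54, 311.52],
})

# Sum sales and profit per subcategory, then compute margin = profit / sales
summary = preview.groupby("Sub-Category")[["Sales", "Profit"]].sum()
summary["Margin"] = summary["Profit"] / summary["Sales"]

# Sort so the least profitable subcategories appear first
print(summary.sort_values("Margin"))
```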

### Analysis 5: Shipping Cost by Ship Mode
Visualizing the total shipping cost for each ship mode helps understand the distribution of shipping expenses across different modes.

In [12]:
# Group the data by 'Ship Mode' and calculate the sum of 'Shipping Cost' for each mode, then reset the index
shipping_cost_by_mode = data.groupby('Ship Mode')['Shipping Cost'].sum().reset_index()

# Create a bar chart to visualize the shipping cost by ship mode
fig = px.bar(shipping_cost_by_mode,
             x='Ship Mode',  # x-axis: Ship Mode
             y='Shipping Cost', # y-axis: Total Shipping Cost
             labels={'Shipping Cost': 'Total Shipping Cost ($)', 'Ship Mode': 'Ship Mode'},
             title='Total Shipping Cost by Ship Mode')

# Update the y-axis to display values as currency (with $ prefix)
fig.update_layout(yaxis_tickprefix='$')

# Display the generated bar chart
fig.show()
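
Totals depend on how many orders use each mode, so the average cost per order line is a useful companion figure. A minimal sketch on the preview rows from Step 2, using pandas named aggregation; the output column names `total` and `average` are chosen here for illustration.

```python
import pandas as pd

# Ship Mode and Shipping Cost copied from the five preview rows in Step 2
preview = pd.DataFrame({
    "Ship Mode": ["First Class", "Second Class", "First Class", "First Class", "Same Day"],
    "Shipping Cost": [40.77, 923.63, 915.49, 910.16, 903.04],
})

# Compute the total and the mean shipping cost for each ship mode in one pass
shipping_summary = (
    preview.groupby("Ship Mode")["Shipping Cost"]
    .agg(total="sum", average="mean")
    .reset_index()
)
print(shipping_summary)
```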