BUSINESS STATUS UNDERSTANDING

SalesPro Software Analysis

Project Overview:
The goal of this project is to analyze and understand the sales performance of SalesPro Software Suite. The dataset includes key business metrics like revenue, profit, units sold, and customer satisfaction across different regions and sales channels.

This kind of analysis, known as Exploratory Data Analysis (EDA), helps uncover hidden trends, patterns, and relationships in the data. By summarizing the data, detecting anomalies, and visualizing the findings with charts and graphs, it supports informed business decisions.

In [None]:
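import numpy as np  # numerical computation
import pandas as pd  # data loading and analysis
import matplotlib.pyplot as plt  # plotting
import seaborn as sns  # statistical data visualization

# The Matplotlib style name 'seaborn-whitegrid' was deprecated in Matplotlib 3.6 and is
# not available in recent versions, so set the equivalent style through seaborn instead
sns.set_style('whitegrid')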



In [None]:
df=pd.read_excel('/kaggle/input/sales-software-business-growth-datanew/sales_software_business_growth_data_enhanced (2).xlsx')

In [None]:
df.info()  # column names, non-null counts and data types

In [None]:
df.shape  # (number of rows, number of columns)

In [None]:
df.head(5)  # first 5 rows

In [None]:
df.tail(5)  # last 5 rows

In [None]:
# view 10 random rows from the data
df.sample(10)

In [None]:
# Check duplicates
df[df.duplicated()]

In [None]:
# drop the duplicates (drop_duplicates returns a new DataFrame, so reassign it)
df = df.drop_duplicates()
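`drop_duplicates()` returns a new DataFrame and leaves the original unchanged unless the result is reassigned or `inplace=True` is passed. The minimal sketch below demonstrates this on a small hypothetical DataFrame. The column names are borrowed from the dataset and the values are invented.

```python
import pandas as pd

# Hypothetical toy data: the second and third rows are exact duplicates
sample = pd.DataFrame({
    'Region': ['North America', 'Asia', 'Asia'],
    'Revenue': [1200.0, 800.0, 800.0],
})

# drop_duplicates() returns a new DataFrame; `sample` itself still has 3 rows
deduped = sample.drop_duplicates()
print(len(sample), len(deduped))

# Reassign to keep the de-duplicated result (reset_index gives a clean 0..n-1 index)
sample = sample.drop_duplicates().reset_index(drop=True)
print(sample.duplicated().sum())
```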

In [None]:
pd.isnull(df)  # True = missing value, False = value present

In [None]:
pd.isnull(df).sum()  # number of missing values per column
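Raw counts of missing values are easier to judge next to each column's share of missing values. Because `isnull()` yields booleans, taking `.mean()` gives the fraction of missing entries per column. The sketch below uses a small hypothetical DataFrame with invented values.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with a few missing values
sample = pd.DataFrame({
    'Revenue': [1200.0, np.nan, 950.0, 700.0],
    'Units_Sold': [10, 8, np.nan, np.nan],
    'Region': ['North America', 'Asia', 'Europe', 'Asia'],
})

missing = sample.isnull().sum()             # number of NaN values per column
missing_pct = sample.isnull().mean() * 100  # share of NaN values per column, in percent

summary = pd.DataFrame({'missing': missing, 'missing_%': missing_pct.round(1)})
print(summary)
```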

In [None]:
df['Revenue'].dtype  # data type of the Revenue column

In [None]:
df.columns   #check columns

In [None]:
# summary statistics of the numeric columns (count, mean, standard deviation, min, quartiles, max)
df.describe()

In [None]:
df[['Revenue', 'Units_Sold', 'Gross_Profit','Net_Profit']].describe()   #for specific columns

In [None]:
df['Region'].value_counts()

**Exploratory Data Analysis**

**By Region**

In [None]:
df['Region'].value_counts().plot.pie(autopct='%1.1f%%')
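The percentages that `autopct` prints on the pie chart are each region's share of the rows. `value_counts(normalize=True)` returns the same proportions as numbers, which is convenient for tables or further calculations. The sketch below uses a small hypothetical DataFrame.

```python
import pandas as pd

# Hypothetical toy data: 4 rows in North America, 3 in Europe, 1 in Asia
sample = pd.DataFrame({'Region': ['North America'] * 4 + ['Europe'] * 3 + ['Asia']})

# normalize=True returns proportions instead of counts; multiplying by 100 gives percentages
region_share = sample['Region'].value_counts(normalize=True).mul(100).round(1)
print(region_share)
```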

**By Sales Channel**

In [None]:
df['Sales_Channel'].value_counts()

In [None]:
df['Sales_Channel'].value_counts().plot.pie(autopct='%1.1f%%')

**By Profit and Units Sold**

In [None]:
# Bar plot of average Net_Profit by region (sns.barplot draws the mean of each group by default)
plt.figure(figsize=(6,4))
ax = sns.barplot(x='Region', y='Net_Profit', data=df, palette='viridis')
plt.title('Average Net Profit by Region')

# Adding values on top of each bar
for p in ax.patches:
    ax.annotate(format(p.get_height(), '.2f'),
                (p.get_x() + p.get_width() / 2., p.get_height()),
                ha = 'center', va = 'center', 
                fontsize=12, color='black', 
                xytext=(0, 9), textcoords='offset points')

plt.show()
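By default, `sns.barplot` aggregates each group with its mean. The bar heights above are therefore average net profit per record, not total net profit per region, and the two can rank regions differently. The minimal sketch below compares both aggregations with `groupby`. It uses a small hypothetical DataFrame with invented values.

```python
import pandas as pd

# Hypothetical toy data: Asia has fewer but larger deals than Europe
sample = pd.DataFrame({
    'Region': ['Asia', 'Asia', 'Europe', 'Europe', 'Europe', 'Europe'],
    'Net_Profit': [500.0, 300.0, 250.0, 250.0, 250.0, 250.0],
})

# 'mean' is what sns.barplot draws by default; 'sum' is the regional total
profit_by_region = sample.groupby('Region')['Net_Profit'].agg(['mean', 'sum'])
print(profit_by_region)
print('Highest mean:', profit_by_region['mean'].idxmax())
print('Highest total:', profit_by_region['sum'].idxmax())
```

To plot totals directly, `sns.barplot` accepts an `estimator` argument, for example `estimator=np.sum`.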

In [None]:
# visualize average units sold per sales channel
plt.figure(figsize=(6,4))
ax = sns.barplot(x='Sales_Channel', y='Units_Sold', data=df, palette='viridis')
plt.title('Average Units Sold by Sales Channel')

# Adding values on top of each bar
for p in ax.patches:
    ax.annotate(format(p.get_height(), '.2f'),
                (p.get_x() + p.get_width() / 2., p.get_height()),
                ha = 'center', va = 'center', 
                fontsize=12, color='black', 
                xytext=(0, 9), textcoords='offset points')

plt.show()
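To find the top channel by total volume, rather than by the per-record average the bar plot shows, units can be summed per channel and sorted. The sketch below uses a hypothetical DataFrame. The channel names `Partner` and `Direct` are invented for illustration and may not match the dataset's actual values.

```python
import pandas as pd

# Hypothetical toy data; 'Partner' and 'Direct' are made-up channel names
sample = pd.DataFrame({
    'Sales_Channel': ['Website', 'Website', 'Partner', 'Direct', 'Partner'],
    'Units_Sold': [40, 35, 30, 25, 20],
})

# Total units per channel, largest first
units_by_channel = (sample.groupby('Sales_Channel')['Units_Sold']
                    .sum()
                    .sort_values(ascending=False))
top_channel = units_by_channel.idxmax()

print(units_by_channel)
print('Top channel:', top_channel)
```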

In [None]:
# share of records per year (value_counts counts rows, not revenue or units)
df['Year'].value_counts().plot.pie(autopct='%1.1f%%')
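A pie of `value_counts()` shows how many records fall in each year, not how much was sold. To see actual sales over time, revenue can be summed per year, and `pct_change()` then gives year-over-year growth. The minimal sketch below uses a hypothetical DataFrame with invented values.

```python
import pandas as pd

# Hypothetical toy data
sample = pd.DataFrame({
    'Year': [2021, 2021, 2022, 2022, 2023],
    'Revenue': [100.0, 100.0, 150.0, 90.0, 300.0],
})

# Total revenue per year, in chronological order
yearly = sample.groupby('Year')['Revenue'].sum().sort_index()

# Year-over-year growth in percent (the first year has no previous year, so it is NaN)
growth_pct = yearly.pct_change().mul(100).round(1)

print(pd.DataFrame({'Revenue': yearly, 'YoY_growth_%': growth_pct}))
```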

In [None]:
plt.figure(figsize=(6, 4))
sns.boxplot(x='Region', y='Gross_Profit', data=df, palette='inferno')
plt.title('Gross Profit Distribution by Region')
plt.show()
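The points a box plot draws beyond its whiskers are flagged with the 1.5 × IQR rule, which is Matplotlib's default `whis=1.5`. Any value below Q1 − 1.5·IQR or above Q3 + 1.5·IQR counts as an outlier. The sketch below applies the same rule per region so the flagged rows can be inspected directly. The helper `iqr_outliers` and the toy data are hypothetical.

```python
import pandas as pd

# Hypothetical toy data: Asia contains one unusually large gross profit
sample = pd.DataFrame({
    'Region': ['Asia'] * 5 + ['Europe'] * 5,
    'Gross_Profit': [100, 110, 120, 130, 500, 200, 210, 220, 230, 240],
})

def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Hypothetical helper: True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)

# Apply the rule within each region so every region uses its own quartiles
mask = sample.groupby('Region')['Gross_Profit'].transform(iqr_outliers).astype(bool)
outliers = sample[mask]
print(outliers)
```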


In [None]:
plt.figure(figsize=(8,4))

ax = sns.countplot(data = df, x = 'Sales_Channel',hue = "Region")

# Add the values on top of the bars
for p in ax.patches:
    # each bar's height is a count; format it as a whole number
    ax.annotate(f'{p.get_height():.0f}', 
                (p.get_x() + p.get_width() / 2, p.get_height()), 
                ha='center', va='center', 
                fontsize=10, color='black', 
                xytext=(0, 5), textcoords='offset points')

# Set plot title and show the plot
plt.title('Sales Channel Distribution by Region')
plt.show()
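The bar heights in this count plot can also be produced as a table with `pd.crosstab`, which counts rows for every Sales_Channel × Region pair. The sketch below uses a hypothetical DataFrame. The channel name `Partner` is invented for illustration.

```python
import pandas as pd

# Hypothetical toy data; 'Partner' is a made-up channel name
sample = pd.DataFrame({
    'Sales_Channel': ['Website', 'Website', 'Partner', 'Website', 'Partner'],
    'Region': ['Asia', 'Europe', 'Asia', 'Asia', 'Europe'],
})

# Rows = sales channels, columns = regions, cells = number of records
counts = pd.crosstab(sample['Sales_Channel'], sample['Region'])
print(counts)
```

Passing `normalize='index'` to `pd.crosstab` turns each row into proportions, which makes regional mixes comparable across channels of different sizes.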

In [None]:
# Pairplot of the numeric features, colored by region (Region is categorical, so it is used as hue)
sns.pairplot(df, vars=['Revenue', 'Net_Profit', 'Gross_Profit'], hue='Region')
plt.show()

In [None]:
# Compute the correlation matrix
corr_matrix = df[['License_Cost', 'Subscription_Cost', 'Support_Cost', 'Website_Maintenance_Cost']].corr()

# Create the heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(corr_matrix, annot=True, cmap='Blues', fmt='.2f', linewidths=0.5, cbar=True)

# Title and show the plot
plt.title('Correlation Heatmap')
plt.show()
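`DataFrame.corr()` computes the Pearson correlation coefficient by default: r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²), which ranges from −1 to 1. The sketch below computes r by hand for two cost columns and checks it against pandas. The values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data for two of the cost columns
sample = pd.DataFrame({
    'License_Cost': [10.0, 20.0, 30.0, 40.0],
    'Support_Cost': [5.0, 9.0, 16.0, 20.0],
})

x, y = sample['License_Cost'], sample['Support_Cost']
dx, dy = x - x.mean(), y - y.mean()

# Pearson r: covariance term divided by the product of the spreads
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# pandas computes the same coefficient (method='pearson' is the default)
r_pandas = sample.corr().loc['License_Cost', 'Support_Cost']
print(round(r_manual, 4), round(r_pandas, 4))
```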

CONCLUSION:

North America generated the highest revenue and profit, indicating strong performance in that region.
Some regions, such as Asia, showed lower sales and could be a focus area for improvement.
The Website channel has the highest units sold, suggesting it is the most effective sales channel.
Overall, the Website channel and the North America region drove the most revenue and profit.
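A joint claim about a region and a channel can be checked against totals with a Region × Sales_Channel pivot table of summed revenue. Row sums rank the regions and column sums rank the channels. The minimal sketch below uses a hypothetical DataFrame. The channel name `Partner` and all values are invented.

```python
import pandas as pd

# Hypothetical toy data; 'Partner' is a made-up channel name
sample = pd.DataFrame({
    'Region': ['North America', 'North America', 'Asia', 'Asia', 'Europe'],
    'Sales_Channel': ['Website', 'Partner', 'Website', 'Partner', 'Website'],
    'Revenue': [500.0, 200.0, 150.0, 100.0, 250.0],
})

# Total revenue for each Region x Sales_Channel pair; missing pairs become 0
pivot = pd.pivot_table(sample, values='Revenue', index='Region',
                       columns='Sales_Channel', aggfunc='sum', fill_value=0)

top_region = pivot.sum(axis=1).idxmax()   # region with the largest total revenue
top_channel = pivot.sum(axis=0).idxmax()  # channel with the largest total revenue

print(pivot)
print('Top region:', top_region, '| Top channel:', top_channel)
```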
