# Sales Performance Analysis Report.

**Report description**: This report analyzes sales for ABC company. The company provided three datasets (sales.csv, products.csv, customers.csv) and seeks to understand its sales performance across products, regions, and customers.

This analysis cleans, explores, visualizes, and summarizes the data into actionable business insights.


**Key Objectives**: Perform an end-to-end analysis of the company's data to obtain the following insights:

1. Key patterns in the data.

2. Bottlenecks affecting overall profitability.

3. Growth areas.


**Business questions to answer**:

1. Which product categories and subcategories contribute the most to total sales and profit?

2. Which regions or customer segments have declining sales trends?

3. What is the month-over-month sales growth rate?

4. Who are the top 10 customers by revenue and profit margin?

5. Which discount strategies lead to lower profitability?

6. What are the overall KPIs: Total Sales, Profit Margin %, and Average Order Value?
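The three KPIs in question 6 can be computed from sales.csv alone. The sketch below assumes the usual definitions, which the report does not state explicitly: Total Sales is the sum of `Order_Value`, Profit Margin % is total `Profit` divided by total sales, and Average Order Value is total sales divided by the number of distinct orders. It uses the five sample rows of sales.csv displayed in the Data Assessment section; in the notebook the full `sales` DataFrame would be used instead.

```python
import pandas as pd

# Toy frame built from the five sample rows of sales.csv shown in this report.
sales = pd.DataFrame({
    "Order_ID": ["O141598", "O569509", "O240973", "O914001", "O614116"],
    "Order_Value": [572.349964, 105.50105, 501.322948, 802.281732, 1198.218159],
    "Profit": [46.194363, 18.342919, 123.010896, 44.784786, 263.222539],
})

# Assumed KPI definitions (not spelled out in the source).
total_sales = sales["Order_Value"].sum()
profit_margin_pct = sales["Profit"].sum() / total_sales * 100
avg_order_value = total_sales / sales["Order_ID"].nunique()

print(f"Total Sales: {total_sales:,.2f}")
print(f"Profit Margin %: {profit_margin_pct:.2f}")
print(f"Average Order Value: {avg_order_value:,.2f}")
```

Computing the margin as a ratio of totals weights each order by its value; averaging per-order margins instead would give small orders the same weight as large ones.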


**Expected deliverables**:

1. A data cleaning and Exploratory Data Analysis (EDA) report.

2. A business insights summary: 5-7 actionable recommendations for management.



## Sequence of Tasks in this Data Analysis Report.

Data Gathering -> Data Assessment -> Data Cleaning -> Data Transformation -> Exploratory Data Analysis (including modeling as part of EDA) -> Answering Business Questions -> Detailed Conclusion

## Dependencies.

This section imports the Python libraries used throughout the analysis.

```python
import numpy as np
import pandas as pd
import warnings

# Suppress warnings so the notebook output stays readable.
warnings.filterwarnings('ignore')
```

## Data Gathering.

The company provided three datasets: sales.csv, products.csv, and customers.csv.

Each dataset is loaded as a pandas DataFrame for further analysis.

```python
# Load each raw CSV file into its own DataFrame.
sales = pd.read_csv('./raw_datasets/sales.csv')
products = pd.read_csv('./raw_datasets/products.csv')
customers = pd.read_csv('./raw_datasets/customers.csv')
```

## Data Assessment.

**Objectives**: Understand the data's structure, its content, and its quality.

### Structure of data

All three datasets are rectangular (tabular): the data is arranged in rows and columns.

```python
# Print the number of rows and columns in each dataset.
for name, df in [("sales.csv", sales), ("products.csv", products), ("customers.csv", customers)]:
    print(f"{name}: {df.shape[0]} rows, {df.shape[1]} columns")
```

```
sales.csv: 15000 rows, 8 columns
products.csv: 500 rows, 7 columns
customers.csv: 2005 rows, 7 columns
```


The datasets differ in size: sales.csv is the largest (15,000 rows), followed by customers.csv (2,005 rows), and products.csv is the smallest (500 rows).

### Content of data.

Understanding what each feature means is crucial. Below is a brief description of every feature in all three datasets.

#### **Feature description of sales.csv**

**Order_ID**: Generated when a new order is placed. Every order gets its own ID, so two orders placed by the same customer receive two different Order_IDs. This makes Order_ID unique across the entire dataset.
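The uniqueness of `Order_ID` is worth confirming rather than assuming, since duplicated orders would inflate every sales total. A minimal sketch on hypothetical toy data (the repeated `"O1001"` is deliberate); in the notebook the same two lines would run on the real `sales` DataFrame:

```python
import pandas as pd

# Hypothetical toy data with one deliberately repeated Order_ID.
sales = pd.DataFrame({"Order_ID": ["O1001", "O1002", "O1001", "O1003"]})

# True only if no Order_ID appears more than once.
print("Order_ID unique:", sales["Order_ID"].is_unique)

# keep=False marks every occurrence of a duplicated ID, not just the later repeats.
dupes = sales[sales["Order_ID"].duplicated(keep=False)]
print(dupes)
```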

**Customer_ID**: A unique ID assigned to every customer who placed at least one order. Note that one Customer_ID can be associated with more than one Order_ID.
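Because one customer can place many orders, a per-customer aggregation both confirms that one-to-many relationship and is a natural starting point for business question 4 (top 10 customers by revenue and profit margin). A sketch on hypothetical toy data; the names `per_customer`, `repeat_customers`, and `top10_revenue` are illustrative only:

```python
import pandas as pd

# Hypothetical toy orders; in the notebook, use the full `sales` DataFrame.
sales = pd.DataFrame({
    "Order_ID": ["O1", "O2", "O3", "O4", "O5"],
    "Customer_ID": ["C1", "C2", "C1", "C3", "C2"],
    "Order_Value": [500.0, 100.0, 400.0, 800.0, 1200.0],
    "Profit": [50.0, 20.0, 120.0, 40.0, 260.0],
})

# One row per customer: order count, total revenue, total profit.
per_customer = sales.groupby("Customer_ID").agg(
    orders=("Order_ID", "nunique"),
    revenue=("Order_Value", "sum"),
    profit=("Profit", "sum"),
)
per_customer["margin_pct"] = per_customer["profit"] / per_customer["revenue"] * 100

# Customers with more than one order illustrate the one-to-many relationship.
repeat_customers = per_customer[per_customer["orders"] > 1]
print(repeat_customers)

# Top customers by revenue; nlargest returns fewer rows if fewer than 10 exist.
top10_revenue = per_customer.nlargest(10, "revenue")
print(top10_revenue)
```

Ranking by `margin_pct` instead of `revenue` answers the profit-margin half of the question.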

**Product_ID**: A unique ID assigned to every product that the company sells.
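`Product_ID` is the key that links sales.csv to products.csv. A left merge with `indicator=True` flags sales rows whose product is missing from products.csv, and the merged frame can then be aggregated by category for business question 1. This is a sketch on hypothetical toy frames (`P9999` is a deliberately unmatched ID); `validate="many_to_one"` makes pandas raise an error if products.csv contains duplicate Product_IDs.

```python
import pandas as pd

# Hypothetical toy frames with the same key columns as products.csv and sales.csv.
products = pd.DataFrame({
    "Product_ID": ["P0001", "P0002", "P0003"],
    "Category": ["Sports", "Home Decor", "Sports"],
    "SubCategory": ["Outdoor", "Kitchen", "Team Sports"],
})
sales = pd.DataFrame({
    "Product_ID": ["P0001", "P0003", "P0002", "P9999"],  # P9999 has no match
    "Order_Value": [300.0, 150.0, 170.0, 80.0],
    "Profit": [60.0, 20.0, 30.0, 5.0],
})

# indicator=True adds a `_merge` column: "both" or "left_only" for unmatched IDs.
merged = sales.merge(products, on="Product_ID", how="left",
                     validate="many_to_one", indicator=True)
orphans = merged[merged["_merge"] == "left_only"]
print("Sales rows with unknown Product_ID:", len(orphans))

# Sales and profit per category/subcategory, using matched rows only.
by_category = (
    merged[merged["_merge"] == "both"]
    .groupby(["Category", "SubCategory"])[["Order_Value", "Profit"]]
    .sum()
    .sort_values("Order_Value", ascending=False)
)
print(by_category)
```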

**Quantity**: Number of units of the product ordered by the customer.

**Discount**: Discount given to the customer on the order (in rupees).

**Order_Value**: Total amount paid by the customer.

**Profit**: Profit generated by the company on the order.
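Since `Discount`, `Order_Value`, and `Profit` are all per-order fields, business question 5 can be approached by grouping orders by discount level and comparing profit margins. The sample rows suggest discounts take a few discrete values (0, 5, 10, 15, 20), so grouping on the raw value is assumed to be reasonable; continuous discounts would need binning first. The toy data below is hypothetical.

```python
import pandas as pd

# Hypothetical toy orders at three discount levels.
sales = pd.DataFrame({
    "Discount": [0, 0, 10, 10, 20, 20],
    "Order_Value": [500.0, 300.0, 400.0, 600.0, 450.0, 350.0],
    "Profit": [100.0, 60.0, 60.0, 90.0, 45.0, 35.0],
})

# Orders, revenue, and profit per discount level.
by_discount = sales.groupby("Discount").agg(
    orders=("Profit", "count"),
    revenue=("Order_Value", "sum"),
    profit=("Profit", "sum"),
)
# Margin as a ratio of totals, so larger orders carry more weight.
by_discount["margin_pct"] = by_discount["profit"] / by_discount["revenue"] * 100

# Lowest-margin discount levels first.
print(by_discount.sort_values("margin_pct"))
```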

**Order_Date**: Date on which the order was placed.
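`Order_Date` is read from the CSV as text, so it has to be parsed before any time-based analysis such as the month-over-month growth rate in business question 3. The sketch assumes the `YYYY-MM-DD` format seen in the sample rows and uses hypothetical values (including one invalid date) to show how `errors="coerce"` surfaces bad entries. Growth is computed as (this month - previous month) / previous month * 100.

```python
import pandas as pd

# Hypothetical toy orders in the YYYY-MM-DD format seen in sales.csv.
sales = pd.DataFrame({
    "Order_Date": ["2023-12-17", "2023-12-31", "2024-01-09", "2024-02-17", "not a date"],
    "Order_Value": [800.0, 1200.0, 1500.0, 1200.0, 100.0],
})

# errors="coerce" turns unparseable strings into NaT instead of raising.
sales["Order_Date"] = pd.to_datetime(sales["Order_Date"], format="%Y-%m-%d", errors="coerce")
print("Unparseable dates:", sales["Order_Date"].isna().sum())

# Total sales per calendar month, ignoring rows without a valid date.
valid = sales.dropna(subset=["Order_Date"])
monthly = valid.groupby(valid["Order_Date"].dt.to_period("M"))["Order_Value"].sum()

# Month-over-month growth in percent; the first month has no predecessor (NaN).
previous = monthly.shift(1)
mom_growth_pct = (monthly - previous) / previous * 100
print(mom_growth_pct)
```

Months with no orders at all would be absent from `monthly`, so with sparse data the index should be checked for gaps before reading the growth figures.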

```python
# First 5 rows of the dataset.
sales.head()
```

|   | Order_ID | Customer_ID | Product_ID | Quantity | Discount | Order_Value | Profit | Order_Date |
|---|---|---|---|---|---|---|---|---|
| 0 | O141598 | C01653 | P0340 | 5 | 0 | 572.349964 | 46.194363 | 2024-01-09 |
| 1 | O569509 | C01284 | P0299 | 0 | 5 | 105.50105 | 18.342919 | 2023-02-16 |
| 2 | O240973 | C01082 | P0479 | 1 | 10 | 501.322948 | 123.010896 | 2024-02-17 |
| 3 | O914001 | C01470 | P0052 | 3 | 15 | 802.281732 | 44.784786 | 2023-12-17 |
| 4 | O614116 | C01187 | P0242 | 4 | 20 | 1198.218159 | 263.222539 | 2023-12-31 |


#### **Feature description of products.csv**

**Product_ID**: A unique ID assigned to every product that the company sells.

**Category**: The category the product belongs to: Electronics, Sports, Home Decor, Books, Beauty, Clothing, or Electrnics. (Electrnics appears to be a corrupted category in the dataset.)

**SubCategory**: Each category is further divided into subcategories:

1. Sports: Outdoor, Team Sports, Gym.
2. Electronics: Accessories, Mobiles, Laptops.
3. Home Decor: Kitchen, Furniture, Lighting.
4. Books: Fiction, Educational, Non_Fiction.
5. Beauty: Skincare, Haircare, Makeup.
6. Clothing: Women, Kids, Men.
7. Electrnics: Skincare, Non_Fiction, Haircare, Fiction, Team Sports. (This category is corrupted and needs to be corrected later.)
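The subcategories under Electrnics all belong to other categories in the list above, and each valid subcategory appears under exactly one category. One possible correction, assuming the `SubCategory` value is correct and only the `Category` label is corrupted, is to re-derive the category from the subcategory. The mapping dictionary `SUBCATEGORY_TO_CATEGORY` is a hypothetical helper built from items 1 to 6; the toy frame includes the sample row P0004 (Electrnics / Skincare).

```python
import pandas as pd

# Hypothetical helper: subcategory -> category, taken from the valid categories listed above.
SUBCATEGORY_TO_CATEGORY = {
    "Outdoor": "Sports", "Team Sports": "Sports", "Gym": "Sports",
    "Accessories": "Electronics", "Mobiles": "Electronics", "Laptops": "Electronics",
    "Kitchen": "Home Decor", "Furniture": "Home Decor", "Lighting": "Home Decor",
    "Fiction": "Books", "Educational": "Books", "Non_Fiction": "Books",
    "Skincare": "Beauty", "Haircare": "Beauty", "Makeup": "Beauty",
    "Women": "Clothing", "Kids": "Clothing", "Men": "Clothing",
}

products = pd.DataFrame({
    "Product_ID": ["P0001", "P0004", "P0005"],
    "Category": ["Sports", "Electrnics", "Sports"],
    "SubCategory": ["Outdoor", "Skincare", "Gym"],
})

# Only rows carrying the corrupted label are rewritten; unknown subcategories become NaN.
bad = products["Category"] == "Electrnics"
products.loc[bad, "Category"] = products.loc[bad, "SubCategory"].map(SUBCATEGORY_TO_CATEGORY)
print(products)
```

If the opposite assumption holds (the category should be Electronics and the subcategory is wrong), these rows cannot be repaired from the data alone and would need to be flagged instead.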

**Cost_Price**: Cost of a single unit of the product.

**Selling_Price**: Price at which the product is sold.

**Supplier**: Name of the supplier that supplies the product to the company.

**High_Margin**: 0 if the profit margin on the product is low, 1 if it is high.
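The rule used to set `High_Margin` is not documented, so computing the margin explicitly from `Cost_Price` and `Selling_Price` lets it be compared against the flag. The sketch uses the five sample rows of products.csv shown in this section. In those rows, P0005 is flagged 1 even though it has the lowest percentage margin (about 41%), so the flag may not be based on percentage margin alone; the full dataset is needed to say more.

```python
import pandas as pd

# The five sample rows of products.csv shown in this report.
products = pd.DataFrame({
    "Product_ID": ["P0001", "P0002", "P0003", "P0004", "P0005"],
    "Cost_Price": [132.55, 94.99, 59.26, 50.35, 286.14],
    "Selling_Price": [290.48, 165.56, 145.66, 125.33, 487.79],
    "High_Margin": [0, 0, 0, 0, 1],
})

# Per-unit margin in currency terms and as a percentage of the selling price.
products["Unit_Margin"] = products["Selling_Price"] - products["Cost_Price"]
products["Margin_Pct"] = products["Unit_Margin"] / products["Selling_Price"] * 100

# Sanity check: a product sold below cost would have a negative margin.
print("Loss-making products:", (products["Unit_Margin"] < 0).sum())

# Range of computed margins for each value of the flag.
print(products.groupby("High_Margin")[["Unit_Margin", "Margin_Pct"]].agg(["min", "max"]))
```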

```python
# First 5 rows of the dataset.
products.head()
```

|   | Product_ID | Category | SubCategory | Cost_Price | Selling_Price | Supplier | High_Margin |
|---|---|---|---|---|---|---|---|
| 0 | P0001 | Sports | Outdoor | 132.55 | 290.48 | Supplier_A | 0 |
| 1 | P0002 | Home Decor | Kitchen | 94.99 | 165.56 | Supplier_A | 0 |
| 2 | P0003 | Sports | Team Sports | 59.26 | 145.66 | Supplier_D | 0 |
| 3 | P0004 | Electrnics | Skincare | 50.35 | 125.33 | Supplier_A | 0 |
| 4 | P0005 | Sports | Gym | 286.14 | 487.79 | Supplier_A | 1 |


#### **Feature description of customers.csv**

**Customer_ID**: A unique ID assigned to every customer who placed at least one order.

**Age**: Age of customer (In years).

**Annual_Income**: Annual income of the customer (in dollars).

**Region**: Region the customer belongs to (North, South, East, or West).

**Segment**: Business segment the customer belongs to (Retail, Small Business, or Corporate).
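Region and segment live in customers.csv while order values and dates live in sales.csv, so business question 2 (declining sales trends) needs a merge on `Customer_ID` first. One simple heuristic, chosen here for illustration rather than taken from the report, is to fit a straight line through each region's monthly sales and treat a negative slope as a decline. The toy data is hypothetical; swapping `columns="Region"` for `columns="Segment"` gives the same view by segment.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frames; in the notebook, use `customers` and a date-parsed `sales`.
customers = pd.DataFrame({
    "Customer_ID": ["C1", "C2"],
    "Region": ["North", "South"],
    "Segment": ["Retail", "Corporate"],
})
sales = pd.DataFrame({
    "Customer_ID": ["C1", "C1", "C1", "C2", "C2", "C2"],
    "Order_Date": pd.to_datetime(["2024-01-05", "2024-02-05", "2024-03-05"] * 2),
    "Order_Value": [100.0, 200.0, 300.0, 300.0, 200.0, 100.0],
})

# Attach each order's customer attributes (Region, Segment).
orders = sales.merge(customers, on="Customer_ID", how="left", validate="many_to_one")
orders["Month"] = orders["Order_Date"].dt.to_period("M")

# Rows = months, columns = regions, values = total sales (0 if a region had no orders).
monthly_by_region = orders.pivot_table(
    index="Month", columns="Region", values="Order_Value", aggfunc="sum", fill_value=0
)

# Least-squares slope per region, assuming consecutive months; negative = declining.
x = np.arange(len(monthly_by_region))
slopes = monthly_by_region.apply(lambda col: np.polyfit(x, col.to_numpy(), 1)[0])
print(slopes.sort_values())
```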

**Loyalty_Score**: Customer loyalty to the company on a scale of 0 to 10, based on various internal and external factors.

**High_Value**: Customer value category (1 = high, 0 = low), based on several factors that will be explored later in the analysis.
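Two quick checks follow from these descriptions: whether every `Loyalty_Score` actually lies in the documented 0 to 10 range, and how high- and low-value customers differ on average, as a first look at the factors behind `High_Value`. The sketch uses the five sample rows of customers.csv shown in this section; with only one high-value row the averages illustrate the mechanics, not a finding.

```python
import pandas as pd

# The five sample rows of customers.csv shown in this report.
customers = pd.DataFrame({
    "Customer_ID": ["C00001", "C00002", "C00003", "C00004", "C00005"],
    "Age": [31, 41, 48, 55, 36],
    "Annual_Income": [51934.65, 64551.67, 45114.86, 65788.27, 42874.66],
    "Loyalty_Score": [7.33, 8.24, 7.71, 8.43, 4.91],
    "High_Value": [0, 0, 0, 1, 0],
})

# between() is inclusive on both ends; missing scores also count as out of range.
out_of_range = ~customers["Loyalty_Score"].between(0, 10)
print("Loyalty_Score outside 0-10:", out_of_range.sum())

# Average profile of low- (0) and high-value (1) customers.
profile = customers.groupby("High_Value")[["Age", "Annual_Income", "Loyalty_Score"]].mean()
print(profile)
```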

```python
# First 5 rows of the dataset.
customers.head()
```

|   | Customer_ID | Age | Annual_Income | Region | Segment | Loyalty_Score | High_Value |
|---|---|---|---|---|---|---|---|
| 0 | C00001 | 31 | 51934.65 | South | Small Business | 7.33 | 0 |
| 1 | C00002 | 41 | 64551.67 | South | Corporate | 8.24 | 0 |
| 2 | C00003 | 48 | 45114.86 | West | Retail | 7.71 | 0 |
| 3 | C00004 | 55 | 65788.27 | North | Small Business | 8.43 | 1 |
| 4 | C00005 | 36 | 42874.66 | East | Retail | 4.91 | 0 |


### Quality Assessment of the data.