# Example Usage

Welcome to the `salesanalyzer_mds` package! This package is designed to help small-sized businesses analyze their retail sales data efficiently, without needing extensive data analytics expertise. If you've ever felt overwhelmed by tools like Pandas or Scikit-learn, or wished for more retail-specific functions, you're in the right place.

In this notebook, we'll walk through how to use the `salesanalyzer_mds` package to extract valuable insights from your sales data. We’ll demonstrate key functionalities using real-world examples, so you can start improving your business decisions right away!

## Imports

Let us begin by setting up all our imports for this demonstration, which includes all 3 `salesanalyzer_mds` functions:
- `sales_summary_statistics`: Calculates a variety of summary statistics that provide insights into overall sales performance, customer behavior, and product performance.
- `segment_revenue_share`: Segments products into three categories: cheap, medium, expensive, based on price, and calculates their respective share in total revenue.
- `predict_sales`: Predicts future sales based on the provided historical data and the target.

In [1]:
import pandas as pd

from salesanalyzer_mds.sales_summary_statistics import sales_summary_statistics
from salesanalyzer_mds.segment_revenue_share import segment_revenue_share
from salesanalyzer_mds.predict_sales import predict_sales

## Create Sample Data

Next, let us create some sample data to work with.
> Note:
> The `salesanalyzer_mds` package is not limited to the sample data columns and can be customized to suit your specific requirements.

In [2]:
sample_data = pd.DataFrame({
    'InvoiceNo' : ['INV-240891','INV-240892', 'INV-240893', 'INV-240894', 'INV-240895', 'INV-240896', 'INV-240898'],
    'Description': ['Laptop', 'Headphones', 'Headphones', 'Monitor', 'Headphones', 'Laptop', 'Monitor'],
    'Quantity' : [2, 3, 1, 3, 5, 2, 1],
    'InvoiceDate' : ['2023-06-09', '2023-07-11', '2023-08-21', '2023-08-25', '2023-09-10', '2023-10-30', '2023-10-30'],
    'UnitPrice' : [1500, 300, 250, 500, 420, 2000, 700],
    'CustomerID' : [85732, 70179, 85673, 22367, 57682, 99123, 45612],
    'Country' : ['USA', 'Singapoore', 'Germany', 'USA', 'Geramny', 'Singapoore', 'USA']
})

sample_data.head()

| | InvoiceNo | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country |
|---|---|---|---|---|---|---|---|
| 0 | INV-240891 | Laptop | 2 | 2023-06-09 | 1500 | 85732 | USA |
| 1 | INV-240892 | Headphones | 3 | 2023-07-11 | 300 | 70179 | Singapoore |
| 2 | INV-240893 | Headphones | 1 | 2023-08-21 | 250 | 85673 | Germany |
| 3 | INV-240894 | Monitor | 3 | 2023-08-25 | 500 | 22367 | USA |
| 4 | INV-240895 | Headphones | 5 | 2023-09-10 | 420 | 57682 | Geramny |


## Get Summary Statistics

One of the key features of `salesanalyzer_mds` is its ability to quickly generate a sales summary. Use the `sales_summary_statistics()` function to generate insights such as total revenue, average order value, and top-selling products.
> Use `help(sales_summary_statistics)` for more information about the function.

In [3]:
sales_summary_statistics(sample_data)

| | Value |
|---|---|
| total_revenue | 12450.0 |
| unique_customers | 7 |
| average_order_value | 1778.571429 |
| top_selling_product_quantity | Headphones |
| top_selling_product_revenue | Laptop |
| average_revenue_per_customer | 1778.571429 |
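
To see where these figures come from, the headline numbers can be reproduced with plain pandas. This is only a sketch, assuming that revenue is `Quantity * UnitPrice` for each row and that each row is one order; it is not the package's actual implementation, and the `Revenue` column and variable names are made up for illustration.

```python
import pandas as pd

# Same values as the sample data above (only the columns needed here).
df = pd.DataFrame({
    'Description': ['Laptop', 'Headphones', 'Headphones', 'Monitor',
                    'Headphones', 'Laptop', 'Monitor'],
    'Quantity': [2, 3, 1, 3, 5, 2, 1],
    'UnitPrice': [1500, 300, 250, 500, 420, 2000, 700],
})

# Assumption: revenue per row is quantity times unit price.
df['Revenue'] = df['Quantity'] * df['UnitPrice']

total_revenue = df['Revenue'].sum()
average_order_value = total_revenue / len(df)  # assumes one row per order

# Sum per product, then pick the product with the largest total.
top_by_quantity = df.groupby('Description')['Quantity'].sum().idxmax()
top_by_revenue = df.groupby('Description')['Revenue'].sum().idxmax()

print(total_revenue, round(average_order_value, 2), top_by_quantity, top_by_revenue)
```

Under those assumptions the values agree with the summary table: 12450 in total revenue, an average order value of about 1778.57, Headphones as the top seller by quantity, and Laptop as the top seller by revenue.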


## Get Revenue Share for each Product Category

Another feature of `salesanalyzer_mds` is the `segment_revenue_share()` function. It segments products into three categories (cheap < medium < expensive) based on their price, and calculates the share of total revenue contributed by each segment. This function is particularly useful for analyzing product sales data and understanding how revenue is distributed across pricing tiers.
> Use `help(segment_revenue_share)` for more information about the function.

In [4]:
revenue_share = segment_revenue_share(sample_data, price_col='UnitPrice', quantity_col='Quantity')
revenue_share

| | PriceSegment | TotalRevenue | RevenueShare (%) |
|---|---|---|---|
| 0 | cheap | 1150 | 9.24 |
| 1 | medium | 4300 | 34.54 |
| 2 | expensive | 7000 | 56.22 |
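
The revenue shares can be checked by hand with pandas. The sketch below is not how the package chooses its segments: the bin edges (400 and 1000) are hypothetical, hand-picked only so that the three groups match the output above. The share calculation itself is simply each segment's revenue divided by total revenue.

```python
import pandas as pd

# Quantity and UnitPrice values from the sample data above.
df = pd.DataFrame({
    'Quantity': [2, 3, 1, 3, 5, 2, 1],
    'UnitPrice': [1500, 300, 250, 500, 420, 2000, 700],
})
df['Revenue'] = df['Quantity'] * df['UnitPrice']

# Hypothetical bin edges chosen for illustration; the package may derive
# its thresholds differently.
df['PriceSegment'] = pd.cut(
    df['UnitPrice'],
    bins=[0, 400, 1000, float('inf')],
    labels=['cheap', 'medium', 'expensive'],
)

# Revenue per segment, then each segment's percentage of the total.
segment_revenue = df.groupby('PriceSegment', observed=True)['Revenue'].sum()
share_pct = (segment_revenue / segment_revenue.sum() * 100).round(2)
print(share_pct)
```

With these edges the segment revenues are 1150, 4300, and 7000, which gives the same 9.24%, 34.54%, and 56.22% shares as the table.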


## Predict Future Sales

Now that you have a good summary of your **past** sales, you may want to peek into the **future** and predict how your products will sell in a month, two months, or even a year. You can do this with the `predict_sales()` function. It uses a Random Forest machine learning model to make predictions for your specified target (e.g. quantity sold). The output is a data frame of predicted values, along with the model's performance score (Mean Squared Error).

> **Important** <br>
> `predict_sales()` checks for duplicate entries and only considers unique data points. <br>
> By default, the function uses 70% of the data for training and 30% for testing. To change the split, pass `test_size` (for example, `test_size=0.2`); a smaller test share leaves more rows for training, which can help when your dataset is small.

In [5]:
new_data = pd.DataFrame({
    'InvoiceNo' : ['INV-250891','INV-250892'],
    'Description': ['Laptop', 'Headphones'],
    'InvoiceDate' : ['2025-01-30', '2025-02-01'],
    'UnitPrice' : [2000, 300],
    'CustomerID' : [85732, 70179],
    'Country' : ['USA', 'Singapoore']
})

predict_sales(sample_data, 
              new_data, 
              numeric_features = ['UnitPrice'], 
              categorical_features = ['Description', 'Country'], 
              target = 'Quantity', 
              date_feature = 'InvoiceDate')

MSE of the model: 6.7


| | Predicted values |
|---|---|
| 0 | 1.77 |
| 1 | 1.33 |
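
The call above passes `InvoiceDate` as the `date_feature`. How the package turns that column into model inputs is not described here. One common approach, shown purely as an assumption, is to parse the date strings and split them into numeric parts; the `year`, `month`, and `day` column names below are illustrative.

```python
import pandas as pd

# A few dates in the same format as the sample data.
dates = pd.DataFrame({'InvoiceDate': ['2023-06-09', '2023-07-11', '2023-10-30']})

# Parse the strings, then split each date into numeric parts a model can use.
parsed = pd.to_datetime(dates['InvoiceDate'])
date_parts = pd.DataFrame({
    'year': parsed.dt.year,
    'month': parsed.dt.month,
    'day': parsed.dt.day,
})
print(date_parts)
```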


If you don't want to include a date feature in your analysis, you can omit it from the arguments.


In [6]:
predict_sales(sample_data, new_data, ['UnitPrice'], ['Description', 'Country'], 'Quantity', test_size=0.2)

MSE of the model: 1.72


| | Predicted values |
|---|---|
| 0 | 1.89 |
| 1 | 1.88 |
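
For readers curious about what a Random Forest prediction with a train/test split looks like without the package, here is a minimal scikit-learn sketch. It is an illustration under stated assumptions, not the internals of `predict_sales()`: it drops duplicate rows, holds out 30% of the data, one-hot encodes a single categorical column, and leaves out `Country` and the date for brevity. The names `history` and `upcoming` are made up for this example.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

history = pd.DataFrame({
    'Description': ['Laptop', 'Headphones', 'Headphones', 'Monitor',
                    'Headphones', 'Laptop', 'Monitor'],
    'UnitPrice': [1500, 300, 250, 500, 420, 2000, 700],
    'Quantity': [2, 3, 1, 3, 5, 2, 1],
})
upcoming = pd.DataFrame({'Description': ['Laptop', 'Headphones'],
                         'UnitPrice': [2000, 300]})

# Keep only unique rows, then hold out 30% of them for testing.
history = history.drop_duplicates()
X, y = history[['Description', 'UnitPrice']], history['Quantity']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# One-hot encode the categorical column; pass the numeric column through.
preprocess = ColumnTransformer(
    [('cat', OneHotEncoder(handle_unknown='ignore'), ['Description'])],
    remainder='passthrough',
)
model = make_pipeline(preprocess, RandomForestRegressor(random_state=0))
model.fit(X_train, y_train)

# Score on the held-out rows, then predict the target for the new rows.
mse = mean_squared_error(y_test, model.predict(X_test))
predictions = model.predict(upcoming)
print(f'MSE: {mse:.2f}', predictions)
```

With only seven rows, a 30% test share holds out three of them, so the score rests on very little data. The numbers from this sketch are not expected to match the package's output above.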


This concludes the tutorial. You have now seen how to get insights from your sales data using our package.