# Example usage

Welcome to the `sales_analyzer` package! It is designed to help small businesses analyze their retail sales data efficiently, without extensive data analytics expertise. If you've ever felt overwhelmed by tools like Pandas or Scikit-learn, or wished for more retail-specific functions, you're in the right place.

In this notebook, we'll walk through how to use the `sales_analyzer` package to extract insights from your sales data. We'll demonstrate its key functions on a small, randomly generated sample dataset, so you can see the expected inputs and outputs before applying them to your own data.

## Imports

In [1]:
import pandas as pd
import numpy as np
import random
from datetime import datetime, timedelta

from salesanalyzer.sales_summary_statistics import sales_summary_statistics
from salesanalyzer.segment_revenue_share import segment_revenue_share

## Create sample data

We'll first create a sample dataset to work with.

In [2]:
def generate_random_dates(n):
    random.seed(1)
    # Get the current date
    today = datetime.now()
    # Calculate the date two years ago
    two_years_ago = today - timedelta(days=730)
    
    # Generate n random dates
    random_dates = [
        two_years_ago + timedelta(days=random.randint(0, (today - two_years_ago).days))
        for _ in range(n)
    ]

    return random_dates  
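
The helper above depends on `datetime.now()` and reseeds the global `random` module, so the generated dates shift from one run to the next. Below is a minimal sketch of a variant that takes a fixed end date and uses its own local generator. The name `generate_dates_before` and its parameters are hypothetical and not part of the package.

```python
import random
from datetime import datetime, timedelta


def generate_dates_before(n, end, days_back=730, seed=1):
    """Hypothetical variant: n random datetimes in [end - days_back, end]."""
    # A local generator leaves the global `random` state untouched
    rng = random.Random(seed)
    start = end - timedelta(days=days_back)
    return [start + timedelta(days=rng.randint(0, days_back)) for _ in range(n)]


# With a fixed end date and seed, repeated calls return the same dates
dates = generate_dates_before(5, end=datetime(2025, 1, 1))
print(dates)
```

Passing the end date explicitly keeps the sample data identical across runs, which makes outputs in a notebook like this one easier to compare.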

In [3]:
def anonymize_data(obs=50):
    random.seed(1)

    df = pd.DataFrame({})

    fake_products = [
        "Laptop", "Monitor", "Headphone"
    ]

    fake_cities = [
        "Vancouver", "Toronto", "Calgary"
    ]
    

    # Random invoice numbers (uniqueness is not guaranteed)
    df['InvoiceNo'] = [f'INV-{random.randint(100000, 999999)}' for _ in range(obs)]

    # Random alphanumeric stock codes
    df['StockCode'] = [f'SC{random.randint(1000, 9999)}' for _ in range(obs)]

    # Random fake product names
    df['Description'] = [random.choice(fake_products) for _ in range(obs)]

    # Random order quantities of at least 1
    # (NumPy's generator is not seeded here, so these vary between runs)
    df['Quantity'] = [int(np.random.exponential(2)) + 1 for _ in range(obs)]

    # Random invoice dates within the last two years
    # (generate_random_dates reseeds `random` with 1)
    df['InvoiceDate'] = generate_random_dates(obs)

    # Random unit prices between 0.50 and 50.00
    df['UnitPrice'] = [round(random.uniform(0.5, 50), 2) for _ in range(obs)]

    # Random customer IDs (uniqueness is not guaranteed)
    df['CustomerID'] = [random.randint(10000, 99999) for _ in range(obs)]

    # Random locations; the Country column holds city names in this sample
    df['Country'] = [random.choice(fake_cities) for _ in range(obs)]

    return df

In [4]:
sample_data = anonymize_data()
sample_data.head()

|   | InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country |
|---|---|---|---|---|---|---|---|---|
| 0 | INV-240891 | SC8174 | Laptop | 1 | 2023-06-09 15:48:36.776452 | 41.96 | 85732 | Toronto |
| 1 | INV-696853 | SC9123 | Laptop | 1 | 2024-08-27 15:48:36.776452 | 28.04 | 56304 | Vancouver |
| 2 | INV-988598 | SC4818 | Laptop | 3 | 2023-03-28 15:48:36.776452 | 32.29 | 70179 | Toronto |
| 3 | INV-941235 | SC6663 | Headphone | 1 | 2023-10-11 15:48:36.776452 | 9.7 | 45294 | Vancouver |
| 4 | INV-900875 | SC4782 | Headphone | 4 | 2023-05-23 15:48:36.776452 | 49.63 | 96404 | Toronto |


## Get Summary Statistics

One of the key features of `sales_analyzer` is its ability to quickly generate a sales summary. Use the `sales_summary_statistics` function to get metrics such as total revenue, average order value, and top-selling products.

In [5]:
sales_summary_statistics(sample_data)

|   | total_revenue | unique_customers | average_order_value | top_selling_product_quantity | top_selling_product_revenue | average_revenue_per_customer |
|---|---|---|---|---|---|---|
| 0 | 3528.79 | 50 | 70.5758 | Headphone | Headphone | 70.5758 |
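
To make these metrics concrete, here is a minimal sketch of how similar figures could be computed with plain pandas. It is not the package's implementation. The helper name `summarize_sales`, the toy `orders` frame, and the metric definitions (for example, average order value as total revenue divided by the number of distinct invoices) are assumptions made for illustration.

```python
import pandas as pd

# Tiny hand-made frame using the same column names as the sample data
orders = pd.DataFrame({
    "InvoiceNo": ["INV-1", "INV-2", "INV-3", "INV-4"],
    "Description": ["Laptop", "Monitor", "Headphone", "Headphone"],
    "Quantity": [1, 2, 3, 1],
    "UnitPrice": [40.0, 25.0, 10.0, 10.0],
    "CustomerID": [101, 102, 101, 103],
})


def summarize_sales(df):
    """Hypothetical helper: summary metrics under assumed definitions."""
    revenue = df["Quantity"] * df["UnitPrice"]
    total = float(revenue.sum())
    qty_by_product = df.groupby("Description")["Quantity"].sum()
    rev_by_product = revenue.groupby(df["Description"]).sum()
    return {
        "total_revenue": total,
        "unique_customers": int(df["CustomerID"].nunique()),
        # Assumption: one order per distinct invoice number
        "average_order_value": total / df["InvoiceNo"].nunique(),
        "top_selling_product_quantity": qty_by_product.idxmax(),
        "top_selling_product_revenue": rev_by_product.idxmax(),
        "average_revenue_per_customer": total / df["CustomerID"].nunique(),
    }


summary = summarize_sales(orders)
print(summary)
```

In this toy frame the top product by quantity (Headphone) differs from the top product by revenue (Monitor), which is why the summary reports the two separately.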


## Segment Products and Calculate Revenue Share

Another feature of `sales_analyzer` is the `segment_revenue_share` function. It segments products into three price categories (cheap, medium, and expensive) and calculates the share of total revenue contributed by each segment. This is particularly useful for understanding how revenue is distributed across pricing tiers.



In [6]:
# Segment revenue share
revenue_share = segment_revenue_share(sample_data, price_col='UnitPrice', quantity_col='Quantity')

# Display the results
print(revenue_share)

  PriceSegment  TotalRevenue  RevenueShare (%)
0        cheap        615.89             17.45
1    expensive       2186.27             61.96
2       medium        726.63             20.59
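
For intuition, the segmentation can be approximated with plain pandas. The sketch below assumes the three segments are price terciles computed with `pd.qcut`. The package's actual thresholds are not documented in this notebook, so treat the cut points, the helper name `revenue_share_by_tercile`, and the toy `items` frame as assumptions.

```python
import pandas as pd

# Toy data with the same price and quantity columns as the sample data
items = pd.DataFrame({
    "UnitPrice": [2.0, 5.0, 10.0, 20.0, 40.0, 50.0],
    "Quantity": [10, 4, 3, 2, 1, 1],
})


def revenue_share_by_tercile(df, price_col="UnitPrice", quantity_col="Quantity"):
    """Hypothetical helper: revenue share per price tercile."""
    out = df.copy()
    out["Revenue"] = out[price_col] * out[quantity_col]
    # Assumption: segments are equal-frequency price bins (terciles)
    out["PriceSegment"] = pd.qcut(
        out[price_col], q=3, labels=["cheap", "medium", "expensive"]
    )
    result = (
        out.groupby("PriceSegment", observed=True)["Revenue"]
        .sum()
        .rename("TotalRevenue")
        .reset_index()
    )
    share = result["TotalRevenue"] / result["TotalRevenue"].sum() * 100
    result["RevenueShare (%)"] = share.round(2)
    return result


shares = revenue_share_by_tercile(items)
print(shares)
```

Here the cheap, medium, and expensive segments contribute 40, 70, and 90 of the 200 total revenue, so the shares come out to 20%, 35%, and 45%.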
