# Where to Focus - Item Comparisons

### Introduction

### 1. Item Comparisons

An item comparison compares a number across categories, such as sales by product or by city. Sometimes that number is a percentage:

* "Chocolate ice cream accounted for 25 percent of sales."

And sometimes it is the raw total:

* "Chocolate ice cream accounted for $25 million in sales."
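Both kinds of statement come from the same grouped totals: the raw number is each group's sum, and the percentage is that sum divided by the grand total. A minimal sketch, where the `flavor` and `sales` columns and all figures are hypothetical:

```python
import pandas as pd

# Hypothetical sales records; the figures are invented for illustration.
sales = pd.DataFrame({
    "flavor": ["Chocolate", "Vanilla", "Chocolate", "Strawberry", "Vanilla"],
    "sales": [10.0, 15.0, 15.0, 40.0, 20.0],
})

# Raw total per category ("Chocolate accounted for $25 ...").
totals = sales.groupby("flavor")["sales"].sum()

# Share of the grand total per category ("... 25 percent of sales").
shares = totals / totals.sum() * 100

print(pd.DataFrame({"total": totals, "percent": shares.round(1)}))
```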

> Item comparisons almost always involve grouping, so be careful to choose the correct units for that grouping. For example, if we compare total sales by region, a higher total does not necessarily mean one region is performing better than another; that region may simply have more stores.
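One way to guard against this is to divide each group's total by a size measure, such as its number of stores, which turns a raw total into a per-store rate. A minimal sketch; the regions, totals, and store counts are invented and do not come from the dataset:

```python
import pandas as pd

# Hypothetical regional totals and store counts, invented for illustration.
regions = pd.DataFrame({
    "region": ["North", "South"],
    "total_sales": [900_000.0, 600_000.0],
    "stores": [30, 10],
})

# Dividing by the number of stores turns a raw total into a per-store rate.
regions["sales_per_store"] = regions["total_sales"] / regions["stores"]

# North leads on total sales, but South leads once store count is accounted for.
print(regions.sort_values("sales_per_store", ascending=False))
```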

### Metrics to Compare on

Once the data is clean, we can perform an item comparison with a group by, in either pandas or SQL.

```python
import pandas as pd

# Load the sales data, using the first CSV column as the row index
df = pd.read_csv('./sales_data.csv', index_col=0)
df.head(2)
```

| | Order ID | Product | Quantity Ordered | Price Each | Order Date | Purchase Address | Month | Sales | City | Hour |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 295665 | Macbook Pro Laptop | 1 | 1700.0 | 2019-12-30 00:01:00 | 136 Church St, New York City, NY 10001 | 12 | 1700.0 | New York City | 0 |
| 1 | 295666 | LG Washing Machine | 1 | 600.0 | 2019-12-29 07:03:00 | 562 2nd St, New York City, NY 10001 | 12 | 600.0 | New York City | 7 |


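For example, we can total the `Price Each` column for each product. Note that this ignores `Quantity Ordered`, so it matches revenue only when every order is for a single unit; the `Sales` column, which appears to be quantity times price, would account for quantity.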

```python
# Total of 'Price Each' per product, largest first
top_products = df.groupby(['Product'])['Price Each'].agg(['sum']).sort_values('sum', ascending=False)
top_products.head(3)
```

| Product | sum |
|---|---|
| Macbook Pro Laptop | 8030800.0 |
| iPhone | 4789400.0 |
| ThinkPad Laptop | 4127958.72 |
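To see why the choice of column matters, here is a small hypothetical example. The column names mirror the dataset, but the orders are made up, and `Sales` is computed here under the assumption that it equals `Quantity Ordered` times `Price Each`:

```python
import pandas as pd

# Hypothetical orders; column names mirror the dataset above, values are invented.
orders = pd.DataFrame({
    "Product": ["USB-C Cable", "USB-C Cable", "Monitor"],
    "Quantity Ordered": [1, 4, 1],
    "Price Each": [10.0, 10.0, 150.0],
})
# Assumption: revenue for an order line is quantity times unit price.
orders["Sales"] = orders["Quantity Ordered"] * orders["Price Each"]

by_price = orders.groupby("Product")["Price Each"].sum()  # ignores quantity
by_sales = orders.groupby("Product")["Sales"].sum()       # quantity-weighted revenue

print(pd.DataFrame({"sum of Price Each": by_price, "sum of Sales": by_sales}))
```

The two columns agree for the single-unit `Monitor` order and diverge for the cable, where one order was for four units.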


And we can even turn this into a function.

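```python
def build_grouped_by(df, col, target, agg='sum', agg_name=None):
    # Aggregate `target` for each value of `col` (agg is a name like 'sum' or 'mean'),
    # sorted from largest to smallest.
    grouped_data = df.groupby([col])[target].agg([agg]).sort_values(agg, ascending=False)
    if agg_name:
        # columns= renames the aggregate column instead of the index labels
        grouped_data = grouped_data.rename(columns={agg: agg_name})
    return grouped_data
```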
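A quick check of the function on a tiny frame. The function is re-declared so the snippet runs on its own, and the `City`/`Sales` values are hypothetical:

```python
import pandas as pd

def build_grouped_by(df, col, target, agg="sum", agg_name=None):
    # Aggregate `target` per value of `col`, largest first.
    grouped = df.groupby([col])[target].agg([agg]).sort_values(agg, ascending=False)
    if agg_name:
        # columns= renames the aggregate column rather than index labels
        grouped = grouped.rename(columns={agg: agg_name})
    return grouped

# Hypothetical data for illustration only.
demo = pd.DataFrame({
    "City": ["Boston", "Austin", "Boston", "Austin", "Boston"],
    "Sales": [100.0, 300.0, 50.0, 100.0, 30.0],
})

print(build_grouped_by(demo, "City", "Sales"))  # aggregate column is named "sum"
print(build_grouped_by(demo, "City", "Sales", agg="mean", agg_name="avg_order"))
```

Passing a different `agg` such as `'mean'` switches the comparison from totals to averages, which is often the fairer metric when groups differ in size.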

We can then use this function to break the data down across several dimensions at once.

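```python
cols = ['City', 'Hour']
target = 'Price Each'
# One grouped table per dimension (pass `col`, not a hard-coded 'City')
totals_by_col = [build_grouped_by(df, col, target, agg='sum') for col in cols]

# Map each dimension name to its table, e.g. totals['Hour']
totals = dict(zip(cols, totals_by_col))
```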
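Sorting by the total suits a categorical dimension such as `City`, but for an ordered dimension such as `Hour`, reading the totals in chronological order often says more, for example about where the peak hours fall. A sketch with hypothetical hourly figures, showing both orderings of the same totals:

```python
import pandas as pd

# Hypothetical order-level data; hours and amounts are invented.
orders = pd.DataFrame({
    "Hour": [19, 9, 12, 19, 12, 9, 20],
    "Sales": [50.0, 20.0, 70.0, 80.0, 30.0, 10.0, 40.0],
})

hourly = orders.groupby("Hour")["Sales"].sum()

# Ranked view: which hours bring in the most.
print(hourly.sort_values(ascending=False))

# Chronological view: the same totals ordered by hour.
print(hourly.sort_index())
```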

Now we can look up the totals for any dimension. For example, by city:

```python
totals['City']
```

| City | sum |
|---|---|
| San Francisco | 8211461.74 |
| Los Angeles | 5421435.23 |
| New York City | 4635370.83 |
| Boston | 3637409.77 |
| Atlanta | 2779908.2 |
| Dallas | 2752627.82 |
| Seattle | 2733296.01 |
| Portland | 2307747.47 |
| Austin | 1809873.61 |


Calculating an item comparison is, at its core, just this kind of aggregation, so we can also push the work into the database and run the same group by in SQL:

```python
import pandas as pd

def build_grouped_by(table_name, col, target, engine, order_by_col=False):
    # SQL version: let the database compute sum(target) for each value of col.
    # Names are pasted into the query string, so only pass trusted identifiers.
    order_clause = f"{col} asc" if order_by_col else "total_amount desc"
    query = (f"select {col}, sum({target}) as total_amount "
             f"from {table_name} group by {col} order by {order_clause}")
    # engine: anything pd.read_sql accepts, e.g. a SQLAlchemy engine
    return pd.read_sql(query, engine)
```
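To try the SQL version without a database server, one option is an in-memory SQLite database, since pandas accepts a `sqlite3` connection in both `DataFrame.to_sql` and `pd.read_sql`. The sketch re-declares the function so it runs on its own; the `orders` table and its rows are hypothetical. Column names that contain spaces, such as `Price Each`, would need quoting (e.g. `"Price Each"`) inside the query.

```python
import sqlite3
import pandas as pd

def build_grouped_by(table_name, col, target, engine, order_by_col=False):
    # Sum `target` per value of `col` inside the database.
    order_clause = f"{col} asc" if order_by_col else "total_amount desc"
    query = (f"select {col}, sum({target}) as total_amount "
             f"from {table_name} group by {col} order by {order_clause}")
    return pd.read_sql(query, engine)

# Hypothetical data loaded into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
pd.DataFrame({
    "city": ["Boston", "Austin", "Boston", "Dallas"],
    "sales": [100.0, 300.0, 50.0, 400.0],
}).to_sql("orders", conn, index=False)

by_total = build_grouped_by("orders", "city", "sales", conn)                    # largest first
by_name = build_grouped_by("orders", "city", "sales", conn, order_by_col=True)  # alphabetical
print(by_total)
print(by_name)
```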

### Resources

[Unit Economics](https://www.paddle.com/resources/unit-economics)

[Sales Product Data](https://www.kaggle.com/datasets/knightbearr/sales-product-data)