### Overview of the dataset:
The attached dataset is a baseline: aggregated data holding key performance metrics for different products across different selling points. Each product can be sold in several selling points (DistributionUnit) and can have a different price depending on where it is sold.

### Explanation of some fields:


BaselineDetailID – ID of the product in the DistributionUnit.

BaselineID – ID of the current baseline.

ProductID – general ID of the product.

DistributionUnit – sales channel.

NSRperUC – Net Sales Revenue per Unit Case, provided only for the TCCC manufacturer.

COGS – Cost of Goods Sold.

ExitRate – percentage of the volume that will be lost if the product is not present in the store.

### Overview of the task:
Create an executive summary for the provided baseline that can be shown to a market owner to help them better understand the market structure and TCCC's position. Below is a list of visualizations that might be helpful; the list is not a hard restriction.

### High-level aggregations:

- top Manufacturers

- top Brands by Volume and Manufacturer

- top Categories by Volume

- top Brand PackSize by Channel and Manufacturer by Volume for all, by Revenue for TCCC

### Competitor selection:

- for each of the top 3 SKUs of the top 3 TCCC brands, present the most suitable competitor SKU.
Describe which criteria are used to define a competitor SKU and why. How do TCCC SKUs perform in contrast to competitor SKUs?

### Product Performance:

- what are the key products that must always be present in the store to avoid loss of sales? (Hint: Exit Rates)
- what are the key products for which price change can lead to an unprofitable loss of sales? (Hint: Elasticity)

The result should be presented in a Jupyter notebook with clear, interactive visualizations.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [None]:
df = pd.read_excel('products-metrics/DemoBaseline.xlsx')
df.head()

In [None]:
df.shape

In [None]:
df.columns

# High-level aggregations:

### - top Manufacturers


In [None]:
# number of baseline rows per manufacturer (row counts, not Volume)
df['Manufacturer'].value_counts().head()

### - top Brands by Volume and Manufacturer


In [None]:
brand_manuf = df.groupby(['Manufacturer', 'Brand']).agg({'Volume': 'sum'}).reset_index().sort_values(by='Volume', ascending=False)
brand_manuf.head()


### - top Categories by Volume


In [None]:
categ = df.groupby('Category').agg({'Volume': 'sum'}).reset_index().sort_values(by='Volume', ascending=False)
categ.head()

### - top Brand PackSize by Channel and Manufacturer by Volume for all, by Revenue for TCCC

In [None]:
# add revenue column
df['Revenue'] = df['Price'] * df['Volume']
# define DataFrame of TCCC
tccc_df = df[df['Manufacturer'] == "TCCC"]

In [None]:
# group the TCCC dataframe by PackSize, Brand, Channel and calculate the sum of Revenue
tccc_packsizes = tccc_df.groupby(['PackSize', 'Brand', 'DistributionUnit']) \
                  .agg({'Revenue': 'sum'}) \
                  .reset_index()
tccc_packsizes.head(3)

In [None]:
# group the dataframe by PackSize, Brand, Channel and Manufacturer and calculate the sum of Volume
packsizes = df.groupby(['PackSize', 'Brand', 'DistributionUnit', 'Manufacturer']) \
              .agg({'Volume': 'sum'}) \
              .reset_index()
packsizes.head(3)

In [None]:
# get top TCCC PackSizes by Revenue
tccc_top_ps = tccc_packsizes.groupby(['Brand', 'DistributionUnit']) \
                  .agg({'Revenue': 'max'}) \
                  .reset_index()
tccc_top_ps.head(3)

In [None]:
# get top PackSizes by Volume
top_ps = packsizes.groupby(['Brand', 'DistributionUnit', 'Manufacturer']) \
                  .agg({'Volume': 'max'}) \
                  .reset_index()
top_ps.head(3)

In [None]:
# join TCCC tables to get PackSize
tccc_pack_size = tccc_top_ps.merge(tccc_packsizes, on='Revenue', suffixes=('_', '')).loc[:, 'Revenue':'DistributionUnit']
tccc_pack_size.head()

In [None]:
# join tables to get PackSize
pack_size = top_ps.merge(packsizes, on='Volume', suffixes=('_', '')).loc[:, 'Volume':'Manufacturer']
pack_size.head()
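
Joining on the metric value works only while every Revenue or Volume value is unique; two groups that happen to share a value would produce extra rows. A minimal sketch of an alternative that picks the top row per group with `idxmax`, shown on a small made-up frame (the `toy` data and its values are assumptions, not taken from the baseline):

```python
import pandas as pd

# hypothetical toy data standing in for the `packsizes` table
toy = pd.DataFrame({
    'PackSize': ['0.6L', '2L', '0.6L', '2L'],
    'Brand': ['A', 'A', 'B', 'B'],
    'DistributionUnit': ['North', 'North', 'North', 'North'],
    'Volume': [10.0, 30.0, 30.0, 5.0],
})

# join-on-value approach: both groups have a max Volume of 30, so rows multiply
top = toy.groupby(['Brand', 'DistributionUnit']).agg({'Volume': 'max'}).reset_index()
merged_rows = len(top.merge(toy, on='Volume', suffixes=('_', '')))

# idxmax approach: index label of the max-Volume row within each group
top_idx = toy.groupby(['Brand', 'DistributionUnit'])['Volume'].idxmax()
top_toy = toy.loc[top_idx].reset_index(drop=True)
top_toy
```

On this toy frame the join returns four rows for two groups, while the `idxmax` selection returns exactly one row per (Brand, DistributionUnit) pair.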

### - top Manufacturers by Volume

In [None]:
manuf = df.groupby(['Manufacturer']).agg({'Volume': 'sum'}).reset_index().sort_values(by='Volume', ascending=False)
manuf.head()
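
For an executive summary, absolute volumes may be easier to read as market shares. A small sketch on made-up numbers; the manufacturer labels 'X' and 'Y' and all values are hypothetical:

```python
import pandas as pd

# hypothetical toy data; only the column names match the baseline
toy = pd.DataFrame({
    'Manufacturer': ['TCCC', 'TCCC', 'X', 'Y'],
    'Volume': [60.0, 20.0, 15.0, 5.0],
})

# total Volume per manufacturer, largest first
volume = toy.groupby('Manufacturer')['Volume'].sum().sort_values(ascending=False)

# share of total Volume, in percent
share_pct = (volume / volume.sum() * 100).round(1)
share_pct
```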

# Competitor selection:

- for each of the top 3 SKUs of the top 3 TCCC brands, present the most suitable competitor SKU.
Describe which criteria are used to define a competitor SKU and why. How do TCCC SKUs perform in contrast to competitor SKUs?

## Top 3 TCCC brands 
Brands are chosen by Volume

In [None]:
tccc_top = df[df['Manufacturer'] == 'TCCC'].groupby('Brand') \
            .agg({'Volume': 'sum'}) \
            .sort_values(by='Volume', ascending=False) \
            .reset_index() \
            .head(3)

In [None]:
tccc_top

## Top products for each top brand of TCCC
Products are also taken by Volume

In [None]:
tccc_top_products = {}  # dict with top TCCC brands : dataframe with top products by Volume
for brand in tccc_top['Brand']:
    tccc_top_products[brand] = df[df['Brand'] == brand].groupby(['ProductID', 'Category']) \
                            .agg({'Volume': 'sum'}) \
                            .sort_values(by='Volume', ascending=False) \
                            .reset_index() \
                            .head(3)
tccc_top_products

### DataFrame of TCCC competitors

In [None]:
no_tccc_df = df[(df['Manufacturer'] != 'TCCC')]
no_tccc_df.head(3)

## Top competitors for COCA COLA SKUs
Since all COCA COLA (CC) products are in the same category, the competitors' products are the same for each CC product

In [None]:
tccc_top_products['COCA COLA']

In [None]:
# get competitor products in the same category as the CC SKUs and take the top 3 by Volume
cc_competitors = no_tccc_df[(no_tccc_df['Brand'] != 'COCA COLA') 
                           & (no_tccc_df['Category'] == 'COLAS')].groupby('ProductID') \
                            .agg({'Volume': 'sum'}) \
                            .sort_values(by='Volume', ascending=False) \
                            .reset_index() \
                            .head(3)
cc_competitors

In [None]:
# list of COCA COLA competitor brands
no_tccc_df[no_tccc_df['ProductID'].isin(cc_competitors['ProductID'])]['Brand'].unique()

As we can see, the only competitor brand for COCA COLA products is PEPSI COLA.
<br>The volume of PEPSI COLA products is roughly 1.5 to 40 times lower than the volume of COCA COLA products.

## Top competitors for CIEL SKUs
Since all CIEL products are in the same category, the competitors' products are the same for each CIEL product

In [None]:
tccc_top_products['CIEL']

In [None]:
# get competitor products in the same category as the CIEL SKUs and take the top 3 by Volume
ciel_competitors = no_tccc_df[(no_tccc_df['Brand'] != 'CIEL') 
                               & (no_tccc_df['Category'] == 'AGUA EMBOTELLADA')].groupby('ProductID') \
                                .agg({'Volume': 'sum'}) \
                                .sort_values(by='Volume', ascending=False) \
                                .reset_index() \
                                .head(3)
ciel_competitors

In [None]:
# list of CIEL competitor brands
no_tccc_df[no_tccc_df['ProductID'].isin(ciel_competitors['ProductID'])]['Brand'].unique()

CIEL's competitors are E-PURA and BONAFONT.
<br>Based on the Volume metric, CIEL's competitors are at almost the same level.

## Top competitors for SPRITE SKUs
Since all SPRITE products are in the same category, the competitors' products are the same for each SPRITE product

In [None]:
tccc_top_products['SPRITE']

In [None]:
# get competitor products in the same category as the SPRITE SKUs and take the top 3 by Volume
sprite_competitors = no_tccc_df[(no_tccc_df['Brand'] != 'SPRITE') 
                               & (no_tccc_df['Category'] == 'R. FRUTALES')].groupby('ProductID') \
                                .agg({'Volume': 'sum'}) \
                                .sort_values(by='Volume', ascending=False) \
                                .reset_index() \
                                .head(3)
sprite_competitors

In [None]:
# list of SPRITE competitor brands
no_tccc_df[no_tccc_df['ProductID'].isin(sprite_competitors['ProductID'])]['Brand'].unique()

In [None]:
no_tccc_df[no_tccc_df['ProductID'] == 358130584]['Brand'].unique()

SPRITE's competitors are SQUIRT, SEVEN UP, and AGA.
<br>Based on the Volume metric, SPRITE's competitors outperform SPRITE: the top Volume of SQUIRT is 4.8 times greater than the top Volume of SPRITE.
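
The three competitor lookups above repeat the same steps: exclude TCCC, keep the brand's category, sum Volume per SKU, and take the top three. A minimal sketch of a reusable helper follows; the function name `top_competitors`, its parameters, and the toy rows (including the manufacturer label 'PEPSICO') are hypothetical and only illustrate the selection criterion used in this notebook.

```python
import pandas as pd

def top_competitors(data, category, own_manufacturer='TCCC', n=3):
    """Top-n SKUs of other manufacturers in a category, ranked by total Volume."""
    rivals = data[(data['Manufacturer'] != own_manufacturer)
                  & (data['Category'] == category)]
    return (rivals.groupby(['ProductID', 'Brand'], as_index=False)['Volume']
                  .sum()
                  .sort_values('Volume', ascending=False)
                  .head(n)
                  .reset_index(drop=True))

# hypothetical toy rows; only the column names follow the baseline
toy = pd.DataFrame({
    'ProductID': [1, 1, 2, 3, 4],
    'Brand': ['COCA COLA', 'COCA COLA', 'PEPSI COLA', 'PEPSI COLA', 'CIEL'],
    'Manufacturer': ['TCCC', 'TCCC', 'PEPSICO', 'PEPSICO', 'TCCC'],
    'Category': ['COLAS', 'COLAS', 'COLAS', 'COLAS', 'AGUA EMBOTELLADA'],
    'Volume': [50.0, 40.0, 10.0, 25.0, 30.0],
})

cola_rivals = top_competitors(toy, 'COLAS')
cola_rivals
```

Returning the Brand alongside the ProductID removes the need for a separate brand lookup per category.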

# Product Performance:

- what are the key products that must always be present in the store to avoid loss of sales? (Hint: Exit Rates)


In [None]:
# add the column of the absolute amount of the lost volume
df['LostVolume'] = df['Volume'] * df['ExitRate']

In [None]:
# get the top 5 products (with their brands) by lost volume
df.groupby(['ProductID', 'Brand']) \
  .agg({'LostVolume': 'sum'}) \
  .sort_values('LostVolume', ascending=False) \
  .head()
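
ExitRate is described as a percentage, but the field description does not say whether it is stored as a fraction (0.25) or on a 0-100 scale (25). The ranking is the same either way, but the absolute lost volume is not. A sketch that normalizes the scale before multiplying, under the assumption that any value above 1 means a 0-100 scale (toy values are made up):

```python
import pandas as pd

# hypothetical toy data; ExitRate is given here on a 0-100 scale
toy = pd.DataFrame({
    'ProductID': [1, 2, 3],
    'Volume': [100.0, 200.0, 40.0],
    'ExitRate': [25.0, 50.0, 75.0],
})

# assumption: a maximum above 1 means the column is in percent, not a fraction
if toy['ExitRate'].max() > 1:
    rate = toy['ExitRate'] / 100
else:
    rate = toy['ExitRate']

toy['LostVolume'] = toy['Volume'] * rate
ranked = toy.sort_values('LostVolume', ascending=False)
ranked
```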


- what are the key products for which price change can lead to an unprofitable loss of sales? (Hint: Elasticity)

In [None]:
# add the column of the loss of revenue
df['RevenueLoss'] = df['Revenue'] * df['PriceElasticity']

In [None]:
# most negative RevenueLoss first; this ordering assumes PriceElasticity is negative
df.groupby(['ProductID', 'Brand']) \
  .agg({'RevenueLoss':'sum'}) \
  .sort_values('RevenueLoss') \
  .head()
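
The `Revenue * PriceElasticity` product above is a rough proxy. A slightly more explicit sketch estimates the revenue change for a given relative price change, assuming that PriceElasticity is an own-price elasticity (percent change in volume per percent change in price) and that the response is linear for small changes. The 10% price increase and the toy values are assumptions for illustration only:

```python
import pandas as pd

# hypothetical toy data; only the column names follow the notebook
toy = pd.DataFrame({
    'ProductID': [1, 2],
    'Revenue': [1000.0, 1000.0],
    'PriceElasticity': [-0.5, -2.0],
})

price_change = 0.10  # assumed +10% price change

# volume responds by elasticity * price_change; revenue = price * volume
volume_change = toy['PriceElasticity'] * price_change
toy['RevenueChange'] = toy['Revenue'] * ((1 + price_change) * (1 + volume_change) - 1)

# products whose revenue falls after the price increase
at_risk = toy[toy['RevenueChange'] < 0].sort_values('RevenueChange')
at_risk
```

Under these assumptions, a product with elasticity between -1 and 0 gains revenue from a small price increase, while one with elasticity below -1 loses revenue, which is why the second toy product is flagged.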