### Overview of the dataset:
The attached dataset is a baseline: aggregated data holding key performance metrics for different products across different selling points. Each product can be sold at several selling points (DistributionUnit) and can have a different price at each one.

### Explanation of some fields:


BaselineDetailID – ID of the product in the DistributionUnit

BaselineID – ID of current baseline.

ProductID – General ID of the Product

DistributionUnit – sales channel.

NSRperUC – Net Sales Revenue per Unit Case, populated only for the TCCC manufacturer.

COGS – Cost of goods sold

ExitRate – Percentage of the volume that will be lost if the product is not represented in the store

### Overview of the task:
Create an executive summary for the provided baseline that can be shown to a market owner to help them better understand the market structure and TCCC's position. The visualizations listed below might be helpful, but the list is a suggestion, not a hard requirement.

### High-level aggregations:

- top Manufacturers

- top Brands by Volume and Manufacturer

- top Categories by Volume

- top Brand PackSize by Channel and Manufacturer: by Volume for all manufacturers, by Revenue for TCCC

### Competitor selection:

- for each of the top 3 TCCC brands, take its top 3 SKUs and, for each SKU, present the most suitable competitor SKU.
Describe the criteria used to define a competitor SKU and why. How do TCCC SKUs perform compared with competitor SKUs?

### Product Performance:

- which key products must always be present in the store to avoid a loss of sales? (Hint: Exit Rates)
- for which key products can a price change lead to an unprofitable loss of sales? (Hint: Elasticity)

The result should be presented in a Jupyter notebook with clear, interactive visualizations.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [2]:
df = pd.read_excel('products-metrics/DemoBaseline.xlsx')
df.head()

Unnamed: 0,BaselineDetailID,BaselineID,ProductID,DistributionUnit,CurrencyCode,Price,NSRperUC,EnvironmentalTaxPerUC,OperatingCOGSperUC,WarehouseCostPerUC,...,ExitRate,RatioNonOperatingCOGStoNSR,Distribution,Description,Manufacturer,Category,Brand,Flavor,PackSize,Packaging
0,816018,451,358130194,SOUTH,484,62.8757,0.0,0,0.0,0,...,12.71589,0,0.001,ADAS_PEPSICO_SEVEN UP LIMONADA_LIMON_1500ML_EN...,PEPSICO,ADAS,SEVEN UP LIMONADA,LIMON,1500,BOTELLA DE PLASTICO_NR
1,816019,451,358130194,EAST,484,65.9368,0.0,0,0.0,0,...,12.71589,0,0.001,ADAS_PEPSICO_SEVEN UP LIMONADA_LIMON_1500ML_EN...,PEPSICO,ADAS,SEVEN UP LIMONADA,LIMON,1500,BOTELLA DE PLASTICO_NR
2,816020,451,358130194,WEST,484,62.8757,0.0,0,0.0,0,...,12.71589,0,9.0,ADAS_PEPSICO_SEVEN UP LIMONADA_LIMON_1500ML_EN...,PEPSICO,ADAS,SEVEN UP LIMONADA,LIMON,1500,BOTELLA DE PLASTICO_NR
3,816021,451,358130194,NORTH,484,62.8757,0.0,0,0.0,0,...,12.71589,0,0.001,ADAS_PEPSICO_SEVEN UP LIMONADA_LIMON_1500ML_EN...,PEPSICO,ADAS,SEVEN UP LIMONADA,LIMON,1500,BOTELLA DE PLASTICO_NR
4,816014,451,358130195,SOUTH,484,126.7034,75.13333,0,22.429,0,...,9.63984,0,0.12124,ADAS_COCA-COLA COMPANY_DEL VALLE Y NADA_NARANJ...,TCCC,ADAS,DEL VALLE Y NADA,NARANJA,600,BOTELLA DE PLASTICO_NR


In [3]:
df.shape

(2843, 24)

In [4]:
df.columns

Index(['BaselineDetailID', 'BaselineID', 'ProductID', 'DistributionUnit',
       'CurrencyCode', 'Price', 'NSRperUC', 'EnvironmentalTaxPerUC',
       'OperatingCOGSperUC', 'WarehouseCostPerUC', 'DistributionCostPerUC',
       'TransactionsPerUC', 'Volume', 'PriceElasticity', 'ExitRate',
       'RatioNonOperatingCOGStoNSR', 'Distribution', 'Description',
       'Manufacturer', 'Category', 'Brand', 'Flavor', 'PackSize', 'Packaging'],
      dtype='object')

## High-level aggregations:

- top Manufacturers


In [5]:
# Counts baseline rows (product x DistributionUnit listings) per manufacturer, not sales volume
df['Manufacturer'].value_counts().head()

TCCC              930
GRUPO PENAFIEL    348
PEPSICO           310
AGA DE MEXICO     226
DANONE            167
Name: Manufacturer, dtype: int64

- top Brands by Volume and Manufacturer


In [6]:
brand_manuf = df.groupby(['Manufacturer', 'Brand']).agg({'Volume': 'sum'}).reset_index().sort_values(by='Volume', ascending=False)
brand_manuf.head()

Unnamed: 0,Manufacturer,Brand,Volume
104,TCCC,COCA COLA,1041345000.0
87,PEPSICO,PEPSI COLA,43262800.0
102,TCCC,CIEL,27188350.0
22,GRUPO GEPP,E-PURA,21817690.0
15,DANONE,BONAFONT,12183420.0



- top Categories by Volume


In [7]:
categ = df.groupby('Category').agg({'Volume': 'sum'}).reset_index().sort_values(by='Volume', ascending=False)
categ.head()

Unnamed: 0,Category,Volume
7,COLAS,1107281000.0
2,AGUA EMBOTELLADA,65803320.0
12,R. FRUTALES,62894660.0
10,NARANJADAS,13490670.0
3,AGUA MINERAL NATURAL,7764272.0


- top Brand PackSize by Channel and Manufacturer: by Volume for all manufacturers, by Revenue for TCCC
- that is, for every Brand, Channel, and Manufacturer, find the PackSize with the highest Volume

In [8]:
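# Total Volume per (PackSize, Brand, Category, Manufacturer).
# Note: the channel (DistributionUnit) is not part of this grouping.
packsizes = df.groupby(['PackSize', 'Brand', 'Category', 'Manufacturer']) \
              .agg({'Volume': 'sum'}) \
              .reset_index()
packsizes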

Unnamed: 0,PackSize,Brand,Category,Manufacturer,Volume
0,1,AGGREGATED,AGGREGATED,AGGREGATED,2.859034e+06
1,70,GATORADE,ISOTONICOS,PEPSICO,5.863911e+02
2,100,LALA,NARANJADAS,LALA PRODS. LACTEOS,7.933880e+02
3,118,GERBER,BEBIDAS REFRESCANTES,GERBER,3.059784e+02
4,125,BOING,BEBIDAS REFRESCANTES,PASCUAL,4.022552e+03
...,...,...,...,...,...
570,20000,CIEL,AGUA EMBOTELLADA,TCCC,1.472050e+07
571,20000,E-PURA,AGUA EMBOTELLADA,GRUPO GEPP,1.406443e+07
572,20000,ELECTROPURA,AGUA EMBOTELLADA,GRUPO GEPP,1.927097e+06
573,20000,OTHERS MARCA UNIF,AGUA EMBOTELLADA,O.FAB.,8.690373e+04


In [9]:
# Largest per-PackSize Volume within each (Brand, Category, Manufacturer) group
top_ps = packsizes.groupby(['Brand', 'Category', 'Manufacturer']) \
                  .agg({'Volume': 'max'}) \
                  .reset_index()
top_ps.head()

Unnamed: 0,Brand,Category,Manufacturer,Volume
0,ACAPULCO,NARANJADAS,LA CONCORDIA,11082.22
1,ACAPULCOCO,JUGOS,CALAHUA,199.7364
2,ADES,SOYAS JUGO,TCCC,106118.0
3,AGA,R. FRUTALES,AGA DE MEXICO,2415774.0
4,AGGREGATED,AGGREGATED,AGGREGATED,2859034.0


In [12]:
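# Join on the group keys as well as Volume, so two groups that share the same maximum cannot cross-match
keys = ['Brand', 'Category', 'Manufacturer', 'Volume']
pack_size = top_ps.merge(packsizes, on=keys)[['Volume', 'PackSize', 'Brand', 'Category', 'Manufacturer']]
pack_size.head()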

Unnamed: 0,Volume,PackSize,Brand,Category,Manufacturer
0,11082.22,1000,ACAPULCO,NARANJADAS,LA CONCORDIA
1,199.7364,330,ACAPULCOCO,JUGOS,CALAHUA
2,106118.0,946,ADES,SOYAS JUGO,TCCC
3,2415774.0,2000,AGA,R. FRUTALES,AGA DE MEXICO
4,2859034.0,1,AGGREGATED,AGGREGATED,AGGREGATED
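
The grouping above uses Category rather than the sales channel. Below is a minimal sketch of the per-channel variant the task describes, run on made-up toy rows so it is self-contained. Several things here are assumptions rather than facts about the dataset: the helper name `top_packsize` is hypothetical, "Channel" is taken to mean `DistributionUnit`, and TCCC revenue is approximated as `NSRperUC * Volume`. On the real baseline, placeholder rows such as `AGGREGATED` would probably need to be filtered out first.

```python
import pandas as pd

# Toy rows with made-up numbers; only the column names come from the notebook.
toy = pd.DataFrame({
    "Manufacturer":     ["TCCC", "TCCC", "TCCC", "PEPSICO", "PEPSICO"],
    "Brand":            ["COCA COLA", "COCA COLA", "COCA COLA", "PEPSI COLA", "PEPSI COLA"],
    "DistributionUnit": ["SOUTH", "SOUTH", "EAST", "SOUTH", "SOUTH"],
    "PackSize":         [600, 2000, 600, 600, 3000],
    "Volume":           [100.0, 250.0, 80.0, 40.0, 30.0],
    "NSRperUC":         [90.0, 30.0, 85.0, 0.0, 0.0],
})

def top_packsize(frame, value_col):
    """Return, per Manufacturer/Brand/DistributionUnit, the PackSize with the largest total."""
    keys = ["Manufacturer", "Brand", "DistributionUnit"]
    # Sum the metric per pack size within each group.
    totals = frame.groupby(keys + ["PackSize"], as_index=False)[value_col].sum()
    # idxmax gives the row label of the largest total in each group.
    best_rows = totals.groupby(keys)[value_col].idxmax()
    return totals.loc[best_rows].reset_index(drop=True)

# By Volume, for all manufacturers.
top_by_volume = top_packsize(toy, "Volume")

# By Revenue, for TCCC only (assumption: revenue = NSRperUC * Volume).
tccc = toy[toy["Manufacturer"] == "TCCC"].copy()
tccc["Revenue"] = tccc["NSRperUC"] * tccc["Volume"]
top_by_revenue_tccc = top_packsize(tccc, "Revenue")

print(top_by_volume)
print(top_by_revenue_tccc)
```

Selecting rows with `idxmax` avoids joining back on the aggregated value, so ties in Volume between unrelated groups cannot produce spurious matches.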


Top Manufacturers by Volume

In [13]:
manuf = df.groupby(['Manufacturer']).agg({'Volume': 'sum'}).reset_index().sort_values(by='Volume', ascending=False)
manuf.head()

Unnamed: 0,Manufacturer,Volume
38,TCCC,1119014000.0
33,PEPSICO,60924700.0
14,GRUPO GEPP,25039620.0
0,AGA DE MEXICO,20749670.0
8,DANONE,12183420.0


## Competitor selection:

- for each of the top 3 TCCC brands, take its top 3 SKUs and, for each SKU, present the most suitable competitor SKU.
Describe the criteria used to define a competitor SKU and why. How do TCCC SKUs perform compared with competitor SKUs?
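
One possible way to approach this, sketched on made-up toy data. The matching criteria are an assumption chosen for illustration, not something the task prescribes: a competitor SKU is a non-TCCC product in the same Category with the closest PackSize, and ties are broken by higher Volume. A SKU is assumed to correspond to a `ProductID`, with Volume already summed over channels; `closest_rival`, `N_BRANDS`, and `N_SKUS` are hypothetical names.

```python
import pandas as pd

# Toy SKU-level table with made-up numbers (Volume already summed over channels).
sku = pd.DataFrame({
    "ProductID":    [1, 2, 3, 4, 5, 6],
    "Manufacturer": ["TCCC", "TCCC", "TCCC", "PEPSICO", "PEPSICO", "DANONE"],
    "Brand":        ["COCA COLA", "COCA COLA", "CIEL", "PEPSI COLA", "PEPSI COLA", "BONAFONT"],
    "Category":     ["COLAS", "COLAS", "AGUA EMBOTELLADA", "COLAS", "COLAS", "AGUA EMBOTELLADA"],
    "PackSize":     [600, 2000, 1000, 600, 3000, 1000],
    "Volume":       [500.0, 900.0, 300.0, 60.0, 40.0, 120.0],
})

N_BRANDS, N_SKUS = 3, 3

own = sku[sku["Manufacturer"] == "TCCC"]
rivals = sku[sku["Manufacturer"] != "TCCC"]

# Top TCCC brands by total Volume, then the top SKUs inside each of them.
top_brands = own.groupby("Brand")["Volume"].sum().nlargest(N_BRANDS).index
top_skus = (own[own["Brand"].isin(top_brands)]
            .sort_values("Volume", ascending=False)
            .groupby("Brand")
            .head(N_SKUS))

def closest_rival(row):
    """Assumed criterion: same Category, nearest PackSize, then highest Volume."""
    pool = rivals[rivals["Category"] == row["Category"]]
    if pool.empty:
        return None
    pool = pool.assign(gap=(pool["PackSize"] - row["PackSize"]).abs())
    best = pool.sort_values(["gap", "Volume"], ascending=[True, False]).iloc[0]
    return best["ProductID"]

matches = top_skus.assign(CompetitorProductID=top_skus.apply(closest_rival, axis=1))
print(matches[["ProductID", "Brand", "PackSize", "CompetitorProductID"]])
```

Other criteria (same Packaging, same Flavor, or a similar price per litre) could be added as extra filters on `pool`; which ones are appropriate depends on how the market owner defines direct competition.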

## Product Performance:

- which key products must always be present in the store to avoid a loss of sales? (Hint: Exit Rates)
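
A minimal sketch of one way to rank such products, on made-up toy rows. Since ExitRate is described as the percentage of volume lost when the product is not in the store, an estimate of the volume at risk is `Volume * ExitRate / 100`. This assumes ExitRate is expressed on a 0 to 100 scale, which the sample values suggest but the field description does not confirm; `VolumeAtRisk` and `must_stock` are hypothetical names.

```python
import pandas as pd

# Toy rows with made-up numbers; only the column names come from the notebook.
toy = pd.DataFrame({
    "ProductID":        [1, 1, 2, 3],
    "Description":      ["SKU A", "SKU A", "SKU B", "SKU C"],
    "DistributionUnit": ["SOUTH", "EAST", "SOUTH", "SOUTH"],
    "Volume":           [1000.0, 500.0, 200.0, 5000.0],
    "ExitRate":         [40.0, 40.0, 90.0, 5.0],
})

# Assumption: ExitRate is a percentage (0-100), so divide by 100.
toy["VolumeAtRisk"] = toy["Volume"] * toy["ExitRate"] / 100.0

# Sum over channels and rank products by the volume that would be lost.
must_stock = (toy.groupby(["ProductID", "Description"], as_index=False)
                 .agg(Volume=("Volume", "sum"), VolumeAtRisk=("VolumeAtRisk", "sum"))
                 .sort_values("VolumeAtRisk", ascending=False)
                 .reset_index(drop=True))
print(must_stock)
```

Ranking by volume at risk rather than by ExitRate alone keeps a niche product with a high exit rate but very little volume from outranking a large product with a moderate exit rate.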



- for which key products can a price change lead to an unprofitable loss of sales? (Hint: Elasticity)
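
A rough sketch of how elasticity could be used to flag such products, on made-up toy rows. It rests on several assumptions that should be checked against the data: `PriceElasticity` is read as the percentage change in Volume per 1% change in price (negative for normal goods), the volume response is approximated linearly, NSRperUC is assumed to move in proportion to price, and unit margin is taken as `NSRperUC - OperatingCOGSperUC`. The function name `profit_change` is hypothetical.

```python
import pandas as pd

# Toy rows with made-up numbers; only the column names come from the notebook.
toy = pd.DataFrame({
    "Description":        ["SKU A", "SKU B", "SKU C"],
    "NSRperUC":           [100.0, 100.0, 100.0],
    "OperatingCOGSperUC": [60.0, 60.0, 60.0],
    "Volume":             [1000.0, 1000.0, 1000.0],
    "PriceElasticity":    [-0.5, -2.0, -4.0],
})

def profit_change(frame, price_change=0.05):
    """Estimated change in gross profit for a relative price change (0.05 = +5%)."""
    margin_now = frame["NSRperUC"] - frame["OperatingCOGSperUC"]
    # Assumption: net revenue per unit case scales with the price change.
    margin_new = frame["NSRperUC"] * (1 + price_change) - frame["OperatingCOGSperUC"]
    # Linear approximation: % volume change = elasticity * % price change.
    volume_new = frame["Volume"] * (1 + frame["PriceElasticity"] * price_change)
    return margin_new * volume_new - margin_now * frame["Volume"]

toy["ProfitChange"] = profit_change(toy, price_change=0.05)

# Products whose estimated profit falls after the price increase.
at_risk = toy[toy["ProfitChange"] < 0]
print(toy[["Description", "PriceElasticity", "ProfitChange"]])
```

In this toy example only the most elastic SKU loses profit after a 5% increase. With real data the break-even elasticity depends on each product's margin, so the comparison has to be made per product rather than against a single threshold.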