# Capstone Project Rossmann

# Intro and EDA

# 1. Project definition

## 1.1 Background

This project investigates what insights the drugstore chain Rossmann can gain from its historical sales and advertising data, data on school vacations and public holidays, and competitor data. It addresses how this data can be used to optimize store operations and management with the aim of generating more sales, and how it can be used to predict weekly sales (revenue) for each store with sufficient accuracy. In addition, it explores whether automated reports can be created at store level alongside an overall report.


Rossmann operates over 4,300 drugstores in 9 European countries. Currently, Rossmann store managers are tasked with preparing their weekly sales forecasts up to eight weeks in advance. Store sales are influenced by many factors, including promotions, competition, school vacations and public holidays, seasonality and location. Because thousands of individual store managers create sales forecasts based on their own circumstances, the accuracy of the results can vary widely. The company's data science team has therefore been tasked with creating a unified modeling methodology that lets store managers predict weekly results with greater accuracy. The management team also needs an overall report with actionable strategies to understand the performance of all stores and to optimize future sales performance (i.e. revenue). Finally, an individual store performance report needs to be provided to each store manager.

## 1.2 General task from the client

- Determine the key factors that influence sales and revenue
- Create strategies for the management to improve the business

- Build a standardized forecasting model which can predict the sales figures for the next eight weeks for each store.

- Generate a report with information on the overall performance of the 1115 stores as well as individual performance reports for each store.

## 1.3 Definition of use cases (background/business purpose)

- The analyses should help to better understand the current sales figures and the impact of e.g. promotions, product range, distance to competitors, etc. on sales.
- The resulting strategy and optimization proposals for increasing sales will then be discussed and implemented at management level.
- The standardized forecast model of sales figures per store for the next eight weeks should be more accurate than the current individual forecasts of the store managers and should save time, money and capacity.
- The overall report as well as the individual store reports should also contribute to the mentioned objectives and also provide a quick and automated overview of the current performance.

## 1.4 Formulating hypotheses

1. promotions  
	1.1 In weeks with promotions, sales, number of customers and sales per customer are higher.  
2. promotion2  
	2.1 In months with Promo2, sales, number of customers and sales per customer are higher.  
	2.2 Since the store has been participating in Promo2, sales, number of customers and sales per customer are higher, even outside of Promo2 promotions.  
3. school holidays  
	3.1 Sales are lower in weeks with school vacations.  
	3.2 In weeks with school vacations, sales per customer are lower.  
4. public holidays  
	4.1 Sales are lower in weeks with public holidays.  
	4.2 In weeks with public holidays, more purchases are made on the other days.  
5. day of the week  
	5.1 Fewer purchases are made at the weekend.  
6. assortment  
	6.1 Stores with an extra assortment have higher sales than basic stores  
	6.2 Stores with an extended assortment have higher sales than basic and extra stores  
7. CompetitionDistance  
	7.1 The closer the competitor, the lower the sales.  
	7.2 Sales have been lower since the competitor opened.

## 1.5 Further interesting questions

1. promotions  
	1.1 Is a promotion worthwhile in weeks with public holidays?  
2. promotion2  
3. SchoolHolidays  
4. StateHolidays  
5. day of the week  
	5.1 Is a promotion on the weekend worthwhile?  
6. Assortment  
	6.1 What is the ratio of basic, extra and extended assortment?  
	6.2 What is the ratio of basic, extra and extended assortment in the individual store types?  
	6.3 What is the average turnover of the individual assortments?  
	6.4 How many promotion weeks are there in the individual assortments?  
	6.5 Do promotion weeks perform better in individual assortments than in others?  
7. CompetitionDistance  
8. StoreType  
	8.1 How many stores are there of which type?  
	8.2 How many promotion weeks are there in the individual store types?  
	8.3 Do promotion weeks perform better in individual store types than in others?


# Data Analysis

## Data preparation for EDA / raw data preparation

In [None]:
import pandas as pd
import numpy as np
from datetime import datetime

import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px

import plotly.graph_objects as go
from plotly.subplots import make_subplots

# matplotlib inline notebook





pd.set_option('display.max_columns', None)

### Store Data

In [None]:
df_store = pd.read_csv("store.csv", dtype={0:int, 1:object, 2:object, 3:float, 4:float, 5:float, 6:int, 7:float, 8:float, 9:object})


In [None]:
df_store.sample(10)

In [None]:
df_store.info()

In [None]:
for column in df_store.select_dtypes(include=['object']).columns:
    print(f"Unique values in '{column}': {df_store[column].unique()}")

In [None]:
df_store.describe()

In [None]:
px.box(df_store.select_dtypes(include=['number']))

### Train Data

In [None]:
df_train = pd.read_csv("train.csv", parse_dates=[2], dtype={7:object})

In [None]:
# Group by store and aggregate to week
df_train.set_index('Date', inplace=True)
weekly_sales_by_store = df_train.groupby('Store').resample('W').agg({'Sales': 'sum', 'Customers': 'sum', 'Open': 'sum', 'Promo': 'sum', 'StateHoliday': 'sum', 'SchoolHoliday': 'sum'}).reset_index()
weekly_sales_by_store

In [None]:
# Inspect the aggregated StateHoliday labels (the daily labels are concatenated per week)
print("Unique values in 'StateHoliday':", weekly_sales_by_store.StateHoliday.unique())
# Create NumStateHoliday with how many state holidays are in the week
weekly_sales_by_store['StateHoliday'] = weekly_sales_by_store['StateHoliday'].astype(str)
weekly_sales_by_store['NumStateHoliday'] = weekly_sales_by_store['StateHoliday'].apply(lambda x: x.count('a') + x.count('b') + x.count('c'))
print("Unique values in 'NumStateHoliday':", weekly_sales_by_store.NumStateHoliday.unique())
# Remove aggregated 0 and doubled labels
weekly_sales_by_store['StateHoliday'] = weekly_sales_by_store['StateHoliday'].str.extract('(a|b|c)', expand=False).fillna('0')
print("Unique values in 'StateHoliday':", weekly_sales_by_store.StateHoliday.unique())

In [None]:
# Create calendar week (CW), Month, Year and DayOfWeek columns
weekly_sales_by_store.insert(weekly_sales_by_store.columns.get_loc('Date') + 1, 'CW', weekly_sales_by_store['Date'].dt.isocalendar().week)
weekly_sales_by_store.insert(weekly_sales_by_store.columns.get_loc('Date') + 2, 'Month', weekly_sales_by_store['Date'].dt.month)
weekly_sales_by_store.insert(weekly_sales_by_store.columns.get_loc('Date') + 3, 'Year', weekly_sales_by_store['Date'].dt.year)
weekly_sales_by_store.insert(weekly_sales_by_store.columns.get_loc('Date') + 4, 'DayOfWeek', weekly_sales_by_store['Date'].dt.dayofweek)

# Create SalesPerCustomer column
weekly_sales_by_store.insert(weekly_sales_by_store.columns.get_loc('Sales') + 1, 'SalesPerCustomer', weekly_sales_by_store['Sales'] / weekly_sales_by_store['Customers'])
# Create sales per open day
weekly_sales_by_store.insert(weekly_sales_by_store.columns.get_loc('Sales') + 2, 'SalesPerOpenDay', weekly_sales_by_store['Sales'] / weekly_sales_by_store['Open'])
# Create customers per open day
weekly_sales_by_store.insert(weekly_sales_by_store.columns.get_loc('Customers') + 1, 'CustomersPerOpenDay', weekly_sales_by_store['Customers'] / weekly_sales_by_store['Open'])

weekly_sales_by_store.insert(weekly_sales_by_store.columns.get_loc('Promo') + 1, 'IsPromo', np.where(weekly_sales_by_store['Promo'] > 0, 1, 0))
weekly_sales_by_store.insert(weekly_sales_by_store.columns.get_loc('StateHoliday') + 1, 'IsStateHoliday', np.where(weekly_sales_by_store['StateHoliday'] != '0', 1, 0))
weekly_sales_by_store.insert(weekly_sales_by_store.columns.get_loc('SchoolHoliday') + 1, 'IsSchoolHoliday', np.where(weekly_sales_by_store['SchoolHoliday'] > 0, 1, 0))


In [None]:
# Merge df_train with df_store
weekly_sales_with_store_info = pd.merge(weekly_sales_by_store, df_store, on='Store', how='left')
# Check if all stores were found (after a left merge, unmatched stores have NaN in the store columns)
print("Number of rows without store information:", weekly_sales_with_store_info.StoreType.isna().sum())

In [None]:
# Create Promo2Member column to see if the store had already joined the Promo2 program on the given date
def is_promo2_active(row):
    if row['Promo2'] == 1:
        # Construct the starting date of Promo2
        promo2_start_date = datetime.fromisocalendar(int(row['Promo2SinceYear']), int(row['Promo2SinceWeek']), 1)
        # Check if the promo was active on the 'Date'
        return 1 if promo2_start_date <= row['Date'] else 0
    else:
        # Store does not participate in Promo2, so it cannot be a member yet
        return 0

# Apply the function across the DataFrame rows
weekly_sales_with_store_info['Promo2Member'] = weekly_sales_with_store_info.apply(is_promo2_active, axis=1)

In [None]:
# Create Promo2Active: 1 if the month of the week is listed in the store's PromoInterval
# Note: PromoInterval in store.csv abbreviates September as 'Sept', not 'Sep'
month_dict = {1: 'Jan', 2: 'Feb', 3: 'Mar', 4: 'Apr', 5: 'May', 6: 'Jun', 7: 'Jul', 8: 'Aug', 9: 'Sept', 10: 'Oct', 11: 'Nov', 12: 'Dec'}

def is_promo2_active_month(row):
    if row['Promo2Member'] == 1:
        # Check whether the month of the 'Date' is listed in PromoInterval
        return 1 if month_dict[row['Month']] in row['PromoInterval'].split(',') else 0
    else:
        # Store is not (yet) a Promo2 member
        return 0

# Apply the function across the DataFrame rows
weekly_sales_with_store_info['Promo2Active'] = weekly_sales_with_store_info.apply(is_promo2_active_month, axis=1)


In [None]:
# Create IsCompetition column

def is_competition_active(row):
	if row['CompetitionOpenSinceYear'] > 0:
		# Construct the starting date of competition
		competition_start_date = datetime(int(row['CompetitionOpenSinceYear']), int(row['CompetitionOpenSinceMonth']), 1)
		# Check if the competition was active on the 'Date'
		return 1 if competition_start_date <= row['Date'] else 0
	else:
		# No competitor opening year is known, so return 0
		return 0

# Apply the function across the DataFrame rows
weekly_sales_with_store_info.insert(weekly_sales_with_store_info.columns.get_loc('CompetitionOpenSinceYear') + 1, 'IsCompetition', weekly_sales_with_store_info.apply(is_competition_active, axis=1))

In [None]:
print(weekly_sales_with_store_info.info())
weekly_sales_with_store_info.sample(5)

In [None]:
used_df = weekly_sales_with_store_info[['StoreType', 'Sales', 'Customers']]
px.box(used_df, color='StoreType')

In [None]:
#save df to csv
weekly_sales_with_store_info.to_csv("weekly_sales_with_store_info.csv", index=False)


# EDA

## Analyse Hypotheses

### 1. Promotions

#### 1.1 In weeks with promotions, sales, number of customers and sales per customer are higher?

In [None]:
# To be comparable, only weeks with at least 5 open days are used
used_df = weekly_sales_with_store_info[(weekly_sales_with_store_info['Open'] >= 5)]
fig = px.box(used_df.sort_values('StoreType'), x='StoreType', y='SalesPerOpenDay', color='Promo')
fig.show()
# CustomersPerOpenDay
fig = px.box(used_df.sort_values('StoreType'), x='StoreType', y='CustomersPerOpenDay', color='Promo')
fig.show()
# SalesPerCustomer
fig = px.box(used_df.sort_values('StoreType'), x='StoreType', y='SalesPerCustomer', color='Promo')
fig.show()

median_values = used_df.groupby(['StoreType', 'Promo']).agg({'SalesPerOpenDay': 'median', 'CustomersPerOpenDay': 'median', 'SalesPerCustomer': 'median'}).reset_index()
median_values['SalesPerOpenDayDiff%'] = median_values.groupby('StoreType')['SalesPerOpenDay'].pct_change() * 100
median_values['CustomersPerOpenDayDiff%'] = median_values.groupby('StoreType')['CustomersPerOpenDay'].pct_change() * 100
median_values['SalesPerCustomerDiff%'] = median_values.groupby('StoreType')['SalesPerCustomer'].pct_change() * 100

median_values_pivot = median_values.pivot_table(index='StoreType', columns='Promo', values=['SalesPerOpenDay', 'CustomersPerOpenDay', 'SalesPerCustomer', 'SalesPerOpenDayDiff%', 'CustomersPerOpenDayDiff%', 'SalesPerCustomerDiff%']).style.format("{:.2f}")
median_values_pivot

In [None]:
# Show the diff columns in a plotly bar plot using subplots
fig = make_subplots(rows=1, cols=3, subplot_titles=("SalesPerOpenDayDiff%", "CustomersPerOpenDayDiff%", "SalesPerCustomerDiff%"))
fig.add_trace(go.Bar(x=median_values['StoreType'], y=median_values['SalesPerOpenDayDiff%'], name='SalesPerOpenDayDiff%'), row=1, col=1)
fig.add_trace(go.Bar(x=median_values['StoreType'], y=median_values['CustomersPerOpenDayDiff%'], name='CustomersPerOpenDayDiff%'), row=1, col=2)
fig.add_trace(go.Bar(x=median_values['StoreType'], y=median_values['SalesPerCustomerDiff%'], name='SalesPerCustomerDiff%'), row=1, col=3)
fig.update_layout(title_text="Difference in percent between Promo and non Promo weeks for each StoreType")
fig.show()

**Result:** Hypothesis is true. Sales, customers and sales per customer are higher in weeks with promotions.
- Sales per day are ~35% higher
- Customers per day are ~19% higher
- Sales per customer are ~13% higher
- StoreType b is quite different: it shows only about 1/3 of the promotion impact.
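
The percentages above are relative differences between group medians. The same groupby and percent-change pattern recurs for every hypothesis, so it could be wrapped in a small helper. The following is a minimal sketch under stated assumptions: the function name `median_diff_by_flag` and the toy data are hypothetical, and the flag column is assumed to be binary (0/1), such as `IsPromo`.

```python
import pandas as pd

def median_diff_by_flag(df, flag, metrics, group="StoreType"):
    """Percent change of the group median from flag == 0 to flag == 1.

    Hypothetical helper, not part of the original notebook.
    Assumes `flag` only takes the values 0 and 1.
    """
    # Median per (group, flag), then move the flag levels into columns
    medians = df.groupby([group, flag])[metrics].median().unstack(flag)
    out = {}
    for metric in metrics:
        base = medians[(metric, 0)]
        out[metric + "Diff%"] = (medians[(metric, 1)] - base) / base * 100
    return pd.DataFrame(out)

# Toy data for illustration only, not taken from the Rossmann files
toy = pd.DataFrame({
    "StoreType": ["a", "a", "a", "a", "b", "b", "b", "b"],
    "IsPromo": [0, 0, 1, 1, 0, 0, 1, 1],
    "SalesPerOpenDay": [100.0, 120.0, 140.0, 160.0, 200.0, 200.0, 220.0, 220.0],
})
result = median_diff_by_flag(toy, "IsPromo", ["SalesPerOpenDay"])
print(result.round(2))
```

Using medians keeps single extreme weeks from dominating the comparison, which matches the box plots used in the notebook.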

### 2. Promotion2  
#### 2.1 In months with Promo2, sales, number of customers and sales per customer are higher?


In [None]:
# Only stores that take part in Promo2, restricted to weeks after they joined the program
used_df = weekly_sales_with_store_info[(weekly_sales_with_store_info['Promo2'] == 1) & (weekly_sales_with_store_info['Promo2Member'] == 1)]

fig = px.box(used_df.sort_values('StoreType'), x='StoreType', y='SalesPerOpenDay', color='Promo2Active')
fig.show()
# CustomersPerOpenDay
fig = px.box(used_df.sort_values('StoreType'), x='StoreType', y='CustomersPerOpenDay', color='Promo2Active')
fig.show()
# SalesPerCustomer
fig = px.box(used_df.sort_values('StoreType'), x='StoreType', y='SalesPerCustomer', color='Promo2Active')
fig.show()

median_values = used_df.groupby(['StoreType', 'Promo2Active']).agg({'SalesPerOpenDay': 'median', 'CustomersPerOpenDay': 'median', 'SalesPerCustomer': 'median'}).reset_index()
median_values['SalesPerOpenDayDiff%'] = median_values.groupby('StoreType')['SalesPerOpenDay'].pct_change() * 100
median_values['CustomersPerOpenDayDiff%'] = median_values.groupby('StoreType')['CustomersPerOpenDay'].pct_change() * 100
median_values['SalesPerCustomerDiff%'] = median_values.groupby('StoreType')['SalesPerCustomer'].pct_change() * 100

median_values_pivot = median_values.pivot_table(index='StoreType', columns='Promo2Active', values=['SalesPerOpenDay', 'CustomersPerOpenDay', 'SalesPerCustomer', 'SalesPerOpenDayDiff%', 'CustomersPerOpenDayDiff%', 'SalesPerCustomerDiff%']).style.format("{:.2f}")
median_values_pivot

In [None]:
# Check the number of observations per group
median_values_amount = used_df.groupby(['StoreType', 'Promo2Active']).agg({
    'SalesPerOpenDay': ['median', 'count'], 
    'CustomersPerOpenDay': ['median', 'count'], 
    'SalesPerCustomer': ['median', 'count']
}).reset_index()
median_values_amount

In [None]:
# Show the diff columns in a plotly bar plot using subplots
fig = make_subplots(rows=1, cols=3, subplot_titles=("SalesPerOpenDayDiff%", "CustomersPerOpenDayDiff%", "SalesPerCustomerDiff%"))
fig.add_trace(go.Bar(x=median_values['StoreType'], y=median_values['SalesPerOpenDayDiff%'], name='SalesPerOpenDayDiff%'), row=1, col=1)
fig.add_trace(go.Bar(x=median_values['StoreType'], y=median_values['CustomersPerOpenDayDiff%'], name='CustomersPerOpenDayDiff%'), row=1, col=2)
fig.add_trace(go.Bar(x=median_values['StoreType'], y=median_values['SalesPerCustomerDiff%'], name='SalesPerCustomerDiff%'), row=1, col=3)
fig.update_layout(title_text="Difference in percent between Promo2 and non Promo2 weeks for each StoreType")
fig.show()

**Result:** Hypothesis is not really true; the effect is negligible.
- Sales per day are ~0.4% higher
- Customers per day are nearly the same as without Promo2
- Sales per customer are ~0.4% higher
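
The comparison above depends on how a week is classified as a Promo2 month: the month of the weekly date is looked up in the store's `PromoInterval` string. The sketch below illustrates that check in isolation. The function name `promo2_running` and the example dates are hypothetical, and the spelling `Sept` for September is an assumption about how `store.csv` writes the interval, so it is worth verifying against the unique values of `PromoInterval`.

```python
from datetime import date

# Month abbreviations as assumed to appear in PromoInterval
# (assumption: September is written "Sept", not "Sep")
MONTH_ABBR = {1: "Jan", 2: "Feb", 3: "Mar", 4: "Apr", 5: "May", 6: "Jun",
              7: "Jul", 8: "Aug", 9: "Sept", 10: "Oct", 11: "Nov", 12: "Dec"}

def promo2_running(day, promo_interval):
    """Return True if the month of `day` is listed in the interval string."""
    if not isinstance(promo_interval, str):
        # Missing interval (e.g. NaN) for stores without Promo2
        return False
    return MONTH_ABBR[day.month] in promo_interval.split(",")

# Hypothetical week-ending dates
in_interval = promo2_running(date(2014, 4, 6), "Jan,Apr,Jul,Oct")
off_interval = promo2_running(date(2014, 5, 4), "Jan,Apr,Jul,Oct")
september = promo2_running(date(2014, 9, 7), "Mar,Jun,Sept,Dec")
print(in_interval, off_interval, september)
```

Because the weekly rows carry a single date, a week that spans two months is assigned to only one of them, which may blur the already small effect.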

#### 2.2 Since the store has been participating in Promo2, sales, number of customers and sales per customer are higher, even outside of Promo2 promotions?

In [None]:
# Only stores that take part in Promo2 (weeks before and after joining the program)
used_df = weekly_sales_with_store_info[(weekly_sales_with_store_info['Promo2'] == 1)]

fig = px.box(used_df.sort_values('StoreType'), x='StoreType', y='SalesPerOpenDay', color='Promo2Member')
fig.show()
# CustomersPerOpenDay
fig = px.box(used_df.sort_values('StoreType'), x='StoreType', y='CustomersPerOpenDay', color='Promo2Member')
fig.show()
# SalesPerCustomer
fig = px.box(used_df.sort_values('StoreType'), x='StoreType', y='SalesPerCustomer', color='Promo2Member')
fig.show()

median_values = used_df.groupby(['StoreType', 'Promo2Member']).agg({'SalesPerOpenDay': 'median', 'CustomersPerOpenDay': 'median', 'SalesPerCustomer': 'median'}).reset_index()
median_values['SalesPerOpenDayDiff%'] = median_values.groupby('StoreType')['SalesPerOpenDay'].pct_change() * 100
median_values['CustomersPerOpenDayDiff%'] = median_values.groupby('StoreType')['CustomersPerOpenDay'].pct_change() * 100
median_values['SalesPerCustomerDiff%'] = median_values.groupby('StoreType')['SalesPerCustomer'].pct_change() * 100

median_values_pivot = median_values.pivot_table(index='StoreType', columns='Promo2Member', values=['SalesPerOpenDay', 'CustomersPerOpenDay', 'SalesPerCustomer', 'SalesPerOpenDayDiff%', 'CustomersPerOpenDayDiff%', 'SalesPerCustomerDiff%']).style.format("{:.2f}")
median_values_pivot

In [None]:
median_values_amount = used_df.groupby(['StoreType', 'Promo2Member']).agg({
    'SalesPerOpenDay': ['median', 'count'], 
    'CustomersPerOpenDay': ['median', 'count'], 
    'SalesPerCustomer': ['median', 'count']
}).reset_index()
median_values_amount

In [None]:
# Show the diff columns in a plotly bar plot using subplots
fig = make_subplots(rows=1, cols=3, subplot_titles=("SalesPerOpenDayDiff%", "CustomersPerOpenDayDiff%", "SalesPerCustomerDiff%"))
fig.add_trace(go.Bar(x=median_values['StoreType'], y=median_values['SalesPerOpenDayDiff%'], name='SalesPerOpenDayDiff%'), row=1, col=1)
fig.add_trace(go.Bar(x=median_values['StoreType'], y=median_values['CustomersPerOpenDayDiff%'], name='CustomersPerOpenDayDiff%'), row=1, col=2)
fig.add_trace(go.Bar(x=median_values['StoreType'], y=median_values['SalesPerCustomerDiff%'], name='SalesPerCustomerDiff%'), row=1, col=3)
fig.update_layout(title_text="Difference in percent between weeks before and after joining Promo2 for each StoreType")
fig.show()

**Result:** Hypothesis is not true at all.
- Sales per day are ~10% lower
- Customers per day are ~12% lower
- Sales per customer are ~0.3% lower
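
The before/after split used here rests on `Promo2Member`, which compares each weekly date with the Monday of the ISO calendar week given by `Promo2SinceYear` and `Promo2SinceWeek`. A minimal sketch of that date logic follows; the helper names `promo2_start` and `is_member` and the example values are hypothetical.

```python
from datetime import datetime

def promo2_start(year, week):
    """Monday of the given ISO calendar week (needs Python 3.8 or newer)."""
    # The store columns are floats in the notebook, hence the int() casts
    return datetime.fromisocalendar(int(year), int(week), 1)

def is_member(week_date, year, week):
    """1 if the store had joined Promo2 on or before `week_date`, else 0."""
    return 1 if promo2_start(year, week) <= week_date else 0

# Hypothetical store that joined in ISO week 10 of 2014
start = promo2_start(2014.0, 10.0)
before = is_member(datetime(2014, 3, 2), 2014, 10)
after = is_member(datetime(2014, 3, 9), 2014, 10)
print(start.date(), before, after)
```

Since the two groups cover different time periods, part of the measured difference may reflect general trends over time instead of the program itself.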

### 3. SchoolHolidays   

#### 3.1 Sales are lower in weeks with school vacations?

In [None]:
# To avoid bias, only weeks without state holidays are used
used_df = weekly_sales_with_store_info[weekly_sales_with_store_info['IsStateHoliday'] == 0]
fig = px.box(used_df.sort_values('StoreType'), x='StoreType', y='SalesPerOpenDay', color='IsSchoolHoliday')
fig.show()
# CustomersPerOpenDay
fig = px.box(used_df.sort_values('StoreType'), x='StoreType', y='CustomersPerOpenDay', color='IsSchoolHoliday')
fig.show()
# SalesPerCustomer
fig = px.box(used_df.sort_values('StoreType'), x='StoreType', y='SalesPerCustomer', color='IsSchoolHoliday')
fig.show()

median_values = used_df.groupby(['StoreType', 'IsSchoolHoliday']).agg({'SalesPerOpenDay': 'median', 'CustomersPerOpenDay': 'median', 'SalesPerCustomer': 'median'}).reset_index()
median_values['SalesPerOpenDayDiff%'] = median_values.groupby('StoreType')['SalesPerOpenDay'].pct_change() * 100
median_values['CustomersPerOpenDayDiff%'] = median_values.groupby('StoreType')['CustomersPerOpenDay'].pct_change() * 100
median_values['SalesPerCustomerDiff%'] = median_values.groupby('StoreType')['SalesPerCustomer'].pct_change() * 100

median_values_pivot = median_values.pivot_table(index='StoreType', columns='IsSchoolHoliday', values=['SalesPerOpenDay', 'CustomersPerOpenDay', 'SalesPerCustomer', 'SalesPerOpenDayDiff%', 'CustomersPerOpenDayDiff%', 'SalesPerCustomerDiff%']).style.format("{:.2f}")
median_values_pivot

In [None]:
# Show the diff columns in a plotly bar plot using subplots
fig = make_subplots(rows=1, cols=3, subplot_titles=("SalesPerOpenDayDiff%", "CustomersPerOpenDayDiff%", "SalesPerCustomerDiff%"))
fig.add_trace(go.Bar(x=median_values['StoreType'], y=median_values['SalesPerOpenDayDiff%'], name='SalesPerOpenDayDiff%'), row=1, col=1)
fig.add_trace(go.Bar(x=median_values['StoreType'], y=median_values['CustomersPerOpenDayDiff%'], name='CustomersPerOpenDayDiff%'), row=1, col=2)
fig.add_trace(go.Bar(x=median_values['StoreType'], y=median_values['SalesPerCustomerDiff%'], name='SalesPerCustomerDiff%'), row=1, col=3)
fig.update_layout(title_text="Difference in percent between SchoolHoliday and non SchoolHoliday weeks for each StoreType")
fig.show()

**Result:** Hypothesis is true only for StoreType a.
- For StoreType a, sales in school holiday weeks are 1.3% lower.
- For StoreType b, sales in school holiday weeks are 4.6% higher.
- For StoreTypes c and d, sales in school holiday weeks are slightly higher, by 0.7% and 0.9%.

#### 3.2 In weeks with school vacations, sales per customer are lower?

**Result:** Hypothesis is true only for StoreType d.
- For StoreType a, sales per customer in school holiday weeks are 0.6% higher.
- For StoreType b, sales per customer in school holiday weeks are 5.3% higher.
- For StoreType c, there is no difference compared to other weeks.
- For StoreType d, sales per customer in school holiday weeks are 1.1% lower.

### 4. StateHolidays
#### 4.1 Sales are lower in weeks with public holidays?

In [None]:
used_df = weekly_sales_with_store_info

fig = px.box(used_df.sort_values('StoreType'), x='StoreType', y='Sales', color='IsStateHoliday')
fig.show()

median_values = used_df.groupby(['StoreType', 'IsStateHoliday']).agg({'Sales': 'median'}).reset_index()
median_values['SalesDiff%'] = median_values.groupby('StoreType')['Sales'].pct_change() * 100

median_values_pivot = median_values.pivot_table(index='StoreType', columns='IsStateHoliday', values=['Sales', 'SalesDiff%']).style.format("{:.2f}")
median_values_pivot

In [None]:
px.bar(median_values, x='StoreType', y='SalesDiff%', title='Difference in percent between StateHoliday and non StateHoliday weeks for each StoreType')

In [None]:
# Check how many days StoreType b stores are open in weeks with a state holiday
weekly_sales_with_store_info[(weekly_sales_with_store_info['StoreType'] == 'b') & (weekly_sales_with_store_info['IsStateHoliday'] == 1)]['Open'].value_counts()

**Result:** Hypothesis is true for all StoreTypes except StoreType b.
- For StoreType a, sales in holiday weeks are 13% lower.
- For StoreType b, sales in holiday weeks are 1.5% higher, because these stores are mostly open all 7 days of the week.
- For StoreType c, sales in holiday weeks are 10.5% lower.
- For StoreType d, sales in holiday weeks are 9.4% lower.
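
The `IsStateHoliday` flag behind this comparison is derived in the preparation step: the daily `StateHoliday` labels of a week are concatenated, the letters `a`, `b` and `c` are counted, and the first letter found is kept as the weekly label. The sketch below reproduces that logic in plain Python for a single week. The function name `summarize_state_holidays` and the example weeks are hypothetical.

```python
def summarize_state_holidays(daily_labels):
    """Collapse seven daily StateHoliday labels into weekly features."""
    joined = "".join(daily_labels)
    # Number of holiday days in the week, regardless of holiday type
    num = sum(joined.count(kind) for kind in "abc")
    # First holiday label in the week, or "0" if there is none
    first = next((ch for ch in joined if ch in "abc"), "0")
    return {
        "NumStateHoliday": num,
        "StateHoliday": first,
        "IsStateHoliday": int(first != "0"),
    }

# Hypothetical weeks for illustration
normal_week = summarize_state_holidays(["0"] * 7)
holiday_week = summarize_state_holidays(["0", "0", "a", "0", "0", "0", "0"])
double_week = summarize_state_holidays(["b", "0", "0", "b", "0", "0", "0"])
print(normal_week, holiday_week, double_week)
```

Keeping only the first letter means that a week containing two different holiday types is reported under one type, while `NumStateHoliday` still counts both days.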

#### 4.2 In weeks with public holidays, more purchases are made on the other days?


In [None]:
used_df = weekly_sales_with_store_info

fig = px.box(used_df.sort_values('StoreType'), x='StoreType', y='SalesPerOpenDay', color='IsStateHoliday')
fig.show()
# CustomersPerOpenDay
fig = px.box(used_df.sort_values('StoreType'), x='StoreType', y='CustomersPerOpenDay', color='IsStateHoliday')
fig.show()
# SalesPerCustomer
fig = px.box(used_df.sort_values('StoreType'), x='StoreType', y='SalesPerCustomer', color='IsStateHoliday')
fig.show()

median_values = used_df.groupby(['StoreType', 'IsStateHoliday']).agg({'SalesPerOpenDay': 'median', 'CustomersPerOpenDay': 'median', 'SalesPerCustomer': 'median'}).reset_index()
median_values['SalesPerOpenDayDiff%'] = median_values.groupby('StoreType')['SalesPerOpenDay'].pct_change() * 100
median_values['CustomersPerOpenDayDiff%'] = median_values.groupby('StoreType')['CustomersPerOpenDay'].pct_change() * 100
median_values['SalesPerCustomerDiff%'] = median_values.groupby('StoreType')['SalesPerCustomer'].pct_change() * 100

median_values_pivot = median_values.pivot_table(index='StoreType', columns='IsStateHoliday', values=['SalesPerOpenDay', 'CustomersPerOpenDay', 'SalesPerCustomer', 'SalesPerOpenDayDiff%', 'CustomersPerOpenDayDiff%', 'SalesPerCustomerDiff%']).style.format("{:.2f}")
median_values_pivot

In [None]:
# Show the diff columns in a plotly bar plot using subplots
fig = make_subplots(rows=1, cols=3, subplot_titles=("SalesPerOpenDayDiff%", "CustomersPerOpenDayDiff%", "SalesPerCustomerDiff%"))
fig.add_trace(go.Bar(x=median_values['StoreType'], y=median_values['SalesPerOpenDayDiff%'], name='SalesPerOpenDayDiff%'), row=1, col=1)
fig.add_trace(go.Bar(x=median_values['StoreType'], y=median_values['CustomersPerOpenDayDiff%'], name='CustomersPerOpenDayDiff%'), row=1, col=2)
fig.add_trace(go.Bar(x=median_values['StoreType'], y=median_values['SalesPerCustomerDiff%'], name='SalesPerCustomerDiff%'), row=1, col=3)
fig.update_layout(title_text="Difference in percent between StateHolidays and non StateHolidays weeks for each StoreType")
fig.show()

**Result:** Hypothesis is true for all StoreTypes except StoreType b.
- For StoreType a, sales per open day in a holiday week are 4.5% higher.
- For StoreType b, sales per open day are unchanged, because these stores are mostly open all 7 days of the week.
- For StoreTypes c and d, sales per open day in a holiday week are 8.0% higher.
### 5. DayOfWeek  
#### 5.1 Fewer purchases are made at the weekend?

In [None]:
fig = px.histogram(df_train, x='DayOfWeek', y='Sales', histfunc='avg')
fig.show()

In [None]:
df_train_merged = pd.merge(df_train, df_store, on='Store', how='left').sort_values(by='StoreType')

plt.figure(figsize=(20, 10))
fig = px.histogram(df_train_merged, x='DayOfWeek', y='Sales', histfunc='avg', title='Sales per DayOfWeek', facet_row='StoreType')
fig.update_layout(width=1000, height=800)
fig.show()


In [None]:
# Relative difference between mean Saturday sales (DayOfWeek == 6) and mean
# Monday-to-Friday sales for StoreType a, expressed relative to the Saturday mean
type_a = df_train_merged[df_train_merged['StoreType'] == 'a']
saturday_mean = type_a.loc[type_a['DayOfWeek'] == 6, 'Sales'].mean()
weekday_mean = type_a.loc[type_a['DayOfWeek'].between(1, 5), 'Sales'].mean()
(saturday_mean - weekday_mean) / saturday_mean

**Result:** Hypothesis is true except for StoreType b.
- On average, Saturday sales are about 30% lower than sales from Monday to Friday (computed for StoreType a, relative to the Saturday average).
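
The size of this gap depends on which mean is used as the base. The notebook cell divides the difference by the Saturday mean; dividing by the weekday mean instead gives a smaller percentage for the same data. The sketch below shows both variants with hypothetical averages. The function name `relative_gap` and the numbers are illustrative assumptions.

```python
def relative_gap(saturday_mean, weekday_mean, base="weekday"):
    """Saturday minus weekday sales in percent of the chosen base."""
    denominator = weekday_mean if base == "weekday" else saturday_mean
    return (saturday_mean - weekday_mean) / denominator * 100

# Hypothetical average daily sales (illustrative only)
saturday_mean, weekday_mean = 5000.0, 6500.0

vs_weekday = relative_gap(saturday_mean, weekday_mean, base="weekday")
vs_saturday = relative_gap(saturday_mean, weekday_mean, base="saturday")
print(round(vs_weekday, 1), round(vs_saturday, 1))
```

Stating the base alongside the percentage avoids ambiguity when the figure is reused in a report.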

### 6. Assortment  
#### 6.1 Stores with an extra assortment have higher sales than basic stores?  


In [None]:
used_df = weekly_sales_with_store_info

fig = px.box(used_df.sort_values('StoreType'), x='StoreType', y='SalesPerOpenDay', color='Assortment')
fig.show()
# CustomersPerOpenDay
fig = px.box(used_df.sort_values('StoreType'), x='StoreType', y='CustomersPerOpenDay', color='Assortment')
fig.show()
# SalesPerCustomer
fig = px.box(used_df.sort_values('StoreType'), x='StoreType', y='SalesPerCustomer', color='Assortment')
fig.show()

In [None]:
# Calculate difference between assortment a and b
used_df = weekly_sales_with_store_info[(weekly_sales_with_store_info['Assortment'] == 'a') | (weekly_sales_with_store_info['Assortment'] == 'b')]

median_values = used_df.groupby(['StoreType', 'Assortment']).agg({'SalesPerOpenDay': 'median', 'CustomersPerOpenDay': 'median', 'SalesPerCustomer': 'median'}).reset_index()
median_values['SalesPerOpenDayDiff%'] = median_values.groupby('StoreType')['SalesPerOpenDay'].pct_change() * 100
median_values['CustomersPerOpenDayDiff%'] = median_values.groupby('StoreType')['CustomersPerOpenDay'].pct_change() * 100
median_values['SalesPerCustomerDiff%'] = median_values.groupby('StoreType')['SalesPerCustomer'].pct_change() * 100

median_values_pivot = median_values.pivot_table(index='StoreType', columns='Assortment', values=['SalesPerOpenDay', 'CustomersPerOpenDay', 'SalesPerCustomer', 'SalesPerOpenDayDiff%', 'CustomersPerOpenDayDiff%', 'SalesPerCustomerDiff%']).style.format("{:.2f}")
median_values_pivot

In [None]:
# Show the diff columns in a plotly bar plot using subplots
fig = make_subplots(rows=1, cols=3, subplot_titles=("SalesPerOpenDayDiff%", "CustomersPerOpenDayDiff%", "SalesPerCustomerDiff%"))
fig.add_trace(go.Bar(x=median_values['StoreType'], y=median_values['SalesPerOpenDayDiff%'], name='SalesPerOpenDayDiff%'), row=1, col=1)
fig.add_trace(go.Bar(x=median_values['StoreType'], y=median_values['CustomersPerOpenDayDiff%'], name='CustomersPerOpenDayDiff%'), row=1, col=2)
fig.add_trace(go.Bar(x=median_values['StoreType'], y=median_values['SalesPerCustomerDiff%'], name='SalesPerCustomerDiff%'), row=1, col=3)
fig.update_layout(title_text="Difference in percent between assortment a and b for each StoreType")
fig.show()

**Result:** Hypothesis is false.
- Only StoreType b has the assortment category extra (b).
- In StoreType b, sales for assortment extra are 18% lower than for assortment basic (a).
- Stores with assortment extra have 9.3% more customers but 34% lower sales per customer.

#### 6.2 Stores with an extended assortment have higher sales than basic and extra stores?

In [None]:
# Calculate difference between assortment a and c
used_df = weekly_sales_with_store_info[(weekly_sales_with_store_info['Assortment'] == 'a') | (weekly_sales_with_store_info['Assortment'] == 'c')]

median_values = used_df.groupby(['StoreType', 'Assortment']).agg({'SalesPerOpenDay': 'median', 'CustomersPerOpenDay': 'median', 'SalesPerCustomer': 'median'}).reset_index()
median_values['SalesPerOpenDayDiff%'] = median_values.groupby('StoreType')['SalesPerOpenDay'].pct_change() * 100
median_values['CustomersPerOpenDayDiff%'] = median_values.groupby('StoreType')['CustomersPerOpenDay'].pct_change() * 100
median_values['SalesPerCustomerDiff%'] = median_values.groupby('StoreType')['SalesPerCustomer'].pct_change() * 100

median_values_pivot = median_values.pivot_table(index='StoreType', columns='Assortment', values=['SalesPerOpenDay', 'CustomersPerOpenDay', 'SalesPerCustomer', 'SalesPerOpenDayDiff%', 'CustomersPerOpenDayDiff%', 'SalesPerCustomerDiff%']).style.format("{:.2f}")
median_values_pivot


In [None]:
# Show the diff columns in a plotly bar plot using subplots
fig = make_subplots(rows=1, cols=3, subplot_titles=("SalesPerOpenDayDiff%", "CustomersPerOpenDayDiff%", "SalesPerCustomerDiff%"))
fig.add_trace(go.Bar(x=median_values['StoreType'], y=median_values['SalesPerOpenDayDiff%'], name='SalesPerOpenDayDiff%'), row=1, col=1)
fig.add_trace(go.Bar(x=median_values['StoreType'], y=median_values['CustomersPerOpenDayDiff%'], name='CustomersPerOpenDayDiff%'), row=1, col=2)
fig.add_trace(go.Bar(x=median_values['StoreType'], y=median_values['SalesPerCustomerDiff%'], name='SalesPerCustomerDiff%'), row=1, col=3)
fig.update_layout(title_text="Difference in percent between assortment a and c for each StoreType")
fig.show()

In [None]:
# Calculate difference between assortment b and c
used_df = weekly_sales_with_store_info[(weekly_sales_with_store_info['Assortment'] == 'b') | (weekly_sales_with_store_info['Assortment'] == 'c')]

median_values = used_df.groupby(['StoreType', 'Assortment']).agg({'SalesPerOpenDay': 'median', 'CustomersPerOpenDay': 'median', 'SalesPerCustomer': 'median'}).reset_index()
median_values['SalesPerOpenDayDiff%'] = median_values.groupby('StoreType')['SalesPerOpenDay'].pct_change() * 100
median_values['CustomersPerOpenDayDiff%'] = median_values.groupby('StoreType')['CustomersPerOpenDay'].pct_change() * 100
median_values['SalesPerCustomerDiff%'] = median_values.groupby('StoreType')['SalesPerCustomer'].pct_change() * 100

median_values_pivot = median_values.pivot_table(index='StoreType', columns='Assortment', values=['SalesPerOpenDay', 'CustomersPerOpenDay', 'SalesPerCustomer', 'SalesPerOpenDayDiff%', 'CustomersPerOpenDayDiff%', 'SalesPerCustomerDiff%']).style.format("{:.2f}")
median_values_pivot

In [None]:
# Show the diff columns in a plotly bar plot using subplots
fig = make_subplots(rows=1, cols=3, subplot_titles=("SalesPerOpenDayDiff%", "CustomersPerOpenDayDiff%", "SalesPerCustomerDiff%"))
fig.add_trace(go.Bar(x=median_values['StoreType'], y=median_values['SalesPerOpenDayDiff%'], name='SalesPerOpenDayDiff%'), row=1, col=1)
fig.add_trace(go.Bar(x=median_values['StoreType'], y=median_values['CustomersPerOpenDayDiff%'], name='CustomersPerOpenDayDiff%'), row=1, col=2)
fig.add_trace(go.Bar(x=median_values['StoreType'], y=median_values['SalesPerCustomerDiff%'], name='SalesPerCustomerDiff%'), row=1, col=3)
fig.update_layout(title_text="Difference in percent between assortment b and c for each StoreType")
fig.show()

**Result** Hypothesis is mostly true.
- Only StoreType b has the assortment category extra (b).
- Sales for the extended assortment (c) are on average 26% higher than for the basic assortment (a); the four store types show 16%, 81%, -2% and 8%.
- Sales for the extended assortment (c) are 121% higher than for the extra assortment (b).
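
The percentage differences above all follow the same pattern: take the median per group, then compare neighbouring categories with `pct_change`. A minimal sketch of that pattern as a reusable helper is shown below. The function name `median_pct_diff` and the toy numbers are made up for illustration and are not Rossmann figures.

```python
import pandas as pd


def median_pct_diff(df, group_col, compare_col, metrics):
    """Median of each metric per (group, category) plus the % change between categories."""
    medians = (
        df.groupby([group_col, compare_col])[metrics]
        .median()
        .reset_index()
        .sort_values([group_col, compare_col])
    )
    for metric in metrics:
        # pct_change compares each row with the previous row of the same group,
        # so the first category of every group is NaN by construction.
        medians[f"{metric}Diff%"] = medians.groupby(group_col)[metric].pct_change() * 100
    return medians


# Toy data (hypothetical values, only to show the mechanics)
toy = pd.DataFrame({
    "StoreType": ["a", "a", "a", "a", "b", "b", "b", "b"],
    "Assortment": ["a", "a", "c", "c", "a", "a", "c", "c"],
    "SalesPerOpenDay": [100.0, 120.0, 130.0, 134.0, 200.0, 200.0, 150.0, 250.0],
})

result = median_pct_diff(toy, "StoreType", "Assortment", ["SalesPerOpenDay"])
print(result)
```

Because `pct_change` always refers to the previous row within the group, the sort order of the compared column decides which category is the baseline.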

### 7. CompetitionDistance  
#### 7.1 The closer the competitor, the lower the sales?


In [None]:

sns.lmplot(x='CompetitionDistance', y='SalesPerCustomer', data=weekly_sales_with_store_info, hue='StoreType', fit_reg=False, x_bins=150, height=10
           , aspect=2
           )


**Result** Hypothesis is nearly true for StoreType a, c and d.
- Sales per customer rise as the distance to the competitor increases up to about 10 km, fall between 10 km and 20 km, and rise again beyond 20 km.
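
The distance ranges read off the scatter plot can also be checked numerically by binning `CompetitionDistance` and taking the median per bin. The following is a minimal sketch under assumptions: the helper name `median_by_distance_bin`, the bin edges and the toy values are hypothetical and only demonstrate the technique.

```python
import pandas as pd


def median_by_distance_bin(df, edges, value_col="SalesPerCustomer"):
    """Median of value_col within each CompetitionDistance interval given by edges."""
    bins = pd.cut(df["CompetitionDistance"], bins=edges, include_lowest=True)
    # observed=False keeps empty intervals in the result as NaN
    return df.groupby(bins, observed=False)[value_col].median()


# Toy data (hypothetical values, distances in metres)
toy = pd.DataFrame({
    "CompetitionDistance": [500, 5000, 9000, 15000, 18000, 25000],
    "SalesPerCustomer": [8.0, 9.0, 10.0, 9.0, 8.5, 11.0],
})

medians = median_by_distance_bin(toy, edges=[0, 10000, 20000, 30000])
print(medians)
```

Applied to `weekly_sales_with_store_info`, the same call could be run once per `StoreType` to see whether the pattern holds in every store type.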


#### 7.2 Sales have been lower since the competitor opened?

In [None]:
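# Stores up to ID 100 that faced competition in at least one week.
# The mean is taken per store; a global .mean() is a single scalar and would not filter any rows.
has_competition = weekly_sales_with_store_info.groupby('Store')['IsCompetition'].transform('mean') > 0
used_df = weekly_sales_with_store_info[weekly_sales_with_store_info['Store'].between(0, 100) & has_competition].sort_values('Store')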
plt.figure(figsize=(20, 10))
sns.barplot(x='Store', y='SalesPerCustomer', data=used_df, hue='IsCompetition', errorbar=None)

In [None]:
def returnSalesBeforeCompetition(df, storeId):
    # Mean sales of one store before (IsCompetition == 0) and after (IsCompetition == 1) the competitor opened
    store = df[(df['Store'] == storeId) & (df['IsCompetition'] == 0)]
    competition = df[(df['Store'] == storeId) & (df['IsCompetition'] == 1)]
    return store['Sales'].mean(), competition['Sales'].mean()

arr_sales_before_competition = []
arr_sales_after_competition = []

# Only stores with weeks both before and after the competitor opened are compared
for store_id in range(1, 1116):
    store = weekly_sales_with_store_info[(weekly_sales_with_store_info['Store'] == store_id)]
    if (store['IsCompetition'].mean() > 0) and (store['IsCompetition'].mean() < 1):
        store_sales, competition_sales = returnSalesBeforeCompetition(weekly_sales_with_store_info, store_id)
        arr_sales_before_competition.append(store_sales)
        arr_sales_after_competition.append(competition_sales)

print(f"Mean sales before competition: {np.mean(arr_sales_before_competition)}")
print(f"Mean sales after competition: {np.mean(arr_sales_after_competition)}")

**Result** Hypothesis is true.
- On average, sales are 5.5% lower after the competitor has opened.
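
The before/after comparison can also be written without an explicit loop over store IDs. The sketch below is one possible vectorised variant, not the notebook's own implementation; the function name `competition_effect` and the toy numbers are hypothetical. Like the loop above, it averages per store first and keeps only stores that have weeks both with and without competition.

```python
import pandas as pd


def competition_effect(df):
    """Mean of per-store mean sales before and after competition, plus the % change."""
    per_store = (
        df.groupby(["Store", "IsCompetition"])["Sales"]
        .mean()
        .unstack("IsCompetition")
    )
    # Keep only stores observed both before (0) and after (1) the competitor opened
    per_store = per_store.dropna(subset=[0, 1])
    before = per_store[0].mean()
    after = per_store[1].mean()
    return before, after, (after - before) / before * 100


# Toy data (hypothetical values): store 2 only has weeks with competition and is dropped
toy = pd.DataFrame({
    "Store": [1, 1, 1, 1, 2, 3, 3],
    "IsCompetition": [0, 0, 1, 1, 1, 0, 1],
    "Sales": [100.0, 100.0, 90.0, 90.0, 50.0, 200.0, 180.0],
})

before, after, change_pct = competition_effect(toy)
print(f"before={before}, after={after}, change={change_pct:.1f}%")
```

Averaging per store first gives every store the same weight, regardless of how many weeks it has in each phase.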

## Answers to further interesting questions

### 1. Promotions  
#### 1.1 Is a promotion worthwhile in weeks with state holidays?

In [None]:
used_df = weekly_sales_with_store_info[(weekly_sales_with_store_info['IsStateHoliday'] == 1)]

fig = px.box(used_df.sort_values('IsPromo'), x='IsPromo', y='SalesPerOpenDay')
fig.show()
# CustomersPerOpenDay
fig = px.box(used_df.sort_values('IsPromo'), x='IsPromo', y='CustomersPerOpenDay')
fig.show()
# SalesPerCustomer
fig = px.box(used_df.sort_values('IsPromo'), x='IsPromo', y='SalesPerCustomer')
fig.show()

In [None]:
used_df = weekly_sales_with_store_info[(weekly_sales_with_store_info['IsStateHoliday'] == 1)]

median_values = used_df.groupby(['IsPromo']).agg({'SalesPerOpenDay': 'median', 'CustomersPerOpenDay': 'median', 'SalesPerCustomer': 'median'}).reset_index()
median_values['SalesPerOpenDayDiff%'] = median_values['SalesPerOpenDay'].pct_change() * 100
median_values['CustomersPerOpenDayDiff%'] = median_values['CustomersPerOpenDay'].pct_change() * 100
median_values['SalesPerCustomerDiff%'] = median_values['SalesPerCustomer'].pct_change() * 100

median_values_pivot = median_values.pivot_table(index='IsPromo', values=['SalesPerOpenDay', 'CustomersPerOpenDay', 'SalesPerCustomer', 'SalesPerOpenDayDiff%', 'CustomersPerOpenDayDiff%', 'SalesPerCustomerDiff%']).style.format("{:.2f}")
median_values_pivot

**Result**
- Promotions also have a positive effect in weeks with state holidays.
- Sales per day are 38% higher.
- Customers per day are 20% higher.
- Sales per customer are 16% higher.

### 5. DayOfWeek  
#### 5.1 Is a promotion on the weekend worthwhile?

In [None]:
df_train_day = df_train.reset_index()
df_train_day

In [None]:
# Create calendar week (CW), Month and Year columns
df_train_day.insert(df_train_day.columns.get_loc('Date') + 1, 'CW', df_train_day['Date'].dt.isocalendar().week)
df_train_day.insert(df_train_day.columns.get_loc('Date') + 2, 'Month', df_train_day['Date'].dt.month)
df_train_day.insert(df_train_day.columns.get_loc('Date') + 3, 'Year', df_train_day['Date'].dt.year)

# Create SalesPerCustomer column
df_train_day.insert(df_train_day.columns.get_loc('Sales') + 1, 'SalesPerCustomer', df_train_day['Sales'] / df_train_day['Customers'])
# Create customers per open day
df_train_day.insert(df_train_day.columns.get_loc('Customers') + 1, 'CustomersPerOpenDay', df_train_day['Customers'] / df_train_day['Open'])

df_train_day.insert(df_train_day.columns.get_loc('Promo') + 1, 'IsPromo', np.where(df_train_day['Promo'] > 0, 1, 0))
df_train_day.insert(df_train_day.columns.get_loc('StateHoliday') + 1, 'IsStateHoliday', np.where(df_train_day['StateHoliday'] != '0', 1, 0))
df_train_day.insert(df_train_day.columns.get_loc('SchoolHoliday') + 1, 'IsSchoolHoliday', np.where(df_train_day['SchoolHoliday'] > 0, 1, 0))


In [None]:
used_df = df_train_day

fig = px.box(used_df.sort_values('DayOfWeek'), x='DayOfWeek', y='Sales')
fig.show()
# CustomersPerOpenDay
fig = px.box(used_df.sort_values('DayOfWeek'), x='DayOfWeek', y='CustomersPerOpenDay')
fig.show()
# SalesPerCustomer
fig = px.box(used_df.sort_values('DayOfWeek'), x='DayOfWeek', y='SalesPerCustomer')
fig.show()

median_values = used_df.groupby('DayOfWeek').agg({'Sales': 'median', 'Customers': 'median', 'SalesPerCustomer': 'median'}).reset_index()

median_values_pivot = median_values.pivot_table(index='DayOfWeek', values=['Sales', 'Customers', 'SalesPerCustomer']).style.format("{:.2f}")
median_values_pivot

**Result:**
- The number of customers on a Saturday is 22% lower than during the week.
- Sales on a Saturday are about 20% lower than during the week.
- Sales per customer on a Saturday are about 2% lower than during the week.
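
The Saturday figures can be derived directly from the daily data instead of being read off the median table. The following is a minimal sketch under two assumptions that are not stated in the cells above: `DayOfWeek` is coded 1 (Monday) to 7 (Sunday), and only open days are compared. The helper name `saturday_vs_weekday` and the toy numbers are hypothetical.

```python
import pandas as pd


def saturday_vs_weekday(df, metric):
    """Percent difference of the Saturday median versus the Monday-to-Friday median."""
    open_days = df[df["Open"] == 1]  # assumption: closed days are excluded
    weekday_median = open_days.loc[open_days["DayOfWeek"].between(1, 5), metric].median()
    saturday_median = open_days.loc[open_days["DayOfWeek"] == 6, metric].median()
    return (saturday_median - weekday_median) / weekday_median * 100


# Toy data (hypothetical values; day 7 is closed and therefore ignored)
toy = pd.DataFrame({
    "DayOfWeek": [1, 2, 3, 4, 5, 6, 6, 7],
    "Open": [1, 1, 1, 1, 1, 1, 1, 0],
    "Sales": [100.0, 110.0, 90.0, 100.0, 100.0, 80.0, 80.0, 0.0],
})

diff_pct = saturday_vs_weekday(toy, "Sales")
print(f"Saturday vs. weekday sales: {diff_pct:.1f}%")
```

With `df_train_day`, the same function could be called for `Sales`, `Customers` and `SalesPerCustomer`.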

### 6. Assortment  
#### 6.1 What is the ratio of basic, extra and extended assortment?  


In [None]:
# Counts of assortment a, b and c and their percentage
fig = px.histogram(weekly_sales_with_store_info.sort_values('Assortment'), x='Assortment', title='Assortment counts')
fig.show()

fig = px.histogram(weekly_sales_with_store_info.sort_values('Assortment'), x='Assortment', title='Assortment percentage', histnorm='percent')
fig.show()

#### 6.2 What is the ratio of basic, extra and extended assortment in the individual store types?  

In [None]:
fig = px.histogram(weekly_sales_with_store_info.sort_values(by=['Assortment', 'StoreType']), x='Assortment', title='Assortment counts by StoreType', color='StoreType', barmode='group')
fig.show()

fig = px.histogram(weekly_sales_with_store_info.sort_values(by=['Assortment', 'StoreType']), x='Assortment', title='Assortment percentage by StoreType', histnorm='percent', color='StoreType', barmode='group')
fig.show()

#### 6.3 What is the average turnover of the individual assortments? 


In [None]:
used_df = weekly_sales_with_store_info

fig = px.box(used_df.sort_values('Assortment'), x='Assortment', y='Sales')
fig.show()

In [None]:
used_df = weekly_sales_with_store_info

fig = px.box(used_df.sort_values('Assortment'), x='Assortment', y='Sales', color='StoreType')
fig.show()

#### 6.4 How many promotion weeks are there in the individual assortments?


In [None]:
# How many promo weeks are there in each year for each assortment
promo_data = weekly_sales_with_store_info[weekly_sales_with_store_info['IsPromo'] == 1]

# Group by Year and Assortment, then count the distinct calendar weeks with a promotion
promo_weeks = promo_data.groupby(['Year', 'Assortment'])['CW'].nunique().reset_index(name='PromoWeeksCount')
promo_weeks

In [None]:
# How many promo weeks are there in each year for each StoreType
promo_data = weekly_sales_with_store_info[weekly_sales_with_store_info['IsPromo'] == 1]

# Group by Year and StoreType, then count the distinct calendar weeks with a promotion
promo_weeks = promo_data.groupby(['Year', 'StoreType'])['CW'].nunique().reset_index(name='PromoWeeksCount')
promo_weeks

In [None]:
# How many promo weeks are there in each month
promo_data = weekly_sales_with_store_info[weekly_sales_with_store_info['IsPromo'] == 1]

promo_weeks = promo_data.groupby(['Year', 'Month'])['CW'].nunique().reset_index(name='PromoWeeksCount')
promo_weeks

**Result**
- A promotion takes place every second week, for all assortment types.

#### 6.5 Do promotion weeks perform better in individual assortments than in others?

In [None]:
# To be comparable, only weeks with at least 5 open days are used
used_df = weekly_sales_with_store_info[(weekly_sales_with_store_info['Open'] >= 5)]
fig = px.box(used_df.sort_values('Assortment'), x='Assortment', y='SalesPerOpenDay', color='Promo')
fig.show()
# CustomersPerOpenDay
fig = px.box(used_df.sort_values('Assortment'), x='Assortment', y='CustomersPerOpenDay', color='Promo')
fig.show()
# SalesPerCustomer
fig = px.box(used_df.sort_values('Assortment'), x='Assortment', y='SalesPerCustomer', color='Promo')
fig.show()

median_values = used_df.groupby(['Assortment', 'Promo']).agg({'SalesPerOpenDay': 'median', 'CustomersPerOpenDay': 'median', 'SalesPerCustomer': 'median'}).reset_index()
median_values['SalesPerOpenDayDiff%'] = median_values.groupby('Assortment')['SalesPerOpenDay'].pct_change() * 100
median_values['CustomersPerOpenDayDiff%'] = median_values.groupby('Assortment')['CustomersPerOpenDay'].pct_change() * 100
median_values['SalesPerCustomerDiff%'] = median_values.groupby('Assortment')['SalesPerCustomer'].pct_change() * 100

median_values_pivot = median_values.pivot_table(index='Assortment', columns='Promo', values=['SalesPerOpenDay', 'CustomersPerOpenDay', 'SalesPerCustomer', 'SalesPerOpenDayDiff%', 'CustomersPerOpenDayDiff%', 'SalesPerCustomerDiff%']).style.format("{:.2f}")
median_values_pivot

**Result**
- Promotion weeks are three times more effective in assortments a and c than in assortment b.

### 8. Storetype  
#### 8.1 How many stores are there of which type?  

In [None]:
fig = px.histogram(weekly_sales_with_store_info.sort_values(by=['StoreType']), x='StoreType', title='StoreType counts')
fig.show()

fig = px.histogram(weekly_sales_with_store_info.sort_values(by=['StoreType']), x='StoreType', title='StoreType percentage', histnorm='percent')
fig.show()

fig = px.histogram(weekly_sales_with_store_info.sort_values(by=['Assortment', 'StoreType']), x='StoreType', title='StoreType counts by Assortment', color='Assortment', barmode='group')
fig.show()

fig = px.histogram(weekly_sales_with_store_info.sort_values(by=['Assortment', 'StoreType']), x='StoreType', title='StoreType percentage by Assortment', histnorm='percent', color='Assortment', barmode='group')
fig.show()

#### 8.2 How many promotion weeks are there in the individual store types? 

In [None]:
# How many promo weeks are there in each year for each StoreType
promo_data = weekly_sales_with_store_info[weekly_sales_with_store_info['IsPromo'] == 1]

# Group by Year and StoreType, then count the distinct calendar weeks with a promotion
promo_weeks = promo_data.groupby(['Year', 'StoreType'])['CW'].nunique().reset_index(name='PromoWeeksCount')
promo_weeks

#### 8.3 Do promotion weeks perform better in individual store types than in others?

In [None]:
# To be comparable, only weeks with at least 5 open days are used
used_df = weekly_sales_with_store_info[(weekly_sales_with_store_info['Open'] >= 5)]
fig = px.box(used_df.sort_values('StoreType'), x='StoreType', y='SalesPerOpenDay', color='Promo')
fig.show()
# CustomersPerOpenDay
fig = px.box(used_df.sort_values('StoreType'), x='StoreType', y='CustomersPerOpenDay', color='Promo')
fig.show()
# SalesPerCustomer
fig = px.box(used_df.sort_values('StoreType'), x='StoreType', y='SalesPerCustomer', color='Promo')
fig.show()

median_values = used_df.groupby(['StoreType', 'Promo']).agg({'SalesPerOpenDay': 'median', 'CustomersPerOpenDay': 'median', 'SalesPerCustomer': 'median'}).reset_index()

median_values['SalesPerOpenDayDiff%'] = median_values.groupby('StoreType')['SalesPerOpenDay'].pct_change() * 100
median_values['CustomersPerOpenDayDiff%'] = median_values.groupby('StoreType')['CustomersPerOpenDay'].pct_change() * 100
median_values['SalesPerCustomerDiff%'] = median_values.groupby('StoreType')['SalesPerCustomer'].pct_change() * 100

# Pivot table to show the median values of SalesPerOpenDay, CustomersPerOpenDay and SalesPerCustomer for each StoreType and Promo
median_values_pivot = median_values.pivot_table(index='StoreType', columns='Promo', values=['SalesPerOpenDay', 'CustomersPerOpenDay', 'SalesPerCustomer', 'SalesPerOpenDayDiff%', 'CustomersPerOpenDayDiff%', 'SalesPerCustomerDiff%']).style.format("{:.2f}")
median_values_pivot

**Result**
- Promotion weeks are three times more effective in StoreType a, c and d than in StoreType b.