# Question 1

In [243]:
import pandas as pd
import numpy as np

dat = pd.read_excel('ConcessionSalesData_ForClass.xlsx')
dat.head(5)

Unnamed: 0,food_game,UserID,UseCount,revenue,game_week,special_discount,special_item,FAMILYGROUPNAME,Master_Item,MENUITEMNAME,...,first_week_discount,Discount_HotDog,Discount_SouvCup,Discount_BtlWater,Discount_Peanuts,Discount_Nachos,Discount_Pretzel,Discount_Popcorn,sth_rev_game,total_product_rev_nonSTH
0,BAG PEANUTS_Game 1,3304107,1,4.726207,Game 1,STH Discount Only,Yes,SNACKS,20500003,BAG PEANUTS,...,No Discount,Yes,No,Yes,No,No,No,No,16441.58141,15296.65411
1,BAG PEANUTS_Game 1,3405989,1,4.73,Game 1,STH Discount Only,Yes,SNACKS,20500003,BAG PEANUTS,...,No Discount,Yes,No,Yes,No,No,No,No,16441.58141,15296.65411
2,BAG PEANUTS_Game 1,3302989,1,4.73,Game 1,STH Discount Only,Yes,SNACKS,20500003,BAG PEANUTS,...,No Discount,Yes,No,Yes,No,No,No,No,16441.58141,15296.65411
3,BAG PEANUTS_Game 1,3253641,1,4.5675,Game 1,STH Discount Only,Yes,SNACKS,20500003,BAG PEANUTS,...,No Discount,Yes,No,Yes,No,No,No,No,16441.58141,15296.65411
4,BAG PEANUTS_Game 1,3315665,1,4.726615,Game 1,STH Discount Only,Yes,SNACKS,20500003,BAG PEANUTS,...,No Discount,Yes,No,Yes,No,No,No,No,16441.58141,15296.65411


## Assumptions

- Assume that item preferences are similar between season ticket holders (STH) and non-season ticket holders, i.e. demand from non-STH customers can be predictive of demand from STH customers.
- Assume that the average price point for an item in each game is the weighted average of all actual prices, weighted by demand.
- Assume that the occurrence of a discount on one item does not depend on the occurrence of discounts on other items.
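The second assumption can be illustrated with a small sketch. The prices and quantities below are hypothetical and are not taken from the dataset; the point is only that a demand-weighted average price equals total revenue divided by total units sold.

```python
import numpy as np

# Hypothetical prices and quantities for one item in one game (not from the dataset)
prices = np.array([4.73, 4.50, 2.50])   # actual price paid per unit
quantities = np.array([60, 30, 10])     # units sold at each price

# Demand-weighted average price: sum(q * p) / sum(q)
weighted_price = np.average(prices, weights=quantities)

# The same number is total revenue divided by total units sold
revenue_per_unit = (prices * quantities).sum() / quantities.sum()
print(weighted_price, revenue_per_unit)
```

Because the low price applies to only 10 of the 100 units, it pulls the average down far less than a simple mean of the three prices would.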

### Peanuts

In [244]:
peanuts = dat.loc[dat['MENUITEMNAME'] == 'BAG PEANUTS', :]
peanuts.head(5)

Unnamed: 0,food_game,UserID,UseCount,revenue,game_week,special_discount,special_item,FAMILYGROUPNAME,Master_Item,MENUITEMNAME,...,first_week_discount,Discount_HotDog,Discount_SouvCup,Discount_BtlWater,Discount_Peanuts,Discount_Nachos,Discount_Pretzel,Discount_Popcorn,sth_rev_game,total_product_rev_nonSTH
0,BAG PEANUTS_Game 1,3304107,1,4.726207,Game 1,STH Discount Only,Yes,SNACKS,20500003,BAG PEANUTS,...,No Discount,Yes,No,Yes,No,No,No,No,16441.58141,15296.65411
1,BAG PEANUTS_Game 1,3405989,1,4.73,Game 1,STH Discount Only,Yes,SNACKS,20500003,BAG PEANUTS,...,No Discount,Yes,No,Yes,No,No,No,No,16441.58141,15296.65411
2,BAG PEANUTS_Game 1,3302989,1,4.73,Game 1,STH Discount Only,Yes,SNACKS,20500003,BAG PEANUTS,...,No Discount,Yes,No,Yes,No,No,No,No,16441.58141,15296.65411
3,BAG PEANUTS_Game 1,3253641,1,4.5675,Game 1,STH Discount Only,Yes,SNACKS,20500003,BAG PEANUTS,...,No Discount,Yes,No,Yes,No,No,No,No,16441.58141,15296.65411
4,BAG PEANUTS_Game 1,3315665,1,4.726615,Game 1,STH Discount Only,Yes,SNACKS,20500003,BAG PEANUTS,...,No Discount,Yes,No,Yes,No,No,No,No,16441.58141,15296.65411


In [245]:
peanuts.columns

Index(['food_game', 'UserID', 'UseCount', 'revenue', 'game_week',
       'special_discount', 'special_item', 'FAMILYGROUPNAME', 'Master_Item',
       'MENUITEMNAME', 'PRICES', 'actual_discount', 'actual_price',
       'Discount Type', 'Discount Percentage', 'first_week_discount',
       'Discount_HotDog', 'Discount_SouvCup', 'Discount_BtlWater',
       'Discount_Peanuts', 'Discount_Nachos', 'Discount_Pretzel',
       'Discount_Popcorn', 'sth_rev_game', 'total_product_rev_nonSTH'],
      dtype='object')

In [246]:
# Weight prices according to their demand.
# Prices for CL are significantly lower than GA / STH.
# However, the number of CL is also significantly lower than GA / STH.
# We want to weight each actual price by the demand for the item at that price point.
# This mitigates the class imbalance issue.
# Note: as written, sum(UseCount * revenue) / sum(UseCount ** 2) weights each
# discount type's average price by its squared demand.

weights = peanuts.groupby(by = ['game_week', 'Discount Type'])[['UseCount', 'revenue']].sum(numeric_only=True).reset_index()
weights['weighted_sums'] = weights['UseCount'] * weights['revenue']
weights['uc2'] = weights['UseCount'] ** 2
weights = weights.groupby(by = ['game_week'])[['weighted_sums', 'uc2']].sum().reset_index()
weights['weighted_actual_price'] = weights['weighted_sums'] / weights['uc2']
demand = peanuts.groupby(by = ['game_week'])['UseCount'].sum(numeric_only=True).reset_index()
peanut_demand_price = pd.merge(left = weights, right = demand, on = 'game_week')
peanut_demand_price.drop(labels=['weighted_sums', 'uc2'], axis = 1, inplace=True)
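The same aggregation is repeated for every item in this notebook, so it could be wrapped in a function. The sketch below is a hypothetical helper (`demand_and_price` is not part of the original analysis) and it uses a simpler formula, total revenue divided by total units per game. That is a different formula from the one in the cell above and would give different numbers; it is shown only to illustrate the structure.

```python
import pandas as pd

def demand_and_price(item_df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one row per game with total demand and average price paid."""
    per_game = (
        item_df.groupby('game_week')[['UseCount', 'revenue']]
        .sum()
        .reset_index()
    )
    # Total revenue / total units = average price weighted by units sold
    per_game['weighted_actual_price'] = per_game['revenue'] / per_game['UseCount']
    return per_game[['game_week', 'weighted_actual_price', 'UseCount']]

# Toy data, not from ConcessionSalesData_ForClass.xlsx
toy = pd.DataFrame({
    'game_week': ['Game 1', 'Game 1', 'Game 2'],
    'UseCount': [1, 3, 2],
    'revenue': [4.0, 9.0, 5.0],
})
toy_result = demand_and_price(toy)
print(toy_result)
```

With a helper like this, each item section would reduce to a filter on `MENUITEMNAME` followed by a single call.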


In [247]:
peanuts

Unnamed: 0,food_game,UserID,UseCount,revenue,game_week,special_discount,special_item,FAMILYGROUPNAME,Master_Item,MENUITEMNAME,...,first_week_discount,Discount_HotDog,Discount_SouvCup,Discount_BtlWater,Discount_Peanuts,Discount_Nachos,Discount_Pretzel,Discount_Popcorn,sth_rev_game,total_product_rev_nonSTH
0,BAG PEANUTS_Game 1,3304107,1,4.726207,Game 1,STH Discount Only,Yes,SNACKS,20500003,BAG PEANUTS,...,No Discount,Yes,No,Yes,No,No,No,No,16441.58141,15296.654110
1,BAG PEANUTS_Game 1,3405989,1,4.730000,Game 1,STH Discount Only,Yes,SNACKS,20500003,BAG PEANUTS,...,No Discount,Yes,No,Yes,No,No,No,No,16441.58141,15296.654110
2,BAG PEANUTS_Game 1,3302989,1,4.730000,Game 1,STH Discount Only,Yes,SNACKS,20500003,BAG PEANUTS,...,No Discount,Yes,No,Yes,No,No,No,No,16441.58141,15296.654110
3,BAG PEANUTS_Game 1,3253641,1,4.567500,Game 1,STH Discount Only,Yes,SNACKS,20500003,BAG PEANUTS,...,No Discount,Yes,No,Yes,No,No,No,No,16441.58141,15296.654110
4,BAG PEANUTS_Game 1,3315665,1,4.726615,Game 1,STH Discount Only,Yes,SNACKS,20500003,BAG PEANUTS,...,No Discount,Yes,No,Yes,No,No,No,No,16441.58141,15296.654110
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
699,BAG PEANUTS_Game 8,3315509,1,4.706897,Game 8,STH Discount Only,Yes,SNACKS,20500003,BAG PEANUTS,...,No Discount,Yes,Yes,Yes,No,No,No,Yes,13059.93456,9784.410704
700,BAG PEANUTS_Game 8,3315665,1,4.500000,Game 8,STH Discount Only,Yes,SNACKS,20500003,BAG PEANUTS,...,No Discount,Yes,Yes,Yes,No,No,No,Yes,13059.93456,9784.410704
701,BAG PEANUTS_Game 8,3310597,1,4.639535,Game 8,STH Discount Only,Yes,SNACKS,20500003,BAG PEANUTS,...,No Discount,Yes,Yes,Yes,No,No,No,Yes,13059.93456,9784.410704
702,BAG PEANUTS_Game 8,3500449,1,4.500000,Game 8,STH Discount Only,Yes,SNACKS,20500003,BAG PEANUTS,...,No Discount,Yes,Yes,Yes,No,No,No,Yes,13059.93456,9784.410704


In [248]:
peanut_demand_price_controlled = pd.merge(left=peanut_demand_price, right=peanuts.loc[:, ['game_week', 'total_product_rev_nonSTH', 'Discount_HotDog', 'Discount_SouvCup', 'Discount_BtlWater',
        'Discount_Nachos', 'Discount_Pretzel',
       'Discount_Popcorn']], on = 'game_week', how = 'left').drop_duplicates('game_week')

peanut_demand_price_controlled.drop(labels=['game_week'], axis = 1, inplace=True)
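The merge above joins against every sales row and then drops duplicates, which is why the resulting index jumps (0, 105, 267, ...). A possible alternative, sketched here on toy data with hypothetical values, is to reduce the covariates to one row per game first so the merge is one-to-one and the index stays clean.

```python
import pandas as pd

# Toy stand-ins for the per-row sales data and the per-game summary (hypothetical values)
sales = pd.DataFrame({
    'game_week': ['Game 1', 'Game 1', 'Game 2'],
    'total_product_rev_nonSTH': [100.0, 100.0, 80.0],
    'Discount_HotDog': ['Yes', 'Yes', 'No'],
})
per_game = pd.DataFrame({'game_week': ['Game 1', 'Game 2'], 'UseCount': [2, 1]})

# Game-level covariates are constant within a game, so keep one row per game
game_covariates = sales.drop_duplicates('game_week')

# validate='one_to_one' raises if either side has repeated game_week keys
merged = per_game.merge(game_covariates, on='game_week', how='left', validate='one_to_one')
print(merged)
```

This assumes the covariates really are constant within each game, which appears to hold in the rows displayed earlier.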

In [249]:
peanut_demand_price_controlled

Unnamed: 0,weighted_actual_price,UseCount,total_product_rev_nonSTH,Discount_HotDog,Discount_SouvCup,Discount_BtlWater,Discount_Nachos,Discount_Pretzel,Discount_Popcorn
0,4.639045,105,15296.65411,Yes,No,Yes,No,No,No
105,2.629261,176,13869.75,No,No,No,Yes,No,No
267,4.640434,94,14704.79921,No,Yes,No,No,Yes,No
361,4.649035,105,11117.79609,Yes,No,Yes,No,No,No
466,4.646859,66,9825.807317,No,Yes,No,No,No,Yes
532,4.630899,73,10036.78187,No,No,Yes,No,Yes,No
605,4.651225,41,10716.29979,Yes,Yes,No,No,No,No
646,4.553264,58,9784.410704,Yes,Yes,Yes,No,No,Yes


In [250]:
from sklearn.preprocessing import LabelEncoder
from copy import deepcopy

le = LabelEncoder()
# df_peanut = pd.get_dummies(peanut_demand_price_controlled, columns=['game_week'])
df_peanut = deepcopy(peanut_demand_price_controlled)

# Encode the Yes/No discount flags as integers (LabelEncoder sorts classes, so No -> 0, Yes -> 1)
discount_cols = ['Discount_HotDog', 'Discount_SouvCup', 'Discount_BtlWater',
                 'Discount_Nachos', 'Discount_Pretzel', 'Discount_Popcorn']
df_peanut[discount_cols] = df_peanut[discount_cols].apply(le.fit_transform)

# Take natural logs so the price coefficient can be read as an elasticity
df_peanut['weighted_actual_price'] = np.log(df_peanut['weighted_actual_price'])
df_peanut['UseCount'] = np.log(df_peanut['UseCount'])
# df_peanut.drop(labels='game_week_Game 1', axis = 1, inplace=True)
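For plain Yes/No flags, the encoding can also be done with pandas alone. The sketch below uses made-up flag values; it is an alternative to `LabelEncoder`, not what the cell above does. One reason to consider it: `LabelEncoder` assigns codes by sorted class order, so a column that happened to contain only `Yes` would be encoded as 0, whereas an explicit comparison always maps `Yes` to 1.

```python
import pandas as pd

# Hypothetical discount flags for three games
flags = pd.DataFrame({
    'Discount_HotDog': ['Yes', 'No', 'Yes'],
    'Discount_Nachos': ['No', 'No', 'Yes'],
})

# True where the flag is 'Yes', then cast to 1/0
encoded = flags.eq('Yes').astype(int)
print(encoded)
```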


In [251]:
df_peanut

Unnamed: 0,weighted_actual_price,UseCount,total_product_rev_nonSTH,Discount_HotDog,Discount_SouvCup,Discount_BtlWater,Discount_Nachos,Discount_Pretzel,Discount_Popcorn
0,1.534509,4.65396,15296.65411,1,0,1,0,0,0
105,0.966703,5.170484,13869.75,0,0,0,1,0,0
267,1.534808,4.543295,14704.79921,0,1,0,0,1,0
361,1.53666,4.65396,11117.79609,1,0,1,0,0,0
466,1.536191,4.189655,9825.807317,0,1,0,0,0,1
532,1.532751,4.290459,10036.78187,0,0,1,0,1,0
605,1.537131,3.713572,10716.29979,1,1,0,0,0,0
646,1.515844,4.060443,9784.410704,1,1,1,0,0,1


### Modeling

In [252]:
import statsmodels.api as sm

X = df_peanut.drop(labels=['UseCount', 'Discount_HotDog', 'Discount_SouvCup', 'Discount_BtlWater',
        'Discount_Nachos', 'Discount_Pretzel',
       'Discount_Popcorn'], axis = 1)
X = sm.add_constant(X)
y = df_peanut['UseCount']

m_peanut = sm.OLS(y, X).fit()
print('Price elasticity for peanuts is', abs(m_peanut.params['weighted_actual_price']))

Price elasticity for peanuts is 1.1656066859956145
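The reason the price coefficient is read as an elasticity is that both demand and price are in natural logs: in a model ln(Q) = a + b ln(P), the slope b is the percentage change in Q per one percent change in P. The following sketch checks this on synthetic data with a known elasticity; it uses plain NumPy least squares rather than `statsmodels`, and all numbers are made up.

```python
import numpy as np

# Synthetic constant-elasticity demand curve: Q = 200 * P ** (-1.5)
true_elasticity = -1.5
price = np.array([2.5, 3.0, 4.0, 4.5, 5.0])
quantity = 200.0 * price ** true_elasticity

# Regress ln(Q) on a constant and ln(P)
X = np.column_stack([np.ones_like(price), np.log(price)])
coef, *_ = np.linalg.lstsq(X, np.log(quantity), rcond=None)
intercept, slope = coef

print('Recovered elasticity:', slope)
```

With noise-free data the slope recovers the true elasticity; with only eight games per item, as here, real estimates will be much less precise.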


### Bavarian Pretzel

In [253]:
# Extract item
bav_pret = dat.loc[dat['MENUITEMNAME'] == 'BAVARIAN PRETZEL', :]

# Sum demand and revenue grouped by game_week and discount type
weights = bav_pret.groupby(by = ['game_week', 'Discount Type'])[['UseCount', 'revenue']].sum(numeric_only=True).reset_index()

# Weighted average of price, weighted on demand
weights['weighted_sums'] = weights['UseCount'] * weights['revenue']
weights['uc2'] = weights['UseCount'] ** 2
weights = weights.groupby(by = ['game_week'])[['weighted_sums', 'uc2']].sum().reset_index()
weights['weighted_actual_price'] = weights['weighted_sums'] / weights['uc2']

# Obtain total demand by game
demand = bav_pret.groupby(by = ['game_week'])['UseCount'].sum(numeric_only=True).reset_index()
bav_demand_price = pd.merge(left = weights, right = demand, on = 'game_week')
bav_demand_price.drop(labels=['weighted_sums', 'uc2'], axis = 1, inplace=True)


In [254]:
# Add covariates to df
bav_demand_price_controlled = pd.merge(left=bav_demand_price, right=bav_pret.loc[:, ['game_week', 'total_product_rev_nonSTH', 'Discount_HotDog', 'Discount_SouvCup', 'Discount_BtlWater',
       'Discount_Peanuts', 'Discount_Nachos', 
       'Discount_Popcorn']], on = 'game_week', how = 'left').drop_duplicates('game_week')

bav_demand_price_controlled.drop(labels=['game_week'], axis = 1, inplace=True)

le = LabelEncoder()
df_bav = deepcopy(bav_demand_price_controlled)

# Label encoding 
df_bav.loc[:, ['Discount_HotDog', 'Discount_SouvCup', 'Discount_BtlWater',
       'Discount_Peanuts', 'Discount_Nachos', 
       'Discount_Popcorn']] = df_bav.loc[:, ['Discount_HotDog', 'Discount_SouvCup', 'Discount_BtlWater',
       'Discount_Peanuts', 'Discount_Nachos', 
       'Discount_Popcorn']].apply(le.fit_transform)

# Take ln
df_bav['weighted_actual_price'] = np.log(df_bav['weighted_actual_price'])
df_bav['UseCount'] = np.log(df_bav['UseCount'])


In [255]:
# Fit linear regression
X = df_bav.drop(labels=['UseCount', 'Discount_HotDog', 'Discount_SouvCup', 'Discount_BtlWater',
        'Discount_Nachos', 'Discount_Peanuts',
       'Discount_Popcorn'], axis = 1)
X = sm.add_constant(X)
y = df_bav['UseCount']

m_bav = sm.OLS(y, X).fit()
print('Price elasticity for bavarian pretzels is', abs(m_bav.params['weighted_actual_price']))

Price elasticity for bavarian pretzels is 1.320180005868103


### Nachos

In [256]:
# Extract item
nacho = dat.loc[dat['MENUITEMNAME'] == 'NACHOS', :]

# Sum demand and revenue grouped by game_week and discount type
weights = nacho.groupby(by = ['game_week', 'Discount Type'])[['UseCount', 'revenue']].sum(numeric_only=True).reset_index()

# Weighted average of price, weighted on demand
weights['weighted_sums'] = weights['UseCount'] * weights['revenue']
weights['uc2'] = weights['UseCount'] ** 2
weights = weights.groupby(by = ['game_week'])[['weighted_sums', 'uc2']].sum().reset_index()
weights['weighted_actual_price'] = weights['weighted_sums'] / weights['uc2']

# Obtain total demand by game
demand = nacho.groupby(by = ['game_week'])['UseCount'].sum(numeric_only=True).reset_index()
nacho_demand_price = pd.merge(left = weights, right = demand, on = 'game_week')
nacho_demand_price.drop(labels=['weighted_sums', 'uc2'], axis = 1, inplace=True)


In [257]:
# Add covariates to df
nacho_demand_price_controlled = pd.merge(left=nacho_demand_price, right=nacho.loc[:, ['game_week', 'total_product_rev_nonSTH', 'Discount_HotDog', 'Discount_SouvCup', 'Discount_BtlWater',
       'Discount_Peanuts',  'Discount_Pretzel',
       'Discount_Popcorn']], on = 'game_week', how = 'left').drop_duplicates('game_week')

nacho_demand_price_controlled.drop(labels=['game_week'], axis = 1, inplace=True)

# Label encoding
le = LabelEncoder()
df_nacho = deepcopy(nacho_demand_price_controlled)

df_nacho.loc[:, ['Discount_HotDog', 'Discount_SouvCup', 'Discount_BtlWater',
       'Discount_Peanuts', 'Discount_Pretzel',
       'Discount_Popcorn']] = df_nacho.loc[:, ['Discount_HotDog', 'Discount_SouvCup', 'Discount_BtlWater',
       'Discount_Peanuts', 'Discount_Pretzel',
       'Discount_Popcorn']].apply(le.fit_transform)

# Take ln
df_nacho['weighted_actual_price'] = np.log(df_nacho['weighted_actual_price'])
df_nacho['UseCount'] = np.log(df_nacho['UseCount'])


In [258]:
# Fit linear regression
X = df_nacho.drop(labels=['UseCount', 'Discount_HotDog', 'Discount_SouvCup', 'Discount_BtlWater',
       'Discount_Peanuts', 'Discount_Pretzel',
       'Discount_Popcorn'], axis = 1)
X = sm.add_constant(X)
y = df_nacho['UseCount']

m_nacho = sm.OLS(y, X).fit()
print('Price elasticity for nachos is', m_nacho.params['weighted_actual_price'])

Price elasticity for nachos is -2.3024067051594193


### Souvenir Popcorn

In [259]:
# Extract item
souv_pop = dat.loc[dat['MENUITEMNAME'] == 'SOUV POPCORN', :]

# Sum demand and revenue grouped by game_week and discount type
weights = souv_pop.groupby(by = ['game_week', 'Discount Type'])[['UseCount', 'revenue']].sum(numeric_only=True).reset_index()

# Weighted average of price, weighted on demand
weights['weighted_sums'] = weights['UseCount'] * weights['revenue']
weights['uc2'] = weights['UseCount'] ** 2
weights = weights.groupby(by = ['game_week'])[['weighted_sums', 'uc2']].sum().reset_index()
weights['weighted_actual_price'] = weights['weighted_sums'] / weights['uc2']

# Obtain total demand by game
demand = souv_pop.groupby(by = ['game_week'])['UseCount'].sum(numeric_only=True).reset_index()
souv_pop_demand_price = pd.merge(left = weights, right = demand, on = 'game_week')
souv_pop_demand_price.drop(labels=['weighted_sums', 'uc2'], axis = 1, inplace=True)


In [260]:
# Add covariates to df
souv_pop_demand_price_controlled = pd.merge(left=souv_pop_demand_price, right=souv_pop.loc[:, ['game_week', 'total_product_rev_nonSTH', 'Discount_HotDog', 'Discount_SouvCup', 'Discount_BtlWater',
       'Discount_Peanuts', 'Discount_Nachos', 'Discount_Pretzel']], on = 'game_week', how = 'left').drop_duplicates('game_week')

souv_pop_demand_price_controlled.drop(labels=['game_week'], axis = 1, inplace=True)

# Label Encoding
le = LabelEncoder()
df_souv_pop = deepcopy(souv_pop_demand_price_controlled)

df_souv_pop.loc[:, ['Discount_HotDog', 'Discount_SouvCup', 'Discount_BtlWater',
       'Discount_Peanuts', 'Discount_Nachos', 'Discount_Pretzel']] = df_souv_pop.loc[:, ['Discount_HotDog', 'Discount_SouvCup', 'Discount_BtlWater',
       'Discount_Peanuts', 'Discount_Nachos', 'Discount_Pretzel']].apply(le.fit_transform)

# Take ln
df_souv_pop['weighted_actual_price'] = np.log(df_souv_pop['weighted_actual_price'])
df_souv_pop['UseCount'] = np.log(df_souv_pop['UseCount'])


In [261]:
# Fit linear regression
X = df_souv_pop.drop(labels=['UseCount', 'Discount_HotDog', 'Discount_SouvCup', 'Discount_BtlWater',
       'Discount_Peanuts', 'Discount_Nachos', 'Discount_Pretzel'], axis = 1)
X = sm.add_constant(X)
y = df_souv_pop['UseCount']

m_souv_pop = sm.OLS(y, X).fit()
print('Price elasticity for souvenir popcorn is', abs(m_souv_pop.params['weighted_actual_price']))

Price elasticity for souvenir popcorn is 3.4572599586704804


### Hot Dog

In [262]:
# Extract Item
hotdog = dat.loc[dat['MENUITEMNAME'] == 'HOT DOG', :]

# Sum demand and revenue grouped by game_week and discount type
weights = hotdog.groupby(by = ['game_week', 'Discount Type'])[['UseCount', 'revenue']].sum(numeric_only=True).reset_index()

# Weighted average of price, weighted on demand
weights['weighted_sums'] = weights['UseCount'] * weights['revenue']
weights['uc2'] = weights['UseCount'] ** 2
weights = weights.groupby(by = ['game_week'])[['weighted_sums', 'uc2']].sum().reset_index()
weights['weighted_actual_price'] = weights['weighted_sums'] / weights['uc2']

# Obtain demand by week
demand = hotdog.groupby(by = ['game_week'])['UseCount'].sum(numeric_only=True).reset_index()
hotdog_demand_price = pd.merge(left = weights, right = demand, on = 'game_week')
hotdog_demand_price.drop(labels=['weighted_sums', 'uc2'], axis = 1, inplace=True)



In [263]:
# Add covariates to df
hotdog_demand_price_controlled = pd.merge(left=hotdog_demand_price, right=hotdog.loc[:, ['game_week', 'total_product_rev_nonSTH', 'Discount_SouvCup', 'Discount_BtlWater',
       'Discount_Peanuts', 'Discount_Nachos', 'Discount_Pretzel',
       'Discount_Popcorn']], on = 'game_week', how = 'left').drop_duplicates('game_week')

hotdog_demand_price_controlled.drop(labels=['game_week'], axis = 1, inplace=True)

# Label encoding
le = LabelEncoder()
df_hotdog = deepcopy(hotdog_demand_price_controlled)

df_hotdog.loc[:, ['Discount_SouvCup', 'Discount_BtlWater',
       'Discount_Peanuts', 'Discount_Nachos', 'Discount_Pretzel',
       'Discount_Popcorn']] = df_hotdog.loc[:, ['Discount_SouvCup', 'Discount_BtlWater',
       'Discount_Peanuts', 'Discount_Nachos', 'Discount_Pretzel',
       'Discount_Popcorn']].apply(le.fit_transform)
# Take ln
df_hotdog['weighted_actual_price'] = np.log(df_hotdog['weighted_actual_price'])
df_hotdog['UseCount'] = np.log(df_hotdog['UseCount'])


In [264]:
# Fit linear regression
X = df_hotdog.drop(labels=['UseCount', 'Discount_SouvCup', 'Discount_BtlWater',
       'Discount_Peanuts', 'Discount_Nachos', 'Discount_Pretzel',
       'Discount_Popcorn'], axis = 1)
X = sm.add_constant(X)
y = df_hotdog['UseCount']

m_hotdog = sm.OLS(y, X).fit()
print('Price elasticity for hot dog is', abs(m_hotdog.params['weighted_actual_price']))

Price elasticity for hot dog is 2.6657797963709333


### Bottled Water (non 1L)

In [265]:
# Extract item
btlwater = dat.loc[dat['MENUITEMNAME'] == 'BTL DEJA BLUE', :]

# Sum of demand and revenue grouped by game week and discount type
weights = btlwater.groupby(by = ['game_week', 'Discount Type'])[['UseCount', 'revenue']].sum(numeric_only=True).reset_index()

# Weighted average of price, weighted on demand
weights['weighted_sums'] = weights['UseCount'] * weights['revenue']
weights['uc2'] = weights['UseCount'] ** 2
weights = weights.groupby(by = ['game_week'])[['weighted_sums', 'uc2']].sum().reset_index()
weights['weighted_actual_price'] = weights['weighted_sums'] / weights['uc2']

# Obtain total demand for item by week
demand = btlwater.groupby(by = ['game_week'])['UseCount'].sum(numeric_only=True).reset_index()
btlwater_demand_price = pd.merge(left = weights, right = demand, on = 'game_week')
btlwater_demand_price.drop(labels=['weighted_sums', 'uc2'], axis = 1, inplace=True)


In [266]:
# Add covariates to df
btlwater_demand_price_controlled = pd.merge(left=btlwater_demand_price, right=btlwater.loc[:, ['game_week', 'total_product_rev_nonSTH', 'Discount_HotDog', 'Discount_SouvCup', 
       'Discount_Peanuts', 'Discount_Nachos', 'Discount_Pretzel',
       'Discount_Popcorn']], on = 'game_week', how = 'left').drop_duplicates('game_week')

btlwater_demand_price_controlled.drop(labels=['game_week'], axis = 1, inplace=True)

# Label encoding
le = LabelEncoder()
df_btlwater = deepcopy(btlwater_demand_price_controlled)

df_btlwater.loc[:, ['Discount_HotDog', 'Discount_SouvCup', 
       'Discount_Peanuts', 'Discount_Nachos', 'Discount_Pretzel',
       'Discount_Popcorn']] = df_btlwater.loc[:, ['Discount_HotDog', 'Discount_SouvCup', 
       'Discount_Peanuts', 'Discount_Nachos', 'Discount_Pretzel',
       'Discount_Popcorn']].apply(le.fit_transform)

# Take ln
df_btlwater['weighted_actual_price'] = np.log(df_btlwater['weighted_actual_price'])
df_btlwater['UseCount'] = np.log(df_btlwater['UseCount'])


In [267]:
df_btlwater

Unnamed: 0,weighted_actual_price,UseCount,total_product_rev_nonSTH,Discount_HotDog,Discount_SouvCup,Discount_Peanuts,Discount_Nachos,Discount_Pretzel,Discount_Popcorn
0,0.966102,6.926577,45272.22256,1,0,0,0,0,0
744,1.519362,5.602119,38448.61573,0,0,1,1,0,0
1015,1.519889,5.877736,31821.46681,0,1,0,0,1,0
1372,0.968717,6.873164,36379.70159,1,0,0,0,0,0
2135,1.517936,5.429346,21822.45198,0,1,0,0,0,1
2363,0.966434,6.660575,27793.35,0,0,0,0,1,0
3013,1.515445,5.036953,21659.5252,1,1,0,0,0,0
3167,0.930445,6.152733,23320.5,1,1,0,0,0,1


In [268]:
# Fit linear regression
X = df_btlwater.drop(labels=['UseCount', 'Discount_HotDog', 'Discount_SouvCup', 
       'Discount_Peanuts', 'Discount_Nachos', 'Discount_Pretzel',
       'Discount_Popcorn'], axis = 1)
X = sm.add_constant(X)
y = df_btlwater['UseCount']

m_btlwater = sm.OLS(y, X).fit()
print('Price elasticity for bottled water is', abs(m_btlwater.params['weighted_actual_price']))

Price elasticity for bottled water is 1.7938935329592356


### Souvenir Soda (32 oz)

In [269]:
# Extract item
souv_soda = dat.loc[dat['MENUITEMNAME'] == 'SOUV CUP 32', :]

# Obtain total revenue and demand by game week and discount type
weights = souv_soda.groupby(by = ['game_week', 'Discount Type'])[['UseCount', 'revenue']].sum(numeric_only=True).reset_index()

# Weighted average of price, weighted on demand
weights['weighted_sums'] = weights['UseCount'] * weights['revenue']
weights['uc2'] = weights['UseCount'] ** 2
weights = weights.groupby(by = ['game_week'])[['weighted_sums', 'uc2']].sum().reset_index()
weights['weighted_actual_price'] = weights['weighted_sums'] / weights['uc2']

# Obtain total demand of item by game_week
demand = souv_soda.groupby(by = ['game_week'])['UseCount'].sum(numeric_only=True).reset_index()
souv_soda_demand_price = pd.merge(left = weights, right = demand, on = 'game_week')
souv_soda_demand_price.drop(labels=['weighted_sums', 'uc2'], axis = 1, inplace=True)


In [270]:
# Add covariates to df
souv_soda_demand_price_controlled = pd.merge(left=souv_soda_demand_price, right=souv_soda.loc[:, ['game_week', 'total_product_rev_nonSTH', 'Discount_HotDog',  'Discount_BtlWater',
       'Discount_Peanuts', 'Discount_Nachos', 'Discount_Pretzel',
       'Discount_Popcorn']], on = 'game_week', how = 'left').drop_duplicates('game_week')

souv_soda_demand_price_controlled.drop(labels=['game_week'], axis = 1, inplace=True)

# Label encoding
le = LabelEncoder()
df_souv_soda = deepcopy(souv_soda_demand_price_controlled)

df_souv_soda.loc[:, ['Discount_HotDog',  'Discount_BtlWater',
       'Discount_Peanuts', 'Discount_Nachos', 'Discount_Pretzel',
       'Discount_Popcorn']] = df_souv_soda.loc[:, ['Discount_HotDog', 'Discount_BtlWater',
       'Discount_Peanuts', 'Discount_Nachos', 'Discount_Pretzel',
       'Discount_Popcorn']].apply(le.fit_transform)

# Take ln
df_souv_soda['weighted_actual_price'] = np.log(df_souv_soda['weighted_actual_price'])
df_souv_soda['UseCount'] = np.log(df_souv_soda['UseCount'])


In [271]:
# Fit linear regression
X = df_souv_soda.drop(labels=['UseCount', 'Discount_HotDog',  'Discount_BtlWater',
       'Discount_Peanuts', 'Discount_Nachos', 'Discount_Pretzel',
       'Discount_Popcorn'], axis = 1)
X = sm.add_constant(X)
y = df_souv_soda['UseCount']

m_souv_soda = sm.OLS(y, X).fit()
print('Price elasticity for soda is', abs(m_souv_soda.params['weighted_actual_price']))

Price elasticity for soda is 1.5063282469765022


In [None]:
d = [m_nacho.params, m_souv_pop.params, m_hotdog.params, m_peanut.params, m_bav.params, m_btlwater.params, m_souv_soda.params]
for i in d:
    s = ""
    x = pd.DataFrame(i).reset_index()
    x.columns = ['index', '0']
    for r in range(len(x)):
        if r == 0:
            s = s + str(round(x.loc[:, '0'][r],5)) + " + "
        elif r == len(x) - 1:
            s = s + str(round(x.loc[:, '0'][r],5)) + " * " + x.loc[:, 'index'][r]
        else:
            s = s + str(round(x.loc[:, '0'][r],5)) + " * " + x.loc[:, 'index'][r] + " + "
    print(s)

        

5.53572 + -2.30241 * weighted_actual_price + 0.00029 * total_product_rev_nonSTH
11.00174 + -3.45726 * weighted_actual_price + -6e-05 * total_product_rev_nonSTH
8.19216 + -2.66578 * weighted_actual_price + 4e-05 * total_product_rev_nonSTH
5.00104 + -1.16561 * weighted_actual_price + 9e-05 * total_product_rev_nonSTH
4.66237 + -1.32018 * weighted_actual_price + 9e-05 * total_product_rev_nonSTH
7.29758 + -1.79389 * weighted_actual_price + 3e-05 * total_product_rev_nonSTH
7.09572 + -1.50633 * weighted_actual_price + 4e-05 * total_product_rev_nonSTH
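Each printed line is a fitted equation from Question 1, where `UseCount` and `weighted_actual_price` are both in natural logs. The string building could be written more compactly; the sketch below uses a hypothetical helper (`format_equation`) and made-up coefficients, and assumes the intercept is named `const` as `sm.add_constant` names it.

```python
import pandas as pd

def format_equation(params: pd.Series, digits: int = 5) -> str:
    """Hypothetical helper: render fitted coefficients as 'b0 + b1 * x1 + ...'."""
    terms = []
    for name, value in params.items():
        coef = str(round(float(value), digits))
        # The intercept has no variable attached to it
        terms.append(coef if name == 'const' else f"{coef} * {name}")
    return " + ".join(terms)

# Made-up coefficients in the same layout as a fitted params Series
example = pd.Series({
    'const': 5.0,
    'weighted_actual_price': -2.5,
    'total_product_rev_nonSTH': 0.25,
})
equation = format_equation(example)
print(equation)
```

Applied to the fitted models, this would be called as `format_equation(m_nacho.params)` and so on for each item.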


# Question 2

### Note

Here, we fit a new model for every item to find the relationship between demand for that item and its price, discounts on other items, and total revenue from non-STH customers. We include the weighted actual price and total non-STH revenue because both are likely to affect demand for an item, and controlling for them should give a better estimate of the effect of discounts on other items. In each dataframe, the item column names the discount variable in question, and the effect column is the estimated change in demand (in raw units) when that discount is active.
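To make the interpretation of the effect column concrete, here is a minimal sketch on synthetic data. Demand is constructed so that a discount on another item adds exactly 20 units after controlling for price, and the regression coefficient on the 0/1 discount flag recovers that shift. All numbers are made up, and plain NumPy least squares stands in for `smf.ols`.

```python
import numpy as np

# Synthetic games: price of the item and whether another item was discounted
price = np.array([4.0, 4.5, 5.0, 4.0, 4.5, 5.0])
other_item_discounted = np.array([0, 0, 0, 1, 1, 1])

# Demand built with a known +20 unit shift when the other item is discounted
demand = 100.0 - 10.0 * price + 20.0 * other_item_discounted

# Columns: intercept, price, discount flag
X = np.column_stack([np.ones_like(price), price, other_item_discounted])
coef, *_ = np.linalg.lstsq(X, demand, rcond=None)
discount_effect = coef[2]

print('Estimated effect of the discount on demand:', discount_effect)
```

A positive effect means more units of the item are sold in games where the other item is discounted, holding the controls fixed; a negative effect means fewer.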

### Nachos

In [272]:
import statsmodels.api as sm
import statsmodels.formula.api as smf

l = ['+C(Discount_HotDog)', '+C(Discount_SouvCup)', '+C(Discount_BtlWater)', '+C(Discount_Popcorn)', '+C(Discount_Pretzel)', '+C(Discount_Peanuts)']
pre_formula = 'UseCount ~ weighted_actual_price + total_product_rev_nonSTH'

df = pd.DataFrame(columns = ['item', 'effect'])

for i in range(len(l)):
    f = pre_formula + l[i]
    m = smf.ols(formula = f, data = nacho_demand_price_controlled).fit()
    x = pd.DataFrame(m.params).reset_index()
    x.columns = ['item', 'effect']
    df.loc[i] = x.loc[1,:]
df

Unnamed: 0,item,effect
0,C(Discount_HotDog)[T.Yes],-1.740051
1,C(Discount_SouvCup)[T.Yes],-20.906603
2,C(Discount_BtlWater)[T.Yes],18.356492
3,C(Discount_Popcorn)[T.Yes],0.158951
4,C(Discount_Pretzel)[T.Yes],-2.31179
5,C(Discount_Peanuts)[T.Yes],452.115505


### Souvenir Popcorn

In [273]:
l = ['+C(Discount_HotDog)', '+C(Discount_SouvCup)', '+C(Discount_BtlWater)', '+C(Discount_Nachos)', '+C(Discount_Pretzel)', '+C(Discount_Peanuts)']
pre_formula = 'UseCount ~ weighted_actual_price + total_product_rev_nonSTH'

df = pd.DataFrame(columns = ['item', 'effect'])

for i in range(len(l)):
    f = pre_formula + l[i]
    m = smf.ols(formula = f, data = souv_pop_demand_price_controlled).fit()
    x = pd.DataFrame(m.params).reset_index()
    x.columns = ['item', 'effect']
    df.loc[i] = x.loc[1,:]
df

Unnamed: 0,item,effect
0,C(Discount_HotDog)[T.Yes],-37.585222
1,C(Discount_SouvCup)[T.Yes],-4.404409
2,C(Discount_BtlWater)[T.Yes],-15.900577
3,C(Discount_Nachos)[T.Yes],-19.272897
4,C(Discount_Pretzel)[T.Yes],19.383032
5,C(Discount_Peanuts)[T.Yes],-19.272897


### Hot Dog

In [274]:
l = ['+C(Discount_Popcorn)', '+C(Discount_SouvCup)', '+C(Discount_BtlWater)', '+C(Discount_Nachos)', '+C(Discount_Pretzel)', '+C(Discount_Peanuts)']
pre_formula = 'UseCount ~ weighted_actual_price + total_product_rev_nonSTH'

df = pd.DataFrame(columns = ['item', 'effect'])

for i in range(len(l)):
    f = pre_formula + l[i]
    m = smf.ols(formula = f, data = hotdog_demand_price_controlled).fit()
    x = pd.DataFrame(m.params).reset_index()
    x.columns = ['item', 'effect']
    df.loc[i] = x.loc[1,:]
df

Unnamed: 0,item,effect
0,C(Discount_Popcorn)[T.Yes],9.363358
1,C(Discount_SouvCup)[T.Yes],-139.522623
2,C(Discount_BtlWater)[T.Yes],175.472541
3,C(Discount_Nachos)[T.Yes],129.300218
4,C(Discount_Pretzel)[T.Yes],-45.608869
5,C(Discount_Peanuts)[T.Yes],129.300218


### Peanuts

In [275]:
l = ['+C(Discount_HotDog)', '+C(Discount_SouvCup)', '+C(Discount_BtlWater)', '+C(Discount_Nachos)', '+C(Discount_Pretzel)', '+C(Discount_Popcorn)']
pre_formula = 'UseCount ~ weighted_actual_price + total_product_rev_nonSTH'

df = pd.DataFrame(columns = ['item', 'effect'])

for i in range(len(l)):
    f = pre_formula + l[i]
    m = smf.ols(formula = f, data = peanut_demand_price_controlled).fit()
    x = pd.DataFrame(m.params).reset_index()
    x.columns = ['item', 'effect']
    df.loc[i] = x.loc[1,:]
df

Unnamed: 0,item,effect
0,C(Discount_HotDog)[T.Yes],-2.678562
1,C(Discount_SouvCup)[T.Yes],-24.759446
2,C(Discount_BtlWater)[T.Yes],18.130807
3,C(Discount_Nachos)[T.Yes],217.112949
4,C(Discount_Pretzel)[T.Yes],1.372313
5,C(Discount_Popcorn)[T.Yes],-6.558809


### Pretzel

In [276]:
l = ['+C(Discount_HotDog)', '+C(Discount_SouvCup)', '+C(Discount_BtlWater)', '+C(Discount_Nachos)', '+C(Discount_Popcorn)', '+C(Discount_Peanuts)']
pre_formula = 'UseCount ~ weighted_actual_price + total_product_rev_nonSTH'

df = pd.DataFrame(columns = ['item', 'effect'])

for i in range(len(l)):
    f = pre_formula + l[i]
    m = smf.ols(formula = f, data = bav_demand_price_controlled).fit()
    x = pd.DataFrame(m.params).reset_index()
    x.columns = ['item', 'effect']
    df.loc[i] = x.loc[1,:]
df

Unnamed: 0,item,effect
0,C(Discount_HotDog)[T.Yes],83.247115
1,C(Discount_SouvCup)[T.Yes],33.927558
2,C(Discount_BtlWater)[T.Yes],-49.512555
3,C(Discount_Nachos)[T.Yes],-2.640803
4,C(Discount_Popcorn)[T.Yes],-124.388777
5,C(Discount_Peanuts)[T.Yes],-2.640803


### Bottled Water

In [277]:
# Discount flags for the other items; each one is added to the base model in turn.
other_discounts = ['Discount_HotDog', 'Discount_SouvCup', 'Discount_Popcorn', 'Discount_Nachos', 'Discount_Pretzel', 'Discount_Peanuts']
pre_formula = 'UseCount ~ weighted_actual_price + total_product_rev_nonSTH'

rows = []
for d in other_discounts:
    # Fit bottled water demand on own price, non-STH revenue and a single discount dummy.
    m = smf.ols(formula = f'{pre_formula} + C({d})', data = btlwater_demand_price_controlled).fit()
    # Look the dummy up by name instead of relying on its position in m.params.
    term = f'C({d})[T.Yes]'
    rows.append({'item': term, 'effect': m.params[term]})

df = pd.DataFrame(rows)
df

Unnamed: 0,item,effect
0,C(Discount_HotDog)[T.Yes],-34.381519
1,C(Discount_SouvCup)[T.Yes],-62.273519
2,C(Discount_Popcorn)[T.Yes],-108.509136
3,C(Discount_Nachos)[T.Yes],-277.160881
4,C(Discount_Pretzel)[T.Yes],78.987141
5,C(Discount_Peanuts)[T.Yes],-277.160881


### Souvenir Soda

In [278]:
# Discount flags for the other items; each one is added to the base model in turn.
other_discounts = ['Discount_HotDog', 'Discount_Popcorn', 'Discount_BtlWater', 'Discount_Nachos', 'Discount_Pretzel', 'Discount_Peanuts']
pre_formula = 'UseCount ~ weighted_actual_price + total_product_rev_nonSTH'

rows = []
for d in other_discounts:
    # Fit souvenir soda demand on own price, non-STH revenue and a single discount dummy.
    m = smf.ols(formula = f'{pre_formula} + C({d})', data = souv_soda_demand_price_controlled).fit()
    # Look the dummy up by name instead of relying on its position in m.params.
    term = f'C({d})[T.Yes]'
    rows.append({'item': term, 'effect': m.params[term]})

df = pd.DataFrame(rows)
df

Unnamed: 0,item,effect
0,C(Discount_HotDog)[T.Yes],-102.10841
1,C(Discount_Popcorn)[T.Yes],-0.052599
2,C(Discount_BtlWater)[T.Yes],65.110369
3,C(Discount_Nachos)[T.Yes],-194.696445
4,C(Discount_Pretzel)[T.Yes],197.906563
5,C(Discount_Peanuts)[T.Yes],-194.696445


# Question 3

From Question 1, we obtain the following price elasticity for each item:

In [279]:
names = ['Nachos', 'Souv Popcorn', "Hot Dog", "Peanuts", "Pretzels", "Bottled Water", "Souv Soda 32oz"]
models = [m_nacho, m_souv_pop, m_hotdog, m_peanut, m_bav, m_btlwater, m_souv_soda]
# Take the second parameter of each Question 1 model (the one after the intercept), by position.
elastic = pd.DataFrame({'item': names, 'elasticity': [m.params.iloc[1] for m in models]})
# Report magnitudes so that larger values mean more price-sensitive demand.
elastic['elasticity'] = elastic['elasticity'].abs()
elastic

Unnamed: 0,item,elasticity
0,Nachos,2.302407
1,Souv Popcorn,3.45726
2,Hot Dog,2.66578
3,Peanuts,1.165607
4,Pretzels,1.32018
5,Bottled Water,1.793894
6,Souv Soda 32oz,1.506328


Looking at the elasticities above, peanuts are the least elastic item and souvenir popcorn is the most elastic. The Bears can use this to guide their pricing strategy: there is more room to adjust the price of peanuts, while the price of souvenir popcorn should be changed cautiously, because even a slight price increase can cause a large drop in demand.
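To make the link between elasticity and pricing concrete, the sketch below approximates the revenue effect of a 1% price increase for each item. It assumes the magnitudes in the table can be read as constant price elasticities (percentage change in demand per percentage change in price), which depends on how the Question 1 models were specified. `revenue_change` is a hypothetical helper and not part of the original analysis.

```python
# Elasticity magnitudes copied from the table above.
elasticities = {
    "Nachos": 2.302407,
    "Souv Popcorn": 3.45726,
    "Hot Dog": 2.66578,
    "Peanuts": 1.165607,
    "Pretzels": 1.32018,
    "Bottled Water": 1.793894,
    "Souv Soda 32oz": 1.506328,
}


def revenue_change(elasticity, price_change):
    """Approximate fractional revenue change for a small fractional price change.

    Assumes demand moves by -elasticity * price_change (a first-order
    approximation that is only reasonable for small price changes).
    """
    quantity_change = -elasticity * price_change
    return (1 + price_change) * (1 + quantity_change) - 1


for item, e in elasticities.items():
    print(f"{item}: {revenue_change(e, 0.01):+.2%}")
```

Under that assumption every elasticity magnitude is above 1, so a small price increase lowers revenue for all seven items, least for peanuts and most for souvenir popcorn.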

Also, with the results from Question 2, the Bears can see how discounts on other items affect the demand for a given item. Combining these two pieces of information, the Bears can develop a better pricing strategy by choosing which discounts to run together so as to maximize demand. The Bears could also use this information to plan inventory, by anticipating how demand will vary with the prices they set. There is a caveat, however, which is detailed in the weaknesses listed under Question 4.

# Question 4

Weaknesses

- Demand and price are aggregated by game week, so each game week contributes a single (albeit weighted) price point. This means we are assuming that price varies from game to game, and that this variation is what affects demand.
- Because of this aggregation, we have only 8 data points with which to fit a linear regression and estimate the price elasticity. With so few data points, the coefficient estimates are imprecise.
- There are also too few sources of price variation. For instance, only a tiny fraction of customers are club-level customers with a 20% discount, so it is difficult to gauge demand at that price point.
- Data leakage can arise in the models built for Question 2. One of the predictors is the weighted average price of the item in question. Including it was reasonable for Question 2, where the goal is only to see how discounts on other items affect sales of a given item: controlling for the item's own price isolates the effect of the other discounts on its demand. However, if we wanted to use those models to *predict* demand, we could not use the weighted average price, because it is weighted by the item's demand, which is not available at prediction time. Total sales to non-STH customers have the same problem.
- The assumption that a discount on one item occurs independently of discounts on other items is a strong one.
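The independence assumption in the last bullet can be checked directly by cross-tabulating pairs of discount flags at the game level. The sketch below uses made-up flags for 8 games, not the concession data. In the real data one would presumably build `games` by keeping one row per `game_week` from `dat`, assuming the flags are constant within a game. `same_schedule` is a hypothetical helper.

```python
import pandas as pd

# Hypothetical game-level flags, NOT taken from the concession data.
games = pd.DataFrame({
    "game_week": [f"Game {i}" for i in range(1, 9)],
    "Discount_Nachos":  ["No", "No", "Yes", "No", "No", "No", "No", "No"],
    "Discount_Peanuts": ["No", "No", "Yes", "No", "No", "No", "No", "No"],
    "Discount_HotDog":  ["Yes", "No", "No", "Yes", "No", "No", "Yes", "No"],
})


def same_schedule(df, a, b):
    """Return True when two discount flags agree in every game."""
    return bool((df[a] == df[b]).all())


# Off-diagonal cells of zero mean the two discounts always occur together.
print(pd.crosstab(games["Discount_Nachos"], games["Discount_Peanuts"]))
print(same_schedule(games, "Discount_Nachos", "Discount_Peanuts"))
```

Two flags that always coincide cannot be told apart by a regression: entered one at a time into the same base model, they produce identical coefficients.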

Solutions

- Since discounts are already redeemable only in the app via QR code, the Bears could send app users randomized discounts of varying percentages to gather more data on how demand varies across price points.
- The Bears should also include sales to customers who are neither STH nor CL, to see how demand changes when there is no discount. This adds data on demand at the original price point.
- For each game, the Bears should also record the opponent. Attendance is likely to vary with how exciting the game is, which depends largely on the Bears’ opponent. Because attendance can heavily affect the demand for food items, it should be added as a control variable.
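The randomized-discount idea in the first bullet could be sketched as follows. This is only an illustration: the discount levels, the seed, and the function name `assign_random_discounts` are assumptions, and the user IDs are placeholders rather than real app users.

```python
import numpy as np
import pandas as pd


def assign_random_discounts(user_ids, levels=(0.0, 0.10, 0.20, 0.30), seed=0):
    """Give each user one discount level drawn uniformly at random."""
    user_ids = list(user_ids)
    rng = np.random.default_rng(seed)  # fixed seed keeps the assignment reproducible
    discounts = rng.choice(levels, size=len(user_ids))
    return pd.DataFrame({"UserID": user_ids, "discount": discounts})


# Hypothetical user IDs, used only to show the shape of the output.
assignment = assign_random_discounts(range(1000))
print(assignment["discount"].value_counts().sort_index())
```

Because the discount level is assigned independently of the customer and the game, differences in demand across levels can be attributed to price rather than to who happened to receive the discount.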
