# Bars Menu Creation

Detecting price drops over the course of a day at a specific bar would help considerably in computing the `happy_hour` feature.

Note that the price of an item (`title`) may vary with `bar_id` and `order_time` (that is, with `day_of_week` and `time_of_day`).  
Events may also affect prices; we assume that no event occurred during the period July 14-21, 2019.
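As a rough sketch of what detecting a price drop could look like: compare each order's unit price with the highest price observed for the same bar and item, then average the resulting discount per hour. The toy data, the `discount` and `hour` columns, and the `DISCOUNT_THRESHOLD` cut-off are assumptions made for illustration only, not part of the dataset or of the analysis below.

```python
import pandas as pd

# Toy orders (hypothetical values): one item at one bar, cheaper around 17:00.
orders = pd.DataFrame({
    "bar_id": [1, 1, 1, 1],
    "title": ["Pint"] * 4,
    "order_time": pd.to_datetime([
        "2019-07-18 13:05:00", "2019-07-18 17:10:00",
        "2019-07-18 17:40:00", "2019-07-18 21:15:00",
    ]),
    "sales_before_tax": [7.0, 5.0, 5.0, 7.0],
})

# Reference price: highest observed price for the same bar and item.
reference = orders.groupby(["bar_id", "title"])["sales_before_tax"].transform("max")

# Relative discount of each order with respect to the reference price.
orders["discount"] = 1 - orders["sales_before_tax"] / reference
orders["hour"] = orders["order_time"].dt.hour

# Mean discount per bar and hour; hours above the assumed threshold are candidates.
DISCOUNT_THRESHOLD = 0.1  # hypothetical cut-off
hourly = orders.groupby(["bar_id", "hour"])["discount"].mean()
candidate_hours = hourly[hourly > DISCOUNT_THRESHOLD]
print(candidate_hours)
```

On real data the threshold would need tuning, and a single discounted order should probably not be enough to flag an hour.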

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

In [2]:
df = pd.read_csv("../data/original_data.csv")

---------------

In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1754466 entries, 0 to 1754465
Data columns (total 26 columns):
city                           object
bar_id                         int64
order_id                       int64
order_time                     object
order_item_id                  int64
title                          object
category_id                    float64
beer_brand_id                  int64
beer_serving_type_id           int64
beer_volume                    float64
item_qty                       float64
sales_before_tax               float64
sales_inc_tax                  float64
guest_count                    int64
waiter_id                      float64
country                        object
country_id                     int64
state                          object
state_id                       int64
timezone                       object
bar_type_id                    int64
status                         int64
last_status                    int64
is_bulk           

In [4]:
df.drop(['order_id', 'order_item_id', 'sales_inc_tax', 'guest_count', 'waiter_id', 'country_id', 'state_id', 'timezone',
         'bar_type_id', 'status', 'last_status', 'is_bulk', 'data_availability_status_id'], axis=1, inplace=True)

In [5]:
df['order_time'] = pd.to_datetime(df['order_time'], format='%Y-%m-%d %H:%M:%S.%f')

In [6]:
df['day_of_week'] = df['order_time'].dt.day_name()
df['time_of_day'] = df['order_time'].dt.time

In [7]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1754466 entries, 0 to 1754465
Data columns (total 15 columns):
city                    object
bar_id                  int64
order_time              datetime64[ns]
title                   object
category_id             float64
beer_brand_id           int64
beer_serving_type_id    int64
beer_volume             float64
item_qty                float64
sales_before_tax        float64
country                 object
state                   object
bar_type                object
day_of_week             object
time_of_day             object
dtypes: datetime64[ns](1), float64(4), int64(3), object(7)
memory usage: 200.8+ MB


In [8]:
df.head()

Unnamed: 0,city,bar_id,order_time,title,category_id,beer_brand_id,beer_serving_type_id,beer_volume,item_qty,sales_before_tax,country,state,bar_type,day_of_week,time_of_day
0,Trois-Rivières,2182,2019-07-18 09:12:44,BACON,2.0,0,0,0.0,1.0,0.0,Canada,Québec,Casual Dining,Thursday,09:12:44
1,Trois-Rivières,2182,2019-07-18 09:12:44,PAIN BLANC,2.0,0,0,0.0,1.0,0.0,Canada,Québec,Casual Dining,Thursday,09:12:44
2,Trois-Rivières,2182,2019-07-18 09:12:44,GRASSE MATINEE,5.0,0,0,0.0,1.0,6.5,Canada,Québec,Casual Dining,Thursday,09:12:44
3,Trois-Rivières,2182,2019-07-18 09:12:44,BROUILLÉ,2.0,0,0,0.0,1.0,0.0,Canada,Québec,Casual Dining,Thursday,09:12:44
4,Port Stanley,3383,2019-07-18 12:19:29,Pickeral & Chips,2.0,0,0,0.0,1.0,17.99,Canada,Ontario,Bar/Pub,Thursday,12:19:29


Remove "free" items, i.e. entries whose `sales_before_tax` is zero or negative:

In [9]:
before_size = len(df)

In [10]:
df.drop(df[df['sales_before_tax'] <= 0].index, inplace=True)

In [11]:
after_size = len(df)

In [12]:
float(after_size) / before_size

0.8097979670167447

In [13]:
print("Got rid of {0:.2f}% of the entries".format(100 * (1 - float(after_size) / before_size)))

Got rid of 19.02% of the entries


Normalize each entry to an `item_qty` of 1:
- Divide `beer_volume` by `item_qty`
- Divide `sales_before_tax` by `item_qty`

First remove entries with `item_qty` <= 0, which would otherwise lead to division by zero or to negative unit prices:

In [14]:
df.drop(df[df['item_qty'] <= 0].index, inplace=True)

In [15]:
df['beer_volume'] = df['beer_volume'] / df['item_qty']
df['sales_before_tax'] = df['sales_before_tax'] / df['item_qty']
df['item_qty'] = 1
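As a small worked example of this normalization, the sketch below applies the same three steps to toy rows. The values are hypothetical and assume, as the code above does, that `beer_volume` and `sales_before_tax` are totals for the whole order line.

```python
import pandas as pd

# Toy rows (hypothetical values): one line covering 3 pints, one covering 1.
orders = pd.DataFrame({
    "title": ["Pint", "Pint"],
    "beer_volume": [1.5, 0.5],
    "item_qty": [3.0, 1.0],
    "sales_before_tax": [18.0, 6.0],
})

# Divide the line totals by the quantity to obtain per-unit figures.
orders["beer_volume"] = orders["beer_volume"] / orders["item_qty"]
orders["sales_before_tax"] = orders["sales_before_tax"] / orders["item_qty"]
orders["item_qty"] = 1

print(orders)
```

After normalization both rows describe a single 0.5-volume pint at 6.0, so they can be compared directly.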

In [16]:
df.head()

Unnamed: 0,city,bar_id,order_time,title,category_id,beer_brand_id,beer_serving_type_id,beer_volume,item_qty,sales_before_tax,country,state,bar_type,day_of_week,time_of_day
2,Trois-Rivières,2182,2019-07-18 09:12:44,GRASSE MATINEE,5.0,0,0,0.0,1,6.5,Canada,Québec,Casual Dining,Thursday,09:12:44
4,Port Stanley,3383,2019-07-18 12:19:29,Pickeral & Chips,2.0,0,0,0.0,1,17.99,Canada,Ontario,Bar/Pub,Thursday,12:19:29
5,Port Stanley,3383,2019-07-18 12:19:29,Lun Steak Sandwich,2.0,0,0,0.0,1,12.99,Canada,Ontario,Bar/Pub,Thursday,12:19:29
6,Port Stanley,3383,2019-07-18 12:19:29,Canadian,1.0,183,1,0.34,1,4.92,Canada,Ontario,Bar/Pub,Thursday,12:19:29
8,Saint-Jean-sur-Richelieu,5130,2019-07-18 12:06:00,BAVETTE BISTRO,2.0,0,0,0.0,1,20.0,Canada,Québec,Restaurant,Thursday,12:06:00


----------

## Food Price Calculation

We start by collecting, for each bar, the list of observed prices per `title`.  
From this list we then pick the official price (probably the highest observed one).
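The same idea can be sketched with a pandas `groupby`, which avoids iterating over rows one at a time. This is an illustrative alternative on toy data, assuming the highest observed price is taken as the official one; the names `official_price` and `prices_by_bar` are hypothetical.

```python
import pandas as pd

# Toy orders (hypothetical values), already normalized to one unit per row.
orders = pd.DataFrame({
    "bar_id": [1, 1, 1, 2, 2],
    "title": ["IPA", "IPA", "Fries", "IPA", "IPA"],
    "sales_before_tax": [7.0, 5.0, 4.5, 6.0, 6.0],
})

# Highest observed price per (bar_id, title), taken here as the official price.
official_price = (
    orders.groupby(["bar_id", "title"])["sales_before_tax"]
    .max()
    .rename("official_price")
)

# Convert to a nested dict of the form {bar_id: {title: price}}.
prices_by_bar = {}
for (bar, title), price in official_price.items():
    prices_by_bar.setdefault(bar, {})[title] = price

print(prices_by_bar)
```

Taking the maximum is sensitive to outliers such as mis-keyed orders; a high quantile or the most frequent price could be swapped in the same way.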

In [17]:
from collections import defaultdict

In [18]:
prices = defaultdict(list)

In [19]:
bars = list(set(df['bar_id']))

In [20]:
for bar in bars:
    prices[bar] = defaultdict(list)

In [21]:
for index, order in df.iterrows():
    prices[order['bar_id']][order['title']].append(order['sales_before_tax'])

In [22]:
for bar in bars:
    for item in prices[bar]:
        prices[bar][item] = max(prices[bar][item])

In [23]:
# Print the chosen price of every item, for the first bar only
for bar in bars:
    for item in prices[bar]:
        print("Bar: {}, Item: {}, Price: {}".format(bar, item, prices[bar][item]))
    break

Bar: 2054, Item: ROULEAUX IMPERIAUX, Price: 6.0
Bar: 2054, Item: MAKI-PRINTANIER, Price: 21.95
Bar: 2054, Item: TATAKI, Price: 8.75
Bar: 2054, Item: SOUPE MISO, Price: 3.0
Bar: 2054, Item: POKE SAUMON, Price: 13.0
Bar: 2054, Item: PETONCLE/MANGUE, Price: 8.0
Bar: 2054, Item: SANGRIA PICHET, Price: 17.575757575757574
Bar: 2054, Item: EXTRA SCE L'UNIQUE, Price: 1.0
Bar: 2054, Item: CB TOUR DE LA GASPESIE, Price: 42.96
Bar: 2054, Item: CREVETTE TEMPURA, Price: 7.95
Bar: 2054, Item: SEPT-ÎLES EPICE, Price: 8.09090909090909
Bar: 2054, Item: CB PORTE ENFER, Price: 25.95
Bar: 2054, Item: SUSHI PIZZA, Price: 9.333333333333334
Bar: 2054, Item: ROULEAU HOMARD, Price: 8.95
Bar: 2054, Item: SALADE SAUMON, Price: 17.0
Bar: 2054, Item: CREVETTE COCO, Price: 8.5
Bar: 2054, Item: SANTA, Price: 27.93939393939394
Bar: 2054, Item: ROULEAU TATAKI, Price: 7.95
Bar: 2054, Item: TART SAUMON PLAT, Price: 20.0
Bar: 2054, Item: BAVAROIS LYCHEES, Price: 5.0
Bar: 2054, Item: LIMOMADE, Price: 2.63
Bar: 2054, Item:

-------------

## Beer Price Calculation

We start with beers, which brings us a step closer to the `happy_hour` feature.

In [24]:
# all columns except `city`, `country`, `state` and `category_id`
beer_columns = ['bar_id', 'order_time', 'day_of_week', 'time_of_day', 'title', 'beer_brand_id',
                'beer_serving_type_id', 'beer_volume', 'item_qty', 'sales_before_tax', 'bar_type']

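# Rows with category_id 1 are treated as beers. Select rows and columns in one
# step and avoid in-place operations on a slice of `df` (SettingWithCopyWarning).
beers = df.loc[df['category_id'] == 1, beer_columns]
beers = beers.sort_values(by='bar_id').reset_index(drop=True)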