# Measure Bin Guest Count 1 Heuristics Accuracy

Our occasions for bin 1:

1. `Lunch`
2. `Dinner`
3. `Casual Drink and Meal`
4. `Drinking`
5. `Not 1`
6. `Unknown`

We're going to pick specific labeled tables and see how our heuristics are doing.

In [1]:
data_map = {
    "hockey": "../data/hockey_3_text_processed.csv",
    "valentine": "../data/valentine_3_text_processed.csv",
    "silvester": "../data/silvester_3_text_processed.csv"
}

In [2]:
LUNCH = "LUNCH"
MUNCH = "MUNCH"
DINNER = "DINNER"
DRINKING = "DRINKING"
CASUAL_DRINK = "CASUAL_DRINK"
NOT_1 = "NOT_1"
UNK = "UNK"

The following are taken from the `annotations/occasions_annotations_bin1` notebook.

The annotations are split between dev and test.

### Dev

In [3]:
labeled_tables_dev_map = {
    "hockey": [
        (512690383, CASUAL_DRINK),
        (521702519, DRINKING),
        (521093892, DRINKING),
        (520105316, NOT_1),
        (521783372, DINNER),
        (521769692, CASUAL_DRINK),
        (512852707, CASUAL_DRINK),
        (525538989, CASUAL_DRINK),
        (514327163,LUNCH),
        (520255453,MUNCH),
        (511390190,DINNER),
        (525643037,CASUAL_DRINK),
        (525621948,CASUAL_DRINK),
        (525122551,DINNER),
        (521589076,LUNCH),
        (514577448,DINNER),
        (512769146,DINNER)
    ],
    "valentine": [
        (434780854,DRINKING),
        (434753224,MUNCH),
        (447023765,LUNCH),
        (434766693,CASUAL_DRINK),
        (446799837,DINNER),
        (448086739,DINNER),
        (447551853,CASUAL_DRINK),
        (447019925,LUNCH),
        (447168624,CASUAL_DRINK),
        (447444332,UNK),
        (435140033,CASUAL_DRINK),
        (448121768,DINNER),
        (434399752,CASUAL_DRINK),
        (446906440,NOT_1),
        (447033893,UNK),
        (447052748,UNK),
        (447729251,LUNCH)
    ],
    "silvester": [
        (363436344,DINNER),
        (361891515,NOT_1),
        (361024418,DINNER),
        (360026487,DINNER),
        (359644172,DRINKING),
        (360384742,DINNER),
        (360472971,DINNER),
        (361700721,UNK),
        (362619447,DRINKING),
        (363618332,LUNCH),
        (359433225,UNK),
        (362233587,DINNER),
        (361562102,DRINKING),
        (362692290,DINNER),
        (362879170,LUNCH),
        (363631395,LUNCH),
        (363620521,LUNCH)
    ]
}

### Test

In [4]:
labeled_tables_test_map = {
    "hockey": [
        (520256354,DRINKING),
        (514474566,CASUAL_DRINK),
        (522554624,NOT_1),
        (522777463,CASUAL_DRINK),
        (519619252,CASUAL_DRINK),
        (519598706,DINNER),
        (512856854,DRINKING),
        (520171895,DINNER)
    ],
    "valentine": [
        (434752175,DINNER),
        (447067242,DINNER),
        (435110243,DINNER),
        (434681977,UNK),
        (447595398,DRINKING),
        (447608841,DINNER),
        (447650659,CASUAL_DRINK),
        (447372783,LUNCH)
    ],
    "silvester": [
        (363870533,DRINKING),
        (361923715,MUNCH),
        (361814929,DINNER),
        (363737552,DINNER),
        (363753228,DINNER),
        (363829806,NOT_1),
        (362363734,NOT_1),
        (363270728,CASUAL_DRINK)
    ]
}
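Since the annotations are split between dev and test, it can be worth confirming that no order id appears in both splits. The following is a minimal sketch: `check_splits` is a hypothetical helper (not part of the original notebook), and the two excerpt maps are a small subset of the maps above so the block runs on its own.

```python
from collections import Counter


def check_splits(dev_map, test_map):
    """Return order ids present in both splits, plus label counts per split."""
    dev_ids = {order_id for rows in dev_map.values() for order_id, _ in rows}
    test_ids = {order_id for rows in test_map.values() for order_id, _ in rows}
    overlap = dev_ids & test_ids
    dev_counts = Counter(label for rows in dev_map.values() for _, label in rows)
    test_counts = Counter(label for rows in test_map.values() for _, label in rows)
    return overlap, dev_counts, test_counts


# Small excerpt of the maps above (hypothetical variable names).
dev_excerpt = {"hockey": [(512690383, "CASUAL_DRINK"), (521702519, "DRINKING")]}
test_excerpt = {"hockey": [(520256354, "DRINKING"), (514474566, "CASUAL_DRINK")]}

overlap, dev_counts, test_counts = check_splits(dev_excerpt, test_excerpt)
print("overlap:", overlap)
print("dev label counts:", dict(dev_counts))
print("test label counts:", dict(test_counts))
```

In the notebook itself, the call would presumably be `check_splits(labeled_tables_dev_map, labeled_tables_test_map)`.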

------

In [5]:
import pandas as pd

Import the bin 1 classifier, `Bin1Classifier`:

In [6]:
from bin_1 import Bin1Classifier as Classifier
classifier = Classifier()

-----

### Results per Table:

In [7]:
tables = ["hockey", "valentine", "silvester"]

In [8]:
results = {}
results_new = {}

In [9]:
from occasion_classifier import shrink_orders_to_table

In [10]:
for table in tables:
    print("Running for", table)
    df_path = data_map[table]
    df = pd.read_csv(df_path)
    labeled_tables = labeled_tables_dev_map[table]
    
    results[table] = []
    for order_id, true_occasion in labeled_tables:
        orders = df[df.order_id == order_id]
        orders = shrink_orders_to_table(orders)
        pred_occasion = classifier.classify(orders)
        results[table].append((order_id, true_occasion, pred_occasion))

Running for hockey
Running for valentine
Running for silvester
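The evaluation loop above is run once per split, so it could be factored into a helper. The following is a rough sketch under assumptions: `evaluate`, `fake_load_orders`, and `fake_classify` are hypothetical names, and the two stand-ins only exist so the block runs without the CSV files or the real classifier.

```python
def evaluate(labeled_map, load_orders, classify):
    """Collect (order_id, true_occasion, pred_occasion) rows per table.

    load_orders(table, order_id) and classify(orders) are supplied by the caller.
    """
    results = {}
    for table, labeled in labeled_map.items():
        results[table] = [
            (order_id, true_occasion, classify(load_orders(table, order_id)))
            for order_id, true_occasion in labeled
        ]
    return results


# Stand-ins so the sketch runs on its own; NOT the real loader or classifier.
def fake_load_orders(table, order_id):
    return {"table": table, "order_id": order_id}


def fake_classify(orders):
    return "DINNER"


labeled = {"hockey": [(521783372, "DINNER"), (521702519, "DRINKING")]}
example_results = evaluate(labeled, fake_load_orders, fake_classify)
print(example_results["hockey"])
```

With the real objects, `load_orders` would wrap the `pd.read_csv` / `shrink_orders_to_table` steps and `classify` would be `classifier.classify`.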


#### Show the results:

In [11]:
columns = ["order_id", "true_occasion", "pred_occasion"]

In [12]:
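def color(data):
    # Green when the true label matches the prediction, red otherwise.
    # The `in` test is kept as written; on plain strings it is a substring match.
    correct = data["true_occasion"] in data["pred_occasion"]
    background = "#58f200" if correct else "#ee1300"

    # One style string per cell in the row.
    return ["background-color: %s" % background] * len(data)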

In [13]:
table = "hockey"

df = pd.DataFrame(results[table], columns=columns)
df.index += 1
df.style.apply(color, axis=1)
# df.to_csv(str(table) + "_3to5_test_results.csv", index=False)

| # | order_id | true_occasion | pred_occasion |
|---|---|---|---|
| 1 | 512690383 | CASUAL_DRINK | CASUAL_DRINK |
| 2 | 521702519 | DRINKING | DRINKING |
| 3 | 521093892 | DRINKING | DRINKING |
| 4 | 520105316 | NOT_1 | NOT_1 |
| 5 | 521783372 | DINNER | DINNER |
| 6 | 521769692 | CASUAL_DRINK | DINNER |
| 7 | 512852707 | CASUAL_DRINK | CASUAL_DRINK |
| 8 | 525538989 | CASUAL_DRINK | DINNER |
| 9 | 514327163 | LUNCH | LUNCH |
| 10 | 520255453 | MUNCH | MUNCH |


MISTAKES:

* 521769692 - I labeled it a casual drink because the person first ordered a beer, drank for 40 minutes, ordered a salad, and then ordered another beer an hour later.
* 525538989 - same here: two beers were drunk first, and a cheeseburger was ordered in the last step.
* 511390190 - this one was predicted as unknown because the only meal ordered was a veggie burger costing 5.9 before tax, so it didn't pass the rule of a single large meal for dinner.

In [14]:
table = "valentine"

df = pd.DataFrame(results[table], columns=columns)
df.index += 1
df.style.apply(color, axis=1)
# df.to_csv(str(table) + "_3to5_test_results.csv", index=False)

| # | order_id | true_occasion | pred_occasion |
|---|---|---|---|
| 1 | 434780854 | DRINKING | DRINKING |
| 2 | 434753224 | MUNCH | MUNCH |
| 3 | 447023765 | LUNCH | LUNCH |
| 4 | 434766693 | CASUAL_DRINK | CASUAL_DRINK |
| 5 | 446799837 | DINNER | DINNER |
| 6 | 448086739 | DINNER | DINNER |
| 7 | 447551853 | CASUAL_DRINK | CASUAL_DRINK |
| 8 | 447019925 | LUNCH | LUNCH |
| 9 | 447168624 | CASUAL_DRINK | CASUAL_DRINK |
| 10 | 447444332 | UNK | UNK |


In [15]:
table = "silvester"

df = pd.DataFrame(results[table], columns=columns)
df.index += 1
df.style.apply(color, axis=1)
# df.to_csv(str(table) + "_3to5_test_results.csv", index=False)

| # | order_id | true_occasion | pred_occasion |
|---|---|---|---|
| 1 | 363436344 | DINNER | DINNER |
| 2 | 361891515 | NOT_1 | NOT_1 |
| 3 | 361024418 | DINNER | DINNER |
| 4 | 360026487 | DINNER | DINNER |
| 5 | 359644172 | DRINKING | DRINKING |
| 6 | 360384742 | DINNER | DINNER |
| 7 | 360472971 | DINNER | NOT_1 |
| 8 | 361700721 | UNK | UNK |
| 9 | 362619447 | DRINKING | DRINKING |
| 10 | 363618332 | LUNCH | LUNCH |


MISTAKES:

* 360472971 - the classifier counted 5 items in the meal step when it should have counted three: one item was a note priced at 0 dollars, and another was a side for a large meal.
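The miscount described above could be avoided by skipping zero-priced notes and sides when counting items in a meal step. This is only an illustrative sketch, not the classifier's actual code: `count_meal_items`, the `price` and `is_side` fields, and the example rows are all assumptions made up for the illustration.

```python
def count_meal_items(items):
    """Count items in one meal step, skipping zero-priced notes and sides."""
    return sum(
        1
        for item in items
        if item["price"] > 0 and not item.get("is_side", False)
    )


# Made-up meal step with the same shape as the mistake: 5 rows, 3 real items.
step = [
    {"name": "main A", "price": 18.0},
    {"name": "main B", "price": 16.5},
    {"name": "main C", "price": 14.0},
    {"name": "note", "price": 0.0},                    # kitchen note, costs nothing
    {"name": "side", "price": 4.0, "is_side": True},   # side for a large meal
]
print(count_meal_items(step))  # 3
```

How sides are recognized in the real order data is not stated here, so the `is_side` flag is a placeholder for whatever signal is available.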


# TEST

In [16]:
tables = ["hockey", "valentine", "silvester"]
results = {}
results_new = {}
for table in tables:
    print("Running for", table)
    df_path = data_map[table]
    df = pd.read_csv(df_path)
    labeled_tables = labeled_tables_test_map[table]
    
    results[table] = []
    for order_id, true_occasion in labeled_tables:
        orders = df[df.order_id == order_id]
        orders = shrink_orders_to_table(orders)
        pred_occasion = classifier.classify(orders)
        results[table].append((order_id, true_occasion, pred_occasion))

Running for hockey
Running for valentine
Running for silvester


In [17]:
table = "hockey"

df = pd.DataFrame(results[table], columns=columns)
df.index += 1
df.style.apply(color, axis=1)
# df.to_csv(str(table) + "_3to5_test_results.csv", index=False)

| # | order_id | true_occasion | pred_occasion |
|---|---|---|---|
| 1 | 520256354 | DRINKING | DRINKING |
| 2 | 514474566 | CASUAL_DRINK | CASUAL_DRINK |
| 3 | 522554624 | NOT_1 | NOT_1 |
| 4 | 522777463 | CASUAL_DRINK | CASUAL_DRINK |
| 5 | 519619252 | CASUAL_DRINK | CASUAL_DRINK |
| 6 | 519598706 | DINNER | DINNER |
| 7 | 512856854 | DRINKING | LUNCH |
| 8 | 520171895 | DINNER | DINNER |


512856854 - the guest ordered two beers right off the bat, then a meal, then another beer, so I labeled it drinking. But based on our rules, three beers and a meal still counts as casual.

In [18]:
table = "valentine"

df = pd.DataFrame(results[table], columns=columns)
df.index += 1
df.style.apply(color, axis=1)
# df.to_csv(str(table) + "_3to5_test_results.csv", index=False)

| # | order_id | true_occasion | pred_occasion |
|---|---|---|---|
| 1 | 434752175 | DINNER | DINNER |
| 2 | 447067242 | DINNER | DINNER |
| 3 | 435110243 | DINNER | DINNER |
| 4 | 434681977 | UNK | UNK |
| 5 | 447595398 | DRINKING | DRINKING |
| 6 | 447608841 | DINNER | CASUAL_DRINK |
| 7 | 447650659 | CASUAL_DRINK | DINNER |
| 8 | 447372783 | LUNCH | LUNCH |


447608841 - this could also be a casual drink. The problem is that the order was placed around 4:30pm, which the classifier does not accept as dinner, but based on the food I labeled it dinner.
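To make the time-of-day issue concrete, here is a minimal sketch of a dinner cutoff check. The 17:00 start time is an assumption chosen for illustration; the notebook only implies that an order around 4:30pm falls before whatever cutoff the classifier uses. `can_be_dinner` and `DINNER_START` are hypothetical names.

```python
from datetime import time

# Hypothetical cutoff; the classifier's real value is not shown in this notebook.
DINNER_START = time(17, 0)


def can_be_dinner(order_time, dinner_start=DINNER_START):
    """Return True if the order time is at or after the dinner cutoff."""
    return order_time >= dinner_start


print(can_be_dinner(time(16, 30)))  # False: a 4:30pm order is rejected as dinner
print(can_be_dinner(time(19, 0)))   # True
```

A hard cutoff like this explains the disagreement: the annotation was based on the food ordered, while the rule looks only at the clock.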

In [19]:
table = "silvester"

df = pd.DataFrame(results[table], columns=columns)
df.index += 1
df.style.apply(color, axis=1)
# df.to_csv(str(table) + "_3to5_test_results.csv", index=False)

| # | order_id | true_occasion | pred_occasion |
|---|---|---|---|
| 1 | 363870533 | DRINKING | DRINKING |
| 2 | 361923715 | MUNCH | MUNCH |
| 3 | 361814929 | DINNER | CASUAL_DRINK |
| 4 | 363737552 | DINNER | DINNER |
| 5 | 363753228 | DINNER | CASUAL_DRINK |
| 6 | 363829806 | NOT_1 | NOT_1 |
| 7 | 362363734 | NOT_1 | NOT_1 |
| 8 | 363270728 | CASUAL_DRINK | CASUAL_DRINK |
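The tables show individual hits and misses but no summary number. As a sketch, the test results can be reduced to an accuracy per table and a count of each confusion. The (true, predicted) pairs below are copied from the three test tables above. `accuracy` and `confusions` are hypothetical helper names, and exact equality is used for correctness, assuming the classifier returns a single label.

```python
from collections import Counter

# (true_occasion, pred_occasion) pairs from the test tables above.
test_results = {
    "hockey": [
        ("DRINKING", "DRINKING"), ("CASUAL_DRINK", "CASUAL_DRINK"),
        ("NOT_1", "NOT_1"), ("CASUAL_DRINK", "CASUAL_DRINK"),
        ("CASUAL_DRINK", "CASUAL_DRINK"), ("DINNER", "DINNER"),
        ("DRINKING", "LUNCH"), ("DINNER", "DINNER"),
    ],
    "valentine": [
        ("DINNER", "DINNER"), ("DINNER", "DINNER"),
        ("DINNER", "DINNER"), ("UNK", "UNK"),
        ("DRINKING", "DRINKING"), ("DINNER", "CASUAL_DRINK"),
        ("CASUAL_DRINK", "DINNER"), ("LUNCH", "LUNCH"),
    ],
    "silvester": [
        ("DRINKING", "DRINKING"), ("MUNCH", "MUNCH"),
        ("DINNER", "CASUAL_DRINK"), ("DINNER", "DINNER"),
        ("DINNER", "CASUAL_DRINK"), ("NOT_1", "NOT_1"),
        ("NOT_1", "NOT_1"), ("CASUAL_DRINK", "CASUAL_DRINK"),
    ],
}


def accuracy(pairs):
    """Fraction of pairs where the prediction equals the true label."""
    return sum(true == pred for true, pred in pairs) / len(pairs)


def confusions(pairs):
    """Count each (true, predicted) pair that disagrees."""
    return Counter((true, pred) for true, pred in pairs if true != pred)


all_pairs = [pair for pairs in test_results.values() for pair in pairs]
for table, pairs in test_results.items():
    print(table, accuracy(pairs))
print("overall", round(accuracy(all_pairs), 3))  # 19 of 24 correct
print(confusions(all_pairs).most_common())
```

On these rows the most frequent confusion is a true `DINNER` predicted as `CASUAL_DRINK` (three cases), which matches the time-of-day and drink-ordering issues noted in the mistake comments.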
