## Objective
Eats4Life would like to update its menu to include wine suggestions for each of its main entrees (defined by the meat selection). The owner wants to take a data analytics approach and explore data he has collected over the past several years on main courses (meat) and the wines ordered with them. Eats4Life is open to listing more than one wine for each main entree, but only if the data supports it. The scope of services requested includes:

- Summary information on the main entrees (meat)
- Wine suggestion(s) for **each** main entree, with supporting information on why each wine is suggested (if no wine is suggested for a given entree, explain why)
- Any other information of interest in terms of customer order habits

## Data Provided
The dataset `orderData.csv` has three columns:

- `orderNo` – identifies each table/party that sat at the restaurant
- `seatNo` – indicates which seat at the table ordered each meal
- `item` – the item that was ordered

The data has been cleaned, so that each order contains three items per individual: a meat, a side, and a wine.
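
The three-items-per-diner claim can be checked before any modelling. The following is a minimal sketch, assuming a small hypothetical frame `toy` with the same three columns; its rows are made up for illustration and are not taken from `orderData.csv`.

```python
import pandas as pd

# Hypothetical rows mimicking the orderData.csv layout (illustration only).
toy = pd.DataFrame({
    "orderNo": [1, 1, 1, 1, 1, 1, 2, 2],
    "seatNo":  [1, 1, 1, 2, 2, 2, 1, 1],
    "item":    ["Salmon", "Bean Trio", "Oyster Bay Sauvignon Blanc",
                "Pork Chop", "Caesar Salad", "Three Rivers Red",
                "Sea Bass", "Bean Trio"],  # diner (2, 1) is missing a wine
})

# Count the items recorded for each diner (one orderNo/seatNo pair).
items_per_seat = toy.groupby(["orderNo", "seatNo"])["item"].size()

# Any diner without exactly three items violates the stated cleaning rule.
violations = items_per_seat[items_per_seat != 3]
print(violations)
```

Running the same two lines on the real `df` instead of `toy` should return an empty Series if the data is cleaned as described.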

In [18]:
import pandas as pd
from mlxtend.frequent_patterns import association_rules
from mlxtend.frequent_patterns import apriori

In [5]:
df = pd.read_csv('https://raw.githubusercontent.com/sjsimmo2/DataMining-Fall/refs/heads/master/orderData.csv')

In [10]:
print(df.shape)
df.head()

(228699, 3)


| | orderNo | seatNo | item |
|---|---|---|---|
| 0 | 122314 | 1 | Salmon |
| 1 | 122314 | 1 | Oyster Bay Sauvignon Blanc |
| 2 | 122314 | 1 | Bean Trio |
| 3 | 122314 | 2 | Pork Chop |
| 4 | 122314 | 2 | Three Rivers Red |


## Data Processing

In [31]:
#create a dummy (one-hot) variable for each item
df_1 = pd.get_dummies(df["item"])*1

#add the original order number to the new df
df_1["orderNo"] = df["orderNo"]
#add the original seat number to the new df
df_1['seatNo'] = df['seatNo']

#group by orderNo and seatNo, then take the maximum of each column,
#so that each row flags every item ordered at one seat
df_1 = df_1.groupby(['orderNo', 'seatNo']).max()

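#convert the dummy variables back to boolean
#(astype(bool) gives the same result as map(bool) and also works on pandas versions before 2.1)
preprocessed_df = df_1.astype(bool)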

preprocessed_df.head()

| orderNo | seatNo | Adelsheim Pinot Noir | Bean Trio | Blackstone Merlot | Brancott Pinot Grigio | Caesar Salad | Cantina Pinot Bianco | Duck Breast | Duckhorn Chardonnay | Echeverria Gran Syrah | Filet Mignon | ... | Roasted Potatoes | Roasted Root Veg | Salmon | Sea Bass | Seasonal Veg | Single Vineyard Malbec | Swordfish | Three Rivers Red | Total Recall Chardonnay | Warm Goat Salad |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 122314 | 1 | False | True | False | False | False | False | False | False | False | False | ... | False | False | True | False | False | False | False | False | False | False |
| 122314 | 2 | False | False | False | False | True | False | False | False | False | False | ... | False | False | False | False | False | False | False | True | False | False |
| 122314 | 3 | False | False | False | False | True | False | False | False | False | False | ... | False | False | False | True | False | False | False | False | False | False |
| 122314 | 4 | False | True | False | False | False | False | False | False | False | False | ... | False | False | False | True | False | False | False | False | True | False |
| 122314 | 5 | False | True | False | False | False | False | True | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
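
A boolean matrix with one row per diner and one column per item is the input format that the imported `apriori` function works on, and the rules derived from it are usually judged by support, confidence, and lift. To make those three numbers concrete, here is a hand-computed sketch for a single meat-to-wine rule. The frame `basket` and the helper `rule_metrics` are hypothetical names introduced for illustration, and the values are invented rather than drawn from the restaurant data.

```python
import pandas as pd

# Hypothetical one-hot basket: one row per diner, one boolean column per item.
basket = pd.DataFrame({
    "Salmon":                     [True,  True,  True,  False, False],
    "Pork Chop":                  [False, False, False, True,  True],
    "Oyster Bay Sauvignon Blanc": [True,  True,  False, False, True],
    "Three Rivers Red":           [False, False, True,  True,  False],
})

def rule_metrics(basket, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    # Share of diners who ordered both items.
    support = (basket[antecedent] & basket[consequent]).mean()
    # Share of diners who ordered each item on its own.
    support_a = basket[antecedent].mean()
    support_c = basket[consequent].mean()
    # Of the diners who ordered the antecedent, how many also ordered the consequent.
    confidence = support / support_a
    # How much more often the pair occurs than if the two were independent.
    lift = confidence / support_c
    return support, confidence, lift

support, confidence, lift = rule_metrics(
    basket, "Salmon", "Oyster Bay Sauvignon Blanc"
)
print(f"support={support:.3f} confidence={confidence:.3f} lift={lift:.3f}")
```

A lift above 1 suggests the wine is ordered with that entree more often than chance alone would predict, which is the kind of evidence a menu pairing would need.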
