# Weekly Challenge 08

*Original URL:* https://community.alteryx.com/t5/Weekly-Challenge/Challenge-8-Aggregate-Consumer-Purchases/td-p/36735 and [**My Alteryx Approach**](https://github.com/dsmdavid/Alteryx-Weekly-Challenge/tree/master/submitted/sub_Challenge%2308)

## Brief

### Aggregate Consumer Purchases:

For this week’s exercise we will look at customer purchase behavior to decide if we should offer a “Meal Deal” that would add a side and drink to a purchase of pizza or a burger. The incoming data is larger than usual for these exercises so I have packaged the workflow as an Alteryx Package.

**This week’s Objective:**

In order to decide if we should start including a new "Meal Deal" on our menu we want to study the potential impact on recent transactions. Please identify the number and percentage of orders since July 1, 2013 which include the following categories of food: Pizza OR Burger along with a Side and Drink.

 
**Summary of Data:**

Point of Sale data includes the ticket level information, and the lookup table categorizes items into higher level food categories.


**Hint:**

Don't forget to join to the lookup table and filter by date.
 

As always we look forward to your feedback and suggestions!

In [1]:
import pandas as pd


## Approach I want to follow:
1. Read the data.
1. Filter and join.
1. Transform and assign "Target".

In [2]:
# Load the lookup table
df_lookup = pd.read_csv("./08_files/LookupTable.csv")
df_lookup.head()

Unnamed: 0,Desc,Type
0,Bacon Burger,Burger
1,Barbecue Chicken - Large,Pizza
2,Barbecue Chicken - Medium,Pizza
3,Barbecue Chicken - Small,Pizza
4,Buffalo Chicken - Large,Pizza


In [3]:
# Read the point-of-sale data
df = pd.read_csv("./08_files/PointOfSale.csv")
df.head()

Unnamed: 0,TicketID,Date,MemberID,Desc,Price
0,100004,2013-01-07,,Mozzarella Sticks,7.0
1,100004,2013-01-07,,Jalapeno Poppers,7.0
2,100004,2013-01-07,,Onion Rings,7.0
3,100004,2013-01-07,,Onion Rings,7.0
4,100004,2013-01-07,,Supreme - Small,9.0


In [4]:
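# Keep only rows dated 2013-07-01 or later and bring in the Type info from the lookup.
# Comparing strings works here because the dates are in ISO YYYY-MM-DD format.
# pd.merge defaults to an inner join, so rows whose Desc is not in the lookup are dropped.

temp_df = pd.merge(df[df['Date'] >= "2013-07-01"], df_lookup, on='Desc')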
temp_df.head()

Unnamed: 0,TicketID,Date,MemberID,Desc,Price,Type
0,102398,2013-07-01,,House Made Potato Chips,3.0,Side
1,102424,2013-07-01,,House Made Potato Chips,3.0,Side
2,102443,2013-07-01,991857.0,House Made Potato Chips,3.0,Side
3,102463,2013-07-01,,House Made Potato Chips,3.0,Side
4,102464,2013-07-01,,House Made Potato Chips,3.0,Side
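
The filter and join above can also be written with parsed dates and an explicit check for items that have no lookup entry. The following is a minimal sketch on made-up rows (the data and the names `pos`, `lookup`, `recent`, `joined` and `unmatched` are hypothetical and only mimic the column names used above):

```python
import pandas as pd

# Hypothetical toy data, only mimicking the column names used above
pos = pd.DataFrame({
    "TicketID": [1, 1, 2, 3],
    "Date": ["2013-06-30", "2013-06-30", "2013-07-01", "2013-07-02"],
    "Desc": ["Onion Rings", "Bacon Burger", "Soda - Small", "Mystery Item"],
})
lookup = pd.DataFrame({
    "Desc": ["Onion Rings", "Bacon Burger", "Soda - Small"],
    "Type": ["Side", "Burger", "Drink"],
})

# Parse the dates so the comparison does not rely on string ordering
pos["Date"] = pd.to_datetime(pos["Date"])
recent = pos[pos["Date"] >= pd.Timestamp("2013-07-01")]

# A left join keeps unmatched rows; the indicator column flags them
joined = recent.merge(lookup, on="Desc", how="left", indicator=True)
unmatched = joined[joined["_merge"] == "left_only"]

print(joined[["TicketID", "Desc", "Type", "_merge"]])
print("Items without a lookup entry:", unmatched["Desc"].tolist())
```

With an inner join, as in the cell above, the unmatched item would simply disappear from the result, so a check like this is one way to confirm nothing was lost.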


In [5]:
# Pivot so that elements of the ticket are in a single row. Keep only the first element of a given type, since we are
# only interested in presence/absence.

piv_df = temp_df.pivot_table(values='Desc', index='TicketID', columns='Type', aggfunc='first')
piv_df.head()

TicketID,Burger,Drink,Pizza,Salad,Side,Soup
102398,,Soda - Small,Meat Grinder - Small,House Salad - Half,House Made Potato Chips,Soup of the Day - Cup
102415,,Soda - Small,Mt. Hawaiian - Large,House Salad - Half,Onion Rings,Soup of the Day - Cup
102418,,Soda - Large,,Garden Fresh - Medium,,Soup of the Day - Cup
102424,Bacon Burger,Soda - Small,The Works - Large,Garden Fresh - Medium,House Made Potato Chips,
102433,,Soda - Large,Supreme - Large,House Salad - Half,Waffle Fries,Soup of the Day - Cup
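
Since only presence or absence matters, a count-based table is an alternative to keeping the first item description. A minimal sketch using `pd.crosstab` on made-up rows (the `toy` data and the `presence` name are hypothetical, not taken from the real files):

```python
import pandas as pd

# Hypothetical joined data (one row per item sold), not taken from the real files
toy = pd.DataFrame({
    "TicketID": [1, 1, 1, 2, 2, 3],
    "Type": ["Pizza", "Side", "Drink", "Burger", "Drink", "Side"],
})

# Count items per ticket and type, then reduce the counts to presence/absence
presence = pd.crosstab(toy["TicketID"], toy["Type"]) > 0
print(presence)
```

This yields the boolean table in one step, at the cost of losing the item descriptions that the pivot above keeps.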


In [6]:
# Convert to booleans for presence/absence
# (DataFrame.notna replaces applymap(pd.notna); applymap is deprecated in recent pandas)
piv_bool = piv_df.notna()

# Target: (Pizza OR Burger) AND Side AND Drink
piv_bool['target'] = (piv_bool['Burger'] | piv_bool['Pizza']) & piv_bool['Side'] & piv_bool['Drink']

In [7]:
piv_bool.head()

TicketID,Burger,Drink,Pizza,Salad,Side,Soup,target
102398,False,True,True,True,True,True,True
102415,False,True,True,True,True,True,True
102418,False,True,False,True,False,True,False
102424,True,True,True,True,True,False,True
102433,False,True,True,True,True,True,True
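
The target expression assumes that every category appears as a column. If the filtered data happened to contain no item of some type, indexing that column would raise a `KeyError`. A minimal sketch of one way to guard against that, on a made-up table (the `presence`, `needed`, `flags` and `target` names and the data are hypothetical):

```python
import pandas as pd

# Hypothetical presence/absence table with no 'Burger' column at all
presence = pd.DataFrame(
    {
        "Drink": [True, True, False],
        "Pizza": [True, False, True],
        "Side": [True, True, True],
    },
    index=pd.Index([1, 2, 3], name="TicketID"),
)

needed = ["Burger", "Pizza", "Side", "Drink"]
# reindex adds any missing category as an all-False column
flags = presence.reindex(columns=needed, fill_value=False).astype(bool)

# Same definition as above: (Pizza OR Burger) AND Side AND Drink
target = (flags["Burger"] | flags["Pizza"]) & flags["Side"] & flags["Drink"]
print(target.tolist())
```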


In [8]:
# Summarize
values = {}
values['total'] = len(piv_bool)
values['target'] = piv_bool['target'].sum()
values['percentage'] = 100 * values['target'] / values['total']

In [9]:
print(values)

{'total': 15497, 'target': 10964, 'percentage': 70.74917726011486}
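
Because `True` counts as 1, the mean of a boolean column is the share of qualifying tickets, so the percentage can be computed without dividing by hand. A minimal sketch on made-up flags (the `target` values and the `summary` name are hypothetical):

```python
import pandas as pd

# Hypothetical flags: 3 of 4 tickets qualify
target = pd.Series([True, True, False, True])

summary = {
    "total": int(target.size),
    "target": int(target.sum()),
    # mean of booleans = fraction of True values
    "percentage": round(float(100 * target.mean()), 2),
}
print(summary)
```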


## Condensed approach:

In [10]:
import time
t1 = time.time()
import pandas as pd

In [11]:
# Input data
df_lookup = pd.read_csv("./08_files/LookupTable.csv")
df = pd.read_csv("./08_files/PointOfSale.csv")

# Join, filter, pivot, convert to bool
# (notna() replaces applymap(pd.notna), which is deprecated in recent pandas)
piv_bool = (
    pd.merge(df[df['Date'] >= "2013-07-01"], df_lookup, on='Desc')
    .pivot_table(values='Desc', index='TicketID', columns='Type', aggfunc='first')
    .notna()
)
# Apply target definition
piv_bool['target'] = (piv_bool['Burger'] | piv_bool['Pizza']) & piv_bool['Side'] & piv_bool['Drink']

# Summarize
values = {}
values['total'] = len(piv_bool)
values['target'] = piv_bool['target'].sum()
values['percentage'] = 100 * values['target'] / values['total']

print(values)


{'total': 15497, 'target': 10964, 'percentage': 70.74917726011486}


In [12]:
t2 = time.time()
t2-t1

0.5724692344665527
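
For short measurements like this one, `time.perf_counter()` is usually a better fit than `time.time()`, since it is a monotonic, high-resolution clock. A minimal sketch of a reusable timer (the `timed` helper is hypothetical and the summed range is only a stand-in workload):

```python
import time

def timed(func, *args, **kwargs):
    """Run func once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start

# Stand-in workload; any callable could be timed the same way
total, elapsed = timed(sum, range(1_000_000))
print(f"sum={total}, elapsed={elapsed:.4f}s")
```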