## Introduction

This is my solution to Case Study #1 of Danny Ma's 8 Weeks SQL Challenge, written with Python's pandas library. For more details on the challenge, please refer to the README. The aim was to check my SQL answers and, at the same time, to compare the two approaches.

## Case Study Questions

Each of the following case study questions can be answered using a single SQL statement:

1. What is the total amount each customer spent at the restaurant?
2. How many days has each customer visited the restaurant?
3. What was the first item from the menu purchased by each customer?
4. What is the most purchased item on the menu and how many times was it purchased by all customers?
5. Which item was the most popular for each customer?
6. Which item was purchased first by the customer after they became a member?
7. Which item was purchased just before the customer became a member?
8. What is the total items and amount spent for each member before they became a member?
9. If each $1 spent equates to 10 points and sushi has a 2x points multiplier - how many points would each customer have?
10. In the first week after a customer joins the program (including their join date) they earn 2x points on all items, not just sushi - how many points do customer A and B have at the end of January?

First, import pandas and load the data into three DataFrames: `sales`, `members` and `menu`.

In [1]:
import pandas as pd 
sales_data = [['A', '2021-01-01', '1'],
  ['A', '2021-01-01', '2'],
  ['A', '2021-01-07', '2'],
  ['A', '2021-01-10', '3'],
  ['A', '2021-01-11', '3'],
  ['A', '2021-01-11', '3'],
  ['B', '2021-01-01', '2'],
  ['B', '2021-01-02', '2'],
  ['B', '2021-01-04', '1'],
  ['B', '2021-01-11', '1'],
  ['B', '2021-01-16', '3'],
  ['B', '2021-02-01', '3'],
  ['C', '2021-01-01', '3'],
  ['C', '2021-01-01', '3'],
  ['C', '2021-01-07', '3']]


sales = pd.DataFrame(sales_data, columns = ["customer_id", "order_date", "product_id"])


members_data = [['A', '2021-01-07'],
  ['B', '2021-01-09']]

members = pd.DataFrame(members_data, columns = ["customer_id", "join_date"])

menu_data = [['1', 'sushi', 10],
  ['2', 'curry', 15],
  ['3', 'ramen', 12]]

menu = pd.DataFrame(menu_data, columns = ["product_id", "product_name", "price"])


Because the date columns in `sales` and `members` were created as strings, we have to convert them to datetime format for later manipulation.
This can be done with the `to_datetime` function in pandas.

In [2]:
sales.order_date = pd.to_datetime(sales.order_date)
members.join_date = pd.to_datetime(members.join_date)
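
As a quick illustration of why the conversion matters, here is a minimal sketch on a two-row frame (the `demo` frame and the variable names are made up for this example and are not part of the notebook). Once the column holds datetimes, date arithmetic such as subtracting two dates works directly.

```python
import pandas as pd

# Hypothetical two-row frame holding dates as plain strings
demo = pd.DataFrame({"order_date": ["2021-01-01", "2021-01-07"]})
dtype_before = demo["order_date"].dtype

# Convert the string column to datetime
demo["order_date"] = pd.to_datetime(demo["order_date"])
dtype_after = demo["order_date"].dtype

# Datetime values support arithmetic that strings do not
days_between = (demo["order_date"].iloc[1] - demo["order_date"].iloc[0]).days
print(dtype_before, dtype_after, days_between)
```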

## Solution

#### Q1 : What is the total amount each customer spent at the restaurant? 

In [3]:
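# Passing Series as join keys adds a key_0 column and suffixes the original keys as product_id_x / product_id_y
sales_menu = pd.merge(sales, menu, left_on=sales.product_id, right_on=menu.product_id)
# Sum only the price column; the remaining columns are not numeric
sales_menu.groupby(by=['customer_id'])[['price']].sum()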

| customer_id | price |
|---|---|
| A | 76 |
| B | 74 |
| C | 36 |


Ans: Customer A spent $76, Customer B spent $74 and Customer C spent $36 at the restaurant.
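
For comparison with the SQL `JOIN ... GROUP BY` version, the same totals can be sketched with a merge on the shared column name, which keeps a single `product_id` column. This is only an alternative sketch; `sales_menu_alt` and `total_spent` are names invented here, and the data is the setup data in compact form.

```python
import pandas as pd

# Same sales and menu data as in the setup cell, in compact form
sales = pd.DataFrame({
    "customer_id": list("AAAAAABBBBBBCCC"),
    "product_id": list("122333221133333"),
})
menu = pd.DataFrame({
    "product_id": ["1", "2", "3"],
    "product_name": ["sushi", "curry", "ramen"],
    "price": [10, 15, 12],
})

# Join on the shared column name, then sum the price per customer
sales_menu_alt = sales.merge(menu, on="product_id", how="left")
total_spent = sales_menu_alt.groupby("customer_id")["price"].sum()
print(total_spent)
```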

#### Q2: How many days has each customer visited the restaurant?


In [4]:
# Count the distinct order dates for each customer
sales.groupby('customer_id')['order_date'].nunique()

customer_id
A    4
B    6
C    2
Name: order_date, dtype: int64

Or, equivalently, in table form using `agg`:

In [5]:
sales[['customer_id', 'order_date']].groupby('customer_id').agg(['nunique'])


| customer_id | order_date (nunique) |
|---|---|
| A | 4 |
| B | 6 |
| C | 2 |


Ans: Customer A visited the restaurant on 4 days, Customer B on 6 days and Customer C on 2 days.

#### Q3: What was the first item from the menu purchased by each customer? 

In [6]:
# Rank each customer's orders by date; orders on the same date share the same (minimum) rank
sales_menu['ranking'] = sales_menu.groupby('customer_id')['order_date'].rank(method='min', ascending=True)

# Keep the earliest orders (rank 1) for every customer
sales_menu[sales_menu.ranking == 1][['customer_id', 'order_date', 'product_name']].sort_values(by=['customer_id'])

|  | customer_id | order_date | product_name |
|---|---|---|---|
| 0 | A | 2021-01-01 | sushi |
| 3 | A | 2021-01-01 | curry |
| 5 | B | 2021-01-01 | curry |
| 12 | C | 2021-01-01 | ramen |
| 13 | C | 2021-01-01 | ramen |


Ans: Customer A's first order included both sushi and curry (bought on the same day), Customer B's first item was curry, and Customer C's first item was ramen.
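
The ranking step can also be sketched without `rank`, by comparing each order date with the customer's earliest order date via `transform('min')`. This is an alternative sketch rather than the approach used above; `orders`, `first_date` and `first_items` are names made up for this example, and duplicates are dropped so that Customer C's two ramen orders appear once.

```python
import pandas as pd

# Same sales data as in the setup cell, in compact form
orders = pd.DataFrame({
    "customer_id": list("AAAAAABBBBBBCCC"),
    "order_date": pd.to_datetime([
        "2021-01-01", "2021-01-01", "2021-01-07", "2021-01-10", "2021-01-11", "2021-01-11",
        "2021-01-01", "2021-01-02", "2021-01-04", "2021-01-11", "2021-01-16", "2021-02-01",
        "2021-01-01", "2021-01-01", "2021-01-07",
    ]),
    "product_id": list("122333221133333"),
})
orders["product_name"] = orders["product_id"].map({"1": "sushi", "2": "curry", "3": "ramen"})

# Earliest order date per customer, broadcast back to every row
first_date = orders.groupby("customer_id")["order_date"].transform("min")

# Keep the rows placed on that first date, one row per distinct item
first_items = (
    orders.loc[orders["order_date"] == first_date, ["customer_id", "product_name"]]
    .drop_duplicates()
    .sort_values(["customer_id", "product_name"])
)
print(first_items)
```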

#### Q4: What is the most purchased item on the menu and how many times was it purchased by all customers?
 

In [7]:
# Count the number of purchases per menu item
most_purchased_item_df = sales_menu[['product_id_x', 'product_name']].groupby('product_name').agg('count')
most_purchased_item_df = most_purchased_item_df.rename(columns={'product_id_x': 'Number of times purchased'})
# Keep only the item(s) with the highest purchase count
most_purchased_item_df[most_purchased_item_df['Number of times purchased'] == most_purchased_item_df['Number of times purchased'].max()]


| product_name | Number of times purchased |
|---|---|
| ramen | 8 |


The table above shows the most purchased item on the menu and the number of times it was purchased by all customers. <br>
Ans: Ramen is the most purchased item on the menu and it was purchased 8 times by all customers.
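
To see the counts for every item rather than only the top one, a minimal sketch with `value_counts` is given below. The names `product_names`, `counts`, `top_item` and `top_count` are invented for this example; the product ids are those of the setup data.

```python
import pandas as pd

# Product ids of all 15 sales rows, mapped to their menu names
product_names = pd.Series(list("122333221133333")).map(
    {"1": "sushi", "2": "curry", "3": "ramen"}
)

# Purchases per item, sorted from most to least purchased
counts = product_names.value_counts()

top_item = counts.idxmax()     # item with the highest count
top_count = int(counts.max())  # how often it was purchased
print(counts)
print(top_item, top_count)
```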

#### Q5: Which item was the most popular for each customer?

In [8]:
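# Count purchases per customer and item (every remaining column holds the same count)
sales_menu_grouped = sales_menu.groupby(['customer_id', 'product_name']).agg('count')
# Rank each customer's items by purchase count, most purchased first; ties share the same rank
sales_menu_grouped['ranking_item'] = sales_menu_grouped.groupby('customer_id')['order_date'].rank(method='min', ascending=False)
sales_menu_grouped[sales_menu_grouped.ranking_item == 1][['order_date']]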


| customer_id | product_name | order_date (count) |
|---|---|---|
| A | ramen | 3 |
| B | curry | 2 |
| B | ramen | 2 |
| B | sushi | 2 |
| C | ramen | 3 |


Ans: The most popular item for Customers A and C is ramen. Customer B bought ramen, sushi and curry equally often (twice each).

#### Q6. Which item was purchased first by the customer after they became a member?


In [9]:
# sales_menu.drop(columns = ['key_0'], inplace = True)
# Attach each customer's join date; non-members get a missing join_date
sales_menu_members = sales_menu.merge(members, on='customer_id', how='left')
# Keep orders placed on or after the join date
sales_after_membership = sales_menu_members[sales_menu_members.order_date >= sales_menu_members.join_date]
# Sort by date and take each customer's first order as a member
sorted_sales_after_membership = sales_after_membership[['customer_id', 'order_date', 'product_name']].sort_values(by=['order_date'])
sorted_sales_after_membership.groupby('customer_id').head(1)


|  | customer_id | order_date | product_name |
|---|---|---|---|
| 4 | A | 2021-01-07 | curry |
| 2 | B | 2021-01-11 | sushi |


Ans: After Customer A became a member, curry was the first item purchased. <br>
After Customer B became a member, sushi was the first item purchased.


#### Q7. Which item was purchased just before the customer became a member?

In [10]:
# .copy() creates an independent DataFrame, so adding a column below does not raise SettingWithCopyWarning
sales_before_membership = sales_menu_members[sales_menu_members.order_date < sales_menu_members.join_date].copy()
# Rank each customer's pre-membership orders from latest to earliest
sales_before_membership['ranking'] = sales_before_membership.groupby('customer_id')['order_date'].rank(method='min', ascending=False)
# Keep the latest pre-membership orders (rank 1)
sales_before_membership[sales_before_membership.ranking == 1][['customer_id', 'order_date', 'join_date', 'product_name']].sort_values(by=['customer_id'])


|  | customer_id | order_date | join_date | product_name |
|---|---|---|---|---|
| 0 | A | 2021-01-01 | 2021-01-07 | sushi |
| 3 | A | 2021-01-01 | 2021-01-07 | curry |
| 1 | B | 2021-01-04 | 2021-01-09 | sushi |


Ans: Customer A purchased sushi and curry while Customer B purchased sushi just before they became members.

#### Q8. What is the total items and amount spent for each member before they became a member?


In [11]:
# Named aggregation: distinct menu items bought and total amount spent before joining
sales_before_membership.groupby('customer_id').agg(distinct_items=('product_id_x', 'nunique'), amount_spent=('price', 'sum'))


| customer_id | distinct_items | amount_spent |
|---|---|---|
| A | 2 | 25 |
| B | 2 | 40 |


Ans: Before they became members, Customer A spent $25 on 2 distinct menu items, while Customer B spent $40 on 2 distinct menu items (3 purchases in total, since curry was bought twice).
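
"Total items" can be read either as the number of purchases or as the number of distinct menu items, which corresponds to `COUNT(*)` versus `COUNT(DISTINCT ...)` in SQL. The sketch below computes both side by side; the names `joined`, `before` and `summary` and the aggregation labels are made up for this example.

```python
import pandas as pd

# Same data as in the setup cell, in compact form
sales = pd.DataFrame({
    "customer_id": list("AAAAAABBBBBBCCC"),
    "order_date": pd.to_datetime([
        "2021-01-01", "2021-01-01", "2021-01-07", "2021-01-10", "2021-01-11", "2021-01-11",
        "2021-01-01", "2021-01-02", "2021-01-04", "2021-01-11", "2021-01-16", "2021-02-01",
        "2021-01-01", "2021-01-01", "2021-01-07",
    ]),
    "product_id": list("122333221133333"),
})
menu = pd.DataFrame({"product_id": ["1", "2", "3"], "price": [10, 15, 12]})
members = pd.DataFrame({
    "customer_id": ["A", "B"],
    "join_date": pd.to_datetime(["2021-01-07", "2021-01-09"]),
})

# Inner joins keep members only; then keep orders placed before the join date
joined = sales.merge(menu, on="product_id").merge(members, on="customer_id")
before = joined[joined["order_date"] < joined["join_date"]]

summary = before.groupby("customer_id").agg(
    purchases=("product_id", "count"),         # every order row counts
    distinct_items=("product_id", "nunique"),  # each menu item counts once
    amount_spent=("price", "sum"),
)
print(summary)
```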

#### Q9. If each $1 spent equates to 10 points and sushi has a 2x points multiplier - how many points would each customer have?

In [12]:
# Sushi earns 20 points per $1 (2x multiplier); curry and ramen earn 10 points per $1
sales_menu.loc[sales_menu.product_name == "sushi", 'points'] = 20 * sales_menu.price
sales_menu.loc[sales_menu.product_name == "curry", 'points'] = 10 * sales_menu.price
sales_menu.loc[sales_menu.product_name == "ramen", 'points'] = 10 * sales_menu.price
# Total points per customer
sales_menu.groupby('customer_id')[['points']].sum()


| customer_id | points |
|---|---|
| A | 860.0 |
| B | 940.0 |
| C | 360.0 |


Ans: With the multiplier point system in place, Customers A, B and C would have 860 points, 940 points and 360 points respectively.
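
The same rule can be sketched as a single expression, price x 10 x multiplier, where the multiplier is 2 for sushi and 1 otherwise. As a check by hand for Customer A: one sushi (10 x 20 = 200), two curries (2 x 15 x 10 = 300) and three ramen (3 x 12 x 10 = 360) give 860. The names `orders`, `multiplier` and `points` below are invented for this example.

```python
import pandas as pd

# Same sales and menu data as in the setup cell, in compact form
orders = pd.DataFrame({
    "customer_id": list("AAAAAABBBBBBCCC"),
    "product_id": list("122333221133333"),
})
menu = pd.DataFrame({
    "product_id": ["1", "2", "3"],
    "product_name": ["sushi", "curry", "ramen"],
    "price": [10, 15, 12],
})
orders = orders.merge(menu, on="product_id")

# 2x multiplier for sushi, 1x for everything else
multiplier = orders["product_name"].eq("sushi").map({True: 2, False: 1})
orders["points"] = orders["price"] * 10 * multiplier

points = orders.groupby("customer_id")["points"].sum()
print(points)
```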

#### Q10. In the first week after a customer joins the program (including their join date) they earn 2x points on all items, not just sushi - how many points do customer A and B have at the end of January?

In [13]:
# The promotion covers the join date plus the following six days
sales_menu_members['promo_end_date'] = sales_menu_members.join_date + pd.Timedelta(days=6)
# Baseline: sushi earns 20 points per $1, everything else 10 points per $1
sales_menu_members.loc[sales_menu_members.product_name == "sushi", 'promo_points'] = 2 * 10 * sales_menu_members.price
sales_menu_members.loc[sales_menu_members.product_name != "sushi", 'promo_points'] = 10 * sales_menu_members.price
# During the promotion week, every item earns 20 points per $1
sales_menu_members.loc[(sales_menu_members.join_date <= sales_menu_members.order_date) & (sales_menu_members.order_date <= sales_menu_members.promo_end_date), 'promo_points'] = 2 * 10 * sales_menu_members.price

# Keep January orders from members only (Customer C never joined)
jan_sales = sales_menu_members[(sales_menu_members.order_date < '2021-02-01') & (sales_menu_members.customer_id != "C")]
jan_sales[['customer_id', 'promo_points']].groupby('customer_id').sum()

| customer_id | promo_points |
|---|---|
| A | 1370.0 |
| B | 820.0 |


Ans: Customer A has 1370 points and Customer B has 820 points at the end of January.
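
The promotion logic can also be sketched with one boolean mask: an order earns the 2x multiplier if it is sushi or if it falls inside the promotion window. The sketch assumes, as in the solution above, that the first week runs from the join date to six days later, inclusive. The names `df`, `in_promo`, `is_sushi` and `jan_points` are made up for this example.

```python
import pandas as pd

# Same data as in the setup cell, in compact form
sales = pd.DataFrame({
    "customer_id": list("AAAAAABBBBBBCCC"),
    "order_date": pd.to_datetime([
        "2021-01-01", "2021-01-01", "2021-01-07", "2021-01-10", "2021-01-11", "2021-01-11",
        "2021-01-01", "2021-01-02", "2021-01-04", "2021-01-11", "2021-01-16", "2021-02-01",
        "2021-01-01", "2021-01-01", "2021-01-07",
    ]),
    "product_id": list("122333221133333"),
})
menu = pd.DataFrame({
    "product_id": ["1", "2", "3"],
    "product_name": ["sushi", "curry", "ramen"],
    "price": [10, 15, 12],
})
members = pd.DataFrame({
    "customer_id": ["A", "B"],
    "join_date": pd.to_datetime(["2021-01-07", "2021-01-09"]),
})

# Inner join on members drops Customer C, who never joined
df = sales.merge(menu, on="product_id").merge(members, on="customer_id")

# Assumed promotion window: join date through join date + 6 days, inclusive
in_promo = df["order_date"].between(df["join_date"], df["join_date"] + pd.Timedelta(days=6))
is_sushi = df["product_name"].eq("sushi")

# 2x points for sushi or for any order inside the window, 1x otherwise
df["points"] = df["price"] * 10 * (in_promo | is_sushi).map({True: 2, False: 1})

jan_points = df[df["order_date"] < pd.Timestamp("2021-02-01")].groupby("customer_id")["points"].sum()
print(jan_points)
```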