# Introduction
Anny seriously loves Japanese food, so at the beginning of 2021 he decides to embark upon a risky venture and opens a cute little restaurant that sells his 3 favourite foods: **sushi, curry and ramen.**

Anny’s Diner needs your assistance to stay afloat - the restaurant has captured some very basic data from its first few months of operation but has no idea how to use that data to help run the business.

## Problem statement
Anny wants to use the data to answer a few simple questions about his customers, especially about their visiting patterns, how much money they’ve spent and also which menu items are their favourite. Having this deeper connection with his customers will help him deliver a better and more personalised experience for his loyal customers.

He plans on using these insights to help him decide whether he should expand the existing customer loyalty program. Additionally, he needs help generating some basic datasets so his team can easily inspect the data without needing to use SQL.

Anny has provided you with a sample of his overall customer data due to privacy issues, but he hopes that these examples are enough for you to write fully functioning pandas code to help him answer his questions!

Anny has shared with you 3 key datasets for this case study:

- sales
- menu
- members

## Bring in the necessary libraries for your work. Import the tools and resources needed to accomplish your tasks.

In [None]:
# importing the required libraries
import numpy as np
import pandas as pd

## Import the necessary data for analysis. Bring in the information that you need to examine and draw insights from.

In [None]:
# importing the required data
sales = pd.read_csv(r"C:\Users\THINKPAD\Desktop\PANDAS_CASE_STUDY\Case_study_1\sales.csv")
menu = pd.read_csv(r"C:\Users\THINKPAD\Desktop\PANDAS_CASE_STUDY\Case_study_1\menu.csv")
members = pd.read_csv(r"C:\Users\THINKPAD\Desktop\PANDAS_CASE_STUDY\Case_study_1\members.csv")

# Explore the details of all datasets by checking their information

In [None]:
sales

Unnamed: 0,customer_id,order_date,product_id
0,A,2021-01-01,1
1,A,2021-01-01,2
2,A,2021-01-07,2
3,A,2021-01-10,3
4,A,2021-01-11,3
5,A,2021-01-11,3
6,B,2021-01-01,2
7,B,2021-01-02,2
8,B,2021-01-04,1
9,B,2021-01-11,1


In [None]:
sales.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15 entries, 0 to 14
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   customer_id  15 non-null     object
 1   order_date   15 non-null     object
 2   product_id   15 non-null     int64 
dtypes: int64(1), object(2)
memory usage: 492.0+ bytes


In [None]:
menu

Unnamed: 0,product_id,product_name,price
0,1,sushi,10
1,2,curry,15
2,3,ramen,12


In [None]:
menu.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   product_id    3 non-null      int64 
 1   product_name  3 non-null      object
 2   price         3 non-null      int64 
dtypes: int64(2), object(1)
memory usage: 204.0+ bytes


In [None]:
members

Unnamed: 0,customer_id,join_date
0,A,2021-01-07
1,B,2021-01-09


In [None]:
members.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 2 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   customer_id  2 non-null      object
 1   join_date    2 non-null      object
dtypes: object(2)
memory usage: 164.0+ bytes


### Make sure that each type of information (like numbers or dates) is stored in the correct way. This helps ensure that the data is accurate and ready for analysis, making your work more reliable and meaningful.


In [None]:
sales['order_date']=pd.to_datetime(sales['order_date'])

In [None]:
sales.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15 entries, 0 to 14
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype         
---  ------       --------------  -----         
 0   customer_id  15 non-null     object        
 1   order_date   15 non-null     datetime64[ns]
 2   product_id   15 non-null     int64         
dtypes: datetime64[ns](1), int64(1), object(1)
memory usage: 492.0+ bytes


In [None]:
members['join_date'] = pd.to_datetime(members['join_date'])

In [None]:
members.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 2 columns):
 #   Column       Non-Null Count  Dtype         
---  ------       --------------  -----         
 0   customer_id  2 non-null      object        
 1   join_date    2 non-null      datetime64[ns]
dtypes: datetime64[ns](1), object(1)
memory usage: 164.0+ bytes
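As an alternative to converting after loading, the date column can be parsed while the CSV is read. The sketch below is only illustrative: it uses a small in-memory CSV (an assumption, built from the first sales rows shown above) rather than the real file paths.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV mirroring the first rows of the sales output above
csv_text = """customer_id,order_date,product_id
A,2021-01-01,1
A,2021-01-01,2
A,2021-01-07,2
"""

# parse_dates converts the listed columns to datetime while reading
sales_sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["order_date"])
print(sales_sample.dtypes)
```

With the real files, the same `parse_dates` argument could be passed to the existing `pd.read_csv` calls.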


### 1. What is the total amount each customer spent at the restaurant?

In [None]:
# Merge the DataFrames
merged_df = pd.merge(sales, menu, on='product_id') # inner join
merged_df

Unnamed: 0,customer_id,order_date,product_id,product_name,price
0,A,2021-01-01,1,sushi,10
1,B,2021-01-04,1,sushi,10
2,B,2021-01-11,1,sushi,10
3,A,2021-01-01,2,curry,15
4,A,2021-01-07,2,curry,15
5,B,2021-01-01,2,curry,15
6,B,2021-01-02,2,curry,15
7,A,2021-01-10,3,ramen,12
8,A,2021-01-11,3,ramen,12
9,A,2021-01-11,3,ramen,12


In [None]:

# Merge the DataFrames
merged_df = pd.merge(sales, menu, on='product_id') # inner join

# Perform the aggregation
result_df = merged_df.groupby('customer_id')['price'].sum().reset_index()

# Rename the aggregated column
result_df = result_df.rename(columns={'price': 'total_sales'})

# Sort the result DataFrame by customer_id
result_df = result_df.sort_values(by='customer_id')

# Display the result
result_df

Unnamed: 0,customer_id,total_sales
0,A,76
1,B,74
2,C,36


- Customer A spent \$76
- Customer B spent \$74
- Customer C spent \$36
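A quick way to sanity-check the total for one customer is to rebuild the calculation on a small sample. The sketch below assumes only customer A's six orders and the menu, both copied from the outputs above, and uses the `validate` argument of `merge` to guard against duplicated menu rows. The names `sales_a` and `menu_sample` are made up for this example.

```python
import pandas as pd

# Customer A's orders and the menu, copied from the outputs shown earlier
sales_a = pd.DataFrame({
    "customer_id": ["A"] * 6,
    "order_date": pd.to_datetime([
        "2021-01-01", "2021-01-01", "2021-01-07",
        "2021-01-10", "2021-01-11", "2021-01-11",
    ]),
    "product_id": [1, 2, 2, 3, 3, 3],
})
menu_sample = pd.DataFrame({
    "product_id": [1, 2, 3],
    "product_name": ["sushi", "curry", "ramen"],
    "price": [10, 15, 12],
})

# validate="many_to_one" raises MergeError if product_id is duplicated in the menu
merged_a = sales_a.merge(menu_sample, on="product_id", how="left", validate="many_to_one")

# Sum the price per customer and give the column a descriptive name
total_sales_a = (
    merged_a.groupby("customer_id", as_index=False)["price"]
    .sum()
    .rename(columns={"price": "total_sales"})
)
print(total_sales_a)
```

The total for customer A should agree with the 76 reported in the result table.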


### 2. How many days has each customer visited the restaurant?


In [None]:
# Perform the aggregation
visit_count = sales.groupby('customer_id')['order_date'].nunique().reset_index()

# Rename the aggregated column to visit_count
visit_count = visit_count.rename(columns={'order_date': 'visit_count'})

visit_count


Unnamed: 0,customer_id,visit_count
0,A,4
1,B,6
2,C,2


### Insights
- Customer A visited on 4 days.
- Customer B visited on 6 days.
- Customer C visited on 2 days.
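The reason for using `nunique()` rather than a plain count is that several orders can fall on the same day. A small sketch, assuming only customer A's order dates from the sales output above, shows the difference:

```python
import pandas as pd

# Customer A's order dates, copied from the sales output shown earlier
sales_a = pd.DataFrame({
    "customer_id": ["A"] * 6,
    "order_date": pd.to_datetime([
        "2021-01-01", "2021-01-01", "2021-01-07",
        "2021-01-10", "2021-01-11", "2021-01-11",
    ]),
})

# "count" tallies every order row, "nunique" tallies distinct visit days
visits = sales_a.groupby("customer_id")["order_date"].agg(
    orders="count",
    visit_days="nunique",
)
print(visits)
```

Customer A placed six orders but on only four distinct days, which is why the visit count above is 4.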

### 3. What was the first item from the menu purchased by each customer?

In [None]:
# Merge the DataFrames
merged_df = pd.merge(sales, menu, on='product_id')
merged_df

Unnamed: 0,customer_id,order_date,product_id,product_name,price
0,A,2021-01-01,1,sushi,10
1,B,2021-01-04,1,sushi,10
2,B,2021-01-11,1,sushi,10
3,A,2021-01-01,2,curry,15
4,A,2021-01-07,2,curry,15
5,B,2021-01-01,2,curry,15
6,B,2021-01-02,2,curry,15
7,A,2021-01-10,3,ramen,12
8,A,2021-01-11,3,ramen,12
9,A,2021-01-11,3,ramen,12


In [None]:
# Use DENSE_RANK() equivalent in Pandas
merged_df['rank'] = merged_df.groupby('customer_id')['order_date'].rank(method='dense')
merged_df

Unnamed: 0,customer_id,order_date,product_id,product_name,price,rank
0,A,2021-01-01,1,sushi,10,1.0
1,B,2021-01-04,1,sushi,10,3.0
2,B,2021-01-11,1,sushi,10,4.0
3,A,2021-01-01,2,curry,15,1.0
4,A,2021-01-07,2,curry,15,2.0
5,B,2021-01-01,2,curry,15,1.0
6,B,2021-01-02,2,curry,15,2.0
7,A,2021-01-10,3,ramen,12,3.0
8,A,2021-01-11,3,ramen,12,4.0
9,A,2021-01-11,3,ramen,12,4.0


In [None]:
# Filter rows where rank is 1
first_purchase_df = merged_df[merged_df['rank'] == 1][['customer_id', 'product_name']]

first_purchase_df

Unnamed: 0,customer_id,product_name
0,A,sushi
3,A,curry
5,B,curry
12,C,ramen
13,C,ramen


In [None]:
# Group by customer_id and get the first purchased item
first_purchase_df = first_purchase_df.groupby('customer_id').first().reset_index()
first_purchase_df


Unnamed: 0,customer_id,product_name
0,A,sushi
1,B,curry
2,C,ramen


In [None]:
# Merge the DataFrames
merged_df = pd.merge(sales, menu, on='product_id')

# Use DENSE_RANK() equivalent in Pandas
merged_df['rank'] = merged_df.groupby('customer_id')['order_date'].rank(method='dense')

# Filter rows where rank is 1
first_purchase_df = merged_df[merged_df['rank'] == 1][['customer_id', 'product_name']]

# Group by customer_id and keep one item per customer
# (.first() drops same-day ties, e.g. customer A ordered both sushi and curry on the first day)
first_purchase_df = first_purchase_df.groupby('customer_id').first().reset_index()
first_purchase_df

Unnamed: 0,customer_id,product_name
0,A,sushi
1,B,curry
2,C,ramen


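- Customer A’s first order included both curry and sushi, bought on the same day; the table above keeps only sushi because `.first()` returns one row per customer.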
- Customer B’s first order is curry.
- Customer C’s first order is ramen.
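If same-day ties should be kept rather than collapsed to one row, one option is to compare each order date with the customer's earliest date. This is a sketch under the assumption that only customer A's merged rows (copied from the output above) are used; `orders_a` is a made-up name.

```python
import pandas as pd

# Customer A's merged orders, copied from the merged output shown earlier
orders_a = pd.DataFrame({
    "customer_id": ["A"] * 6,
    "order_date": pd.to_datetime([
        "2021-01-01", "2021-01-01", "2021-01-07",
        "2021-01-10", "2021-01-11", "2021-01-11",
    ]),
    "product_name": ["sushi", "curry", "curry", "ramen", "ramen", "ramen"],
})

# Broadcast each customer's earliest order date back onto every row
first_date = orders_a.groupby("customer_id")["order_date"].transform("min")

# Keep every item bought on that earliest date, dropping exact repeats
first_items = orders_a.loc[
    orders_a["order_date"] == first_date, ["customer_id", "product_name"]
].drop_duplicates()
print(first_items)
```

For customer A this keeps both sushi and curry, matching the rank-1 rows in the filtered table.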

## Difference between rank and dense rank

## Rank:

Rank is a simple method of assigning ranks to values where tied values receive the same rank, and the next value receives a rank incremented by the number of tied values.
For example, if two values are tied for the second-highest value, they both receive a rank of 2, and the next distinct value receives a rank of 4 (skipping 3).
Ex:
- Values:  10, 20, 30, 30, 40

- Rank:    1,  2,  3,  3,  5

## Dense Rank:

Dense rank, on the other hand, does not skip ranks for tied values. Tied values all receive the same rank, and the next distinct value receives the next rank without any gaps.
It is "dense" in the sense that there are no gaps in the sequence of ranks.

Ex:
- Values:      10, 20, 30, 30, 40

- Dense Rank:  1,  2,  3,  3,  4
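The two schemes can be reproduced in pandas with the `method` argument of `Series.rank`. The sketch below uses the same five values as the examples; note that the pandas default is `method="average"`, which is neither of the two schemes described here.

```python
import pandas as pd

values = pd.Series([10, 20, 30, 30, 40])

ranks = pd.DataFrame({
    "value": values,
    # Ties share the lowest rank and the next rank is skipped (like SQL RANK)
    "rank_min": values.rank(method="min"),
    # Ties share a rank and no rank is skipped (like SQL DENSE_RANK)
    "rank_dense": values.rank(method="dense"),
    # pandas default: ties receive the average of the ranks they span
    "rank_average": values.rank(),
})
print(ranks)
```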

### 4. What is the most purchased item on the menu and how many times was it purchased by all customers?

In [None]:
sales['product_id'].value_counts()

product_id
3    8
2    4
1    3
Name: count, dtype: int64

In [None]:
menu

Unnamed: 0,product_id,product_name,price
0,1,sushi,10
1,2,curry,15
2,3,ramen,12


In [None]:
result_df = (
    pd.merge(sales, menu, on='product_id')
    .groupby('product_name')
    .size()
    .reset_index(name='most_purchased_item')
    .sort_values(by='most_purchased_item', ascending=False)
    .head(1)
)
result_df

Unnamed: 0,product_name,most_purchased_item
1,ramen,8


In [None]:
pd.merge(sales, menu, on='product_id').groupby('product_name').size().reset_index(name='most_purchased_item').sort_values(by='most_purchased_item', ascending=False).head(1)


Unnamed: 0,product_name,most_purchased_item
1,ramen,8


In [None]:
merged = pd.merge(sales, menu, on='product_id')

g = merged.groupby('product_name')

g.size()


product_name
curry    4
ramen    8
sushi    3
dtype: int64

- The most purchased item on the menu is ramen, which was purchased 8 times. Yummy!
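Another way to get from product ids to names, without merging every sales row, is to relabel the `value_counts()` result. The sketch below assumes the counts are exactly those shown in the `value_counts()` output above and rebuilds them by hand; `counts` and `menu_sample` are names made up for this example.

```python
import pandas as pd

# Counts per product_id, copied from the value_counts() output shown earlier
counts = pd.Series(
    [8, 4, 3],
    index=pd.Index([3, 2, 1], name="product_id"),
    name="count",
)
menu_sample = pd.DataFrame({
    "product_id": [1, 2, 3],
    "product_name": ["sushi", "curry", "ramen"],
})

# Build an id -> name lookup and relabel the index of the counts
id_to_name = menu_sample.set_index("product_id")["product_name"].to_dict()
named_counts = counts.rename(index=id_to_name)

top_item = named_counts.idxmax()      # label with the highest count
top_count = int(named_counts.max())   # the highest count itself
print(top_item, top_count)
```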

In [None]:
sales.groupby('product_id').get_group(3)['customer_id'].value_counts()

customer_id
A    3
C    3
B    2
Name: count, dtype: int64

In [None]:
# Count how many times each customer ordered each item
most_popular_df = (
    pd.merge(menu, sales, on='product_id')
    .groupby(['customer_id', 'product_name'])
    .size()
    .reset_index(name='order_count')
)
most_popular_df

Unnamed: 0,customer_id,product_name,order_count
0,A,curry,2
1,A,ramen,3
2,A,sushi,1
3,B,curry,2
4,B,ramen,2
5,B,sushi,2
6,C,ramen,3


### 5. Which item was the most popular for each customer?

In [None]:
most_popular_df = (
    pd.merge(menu, sales, on='product_id')
    .groupby(['customer_id', 'product_name'])
    .size()
    .reset_index(name='order_count')
)

most_popular_df['rank'] = (
    most_popular_df.groupby('customer_id')['order_count']
    .rank(method='dense', ascending=False)
)

result_df = most_popular_df[most_popular_df['rank'] == 1][['customer_id', 'product_name', 'order_count']]

result_df

Unnamed: 0,customer_id,product_name,order_count
1,A,ramen,3
3,B,curry,2
4,B,ramen,2
5,B,sushi,2
6,C,ramen,3


In [None]:
most_popular_df =  pd.merge(menu, sales, on='product_id').groupby(['customer_id', 'product_name']).size().reset_index(name='order_count')

most_popular_df['rank'] = most_popular_df.groupby('customer_id')['order_count'].rank(method='dense', ascending=False)



result_df = most_popular_df[most_popular_df['rank'] == 1][['customer_id', 'product_name', 'order_count']]

result_df


Unnamed: 0,customer_id,product_name,order_count
1,A,ramen,3
3,B,curry,2
4,B,ramen,2
5,B,sushi,2
6,C,ramen,3


- Customer A and C’s favourite item is ramen.
- Customer B enjoys all items on the menu. He/she is a true foodie.
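Because customer B has a three-way tie, the result has more than one row per customer. If one row per customer is preferred, the tied items can be joined into a single string. The sketch below assumes the per-customer order counts are exactly those in the table shown earlier and rebuilds them by hand.

```python
import pandas as pd

# Order counts per customer and item, copied from the table shown earlier
order_counts = pd.DataFrame({
    "customer_id": ["A", "A", "A", "B", "B", "B", "C"],
    "product_name": ["curry", "ramen", "sushi", "curry", "ramen", "sushi", "ramen"],
    "order_count": [2, 3, 1, 2, 2, 2, 3],
})

# Dense rank within each customer, highest count first
order_counts["rank"] = (
    order_counts.groupby("customer_id")["order_count"]
    .rank(method="dense", ascending=False)
)

# Keep the top-ranked items and join any ties into one string per customer
top = order_counts[order_counts["rank"] == 1]
favourites = top.groupby("customer_id")["product_name"].agg(", ".join)
print(favourites)
```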

In [None]:
# Merge members and sales
merged_df = pd.merge(members, sales, on='customer_id')
merged_df

Unnamed: 0,customer_id,join_date,order_date,product_id
0,A,2021-01-07,2021-01-01,1
1,A,2021-01-07,2021-01-01,2
2,A,2021-01-07,2021-01-07,2
3,A,2021-01-07,2021-01-10,3
4,A,2021-01-07,2021-01-11,3
5,A,2021-01-07,2021-01-11,3
6,B,2021-01-09,2021-01-01,2
7,B,2021-01-09,2021-01-02,2
8,B,2021-01-09,2021-01-04,1
9,B,2021-01-09,2021-01-11,1


### 6. Which item was purchased first by the customer after they became a member?

In [None]:
# Merge members and sales
merged_df = pd.merge(members, sales, on='customer_id')

# Filter rows where the order date is greater than the join date
merged_df = merged_df[merged_df['order_date'] > merged_df['join_date']]

# Sort by customer_id and order_date
merged_df = merged_df.sort_values(by=['customer_id', 'order_date'])

# Group by customer_id and select the first row for each group
first_purchase_df = merged_df.groupby('customer_id').first().reset_index()

# Merge with menu to get the product_name
result_df = pd.merge(first_purchase_df, menu, on='product_id')[['customer_id', 'product_name']]

result_df

Unnamed: 0,customer_id,product_name
0,A,ramen
1,B,sushi


- Customer A’s first order as a member is ramen.
- Customer B’s first order as a member is sushi.
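The answer depends on whether an order placed on the join date itself counts as "after" joining. The code above uses a strict `>`. The sketch below compares both readings on the merged rows shown earlier; the helper `first_after_join` and the frame names are hypothetical, and which reading is right is a business decision rather than something the data settles.

```python
import pandas as pd

# Members merged with sales, copied from the merged output shown earlier
orders = pd.DataFrame({
    "customer_id": ["A"] * 6 + ["B"] * 4,
    "join_date": pd.to_datetime(["2021-01-07"] * 6 + ["2021-01-09"] * 4),
    "order_date": pd.to_datetime([
        "2021-01-01", "2021-01-01", "2021-01-07", "2021-01-10", "2021-01-11",
        "2021-01-11", "2021-01-01", "2021-01-02", "2021-01-04", "2021-01-11",
    ]),
    "product_id": [1, 2, 2, 3, 3, 3, 2, 2, 1, 1],
})
menu_sample = pd.DataFrame({
    "product_id": [1, 2, 3],
    "product_name": ["sushi", "curry", "ramen"],
})


def first_after_join(df, inclusive):
    """Return {customer_id: first product bought after (or on) the join date}."""
    if inclusive:
        mask = df["order_date"] >= df["join_date"]
    else:
        mask = df["order_date"] > df["join_date"]
    first = (
        df[mask]
        .sort_values(["customer_id", "order_date"])
        .drop_duplicates("customer_id", keep="first")
    )
    named = first.merge(menu_sample, on="product_id")
    return named.set_index("customer_id")["product_name"].to_dict()


strict = first_after_join(orders, inclusive=False)
inclusive = first_after_join(orders, inclusive=True)
print(strict, inclusive)
```

With the strict comparison customer A's first member purchase is ramen; counting the join date itself, it becomes the curry bought on 2021-01-07.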

### 7. Which item was purchased just before the customer became a member?

In [None]:
# Merge members and sales
merged_df = pd.merge(members, sales, on='customer_id')

# Filter rows where the order date is before the join date
merged_df = merged_df[merged_df['order_date'] < merged_df['join_date']]

# Sort by customer_id and order_date in descending order
merged_df = merged_df.sort_values(by=['customer_id', 'order_date'], ascending=[True, False])

# Group by customer_id and select the first row for each group (last purchase before joining)
last_purchase_df = merged_df.groupby('customer_id').first().reset_index()

# Merge with menu to get the product_name
result_df = pd.merge(last_purchase_df, menu, on='product_id')[['customer_id', 'product_name']]

result_df

Unnamed: 0,customer_id,product_name
0,A,sushi
1,B,sushi


- For both customers, the last order before becoming a member was sushi. That must have been really good sushi!
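One caveat: `.first()` keeps a single row per customer, so same-day ties are hidden. A sketch using a descending dense rank, on the merged rows shown earlier (rebuilt by hand here, with made-up frame names), keeps every item bought on the last day before joining:

```python
import pandas as pd

# Members merged with sales, copied from the merged output shown earlier
orders = pd.DataFrame({
    "customer_id": ["A"] * 6 + ["B"] * 4,
    "join_date": pd.to_datetime(["2021-01-07"] * 6 + ["2021-01-09"] * 4),
    "order_date": pd.to_datetime([
        "2021-01-01", "2021-01-01", "2021-01-07", "2021-01-10", "2021-01-11",
        "2021-01-11", "2021-01-01", "2021-01-02", "2021-01-04", "2021-01-11",
    ]),
    "product_id": [1, 2, 2, 3, 3, 3, 2, 2, 1, 1],
})
menu_sample = pd.DataFrame({
    "product_id": [1, 2, 3],
    "product_name": ["sushi", "curry", "ramen"],
})

# Orders strictly before the join date
before = orders[orders["order_date"] < orders["join_date"]].copy()

# Rank dates per customer, latest first; ties share rank 1
before["rank"] = (
    before.groupby("customer_id")["order_date"]
    .rank(method="dense", ascending=False)
)

last_items = before[before["rank"] == 1].merge(menu_sample, on="product_id")
last_by_customer = last_items.groupby("customer_id")["product_name"].agg(list).to_dict()
print(last_by_customer)
```

On this sample, customer A's last pre-membership day includes both sushi and curry, while customer B's is sushi only.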

### 8. What is the total items and amount spent for each member before they became a member?

In [None]:
# Merge sales and members on customer_id and filter based on the condition
merged_df = pd.merge(sales, members, on='customer_id')
merged_df = merged_df[merged_df['order_date'] < merged_df['join_date']]

# Merge with menu to get product information
merged_df = pd.merge(merged_df, menu, on='product_id')

# Group by customer_id and calculate total items and total sales
result_df = merged_df.groupby('customer_id').agg(
    total_items=pd.NamedAgg(column='product_id', aggfunc='count'),
    total_sales=pd.NamedAgg(column='price', aggfunc='sum')
).reset_index()

# Sort by customer_id
result_df = result_df.sort_values(by='customer_id')

result_df

Unnamed: 0,customer_id,total_items,total_sales
0,A,2,25
1,B,3,40


In [None]:
# Merge sales and members on customer_id and filter based on the condition
merged_df = pd.merge(sales, members, on='customer_id')
merged_df = merged_df[merged_df['order_date'] < merged_df['join_date']]

# Merge with menu to get product information
merged_df = pd.merge(merged_df, menu, on='product_id')

# Group by customer_id and calculate total items and total sales
result_df = merged_df.groupby('customer_id').agg(
    total_items=('product_id', 'count'),
    total_sales=('price', 'sum')
).reset_index()
result_df

Unnamed: 0,customer_id,total_items,total_sales
0,A,2,25
1,B,3,40


- Customer A spent \$25 on 2 items.
- Customer B spent \$40 on 3 items.

### 9. If each \$1 spent equates to 10 points and sushi has a 2x points multiplier - how many points would each customer have?

In [None]:
points_df = menu.copy()
points_df['points'] = points_df['price'] * 20
points_df.loc[points_df['product_id'] != 1, 'points'] = points_df['price'] * 10

# Merge sales and points_df on product_id
merged_df = pd.merge(sales, points_df, on='product_id')

# Group by customer_id and calculate total points
result_df = merged_df.groupby('customer_id').agg(
    total_points=('points', 'sum')
).reset_index()

# Sort by customer_id
result_df = result_df.sort_values(by='customer_id')

result_df

Unnamed: 0,customer_id,total_points
0,A,860
1,B,940
2,C,360


- The total points for customers A, B and C are 860, 940 and 360 respectively.

In [None]:
# Merge sales and menu on product_id
merged_df = pd.merge(sales, menu, on='product_id')

# Sushi earns 20 points per $1; every other item earns 10 points per $1
merged_df['points'] = merged_df.apply(lambda m_p: m_p['price'] * 20 if m_p['product_name'] == 'sushi' else m_p['price'] * 10, axis=1)

# Group by customer_id and calculate total points
result_df = merged_df.groupby('customer_id').agg(
    total_points=pd.NamedAgg(column='points', aggfunc='sum')
).reset_index()

# Sort by customer_id
result_df = result_df.sort_values(by='customer_id')

result_df

Unnamed: 0,customer_id,total_points
0,A,860
1,B,940
2,C,360
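The same points rule can also be written without a row-wise `apply`, using `np.where` to pick the multiplier. This is a sketch on customer A's orders only (copied from the outputs above), so it checks just one of the three totals.

```python
import numpy as np
import pandas as pd

# Customer A's orders with item names and prices, copied from the outputs above
orders_a = pd.DataFrame({
    "customer_id": ["A"] * 6,
    "product_name": ["sushi", "curry", "curry", "ramen", "ramen", "ramen"],
    "price": [10, 15, 15, 12, 12, 12],
})

# 2x multiplier for sushi, 1x for everything else
multiplier = np.where(orders_a["product_name"] == "sushi", 2, 1)

# 10 points per unit of price, times the multiplier
orders_a["points"] = orders_a["price"] * 10 * multiplier

total_points_a = int(orders_a["points"].sum())
print(total_points_a)
```

By hand: sushi gives 10 x 20 = 200, two curries give 2 x 150 = 300, and three ramen give 3 x 120 = 360, for 860 in total.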


### 10. In the first week after a customer joins the program (including their join date) they earn 2x points on all items, not just sushi - how many points do customer A and B have at the end of January?

In [None]:
merged_df = pd.merge(sales, menu, on='product_id')
merged_df = pd.merge(merged_df, members, on='customer_id')

# Filter rows based on the specified conditions
filtered_df = merged_df[
    (merged_df['order_date'].dt.month == 1) &
    (merged_df['order_date'].dt.year == 2021)
].copy()  # Create a copy to avoid SettingWithCopyWarning

# Orders earn double points if they fall in the first week of membership
# (days 0-6 from the join date) or if the item is sushi
days_since_join = (filtered_df['order_date'] - filtered_df['join_date']).dt.days
double_points = days_since_join.between(0, 6) | (filtered_df['product_name'] == 'sushi')
filtered_df['points'] = filtered_df['price'] * 10 * np.where(double_points, 2, 1)

# Group by customer_id and calculate total points
#result_df = filtered_df.groupby('customer_id')['points'].sum().reset_index()

#result_df
filtered_df

Unnamed: 0,customer_id,order_date,product_id,product_name,price,join_date,points
0,A,2021-01-01,1,sushi,10,2021-01-07,200
1,A,2021-01-01,2,curry,15,2021-01-07,150
2,A,2021-01-07,2,curry,15,2021-01-07,300
3,A,2021-01-10,3,ramen,12,2021-01-07,240
4,A,2021-01-11,3,ramen,12,2021-01-07,240
5,A,2021-01-11,3,ramen,12,2021-01-07,240
6,B,2021-01-04,1,sushi,10,2021-01-09,200
7,B,2021-01-11,1,sushi,10,2021-01-09,200
8,B,2021-01-01,2,curry,15,2021-01-09,150
9,B,2021-01-02,2,curry,15,2021-01-09,150


In [None]:
merged_df = pd.merge(sales, menu, on='product_id')
merged_df = pd.merge(merged_df, members, on='customer_id')
#merged_df
filtered_df = merged_df[
    (merged_df['order_date'].dt.month == 1) &
    (merged_df['order_date'].dt.year == 2021)
].copy()
filtered_df
# Keep only orders placed on or after the join date, so the totals below
# count only points earned as a member; .copy() avoids SettingWithCopyWarning
x = filtered_df[filtered_df['order_date'] >= filtered_df['join_date']].copy()
x['days_dif'] = x['order_date'] - x['join_date']
x

Unnamed: 0,customer_id,order_date,product_id,product_name,price,join_date,days_dif
2,A,2021-01-07,2,curry,15,2021-01-07,0 days
3,A,2021-01-10,3,ramen,12,2021-01-07,3 days
4,A,2021-01-11,3,ramen,12,2021-01-07,4 days
5,A,2021-01-11,3,ramen,12,2021-01-07,4 days
7,B,2021-01-11,1,sushi,10,2021-01-09,2 days
10,B,2021-01-16,3,ramen,12,2021-01-09,7 days


In [None]:
# Double points for sushi, or for any order in the first week of membership (days 0-6)
x['points'] = x.apply(lambda m_p: m_p['price'] * 20 if (m_p['product_name'] == 'sushi' or m_p['days_dif'].days < 7) else m_p['price'] * 10, axis=1)


In [None]:
x

Unnamed: 0,customer_id,order_date,product_id,product_name,price,join_date,days_dif,points
2,A,2021-01-07,2,curry,15,2021-01-07,0 days,300
3,A,2021-01-10,3,ramen,12,2021-01-07,3 days,240
4,A,2021-01-11,3,ramen,12,2021-01-07,4 days,240
5,A,2021-01-11,3,ramen,12,2021-01-07,4 days,240
7,B,2021-01-11,1,sushi,10,2021-01-09,2 days,200
10,B,2021-01-16,3,ramen,12,2021-01-09,7 days,120


In [None]:
x.groupby('customer_id')['points'].sum()

customer_id
A    1020
B     320
Name: points, dtype: int64
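The totals above count only orders placed on or after the join date. If points earned before joining should also count towards the end-of-January balance, the figure is higher. The sketch below works through both readings for customer A only, using the rows shown in the per-order table earlier; which reading matches the loyalty program's rules is an assumption the business would need to confirm.

```python
import numpy as np
import pandas as pd

# Customer A's January orders, copied from the per-order table shown earlier
a = pd.DataFrame({
    "order_date": pd.to_datetime([
        "2021-01-01", "2021-01-01", "2021-01-07",
        "2021-01-10", "2021-01-11", "2021-01-11",
    ]),
    "product_name": ["sushi", "curry", "curry", "ramen", "ramen", "ramen"],
    "price": [10, 15, 15, 12, 12, 12],
})
join_date = pd.Timestamp("2021-01-07")

# Days 0-6 from the join date form the first week of membership
days_since_join = (a["order_date"] - join_date).dt.days
double = days_since_join.between(0, 6) | (a["product_name"] == "sushi")

# 10 points per unit of price, doubled where the rule applies
a["points"] = a["price"] * 10 * np.where(double, 2, 1)

all_january = int(a["points"].sum())
as_member = int(a.loc[a["order_date"] >= join_date, "points"].sum())
print(all_january, as_member)
```

For customer A, counting every January order gives 1370 points, while counting only orders from the join date onwards gives the 1020 shown above.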