<a href="https://colab.research.google.com/github/azizamirsaidova/datadive/blob/main/Times_%26_Bayesian.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Analyze the times of the day products are commonly purchased.**

In [3]:
import pandas as pd

In [4]:
df = pd.read_csv("final.csv")

In [5]:
df['DAYPART_DESC'].value_counts()

LUNCH        229752
DINNER       144889
PM CHILL     102881
AM CHILL      60835
BREAKFAST     47974
Name: DAYPART_DESC, dtype: int64

In [6]:
daypart_by_freq_product = df.groupby(["DAYPART_DESC"])["MAIN_CATEGORY_DESC"].count().to_frame()
daypart_by_freq_product

| DAYPART_DESC | MAIN_CATEGORY_DESC |
| --- | --- |
| AM CHILL | 60835 |
| BREAKFAST | 47974 |
| DINNER | 144889 |
| LUNCH | 229752 |
| PM CHILL | 102881 |


In [7]:
daypart_by_freq_product.reset_index(inplace=True)

In [8]:
import plotly.express as px
fig = px.pie(daypart_by_freq_product, values='MAIN_CATEGORY_DESC', names='DAYPART_DESC',
             title='Share of main category purchases by part of the day',
             color_discrete_sequence=px.colors.sequential.RdBu)
fig.show()
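
The shares that the pie chart displays can also be computed directly. A minimal sketch, using the daypart counts printed above instead of `final.csv`, so the `counts` series here is only a stand-in for `df['DAYPART_DESC'].value_counts()`:

```python
import pandas as pd

# Daypart counts copied from the value_counts() output above.
counts = pd.Series(
    {"LUNCH": 229752, "DINNER": 144889, "PM CHILL": 102881,
     "AM CHILL": 60835, "BREAKFAST": 47974},
    name="DAYPART_DESC",
)

# Fraction of all purchases that falls in each daypart.
shares = counts / counts.sum()

# Show the shares as percentages, rounded to one decimal place.
print((shares * 100).round(1))
```

On the full DataFrame, `df['DAYPART_DESC'].value_counts(normalize=True)` should give the same fractions.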

In [9]:
pd.set_option("display.max_rows", 200)

In [10]:
category_by_daypart = df.groupby(["DAYPART_DESC", 'MAIN_CATEGORY_DESC'])['MAIN_CATEGORY_DESC'].count().to_frame()

In [11]:
category_by_daypart = category_by_daypart.rename(columns = {'MAIN_CATEGORY_DESC':'Cat_desc'})

In [12]:
category_by_daypart.reset_index(inplace=True)

In [13]:
fig = px.bar(category_by_daypart, x = 'DAYPART_DESC', y = 'Cat_desc',
     color = 'MAIN_CATEGORY_DESC', barmode = 'stack')
fig.update_layout(title = "Frequently purchased main category products based on the part of the day",
     xaxis_title = 'Dayparts', yaxis_title = 'Count of products', 
     width = 1000, height = 800)
fig.show()

Extras & Side Choices are purchased frequently throughout the day. To understand which particular items customers purchased under this category, we can further analyze the product description column.
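
A minimal sketch of that follow-up analysis, on toy data. The column name `PRODUCT_DESC` and the item names are assumptions made for illustration; the real product description column in `final.csv` may be named differently.

```python
import pandas as pd

# Toy rows standing in for final.csv. PRODUCT_DESC is a hypothetical column
# name and the item names are invented for illustration.
toy = pd.DataFrame({
    "MAIN_CATEGORY_DESC": ["Extras & Side Choices"] * 5 + ["Soup"],
    "PRODUCT_DESC": ["Chips", "Apple", "Chips", "Baguette", "Chips", "Tomato Soup"],
})

# Keep only the category of interest, then count each product description.
sides = toy[toy["MAIN_CATEGORY_DESC"] == "Extras & Side Choices"]
top_sides = sides["PRODUCT_DESC"].value_counts().head(3)
print(top_sides)
```

The same filter-then-count pattern could be grouped by `DAYPART_DESC` to see whether the most popular sides change over the day.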

In [14]:
df[['DAY','MAIN_CATEGORY_DESC']].value_counts()

DAY         MAIN_CATEGORY_DESC   
2020-09-11  Extras & Side Choices    17842
2020-09-22  Extras & Side Choices    17499
2020-09-02  Extras & Side Choices    17159
2020-09-18  Extras & Side Choices    17097
2020-09-04  Extras & Side Choices    16793
                                     ...  
2020-09-22  Unassigned                   1
2020-09-20  Panera Grocery               1
2020-09-13  Other Bakery                 1
2020-09-05  Cream Cheese                 1
2020-09-24  Unassigned                   1
Length: 841, dtype: int64

In [15]:
sales_by_day_name = df.groupby(["DAY", 'MAIN_CATEGORY_DESC'])['MAIN_CATEGORY_DESC'].count().to_frame()
sales_by_day_name = sales_by_day_name.rename(columns = {'MAIN_CATEGORY_DESC':'Count'})
sales_by_day_name.reset_index(inplace=True)
sales_by_day_name

|  | DAY | MAIN_CATEGORY_DESC | Count |
| --- | --- | --- | --- |
| 0 | 2020-09-02 | Add On | 18 |
| 1 | 2020-09-02 | Baguettes, Demis, Rolls | 38 |
| 2 | 2020-09-02 | Bars, Cookies, Brownies | 543 |
| 3 | 2020-09-02 | Bowls | 38 |
| 4 | 2020-09-02 | Breakfast Sandwiches | 244 |
| ... | ... | ... | ... |
| 836 | 2020-09-24 | Soup | 524 |
| 837 | 2020-09-24 | Sweet Goods Gifting | 83 |
| 838 | 2020-09-24 | Take Home Soup | 15 |
| 839 | 2020-09-24 | U Pick 2 | 3845 |


In [16]:
sales_by_day = df.groupby(["DAY"])['MAIN_CATEGORY_DESC'].count().to_frame()
sales_by_day.reset_index(inplace=True)
sales_by_day

|  | DAY | MAIN_CATEGORY_DESC |
| --- | --- | --- |
| 0 | 2020-09-02 | 31593 |
| 1 | 2020-09-03 | 1722 |
| 2 | 2020-09-04 | 31523 |
| 3 | 2020-09-05 | 527 |
| 4 | 2020-09-06 | 20893 |
| 5 | 2020-09-07 | 20224 |
| 6 | 2020-09-08 | 30988 |
| 7 | 2020-09-09 | 27137 |
| 8 | 2020-09-10 | 27840 |
| 9 | 2020-09-11 | 33860 |


In [17]:
fig = px.line(sales_by_day, x="DAY", y="MAIN_CATEGORY_DESC")
fig.update_layout(title = "Purchases of main category products throughout September",
     xaxis_title = 'Day', yaxis_title = 'Total')
fig.show()
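
The dips in the daily totals can also be flagged numerically. A rough sketch, using only the ten daily totals shown in the `sales_by_day` output above; the cutoff of half the median is an arbitrary assumption, not something the analysis prescribes.

```python
import pandas as pd

# Daily totals copied from the sales_by_day output above (ten rows shown).
sales = pd.DataFrame({
    "DAY": ["2020-09-02", "2020-09-03", "2020-09-04", "2020-09-05", "2020-09-06",
            "2020-09-07", "2020-09-08", "2020-09-09", "2020-09-10", "2020-09-11"],
    "MAIN_CATEGORY_DESC": [31593, 1722, 31523, 527, 20893,
                           20224, 30988, 27137, 27840, 33860],
})

# Assumed rule of thumb: a day is "low volume" if it falls below half the median.
threshold = 0.5 * sales["MAIN_CATEGORY_DESC"].median()
low_days = sales[sales["MAIN_CATEGORY_DESC"] < threshold]
print(low_days)
```

Days flagged this way may reflect incomplete data rather than a real drop in purchases, so they are worth checking before drawing conclusions from the trend.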

In [18]:
fig = px.line(sales_by_day_name, x="DAY", y="Count", color = 'MAIN_CATEGORY_DESC')
fig.update_traces(textposition="bottom right")
fig.update_layout(title = "Purchases of main category products throughout September, by category",
     xaxis_title = 'Day', yaxis_title = 'Total')
fig.show()

While our analysis so far has focused on the most commonly purchased items, we can investigate further by analyzing the least purchased items, using the product description and comments columns.
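
One way to start on the least purchased items is sketched below on toy data. The rows and comment texts are invented for illustration; only the column names `MAIN_CATEGORY_DESC` and `COMMENTS` come from the notebook.

```python
import pandas as pd

# Toy rows standing in for final.csv; categories repeat 4, 3, 2 and 1 times.
toy = pd.DataFrame({
    "MAIN_CATEGORY_DESC": ["Soup"] * 4 + ["Salads"] * 3 + ["Scones"] * 2 + ["Kids"],
    "COMMENTS": [f"comment {i}" for i in range(10)],  # placeholder comment text
})

# The two categories with the fewest purchases, smallest first.
least = toy["MAIN_CATEGORY_DESC"].value_counts().nsmallest(2)
print(least)

# Pull the rows for those categories so their comments can be read.
least_rows = toy[toy["MAIN_CATEGORY_DESC"].isin(least.index)]
print(least_rows["COMMENTS"].tolist())
```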

# **Text Summarization**

Since Extras & Side Choices are the most frequently purchased items, let's analyze how customers feel about their purchases using a text summarization technique. We will use gensim to summarize a few entries from the comments column.

In [19]:
extras_and_sides = df[df['MAIN_CATEGORY_DESC'].isin(['Extras & Side Choices'])]

In [20]:
extras_and_sides = extras_and_sides[['MAIN_CATEGORY_DESC', 'COMMENTS']]

In [27]:
# Analyze only a few rows of the COMMENTS column
few_text = extras_and_sides['COMMENTS'].iloc[2:23]

In [28]:
original_text = few_text.str.cat(sep=', ')

In [29]:
original_text

"I eat at Panera everyday for breakfast because I love the breakfast sandwiches and the coffee subscription. Can't say enough about the employees. They are the real reason I eat here every day. I also purchase the 20% discount gift card. Tomorrow I get a free ???? bagel in the am and a free half sandwich in pm. Thank you, I love the coffee club, It has been at least 6 months since my last visit to Panera Bread.  I really miss those outings.  The Panera staff have always been excellent.  Establishment Clean & well run.   \n\nMy last two visits to your stores have been a pleasant surprise.  Your staff have been pleasant, and handling issues as they arise in a professional, pleasant  manner.  \n\nI look forward to returning again SOON., Love the friendly nature and personal service at this location!, I got excellent service at this location.  The cashier was so professional and pleasant.  The whole staff was excellent. Also, the food was really good!, My order was wrong and not the first 

In [34]:
text = str(original_text)

In [38]:
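# gensim.summarization was removed in gensim 4.0, so this cell needs gensim < 4.0.
import gensim
from gensim.summarization import summarize

# Increase word_count for a longer summary. If word_count is omitted, the summary
# length is governed by ratio (default 0.2, i.e. roughly 20% of the sentences).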
short_summary = summarize(text, word_count=40)
print(short_summary)

2                  As always, great food and great service!
4         Fast, accurate and friendly people taking orde...
5         I eat at Panera everyday for breakfast because...
7                                    I love the coffee club
586325    Very fast curbside delivery.
586326    The delivery was quick.
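
Because `gensim.summarization` is not available in gensim 4.0 and later, a dependency-free fallback can be useful. The sketch below is a simple word-frequency extractive summarizer, which is a different and cruder technique than gensim's TextRank-based `summarize`. The function name `summarize_by_frequency` and the sample sentences are made up for illustration.

```python
import re
from collections import Counter

def summarize_by_frequency(text, max_sentences=2):
    """Return the highest-scoring sentences, kept in their original order."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Word frequencies over the whole text, ignoring words of 3 letters or fewer.
    words = re.findall(r"[a-z']+", text.lower())
    freq = Counter(w for w in words if len(w) > 3)

    def score(sentence):
        # Sum the frequencies of the sentence's words (uncounted words score 0).
        return sum(freq[t] for t in re.findall(r"[a-z']+", sentence.lower()))

    # Rank sentence indices by score, keep the best, then restore document order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)

# Illustrative input; in the notebook this would be the `text` variable.
sample = (
    "The staff was excellent. The food was really good. "
    "I got excellent service and the staff was pleasant. Parking was hard."
)
summary = summarize_by_frequency(sample, max_sentences=2)
print(summary)
```

Scoring by raw frequency favors long sentences and common words, so this is a rough stand-in rather than a replacement for a graph-based summarizer.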


# **Calculate the Sales Conversion using Bayesian probability.**

In [22]:
df['MAIN_CATEGORY_DESC'].value_counts()

Extras & Side Choices          304382
U Pick 2                        83456
No History                      29450
Sandwiches                      16055
Coffee & Hot Tea                15849
Individual Bagels               15186
Bubbler and Fountain Drinks     13985
Salads                          13967
Catering                        13069
Soup                            10655
Bars, Cookies, Brownies         10130
Bulk Bagels                      9604
Pastries                         8159
Breakfast Sandwiches             4621
Muffins & Muffies                4016
Family Feast                     3996
Cream Cheese                     3373
Descriptors                      2977
Pasta                            2851
Frozen Drinks & Smoothies        2729
Scones                           2165
Souffles                         1860
Kids                             1819
Espresso Beverages               1781
Cream Cheese Tubs                1662
Sweet Goods Gifting              1563
Juice, Bottl

In [23]:
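# value_counts is referenced here without parentheses, so the cell displays the
# bound method (wrapping the raw column) rather than the counts; calling
# df['MAIN_SUB_CATEGORY_CD'].value_counts() returns the frequency table.
df['MAIN_SUB_CATEGORY_CD'].value_counts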

<bound method IndexOpsMixin.value_counts of 0         5020.0
1         3980.0
2         4680.0
3         3300.0
4         4660.0
           ...  
586326    4660.0
586327    4730.0
586328    3060.0
586329    3340.0
586330    3330.0
Name: MAIN_SUB_CATEGORY_CD, Length: 586331, dtype: float64>
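
The notebook does not define "sales conversion" at this point. As one possible reading, stated here as an assumption, the sketch below applies Bayes' theorem to category and daypart frequencies: P(category | daypart) = P(daypart | category) · P(category) / P(daypart). The toy purchases and the helper name `bayes_posterior` are invented for illustration.

```python
import pandas as pd

# Toy purchases; the rows are invented for illustration only.
toy = pd.DataFrame({
    "DAYPART_DESC": ["LUNCH"] * 6 + ["DINNER"] * 4,
    "MAIN_CATEGORY_DESC": ["Soup", "Soup", "Soup", "Salads", "Salads", "Salads",
                           "Soup", "Salads", "Salads", "Salads"],
})

def bayes_posterior(data, category, daypart):
    """P(category | daypart) via Bayes' theorem from empirical frequencies."""
    is_cat = data["MAIN_CATEGORY_DESC"] == category
    is_part = data["DAYPART_DESC"] == daypart
    prior = is_cat.mean()                # P(category)
    likelihood = is_part[is_cat].mean()  # P(daypart | category)
    evidence = is_part.mean()            # P(daypart)
    return likelihood * prior / evidence

posterior = bayes_posterior(toy, "Soup", "LUNCH")

# Cross-check against the conditional frequency computed directly.
direct = (toy.loc[toy["DAYPART_DESC"] == "LUNCH", "MAIN_CATEGORY_DESC"] == "Soup").mean()
print(posterior, direct)
```

With plain empirical frequencies the two numbers agree by construction; the Bayesian form becomes useful once the prior or likelihood comes from a different source, such as an earlier month of data.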