#### Data:
The data consists of a bakery's transactions over a period of time.
We analyse it to find out which services need to be maintained or improved at which times of the day.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import plotly.graph_objs as go
from plotly.offline import init_notebook_mode, iplot
import warnings
warnings.filterwarnings("ignore")
init_notebook_mode(connected=True)
pd.set_option('display.max_columns', 5000)

Reading the CSV file into a pandas DataFrame

In [2]:
data = pd.read_csv("D://breakfast.csv")

In [3]:
data.head()

Unnamed: 0,Date,Time,Transaction,Item
0,2016-10-30,09:58:11,1,Bread
1,2016-10-30,10:05:34,2,Scandinavian
2,2016-10-30,10:05:34,2,Scandinavian
3,2016-10-30,10:07:57,3,Hot chocolate
4,2016-10-30,10:07:57,3,Jam


In [4]:
data.dtypes

Date           object
Time           object
Transaction     int64
Item           object
dtype: object

### DataPreprocessing

Converting the Time and Date to datetime64[ns]

In [5]:
data['Time'] = pd.to_datetime(data['Time'], format='%H:%M:%S')  # only the time of day is used later
data['Date'] = pd.to_datetime(data['Date'],format='%Y-%m-%d')

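Removing rows where the Item is NONE

In [6]:
#In this particular data set, missing items are recorded as the string NONE in the Item column.
filternone = (data['Item'] != "NONE")
data = data[filternone].copy()  # copy so that new columns can be added without a SettingWithCopyWarning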

### Now let's check which part of the day has the most transactions.

In [7]:
data['hour'] = data['Time'].dt.hour

In [8]:
morning = data[(data['hour']>=9) & (data['hour']<12)]#Getting all the transactions in the morning (each comparison needs its own parentheses)
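
A note on combining masks: `&` binds more tightly than `<` or `>=` in Python, so each comparison has to sit in its own parentheses before the masks are combined. The following is a minimal sketch on a made-up Series of hours (the values are illustrative, not taken from the bakery data).

```python
import pandas as pd

# Made-up hours, purely for illustration.
hours = pd.Series([8, 9, 11, 12, 15])

# Each comparison is parenthesised, then the two boolean masks are combined.
# Without the second pair of parentheses Python would group the expression as
# ((hours >= 9) & hours) < 12, which is not the intended range check.
in_morning = (hours >= 9) & (hours < 12)

print(in_morning.tolist())
```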

In [9]:
afternoon = data[(data['hour']>=12) & (data['hour']<17)]#Getting all the transactions during the afternoon

In [10]:
evening = data[(data['hour']>=17) & (data['hour']<19)]#Getting all the transactions during evening.

In [11]:
night = data[(data['hour']>=19) & (data['hour']<=23)]#Getting all the transactions during the night

In [12]:
#Counting the number of distinct transactions in each part of the day and storing the counts in a list.
transactions_day_times = [morning['Transaction'].nunique(),afternoon['Transaction'].nunique(),evening['Transaction'].nunique(),night['Transaction'].nunique()]
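
As an alternative sketch, the four separate filters can be expressed in one step with `pd.cut`, and distinct transactions can then be counted per part of the day with `nunique`. The toy DataFrame and the column name `part_of_day` are assumptions made for illustration; the bin edges mirror the hour ranges used above.

```python
import pandas as pd

# Toy stand-in for the bakery data (values are made up).
toy = pd.DataFrame({
    "Transaction": [1, 1, 2, 3, 3, 4, 5],
    "Item": ["Coffee", "Bread", "Coffee", "Tea", "Cake", "Coffee", "Bread"],
    "hour": [9, 9, 11, 13, 13, 18, 20],
})

# Left-closed bins: [9, 12), [12, 17), [17, 19), [19, 24).
bins = [9, 12, 17, 19, 24]
labels = ["Morning", "Afternoon", "Evening", "Night"]
toy["part_of_day"] = pd.cut(toy["hour"], bins=bins, labels=labels, right=False)

# Count distinct transaction ids (not rows) in each part of the day.
transactions_by_part = (
    toy.groupby("part_of_day", observed=False)["Transaction"].nunique()
)
print(transactions_by_part)
```

Hours outside the bins (before 9 in this sketch) get no label and therefore fall out of the grouped counts, which matches the behaviour of the separate filters.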

In [13]:
#Plotting the Graph
data1 = [go.Bar(
            x=['Morning','Afternoon','Evening','Night'],
            y=transactions_day_times
    )]
layout = go.Layout(
    title='Transactions Based on part of the day',
)

fig1 = go.Figure(data=data1, layout=layout)

iplot(fig1, filename='basic-bar')

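The chart shows how the transactions are spread over the four parts of the day. Note that the windows differ in length (Morning covers three hours, Afternoon five), so the per-hour breakdown in the next section is the fairer way to find the busiest time.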

## Checking at which hour the most transactions occur.

In [14]:
#Counting the rows (items sold) per hour as a measure of transaction volume.
tran_perhr = data['hour'].value_counts().reset_index()
tran_perhr.columns=['hour','Transaction_count']

In [15]:
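#Dropping the row with index label 17 (a row of the frequency-sorted table, not 5 PM); value_counts sorts in descending order, so this is a rarely occurring hour.
tran_perhr = tran_perhr.drop([17])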
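
Dropping a row by its index label depends on how many distinct hours happen to be in the data. A less position-dependent sketch is to sort the counts by hour and filter out rarely occurring hours with a threshold. The sample hours and the `min_count` threshold below are assumptions chosen for illustration.

```python
import pandas as pd

# Made-up hours; hour 1 appears only once and plays the role of an outlier.
hours = pd.Series([9, 9, 10, 10, 10, 11, 11, 11, 11, 1], name="hour")

# Count rows per hour, ordered by hour of day rather than by frequency.
per_hour = hours.value_counts().sort_index()

# Keep only hours with at least `min_count` rows (threshold is an assumption).
min_count = 2
per_hour = per_hour[per_hour >= min_count]

# Back to a two-column table with the same column names as above.
tran_perhr = per_hour.rename_axis("hour").reset_index(name="Transaction_count")
print(tran_perhr)
```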

In [16]:
#Plotting the graph "The number of transactions per hour"
data2 = [go.Bar(
            x=tran_perhr['hour'],
            y=tran_perhr['Transaction_count']
    )]
layout = go.Layout(
    title='Transactions_per_Hour',
)

fig2 = go.Figure(data=data2, layout=layout)

iplot(fig2, filename='basic-bar')

From the above graph we can conclude that the most transactions occur during the 11:00 AM hour.

### Finding the best-selling items in the bakery

In [17]:
#This gives the top 10 best-selling items (a Series indexed by item name).
mostselling_items = data['Item'].value_counts()[:10]
mostselling_items.head()

Coffee    5471
Bread     3325
Tea       1435
Cake      1025
Pastry     856
Name: Item, dtype: int64

#### Let us see which item occupies what portion of sales.

In [18]:
other_items_count = data.Item.count() - mostselling_items.sum()#Number of items sold outside the top 10

In [19]:
item_list = pd.concat([mostselling_items, pd.Series([other_items_count], index=['Others'])])#Series.append was removed in pandas 2.0

In [20]:
item_list

Coffee           5471
Bread            3325
Tea              1435
Cake             1025
Pastry            856
Sandwich          771
Medialuna         616
Hot chocolate     590
Cookies           540
Brownie           379
Others           5499
dtype: int64
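
The "top N plus Others" pattern used here can be wrapped in a small helper. This is a minimal sketch; the function name `top_n_with_others` and the sample items are hypothetical and not part of the original analysis.

```python
import pandas as pd


def top_n_with_others(items: pd.Series, n: int) -> pd.Series:
    """Return counts of the n most frequent values plus one 'Others' row."""
    counts = items.value_counts()          # sorted in descending order
    top = counts.iloc[:n]                  # the n most frequent values
    others = counts.iloc[n:].sum()         # everything else, summed
    return pd.concat([top, pd.Series([others], index=["Others"])])


# Made-up items for illustration.
toy_items = pd.Series(["Coffee"] * 5 + ["Bread"] * 3 + ["Tea"] * 2 + ["Jam"])
result = top_n_with_others(toy_items, n=2)
print(result)
```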

In [21]:
fig3 = {
  "data": [
    {
      "values": item_list.values.tolist(),
      "labels": item_list.index.tolist(),
      "domain": {"x": [0, .5]},
      "name": "Top 10 Items",
      "hoverinfo":"label+percent",
      "type": "pie"
    },],
  "layout": {
        "title":"Top 10 Most Popular Items",
    }
}
iplot(fig3,filename='donut')

As we can see, coffee is the best-selling single item, followed by bread, tea and so on.

Now let us see the relative occurrence of other items with respect to coffee.

In [22]:
#Getting all the transactions which contain coffee
tran_with_coffee = data[data['Item'] == 'Coffee']["Transaction"].tolist()
alongside_coffee = data[data['Transaction'].isin(tran_with_coffee)]

In [23]:
alongside_coffee_counts = alongside_coffee['Item'].value_counts()[:11]
others = alongside_coffee.Item.count() - alongside_coffee_counts.sum()
alongside_coffee_counts = pd.concat([alongside_coffee_counts, pd.Series([others], index=['Others'])])#Series.append was removed in pandas 2.0

In [24]:
alongside_coffee_counts

Coffee           5471
Bread             923
Cake              540
Tea               482
Pastry            474
Sandwich          421
Medialuna         345
Hot chocolate     293
Cookies           283
Toast             224
Alfajores         200
Others           2172
dtype: int64

Let us remove the Coffee entry, since we are calculating the conditional probability that a purchase of coffee is accompanied by a purchase of another item.

In [25]:
no_trans_coffee = alongside_coffee_counts['Coffee']#Number of Coffee items sold, used as an approximation of the number of transactions with coffee.
alongside_coffee_counts = alongside_coffee_counts.drop(['Coffee'])

Now finding the conditional probability

In [26]:
items = alongside_coffee_counts.index.tolist()
values = alongside_coffee_counts.values.tolist()

In [27]:
#Finding the conditional probability of each item given that coffee is bought.
probabilities = []
for value in values:
    cal = value/no_trans_coffee
    probabilities.append(cal)

In [28]:
probabilities.pop()#Dropping the trailing 'Others' entry
data_dict={"Count":probabilities}
items.pop()#Dropping the matching 'Others' label

'Others'

In [29]:
prob_dict = pd.DataFrame(data_dict,index=items)

In [30]:
prob_dict #The conditional probabilities of each item being bought given that coffee is bought.

Unnamed: 0,Count
Bread,0.168708
Cake,0.098702
Tea,0.088101
Pastry,0.086639
Sandwich,0.076951
Medialuna,0.06306
Hot chocolate,0.053555
Cookies,0.051727
Toast,0.040943
Alfajores,0.036556
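
The figures above divide item counts by the number of Coffee items sold, so a transaction with two coffees or two breads is counted twice. A stricter, transaction-level estimate of P(item | coffee) is sketched below on made-up data; the toy DataFrame and the variable names are assumptions, and the numbers it produces are not the bakery's.

```python
import pandas as pd

# Made-up transactions: transaction 2 contains Bread twice.
toy = pd.DataFrame({
    "Transaction": [1, 1, 2, 2, 2, 3, 4, 4],
    "Item": ["Coffee", "Bread", "Coffee", "Bread", "Bread", "Coffee", "Tea", "Bread"],
})

# Collapse repeated items so each (transaction, item) pair counts once.
pairs = toy.drop_duplicates(["Transaction", "Item"])

# Ids of transactions that contain coffee.
coffee_ids = pairs.loc[pairs["Item"] == "Coffee", "Transaction"]

# Other items that appear in those transactions.
with_coffee = pairs[pairs["Transaction"].isin(coffee_ids) & (pairs["Item"] != "Coffee")]

# P(item | coffee) = transactions containing both / transactions containing coffee
p_given_coffee = with_coffee["Item"].value_counts() / coffee_ids.nunique()
print(p_given_coffee)
```

In this toy data three transactions contain coffee and two of them also contain bread, so the estimate for bread is 2/3.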


In [31]:
data_plot = [go.Bar(
            x=prob_dict.index.tolist(),
            y=prob_dict['Count'].tolist()
    )]
layout = go.Layout(
    title='Occurrence of items with respect to occurrence of Coffee',
)
fig4 = go.Figure(data=data_plot, layout=layout)
iplot(fig4, filename='basic-bar')

##### We can observe that bread is the item most often bought whenever coffee is bought.
Among the items considered, bread has the highest probability of being bought along with coffee.

#### From the above analysis we know that coffee is the most bought item. Let us find at which time of the day it is mostly bought.

In [32]:
#Getting all the rows where the item is coffee
coffee_time = data[data['Item'] == "Coffee"]

In [33]:
coffee_time_count = coffee_time['hour'].value_counts().reset_index()
coffee_time_count.columns=['hour','Transaction_count']

In [34]:
data3 = [go.Bar(
            x=coffee_time_count['hour'],
            y=coffee_time_count['Transaction_count']
    )]
layout = go.Layout(
    title='Number of coffees sold per hour',
)

fig5 = go.Figure(data=data3, layout=layout)
iplot(fig5)

#### As we can see, coffee is mostly sold during the 11:00 AM hour.

## Number of transactions occurring over the months

In [35]:
## Counting the number of items in each transaction
df1 = data.groupby(['Date', 'Transaction']).size().reset_index()
df1.columns = ['Date', 'Transaction', 'count']
df1.head()

Unnamed: 0,Date,Transaction,count
0,2016-10-30,1,1
1,2016-10-30,2,2
2,2016-10-30,3,3
3,2016-10-30,4,1
4,2016-10-30,5,3


In [36]:
#Summing the per-transaction item counts for each day (this gives the items sold per day; the column is still named Transaction).
df = df1.groupby('Date', as_index=True)['count'].sum().reset_index()
df.columns = ['Date', 'Transaction']
df.head()

Unnamed: 0,Date,Transaction
0,2016-10-30,170
1,2016-10-31,199
2,2016-11-01,150
3,2016-11-02,164
4,2016-11-03,189
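
Because the daily figure above is a sum of per-transaction item counts, it measures items sold per day. If the number of distinct transactions per day is wanted as well, both can be computed in one grouped aggregation. This is a sketch on made-up rows; the output column names `items_sold` and `transactions` are assumptions.

```python
import pandas as pd

# Made-up rows: two transactions on the first day, one on the second.
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2016-10-30"] * 3 + ["2016-10-31"] * 2),
    "Transaction": [1, 2, 2, 3, 3],
    "Item": ["Bread", "Coffee", "Jam", "Coffee", "Tea"],
})

# Named aggregation: rows per day and distinct transaction ids per day.
daily = toy.groupby("Date").agg(
    items_sold=("Item", "size"),
    transactions=("Transaction", "nunique"),
)
print(daily)
```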


In [37]:
data4 = [go.Scatter(x=df.Date, y=df['Transaction'])]
layout = go.Layout(
    title='No of transactions over months',
)
fig6 = go.Figure(data=data4, layout=layout)
iplot(fig6)

## Summary:
1. The most sold item is coffee. Increasing the number of coffee machines could therefore help the bakery serve its customers smoothly.
2. Bread is the item most often bought with coffee, so placing bread near the coffee pickup points or the coffee machines may increase bread sales.
3. Coffee is mostly bought during the 11:00 AM hour, so it would also help to schedule extra baristas around that time.

Looking into these suggestions may increase the bakery's sales.