# A manager's analysis of an unspecified coffee chain in the United States

Questions addressed in this analysis:
* Which branch has the highest traffic?
* Which product sells the most in the Californian market?
* Which location makes the most profit?
* Is there a correlation between traffic and profit?
* Which locations are meeting their quotas?
* Is the marketing spending making a return on investment?
* Which product is the most profitable relative to its sales (profit margin)?

# Prerequisites: Importing the necessary libraries, loading the dataset, and checking for nulls

In [1]:
import numpy as np
import pandas as pd
import plotly.express as px
import matplotlib.pyplot as plt
import plotly.io as pio
pio.renderers.default = "kaggle"
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

/kaggle/input/coffee-chain-sales-dataset/Coffee_Chain_Sales .csv


In [2]:
# Load the dataset
df=pd.read_csv('/kaggle/input/coffee-chain-sales-dataset/Coffee_Chain_Sales .csv')
df.head(10)

Unnamed: 0,Area Code,Cogs,DifferenceBetweenActualandTargetProfit,Date,Inventory Margin,Margin,Market_size,Market,Marketing,Product_line,...,Product,Profit,Sales,State,Target_cogs,Target_margin,Target_profit,Target_sales,Total_expenses,Type
0,303,51,-35,10/1/2012,503,71,Major Market,Central,46,Leaves,...,Lemon,-5,122,Colorado,30,60,30,90,76,Decaf
1,970,52,-24,10/1/2012,405,71,Major Market,Central,17,Leaves,...,Mint,26,123,Colorado,30,60,50,90,45,Decaf
2,409,43,-22,10/2/2012,419,64,Major Market,South,13,Leaves,...,Lemon,28,107,Texas,30,60,50,90,36,Decaf
3,850,38,-15,10/3/2012,871,56,Major Market,East,10,Leaves,...,Darjeeling,35,94,Florida,40,60,50,100,21,Regular
4,562,72,6,10/4/2012,650,110,Major Market,West,23,Leaves,...,Green Tea,56,182,California,20,60,50,80,54,Regular
5,712,0,-29,10/5/2012,430,43,Small Market,Central,0,Beans,...,Decaf Espresso,31,43,Iowa,0,60,60,60,12,Decaf
6,860,47,-29,10/6/2012,375,64,Small Market,East,15,Beans,...,Decaf Espresso,21,111,Connecticut,30,60,50,90,43,Decaf
7,918,27,-39,10/7/2012,859,39,Small Market,South,7,Beans,...,Decaf Irish Cream,21,66,Oklahoma,30,60,60,90,18,Decaf
8,775,31,-43,10/8/2012,1000,37,Small Market,West,9,Beans,...,Decaf Irish Cream,7,68,Nevada,30,60,50,90,30,Decaf
9,435,40,-23,10/9/2012,881,59,Small Market,West,11,Beans,...,Decaf Espresso,37,99,Utah,20,60,60,80,22,Decaf


In [3]:
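# Preview the frame. Without parentheses, df.info displays the bound method's repr (the frame itself);
# call df.info() to get column dtypes and non-null counts instead.
df.info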

<bound method DataFrame.info of       Area Code  Cogs  DifferenceBetweenActualandTargetProfit       Date  \
0           303    51                                     -35  10/1/2012   
1           970    52                                     -24  10/1/2012   
2           409    43                                     -22  10/2/2012   
3           850    38                                     -15  10/3/2012   
4           562    72                                       6  10/4/2012   
...         ...   ...                                     ...        ...   
1057        775   250                                     133  8/23/2015   
1058        971    88                                      48  8/24/2015   
1059        775   294                                    -285  8/25/2015   
1060        503   134                                      80  8/26/2015   
1061        435    20                                     -22  8/27/2015   

      Inventory Margin  Margin   Market_size   Market  
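For a column-level summary, `DataFrame.info` has to be called with parentheses. The sketch below uses a small hand-typed frame (the name `toy` and its values are illustrative assumptions, not the real dataset) to show the call, and also parses the `Date` strings, assuming they follow the month/day/year pattern seen in the preview.

```python
import io
import pandas as pd

# Small hand-typed frame for illustration; the notebook itself would use df
toy = pd.DataFrame({
    "State": ["Colorado", "Texas", "Florida"],
    "Sales": [122, 107, 94],
    "Date": ["10/1/2012", "10/2/2012", "10/3/2012"],
})

# Dates load as plain strings; parse them so they can be sorted or grouped by period
# (assumes a month/day/year layout)
toy["Date"] = pd.to_datetime(toy["Date"], format="%m/%d/%Y")

# info() writes its report to a buffer (stdout by default) and returns None,
# so it is captured here to make the result inspectable
buffer = io.StringIO()
toy.info(buf=buffer)
summary = buffer.getvalue()
print(summary)
```

The printed summary lists each column with its non-null count and dtype, which complements the null check in the next cell.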

In [4]:
# Check for nulls
df.isnull().sum()

Area Code                                 0
Cogs                                      0
DifferenceBetweenActualandTargetProfit    0
Date                                      0
Inventory Margin                          0
Margin                                    0
Market_size                               0
Market                                    0
Marketing                                 0
Product_line                              0
Product_type                              0
Product                                   0
Profit                                    0
Sales                                     0
State                                     0
Target_cogs                               0
Target_margin                             0
Target_profit                             0
Target_sales                              0
Total_expenses                            0
Type                                      0
dtype: int64

# Question 1: Which branch has the highest traffic?

In [5]:
# Rank states by total sales, which this analysis uses as a proxy for traffic
busy = df.groupby('State', as_index=False)['Sales'].sum().sort_values('Sales', ascending=False)
fig1 = px.bar(busy, x='State', y='Sales', title='Biggest Traffic By State')
fig1.show()

# Question 2: Which product sells the most in the Californian market?

In [6]:
# Rank products by total sales in the California market
drinkcal = df[df['State']=='California'].groupby(['State', 'Product'], as_index=False)['Sales'].sum().sort_values('Sales', ascending=False)
fig2=px.bar(drinkcal,x='Product',y='Sales',title='Most Popular Drinks in California')
fig2.show()

# Question 3: Which location makes the most profit?

In [7]:
# Compare the distribution of actual profit across states
fig3=px.box(df, x='State', y='Profit', title='Profit by State')
fig3.show()
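A box plot shows how profit is spread within each state, but it does not directly rank states by total profit. As a complement, the following sketch sums profit per state and counts loss-making rows. The frame `toy`, its state labels, and its values are made up for illustration; in the notebook the same calls would run on `df`.

```python
import pandas as pd

# Illustrative rows only; not taken from the dataset
toy = pd.DataFrame({
    "State": ["A", "A", "B", "B", "C"],
    "Profit": [50, -20, 40, 35, 10],
})

# Total and median profit per state, plus the number of rows with a loss
profit_by_state = (
    toy.groupby("State")["Profit"]
    .agg(total="sum", median="median", loss_rows=lambda s: int((s < 0).sum()))
    .sort_values("total", ascending=False)
)
print(profit_by_state)

# State with the highest total profit
top_state = profit_by_state.index[0]
print(top_state)
```

Reading the total alongside the loss count helps separate a state that is profitable overall from one that is profitable despite frequent losses.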

# Question 4: Is there a correlation between traffic and profit?

In [8]:
# Plot sales (traffic) against profit and compute the Pearson correlation coefficient
fig4 = px.scatter(df, x='Sales', y='Profit',color='Market',title='Correlation between Traffic and Profit')
fig4.show()
correlation = df['Sales'].corr(df['Profit'])
print(f"Correlation between Sales and Profit: {correlation:.2f}")

Correlation between Sales and Profit: 0.80
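`Series.corr` computes the Pearson coefficient by default. The sketch below reproduces that calculation by hand on a few made-up values (the frame `toy` is an assumption for illustration) so the meaning of the number is easier to see.

```python
import numpy as np
import pandas as pd

# Illustrative values only; not taken from the dataset
toy = pd.DataFrame({
    "Sales": [100, 120, 90, 150, 200],
    "Profit": [20, 30, 10, 45, 70],
})

# Pearson r: covariance of the centered series divided by the product of their spreads
x = toy["Sales"] - toy["Sales"].mean()
y = toy["Profit"] - toy["Profit"].mean()
r_manual = (x * y).sum() / np.sqrt((x ** 2).sum() * (y ** 2).sum())

# pandas' built-in version (method="pearson" is the default)
r_pandas = toy["Sales"].corr(toy["Profit"])

# Squaring r gives the share of variance explained by a straight-line fit
r_squared = r_pandas ** 2
print(f"manual={r_manual:.4f} pandas={r_pandas:.4f} r^2={r_squared:.4f}")
```

By the same arithmetic, a coefficient of 0.80 corresponds to an r² of 0.64, so a linear fit on sales would account for roughly 64% of the variance in profit.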


# Question 5: Which locations are meeting their quotas?

In [9]:
# Aggregate actual and target profit per state, then flag states whose total profit beat the target
quotas = df.groupby('State').agg({'Profit': 'sum', 'Target_profit': 'sum', 'DifferenceBetweenActualandTargetProfit': 'sum'}).reset_index()
quotas['Met_Quota'] = quotas['DifferenceBetweenActualandTargetProfit'] > 0
fig5 = px.bar(
    quotas,
    x='State',
    y='DifferenceBetweenActualandTargetProfit',
    color='Met_Quota',
    color_discrete_map={True: 'green', False: 'red'},
    title='Difference Between Actual and Target Profit by State',
    hover_data={'Profit': True, 'Target_profit': True, 'DifferenceBetweenActualandTargetProfit': True},
)

fig5.update_layout(yaxis_title='Profit vs Target (Difference)')
fig5.show()
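The chart shows which states sit above or below target, but a single number for the share of states meeting quota can be handy as well. This is a minimal sketch on made-up data (the frame `toy` and the `Difference` column name are assumptions). It recomputes the gap from `Profit` and `Target_profit`, and it treats exactly hitting the target as meeting the quota (`>= 0`), which is a slightly different choice from the strict `> 0` used in the cell above.

```python
import pandas as pd

# Illustrative rows only; not taken from the dataset
toy = pd.DataFrame({
    "State": ["A", "A", "B", "C", "C"],
    "Profit": [60, 40, 30, 20, 25],
    "Target_profit": [50, 40, 50, 20, 20],
})

# Sum actual and target profit per state, then compute the gap
quotas = toy.groupby("State", as_index=False)[["Profit", "Target_profit"]].sum()
quotas["Difference"] = quotas["Profit"] - quotas["Target_profit"]

# Assumption: meeting the target exactly counts as meeting the quota
quotas["Met_Quota"] = quotas["Difference"] >= 0

# The mean of a boolean column is the fraction of True values
share_met = quotas["Met_Quota"].mean()
missed = quotas.loc[~quotas["Met_Quota"], "State"].tolist()
print(f"Share of states meeting quota: {share_met:.0%}")
print("States below target:", missed)
```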

# Question 6: Is the marketing spending making a return on investment?

In [10]:
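# Plot marketing spend against sales (trendline='ols' requires statsmodels) and compute an overall ROI
fig6 = px.scatter(df, x='Marketing', y='Sales', title='Marketing Spend vs. Sales with Regression Line', trendline='ols')
fig6.show()
total_sales = df['Sales'].sum()
total_marketing = df['Marketing'].sum()
# Revenue-based ROI: sales net of marketing spend, per marketing dollar (other costs are not deducted)
roi = (total_sales - total_marketing) / total_marketing
print(f"Overall ROI: {roi:.2f}")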

Overall ROI: 5.28
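The overall figure is based on sales revenue and pools every market together. One possible refinement, sketched here on made-up data, is to compute the same ratio per market and to set it beside a profit-based ratio. The frame `toy` and the column names `sales_roi` and `profit_per_marketing_dollar` are assumptions introduced for illustration. Ratios are taken on grouped totals rather than row by row, because some rows in the preview have a marketing spend of 0.

```python
import pandas as pd

# Illustrative rows only; not taken from the dataset
toy = pd.DataFrame({
    "Market": ["East", "East", "West", "West"],
    "Marketing": [10, 20, 15, 5],
    "Sales": [100, 140, 90, 60],
    "Profit": [30, 50, 20, 25],
})

# Sum spend, sales, and profit per market before forming ratios
by_market = toy.groupby("Market")[["Marketing", "Sales", "Profit"]].sum()

# Same revenue-based formula as the notebook cell, applied per market
by_market["sales_roi"] = (by_market["Sales"] - by_market["Marketing"]) / by_market["Marketing"]

# Profit earned per marketing dollar, a stricter view than revenue
by_market["profit_per_marketing_dollar"] = by_market["Profit"] / by_market["Marketing"]
print(by_market)
```

Neither ratio shows that marketing caused the sales; both only describe how spend and outcomes line up in the data.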


# Question 7: Which product is the most profitable relative to its sales (profit margin)?

In [11]:
# Compute each product's profit margin (total profit / total sales) and find the highest
product_summary = df.groupby('Product').agg({'Profit': 'sum','Sales': 'sum'}).reset_index()
product_summary['Profit_Margin'] = product_summary['Profit'] / product_summary['Sales']
top_product = product_summary.loc[product_summary['Profit_Margin'].idxmax()]
print("Most profitable product (by profit margin):")
print(top_product[['Product', 'Profit_Margin']])
fig7 = px.bar(
    product_summary.sort_values('Profit_Margin', ascending=False),
    x='Product',
    y='Profit_Margin',
    title='Profit Margin by Product',
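    labels={'Profit_Margin': 'Profit Margin'})
# Margins are stored as fractions (0 to 1), so format the axis ticks as percentages
fig7.update_yaxes(tickformat='.0%')
fig7.update_traces(hovertemplate="<b>%{x}</b><br>Profit Margin: %{y:.2%}<extra></extra>")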
fig7.show()

Most profitable product (by profit margin):
Product          Regular Espresso
Profit_Margin            0.509786
Name: 12, dtype: object
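The margin above is pooled: total profit divided by total sales per product. Averaging the per-row margins instead can give a different ranking, because small transactions then weigh as much as large ones. The sketch below contrasts the two on made-up data (the frame `toy` and the names `pooled_margin` and `row_margin` are assumptions for illustration).

```python
import pandas as pd

# Illustrative rows only; not taken from the dataset
toy = pd.DataFrame({
    "Product": ["X", "X", "Y", "Y"],
    "Sales": [100, 300, 50, 50],
    "Profit": [50, 60, 20, 20],
})

# Pooled margin: sum first, then divide (the approach used in the notebook cell)
totals = toy.groupby("Product")[["Profit", "Sales"]].sum()
pooled_margin = totals["Profit"] / totals["Sales"]

# Mean of row-level margins: divide first, then average within each product
row_margin = (toy["Profit"] / toy["Sales"]).groupby(toy["Product"]).mean()

print(pd.DataFrame({"pooled": pooled_margin, "mean_of_rows": row_margin}))
```

The pooled version answers "how much of each sales dollar is kept as profit", which matches the question asked in this section.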


# Conclusions
* California is our largest and busiest market. Consider investing in more locations.
* Californians show a stronger preference for Colombian coffee. Consider adjusting its price by a few cents to increase revenue.
* New York is our most profitable region, but it also records many losses. Additionally, the Nevada branch shows considerable outliers on the loss side. Consider inspecting these branches to learn more before proceeding.
* There is a strong positive correlation (r = 0.80) between traffic, measured as sales, and profit.
* About half of the branches appear not to be meeting their quotas. Consider asking local managers for an explanation.
* Marketing returns roughly 5 dollars in sales, net of the spend itself, for every dollar we put into it (ROI of 5.28). Consider investing more in marketing in the areas that have yet to meet their quota.
* Our most profitable product by margin is Regular Espresso (about 51%). Consider offering a promotion to regular customers to ensure frequent patronage.