<div style="display:flex">

<div>

<h1 style="font-weight:bold">Plotly</h1>

<p style="line-height:2">
<a href="https://plotly.com/python/" target="_blank">Plotly</a>'s Python graphing library makes interactive, publication-quality graphs. Its documentation includes examples of line plots, scatter plots, area charts, bar charts, error bars, box plots, histograms, heatmaps, subplots, multiple axes, polar charts, and bubble charts. Plotly.py is free and open source; you can view the source, report issues, or contribute on GitHub.
</p><br>

<span style="font-weight: bold; font-size: 18px;">Basic Charts</span><br>

<ol style="line-height:2">
    <li><b>Bar Graph:</b> Uses vertical or horizontal bars to represent data; ideal for comparing quantities across categories.<br></li>
    <li><b>Line Graph:</b> Connects data points with lines; useful for showing trends over time.<br></li>
    <li><b>Scatter Plot:</b> Plots points on a two-dimensional grid; used to identify relationships between two variables.<br></li>
    <li><b>Pie Chart:</b> Shows proportions of a whole as slices of a pie; suitable for percentage or proportional data.<br></li>
    <li><b>Histogram:</b> Similar to a bar graph but groups numeric values into ranges (bins); great for showing frequency distributions.<br></li>
    <li><b>Box Plot:</b> Displays the distribution of data based on the five-number summary (minimum, first quartile, median, third quartile, maximum); useful for highlighting outliers and understanding spread.<br></li>
    <li><b>Area Chart:</b> Similar to a line graph but with the area below the line filled in; often used for tracking cumulative totals over time.<br></li>
    <li><b>Heatmap:</b> Uses colors to represent values in a matrix; useful for visualizing patterns or correlations.<br></li>
    <li><b>Bubble Chart:</b> A variation of a scatter plot in which the size of each bubble represents a third variable; good for comparing several variables at once.<br></li>
</ol><br>

</div>

<div style="display:flex;  justify-content: center; align-items: center; padding:40px; height: 60%">
<img src="https://raw.githubusercontent.com/cldougl/plot_images/add_r_img/plotly_2017.png"/>
</div>

</div>

In [1]:
# Install the libraries used in this notebook if you haven't already
%pip install plotly numpy pandas

In [2]:
# Plotly's notebook renderer (used by fig.show()) requires nbformat>=4.2.0
%pip install --upgrade nbformat


--------

### Import relevant libraries

In [3]:
import pandas as pd
import numpy as np
import plotly.graph_objects as go
from IPython.display import display  # used below to render DataFrames

### Load the datasets

In [4]:
# Coffee shop data
# ---------------------------
# Transaction data for a coffee shop, including customer and product information.
cf_shop = pd.read_csv('coffee_shop_sales.csv')
print("-- Starbucks transactions in California Outlet --")
print("Columns: ", cf_shop.columns)
display(cf_shop.head())
print("\n\n")


# Starbucks outlets data
# ---------------------------
# Number of Starbucks outlets per country, with population and outlets per million inhabitants.
st_world = pd.read_csv('starbucks_world.csv')
print("-- Starbucks outlets around the World --")
print("Columns: ", st_world.columns)
display(st_world)

-- Starbucks transactions in California Outlet --
Columns:  Index(['transaction_id', 'transaction_date', 'transaction_time',
       'sales_outlet_id', 'customer_id', 'customer_first-name',
       'customer_email', 'customer_since', 'gender', 'birthdate', 'instore_yn',
       'order', 'product_id', 'product', 'product_category', 'quantity',
       'unit_price', 'promo_item_yn', 'outlet', 'transaction', 'gross'],
      dtype='object')


| | transaction_id | transaction_date | transaction_time | sales_outlet_id | customer_id | customer_first-name | customer_email | customer_since | gender | birthdate | ... | order | product_id | product | product_category | quantity | unit_price | promo_item_yn | outlet | transaction | gross |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7 | 2019-04-01 | 12:04:43 | 3 | 558 | Melissa Johnson | Luke@eget.net | 2018-06-19 | F | 1983-02-25 | ... | 1 | 52 | Traditional Blend Chai Rg | Tea | 1 | 2.5 | N | Astoria, Long Island City | 2019-04-01 12:04:43 | 2.5 |
| 1 | 11 | 2019-04-01 | 15:54:39 | 3 | 781 | Luke Patel | Herrod@Maecenas.us | 2018-11-02 | N | 1991-07-29 | ... | 1 | 27 | Brazilian Lg | Coffee | 2 | 3.5 | N | Astoria, Long Island City | 2019-04-01 15:54:39 | 7.0 |
| 2 | 19 | 2019-04-01 | 14:34:59 | 3 | 788 | Hilel Ballard | Rajah@risus.org | 2018-12-30 | N | 1995-02-23 | ... | 1 | 46 | Serenity Green Tea Rg | Tea | 2 | 2.5 | N | Astoria, Long Island City | 2019-04-01 14:34:59 | 5.0 |
| 3 | 32 | 2019-04-01 | 16:06:04 | 3 | 683 | Zephr Zimmerman | Dacey@in.net | 2019-03-04 | F | 1999-02-06 | ... | 1 | 23 | Our Old Time Diner Blend Rg | Coffee | 2 | 2.5 | N | Astoria, Long Island City | 2019-04-01 16:06:04 | 5.0 |
| 4 | 33 | 2019-04-01 | 19:18:37 | 3 | 99 | Orlando Shields | Ivory@scelerisque.us | 2017-10-01 | M | 1967-01-29 | ... | 1 | 34 | Jamaican Coffee River Sm | Coffee | 1 | 2.45 | N | Astoria, Long Island City | 2019-04-01 19:18:37 | 2.45 |





-- Starbucks outlets around the World --
Columns:  Index(['country', 'population', 'outlet', 'starbucks_per_million_inhabitants',
       'code'],
      dtype='object')


| | country | population | outlet | starbucks_per_million_inhabitants | code |
|---|---|---|---|---|---|
| 0 | Argentina | 41446246 | 73 | 1.76 | ARG |
| 1 | Aruba | 102911 | 3 | 29.15 | ABW |
| 2 | Australia | 23130900 | 24 | 1.04 | AUS |
| 3 | Austria | 8473786 | 17 | 2.01 | AUT |
| 4 | Bahamas, The | 377374 | 9 | 23.85 | BHS |
| ... | ... | ... | ... | ... | ... |
| 58 | Turkey | 74932641 | 213 | 2.84 | TUR |
| 59 | United Arab Emirates | 9346129 | 106 | 11.34 | ARE |
| 60 | United Kingdom | 64097085 | 770 | 12.01 | GBR |
| 61 | United States | 316128839 | 11851 | 37.49 | USA |


In [5]:
# Convert the date/time string columns to pandas datetime
cf_shop['transaction'] = pd.to_datetime(cf_shop['transaction_date'] + ' ' + cf_shop['transaction_time'])
cf_shop['customer_since'] = pd.to_datetime(cf_shop['customer_since'])
cf_shop['birthdate'] = pd.to_datetime(cf_shop['birthdate'])
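
If some rows might not match the expected layout, you can make the conversion stricter. Pass an explicit `format` and `errors="coerce"` to `pd.to_datetime`, and bad values become `NaT` instead of raising an error. The sketch below uses hypothetical rows shaped like `transaction_date` and `transaction_time`. The format string is an assumption based on the sample output above.

```python
import pandas as pd

# Hypothetical rows shaped like transaction_date / transaction_time; the last date is deliberately invalid
sample = pd.DataFrame({
    "transaction_date": ["2019-04-01", "2019-04-01", "2019-04-31"],
    "transaction_time": ["12:04:43", "15:54:39", "10:00:00"],
})

# An explicit format skips format inference; errors="coerce" turns unparseable values into NaT
sample["transaction"] = pd.to_datetime(
    sample["transaction_date"] + " " + sample["transaction_time"],
    format="%Y-%m-%d %H:%M:%S",
    errors="coerce",
)

# Rows that failed to parse can then be inspected or dropped
bad_rows = sample[sample["transaction"].isna()]
print(bad_rows)
```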

--------


<b>Question 1:</b> Which countries have the most Starbucks outlets?

In [6]:
# Take the 10 countries with the most Starbucks outlets
most_outlets = st_world[['country', 'outlet']].sort_values('outlet', ascending=False).head(10)

# Express each country's outlets as a percentage of all outlets in the dataset
most_outlets['percentage'] = (most_outlets['outlet'] / st_world['outlet'].sum()) * 100

# Create the figure
fig = go.Figure()

# Add the bar chart
fig.add_trace(
    go.Bar(
        x=most_outlets['country'],
        y=most_outlets['percentage'],
    )
)

# Customize the graph
fig.update_layout(
    title='Popular Countries based on Starbucks outlets',
    xaxis_title='Country',
    yaxis_title='% of Outlets',
    yaxis_ticksuffix='%',
    xaxis_tickangle=-45,
    showlegend=False
)

# Show the plot
fig.show()
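
You can also write the top-N selection and percentage calculation with `DataFrame.nlargest`, which selects the n rows with the largest values in a column. The sketch below runs it on a subset of the rows shown in the `st_world` output. The percentages are therefore relative to that subset, not to the full dataset.

```python
import pandas as pd

# A subset of the st_world rows shown above (not the full dataset)
st_subset = pd.DataFrame({
    "country": ["Argentina", "Aruba", "Australia", "Austria", "Bahamas, The",
                "Turkey", "United Arab Emirates", "United Kingdom", "United States"],
    "outlet": [73, 3, 24, 17, 9, 213, 106, 770, 11851],
})

# nlargest picks the rows with the largest 'outlet' values, largest first
top = st_subset.nlargest(3, "outlet")[["country", "outlet"]].copy()

# Share of all outlets in the subset, in percent
top["percentage"] = top["outlet"] / st_subset["outlet"].sum() * 100
print(top.round(2))
```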

In [7]:
# Sort ascending and keep the last 10 rows: Plotly draws the first category of a
# horizontal bar chart at the bottom, so the country with the most outlets ends up on top
most_outlets = st_world[['country', 'outlet']].sort_values('outlet', ascending=True).tail(10)

# Express each country's outlets as a percentage of all outlets in the dataset
most_outlets['percentage'] = (most_outlets['outlet'] / st_world['outlet'].sum()) * 100

# Create the figure
fig = go.Figure()

# Add the horizontal bar chart
fig.add_trace(
    go.Bar(
        y=most_outlets['country'],
        x=most_outlets['percentage'],
        orientation='h'
    )
)

# Customize the graph: in a horizontal chart the countries are on the y-axis
fig.update_layout(
    title='Popular Countries based on Starbucks outlets',
    xaxis_title='% of Outlets',
    yaxis_title='Country',
    xaxis_ticksuffix='%',
    showlegend=False
)

# Show the plot
fig.show()

<b>Question 2:</b> What is the most popular item in the coffee shop?

In [8]:
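# Total quantity sold per product; sorting ascending and keeping the last 10 rows
# puts the best seller at the top of the horizontal bar chart
popular_items = cf_shop.groupby('product')['quantity'].sum().sort_values(ascending=True).tail(10)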

# Create the figure
fig = go.Figure()

# Add the horizontal bar chart
fig.add_trace(go.Bar(
    x=popular_items.values,
    y=popular_items.index,
    orientation='h',
    marker=dict(
        color='rgba(50, 171, 96, 0.6)',
    )
))

# Customize the graph
fig.update_layout(
    title='Popular Items in Coffee Shop',
    xaxis_title='Quantity',
    yaxis_title='Product',
    xaxis=dict(type='log')   # log scale compresses large differences between product totals
)

# Show the figure
fig.show()
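
The `sort_values(ascending=True).tail(10)` idiom above returns the top items smallest-first, which suits a horizontal bar chart. `Series.nlargest` returns the same items largest-first. The small check below uses hypothetical line items: the product names come from the sample rows, but the quantities are invented.

```python
import pandas as pd

# Hypothetical line items; product names come from the sample rows, quantities are invented
items = pd.DataFrame({
    "product": ["Brazilian Lg", "Serenity Green Tea Rg", "Brazilian Lg",
                "Traditional Blend Chai Rg", "Jamaican Coffee River Sm",
                "Serenity Green Tea Rg", "Brazilian Lg"],
    "quantity": [2, 2, 1, 1, 2, 1, 3],
})

# Total quantity per product
totals = items.groupby("product")["quantity"].sum()

# Ascending + tail: top 3, smallest first (the order a horizontal bar chart draws bottom-up)
top_ascending = totals.sort_values(ascending=True).tail(3)

# nlargest: the same top 3, largest first
top_descending = totals.nlargest(3)
print(top_descending)
```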

<b>Question 3:</b> How have the coffee shop's sales changed over time?

In [9]:
# Daily sales: total gross (quantity * unit_price) per calendar day
daily_sales = cf_shop.resample('D', on='transaction')['gross'].sum()

# Create the figure
fig = go.Figure()

# Add the line chart
fig.add_trace(go.Scatter(
    x=daily_sales.index,
    y=daily_sales.values,
    mode='lines+markers',
    marker=dict(
        color='rgba(50, 171, 96, 0.6)',
    )
))

# Customize the graph
fig.update_layout(
    title='Daily Sales of Coffee Shop',
    xaxis_title='Date',
    yaxis_title='Total Gross',
    yaxis_tickprefix="$",
)

# Show the figure
fig.show()
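
Summing `unit_price` per period counts each line item once, regardless of how many units were bought. `gross` (quantity × unit price in the sample rows) reflects the amount actually charged. The sketch below uses three hypothetical transactions, the first two mirroring sample rows above, to show the difference after resampling to days.

```python
import pandas as pd

# Hypothetical transactions; the first two mirror sample rows in the output above
tx = pd.DataFrame({
    "transaction": pd.to_datetime(["2019-04-01 12:04:43", "2019-04-01 15:54:39",
                                   "2019-04-02 09:00:00"]),
    "quantity": [1, 2, 3],
    "unit_price": [2.5, 3.5, 3.0],
})
# Assumption: gross = quantity * unit_price, which matches the sample rows above
tx["gross"] = tx["quantity"] * tx["unit_price"]

# Resample to calendar days and sum each column
daily_gross = tx.resample("D", on="transaction")["gross"].sum()
daily_unit_price = tx.resample("D", on="transaction")["unit_price"].sum()
print(pd.DataFrame({"gross": daily_gross, "unit_price": daily_unit_price}))
```

For monthly totals, you can pass a month-start frequency such as `"MS"` instead of `"D"`.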

<b>Question 4:</b> Do customers who spend more on coffee also try more items?

In [10]:
# Customer spending
# Keep only transactions linked to a named customer (drops anonymous purchases)
filtered_customers = cf_shop[cf_shop['customer_first-name'].notnull()]

# Per customer: total gross spent and number of distinct products bought
customer_spending = (
    filtered_customers
    .groupby('customer_id')
    .agg({'gross': 'sum', 'product_id': 'nunique'})
    .reset_index()
    .sort_values('product_id', ascending=True)
)

# Create the figure
fig = go.Figure()

# Create a scatter plot (hovering over a point shows the customer ID)
fig.add_trace(go.Scatter(
    x=customer_spending['product_id'],
    y=customer_spending['gross'],
    mode='markers',
    text=customer_spending['customer_id'],
    marker=dict(
        color='rgba(50, 171, 96, 0.6)',
        size=12
    )
))

# Customize the graph
fig.update_layout(
    title='Customer Spending vs. Number of Products',
    xaxis_title='Number of Distinct Products',
    xaxis=dict(type='category'),
    yaxis_title='Total Gross',
    yaxis_tickprefix="$"
)

# Show the figure
fig.show()
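
The scatter plot answers the question visually. A single summary number can complement it. One option, not used in the original analysis, is the Spearman rank correlation between total spend and the number of distinct products. The sketch below computes it as the Pearson correlation of the ranks, so SciPy is not needed. The data is hypothetical.

```python
import pandas as pd

# Hypothetical line items for three customers
lines = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2, 3],
    "product_id":  [10, 11, 12, 10, 10, 13],
    "gross":       [3.0, 2.5, 4.0, 3.0, 3.0, 2.0],
})

# Per-customer summary (total spend, distinct products) using named aggregations
per_customer = lines.groupby("customer_id").agg(
    total_gross=("gross", "sum"),
    n_products=("product_id", "nunique"),
)

# Spearman correlation = Pearson correlation of the ranks (ties get average ranks)
spearman = per_customer["total_gross"].rank().corr(per_customer["n_products"].rank())
print(per_customer)
print(f"Spearman rho: {spearman:.3f}")
```

On the notebook's data, you could apply the same rank-then-correlate line to the `gross` and `product_id` columns of `customer_spending`.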


<b>Question 5:</b> Do customers revisit the shop? If so, how many times?

In [11]:
# Visits per customer, counted as distinct transaction IDs (uses filtered_customers from the previous cell)
customer_visits = filtered_customers.groupby('customer_id')['transaction_id'].nunique()

# Create a figure
fig = go.Figure()

# Create a histogram
fig.add_trace(go.Histogram(
    x=customer_visits.values,
    marker=dict(
        color='rgba(50, 171, 96, 0.6)',
    )
))

# Customize the graph
fig.update_layout(
    title='Number of Visits per Customer',
    xaxis_title='Number of Visits',
    yaxis_title='Number of Customers',
)

# Show the figure
fig.show()
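
The histogram's counts can also be read off directly. `value_counts` gives the number of customers per visit count, and the share of repeat customers is the fraction with more than one visit. The sketch below uses hypothetical line items, where one transaction can span several rows.

```python
import pandas as pd

# Hypothetical line items: one row per product in a transaction, so IDs can repeat
tx = pd.DataFrame({
    "customer_id":    [1,   1,   1,   2,   2,   3,   3,   4],
    "transaction_id": [100, 100, 101, 102, 103, 104, 104, 105],
})

# Visits per customer = number of distinct transactions
visits = tx.groupby("customer_id")["transaction_id"].nunique()

# How many customers made 1 visit, 2 visits, ...
visit_distribution = visits.value_counts().sort_index()

# Share of customers who came back at least once
repeat_share = (visits > 1).mean()
print(visit_distribution)
print(f"Repeat customers: {repeat_share:.0%}")
```

Plotting `visit_distribution` with `go.Bar` gives exactly one bar per visit count.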

--------

### Extra

In [12]:
# Create hover text
hover_text = [f"{country}<br>Outlets: {outlet}" for country, outlet in zip(st_world['country'], st_world['outlet'])]

# Create the map
fig = go.Figure(
    data=go.Choropleth(
        locations=st_world['code'],                 # ISO-3 country codes
        z=np.log(st_world['outlet']),               # Log scale makes color differences between countries more visible
        text=hover_text,
        hovertemplate='%{text}<extra></extra>',     # Show only the custom hover text (hides the trace-name box)
        colorscale='Blues',
        marker_line_color='darkgray',
        marker_line_width=0.5,
        colorbar_title='Log Scale',                 # Plotly uses "<br>", not "\n", for line breaks in titles
        zmin=0,
    )
)

# Customizing the layout
fig.update_layout(
    # set the size
    width=1000,
    height=600,
    title_text='Starbucks Outlets around the World',
    geo=dict(
        showframe=False,
        showcoastlines=False,
        projection_type='equirectangular'
    )
)

# Show the plot
fig.show()